Fig. 3: Variational autoencoder that learns a latent representation of the input data
4 Results
We experimented with four proposed classification methods, M1, M2, M3, and M4. They were evaluated using small amounts of training data, specifically 1, 2, 5, 10, and 100 instances per category from the MNIST handwritten digits database. The performance of the proposed methods was measured on a test set of 100 instances per category, i.e., 1000 instances in total.
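As an illustration of this setup, the following minimal sketch draws such subsets, assuming TensorFlow/Keras is used to load MNIST; the helper sample_per_class is ours, not part of the paper.

    import numpy as np
    from tensorflow.keras.datasets import mnist

    def sample_per_class(images, labels, n_per_class, seed=0):
        # Draw n_per_class random instances of each digit 0-9.
        rng = np.random.default_rng(seed)
        idx = np.concatenate([
            rng.choice(np.flatnonzero(labels == d), n_per_class, replace=False)
            for d in range(10)
        ])
        return images[idx], labels[idx]

    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # Training variants: 1, 2, 5, 10, and 100 instances per category.
    train_variants = {n: sample_per_class(x_train, y_train, n)
                      for n in (1, 2, 5, 10, 100)}
    # Test set: 100 instances per category, 1000 instances in total.
    x_eval, y_eval = sample_per_class(x_test, y_test, 100)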
For each digit category, a Generative Adversarial Network (GAN) was trained on the MNIST dataset filtered to that single category, resulting in 10 separate GANs. Each GAN consists of two
main components: a generator and a discriminator.
The generator model has a 100-dimensional latent
input layer and is composed of three dense layers
with batch normalization and ReLU activation. On
the other hand, the discriminator model takes an
image of 28x28 pixels as input and consists of
three dense layers with ReLU activation, as well as
an output layer with a sigmoid activation (see Model 3). All models were optimized using binary cross-entropy loss and the Adam optimizer with a learning rate of 0.0002 and a decay rate of 0.5. The training of each per-digit GAN was carried out for 30,000 epochs. Owing to the limited size of the training data, training was efficient, taking around 2-3 hours per GAN model.
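The following is a minimal Keras sketch of one per-digit GAN consistent with this description. The hidden-layer widths and the generator's output activation are our assumptions (the paper's exact layers are given in its model listings), and we read the "decay rate of 0.5" as the common GAN setting beta_1 = 0.5 of the Adam optimizer.

    from tensorflow import keras
    from tensorflow.keras import layers

    LATENT_DIM = 100  # dimension of the generator's latent input

    def build_generator():
        # Three dense layers with batch normalization and ReLU, mapping
        # a 100-dim latent vector to a 28x28 image; widths are assumed.
        return keras.Sequential([
            layers.Input(shape=(LATENT_DIM,)),
            layers.Dense(256), layers.BatchNormalization(), layers.ReLU(),
            layers.Dense(512), layers.BatchNormalization(), layers.ReLU(),
            layers.Dense(1024), layers.BatchNormalization(), layers.ReLU(),
            layers.Dense(28 * 28, activation="sigmoid"),
            layers.Reshape((28, 28)),
        ])

    def build_discriminator():
        # Three dense layers with ReLU and a sigmoid output estimating
        # the probability that the input image is real.
        model = keras.Sequential([
            layers.Input(shape=(28, 28)),
            layers.Flatten(),
            layers.Dense(512, activation="relu"),
            layers.Dense(256, activation="relu"),
            layers.Dense(128, activation="relu"),
            layers.Dense(1, activation="sigmoid"),
        ])
        model.compile(loss="binary_crossentropy",
                      optimizer=keras.optimizers.Adam(learning_rate=2e-4,
                                                      beta_1=0.5))
        return model

    def build_gan(generator, discriminator):
        # Stacked model used to train the generator; the discriminator
        # weights are frozen inside this combined model.
        discriminator.trainable = False
        gan = keras.Sequential([generator, discriminator])
        gan.compile(loss="binary_crossentropy",
                    optimizer=keras.optimizers.Adam(learning_rate=2e-4,
                                                    beta_1=0.5))
        return gan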
The generators were trained separately for each data variant, that is, for 1, 2, 5, 10, and 100 instances per category. Once trained, the GANs can be used for classification, because their outputs provide probability estimates for the individual categories; one plausible scoring procedure is sketched below.
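The exact combination rules of methods M1-M4 are defined earlier in the paper; as a minimal illustration of how per-category GANs yield a class decision, one can score a test image with each per-digit discriminator and predict the category whose discriminator assigns the highest "real" probability. This corresponds to the discriminator baseline reported next.

    import numpy as np

    def classify_by_discriminators(image, discriminators):
        # discriminators: the 10 trained per-digit discriminator models.
        # Predict the digit whose discriminator rates the image most "real".
        scores = [d.predict(image[np.newaxis, ...], verbose=0)[0, 0]
                  for d in discriminators]
        return int(np.argmax(scores))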
The results of the proposed methods M1, M2, M3, and M4 were compared with the results of the GAN discriminator, which served as the baseline. The discriminator's classification accuracy was 38.2% for 1 training instance per category, 42.1% for 2, 56.9% for 5, 61.2% for 10, and 89.3% for 100 training instances per category. These results are summarized in the column 'Discriminator result accuracy' of Table 1. In the
initial stage of our experiments, we evaluated the
performance of Method 1 with various parameter
combinations. To ensure compatibility with low-
performance computers, we selected the parameters
m=100 and N=1 as a reasonable configuration for
conducting experiments. For the M3, we used
parameters m=100 and . The results of
classification using M1 and M3 were evaluated
using different amounts of training data from each
category of the MNIST handwritten digit dataset.
The classification accuracy of M1 ranged from
41.3% to 66.4% and that of M3 ranged from 41.9%
to 66.7%. The percentage change relative to the discriminator's result showed that M1 outperformed the discriminator with 1-2 training instances per category, but its relative performance worsened as the amount of training data increased. The same trend was observed for
M3, with a slightly better improvement over the
discriminator for all amounts of training data
compared to M1. Overall, the results indicate that
both M1 and M3 performed better than the
discriminator for small amounts of training data, but
as the amount of data increased, their performance
degraded compared to the discriminator's
classification result. After experiments with
methods M1 and M3, we carried out experiments
with M2 and M4. The variational autoencoder (VAE) used by these methods takes the MNIST handwritten digit images as input. The encoder network maps the input data to a lower-dimensional latent space (of dimension 32), and the decoder network maps points from the latent space back to the original input space.
The sampling function implements the reparameterization trick, a technique that makes the sampling of the latent code z differentiable so that gradients can be backpropagated during training. The VAE encoder network consists of several Conv2D, Flatten, and Dense layers (see Model 1), and the VAE decoder network consists of several Conv2DTranspose, Reshape, and Dense layers (see Model 2). The VAE is trained using the mean squared error loss.
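A minimal sketch of such a VAE follows. It is consistent with the description above, but the filter counts and strides are our assumptions; Models 1 and 2 give the exact layers.

    import tensorflow as tf
    from tensorflow import keras
    from tensorflow.keras import layers

    LATENT_DIM = 32

    def sampling(args):
        # Reparameterization trick: z = mu + sigma * eps, which keeps
        # the sampling of the latent code z differentiable.
        z_mean, z_log_var = args
        eps = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * eps

    # Encoder: Conv2D -> Flatten -> Dense layers (cf. Model 1).
    enc_in = keras.Input(shape=(28, 28, 1))
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(enc_in)
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Flatten()(x)
    z_mean = layers.Dense(LATENT_DIM)(x)
    z_log_var = layers.Dense(LATENT_DIM)(x)
    z = layers.Lambda(sampling)([z_mean, z_log_var])
    encoder = keras.Model(enc_in, [z_mean, z_log_var, z])

    # Decoder: Dense -> Reshape -> Conv2DTranspose layers (cf. Model 2).
    dec_in = keras.Input(shape=(LATENT_DIM,))
    y = layers.Dense(7 * 7 * 64, activation="relu")(dec_in)
    y = layers.Reshape((7, 7, 64))(y)
    y = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(y)
    y = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(y)
    dec_out = layers.Conv2DTranspose(1, 3, padding="same", activation="sigmoid")(y)
    decoder = keras.Model(dec_in, dec_out)

    # Full VAE, trained with the mean squared error reconstruction loss
    # (only the MSE loss is stated in the paper).
    vae = keras.Model(enc_in, decoder(encoder(enc_in)[2]))
    vae.compile(optimizer="adam", loss="mse")

Training then reduces to vae.fit(x, x, ...) on the images of the given training variant, reshaped to (28, 28, 1).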
In the initial stage of our experiments, we evaluated the performance of Method 2 with various parameter combinations. To ensure compatibility with low-performance computers, we selected the parameters m=100 and N=1 as a reasonable configuration for conducting the experiments. For the modification of M2 (Method 4), we used the parameter m=100; see Table 1 for more details. The proposed methods
showed better results with smaller training data sets, but their performance worsened as the number of training instances increased. The modified versions of the methods (M3 and M4) achieved a slightly larger improvement over the baseline than the original versions.