Abstract
This study proposes a generative adversarial network (GAN) architectural variation in which the adversarial game is played by preserving an appropriate distance between real and generated data in the latent dimension of the network. The method overcomes the mode collapse problem on a small dataset. Extensive experiments are conducted on the segmented medicinal leaf dataset, which contains many classes, and the generator network is able to produce artificial images of every class. This is accomplished by combining a new training technique with a reasonably simple model design.
Introduction
Much modern data, such as images and text, is high-dimensional. As the number of features grows, feature learning incurs greater computational cost and runs into statistical difficulties. Generative Adversarial Network (GAN) models are widely used on high-dimensional data because of their capacity to learn the data distribution. A GAN is a generative method that learns a description of the data; it has also been applied successfully to image up-scaling and style transfer. A GAN's architecture consists of a generator and a discriminator. The generator aims to produce synthetic output that resembles the training data, whereas the discriminator decides whether a given sample belongs to the training data or was produced by the generator. Although they give promising results, GANs are hard to train because the training process is unstable and sensitive to hyper-parameter values, which can lead to poor image generation. Large and varied datasets are usually needed to train GANs. Mode collapse is a main challenge of GANs: the generator may capture only a small number of modes of the data distribution. Because of this problem, it is very difficult to train GANs on small, less diverse datasets. Most deep generative frameworks are trained by maximizing the log-likelihood or a lower bound on it. As an alternative, a GAN uses two models, (1) a generator, which samples data, and (2) a discriminator, which classifies a sample as real or generated, and trains them by solving a minimax game. With this approach, an arbitrarily complex probability distribution can be modelled. The original GAN, proposed by Goodfellow et al. [24], considers an optimal discriminator for a given class of generators.
The GAN seeks to learn a generator that converts simple noise vectors into accurate samples from the data distribution. During GAN training, the generator and the discriminator are trained against one another.
While the generator tries to deceive the discriminator into accepting generated samples as real, the discriminator tries to distinguish fake examples from real ones. Although GANs produce effective outcomes, two issues arise during GAN training: mode collapse and instability. Both severely harm the quality of the images produced by the generator.
Alleviating mode collapse is the primary topic of this research. The aim is to produce highly diverse data based on the information already available, so that the enriched data can improve the performance of downstream processes. A new GAN training approach that alters the training procedure is proposed to address the mode collapse issue. It works with a small database and produces images from a limited data distribution.
The key contributions of the presented work are as follows. First, a novel GAN model is created to tackle mode collapse. The model then generates ten medicinal leaf disease images. The database employed for this research is the PlantVillage dataset, from which tomato leaf and grape leaf images are used. The images generated by the presented model are evaluated in terms of accuracy, precision, recall, F1-score, and AUC. Finally, to determine the effectiveness of the presented work, the developed model is compared with existing methods.
Literature review
This section reviews relevant studies on mode collapse alleviation, image generation, and GAN training methods. Salimans et al. [1] developed one class of approaches that focuses on designing informative objective functions. Mao et al. [2], Kodali et al. [3], Arjovsky et al. [4], Gulrajani et al. [5], and Zhou et al. [6] contributed along the same line, including Least Squares GANs (LSGAN), in which outlier samples are penalized by a least-squares loss function. Arjovsky and Bottou [4] analysed the role of the Jensen-Shannon divergence in the GAN training approach and recommended employing the Wasserstein distance. As a result, Arjovsky et al. [7] and Gulrajani et al. [5] created WGAN and WGAN-GP, respectively, which significantly reduce mode collapse and unstable learning.
Other methods were implemented by Makhzani et al. [8], Larsen et al. [9], and Tran et al. [10]. For example, Makhzani et al. [8] developed Adversarial Auto-Encoders, which employ the discriminator to distinguish between Gaussian noise and the latent codes produced by the encoder. Larsen et al. [9] created the VAE-GAN, which uses the feature representation learned by the discriminator as the basis for the VAE's reconstruction objective. Although some inconsistency is introduced, the method converts pixel-wise losses into feature-wise losses, which may better reflect the true distribution. The VAE has been viewed as a supporting network that could encourage GANs to pay more attention to missing modes. Additionally, it may generate an equivalent form to the VAE-GAN, which has a particular function.
In Energy-based GANs, Zhao et al. [11] used an entropy value to measure the variety of the images generated by the generator while maintaining a low energy state. Srivastava et al. [12] created a variational method for estimating latent distributions, which can help to prevent mode collapse. Lin et al. [13] constructed the discriminator within the PacGAN framework, where it reaches a decision after considering several examples from the same class, so that generators suffering from mode collapse are penalised. The BourGAN framework was created by Xiao, Zhong, and Zheng [14] and Xiao et al. [15] to treat modes as a geometric structure of the data distribution in a metric space, which enables an improved generator. Mao et al. [16] created MSGAN, which changes the objective function to encourage generators to explore minor modes in the data space. It is more robust and captures modes in the feature domain.
The aforementioned GANs can convert noise into target images with a distribution similar to that of the training database; however, they may not produce high-quality examples with a variety of styles when working with small sample sets. To address this, image-to-image translation can make use of feature changes. For unpaired databases, the Cycle-Consistent GAN (CycleGAN) was proposed to translate input-domain images into the target domain. CycleGAN, however, employs only a single mapping to create samples, which limits style transfer. Yi Qin et al. [17] studied Tree-CycleGAN in order to produce good-quality samples with a variety of styles; a maximum diversity loss enhances sampling diversity across multiple generators.
Training a deep neural network (DNN) requires a large amount of data. Yang Wu and Lihong Xu [18] presented an Adversarial Variational Auto-Encoder (Adversarial-VAE) network for producing images of ten tomato leaf diseases, in an effort to address the lack of training images in tomato leaf disease identification. The network is used to enlarge the training set for a recognition model. According to the test findings, detection accuracy increases significantly when a ResNet recognition model is trained on the extended database produced by the Adversarial-VAE approach. The model can supply a sufficient number of tomato leaf disease images and offers a workable way of enlarging the database of such images.
Bin Liu et al. [19] presented a deep convolutional neural network (CNN)-based method for accurately recognising apple leaf diseases. It involves generating enough diseased images and designing a new deep CNN architecture based on AlexNet. According to the testing findings, the proposed CNN-based diagnosis method achieved 97.62% accuracy, and using the generated diseased images increased accuracy by 10.83%. The study shows that the model offers a more effective approach to the control of apple leaf diseases, with higher accuracy and faster convergence, and that the image generation method can increase the CNN's robustness.
Data Augmentation: Image data augmentation has been shown to be effective during GAN training. To lessen over-fitting, Krizhevsky, Sutskever, and Hinton [20] introduced a data augmentation scheme. Shorten et al. [21] examined methods including rotating, cropping, and flipping, which can have a notable impact on over-fitting. The data augmentation approaches considered by Perez and Wang [22] fall into three primary groups: learned augmentation, generative techniques, and classic transformations. An approach was created by Zoph et al. [23]; it has an extremely high computational cost compared to NAS. GANs are a representative class of generative models. Nevertheless, they are rarely employed for augmentation because mode collapse restricts the variety of the data produced.
Convergence and Stability of GANs: The learning procedure and the stability and speed of convergence play crucial roles in GANs. The vanilla GAN was created by Goodfellow et al. [24] for producing high-quality images. However, because of the mismatch between the discriminator and the generator, training GANs is not easy. Chintala, Arjovsky, and Bottou [25] employed the Wasserstein distance in place of the KL divergence to measure the distance between distributions, which makes GAN training less challenging. Instead of weight clipping as in WGAN, Gulrajani et al. [5] suggested adding a gradient penalty term to enforce the Lipschitz constraint. These difficulties indicate that optimising the loss function is not an easy undertaking and show how important it is to assess a GAN's stability and convergence. Deep convolutional layers can effectively extract the characteristics of a group of input images. Zeiler and Fergus [26] created a DeConvNet to help visualise the features that a fully trained model has learnt. According to Zhou et al. [27], CNNs can localise the discriminative regions of an image. Grad-CAM was created by Selvaraju et al. [28]. It additionally backs up the findings from the learned discriminator, which demonstrated that PDPM was capable of reliably capturing visual characteristics. In these studies, features taken from the discriminator, rather than the images themselves, are used to represent the images.
Proposed method
Since a DNN contains many trainable parameters, its ability to generalise improves with a large volume of labelled data. However, agriculture has historically suffered from a shortage of data, which makes it challenging to gather a substantial amount. Accurately labelling all of the gathered data is equally challenging: the data must be labelled by experienced professionals, because without expertise it is hard to determine whether a recognition is reliable. This paper presents an image generation approach based on the GAN structure that expands the leaf disease images of the PlantVillage database and addresses the over-fitting caused by the recognition algorithm's lack of training data, in an attempt to satisfy the learning model's need for a substantial quantity of images.
Current GAN methods use an adversarial mechanism between the generator and the discriminator, and the two models are trained alternately. In this type of training, a binary cross-entropy loss is usually used in the discriminator, which may not be efficient for adversarial training on small datasets. In the proposed method, instead of using a discriminator that classifies images as real or generated, we increase the distance between generated and real data in the latent space. A Fréchet inception distance assessment model is employed to measure the distance between real data and generated data in the latent space. This helps the discriminator to learn all the features in the data, prevents the discriminator from over-fitting, and forces the generator to synthesize outputs with all the features in the training dataset.
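The latent-space distance described above can be illustrated with a small sketch. It assumes the Fréchet distance between two Gaussians fitted to batches of latent vectors, and it simplifies the covariance to a diagonal so that no matrix square root is needed; the standard Fréchet inception distance uses Inception features and full covariance matrices. The function names are hypothetical and are not taken from the paper.

```python
import math

def _mean_var(vectors):
    """Per-dimension mean and (population) variance of a batch of vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(dim)]
    var = [sum((v[i] - mean[i]) ** 2 for v in vectors) / n for i in range(dim)]
    return mean, var

def frechet_distance_diag(real_latents, fake_latents):
    """Frechet distance between two Gaussians with diagonal covariance.

    ASSUMPTION: diagonal covariance, so the trace term reduces to
    sum(var_r + var_f - 2 * sqrt(var_r * var_f)).
    """
    mu_r, var_r = _mean_var(real_latents)
    mu_f, var_f = _mean_var(fake_latents)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_f))
    cov_term = sum(a + b - 2.0 * math.sqrt(a * b) for a, b in zip(var_r, var_f))
    return mean_term + cov_term

real = [[0.0, 0.0], [2.0, 2.0]]
fake = [[3.0, 3.0], [5.0, 5.0]]   # same spread, mean shifted by 3 per dimension
print(frechet_distance_diag(real, fake))
```

In this toy case the two batches have the same spread, so only the squared shift of the means contributes to the distance.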
Generator
The lesion characteristics of leaf disease images, and the subtle variations between them, make the images challenging to distinguish. The generator must therefore be able to generate highly distinct lesions, and the generator and discriminator must be kept in equilibrium during training so that distinct features are produced consistently. The generator network takes as input a latent vector of 128 real numbers in the range -1 to 1. These values are passed through a dense layer of 2048 units, followed by four repeats of a block containing Batch Normalization, ReLU activation, and transposed convolution, as shown in Fig. 2(a). All transposed convolution layers use a 2D kernel of 5×5 and a stride of 2×2. In each repetition of this block, the feature map is upsampled towards the output image. Finally, a tanh activation layer outputs the generated image.
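The spatial growth through the four transposed-convolution blocks can be checked with simple arithmetic. The sketch below uses the usual transposed-convolution output-size formula (dilation 1) with the stated 5×5 kernel and stride 2. The starting 4×4 map (2048 = 4 × 4 × 128), the padding of 2, and the output padding of 1 are assumptions made for illustration; the text does not state them.

```python
def conv_transpose_out(size, kernel=5, stride=2, padding=2, output_padding=1):
    """Output length of a transposed convolution along one axis (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

def generator_sizes(start=4, blocks=4):
    """Spatial size after each of the generator's upsampling blocks.

    ASSUMPTION: the 2048-unit dense output is reshaped to 4 x 4 x 128.
    """
    sizes = [start]
    for _ in range(blocks):
        sizes.append(conv_transpose_out(sizes[-1]))
    return sizes

print(generator_sizes())  # [4, 8, 16, 32, 64]
```

Under these assumptions each block doubles the spatial size, so four blocks take a 4×4 map to 64×64.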

Block diagram of GAN architecture.

a) Block diagram of Generator b) Block diagram of discriminator.
The discriminator network takes as input an image from the generator or from the training data. The input is passed through a convolution with leaky-ReLU activation, followed by three repeats of a block containing Batch Normalization, leaky-ReLU, and convolution. The output of the last block is passed through a dense layer of 128 units, which generates the estimated latent representation of the input image, as shown in Fig. 2(b). The dense connection technique reuses weights from earlier layers, which further enhances feature extraction. Leaf disease images have barely detectable characteristics that affect the outcomes of recognition models, and the discriminator is expected to pick out these minute defects successfully. Dense connectivity greatly enhances the capacity to use the derived features and to reuse earlier information, and its shorter connections mitigate the vanishing-gradient issue. Because vanishing gradients in the discriminator are among the issues impeding GAN training, the dense connectivity technique is intended to help stabilise learning. The discriminator is a truncated DenseNet, since training both the discriminator and the generator takes more memory and time.
The dense blocks that make up the suggested discriminator architecture contain enough layers to enable feature extraction. The ReLU activation is used for stability, and batch normalization is used in the normalization layer. Batch normalization concentrates on the overall distribution of the images in a batch and provides an instantaneous estimate of the data distribution. However, the mean and standard deviation fluctuate from step to step and depend on the other leaf disease images in the batch, which can affect the accuracy for individual leaf images.
Loss function
While training the Discriminator and the Generator, the bias term remains fixed, i.e., the Discriminator's biases don't change, because the biases have been initially set to zero. The noise is $z$. $P_{\text{data}}$ represents the real data's distribution, and $\mathbb{E}_x$ represents the expected value over all real data examples. $G(z)$ represents the generator's output when the input noise is $z$. $D$ and $G$ represent the discriminator and generator models, respectively, and $x$ represents the original grape and tomato leaf images.
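The symbols defined above match the standard minimax objective of Goodfellow et al. [24]. If that is the objective intended here, it can be written as follows, where the real data distribution P(data) is written $P_{\text{data}}$ and $P_z$, the distribution of the input noise, is a symbol introduced for this sketch:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim P_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim P_z}\big[\log\big(1 - D(G(z))\big)\big]
```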
In addition to the generator strategy, the discriminator also needs a reliable training procedure. Discriminator learning deteriorates if the discriminator cannot distinguish between hazy and clear disease symptoms and retrieve useful information. During training there are two target labels for each category of leaf disease. Typically, in GAN training, a genuine image is labelled 1 and a fake image is labelled 0. This coding frequently results in overfitting, since the training dataset is limited and cannot include all grape and tomato leaf disease images, and it may also make the model overconfident in its predictions. To make training more reliable, the real image's label is replaced by a random value from 0.8 to 1.3, and the fake image's label by a random value from 0.0 to 0.4.
Relying on unsupervised learning, the GAN develops latent representations from images of leaf disease, and the linear layer in the discriminator outputs an explicit representation of the knowledge that the generator has learnt. To examine the outcomes of the suggested GAN, it is important to understand how the model combines the preceding feature data of the tomato and grape leaf disease into a single output. Because grape and tomato leaf disease features are tiny and easily confused during disease detection, the discriminator needs a trustworthy weight initialization to direct the image signal properly.
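The randomised labels can be sketched as below. The ranges 0.8 to 1.3 for real images and 0.0 to 0.4 for fake images come from the text; drawing the values uniformly, and the function name, are assumptions made for this illustration.

```python
import random

REAL_RANGE = (0.8, 1.3)  # label range for real images, from the text
FAKE_RANGE = (0.0, 0.4)  # label range for generated images, from the text

def noisy_labels(batch_size, is_real, rng=None):
    """Draw one randomised target label per image in the batch.

    ASSUMPTION: labels are sampled uniformly within the stated range.
    """
    rng = rng or random.Random()
    low, high = REAL_RANGE if is_real else FAKE_RANGE
    return [rng.uniform(low, high) for _ in range(batch_size)]

rng = random.Random(0)
real_targets = noisy_labels(4, is_real=True, rng=rng)
fake_targets = noisy_labels(4, is_real=False, rng=rng)
```

Because the two ranges do not overlap, real and fake targets stay separable while no image receives a hard 0 or 1 target.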
The most appropriate weight initialization technique for the training procedure is Xavier normalisation, because it retains good output variance and is intended for the linear layer that directly produces the explicit representations of grape and tomato leaf disease features. The weights $W_{xy}$ at every linear layer are set as per Equation (2), where $N_{in}$ indicates the number of input channels and $N_{out}$ indicates the number of output channels.
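A minimal sketch of Xavier (Glorot) normal initialization follows. It assumes that Equation (2) takes the usual zero-mean Gaussian form with variance 2 / (N_in + N_out); a uniform variant of Xavier initialization also exists. The function name and the list-of-lists weight layout are illustrative choices.

```python
import math
import random

def xavier_normal(n_in, n_out, rng=None):
    """Return an n_in x n_out weight matrix drawn from N(0, 2 / (n_in + n_out)).

    ASSUMPTION: the Gaussian ("normal") variant of Xavier initialization.
    """
    rng = rng or random.Random()
    std = math.sqrt(2.0 / (n_in + n_out))
    return [[rng.gauss(0.0, std) for _ in range(n_out)] for _ in range(n_in)]

weights = xavier_normal(200, 200, random.Random(0))
```

Scaling the variance by both the input and output widths keeps the magnitude of activations and gradients roughly constant from layer to layer.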
The effectiveness of the proposed system is examined on the medicinal leaf image dataset and on tomato leaf and grape leaf images gathered from the PlantVillage dataset on the Kaggle website [29]. The selected PlantVillage data is imbalanced, that is, the class ratios of the classification database are unequal. Majority classes are those that represent a sizeable share of the database; minority classes are those that make up a smaller percentage. Evaluation Metrics: Conventional criteria such as accuracy do not account for the smaller number of cases in each minority class, which makes them unsuitable for evaluating classifiers on imbalanced data.
Precision is the fraction of retrieved infected and uninfected leaves that are relevant. Precision (P) is estimated using the formula
$$P = \frac{TP}{TP + FP}$$
Recall is the fraction of relevant infected and uninfected leaves that are retrieved. Recall (R) is estimated using the formula
$$R = \frac{TP}{TP + FN}$$
The F-measure integrates both recall and precision to provide a more accurate assessment of performance. The F-Score (Fs) is estimated using the formula
$$F_s = \frac{2 \cdot P \cdot R}{P + R}$$
Here TP, FP, and FN denote the numbers of true positives, false positives, and false negatives, respectively.
The Area Under the Curve (AUC) metric evaluates a classifier's effectiveness across all decision thresholds and so provides an overview of the full operating range.
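Precision, recall, and F-score as described above can be computed from confusion counts. The sketch below assumes the standard definitions in terms of true positives (TP), false positives (FP), and false negatives (FN) for a single positive class; the function name and the toy labels are illustrative.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Toy example: 1 = infected, 0 = healthy (TP = 2, FP = 1, FN = 1).
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
p, r, f = precision_recall_f1(y_true, y_pred)
```

For a multi-class problem such as leaf disease recognition, the same function can be applied once per class and the results averaged.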
The PlantVillage database contains over 50,000 images of 14 different kinds of plants with 38 different category names. The presented research also trains the developed method on the segmented medicinal leaf image dataset. Thirty species of beneficial medicinal plants are included in this database, including Plectranthus amboinicus (Indian mint)/Coleus amboinicus (Mexican mint), Brassica juncea (oriental mustard), Muntingia calabura (Jamaican cherry), Santalum album (sandalwood), and many others.
The dataset comprises 1800 images of 30 species. Every species comprises 80 to 90 RGB images. The pixel values are rescaled to the range -1 to 1, and the method is trained to generate artificial images. Both the generator and the discriminator consist of approximately 4 million parameters, and the training curves are shown in Fig. 4.
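Rescaling to the range -1 to 1 matches the output range of the generator's tanh layer. A minimal sketch, assuming 8-bit pixel intensities in 0 to 255 (the bit depth is not stated in the text) and hypothetical function names:

```python
def scale_pixels(pixels):
    """Map 8-bit intensities in [0, 255] to floats in [-1.0, 1.0]."""
    return [p / 127.5 - 1.0 for p in pixels]

def unscale_pixels(values):
    """Inverse mapping, e.g. for viewing generator output as an 8-bit image."""
    return [int(round((v + 1.0) * 127.5)) for v in values]

scaled = scale_pixels([0, 128, 255])
restored = unscale_pixels(scaled)
```

The inverse mapping is what turns tanh outputs back into displayable pixel values.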

Generated image-1 34x34 pixels.
For various inputs drawn from a uniform distribution, we observed that our generator network is able to produce all the image classes in the training data. Some of the generated 34×34-pixel RGB images are shown in Figs. 3 and 4.

Generated image-1 34 x 34 pixels.
This experiment takes healthy tomato leaf images as input and generates leaf mold, Septoria leaf spot, and mosaic virus images. As the base collection of crop disease images for the study, 18,162 tomato leaf images in 10 categories (healthy leaves and 9 kinds of diseased leaves) were selected from this collection. From these images, 10 tomato leaf disease images are generated using the presented method.

Tomato leaf Input images from plant village dataset.

Grape leaf dataset.
Another set of experiments is conducted using a cross-validation technique, in which twenty percent of the training data is used for validation. The inputs are shown in Figs. 6 8. The settings used for this cross-validation are as follows: the image size is 64×64 with three channels; the dropout rate of the classifier is 0.6; the optimizer learning rate is 0.001; the GAN is trained for 600,000 epochs; and the classifier is trained for 3,000 epochs.
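The 20% validation split and the listed settings can be sketched as follows. The sketch shows a single shuffled hold-out split, which is one reading of the description above; the dictionary keys, the function name, and the fixed seed are hypothetical, while the numeric values are those given in the text.

```python
import random

# Settings quoted from the text; the key names are illustrative.
SETTINGS = {
    "image_size": (64, 64),
    "channels": 3,
    "classifier_dropout": 0.6,
    "learning_rate": 0.001,
    "gan_epochs": 600_000,
    "classifier_epochs": 3_000,
    "val_fraction": 0.2,
}

def train_val_split(items, val_fraction=0.2, seed=0):
    """Shuffle once and hold out a fraction of the items for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

train_ids, val_ids = train_val_split(range(100), SETTINGS["val_fraction"])
```

Fixing the seed keeps the split reproducible across runs, so training and validation losses remain comparable.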

GAN Generated images.

GAN generated image for grape leaf dataset.
This experiment takes healthy grape leaf images as the source and generates leaf blight, black rot, and black spot leaf images. The database employed to test the suggested technique includes 476 black rot images, 552 esca images, 420 leaf blight images, and 171 healthy images. From these images, 10 grape leaf disease images are generated using the presented method.
The visual appearance of the generated images is crucial for improving classifier effectiveness. Several issues have to be addressed when judging the quality of a generated image: (1) the visual quality of the generated image; (2) the generated image should be assigned to the appropriate class; and (3) there must be no repetition among the generated images.
The images produced by the GAN show good visual quality: they are sharp and clear, and the generated leaf images are better in most cases. Compared with the existing methods analysed, the proposed GAN model for augmentation produces more varied images.
The dataset used for the proposed work is imbalanced, and accuracy by itself is not a suitable performance indicator for an imbalanced database. The suggested technique is therefore assessed using precision, recall, F1-score, and AUC, on which the GAN-based approach outperforms the other existing approaches. The performance assessment on the medicinal leaf database is shown in Table 1, the performance evaluation on the tomato leaf disease images generated from the PlantVillage database is shown in Table 2, and the performance assessment on the grape leaf images from the PlantVillage database is shown in Table 3.
Performance analysis of proposed method on medicinal leaf dataset
Performance analysis of proposed method on tomato leaf from plant village dataset

Loss discriminator and Loss Generator training metrics on medicinal leaf dataset.

Training loss and validation loss on tomato leaf from plant village dataset.
Table 4 presents the comparative assessment of the proposed method against existing methods: the baseline, random under-sampling, random oversampling, SMOTE, and GAN-based methods. The existing methods do not perform as well as the proposed GAN model on any of the evaluation metrics. Random under-sampling can lose information, and with random oversampling the recall is lower than that of the proposed GAN model. The outcomes clearly show that GAN-based augmentation was superior to conventional sampling-based techniques at reducing the impact of the skewed data distribution. Additionally, we found that techniques based on GAN and SMOTE help to prevent overfitting. Overfitting indicates a model that is biased towards the training database; it occurs when the model's accuracy on the training database is higher than on the testing database. The loss values on the training and validation sets during training can be used to detect it: if the training loss is significantly lower than the validation loss, the model is doing a decent job of classifying the training database but is unable to transfer that knowledge to the evaluation database. Because they incorporate additional samples beyond those found in the source database, GAN- and SMOTE-based methods prevent overfitting. The method most susceptible to overfitting was the one based on random oversampling. The idea behind this research is tackling mode collapse, one of the most challenging and unpredictable issues in GANs; it does not occur frequently, but it does occur occasionally. As the term implies, it happens when the generator repeatedly creates an identical image and the discriminator is unable to account for the lack of variation in the generated examples. As a result, the generator may easily trick the discriminator.
Mode collapse limits the generator's ability to adapt, so that it concentrates on a small number of images rather than creating a wide variety. The presented model effectively tackles mode collapse, as shown by evaluation metrics such as accuracy.
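The two sampling baselines in the comparison can be sketched as below, to show why under-sampling discards information and why oversampling only repeats existing samples. This is a generic illustration of random under-sampling and random oversampling, not the implementation used in the experiments; all names are hypothetical.

```python
import random
from collections import defaultdict

def _by_class(samples, labels):
    """Group samples by their class label."""
    groups = defaultdict(list)
    for s, y in zip(samples, labels):
        groups[y].append(s)
    return groups

def random_undersample(samples, labels, seed=0):
    """Keep only as many samples per class as the smallest class has."""
    rng = random.Random(seed)
    groups = _by_class(samples, labels)
    n = min(len(g) for g in groups.values())
    out = []
    for y, g in groups.items():
        out.extend((s, y) for s in rng.sample(g, n))  # the rest is discarded
    return out

def random_oversample(samples, labels, seed=0):
    """Duplicate minority samples until every class matches the largest."""
    rng = random.Random(seed)
    groups = _by_class(samples, labels)
    n = max(len(g) for g in groups.values())
    out = []
    for y, g in groups.items():
        extra = [rng.choice(g) for _ in range(n - len(g))]  # exact copies
        out.extend((s, y) for s in g + extra)
    return out

samples = list(range(8))
labels = ["major"] * 6 + ["minor"] * 2
under = random_undersample(samples, labels)
over = random_oversample(samples, labels)
```

Oversampling adds only exact copies of minority samples, whereas SMOTE and GAN-based augmentation add samples that are not present in the source database.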
Performance analysis of proposed method on grape leaf from plant village dataset
Comparative analysis on the proposed and the existing methods
The findings show that the suggested GAN model delivers the best outcomes across all criteria. The GAN-based technique also has a higher F1-score and AUC, indicating that it detects both classes with a good balance.
We propose a novel GAN training method that overcomes the problems of existing GANs on smaller datasets by maintaining the distance between the generator and discriminator latent spaces. The method is capable of generating all classes of images from a small image dataset, overcomes the mode collapse problem, and outperforms several GAN methods that rely on larger, uniformly distributed data.
