Abstract
The identification and classification of plant diseases is of great significance to ecological protection, and deep learning methods have made great progress in identifying common diseases of specific plants. When the same disease appears on other plants, however, insufficient or low-quality training data make it difficult for current deep learning methods to identify the disease effectively and accurately. Inspired by the advantages of GANs in dataset expansion, we propose a CycleGAN-based confusion model. The GAN framework is improved by adding a noise label that is learned together with the training data during the training stage, which migrates the data of common plant diseases to plants with insufficient or low-quality data. To evaluate the quality of the migrated training dataset across different GAN approaches, we introduce quality indicators for the migrated images such as MMD, FID, and EMD. We compare our model with other GAN models, and the experimental results show that the proposed model obtains better results in the migration process, which makes it more effective for the identification of cross-species plant diseases.
Introduction
Plant diseases are harmful to plants themselves and may also trigger a series of chain reactions that ultimately cause major losses to agriculture and forestry. The adoption of artificial intelligence technology has become one of the crucial means for the diagnosis and control of plant diseases [1]. As the most popular artificial intelligence technology, deep learning has extensive application prospects in areas such as information security, smart cities, and autonomous driving. Therefore, combining research on deep learning methods with plant diseases can stimulate more creativity and solve the domain problem effectively [2].
So far, many computational models, algorithms, and methods have been used to assist the decision-making process in agriculture and forestry. Traditional machine learning methods, including regression models, Gaussian mixture models, decision trees, random forests, and Bayesian networks [3], can provide better decisions based on the collected data. Deep learning methods are trained on existing data and experience and can provide more accurate results, especially for more complex applications such as the detection of plant disease locations and the identification of plant pathologies, which require multiple deep neural networks [4, 5].
The convolutional neural network (CNN), a kind of deep neural network, performs well in general scenarios such as classification and prediction, whereas research on plant diseases often faces more uncertainties and needs to consider many practical factors. Researchers have made many efforts to improve model accuracy by proposing a variety of CNN variants and optimizing the algorithms [6, 7]. For example, models such as R-CNN and Fast R-CNN are used for object detection, and techniques such as highway networks, ResNet, and batch normalization are used to optimize the training process [8]. These improvements mostly rely on a sufficient dataset. If a specific plant is relatively rare, or the relevant disease images have not been collected, it will be difficult to detect the disease with a CNN alone.
It has been noticed that the GAN family of models can generate new images based on existing images and effectively expand the dataset [9, 10]. However, GAN models have their limitations as well. For instance, multiple species may be involved in the plant disease migration stage, and traditional GAN models perform poorly across different data domains [11–13]. Given this situation, the Cycle-Consistent Adversarial Networks (CycleGAN) model [14] is a better choice, because it can complete image-to-image translation between different domains and offers better applicability than traditional GANs at this stage. Although it has natural advantages, such as not requiring paired data and supporting unsupervised training, it also has some shortcomings. According to past research, CycleGAN performs better on domains with similar appearances, such as the translation from horses to zebras or from oranges to apples, and performs poorly on domains with dissimilar appearances, such as the translation from cats to houses [15].
In this study of plant disease migration, extreme domain pairs such as cats and houses are not involved, but differences in the appearance of plant leaves naturally arise, so some appropriate optimizations are required [16]. Therefore, a confusion model based on CycleGAN is proposed. During the training phase, a small amount of data from the opposite domain is added to the training data as a noise label. In the early stages of training, the noise is not eliminated; rather, the training data and noise data are learned together by the generator. This helps improve the stability and adaptability of the generator. In addition, we expand the dataset and verify the recognition correctness of the generated images [17, 18].
Material and methods
This section introduces the primary dataset, models, and methods used in this study to address the questions raised.
Problem setting
In practice, some limitations may be faced, such as insufficient data. It is necessary to consider both large and small samples, since two different plant species A and B may be infected with the same disease at different times. When A is infected, a large number of diseased leaf images can be collected, but B may be healthy at that time. Although it is necessary to take appropriate preventive measures against B’s disease, there are not enough images of B’s illness, so it is difficult to judge at which stage the signs of illness will appear. Based on these assumptions, this study conducts research and proposes a feasible solution.
Dataset description
The dataset used in this study is PlantVillage, which contains 38 categories of diseased and healthy plants for 14 crop species, with 54,323 images in total. Available online from Kaggle and GitHub, the dataset is of very high quality, with each image containing only a single leaf and a corresponding label. PlantVillage has therefore been widely used in various studies. For the hypotheses and questions put forward in this study, two plants with dissimilar leaf shapes that are infected with the same disease, together with their healthy leaves, are selected for this experiment.
Cycle-Consistent adversarial networks
The CycleGAN model aims to learn the mapping functions G : X → Y and F : Y → X between domain X and domain Y. This study denotes the data distributions as x ∼ p(x) and y ∼ p(y) for each x ∈ X, y ∈ Y. There are two discriminators, D_X and D_Y, with different hyperparameters, and two generators with shared hyperparameters. The discriminators aim to distinguish the generated images G(x) and F(y) from the source images x and y, while the generators aim to deceive D_X and D_Y. The CycleGAN model contains adversarial losses and cycle-consistency losses, which are optimized together to reach the training objective.
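The equation for the adversarial loss does not appear in this text. As a reference point, the adversarial loss of the standard CycleGAN formulation [14] for the mapping G and the discriminator D_Y can be written as below. Whether this study uses exactly this log-likelihood form, rather than a variant such as a least-squares loss, is an assumption.

```latex
\mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y)
  = \mathbb{E}_{y \sim p(y)}\big[\log D_Y(y)\big]
  + \mathbb{E}_{x \sim p(x)}\big[\log\big(1 - D_Y(G(x))\big)\big]
```

The loss for the mapping F and the discriminator D_X has the same form with the roles of the two domains swapped.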
Along with the adversarial losses, the cycle-consistency losses are computed to guarantee that the translation procedure forms a cycle; that is, an input image x should satisfy x → G(x) → F(G(x)) ≈ x, so that the entire process takes a tour from x and back to x. The same should hold for G and F starting from domain Y.
The overall optimization objective is the sum of these loss functions.
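The cycle described here is usually enforced with an L1 reconstruction penalty in both directions. The expression below follows the standard CycleGAN formulation [14]; its use in this study in exactly this form is an assumption, since the original equation is not reproduced in this text.

```latex
\mathcal{L}_{\mathrm{cyc}}(G, F)
  = \mathbb{E}_{x \sim p(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p(y)}\big[\lVert G(F(y)) - y \rVert_1\big]
```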
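In the standard CycleGAN formulation [14], this sum combines the two adversarial losses (one per generator and discriminator pair) with the cycle-consistency loss, weighted by a coefficient λ. The form below is that reference formulation; the value of λ and any additional terms used in this study are not stated in this text.

```latex
\mathcal{L}(G, F, D_X, D_Y)
  = \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y)
  + \mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X)
  + \lambda \, \mathcal{L}_{\mathrm{cyc}}(G, F),
\qquad
G^{*}, F^{*} = \arg\min_{G, F} \max_{D_X, D_Y} \mathcal{L}(G, F, D_X, D_Y)
```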
This study proposes a confusion model based on the CycleGAN architecture, which generates new data from the source data and utilizes a noise label to calculate an additional loss. In addition, it utilizes a residual network structure to reduce the number of parameters so that the network can converge more easily [19]. The architecture of the CycleGAN-based confusion model is shown in Fig. 1.

The structure of the confusion model based on CycleGAN.
As shown in Fig. 1, (a) is the original CycleGAN model and (b) is the model proposed in this study. X and Y represent the two data domains, and G and F represent the mapping functions discussed in Section 2.3. N_y and N_x stand for the noise data that are added as labels from the opposite domain of each. The following formulas use the same definitions.
In addition to the components shared with CycleGAN, the proposed model adds noise labels. Therefore, during the generating phase, it is necessary to calculate the noise loss as follows.
Through batch training, the noise loss per batch is computed and then added to the cycle-consistency loss.
During the early training phase, the generator learns both the noise label data and the training data.
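The mixing of opposite-domain samples into a training batch can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `mix_batch_with_noise`, the `noise_fraction` value of 0.1, and the boolean flag used to mark noise samples are all assumptions, since the text only states that "a small amount" of opposite-domain data is added as a noise label.

```python
import random


def mix_batch_with_noise(batch, opposite_pool, noise_fraction=0.1, rng=None):
    """Append a few opposite-domain samples to a batch and flag them as noise.

    noise_fraction is a hypothetical setting; the source does not give a value.
    Returns (samples, is_noise), two lists of equal length in shuffled order.
    """
    if rng is None:
        rng = random.Random()
    # At least one noise sample, but never more than the pool can provide.
    n_noise = min(max(1, round(len(batch) * noise_fraction)), len(opposite_pool))
    noise = rng.sample(opposite_pool, n_noise)
    samples = list(batch) + noise
    flags = [False] * len(batch) + [True] * n_noise
    # Shuffle so that noise samples are not always at the end of the batch.
    order = list(range(len(samples)))
    rng.shuffle(order)
    return [samples[i] for i in order], [flags[i] for i in order]
```

The flags would let a training loop compute a separate loss term on the noise samples before adding it to the cycle-consistency loss; the form of that term is not specified here.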
A simple convolutional neural network consists of an input layer, an output layer, one or more convolutional layers, pooling layers, and a fully connected layer. The convolutional layer and the pooling layer are unique to the convolutional neural network, which uses multiple convolution kernels to extract and analyze the features of images. The pooling layer performs a pooling process on the output of the convolutional layer to maintain a certain degree of transformation invariance; max pooling retains the largest value in each pooling window. Finally, the network outputs the classification result.
To verify that the images generated by the proposed model can be correctly recognized by a pre-trained CNN model, this study builds a sequential CNN model made up of 5 Conv2D layers, 4 pooling layers, a dense layer, and 6 batch normalization layers. In addition, this study adds a dropout layer that randomly ignores hidden-layer nodes. In the training process of each batch, hidden nodes are active only with a certain probability, so that the update of the weights no longer depends on the joint effect of hidden nodes with fixed relationships, which prevents overfitting and keeps the classification model trainable [20]. The structure is shown in Fig. 2.
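To make the max pooling step concrete, the following is a small pure-Python sketch on a single-channel feature map. The 2 × 2 window and stride of 2 are common defaults chosen for illustration; the pooling configuration used in this study is not stated in the text.

```python
def max_pool_2d(feature_map, pool_size=2, stride=2):
    """Keep the largest value in each pool_size x pool_size window."""
    rows = len(feature_map)
    cols = len(feature_map[0])
    pooled = []
    for top in range(0, rows - pool_size + 1, stride):
        pooled_row = []
        for left in range(0, cols - pool_size + 1, stride):
            window = [
                feature_map[r][c]
                for r in range(top, top + pool_size)
                for c in range(left, left + pool_size)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


example = [
    [1, 3, 2, 4],
    [5, 6, 1, 2],
    [7, 2, 9, 0],
    [3, 4, 1, 8],
]
print(max_pool_2d(example))  # [[6, 4], [7, 9]]
```

Small shifts of a strong activation inside a window leave the pooled value unchanged, which is the source of the transformation invariance mentioned above.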
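The dropout behaviour described above can be illustrated with a short sketch. It uses "inverted" dropout, where the surviving activations are rescaled during training, which is the variant implemented by common deep learning frameworks; the drop probability of 0.5 is an assumed value, as the text does not state the rate used in this study.

```python
import random


def dropout(activations, drop_prob=0.5, training=True, rng=None):
    """Randomly zero activations during training and rescale the survivors.

    drop_prob is a hypothetical setting; at inference time the input is
    returned unchanged.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    if rng is None:
        rng = random.Random()
    keep_prob = 1.0 - drop_prob
    # Dividing by keep_prob keeps the expected activation unchanged.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]
```

Because a different random subset of nodes is dropped for each batch, no node can rely on a fixed set of partners, which is the mechanism the text credits with reducing overfitting.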

The structure of sequential CNN model.
The experiment in this study is based on the PlantVillage dataset. To simulate the research scenario, we select two plant species (tomato and pepper) under two health conditions (healthy and bacterial-infected). Before being fed into the network, the images are preprocessed: all images are resized to 256 × 256, and all values in the pixel matrix are normalized to between 0 and 1. To guarantee the reliability of the experiment and to seek a better way to expand the dataset, several deep generative models are first trained for a horizontal comparison.
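The two preprocessing steps can be sketched in plain Python as follows. The sketch assumes 8-bit RGB input (values 0 to 255) and nearest-neighbour interpolation; neither the bit depth nor the interpolation method is stated in the text, and a real pipeline would normally use an image library instead.

```python
def resize_nearest(image, out_height, out_width):
    """Resize a nested-list image (rows of pixels) by nearest-neighbour sampling."""
    in_height = len(image)
    in_width = len(image[0])
    return [
        [image[(r * in_height) // out_height][(c * in_width) // out_width]
         for c in range(out_width)]
        for r in range(out_height)
    ]


def normalize_pixels(image, max_value=255):
    """Scale every channel value into the range [0, 1]."""
    return [[[channel / max_value for channel in pixel] for pixel in row] for row in image]


def preprocess(image, size=256):
    """Resize to size x size, then normalize, as described in the text."""
    return normalize_pixels(resize_nearest(image, size, size))
```

For example, `preprocess(image)` on any RGB image given as nested lists returns a 256 × 256 grid of three-element lists whose values lie between 0 and 1.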
DCGAN
The deep convolutional generative adversarial network (DCGAN) model uses an all-convolutional network with strided convolutions instead of deterministic spatial pooling functions [21]. In addition, the fully connected layers on top of the convolutional features are removed for stability, and the noise input is passed through multiple convolution and up-sampling layers to obtain a 64 × 64 × 3 image. The test result for the same plant species is shown in Fig. 3, and the training losses for D and G are shown in Fig. 4.
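How repeated up-sampling reaches a 64 × 64 output can be checked with the transposed-convolution size formula, out = (in − 1) · stride − 2 · padding + kernel. The kernel size 4, stride 2, padding 1, the 4 × 4 starting feature map, and the four up-sampling layers below are a configuration often seen in DCGAN implementations and are assumptions here, not settings reported in this study.

```python
def transposed_conv_output_size(in_size, kernel=4, stride=2, padding=1):
    """Spatial output size of a transposed convolution (no output padding, no dilation)."""
    return (in_size - 1) * stride - 2 * padding + kernel


def generator_feature_sizes(start=4, num_layers=4):
    """Spatial size after each up-sampling layer, starting from a start x start map."""
    sizes = [start]
    for _ in range(num_layers):
        sizes.append(transposed_conv_output_size(sizes[-1]))
    return sizes


print(generator_feature_sizes())  # [4, 8, 16, 32, 64]
```

With these settings each layer doubles the spatial size, so four layers take a 4 × 4 map to the 64 × 64 resolution mentioned above.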

Generated 64 × 64 resolution pictures on Plant Disease dataset. The specific species is pepper with bacterial infections.

The DCGAN model’s training losses for the Discriminator and Generator over iterations. The networks converge after a certain number of iterations.
pix2pix [22] is an effective extension of the conditional generative adversarial network (CGAN) model. The model guides the generator by feeding it the image together with a pixel-level label. The label is extracted with the Watershed algorithm, as shown in Fig. 5. This study’s test with the same plant species is shown in Fig. 6, and the training loss is shown in Fig. 7.

The feature label showcases the basic structural characteristics of the leaf, and the model can be better supervised based on this specific label.

The pix2pix model is trained on paired data with noise and labels. Every three columns form a group: the first column shows the pixel-level label of the leaf, the second column shows the generated image, and the third column shows the original image. The specific species is pepper with bacterial infections.

The pix2pix model’s training losses for the Discriminator and Generator over iterations. G and D are trained independently with different parameters. Figure 6 shows that the discriminator converges early in training, while the generator eventually stabilizes over the iterations.
The model of this study is tested in two cases: the same plant species and different plant species. In each case, there are healthy plants and bacterial-infected ones. The advantage of this model is that it does not require paired labeled data; that is, unsupervised training can effectively reduce experimental costs. Moreover, treating some opposite-domain data as a noise label not only improves the stability and adaptability of the model but also supports the correctness of the research idea. However, since the input data are noisy, the training time is also extended. Image translation can be achieved across plant species with dissimilar appearances, and the dataset can be expanded as well, as observed in Fig. 9. The images generated by the different models in this study are also compared, as shown in Fig. 10. Table 1 shows the pixel-level accuracy of the various generative models for the same input image, computed using fully convolutional networks (FCN). Table 2 shows several metrics for GAN evaluation [23].

Generated samples from the same plant species. The sequence original → fake → generated represents the cycle x → G(x) → F(G(x)) ≈ x described in Section 2.3. Here, healthy plants are translated to ill ones and vice versa, with pepper as the specific species.

Generated samples from different plant species, based on the leaves of pepper and tomato. Here the healthy plants are translated to ill ones, and ill plants are translated to healthy ones across species.

Results of images generated by different models trained on paired data. From left to right: pix2pix, DCGAN, CycleGAN, Proposed model and Ground truth.
FCN scores for different deep generative models at pixel level
Several evaluation methods for generative models
After generating new images, we build a sequential CNN to confirm whether the new images can be recognized correctly; the results are reflected in the recognition accuracy. The architecture of the CNN is shown in Fig. 2, and the number of parameters is given in Table 3. We split the dataset, using 80% of the images as the training set and the remainder as the validation set. The model has 13,395,844 parameters in total, of which 13,392,964 are trainable and 2,880 are non-trainable. To compare the impact of different generative models on data augmentation, Table 5 presents the accuracy and the degree of improvement after expanding the dataset.
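The 80/20 split can be sketched as below. Shuffling before the split and the fixed seed are assumptions made for reproducibility of the example; the text does not say whether the split was random or stratified by class.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle the items and split them into training and validation lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train_set, validation_set = split_dataset(range(100))
print(len(train_set), len(validation_set))  # 80 20
```

The reported parameter counts are also internally consistent: 13,392,964 trainable plus 2,880 non-trainable parameters gives the stated total of 13,395,844.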
CNNs architecture with parameters
Classification accuracy after expanding the dataset with different deep generative models
Classification accuracy of the new generated species’ images using CNN model, which is pre-trained with the same parameters
This section analyzes the experimental results. Generally speaking, the research problems addressed by this experiment can be divided into two parts. The first part uses a generative model to generate images for situations with little data, while the second verifies that the images generated by this study’s model can be recognized normally. Given that the research subject is the migration of plant diseases across domains, the proposed CycleGAN-based confusion model is used to generate images, and its results are compared with those of several other generative models. To this end, the images generated by several models are shown and compared visually.
To compare the quality of the generated images, several approaches are taken. Table 1 compares the pixel-level accuracies of several models. For the same input image, an FCN is used to segment the generated image to verify whether the local features have been transferred, and the FCN score is then calculated [24]. In the experiment, CycleGAN and the proposed model achieve better results than DCGAN. A pix2pix model is added as a supervised learning comparison. In addition, several indicators are introduced to evaluate the effectiveness of the GAN models in this experiment. Using six indicators, the experiment verifies the feasibility of the proposed model from multiple perspectives. Owing to the noise label added in the training process, the improvement is most obvious in the FID column.
Verifying that the images generated by the proposed model can be recognized normally is another vital task. The generated images are used to expand the dataset, and the result is compared with traditional methods. Table 4 shows the classification results of the model after data expansion. Traditional CNN models include data augmentation methods such as selection, enlargement, and reduction [25]. It can be observed that, compared with the traditional model, the generative model can increase the amount of data and improve the accuracy to a certain extent; however, it is limited by the quality of the generated images and produces certain fluctuations. By using the pre-trained CNN model (with the same parameters as the CNN model in Table 4) to identify the images after the disease migration is completed, this study finds that the newly generated images achieve good recognition accuracy, as shown in Table 5. It can be concluded that the images obtained through disease migration in this research can be used for plant disease classification.
Discussion
The analyses in the previous sections demonstrate that the CycleGAN-based confusion model proposed in this study performs well in addressing both the insufficient amount of data that may occur in plant disease scenarios and the problem of cross-domain plant disease migration [26]. This finding is verified both by the effect of the generative model and by the accuracy of the pre-trained classification model.
The first part concerns the comparison of generative models. Previous studies have shown that the pix2pix and DCGAN models are not suitable for image generation across different species [27]. Therefore, this study compares them with the proposed CycleGAN-based confusion model on the same species. Following the CycleGAN architecture, the CycleGAN-based confusion model produces a target output and a cyclic output for each input, and the cyclic output can be considered the same as the input. By comparing these models, the FCN score can be calculated to obtain pixel accuracy, class accuracy, class IoU, and other indicators, alongside a comparison of the generated results. Since the pix2pix model is based on supervised learning, the effects of supervised and unsupervised learning can also be compared.
For the migration of diseases across different species, this study compares CycleGAN with the proposed CycleGAN-based confusion model, finding that CycleGAN does not perform well on domains whose appearances are dissimilar, and the leaves of different plant species naturally differ in appearance. Therefore, by calculating the FCN score and indicators such as KNN, MMD, and FID and comparing them with those of CycleGAN, this study verifies that the proposed model performs better on translation between different species.
The second part concerns the accuracy of the classification model. The classification experiment verifies whether the newly generated images can pass the test of the pre-trained classification model, in order to demonstrate the reliability of the overall experimental process. Before feeding the data into the generative model, this study builds a CNN model based on the plants involved in the experiment and saves the model after training. After generation is completed, the newly generated images are recognized and the accuracy is calculated.
This study has some aspects that can be expanded and improved. It should be pointed out that this study has not paid much attention to the continuity of plant diseases. Although the proposed model can obtain disease images for different species, the continuous progression of a disease on the same species has not been modeled, so comparing disease stages may require extra effort. In addition, although adding the corresponding noise label may bring improvements, this experiment focuses mainly on the leaves of different plants, which share more or less common features. Therefore, the improved model also has some limitations. These aspects are directions for future work.
Despite these limitations, this study is still meaningful, not only for the improvement of the model but also for combining artificial intelligence technology with plant disease scenarios, which may offer some inspiration for later work. The pressure and harm that plant diseases bring to the environment and to economic development are obvious to all. The results of this study can be applied to many aspects of plant disease scenarios, such as diagnosis and classification, while further promoting the integration of artificial intelligence and the ecological environment. This is also the original intention of this study.
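The three FCN-score indicators named here can be computed from a pixel-level confusion matrix. The sketch below uses the usual semantic-segmentation definitions; the function name and the convention that rows are ground-truth classes and columns are predicted classes are assumptions for illustration, and the exact evaluation code of this study is not given in the text.

```python
def fcn_scores(confusion):
    """Return (pixel accuracy, mean class accuracy, mean class IoU).

    confusion[i][j] counts pixels of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    diag = [confusion[i][i] for i in range(n)]
    row_sums = [sum(confusion[i]) for i in range(n)]
    col_sums = [sum(confusion[r][c] for r in range(n)) for c in range(n)]

    pixel_accuracy = sum(diag) / total
    # Skip classes that never occur so they do not cause a division by zero.
    class_accs = [diag[i] / row_sums[i] for i in range(n) if row_sums[i] > 0]
    unions = [row_sums[i] + col_sums[i] - diag[i] for i in range(n)]
    ious = [diag[i] / unions[i] for i in range(n) if unions[i] > 0]
    return pixel_accuracy, sum(class_accs) / len(class_accs), sum(ious) / len(ious)
```

For a two-class matrix `[[8, 2], [1, 9]]`, the pixel accuracy is 17/20 = 0.85, the mean class accuracy is (0.8 + 0.9)/2 = 0.85, and the mean class IoU is (8/11 + 9/12)/2.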
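Of the indicators listed, MMD is simple enough to sketch directly. The code below computes the biased estimate of squared MMD with a Gaussian kernel between two sets of feature vectors. The kernel choice, the bandwidth of 1.0, and the use of plain feature lists are assumptions; in practice the features would typically come from a pre-trained network, and the settings used in this study are not stated.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def mmd_squared(real_features, generated_features, bandwidth=1.0):
    """Biased estimate of squared MMD; values near 0 mean similar distributions."""
    def mean_kernel(first, second):
        total = sum(gaussian_kernel(a, b, bandwidth) for a in first for b in second)
        return total / (len(first) * len(second))

    return (
        mean_kernel(real_features, real_features)
        + mean_kernel(generated_features, generated_features)
        - 2.0 * mean_kernel(real_features, generated_features)
    )
```

A lower value indicates that the generated features are distributed more like the real ones, which is the sense in which the indicator is used to compare generative models.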
Conclusion
By introducing some of the problems faced in the plant disease field, this study proposes certain feasible solutions. The literature review and experimental verification show that, when data are insufficient, the generative adversarial network (GAN) family of models can alleviate the problem. In addition, a CycleGAN-based confusion model is proposed that uses an additional noise label to improve the model’s generation results and the quality of the generated images across different plant species. Through the migration of disease images, the dataset can be expanded to support the next stage of research. Finally, it is hoped that this research can play a role in many aspects of plant disease work.
Acknowledgement
This research is supported by the general project of the National Natural Science Foundation of China. Project approval number: 32071775.
