Abstract
To address the long training time and large sample requirements of traditional image recognition models, a crop pest recognition method based on transfer learning and data conversion is proposed. CNN models such as Inception V3, VGG16 and ResNet serve as the backbone structure, and transfer learning is used to improve model performance. The original image data were expanded through flip, rotation, scale, crop, translation and shading transformations. Model training and recognition were carried out on data for 11 common pests, including white grub, East Asian locust and whitefly. The results show that the accuracy of the transfer learning models is higher than that of the non-transfer learning models. The Inception V3 model performs best of all, with a recognition accuracy reaching 98.94%. Analysis of the cross entropy and confusion matrix shows that data transformation helps improve the accuracy of the model with small samples.
Introduction
With the development of agricultural science and technology, agricultural production efficiency has improved markedly, but the prevention and control of diseases and insect pests has also become a focus of attention [1]. According to a survey by the Food and Agriculture Organization of the United Nations (FAO), 20%–40% of the expected crop harvest worldwide is lost to diseases and insect pests every year, at an economic cost of US $200 billion [2]. The 2020 report of the China Agricultural Technology Extension Center shows that, owing to climate and farming methods, diseases, pests and weeds in China's crop production are increasing year by year. Providing farmers with timely pest identification, diagnosis and advice is an important part of pest control. Traditional pest identification depends mainly on experienced agricultural experts [3], which requires on-site inspection and costs time and energy. How to help farmers quickly identify different kinds of pests, so that they can make effective prevention decisions, is therefore an important research question.
In recent years, using deep learning to identify pests has become a research focus. In the United States, some researchers used the PlantVillage data, including 14 different plants, 26 diseased leaves and 12 healthy leaves. Based on these data, a convolutional neural network recognition model was established, and the recognition performance met expectations [4]. Chinese scholars have also studied pest identification based on deep learning. SuHong designed an R-CNN model that uses a multi-layer neural network to identify characteristic images of citrus Huanglongbing, red spider mite infection, canker disease and other major diseases, improving the efficiency of citrus disease classification and pathological detection, and analyzed its accuracy and computational complexity [5]. Wei et al. built an adult recognition model of Spodoptera meadow and its related species based on deep learning and feature visualization; its dataset contains 10177 images [6]. Cheng et al. used the GoogLeNet and AlexNet models to identify images of stored grain pests [7]. Yang Guo proposed a deep learning model based on a convolutional neural network that can locate and identify 23 kinds of pests quickly and accurately in the tea garden environment [8]. Glickd and Miller used the Deep Convolution Neural Network (HD CNN) to identify and classify 277 insect images, with more than 210000 insect images in the samples [9].
At present, compared with traditional methods (GIST, HOG, SIFT and LBP image feature extraction combined with SVM, random forest or decision tree classification, etc.), CNN-based image recognition does not need many heuristic rules or long-term manual maintenance to meet accuracy requirements. It thus takes accuracy into account while improving the efficiency of model construction. However, these studies suffer from long training times and a tendency to overfit. In addition, model training requires a large amount of image data; if a small sample data set is used to train the model, its effect is uncertain. In practical work, data sets are difficult to organize and collect, and the difficulty grows with the amount of data a model requires. In this regard, some studies have proposed transfer learning methods [10, 11, 12, 13, 14, 15, 16], which use existing knowledge from related tasks or fields to improve learning efficiency and effectiveness by fine-tuning a pretrained model.
Transfer learning has been applied in several fields. To address the long running time and low accuracy of traditional wild plant identification algorithms, Peng proposed a method based on the ResNet 101 network and transfer learning; the test results show an accuracy of 85.6% [17]. Chen studied how to classify and identify terrorism-related images using deep learning and transfer learning, designing a deep neural network model and transfer learning method suited to the scarcity of positive samples of terrorist explosion images; the results show an accuracy of 96.7% [18]. To improve the classification performance and timeliness for clothing images, Xiexiaohong proposed a convolutional neural network classification method for clothing images based on transfer learning. The method transferred the trained model to the clothing image dataset, retained the parameters of all convolution layers of the pretrained model, froze the network parameters of the front layers and fine-tuned the network so that it could adapt to clothing image recognition. Six models, including VGG16, were selected for the experiment. The results show that these models improve in classification accuracy and timeliness after transfer learning [19]. Yu used transfer learning and Inception V3 to train a road traffic sign recognition model, whose recognition rate for German traffic sign recognition (GTSRB) is 95.4% [20]. Based on the Keras framework and the Inception V3 network model, Sun adopted transfer learning and feature fusion to optimize classification and effectively identify ten precious seaweeds in the Bohai Sea; the results show an accuracy of 92.53% [21].
Current research shows that transfer learning has been explored in many fields as a way to improve the performance and efficiency of image recognition models. Transfer learning carries over the recognition ability that an image recognition model has acquired by training on a large amount of data. In specialized applications, better recognition accuracy can then be obtained through fine-tuning and training. Among these models, Inception V3 avoids overfitting while increasing the breadth and depth of the network and maintains recognition accuracy, which has attracted the attention of researchers. However, there are few reports in the field of agricultural pest identification. In summary, this paper takes crop pests as an example, uses transfer learning to improve model training efficiency, and extends the sample data through geometric transformation. It then compares the identification results obtained with geometric transformation and with mainstream models under transfer learning, in order to construct a model that needs few samples and trains efficiently, and so to provide effective help for pest control in agricultural production.
Materials and methods
Image data source
The data come from the pest and disease database of the Beijing Agricultural Data Resource Center. Beijing's agricultural production is mainly vegetables, fruit and corn. Accordingly, from the perspective of the crops harmed, common pests were selected, such as whitefly on tomatoes, thrips on cucumbers, aphids on zucchini, and East Asian locust on corn and wheat. From the perspective of size, both larger pests such as mole crickets and smaller pests such as aphids and thrips were selected. The data are therefore representative, and a model based on them will be practical. The data set mainly includes 11 species: white grub, leafhopper, pupa, oriental migratory locust, whitefly, thrips, scarab, mole cricket, bollworm, beet armyworm and aphid. Pictures of both larvae and adults are included, with 50–70 pictures of each kind. According to the representativeness of the picture features, 50 different pictures of each species were selected as the initial data for model training.
Image data preprocessing
Image scaling. If the image size is large, the model needs more epochs to achieve good accuracy. To keep the time complexity of training stable, the final size of the input image is 256 × 256. Rescaling adjusts the image size according to a given scaling coefficient and may affect precision. To lose as little image information as possible, this paper uses the cubic convolution interpolation algorithm to interpolate the image [22].
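As a rough illustration of cubic convolution interpolation, the following pure-Python sketch resizes a small grayscale image with two separable one-dimensional passes. The kernel parameter a = -0.5, the border clamping and all function names are assumptions made for illustration; the text does not specify these details, and the sketch omits the anti-aliasing that a production resizer would apply when shrinking an image.

```python
import math

def cubic_kernel(x, a=-0.5):
    """Cubic convolution kernel; a = -0.5 is a common choice (assumed here)."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def resize_1d(values, new_len):
    """Resample a 1-D list to new_len samples with cubic convolution."""
    old_len = len(values)
    scale = old_len / new_len
    out = []
    for i in range(new_len):
        # Map the centre of output sample i back into input coordinates.
        src = (i + 0.5) * scale - 0.5
        base = math.floor(src)
        total = 0.0
        for k in range(base - 1, base + 3):       # four nearest input samples
            idx = min(max(k, 0), old_len - 1)     # clamp indices at the borders
            total += values[idx] * cubic_kernel(src - k)
        out.append(total)
    return out

def resize_bicubic(image, new_h, new_w):
    """Resize a 2-D grayscale image (list of rows) with two separable passes."""
    rows = [resize_1d(row, new_w) for row in image]              # horizontal pass
    cols = [resize_1d(list(col), new_h) for col in zip(*rows)]   # vertical pass
    return [list(row) for row in zip(*cols)]                     # back to row-major

if __name__ == "__main__":
    tiny = [[0, 50, 100], [50, 100, 150], [100, 150, 200]]
    for row in resize_bicubic(tiny, 4, 4):
        print([round(v, 1) for v in row])
```

In this sketch a call such as `resize_bicubic(image, 256, 256)` would produce the 256 × 256 input size mentioned above for a single-channel image; a colour image would be processed one channel at a time.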
Normalized image input. Normalization ensures that each input parameter has a similar data distribution, which makes the network converge faster during training. In this paper, the mean is subtracted from each pixel and the result is divided by the standard deviation to complete the data standardization. In this experiment, the image pixel values being normalized lie in the range [0, 255].
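The standardization step (subtract the mean, divide by the standard deviation) can be sketched in a few lines. Whether the statistics are computed per image, per channel or over the whole data set is not stated in the text; the sketch below assumes per-image statistics over a flat list of pixel values, and the function name is hypothetical.

```python
import statistics

def standardize(pixels):
    """Return (p - mean) / std for each pixel value in a flat list.

    Assumption: the mean and (population) standard deviation are computed
    over the pixels of a single image.
    """
    mean = statistics.fmean(pixels)
    std = statistics.pstdev(pixels)
    if std == 0:
        # A constant image carries no contrast; map every pixel to zero.
        return [0.0 for _ in pixels]
    return [(p - mean) / std for p in pixels]

if __name__ == "__main__":
    raw = [0, 64, 128, 255]          # raw pixel values in [0, 255]
    print(standardize(raw))          # zero mean, unit standard deviation
```

After this step the values are no longer confined to [0, 255]; they are centred on zero with unit spread.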
Geometric transformation
Image data transformation changes the image pixels without changing the image content. In the field, geometric distortion caused by the imaging angle, perspective and even the lens itself makes images of the same pest look different. Likewise, when farmers take photos in the field, site conditions lead to pictures of the same subject taken at different angles and under different lighting. Therefore, changes of angle in the geometric transformation simulate different photographing angles in the field, and changes of image brightness simulate different lighting conditions in the facility. The transformed data thus reflect more closely the image features that field pests may show.
In addition, the image data of agricultural pests are limited. Only dozens of pictures can be obtained for a class of pests, so the data set is a small sample. Training deep neural networks on small sample data sets may lead to overfitting and poor performance on validation data. Image transformation expands the data set and helps prevent overfitting. In this paper, the transformations applied to the original data set include image flip, rotation, scale, crop, brightness and saturation. The enhancement effect on some pictures is shown in Fig. 1.
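A minimal sketch of how one image can be expanded into several variants is given below. It operates on a grayscale image stored as a list of rows and covers only flip, 90-degree rotation, crop and brightness; the function names, the brightness factors and the restriction to right-angle rotation are assumptions for illustration, not the settings used in the experiment.

```python
def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]

def flip_vertical(img):
    """Reverse the order of the rows (top to bottom)."""
    return [row[:] for row in img[::-1]]

def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def crop(img, top, left, height, width):
    """Cut out a height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]

def adjust_brightness(img, factor):
    """Scale pixel values by factor and clamp the result to [0, 255]."""
    return [[min(255, max(0, int(p * factor))) for p in row] for row in img]

def augment(img):
    """Return the original image together with a few transformed copies."""
    return {
        "original": img,
        "flip_h": flip_horizontal(img),
        "flip_v": flip_vertical(img),
        "rot90": rotate90(img),
        "brighter": adjust_brightness(img, 1.2),   # hypothetical factor
        "darker": adjust_brightness(img, 0.8),     # hypothetical factor
    }

if __name__ == "__main__":
    sample = [[10, 20], [30, 40]]
    for name, variant in augment(sample).items():
        print(name, variant)
```

Applying several such transformations to each original picture is what multiplies a few dozen pictures per species into a much larger training set.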
There are 50 original pictures of each pest species. After transformation, there are 5437 pictures in total. The details of the data set are shown in Table 1.
The details of dataset
Example of geometric transformation of pest image data.
The Inception V3 model is the third generation of GoogLeNet. An effective way to improve network performance is to increase the breadth and depth of the deep learning network. However, increasing them blindly leads to a sharp increase in network parameters, which easily causes overfitting. To address this, Google designed the Inception models.
Inception V3 is a convolutional neural network with a depth of 47 layers. The initial input is a region of the image, and the representative features of each layer are then obtained through multiple convolution operations. The underlying structure of Inception V3 is composed of convolutional neural networks. The convolutional neural network reduces the number of parameters to be trained through convolution kernels and weight sharing [23, 24, 25, 26].
The Inception V3 module takes the convolved features of the previous level from the input layer and then extracts higher-level image features through multiple parallel convolution layers and a pooling layer. A bottleneck layer is designed in Inception V3 to reduce the number of channels of the feature map before other convolution operations are performed. Inception V3 further factorizes the original n × n convolution into an n × 1 convolution and a 1 × n convolution (Fig. 2). Compared with VGG, this improvement reduces the number of parameters and the risk of overfitting, and improves the computation speed of the network. At the same time, the asymmetric convolution structure yields more distinct results than the symmetric one and provides receptive fields of different sizes, which can handle more and richer spatial features.
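The parameter saving from replacing an n × n convolution with an n × 1 convolution followed by a 1 × n convolution can be checked with simple arithmetic. The sketch below counts weights for both variants; the channel counts and the assumption that the intermediate feature map keeps the output channel count are illustrative choices, not figures taken from Inception V3.

```python
def conv_params(kernel_h, kernel_w, c_in, c_out, bias=True):
    """Number of trainable parameters of one 2-D convolution layer."""
    weights = kernel_h * kernel_w * c_in * c_out
    return weights + (c_out if bias else 0)

def factorized_params(n, c_in, c_out, bias=True):
    """Parameters of an n x 1 convolution followed by a 1 x n convolution.

    Assumption: the intermediate feature map has c_out channels.
    """
    first = conv_params(n, 1, c_in, c_out, bias)
    second = conv_params(1, n, c_out, c_out, bias)
    return first + second

if __name__ == "__main__":
    for n in (3, 7):
        full = conv_params(n, n, 64, 64, bias=False)
        split = factorized_params(n, 64, 64, bias=False)
        print(f"n={n}: full={full}, factorized={split}, ratio={split / full:.2f}")
```

With equal input and output channels, the factorized form needs 2/n of the weights of the full n × n kernel, so the saving grows with the kernel size.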
Structure of Inception V3 model.
Network batch standardization layer (BN layer): In the training of traditional deep neural networks, layers with the same structure can be stacked to improve network performance, but the inputs of these layers keep changing and the distributions of inputs and outputs may be inconsistent. This greatly hinders feature extraction and makes training based on gradient descent very difficult, so that only a small learning rate can be used. Applying the BN method to each layer effectively solves this problem. When BN is applied to a layer of the neural network, the data within each mini-batch are normalized. By standardizing the output of each layer, the inputs and outputs follow the same normal distribution. The inputs of subsequent layers then change little, so the learning effect is much better.
BN algorithm normalizes a layer through the following formula:
For the use of the batch gradient descent method,
If only the above normalization formula is used to normalize the output data of one layer A of the network before sending it to the next layer B, the features learned by layer A will be affected. Therefore, learnable parameters are introduced
Each neuro
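For reference, the widely used form of batch normalization over a mini-batch of m activations, which the description above appears to follow, is

$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2, \qquad \hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta .$$

The symbols are assumptions drawn from the standard formulation rather than from this text: \gamma and \beta are the learnable scale and shift, and \epsilon is a small constant for numerical stability. A minimal sketch for a single activation across a mini-batch follows.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one activation over a mini-batch, then scale and shift.

    gamma and beta stand for the learnable parameters; eps avoids division
    by zero. All default values are illustrative assumptions.
    """
    m = len(batch)
    mean = sum(batch) / m
    var = sum((x - mean) ** 2 for x in batch) / m      # biased batch variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

if __name__ == "__main__":
    activations = [1.0, 2.0, 3.0, 4.0]
    print(batch_norm(activations))                     # zero mean, unit variance
    print(batch_norm(activations, gamma=2.0, beta=3.0))
```

With gamma = 1 and beta = 0 the output is simply the normalized batch; other values let the layer recover whatever scale and offset suit the following layer.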
Loss function: The cross entropy loss function is selected as the loss function. The loss function is as follows:
The
There are 11 tag values in the current model. The prediction of the
In image recognition, the softmax function represents the probability of recognizing a picture as a specific class, and its formula is as follows:
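In their standard form, the softmax probability for class k and the cross entropy loss for one sample over K classes (here K = 11) can be written as

$$p_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}, \qquad L = -\sum_{k=1}^{K} y_k \log p_k ,$$

where z_k is the network output (logit) for class k and y_k is 1 for the true class and 0 otherwise. These symbols are assumptions taken from the usual definitions. The short sketch below evaluates both quantities.

```python
import math

def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(logits)                       # subtract the max for stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index):
    """Cross entropy for a one-hot label: -log of the true-class probability."""
    return -math.log(probs[true_index])

if __name__ == "__main__":
    num_classes = 11                          # 11 pest labels in the model
    logits = [0.0] * num_classes
    logits[4] = 3.0                           # hypothetical score favouring class 4
    probs = softmax(logits)
    print(round(sum(probs), 6))
    print(cross_entropy(probs, 4), cross_entropy(probs, 0))
```

A confident, correct prediction gives a loss close to zero, while a prediction that assigns little probability to the true class gives a large loss.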
Transfer learning transfers trained model parameters to a new model to help construct the new model. Because most data or tasks are related, the learned model parameters (also known as the knowledge) can be shared with the new model through transfer learning, so as to speed up and optimize learning on the new task. With this method, the network does not need to learn from scratch as most networks do. The Inception V3 model has about 25 million parameters, and retraining them consumes a lot of time. Moreover, when the data set is small, using a deep learning network directly easily leads to overfitting and low recognition accuracy. In addition, training the network model from scratch requires a dedicated computer with strong computing power [27].
Using a pretrained model in transfer learning mainly involves selecting the source model, reusing the model and adjusting the model. Reusing the model means taking the pretrained model as the starting point for the second task. This paper adopts model reuse. The trainable parameters are inherited from the source model, so fine-tuning does not start gradient descent from random initial values. Usually, the model can reach an optimum suited to the reconstructed model after small adjustments. This enables the reconstructed model to adjust its high-level parameters adaptively according to the target samples, improving its target detection ability.
The transfer learning steps include:
First, the network layer before the linear connection layer of the Inception V3 model is defined as the bottleneck layer. The network parameters of the bottleneck layer, pretrained on the public dataset ImageNet, are extracted and remain fixed during subsequent model training. ImageNet is a large-scale data set that lets the network model learn low-level information such as object geometry, shape and attitude, which enhances the generalization ability of network recognition. The bottleneck layer is followed by the fully connected layer. Using the pest image data as the model input, a new fully connected neural network is trained for the identification of leaf pest disease. Building on the learned low-level feature recognition ability, the network model can fine-tune the parameters for pest species recognition and so master the recognition of new data more quickly and accurately.
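The idea of keeping the pretrained bottleneck fixed and training only a new classification layer can be illustrated with a toy example. The sketch below is not Inception V3: the fixed matrix `FROZEN_W` is a hypothetical stand-in for pretrained bottleneck weights, and the data, learning rate and step count are invented for illustration. Only the softmax head is updated by gradient descent.

```python
import math

# Hypothetical stand-in for pretrained bottleneck weights; never updated.
FROZEN_W = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]

def bottleneck(x):
    """Frozen feature extractor: fixed linear map followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]

def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def head_scores(feat, head, bias):
    return [sum(w * f for w, f in zip(row, feat)) + b for row, b in zip(head, bias)]

def train_head(samples, labels, n_classes, lr=0.1, steps=200):
    """Train only the new softmax layer on top of the frozen features."""
    feats = [bottleneck(x) for x in samples]   # computed once: extractor is fixed
    dim, n = len(feats[0]), len(feats)
    head = [[0.0] * dim for _ in range(n_classes)]
    bias = [0.0] * n_classes
    losses = []
    for _ in range(steps):
        grad_w = [[0.0] * dim for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        loss = 0.0
        for feat, y in zip(feats, labels):
            probs = softmax(head_scores(feat, head, bias))
            loss -= math.log(probs[y])
            for k in range(n_classes):
                err = probs[k] - (1.0 if k == y else 0.0)   # dLoss/dscore_k
                grad_b[k] += err
                for j in range(dim):
                    grad_w[k][j] += err * feat[j]
        for k in range(n_classes):
            bias[k] -= lr * grad_b[k] / n
            for j in range(dim):
                head[k][j] -= lr * grad_w[k][j] / n
        losses.append(loss / n)
    return head, bias, losses

def predict(x, head, bias):
    scores = head_scores(bottleneck(x), head, bias)
    return scores.index(max(scores))

if __name__ == "__main__":
    data = [[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]]
    target = [0, 0, 1, 1]
    head, bias, losses = train_head(data, target, n_classes=2)
    print(losses[0], losses[-1], [predict(x, head, bias) for x in data])
```

Because the features are computed once and reused, each training step touches only the small head, which is the source of the time saving described above.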
Training and parameter setting
This paper uses the deep learning framework TensorFlow to build the model, and uses transfer learning to train the model on the basis of Inception V3. The parameters of the convolution layers are set to the non-trainable state, consistent with the fixed bottleneck parameters described above, and the parameters of the softmax fully connected layer in the model are trained. The training process is as follows (Fig. 3).
The adjustable parameters during model training are shown in Table 2.
Parameter setting of model training
Training process of crop pest identification model.
The accuracy of different models
In the lab, TensorFlow is used for model training. The recognition accuracy results of Inception V3, VGG16 and ResNet under transfer learning are shown in Table 3.
As Table 3 shows, the recognition accuracy based on transfer learning is more than 91.52%, and the overall recognition accuracy is 17.21% higher than that of non-transfer learning, in which the neural network model is trained directly. Inception V3 performs best among the three models, followed by ResNet and then VGG16. The recognition accuracy of the transfer learning Inception V3 model is 98.94%, and that of the non-transfer model is 90.88%.
From the perspective of individual pests, only 11 of the 33 identification accuracies of the three directly trained models are above 90%. With transfer learning, 27 of the 33 recognition accuracies of the three models are above 90%, accounting for 81%; the exceptions are some insect pests with small size and unclear characteristics (such as leafhopper, thrips, scarab and bollworm). Even in the identification of leafhopper, thrips, scarab and bollworm, the transfer learning Inception V3 model performs best, and the recognition accuracy is more than 97.73%, 98.24%, 98.88%, 98.53% and 97.73% respectively.
It can be seen from the above data that the models using transfer learning outperform the non-transfer learning models. The main reason is that transfer learning is a machine learning method that transfers the knowledge of one domain (the source domain) to another domain (the target domain), so that the target domain can achieve better learning results. The amount of data in the source domain is sufficient, while the amount of data in the target domain is small. Agricultural pest recognition is such a scenario and is therefore well suited to the advantages of transfer learning. The structure of the Inception V3 model reduces the number of parameters and the risk of overfitting. Combined with transfer learning, the Inception V3 model obtains high classification and recognition ability. At the same time, transfer learning extends this classification and recognition ability to crop pest recognition, so that the model generalizes better. Therefore, the models based on transfer learning perform better than the non-transfer learning models, and Inception V3 based on transfer learning performs better than the other models.
Recognition accuracy of transfer learning and non-transfer learning
The accuracy. To explore the influence of the expanded data set on the training results, the model is trained on the expanded data set and on the original data set respectively. The comparison of the training results is shown in Figs. 4 and 5. After 10000 training iterations on the original data set, the accuracy on the training set is 98% and the accuracy on the verification set is 91.5% (Fig. 4). On the data set expanded by data transformation, the accuracy on the training set is close to 99% and the accuracy on the verification set is more than 98% (Fig. 5). The performance of the model on the expanded data set is significantly higher than that on the original data set.
The model training accuracy of original data set (yellow is the training set and blue is the verification set).
The model training accuracy of transformation data set (yellow is the training set and blue is the verification set).
Change of model cross entropy of original data (yellow is the training set and blue is the verification set).
Loss function. The loss function reflects the error between the calculation result and the actual result. The smaller the value of the loss function, the better the performance of the model. In the cross entropy change curve based on data transformation (Fig. 7), when the step is close to 2500, the value of the loss function is reduced below 0.1. When the step is about 3500, the value of the loss function decreases to less than 0.1. In the change curve of model cross entropy based on original data (Fig. 6), when the step is close to 2000, the value of the loss function decreases to less than 0.1. When the training is completed, the cross entropy of the verification set is still above 0.3. It can be seen that the model based on data transformation has lower cross entropy and better model effect.
Change of model cross entropy of transformed data (yellow is the training set and blue is the verification set).
Confusion matrix. The confusion matrix is an important tool for evaluating the performance of classification models. Confusion matrices are drawn for the model in this paper based on the original data and on the transformed data. The results are shown in Figs. 8 and 9. The ordinate labels represent the actual category and the abscissa labels represent the predicted category. The confusion matrix shows the details of correct and incorrect identifications.
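How such a matrix is built and read can be sketched as follows, with rows for the actual category and columns for the predicted category as described above. The labels and predictions in the example are invented for illustration and are not the experimental results; the function names are hypothetical.

```python
def confusion_matrix(actual, predicted, labels):
    """Count samples per (actual row, predicted column) pair."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def overall_accuracy(matrix):
    """Share of samples on the diagonal (correct identifications)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def per_class_recall(matrix, labels):
    """For each actual class, the fraction identified correctly."""
    return {
        label: (matrix[i][i] / sum(matrix[i]) if sum(matrix[i]) else 0.0)
        for i, label in enumerate(labels)
    }

if __name__ == "__main__":
    labels = ["BW", "TP", "AH"]                  # bollworm, thrips, aphid
    actual = ["BW", "BW", "TP", "AH", "AH"]      # invented example labels
    predicted = ["BW", "AH", "TP", "AH", "AH"]   # invented example predictions
    m = confusion_matrix(actual, predicted, labels)
    print(m, overall_accuracy(m), per_class_recall(m, labels))
```

Off-diagonal entries show which pests are mistaken for which, which is the information used when comparing the matrices for the original and transformed data.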
Confusion matrix using extended data.
In the confusion matrix, WG is white grub, LG is leafhopper, PP is pupa, OML is oriental migratory locust, WF is whitefly, TP is thrips, SR is scarab, MC is mole cricket, BW is bollworm, BA is beet armyworm and AH is aphid.
The figures show that, except for bollworm, the diagonal numbers in the confusion matrix after data transformation are greater than before. The recognition effect of the model using data transformation is better than that of the model using the original data set. Figure 8 shows that two bollworms were incorrectly identified as leafhoppers, one as thrips and two as aphids. The bollworm and the pests it was mistaken for are all small. This indicates that the recognition accuracy of the transfer learning Inception V3 model on these small pests needs to be improved.
Confusion matrix using original data set.
A small-sample, efficient identification method for crop pests based on transfer learning and data conversion is proposed in this paper. The main conclusions from the experiments are as follows.
Compared with VGG16 and ResNet, the Inception V3 model based on transfer learning performs best, with an accuracy of 98.94%. Owing to the structural advantages of the Inception V3 model and the recognition and classification ability carried over from big data by transfer learning, the accuracy of agricultural pest identification is effectively improved.
In crop pest recognition, the identification model based on transfer learning is better than the non-transfer learning model. The accuracy of the model based on transfer learning is more than 91.52%, and the accuracy of the non-transfer learning model is more than 80.13%. This shows that the image recognition method of transfer learning plays an obvious role in crop pest recognition.
Data transformation clearly improves the Inception V3 model based on transfer learning. The analysis of the loss function and confusion matrix shows that the image recognition accuracy of the model based on data transformation is 98.9%, higher than that of the model without data transformation (92.5%). This effectively alleviates the problem of limited sample size in agricultural pest training and provides a reference for similar research.
In this experiment, identification errors occur for smaller pests. In addition, the pest images are mostly single objects against a simple background, whereas photo recognition in actual production will face complex backgrounds and interference from multiple objects. Finally, once a pest is identified, a knowledge base should supply its biological classification, occurrence patterns and green prevention and control methods. Further work is therefore needed to provide technical support for the practical application of agricultural pest control.
Acknowledgments
The research work was supported by the Youth Fund of Beijing Academy of Agricultural and Forestry Sciences: Research and application of agricultural graphic question answering robot system based on deep learning (QNJJ201919); the Youth Talent Program: Agricultural information visual Q & A (AVQA) technology research (qnyc202103); and the Beijing Science and Technology Plan Project: Beijing Science and Technology Commissioner intelligent response service system and innovation and entrepreneurship service demonstration (Z201100008020011). The corresponding author: luochangshou (luochangshou@163.com).
