Abstract
Plant disease is one of the major threats to food security, and accurate diagnosis of plant diseases benefits agricultural production. Deep learning models are employed here for real-time plant disease diagnostics. In this study, we present an accurate identification method for common tomato diseases based on deep learning. A multi-resolution detector, combined with a dedicated scheme for generating and assigning bounding boxes, facilitates feature extraction for detection. Dropout and the ADAMW (Adaptive moment estimation with decoupled weight decay) optimizer are employed to mitigate overfitting. Trained on collected images of healthy and diseased tomatoes, our detector identifies 10 different diseases. Experimental results show that the proposed method identifies common tomato diseases accurately and rapidly, with an average accuracy of 85.03% and a recognition speed of 61 frames per second, outperforming the other models tested under the same conditions; it can therefore support tomato disease control.
Keywords
Introduction
Scientists have long sought opportunities to improve agricultural production. Indeed, food supply is the most basic requirement of daily life and relates directly to human well-being. More recently, the demand for food has been largely affected by factors such as population growth, income levels, urbanization, lifestyles and preferences [38]. To this end, advances in agricultural technology aim to significantly improve the production capacity of plants [24, 25]. Current innovations such as Precision Agriculture and Prescriptive Planting are regarded as major advances of our time [8].
Despite advanced laboratory seeding techniques, plant protection remains a pressing concern during production. Throughout the cultivation period, plant diseases are a major source of food insecurity, accounting for annual yield losses of up to 16% globally [1]. Typically, plant diseases involve complex interactions among the host plant, the virus and its vector [4]. Once infected, a plant develops symptoms that, if the disease spreads, can severely affect the entire crop [11, 12]. Research on plant disease is ongoing and mainly focuses on biological characteristics [32]. Disease can result from various causes such as humidity, temperature, precipitation and even pest outbreaks. These factors are fundamentally difficult to control; rather than relying on excessive use of pesticides in such circumstances [43], early detection of disease is now highlighted as a promising approach with profound effects [5].
Plant disease diagnosis is one such field, with recent publications revealing the potential of optical observation of plant symptoms [28]. Indeed, given the large number of cultivated plants and their existing phytopathological problems, even experienced agronomists or plant pathologists may fail to diagnose the exact disease, leading to mistaken conclusions [10]. For this reason, accurate and fast detection of plant diseases facilitates early treatment [29]. Existing computer vision techniques thus give agronomists the opportunity to carry out diagnoses through optical observation of infected plants [12].
Within computer vision, deep learning networks are currently the cutting-edge methodology for image recognition, producing remarkable results that are both highly accurate and fast [46]. As a non-destructive method, deep learning networks have already shown potential applications in agriculture [34]. Zhao develops a novel approach for identifying three common kinds of weeds in wheat fields [33]. Tan et al. construct a deep learning model for classifying infected fruit, using a self-adaptive momentum rule to update parameters [42]. Mohanty et al. apply a CNN (convolutional neural network)-based classifier to identify 26 diseases across 14 crop species [29].
Current research pays more attention to plant disease classification, which relies on the distinctive features of a clear image of a single object, than to detection. When diseases must be identified in practical images, classifiers may fail to recognize the various objects at an early stage. Some researchers therefore use classifiers only as a secondary tool that complements expert identification. Deep learning models, however, have the potential to handle image-interpretation tasks that were once distinctly human [3]. Motivated by these advances, we take a further step and propose a deep learning based method for field plant disease detection.
The tomato is one of the most common crops in daily life and is threatened by many types of disease [47]. Tomato diseases reduce both yield and quality, and as the tomato planting area grows and cultivation forms diversify, they are becoming increasingly serious. Tomato diseases are mainly caused by viruses, fungi, bacteria and nematodes. There are more than 30 kinds of tomato disease, and their severity varies between regions. The diseases most common in production include bacterial spot, septoria leaf spot, early blight, late blight, tomato mosaic virus, canker, leaf mold, gray mold and nail head spot; this study therefore focuses on these common diseases.
In this study, we build an end-to-end detection model based on a convolutional neural network, which we name the multi-resolution detector. In particular, we study the detection of small objects with the proposed model, which is of great significance for early plant disease diagnosis. At present, we primarily limit our research to detecting tomato diseases at an early stage. The objective of this study is to establish a reliable and practical solution for real-time tomato disease detection under real cultivation conditions using deep learning networks.
The rest of this paper is organized as follows: the basic theory of deep learning based object detection is introduced in Section 2; the materials and methods of the proposed disease diagnosis model are presented in Section 3; Section 4 reports the experimental results together with an analysis of the model's performance; concluding remarks are given in Section 5.
Related work
Anomaly identification in plants
Historically, plant disease diagnosis has been performed by farmers relying either on their own experience or on laboratory services. In the laboratory, plant pathologists study samples of plant tissue, water and soil with specialized instruments, an approach that takes several days or weeks to reach a conclusion. Computer vision, by contrast, can process and store large numbers of images quickly, and in many cases it outperforms inspection by the human eye. For this reason, researchers have started to use RGB data, typically captured by smartphone cameras and analyzed by apps, for disease detection. In 2014, Neumann et al. used this approach to identify diseases of sugar beet [31]. In addition, infrared thermal imaging and spectral techniques have been applied to downy mildew detection [18, 19]; infrared thermography is usually combined with fluorescence spectroscopy to obtain more reliable results.
With the flourishing of machine learning, plant disease diagnosis has turned towards predictive models based on sensor data fusion. Unlike deep learning models, traditional machine learning classifiers rely on handcrafted features and still depend on human knowledge to set their complex parameters [45]. Despite their computational cost and time consumption, such models achieve reasonable results in plant disease classification. For instance, an SVM (support vector machine) classifier trained for potato disease categorization reaches 95% accuracy [22].
Encouragingly, deep learning, a branch of machine learning, paves the way to more accurate and simpler diagnostics. Deep learning methods are widespread in plant disease classification and achieve good performance: Barbedo summarizes the results of CNN-based classifiers from 2015 to 2018 and shows that the highest accuracy reaches 99% [2]. In the development of image recognition, disease classification can be regarded as the primary stage of disease diagnostics, based on photographing a specific object with distinctive features. However, classification models are often challenged by real application scenarios. In an image containing multiple objects within a complex scene, classifiers have difficulty picking out the targets and discarding unrelated regions, which may lower classification accuracy.
Object detection models
Among deep learning based methods, object detection has received a great deal of attention [39]. Instead of merely extracting features from a given sample, object detection recognizes the various objects in an image and also identifies their locations. Object detection research aims at algorithms that are both novel and practical while achieving ever higher accuracy. There are two basic families of object detection models: region-based detectors and single-shot detectors. The former include R-CNN [14], Fast R-CNN and Faster R-CNN [15], while the latter include YOLO (You Only Look Once) [35] and SSD (Single Shot MultiBox Detector) [25]. Region-based detectors have complex structures that result in low accuracy and poor real-time performance. More recently, single-shot detectors, which aim at an end-to-end detection mechanism, have been designed to mitigate these deficiencies.
To date, one of the most powerful detection methods is SSD built on the VGG16 base network. It generates a score for the presence of each object category in each bounding box and adjusts the box to better match the object shape. By combining predictions from multiple feature maps with different resolutions, the network can handle objects of various sizes. In addition, SSD saves computation time by integrating all processing stages into a single network. According to current research, YOLO, which follows a similar working principle, is usually regarded as a comparable alternative to SSD [48]. End-to-end detectors therefore appear promising for image detection, including the specific purpose of plant disease diagnosis.
Plant disease detection process
A robust and generalizable detection system is likely to combine sensing elements and processing methods in order to model diseases in various plant types [6]. Plant disease detection with deep learning models can generally be summarized in four steps: image acquisition, preprocessing, feature extraction and object recognition (Fig. 1). For plant disease diagnosis, visual sensing devices such as digital cameras are set up beforehand to capture plant images, so that images of both healthy and infected plants are collected for further processing. Preprocessing includes, for example, image normalization, segmentation, enhancement and filtering, which facilitate the subsequent steps. After the images have been preprocessed to enhance their quality, features are extracted using suitable schemes; at this stage, features such as color, shape or texture are computed from the image data. Lastly, object recognition is carried out with deep learning networks. Detection accuracy varies with the model used, and in some conditions the models need to be revised for specific detection demands to obtain higher accuracy.

Plant disease detection process.
Structure of proposed model
To address the issues described above, we now present a deep convolutional neural network-based approach for tomato disease diagnostics built on an end-to-end detection principle. The architecture of the proposed deep learning model and its parameters are shown in Fig. 2 and Table 1.

Structure of proposed model.
Parameters of the proposed model
Inspired by the structure of SSD, our model is based on a standard VGG16 network [40]. Max pooling is employed for faster convergence and better generalization performance [17]. Following the base network, six convolutional layers of different sizes are deployed for feature extraction; these are the multi-resolution detection layers. Since objects must be delineated before they can be identified, a large number of bounding boxes are assigned to every layer [44]. Specifically, the bounding boxes are generated by a 3 × 3 convolution, together with 4 offsets to the ground-truth box. The detection layers are followed by a softmax classifier and a linear regressor: the former predicts the category of each bounding box, while the latter computes the offset between the bounding box and the corresponding ground-truth box. The final detections are output by Non-Maximum Suppression (NMS). The SGD (stochastic gradient descent) algorithm is used to minimize the loss function during training [21, 30], so that the model parameters can be fine-tuned to improve detection accuracy and convergence speed.
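The text does not spell out how the 4 offsets are parameterized. The sketch below assumes the center-size encoding used by SSD and Faster R-CNN (without SSD's variance scaling), plus a numerically stable softmax for the class scores of one box; the function names are illustrative, not taken from the authors' code.

```python
import math

def encode(gt, prior):
    """Offsets (tx, ty, tw, th) that move a prior box onto a ground-truth box.

    Both boxes are (cx, cy, w, h) tuples in the same units.
    """
    gcx, gcy, gw, gh = gt
    pcx, pcy, pw, ph = prior
    return ((gcx - pcx) / pw, (gcy - pcy) / ph, math.log(gw / pw), math.log(gh / ph))

def decode(offsets, prior):
    """Inverse of encode: apply predicted offsets to a prior box."""
    tx, ty, tw, th = offsets
    pcx, pcy, pw, ph = prior
    return (pcx + tx * pw, pcy + ty * ph, pw * math.exp(tw), ph * math.exp(th))

def softmax(logits):
    """Class probabilities for one bounding box (max subtracted for stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

During training the regressor would be fitted to `encode(gt, prior)`; at inference `decode` turns its predictions back into boxes before NMS.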
Because deep learning methods are prone to overfitting, countermeasures are needed. In our detector, two dropout layers are added and the ADAMW (Adaptive moment estimation with decoupled weight decay) optimizer is used. Specifically, dropout is applied after the third and the fifth layers of Fig. 2, so the dropped neurons in these layers contribute to neither forward propagation nor backpropagation. Each time a new image is fed to the model, a different thinned structure is sampled, while all these structures share the same weights [41]; in this way, overfitting is reduced. ADAMW is also used to prevent overfitting; it is an adaptive gradient method that decouples weight decay from the gradient-based update [27], which can yield better generalization performance.
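A minimal sketch of (inverted) dropout in plain Python. Assumption: the probabilities p1 = 0.5 and p2 = 0.7 reported for the experiments are treated as drop probabilities; the text does not say whether they are drop or keep probabilities.

```python
import random

def dropout(activations, drop_prob, training=True, rng=random):
    """Inverted dropout on a list of activations.

    During training each unit is zeroed with probability drop_prob, and the
    survivors are scaled by 1 / (1 - drop_prob) so the expected activation is
    unchanged; at inference time the input is returned untouched.
    """
    if not training or drop_prob == 0.0:
        return list(activations)
    scale = 1.0 / (1.0 - drop_prob)
    # A fresh mask is drawn on every call, i.e. for every image.
    return [a * scale if rng.random() >= drop_prob else 0.0 for a in activations]
```

Scaling at training time (rather than at inference) is the usual design choice, because the deployed detector then needs no extra step.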
A bounding box is the smallest box that encloses an object. In SSD, the bounding boxes (called default boxes in SSD) are assigned to the feature extraction layers beforehand: for every detection layer, they serve as prior boxes in each detection cell. After computation, the ground-truth boxes that characterize the objects in the image are obtained. However, the number of bounding boxes is kept the same for convolutional layers of different resolution. As a result, the default boxes are redundant in a high-resolution cell, while in a low-resolution cell more boxes need to be stacked and handled carefully [49]. We therefore study how the bounding boxes should be determined to improve detection performance.
Given convolutional layers of different resolution, we assign more bounding boxes to the low-resolution layers and fewer to the high-resolution ones. In this way, the total number of bounding boxes does not increase.
As shown in Fig. 3, more bounding boxes are assigned to the low-resolution layers to avoid missing the objects to be detected. In contrast, target objects in a high-resolution layer can still be detected effectively despite the reduced number of bounding boxes.

Bounding boxes for detection layers of different resolution.
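Why shifting boxes from high- to low-resolution layers does not increase the total can be checked with a short count. The feature-map sizes below are those of SSD300 and the per-layer allocation is hypothetical; neither is taken from Table 2.

```python
# Hypothetical SSD300-style feature-map sizes (cells per side), highest resolution first.
FEATURE_MAP_SIZES = [38, 19, 10, 5, 3, 1]

def total_boxes(boxes_per_cell):
    """Total number of bounding boxes over all detection layers."""
    return sum(f * f * b for f, b in zip(FEATURE_MAP_SIZES, boxes_per_cell))

uniform = [6] * 6                  # same count in every layer
reallocated = [4, 6, 6, 8, 8, 8]   # fewer in high-res layers, more in low-res ones

print(total_boxes(uniform))      # 11640
print(total_boxes(reallocated))  # 8822
```

The high-resolution layers dominate the count because they contain many more cells, so removing even a couple of boxes per cell there outweighs adding boxes to the coarse layers.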
At this stage, we use the plant disease images in the dataset to determine the bounding boxes precisely. K-means++ clustering is employed to determine the number of bounding boxes.
Generally, basic K-means++ clustering uses the Euclidean distance between each element and its cluster center. However, since the target objects vary in size, the Euclidean distance does not reflect how well boxes match: large boxes produce larger distances than small ones even when their overlap is similar. The distance is therefore computed with the following formula:
$$d(\text{box}, \text{centroid}) = 1 - \mathrm{IOU}(\text{box}, \text{centroid})$$
where box is the target box and centroid is the cluster center. IOU denotes the intersection over union, i.e. the overlap ratio between the target box and the corresponding cluster center. Based on the overlap ratio and the convergence speed, k = 6 is selected for the clustering algorithm (Fig. 4).

Selection of the number of bounding boxes.
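A sketch of K-means++ clustering of box shapes under the 1 − IOU distance. It assumes boxes are compared by width and height only (centers aligned), as in YOLOv2-style anchor clustering, and that each centroid is updated to the mean width and height of its cluster. The returned mean IOU is the kind of quantity plotted against k when choosing k; the synthetic boxes in the test are made up.

```python
import random

def iou_wh(a, b):
    """IOU of two boxes given as (w, h), assuming they share the same center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def distance(box, centroid):
    """d(box, centroid) = 1 - IOU(box, centroid)."""
    return 1.0 - iou_wh(box, centroid)

def kmeans_pp(boxes, k, n_iter=100, seed=0):
    """K-means++ clustering of (w, h) box shapes under the 1 - IOU distance.

    Returns the centroids and the mean IOU between each box and its nearest centroid.
    """
    rng = random.Random(seed)
    # K-means++ seeding: first centroid uniformly at random, each further one with
    # probability proportional to its squared distance to the nearest centroid.
    centroids = [rng.choice(boxes)]
    while len(centroids) < k:
        weights = [min(distance(b, c) for c in centroids) ** 2 for b in boxes]
        if sum(weights) == 0:  # every box already coincides with a centroid
            break
        centroids.append(rng.choices(boxes, weights=weights)[0])
    # Lloyd iterations: assign each box to its nearest centroid, then move each
    # centroid to the mean (w, h) of its cluster (empty clusters keep their centroid).
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for b in boxes:
            j = min(range(len(centroids)), key=lambda i: distance(b, centroids[i]))
            clusters[j].append(b)
        new = [(sum(w for w, _ in cl) / len(cl), sum(h for _, h in cl) / len(cl))
               if cl else c for cl, c in zip(clusters, centroids)]
        if new == centroids:
            break
        centroids = new
    mean_iou = sum(max(iou_wh(b, c) for c in centroids) for b in boxes) / len(boxes)
    return centroids, mean_iou
```

Running it for k = 1, 2, … and plotting the mean IOU gives a curve that flattens as k grows, which is the trade-off against computational cost used to pick k.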
Accordingly, 6 bounding boxes are assigned to every detection cell in the image. Because the number of target objects in one image is limited, we restrict the number of bounding boxes to reduce computational cost.
Taking into account the stride with which bounding boxes are placed, the resolution of each layer is adjusted so that more target objects can be detected. The bounding boxes in the multi-resolution feature layers, together with those of SSD512 and YOLO v3, are listed in Table 2.
Bounding box generation
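Once the anchor shapes are fixed, each detection layer places them at every cell center, with the stride equal to the input size divided by the layer's resolution. The sketch assumes the 224 × 224 input used in preprocessing and pixel-valued anchors; the anchor shapes and layer sizes in the test are hypothetical.

```python
def default_boxes(feature_size, image_size, anchor_wh):
    """Tile anchor shapes over one detection layer.

    feature_size: number of cells per side (f); stride = image_size / f.
    anchor_wh: list of (w, h) in pixels, e.g. the K-means++ centroids.
    Returns a list of (cx, cy, w, h) boxes in pixels, row by row.
    """
    stride = image_size / feature_size
    boxes = []
    for row in range(feature_size):
        for col in range(feature_size):
            cx, cy = (col + 0.5) * stride, (row + 0.5) * stride  # cell center
            boxes.extend((cx, cy, w, h) for w, h in anchor_wh)
    return boxes
```

For example, `sum(len(default_boxes(f, 224, anchors)) for f in (28, 14, 7))` counts every box over three hypothetical layers.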
Dataset collection
Plant disease diagnosis requires a detection dataset prepared in advance. Since current research focuses more on disease classification than on detection, detection samples amount to less than 10% of classification samples, and classification samples are used as substitutes for model training and testing in most cases.
To build our detector, a dedicated detection dataset containing both healthy and infected tomatoes was constructed. The images were obtained by a combination of web crawling and greenhouse photography, covering many disease types photographed from different angles at different tomato growing bases. Images were acquired at our university's tomato test base with devices including smartphones and digital cameras. Images with good exposure and clearly visible leaves were selected for the training and test image libraries.
Preprocessing
Disease images may differ greatly because of differences in collection environment, equipment and so on, so the images are preprocessed to reduce the interference of these factors on disease detection. In addition, the network model needs a large amount of data to prevent overfitting and improve the generalization ability of the algorithm.
Image sizes are generally kept below 512 × 512 to limit computational complexity, but model accuracy drops for images that are too small. As a compromise, all image samples are resized to 224 × 224 pixels for further processing.
The data preprocessing in this study mainly includes brightness and contrast adjustment, noise addition, Gaussian blurring, gamma correction and PCA color augmentation. Preprocessing expands the dataset and thereby reduces the risk of overfitting. The resulting dataset is named ImagePlant Loc. The disease categories and the number of annotated samples used in this experiment are listed in Table 3.
Information of tomato disease dataset
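The augmentations listed above can be sketched with NumPy on an H × W × 3 image scaled to [0, 1]. All parameter ranges are assumptions, not the values used to build ImagePlant Loc; the PCA color step follows the AlexNet-style recipe, and Gaussian blurring is left out to keep the sketch NumPy-only (it could be added with, e.g., `scipy.ndimage.gaussian_filter`).

```python
import numpy as np

def adjust_brightness_contrast(img, alpha, beta):
    """out = alpha * img + beta (alpha: contrast gain, beta: brightness shift)."""
    return np.clip(alpha * img + beta, 0.0, 1.0)

def add_gaussian_noise(img, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def gamma_correction(img, gamma):
    """Gamma correction; values in [0, 1] stay in [0, 1]."""
    return np.power(img, gamma)

def pca_color_augmentation(img, rng, std=0.1):
    """Shift every pixel along the principal axes of the image's RGB distribution."""
    pixels = img.reshape(-1, 3)
    cov = np.cov(pixels, rowvar=False)          # 3 x 3 RGB covariance
    eigvals, eigvecs = np.linalg.eigh(cov)      # columns of eigvecs are the axes
    alphas = rng.normal(0.0, std, 3)
    delta = eigvecs @ (alphas * eigvals)        # one RGB offset for the whole image
    return np.clip(img + delta, 0.0, 1.0)

def augment(img, rng):
    """Produce several augmented copies of one image."""
    return [
        adjust_brightness_contrast(img, alpha=rng.uniform(0.8, 1.2), beta=rng.uniform(-0.1, 0.1)),
        add_gaussian_noise(img, sigma=0.02, rng=rng),
        gamma_correction(img, gamma=rng.uniform(0.7, 1.5)),
        pca_color_augmentation(img, rng),
    ]
```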
The clustering algorithm is also employed for image segmentation: to build the detection samples, K-means++ clustering is applied to outline each object. A specific object is represented by $I = \{c_x, c_y, h, w, \text{class}\}$, where $(c_x, c_y)$ is the center of the ground-truth box obtained by K-means++ clustering, $h$ and $w$ are the height and width of the ground-truth box, and class is the object category. Hereafter, the left boundary and the bottom boundary are denoted by $L$ and $B$. Following the segmentation procedure, for an image of $m \times n$ pixels these boundaries are written as $L = \{l_1, l_2, \cdots, l_m\}$ and $B = \{b_1, b_2, \cdots, b_n\}$. Suppose that the detection object $o_i$ is one of the objects in the image, whose pixels can be represented by a coordinate set.
Similarly, the dimensions of the bounding box assigned to this object are determined by its furthest points towards the image borders, (x1, y1), (x2, y2), (x3, y3) and (x4, y4), as shown in Fig. 5. These points satisfy the following equations:
(a) Original image; (b) bounding box definition; (c) object recognition.
where $F_{\text{distance}}$ is the function computing the distance of a vertex. Further, the geometric size of the ground-truth box is denoted by
together with the coordinate
The specific class of the object is identified using the deep residual framework ResNet [20]. If the confidence of the bounding box exceeds 0.85, the ground-truth box is output as $I = \{c_x, c_y, h, w, \text{class}\}$.
An example of detecting two targets in one image is shown in Fig. 5. In this way, the detection samples for detector training and testing can be constructed.
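The equations linking the extreme points to the ground-truth box are not reproduced in this text. As a stand-in, the sketch below derives $I = \{c_x, c_y, h, w, \text{class}\}$ from a segmented object by taking its extreme pixel coordinates. It assumes a labelled integer mask is already available and leaves out the ResNet classification step, so the class label is passed in; the mask and labels in the test are made up.

```python
def box_from_pixels(pixels, label):
    """Ground-truth box I = {cx, cy, h, w, class} from one object's pixels.

    pixels: list of (x, y) coordinates belonging to the object.
    The extreme pixels (leftmost, rightmost, top, bottom) fix the box edges.
    """
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    x_min, x_max, y_min, y_max = min(xs), max(xs), min(ys), max(ys)
    return {
        "cx": (x_min + x_max) / 2,
        "cy": (y_min + y_max) / 2,
        "h": y_max - y_min + 1,  # +1: a one-pixel object is 1 pixel tall
        "w": x_max - x_min + 1,
        "class": label,
    }

def boxes_from_mask(mask, class_of):
    """One box per object id in a 2-D integer mask (0 = background)."""
    pixels = {}
    for y, row in enumerate(mask):
        for x, obj_id in enumerate(row):
            if obj_id:
                pixels.setdefault(obj_id, []).append((x, y))
    return [box_from_pixels(p, class_of[i]) for i, p in sorted(pixels.items())]
```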
Experiments are carried out to evaluate the performance of the proposed model using the purpose-built tomato disease dataset ImagePlant Loc, in which the tomato diseases are labeled in advance. Notably, this dataset includes samples of early-stage infection, which benefits diagnosis in practice. Various types of tomato images are stored in the dataset for model training and testing.
To verify the accuracy of our detector, the dataset is divided into three parts: 70% training set, 20% validation set and 10% test set. The model is trained on the training set and then evaluated on the validation set. Once the expected results are obtained, the test data are fed to the detector for plant disease diagnosis.
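A sketch of the 70/20/10 split, assuming a simple random shuffle with a fixed seed; the text does not say how samples were assigned to the three sets.

```python
import random

def split_dataset(samples, fractions=(0.7, 0.2, 0.1), seed=0):
    """Shuffle samples and cut them into training, validation and test sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = round(fractions[0] * len(items))
    n_val = round(fractions[1] * len(items))
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]
```

A per-class (stratified) split would additionally guarantee that rarer diseases appear in all three subsets.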
The algorithms are run on a Dell R740 server with a Xeon E7-8800 v3 CPU and two GTX 1080 Ti GPUs under 64-bit Ubuntu 16.04 Linux. The model is built with the TensorFlow deep learning framework, and training is monitored with TensorBoard. The initial learning rate is set to 0.001; a small learning rate leads to more precise results. Dropout probabilities of p1 = 0.5 and p2 = 0.7 are applied to the third and fifth layers, respectively, to randomly drop units from the network during training. The hyperparameters of the ADAMW optimizer are set to β1 = 0.9, β2 = 0.999 and ε = 1 × 10^-8.
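A sketch of one AdamW update with the hyperparameters above (learning rate 0.001, β1 = 0.9, β2 = 0.999, ε = 1 × 10^-8). The weight-decay coefficient is not reported, so `weight_decay = 0.01` is an assumption; the update follows the decoupled form of Loshchilov and Hutter rather than any particular TensorFlow implementation.

```python
import math

def adamw_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update over lists of parameters (t counts steps from 1).

    Unlike Adam with L2 regularization, the decay term weight_decay * theta is
    applied to the parameter directly and is not passed through the moment estimates.
    """
    new_params, new_m, new_v = [], [], []
    for theta, g, mi, vi in zip(params, grads, m, v):
        mi = beta1 * mi + (1 - beta1) * g       # first moment
        vi = beta2 * vi + (1 - beta2) * g * g   # second moment
        m_hat = mi / (1 - beta1 ** t)           # bias correction
        v_hat = vi / (1 - beta2 ** t)
        theta -= lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * theta)
        new_params.append(theta)
        new_m.append(mi)
        new_v.append(vi)
    return new_params, new_m, new_v
```

For example, minimizing f(θ) = ½‖θ‖² (whose gradient is θ itself): start from `params = [1.0, -2.0]`, `m, v = [0.0, 0.0], [0.0, 0.0]`, and call `params, m, v = adamw_step(params, params, m, v, t)` for t = 1, 2, ….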
Evaluation metric
In this research, the precision/recall curve is used to evaluate the test results. Recall is the proportion of all positive samples ranked above a given rank, while precision is the proportion of the samples above that rank that belong to the positive class [9]. For a specific dataset, the Average Precision (AP) is calculated as the 11-point interpolated average precision, i.e. the average of the maximum precision over eleven equally spaced recall levels {0, 0.1, ⋯, 1} [5]. For the current dataset, we use the mean average precision (mAP), the AP averaged over all N detection categories, which is
$$\mathrm{mAP} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{AP}_i$$
in line with
$$\mathrm{AP}_i = \frac{1}{11}\sum_{r \in \{0, 0.1, \ldots, 1\}} p_{\text{interp}}(r), \qquad p_{\text{interp}}(r) = \max_{\tilde{r} \ge r} p(\tilde{r})$$
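A sketch of the 11-point interpolated AP and the mAP over categories, in the PASCAL VOC 2007 style. It assumes each detection has already been marked as a true or false positive (e.g. by IOU matching against the ground truth), which is not shown; the scores in the test are made up.

```python
def pr_curve(scores, is_tp, n_positives):
    """Recall and precision after each detection, ranked by descending score."""
    ranked = sorted(zip(scores, is_tp), key=lambda s: -s[0])
    recall, precision, tp = [], [], 0
    for rank, (_, hit) in enumerate(ranked, start=1):
        tp += hit
        recall.append(tp / n_positives)
        precision.append(tp / rank)
    return recall, precision

def eleven_point_ap(recall, precision):
    """Mean over r in {0, 0.1, ..., 1} of the best precision at recall >= r."""
    total = 0.0
    for i in range(11):
        r = i / 10  # exact decimal levels; avoids 3 * 0.1 != 0.3 surprises
        candidates = [p for rc, p in zip(recall, precision) if rc >= r]
        total += max(candidates, default=0.0)
    return total / 11

def mean_ap(per_class):
    """mAP: average of per-class AP; per_class maps a class to (scores, is_tp, n_pos)."""
    aps = [eleven_point_ap(*pr_curve(*args)) for args in per_class.values()]
    return sum(aps) / len(aps)
```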
The multi-resolution detector, the single-shot detectors SSD512 and YOLO v3, and the region-based detectors Fast R-CNN and Faster R-CNN are all trained and validated, and the resulting models are then evaluated on the test set. In this experiment, the number of training iterations is fixed at 150k. The detection accuracy of each model is shown in Table 4.
Comparison of different methods
The test results show a considerable gap between the region-based detectors and the single-shot detectors. This suggests that the region-based strategy has difficulty capturing the objects and recognizing their categories, and that the assignment of bounding boxes to the detection layers is of great significance for feature mapping.
Compared with the current best practice, our detector achieves a higher mAP than SSD (85.03% vs. 82.13%). For most disease categories, the proposed model detects more accurately than SSD, with a slight drop for late blight. By assigning different numbers of bounding boxes to the multi-resolution detection layers, a better IOU can be obtained, which makes our detector more capable of detecting small objects in the image.
The current dataset is divided into three categories, namely small, medium and large sample collections, for model evaluation. The detection of small objects is particularly helpful for early-stage plant disease diagnosis, allowing specific plant protection measures to be taken immediately.
According to Fig. 6, there is a considerable gap in accuracy between the multi-resolution detector and SSD. The proposed model performs well on small object recognition, which can be attributed to the smaller bounding boxes assigned in the high-resolution detection layers.

Comparison of different resolutions.
Because the bounding boxes are generated more efficiently, both detection accuracy and convergence speed improve. The convergence process of our detector is also more stable, without constant oscillation.
Because a large number of bounding boxes are generated during processing, the non-maximum suppression algorithm is used to filter out most of them with a confidence threshold of 0.05. For the current database with 10 categories of images, the run time spent on the 6 multi-resolution detection layers is nearly 2 ms. Owing to the reduced number of bounding boxes, the multi-resolution detector achieves a real-time detection speed of 61 fps (frames per second) with an accuracy of 82% (Table 5).
Inference time results
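A sketch of greedy NMS with the 0.05 confidence threshold mentioned above. The IOU threshold of 0.45 (SSD's default) is an assumption, since the text does not report it; in SSD-style detectors this step is run separately for each class.

```python
def iou(a, b):
    """IOU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(detections, score_thresh=0.05, iou_thresh=0.45):
    """Greedy NMS for one class.

    detections: list of (box, score). Boxes scoring <= score_thresh are dropped
    first; then the highest-scoring box is kept and every remaining box that
    overlaps it by more than iou_thresh is suppressed, repeatedly.
    """
    remaining = sorted((d for d in detections if d[1] > score_thresh),
                       key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [d for d in remaining if iou(best[0], d[0]) <= iou_thresh]
    return kept
```

The confidence filter runs first because it is cheap and removes most candidates before the quadratic overlap comparisons.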
Conclusions
In this work, we present a multi-resolution detector based on a convolutional neural network for real-time tomato disease diagnostics. Unlike disease classification methods, the proposed method focuses on field plant disease detection with an end-to-end learning model. Experimental results indicate that the multi-resolution detector outperforms the compared state-of-the-art methods in detection accuracy.
First, multi-resolution detection layers are constructed for feature extraction, and a set of bounding boxes is generated and assigned to each layer. The number of bounding boxes needed to outline objects is studied and optimized, and the bounding boxes are allocated differently across layers to improve detection performance.
Second, a tomato disease detection dataset is constructed for model training and testing. Images were collected from the Internet and by photography, and the detection samples were assembled using K-means++ clustering.
The experiments are also reported in detail. Ten categories of tomato diseases are detected, and the results are compared with those of region-based and single-shot detectors. Our detector exceeds the performance of the baseline methods in disease identification. Dropout layers and the ADAMW optimizer are employed to mitigate overfitting during training. Furthermore, the proposed model outperforms SSD specifically in small object detection, which enables early diagnosis of tomato diseases.
This study makes it possible to diagnose infected greenhouse tomatoes at an early stage without sending samples for laboratory testing. Although the parameter settings and implementation details may not yet be optimal, the detector can already identify diseases in natural images.
Future work
Future work should apply our detector to more plant species in order to contribute to agricultural research. There is also a need for computer scientists and plant protection specialists to collaborate in order to accelerate the intelligent identification of crop disease images.
Authors’ contributions
DG and JL designed the research and developed the detection dataset. DG and XW conducted the experiments and data analysis and wrote the manuscript. DG revised the manuscript. All authors read and approved the manuscript.
Funding
This study is supported by the Facility Horticulture Laboratory of Universities in Shandong (2019YY003), Key research and development plan of Shandong Province (2020RKA07036 and 2019GNC106034), Shandong social science planning project (21CPYJ20), Natural Science Foundation of Shandong Province (ZR202102200124).
