Abstract
Horticultural crops play a crucial part in the Indian economy by creating employment and supplying raw materials to food processing industries. Mango is one of the major horticultural crops. Mango trees are commonly affected by climatic stress and fungal infections, which reduce both the quality and the quantity of the fruit. The most common of these diseases are anthracnose and powdery mildew. In recent years, many variants of deep learning architectures have been proposed for detection and classification problems in the agricultural domain. Convolutional Neural Network (CNN) based architectures have performed very well for disease detection in plants, but they lack rotational or spatial invariance. A relatively new neural architecture called the Capsule Network (CapsNet) addresses these limitations of CNN architectures. Hence, in this work, a variant of CapsNet called Multi-level CapsNet is introduced to classify mango leaves affected by anthracnose and powdery mildew. The proposed architecture is validated on a dataset of mango leaves collected in the natural environment. The dataset comprises images of both healthy and diseased leaves. The test results confirm the high accuracy of the proposed framework for the classification of mango leaf diseases, with an accuracy of 98.5%. The results show that the proposed Multi-level CapsNet model is more accurate than other classification algorithms such as Support Vector Machine (SVM) and CNNs.
Introduction
In India, a majority of the working population is engaged directly in farming. According to one report, around 58 percent of the working population is occupied with agriculture, in contrast to 2 to 3 percent in the USA and UK, 6 percent in France, and 7 percent in Australia [1]. Agriculture is the most significant source of food, as it feeds a very large part of the world's population. It has been estimated that around 60% of household consumption is met by agricultural products. The growth of the Indian economy relies heavily on the agricultural sector. A good harvest gives momentum to the planned economic development of the nation by creating a favorable business climate for the manufacturing industries. A good harvest also brings substantial revenue to the Government for meeting its planned expenditure. Likewise, a poor harvest leads to a downturn in the nation's business, which in turn lowers the economic standard. Hence the agricultural sector plays a significant role in a country like India, and the success of the Indian economy still largely depends on it. Mango is India's main fruit crop and is regarded as the king of fruits. Besides its delicious taste, excellent flavor, and appealing aroma, it is rich in vitamins A and C. The tree is hardy and requires comparatively low maintenance costs. Mango accounts for 22% of the total area under fruit crops, covering 1.2 million hectares, with a total production of 11 million tons. Uttar Pradesh, Andhra Pradesh, and Telangana have the largest areas under mango, each with around 25% of the total, followed by Bihar, Karnataka, Kerala, and Tamil Nadu [2].
In agriculture, the quality and quantity of the crop depend on how healthily the plant grows. Various diseases affect plant health and, consequently, reduce the quality and quantity of the crop yield [3]. A decrease in either can directly affect national crop production [4]. The primary reason these diseases take hold is a lack of consistent observation of the plants. In some situations, farmers cannot devote attention to observing the plants or have trouble recognizing the infections, which reduces crop yield. In other cases, inexperienced farmers are unaware of plant infections and the periods in which they usually occur. Infections can, however, affect a plant at any time, and constant observation of the plants may prevent them. Identifying the type of infection in a plant is one of the significant research topics in agriculture. In this work, the diseases of mango plants are considered for classification. The main infections occurring on mango plants are anthracnose, powdery mildew, bacterial canker, and sooty mold [5]. Table 2 shows some of the diseases of mango plants.
Commonly occurring mango tree disease
Each plant infection has several stages of development, and each infection calls for a different remedy. For instance, disrupting the life cycle of the pathogen can prevent infections caused by fungi [6]. The general approach to identifying plant diseases is manual: farmers rely on manuals or on their own experience to recognize the type of infection. Once an infection occurs on a plant, farmers need to keep watching it. This way of locating infections is tedious and requires care in the selection of pesticides.
An automated system for detecting diseases in mango plants gives practical assistance to horticulturists, who would otherwise perform the task by manually observing infected leaves [7–9]. The automated system is more beneficial to farmers if it is easily accessible, because most of the places where farmers live lack the infrastructure needed to consult an agricultural expert. Automated plant disease detection systems can be integrated with other computerized diagnostic systems throughout the cultivation process to locate and solve problems promptly and accurately. All of this holds only if the system achieves high performance in identifying specific diseases under real conditions, i.e., in the field, and is simple and easy to use.
In recent years, owing to advances in graphical processing units and embedded processors, Artificial Intelligence applications have grown exponentially, leading to the development of new methods and models that form a new category, deep learning [10]. In contrast to traditional neural networks, deep learning uses artificial neural networks with more processing layers. Deep learning models have been developed in various sectors such as image recognition [11, 12], voice recognition [13], the health sector [14], and the analysis of massive volumes of data [15], giving a considerable boost to applications such as self-driving vehicles, machine translation, and interpretation. These deep learning techniques have been introduced into agriculture [16], and specifically into plant disease diagnosis [9], in the last couple of years.
In agriculture, deep learning is applied in various sub-domains such as water management, livestock management, crop management, and soil management. The applications of machine learning and deep learning in crop management are divided into sub-categories including yield prediction [17, 18], disease detection [19, 20], weed detection [21, 22], crop quality [23, 24], and species recognition [25]. Machine learning applications in livestock management are divided into two sub-categories: animal welfare [26, 27] and livestock production [28, 29]. In this paper, capsule networks are implemented for detecting diseases in mango plants. The rest of the paper is organized as follows: section two reviews the literature in the domains of deep learning and agriculture. Section three describes the proposed implementation of capsule networks on mango leaf images. Section four presents the experimental results and discusses them in comparison with existing methods. Section five covers the conclusion and future scope of the work.
Detecting diseases and managing plants is a challenging task that requires regular monitoring throughout the crop cycle and also carries a significant cost. Early detection of diseases plays a crucial role in reducing cost, chemical use, environmental impact, and yield loss. Deep learning techniques, mainly CNNs, have shown good performance in processing leaf images and classifying diseases [17–19]. To date, however, the application of deep learning methods in agriculture remains limited [18]. Most of the existing works concentrate on laboratory scenarios with simple background conditions.
Singh et al. [21] provide a systematic overview with a logical categorization of machine learning techniques to help the plant community apply suitable machine learning methods and best practices to various biotic and abiotic stress traits. E. Hossain et al. [22] describe different kinds of plant diseases and various advanced image processing and machine learning techniques for recognizing them. Their survey also identifies open research opportunities that will support further work towards precision agriculture. The approach proposed in [23] uses machine learning and data visualization techniques to classify remote rural land using a terrain dataset derived from the ASTER imaging instrument, and interprets the aggregated data using box plots and heat maps.
P. Sharma et al. [24] aimed at fine-tuning and evaluating state-of-the-art deep CNNs for image-based detection of plant diseases. Arora and Agrawal [25] studied the stages of a plant disease detection system and surveyed machine learning techniques for plant disease detection. Zhang et al. [26] proposed a method for plant disease identification and characterization using the K-Nearest Neighbour classifier. An improved machine-learning-based optimization method was introduced in [27] that recognizes plant diseases on a dataset of 236 images. In [28], the authors presented an artificial-intelligence-based plant disease detection system that suggests possible remedies for the detected disease. Jain and Chatterjee [29] presented the recognition and classification of maize leaf diseases using a Deep Forest technique. In [30], a global pooling dilated CNN (GPDCNN) is proposed for recognizing diseases in plants. In [31], the authors focused on recent advances in machine learning for large-scale data and on various techniques in current computing environments for different application areas.
In [32], the authors introduced a Few-Shot Learning (FSL) approach for classifying plant leaves using deep learning with small datasets. The authors of [33] present a combination of techniques to address, improve, and support multi-disciplinary and multi-institutional machine learning research in clinical care informatics. In [34], the authors investigated the feasibility of presymptomatic detection of tobacco disease using hyperspectral imaging together with variable selection and machine learning. The authors of [35] presented a model for disease detection in plant leaves using deep convolutional neural networks. In [36], researchers proposed strategies for applying machine learning within any organization and for assessing the feasibility, practicality, and viability of machine learning applications. Wang et al. [37] introduce a lightweight CNN technique for identifying grape diseases, including black rot, black measles, and leaf blight. In [38], the authors proposed a CNN-based approach with eight hidden layers for disease identification in tomato plants.
Junde Chen et al. [39] introduced a methodology for plant leaf disease detection and classification. The authors used image processing techniques, performed feature engineering analysis, and developed an index system for the prediction models. They use bilinear interpolation in the grayscale transformation of the images and then denoise the images with GIWA followed by Gaussian filtering (GIWA GF). Gaussian filtering is applied after GIWA filtering to remove outlier points without disturbing the edge details of the leaf diseases. After this preprocessing, the leaf images are segmented to separate the diseased areas from the background and form a binary image for subsequent feature extraction and computation. Feature extraction is then performed to extract features related to texture, color, and disease area and pixels using the Group Method of Data Handling (GMDH). The selected features are fed to GMDH-Logistic, SVM, principal component analysis with SVM, CNN, and generative adversarial network models to classify the leaf disease. The highest accuracy, 86%, was attained with the GMDH-Logistic model.
S. Hernández and Juan L. López [40] presented an approach based on probabilistic programming to detect plant diseases using Bayesian deep learning techniques. They used uncertainty as a measure of misclassification. The authors pre-trained the network on a large-scale dataset, ImageNet, which has 1.2 million images in 1000 categories. Finally, they fine-tuned the softmax layer to learn the 38 classes of the PlantVillage dataset. The article [41] reviews different deep learning architectures and the performance those networks obtain on the PlantVillage dataset. The authors compared the pre-trained networks VGG16, ResNet-50, ResNet-101, ResNet-152, Inception V4, and DenseNet. Different regularization techniques were used to avoid overfitting at the final layers of the networks. Adding L2-normalization and performing dropout sampling gave efficient results when training networks with fully connected layers such as VGG16 [42].
Patrick Wspanialy and Medhat Moussa [43] introduced a computer vision system that automatically recognizes diseases and estimates per-leaf severity in tomato plant leaves. The proposed disease detection model detects instances of disease by extracting generic features and uses a ResNet architecture. The experiments show that the network is capable of detecting new cases of diseases not seen earlier. After a disease is detected, its severity is estimated using a modified U-Net architecture. The severity estimation model, which measures the proportional area affected, is suitable only for fungal and bacterial diseases and is not ideal for diseases such as those caused by viruses and insects.
R. Sujatha et al. [44] compared machine learning and deep learning models for disease detection in citrus plants. They implemented Support Vector Machine (SVM), Random Forest (RF), and Stochastic Gradient Descent (SGD) as the machine learning algorithms and Inception-v3, VGG-16, and VGG-19 as the deep learning networks. The authors carried out this work in the Orange Data Mining tool with stratified 10-fold cross-validation. Using accuracy as the comparison measure, they obtained 76.8%, 86.5%, 87%, 87.4%, 89% and 89.5% for RF, SGD, SVM, VGG-19, Inception-v3 and VGG-16, respectively. Some of the papers from the literature are summarized in Table 3.
Summary characteristics of the literature studied
This section describes the proposed CapsNet architecture and gives a detailed description of its layers.
Proposed CapsNet architecture
Figure 1 portrays the multi-level CapsNet architecture used to identify and classify diseases in mango trees from mango leaf images. The operations involved in the implementation of the modified CapsNet architecture are as follows: (1) a convolution layer for feature extraction from the input image; (2) a primary capsule layer to process the extracted features; (3) a class capsule layer, on which dynamic routing is performed to process the features for classification; and (4) a SoftMax layer to convert the class capsule layer results into probabilities corresponding to each disease in mango trees.

Proposed architecture of the multi-level CapsNet for classifying mango leaf diseases.
The preprocessed mango leaf images are considered as input images. The input images are of size 512 × 512.
Convolutional layers apply kernels to the input image or to the output of another convolutional layer. A convolution operation is a linear operation that multiplies a set of weights with the input. Since the technique was designed for two-dimensional data, the multiplication is performed between a 2D array of input data and a 2D array of weights, called a kernel or filter. The kernel should be much smaller than the input image. The kernel slides across the width and height of the input volume, and at each spatial position a dot product is computed between the kernel and the kernel-sized patch of the input: the two are multiplied element-wise and the products are summed to a single value. Because the result is a single value, the operation is often referred to as the "scalar product." Because the kernel is smaller than the input volume, the same kernel is applied many times, systematically covering each kernel-sized patch of the input from left to right and then top to bottom.
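The sliding dot product described above can be sketched in a few lines of NumPy. This is only a minimal illustration for a single 2D input and a single kernel, without padding; the function name `conv2d_valid` and the toy 5×5 input are assumptions made for the example and are not part of the proposed implementation.

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    feature_map = np.empty((out_h, out_w), dtype=float)
    for r in range(out_h):              # top to bottom
        for c in range(out_w):          # left to right
            top, left = r * stride, c * stride
            patch = image[top:top + kh, left:left + kw]
            # element-wise multiplication followed by a sum: one scalar per position
            feature_map[r, c] = np.sum(patch * kernel)
    return feature_map

# Toy input (hypothetical): a 5x5 "image" and a 3x3 kernel of ones.
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3))
fmap = conv2d_valid(image, kernel)
print(fmap.shape)
```

A deep learning framework performs the same computation far more efficiently and over many kernels and channels at once; the explicit loops are kept here only to mirror the description.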
Applying the same kernel systematically across an image is a powerful idea. If the kernel is designed to detect a particular kind of feature, applying it across the whole input image gives it the chance to find that feature wherever it occurs. This property is usually referred to as translational invariance, i.e., what matters is whether the feature is present rather than where it is present in the input. The result of applying the kernel at every position is a 2D array of values that represents the filtered input. This 2D array is called a "feature map," i.e., the output of a convolution operation in a convolutional layer is a feature map. The feature map is then passed through a non-linear activation function such as ReLU, Tanh, or Sigmoid to indicate whether the required feature is present at a given location of the input image. A single feature map is not sufficient to identify the content of an image, so several kernels are applied to the input image, each producing a separate feature map. The network can be built by stacking many convolutional layers in sequence, generating more feature maps. The feature maps produced by deeper layers capture increasingly abstract features for recognizing the objects in the image. The convolution layer has four important hyperparameters:
Kernel size: Commonly used kernel sizes are 3×3, 5×5, and 7×7, depending on the input image and application; 1×1 kernels are used for specific applications. The kernel dimensionality depends on the type of input image. If the input is a black-and-white (grayscale) image, 2D kernels are sufficient. If the input is an RGB image, the kernels are 3D, because the depth of a kernel at a given layer equals the depth of its input.
Kernel count: The number of kernels is a design choice, usually a power of two in the range 32 to 1024. As the number of filters increases, the set of extracted features also grows. On the other hand, more kernels mean more parameters, which can lead to overfitting. In general, the number of filters is small in the early layers and increases as the network goes deeper.
Stride: The stride is the step by which the kernel moves across the input image. The default value of 1, which moves the kernel one cell at a time to the right and downwards, is generally used.
Padding: Each convolution operation reduces the size of the image, which limits the number of times convolution can be applied to the input image. To avoid this, padding is applied so that the result of each convolution operation has the same size as its input.
In this work, kernel sizes of 3×3, 5×5, and 9×9, a kernel count of 256, a stride of 1 in the convolutional layers, a stride of 2 in the primary capsule layer, and "same" padding are used.
As the number of convolutional layers increases, the number of parameters or weights used also increases. Parameter sharing is used to reduce the number of parameters in the convolution layer. The idea is that all neurons in a given feature map share the same weights. Weight sharing decreases the number of parameters to be maintained while preserving expressive power, learning efficiency, and good generalization.
Figure 2 shows the design of the convolution layer used in the proposed multi-level CapsNet architecture. The preprocessed input image of size 512×512 is the input to the convolution layer. The convolution layer consists of four internal convolution layers called Conv-1, Conv-2, Conv-3, and Conv-4. The first two layers, Conv-1 and Conv-2, each use 256 kernels of size 3×3 with a stride of 1. Conv-3 uses 256 kernels of size 5×5, and Conv-4 uses 256 kernels of size 9×9 for convolving the features.

Convolution layer of the multi-level CapsNet architecture.
For the first layer (Conv-1), the input is 512×512, and a kernel of size 3×3 with a stride of 1 is applied. The resulting feature map has side length (512-3)/1+1 = 510, i.e., it is of size [510×510, 256], and is the input to the next layer (Conv-2). The kernel in Conv-2 is also 3×3. Hence, the feature map of Conv-2 has side length (510-3)/1+1 = 508, i.e., size [508×508, 256], and becomes the input to the next internal layer (Conv-3). With the 5×5 kernel of Conv-3, the resulting feature map has side length (508-5)/1+1 = 504, i.e., size [504×504, 256], and becomes the input to the next layer (Conv-4). Conv-4 uses 256 kernels of size 9×9. Its result, with side length (504-9)/1+1 = 496, i.e., size [496×496, 256], is the final output of the convolution layer and is the input to the next layer of the multi-level CapsNet architecture, the primary capsule layer. The number of parameters to be trained in each internal convolutional layer is shown in Table 4. A total of 32,768 parameters need to be trained in the convolutional layer.
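The size bookkeeping above can be checked with a short script. The sketch below chains the four layers with the stated kernel sizes and stride. The parameter count uses an assumed convention, k×k weights plus one bias per kernel with the input depth ignored; this convention is our inference because it reproduces the stated total of 32,768, and it is not spelled out in the text.

```python
def conv_output_size(n, k, stride=1):
    """Side length of the feature map for an n x n input and a k x k kernel, no padding."""
    return (n - k) // stride + 1

NUM_KERNELS = 256
LAYERS = [("Conv-1", 3), ("Conv-2", 3), ("Conv-3", 5), ("Conv-4", 9)]

size = 512                      # input image is 512 x 512
sizes, params = [], []
for name, k in LAYERS:
    size = conv_output_size(size, k)
    # Assumed counting convention: k*k weights + 1 bias per kernel.
    layer_params = (k * k + 1) * NUM_KERNELS
    sizes.append(size)
    params.append(layer_params)
    print(f"{name}: output {size}x{size}x{NUM_KERNELS}, parameters {layer_params}")

total_params = sum(params)
print("total parameters:", total_params)
```

Under a convention that also multiplies by the input depth of each layer, the totals would be much larger, so the figure should be read together with Table 4.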
Parameters to be trained in convolutional layer
The layer following the convolution layer is the primary capsule layer. It consists of three distinct steps: convolution, reshaping, and squashing. The input to the primary capsule layer comes from the convolution layer. The convolution step produces an array of feature maps; for example, suppose the output is an array of 32 feature maps. The reshaping function is then applied to these feature maps. For instance, the output is reshaped into two vectors of 16 dimensions each (32 = 2 × 16) for every location in the image. The last step, squashing, guarantees that the length of each vector is at most one, because the length of a vector represents the probability that the object is present at the given location of the image and must therefore lie between 0 and 1. The squash function is used in the primary capsule layer for this purpose. It ensures that the length of the vector is between 0 and 1 without altering the position information.
The multi-level CapsNet architecture uses 16D primary capsules, each produced from a small spatial region of the input image. The primary capsule layer is built from multi-level hierarchical capsules that capture features at various scales of the image in order to model the features present in the input image. As shown in Fig. 4, sixteen 1st-level primary capsules are formed with the baseline architecture, i.e., from the features produced by the convolutional layer. These capsules are then used to produce another set of sixteen 2nd-level primary capsules. The two Primary_Caps levels feed two separate Class_Caps layers. A further Class_Caps is generated by concatenating the 1st- and 2nd-level Class_Caps in the class capsule layer.
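The reshape and squash steps can be illustrated as follows. This is a sketch under the assumption that the squashing non-linearity takes the form commonly used with capsule networks, v = (‖s‖² / (1 + ‖s‖²)) · s / ‖s‖; the text does not give the formula explicitly. The 4×4 grid of locations and the random feature values are hypothetical.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Scale each vector along `axis` to a length in [0, 1), keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / np.sqrt(sq_norm + eps)

# Hypothetical convolution output: 32 feature maps on a 4x4 grid of locations.
feature_maps = np.random.default_rng(0).normal(size=(4, 4, 32))

# Reshape: 32 values per location become 2 capsule vectors of 16 dimensions.
capsules = feature_maps.reshape(4, 4, 2, 16)

squashed = squash(capsules)
lengths = np.linalg.norm(squashed, axis=-1)

# Cosine similarity between each vector before and after squashing (1 = same direction).
cosine = np.sum(capsules * squashed, axis=-1) / (
    np.linalg.norm(capsules, axis=-1) * lengths
)
print(lengths.max(), cosine.min())
```

Long vectors are mapped to lengths just below 1 and short vectors to lengths near 0, while the cosine similarity stays at 1, so only the length, not the orientation, is changed.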

Class Capsule Layer of the Reformed CapsNet Architecture.
Figure 3 depicts the primary capsule layer of the proposed Multi-level CapsNet architecture. The feature map of size [496×496, 256] from the convolution layer is taken as input to the primary capsule layer. A convolution with a 5×5 kernel and a stride of 2 is applied to this input, giving a feature map of size [(496-5)/2+1], i.e., [246×246, 256]. This is followed by 16 primary capsules, whose task is to take the main features identified by the convolution and generate different combinations of them. Each primary capsule behaves like a convolutional layer: it applies 5×5 kernels (with stride 2) to the 246×246 input volume and produces a 121×121 output volume. The final output volume of the level 1 primary capsules is 121×121×16, because 16 such capsules are used in level 1 of the proposed architecture. The output of the level 1 primary capsules is fed to the level 2 primary capsules. In level 2, as in level 1, a convolution with a 5×5 kernel and a stride of 2 is applied first. The result is of size [(121-5)/2+1], i.e., [59×59, 256]. It is then passed to 16 capsules, each of 16 dimensions. As in level 1, each of these capsules works like a convolution: it applies 5×5 kernels (stride 2) to the 59×59 input volume and produces a 28×28 output volume. The final output volume of the level 2 primary capsules is 28×28×16, i.e., each consists of 16 dimensions.
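Because the stride is 2, the divisions in these size calculations are not always exact. The quoted sizes are reproduced if the quotient is rounded down before adding one, which is an assumption here, although it matches the usual behaviour of an unpadded strided convolution:

```latex
\begin{align*}
\text{Level 1 convolution:} \quad & \left\lfloor \frac{496 - 5}{2} \right\rfloor + 1 = 245 + 1 = 246 \\
\text{Level 1 capsules:}    \quad & \left\lfloor \frac{246 - 5}{2} \right\rfloor + 1 = 120 + 1 = 121 \\
\text{Level 2 convolution:} \quad & \left\lfloor \frac{121 - 5}{2} \right\rfloor + 1 = 58 + 1 = 59 \\
\text{Level 2 capsules:}    \quad & \left\lfloor \frac{59 - 5}{2} \right\rfloor + 1 = 27 + 1 = 28
\end{align*}
```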

Primary capsule layer of the modified CapsNet architecture.
In CapsNet, the class capsule layer, with dynamic routing-by-agreement [52], replaces the max-pooling layer of a CNN. The class capsule layer of the proposed architecture is shown in Figure 4. It is designed in two levels. For level 1, the output of the primary capsule layer is the input, and for level 2, the result of level 1 is the input. Squashing and routing by agreement are used in both the level 1 and level 2 class caps.
The reshaped results of primary capsule layer levels 1 and 2 are the inputs to the class capsule layer. The first input, the output of the level 1 primary capsules, is of size 1461×16D. The second input, the output of the level 2 primary capsules, is of size 784×16D. In the class capsule layer, these inputs are concatenated to form a third input of size 2245×16D. Class caps are implemented separately for these three inputs, forming three class caps that together make up the level 1 class caps. The level 1 class caps are then passed to another level of class caps, called the level 2 class caps. Dynamic routing is also implemented between the level 1 and level 2 class caps. In the experiments, the softmax was taken over the concatenated class caps.
The output of the $i$-th capsule of the previous layer ($l$), denoted $v_i$, is the input to the capsules of the subsequent layer ($l+1$). The $j$-th capsule of that layer takes the input $v_i$ and multiplies it by the corresponding weight ($C_{ij}$) between the $i$-th capsule of layer $l$ and the $j$-th capsule of layer $l+1$. The result is denoted $v_{j|i}$, which is the contribution of the $i$-th capsule of layer $l$ to the $j$-th capsule of layer $l+1$.
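How the contributions from the lower-level capsules are combined by routing-by-agreement can be sketched as below. This follows the routing procedure of the cited work [52] as we understand it and is not taken from the authors' code: the array `contrib` stands for the contributions of each lower capsule to each higher capsule, the coupling coefficients computed inside the loop are a routing detail not described in the text, and the sizes (6 input capsules, 3 output capsules, 8 dimensions) and the three iterations are hypothetical.

```python
import numpy as np

def squash(s, eps=1e-8):
    """Shrink each vector (last axis) to a length in [0, 1), keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x, axis):
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def route_by_agreement(contrib, num_iters=3):
    """contrib[i, j] is the contribution of lower capsule i to higher capsule j."""
    num_in, num_out, _ = contrib.shape
    logits = np.zeros((num_in, num_out))              # routing logits, start uniform
    for _ in range(num_iters):
        coupling = softmax(logits, axis=1)            # each lower capsule splits its vote
        s = np.einsum("ij,ijd->jd", coupling, contrib)  # weighted sum per higher capsule
        v = squash(s)                                 # output vectors of the higher capsules
        # Agreement: dot product between each contribution and the current output.
        logits = logits + np.einsum("ijd,jd->ij", contrib, v)
    return v, coupling

rng = np.random.default_rng(0)
contrib = rng.normal(size=(6, 3, 8))   # hypothetical sizes
v_out, coupling = route_by_agreement(contrib)
print(v_out.shape, coupling.sum(axis=1))
```

Lower capsules whose contributions agree with the emerging output of a higher capsule receive a larger coupling coefficient for that capsule in the next iteration.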
The class capsule layer requires a larger number of trainable parameters. The number of trainable parameters $C_{ij}$ is computed as the number of vectors received from the primary capsule layer multiplied by the number of vectors required as output. In level 1 there are three inputs: two come from the primary capsule layer, and the third is the concatenation of the results of the two primary capsule levels.
Hence for level 1:
Hence the total trainable parameters in both level 1 and level 2 are 3,251,180 (3,247,616 + 3,584). Parameter training is also required for the conversion of scalar values into vectors that happens between the capsules, i.e., $W_{ij}$. The trainable parameters here are computed as follows: $W_{ij}$ between 16D and 8D capsules + $W_{ij}$ between 8D and 4D capsules
The potential of the proposed model is demonstrated on a dataset of leaf images and compared with the predictions of a support vector machine and a CNN architecture. The proposed algorithm is implemented in Python because of the wide availability of libraries and frameworks for deep learning. Keras and TensorFlow are used as the backend to build the deep learning architectures. Experiments were run on a Lambda full-stack machine with an Intel Core i9-9820X and 64 GB of memory.
Dataset
To validate the proposed multi-level capsule network based model, a dataset was created by collecting mango leaf images from a real environment. The leaf images were collected from different trees and cover two diseases, anthracnose and powdery mildew. A total of 900 images were collected for experimentation: 300 images for each disease and 300 images of healthy mango leaves. The images are labeled with their respective classes. The details of the leaf images in the collected dataset are given in Table 5. Sample leaf images of the three types are shown in Figure 5.
Distribution of leaf images in the dataset

Sample images in the Collected Dataset. (a) Healthy leaf (b) Anthracnose infected leaf (c) Powdery Mildew infected leaf.
For experimentation, we used 5-fold cross-validation. The dataset is randomly divided into five parts, and the experiment is run five times. Each time, one part is used as the test set and the remaining four parts as the training set. The results of the five runs are accumulated for comparison.
When constructing a classification model, it is important to estimate how accurately it predicts the correct result. However, this estimate alone is not sufficient, as it can be misleading in some cases. In such situations, additional measures are needed to evaluate the constructed model more meaningfully.
The performance measures that can be evaluated from the confusion matrix are accuracy, precision, specificity, recall (or sensitivity), and F1-score. The measures are evaluated separately for every class, and the average over all classes is taken as the final value of each measure. Accuracy is an essential metric for classification models. It is easy to understand and simple to apply to both binary and multiclass classification problems. Accuracy is the proportion of correct results in the total number of records tested. It is adequate for assessing a classification model only when the model is built from a balanced dataset; accuracy can be misleading if the dataset is skewed or imbalanced.
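The splitting procedure can be sketched as follows for the 900 images. This is a minimal illustration using only the standard library; the function name `k_fold_indices` and the fixed seed are assumptions, and the text does not state whether the folds were stratified by class, so a plain random split is shown.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # random division of the dataset
    folds = [indices[i::k] for i in range(k)]     # k disjoint parts of (almost) equal size
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(900, k=5))
for fold_no, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold_no}: {len(train_idx)} training images, {len(test_idx)} test images")
```

Every image appears in the test set exactly once across the five runs, so the accumulated predictions cover the whole dataset.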
Precision indicates the proportion of true positives among the predicted positives. Another essential measure is recall, which is more informative when capturing all possible positives is important. Recall indicates the fraction of all positive samples that are correctly predicted as positive, and equals 1 if every positive sample is predicted as positive. When a balance between precision and recall is required, the two measures can be combined into a single measure called the F1-score. The F1-score is the harmonic mean of precision and recall and lies between 0 and 1. The formulas to evaluate all these measures are shown in Equations 7.
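For reference, the standard one-vs-rest definitions of these measures, assumed here to be the ones this section uses, can be written as follows. For a given class, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives.

```latex
\begin{align}
\text{Accuracy}    &= \frac{TP + TN}{TP + TN + FP + FN} \\
\text{Precision}   &= \frac{TP}{TP + FP} \\
\text{Recall}      &= \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP} \\
\text{F1-score}    &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\end{align}
```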
Three classifiers are compared in the experiments: SVM, a non-neural-network method; the CNN-based model of [56]; and the proposed multi-level capsule network based model.
Experiment-1 uses SVM, experiment-2 uses the Multilayer Convolutional Neural Network (MCNN) from the literature [56], and experiment-3 uses the proposed multi-level capsule network architecture as the classification algorithm. The resulting confusion matrices are shown in Tables 6, 8 and 10, respectively. In all three experiments, all 300 healthy images are classified as healthy. Of the 300 anthracnose diseased leaves, 213, 251 and 272 are correctly identified by SVM, MCNN and the proposed multi-level capsule network architecture, respectively. SVM classifies 8 anthracnose diseased leaves as healthy and 79 as powdery mildew diseased; MCNN classifies 6 as healthy and 43 as powdery mildew diseased; and the proposed architecture classifies 4 as healthy and 24 as powdery mildew diseased. Of the 300 powdery mildew diseased leaves, 219, 276 and 286 are correctly identified by SVM, MCNN and the proposed architecture, respectively. SVM classifies 5 powdery mildew diseased leaves as healthy and 76 as anthracnose diseased; MCNN classifies 4 as healthy and 20 as anthracnose diseased; and the proposed architecture classifies 3 as healthy and 11 as anthracnose diseased.
Confusion Matrix of Experiment-1: By applying Support Vector Machines
Confusion Matrix of Experiment-2: By applying MCNN
Confusion Matrix of Experiment-3: By applying the proposed multilevel capsule network architecture
Based on the confusion matrices in Tables 6, 8 and 10, the true positives, true negatives, false positives, and false negatives are estimated for each class separately. The estimated values are shown in Tables 7, 9 and 11, respectively. From these values, the performance measures accuracy, precision, recall, and F1-score are evaluated for each class separately. The performance measures of all three classification algorithms are shown in Tables 12 and 13. The SVM algorithm has an accuracy of 98.56%, 81.89%, and 82.22% for the healthy, anthracnose, and powdery mildew classes, respectively. The MCNN based classification accuracy is 98.89%, 92.33%, and 92.56% for the same three classes, and the proposed multi-level architecture achieves 99.22%, 95.67%, and 95.78%, respectively.
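The per-class estimation described above can be sketched as follows. The matrix holds the SVM counts stated in the text, with rows as actual classes and columns as predicted classes. The row and column ordering and the helper name `one_vs_rest_counts` are assumptions made for this illustration.

```python
CLASSES = ["healthy", "anthracnose", "powdery mildew"]

# Rows are actual classes, columns are predicted classes (order as in CLASSES).
# Counts are the SVM results described in the text.
svm_matrix = [
    [300, 0, 0],
    [8, 213, 79],
    [5, 76, 219],
]

def one_vs_rest_counts(matrix, k):
    """Return (TP, TN, FP, FN) for class k, treating all other classes as negative."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                  # class k samples predicted as another class
    fp = sum(row[k] for row in matrix) - tp   # other samples predicted as class k
    tn = total - tp - fn - fp
    return tp, tn, fp, fn

per_class_accuracy = {}
for k, name in enumerate(CLASSES):
    tp, tn, fp, fn = one_vs_rest_counts(svm_matrix, k)
    per_class_accuracy[name] = (tp + tn) / (tp + tn + fp + fn)
    print(f"{name}: TP={tp} TN={tn} FP={fp} FN={fn} "
          f"accuracy={per_class_accuracy[name]:.2%}")
```

For the SVM matrix this yields per-class accuracies of 98.56%, 81.89%, and 82.22%, the values reported for SVM. The same function applies unchanged to the MCNN and multi-level capsule network matrices.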
Observations based on the confusion matrix of Table 6
Observations based on the confusion matrix of Table 8
Observations based on the confusion matrix of Table 10
Accuracy and Precision Results of the Three Experiments
Recall and F1-score Results of the Three Experiments
All the performance measures of the three experiments are shown in Tables 12 and 13. Table 12 shows the accuracy and precision for each class, and Table 13 shows the recall and F1-score for each class. The measures are compared in Figures 6 to 9, and Figure 10 consolidates all of them. The average over the three classes is taken as the final value of each measure for an algorithm. SVM, MCNN, and the proposed algorithm obtained an accuracy of 87.6%, 94.5%, and 96.8%, a precision of 81%, 92%, and 95.3%, a recall of 81.33%, 91.89%, and 95.33%, and an F1-score of 81.1%, 91.8%, and 95.3%, respectively.
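The class-averaged figures can be recomputed from the three confusion matrices described in the text. The sketch below assumes an unweighted mean over the three classes (macro-averaging) and the standard one-vs-rest definitions of each measure. The names `MATRICES` and `macro_scores` are introduced only for this example.

```python
# Rows are actual classes, columns are predicted classes, in the order
# healthy, anthracnose, powdery mildew. Counts are those stated in the text.
MATRICES = {
    "SVM": [[300, 0, 0], [8, 213, 79], [5, 76, 219]],
    "MCNN": [[300, 0, 0], [6, 251, 43], [4, 20, 276]],
    "Multi-level CapsNet": [[300, 0, 0], [4, 272, 24], [3, 11, 286]],
}

def macro_scores(matrix):
    """Average per-class accuracy, precision, recall and F1 over all classes."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    sums = {"accuracy": 0.0, "precision": 0.0, "recall": 0.0, "f1": 0.0}
    for k in range(n):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp
        fp = sum(row[k] for row in matrix) - tp
        tn = total - tp - fn - fp
        # No zero-division guard: every class here has TP > 0.
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        sums["accuracy"] += (tp + tn) / total
        sums["precision"] += precision
        sums["recall"] += recall
        sums["f1"] += 2 * precision * recall / (precision + recall)
    return {name: value / n for name, value in sums.items()}

results = {model: macro_scores(m) for model, m in MATRICES.items()}
for model, scores in results.items():
    summary = ", ".join(f"{name}={value:.2%}" for name, value in scores.items())
    print(f"{model}: {summary}")
```

Under these assumptions the averaged recall comes out at 81.33%, 91.89%, and 95.33%, as reported, and the other averages agree with the reported figures to within about 0.1 percentage points.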

Comparison of Accuracy Measure for the Three Experiments.

Comparison of Precision Measure for the Three Experiments.

Comparison of Recall Measure for the Three Experiments.

Comparison of F1-Score Measure for the Three Experiments.

Comparison of all Performance Measures for the Three Experiments.
The quality and quantity of the yield in the agricultural domain depend on how healthy the plants are. Controlling fungal diseases in mango trees can enhance the yield of the mango crop. Deep learning architectures have produced strong results in detecting and classifying plant diseases from leaf images. Hence, a model named Multi-level CapsNet, based on the CapsNet architecture from deep learning, was proposed for classifying the fungal diseases anthracnose and powdery mildew in mango trees. The proposed architecture achieved an accuracy of 98.5% and outperformed the CNN architecture. In the future, this work can be extended to other diseases of mango trees, and the same architecture can be applied to disease detection in other plants. In summary, the article presents a deep learning-based automated disease detection architecture that uses mango leaf images to improve the quality and quantity of yield from mango trees.
