Abstract
Deep learning is a field of Artificial Intelligence that has recently drawn a lot of attention, driven by the desire to build fast, automatic and accurate systems for image identification and classification. It is a fundamental part of modern computer vision solutions. However, as architectures become deeper and more powerful, new training challenges emerge, including the computational cost of training deep and large networks. This work focuses on pruning and evaluating a state-of-the-art deep convolutional neural network for image-based plant disease and plant species classification. Filter pruning reduces the number of parameters by removing unimportant filters and their feature maps. The performance of the pruned networks is evaluated across three datasets. A pruned DenseNet with the Self-Normalizing Neural Network (SNN) approach is observed to learn 2x faster than the initial DenseNet architecture. Additionally, filter pruning reduces the number of parameters and FLOPs by approximately 14% and 25% respectively. The aim is to create a fast and efficient model for identifying plant diseases, since fast methods are needed to identify diseases early, before damage occurs. The proposed method achieves satisfactory accuracy on held-out data from the PlantVillage, LeafSnap and Swedish-leaf datasets. Our best pruned model gives an accuracy of 99.24%, 86.64%, and 97.5% on the PlantVillage, LeafSnap, and Swedish-leaf datasets respectively.
Introduction
Deep learning has become one of the most powerful tools for machine learning, where underlying patterns can be learned automatically from large amounts of data. It has been used extensively in many fields, including image classification for various applications in agriculture. Plant diseases are important factors to consider, as they result in a severe reduction in the quality and quantity of agricultural products. Therefore, early detection and diagnosis of these diseases are important. Among machine learning approaches, deep learning has emerged as one of the most effective techniques for image-based plant disease detection [1–7]. However, the training of deep models is difficult due to: (1) expensive computational resources because of the network size, which makes deep learning impractical on low-memory and low-energy devices; (2) overfitting; (3) a large number of redundant parameters, which together with the multiplication operations make it impractical for most deep learning models to execute directly on the target hardware; and (4) exploding/vanishing gradients and degradation.
A number of approaches have been proposed in the literature to deal with some of these issues, and more research is now focusing on improving the efficiency of deep learning models. Pruning and compression strategies are among such methods. Pruning a deep learning network improves memory and energy utilization, which allows faster training and can improve accuracy. In the past few years, there has been tremendous progress in this area, and several state-of-the-art pruning methods have been proposed. The pruning methods can be categorized as channel pruning [8, 9], weight pruning [10], filter pruning [11–14], activation pruning [15] and layer-wise pruning [15, 16]. In this paper, we use a method that gradually prunes the filters with the lowest absolute sum of kernel weights, together with their feature maps. This offers an avenue for deep learning models to be easily deployed on limited-resource devices.
The state-of-the-art densely connected network (DenseNet) has demonstrated to be an accurate network compared to other deep learning network architectures for the task of plants disease detection [5]. However, it is still computationally expensive to train the DenseNet architecture. It requires a substantial amount of memory and time to train DenseNet. Therefore, this study aims to optimize the computational cost by reducing the number of floating-point operations (FLOPs) and the number of parameters.
We employ the filter pruning technique suggested by Li et al. [12] in order to remove unimportant or lowly ranked filters, thus making DenseNet faster and more parameter efficient. Additionally, the Self-Normalizing Neural Network (SNN) technique suggested by Klambauer et al. [17] is applied to different network configurations.
Our experimental results demonstrate the effectiveness of the proposed approach on three datasets viz. PlantVillage, LeafSnap and Swedish-leaf. The pruned model is able to reduce the number of parameters by approximately 10% and FLOPs by approximately 22% of the original DenseNet. Additionally, our pruned model with SNN strategy has parameter reduction by 14% and more than 25% reduction in FLOPs compared to the original DenseNet model with ReLU activation [5].
Based on the analysis of several DenseNet architectures with different numbers of layers and configurations, pruned DenseNet with SNN was found to be a parameter-efficient and faster model. Therefore, the pruned model can be more suitable for efficiently classifying plant diseases and plant species using images of leaves.
Experimental results reveal our approach to be consistently advantageous for the various tasks of leaf image classification, as summarized below:
- By removing lowly ranked filters and their feature maps, we inherently curtail the proliferation of parameters in our resulting DenseNet architecture.
- Using SNN (SELU, Alphadropout, and initialization) over ReLU and batch normalization proves to be computationally inexpensive. Experiments show our approach to be consistent on the task of plant leaf image classification.
- Pruned DenseNet with the SNN strategy is more efficient than unpruned and pruned DenseNet with ReLU and batch normalization, where efficiency is measured in terms of FLOPs and the number of parameters.
- A pruned DenseNet with SNN trains faster than the initial DenseNet and the pruned DenseNet with ReLU and batch normalization. Further, the pruned DenseNet with SNN has reduced inference time.
- A pruned DenseNet with SNN configurations heightens the efficiency gap of deep networks with many layers.
Pruned DenseNet with SNN is more parameter efficient than the original DenseNet. Additionally, it has reduced FLOPs, and is therefore more computationally efficient than the original DenseNet with ReLU and batch normalization. However, pruning causes a slight reduction in the accuracy of the models, although this can be mitigated by training the network longer or introducing noise into the layers.
The rest of the paper is structured as follows: Section 2 presents the background and related work; Section 3 describes the datasets used for the study, training, hardware and software, performance metrics, the deep learning strategies used and the proposed methodology; Section 4 presents the experimental setup as well as the results; Section 5 gives the discussion; and finally, Section 6 concludes.
Background
This section reviews deep learning models applied to the detection of plant diseases, as well as pruning strategies applied in deep learning.
Related work
Deep learning has continuously had breakthroughs and has infiltrated into a number of diverse industries. Deep learning is a kind of Artificial Neural Network (ANN) technique that takes in data as an input and transforms this data through a number of layers to compute the output classification [18]. Deep learning has gained a lot of popularity in the area of machine learning with efforts to learn high-level abstractions in data by utilizing hierarchical architectures [19–21]. Its popularity is attributed to various factors among them, the introduction of a powerful computing system with Graphics Processing Units (GPU), increased memory and hard-disk capacity [20]. Moreover, massive data and big data that is freely and publicly available have contributed to the popularity of deep learning. Deep learning has a wide range of applications and has been successfully used in face recognition [22], behavior recognition [23], object classification [24], hand-written digit classification [20] and computer vision [25]. The performance of computer vision tasks greatly benefited from deep learning by replacing hand-crafted features with features automatically extracted from deep neural networks. Traditional machine learning approaches have been extensively adopted in the agricultural field. More so, deep learning methods have recently gained popularity in the agricultural sector especially for the tasks of plants disease detection using images. Additionally, it has been used for leaf classifications using images [26].
Too et al. [5] focused on fine-tuning and evaluating state-of-the-art deep Convolutional Neural Networks (CNN) for image-based plant disease classification. They carried out an empirical analysis of four CNN models with different numbers of layers: VGG, Inception V4, ResNet (Residual Network) and DenseNet. The data used for the experiment comprised 38 different classes, including diseased and healthy images of leaves of 14 plants from PlantVillage. According to their results, the best model was DenseNet, with a testing accuracy of 99.75%. In that work, the DenseNet model showed no signs of overfitting or performance deterioration. Additionally, DenseNet requires substantially fewer parameters.
Similarly, Ferentinos K. P [1] in their paper, evaluated five CNN models for plant disease detection and diagnosis using leaves images of healthy and diseased plants. The data used for the experiment is 58 different classes of images of leaves of 25 plants from PlantVillage. The models evaluated include AlexNet, AlexNetOWTBn, GoogleNet, Overfeat, and VGG. According to their work VGG was their best model with a 99.53% success rate.
Equally, Sladojevic et al. [7] adopted deep CNN to the development of plant disease recognition model using leaf images. Their model was able to recognize 14 different types of plant disease from healthy leaves. Additionally, it was able to distinguish plants from their surroundings. They achieved an average of 96.3% accuracy on their experimental analysis.
Additionally, Mohanty et al. [27] in their work applied deep learning method for detection of plants diseases using images. They evaluated two architectures AlexNet and GoogleNet. Their best model achieved an accuracy of 99.35%. Similarly, Wang et al. [28] trained a deep model for disease severity classification using the apple black rot images from PlantVillage Dataset. Their VGG 16 model achieved an accuracy of 90.4% on the test set.
Deep learning has also been applied to detect and identify plants disease for a single plant. Ramcharan et al. [4] proposed a deep CNN to identify three cassava diseases and two types of pest damage (or lack thereof). Their best model achieved an overall accuracy of 93% for data not used in the training process. They demonstrated that the transfer learning technique for image understanding of field images provides a fast, affordable, and easily deployable strategy for digital plant disease detection. Similarly, Liu Bin et al. [3] proposed CNN for apple leaf diseases identification. They evaluated their model using AlexNet architecture with a dataset of 13,689 images of diseased apple leaves, with the aim of identifying four common apple leaf diseases using the test set. Their empirical results demonstrate that their CNN based model can be used for apple disease recognition.
Luo et al. [13] proposed ThiNet, a model that can accelerate and compress CNN models in both the training and inference stages. Their focus was on pruning a whole filter if it is less important, and their method did not change the original structure of the CNN network. With only a 0.52% drop in top-5 accuracy, ThiNet is reported to achieve a 3.31× FLOPs reduction and 16.63× compression on the VGG-16 model. Similar experiments with ResNet-50 reveal that even for a compact network, ThiNet can reduce more than half of the parameters and FLOPs, at the cost of a roughly 1% drop in top-5 accuracy. The VGG-16 model can be further pruned into a very small model of only 5.05 MB, which preserves AlexNet-level accuracy while showing stronger generalization ability.
Li Hao et al. [12] proposed a filter pruning technique that accelerates CNN models by pruning filters from convolutional layers that are identified as having a small effect on the output accuracy. By removing whole filters in the network, together with their connecting feature maps, the computational costs are reduced significantly. The study shows that the filter pruning method can reduce the inference cost of VGG-16 and ResNet-110 by up to 34% and 38% respectively, while at the same time preserving the accuracy. They prune filters across multiple layers and retrain only once for simplicity. Their findings indicate that pruning filters can achieve about a 30% reduction in FLOPs for VGG-16 (on CIFAR-10) and ResNet without significant loss in the original accuracy.
Ayinde et al. [11] present pruning for deep and/or wide CNN models by eliminating redundant features (or filters). Their model was able to prune redundant features along with their connecting feature maps according to their differentiation and based on their relative cosine distances in the feature space, thus yielding significantly smaller network size with reduced inference costs and competitive performance. Their experimental results demonstrate that the costs of inference can be substantially reduced by 40%, 27% and 39% for VGG-16, ResNet-56, and ResNet-110 respectively.
Materials and methods
The proposed methodology is illustrated in Fig. 1. The DenseNet architecture is created as described in section 3.1 and then loaded with weights obtained from a model pre-trained on the ImageNet dataset.

Proposed methodology.
The pruning strategy described in section 3.2 is then applied to the DenseNet network. Once the network is pruned, it is trained with stochastic gradient descent (SGD) using a learning rate of 1e-3, a weight decay of 1e-6 and a momentum of 0.9 with Nesterov accelerated gradient enabled. The training is done using the training dataset as described in section 3.4 and section 3.5. The SNN technique is applied in place of the ReLU and batch normalization used in the original DenseNet. Once the model is trained, it performs inference (prediction) on the test data. The model is evaluated based on categorical cross-entropy loss, top-1 accuracy, top-5 accuracy, training time (per epoch) and inference time.
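The optimizer settings above can be made concrete with a single update step. The sketch below is a plain-Python illustration, not the authors' training code: it assumes the common Nesterov formulation (velocity update followed by a look-ahead step) and assumes that the weight decay acts as an L2 term added to the gradient. The function name `nesterov_sgd_step` is hypothetical.

```python
def nesterov_sgd_step(w, grad, velocity, lr=1e-3, momentum=0.9, weight_decay=1e-6):
    """One Nesterov-momentum SGD update for a flat list of weights."""
    new_w, new_v = [], []
    for wi, gi, vi in zip(w, grad, velocity):
        gi = gi + weight_decay * wi        # assumption: weight decay as an L2 term
        vi = momentum * vi - lr * gi       # velocity update
        wi = wi + momentum * vi - lr * gi  # Nesterov look-ahead step
        new_w.append(wi)
        new_v.append(vi)
    return new_w, new_v


# One step on a single weight, starting from zero velocity.
weights, velocity = nesterov_sgd_step([1.0], [0.5], [0.0])
```

With a positive gradient the weight moves slightly below its starting value, and the stored velocity carries that direction into the next step.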
DenseNet is a variant of ResNet architecture [29] proposed by Huang et al. [30] that connects all layers directly with each other. In this architecture, the input of each layer comprises the feature maps of all previous layers, and it passes its output to each subsequent layer. They aggregate the feature maps with depth-concatenation and report the model to allow feature reuse which makes the network highly parameter-efficient. Equally, it is able to deal with the vanishing gradients problem.
The major difference between DenseNet and ResNet is that DenseNet concatenates feature maps learned by different layers. This increases variation in the input of subsequent layers thereby improving efficiency. On the other hand, ResNet aggregates its feature maps by summation.
The output of the (l+1)-th layer is denoted y_{l+1} and is obtained by applying f_{l+1}. In this case, f_{l+1} is the SELU activation function, applied to the current layer x_l in order to transform it into a non-linear value, where x_{l+1} refers to the concatenation of the feature maps produced in layers l, l − 1, l − 2, …, 0, and ⊗ symbolizes concatenation along the channel dimension.
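The displayed form of this relation is not legible in this copy. One reading that is consistent with the description, offered as an assumption rather than as the authors' exact equation, takes y_j to be the feature maps produced by layer j and the input of layer l+1 to be their concatenation:

```latex
\[
x_{l+1} = y_{l} \otimes y_{l-1} \otimes \cdots \otimes y_{0},
\qquad
y_{l+1} = f_{l+1}\left(x_{l+1}\right)
\]
```

Under this reading, every layer sees the outputs of all preceding layers, which is what distinguishes DenseNet from the summation used in ResNet.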
A typical DenseNet structure is made up of an initial convolutional layer that captures low-level features from images, followed by several dense blocks with a transition layer between consecutive blocks.
The Dense block contains several dense layers as depicted in Table 1 and Fig. 2. The dense layer consists of a 3×3 convolution layer. To further improve computation efficiency a bottleneck layer of 1×1 convolution is introduced before each 3×3 convolution so as to reduce the number of input feature maps. A typical transformation of the dense layer is illustrated in Fig. 3. The final dense block layer summarizes the information contained on all the preceding dense blocks. Following the dense layer is a transition layer which is used for dimensionality reduction of feature maps. It contains a 1×1 convolutional layer and average pooling layer of 2×2 with a stride of 2. Finally, the classification consists of 7×7 global average pooling layer with a fully connected layer and Softmax activation. Additionally, each of these layers consists of activation functions.
A typical architecture [5]
For the tasks of plant disease identification and plant species identification, DenseNet models with 121, 161 and 169 layers are used. Additionally, the network is loaded with weights from a model pre-trained on ImageNet. Another fully-connected model with a customized softmax on the top layer is created, and a new regularization method, Alphadropout [17], is used. In all the layers, the SELU [17] activation function is applied in place of the ReLU and batch normalization used in the original model. The proposed DenseNet architecture is shown in Fig. 2. Every dense block contains several dense layers. The dense layers are transformed as shown in Fig. 3, followed by a concatenation/merging with the previous dense layers. In this study, four dense blocks are used, each with a different number of dense layers. DenseNet-121 uses [6, 12, 24, 16], that is, 6 dense layers in the first dense block, 12 in the second, 24 in the third and 16 in the last dense block. DenseNet-161 and DenseNet-169 have [6, 12, 24, 36] and [6, 12, 32, 32] dense layers respectively. The output of each dense layer has k feature maps, which are stacked onto the previous feature maps by concatenation and used as the input of the next dense layer. Here k is the growth rate parameter, and the number of feature maps in DenseNet increases linearly with the depth (e.g. after L layers, the output will have L×k feature maps). We set k to 32, 48 and 32 for DenseNet-121, DenseNet-161 and DenseNet-169 respectively.
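The relation between the block configuration, the nominal depth and the growth rate can be checked with a few lines of arithmetic. This is an illustrative sketch: the helper names are hypothetical, the depth formula assumes the bottleneck design described above (two convolutions per dense layer, plus the initial convolution, three transition convolutions and the final fully connected layer), and the DenseNet-169 block sizes [6, 12, 32, 32] follow the standard definition of that network.

```python
def densenet_depth(block_config):
    # Each dense layer counts as two convolutions (1x1 bottleneck + 3x3);
    # the remaining 5 layers are the initial convolution, three transition
    # convolutions and the final fully connected layer.
    return 2 * sum(block_config) + 5


def channels_after_block(c_in, num_layers, k):
    # Every dense layer appends k feature maps to its input by concatenation.
    return c_in + num_layers * k


configs = {
    "DenseNet-121": ([6, 12, 24, 16], 32),
    "DenseNet-161": ([6, 12, 24, 36], 48),
    "DenseNet-169": ([6, 12, 32, 32], 32),
}
for name, (blocks, k) in configs.items():
    print(name, densenet_depth(blocks), "layers, growth rate", k)
```

The depth depends only on the total number of dense layers, so it is unaffected by the order in which the block sizes are listed.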

Proposed DNN architecture.

Transformation in the Dense block layer.
A filter pruning technique by Li et al. [12] that accelerates deep learning methods is applied. It prunes filters from convolutional layers that are identified as having a small effect on the output accuracy. By removing whole filters in the network, together with their connecting feature maps, the computational costs are reduced significantly.
The work advocates pruning filters together with their corresponding feature maps. The procedure for pruning m filters from the i-th convolutional layer is as follows. First, for each filter F_{i,j}, calculate the absolute sum of its kernel weights, S_j. Second, sort the filters by S_j. Third, prune the m filters with the smallest values, together with their corresponding feature maps. The kernels in the following layer that correspond to the pruned feature maps are removed as well. Finally, a new kernel matrix is created for both the i-th and the (i+1)-th convolutional layers, and the remaining kernels are copied into the new matrices. At each pruning iteration, all the filters are ranked and the m lowest-ranking filters among all the layers are pruned globally, along with their corresponding output feature maps, as depicted in Fig. 4. The resulting network is a small, compact network with fewer parameters. Retraining is then done to compensate for the performance degradation. This is repeated until the entire network is pruned. The pruning process is illustrated in Fig. 5.
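The per-layer steps can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the implementation used in the study: weights are held in nested lists (filter, input channel, flattened kernel), only a single layer is ranked rather than all layers globally, and the function names are hypothetical.

```python
def filter_l1_norms(layer):
    """layer[j][c] is the flattened kernel of filter j for input channel c."""
    return [sum(abs(w) for kernel in filt for w in kernel) for filt in layer]


def prune_smallest_filters(layer, next_layer, m):
    """Drop the m filters with the smallest absolute weight sum from `layer`
    and the matching input-channel kernels from `next_layer`."""
    norms = filter_l1_norms(layer)
    order = sorted(range(len(layer)), key=lambda j: norms[j])
    keep = sorted(order[m:])  # indices of the surviving filters
    new_layer = [layer[j] for j in keep]
    new_next = [[filt[j] for j in keep] for filt in next_layer]
    return new_layer, new_next, keep


# Toy layer: 3 filters, 1 input channel, 2x2 kernels (flattened).
layer = [[[0.5, -0.5, 0.2, 0.1]],    # moderate weights
         [[0.01, 0.0, -0.02, 0.0]],  # weakest filter
         [[1.0, -1.0, 1.0, -1.0]]]   # strongest filter
# Next layer: 2 filters, each with one 1x1 kernel per input channel.
next_layer = [[[1.0], [2.0], [3.0]],
              [[4.0], [5.0], [6.0]]]

pruned, pruned_next, kept = prune_smallest_filters(layer, next_layer, m=1)
print(kept)  # [0, 2]
```

Removing filter 1 also removes the second input-channel kernel from every filter of the next layer, which is where the additional savings in the following layer come from.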

The process of filter and feature maps pruning.

Flow-chart of network pruning procedure.
In this study, we prune the 3×3 convolution layer in each dense layer of every dense block. Once it is pruned it is then concatenated with the previous pruned dense layers. This is repeated for every dense block as shown in Fig. 4.
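To see why removing filters from a 3×3 convolution lowers the cost, it helps to count the operations of one layer. The sketch below counts multiply-accumulate operations, which is one common convention for reporting FLOPs and is an assumption here; the layer sizes are hypothetical and are not taken from the paper's tables.

```python
def conv_flops(h, w, k, c_in, c_out):
    """Multiply-accumulate count of a k x k convolution producing an
    h x w output map with c_out filters over c_in input channels."""
    return h * w * k * k * c_in * c_out


def conv_params(k, c_in, c_out):
    """Number of kernel weights (biases ignored)."""
    return k * k * c_in * c_out


# Hypothetical 3x3 convolution: 128 input maps, 32 filters, 56x56 output.
# Pruning 8 of the 32 filters removes a quarter of this layer's cost.
before = conv_flops(56, 56, 3, 128, 32)
after = conv_flops(56, 56, 3, 128, 32 - 8)
reduction = 100.0 * (before - after) / before
print(reduction)  # 25.0
```

The cost of a layer is linear in its number of filters, so the relative saving in that layer equals the fraction of filters removed. The network-wide reduction is smaller because the unpruned layers are unchanged.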
Klambauer et al. [17] proposed Self-Normalizing Neural Networks (SNN), a self-normalization technique that automatically converges towards zero mean and unit variance [12].
The original work proposed an activation function called SELU, which is a variation of the Exponential Linear Unit (ELU) [17]. The ELU is defined in Equation (2), with x being the input and α a fixed parameter:
\[
\mathrm{ELU}(x) =
\begin{cases}
x, & x > 0 \\
\alpha\,(e^{x} - 1), & x \le 0
\end{cases}
\tag{2}
\]
SELU (σ) is therefore just an ELU multiplied by λ, as shown in Equation (3):
\[
\sigma(x) = \lambda
\begin{cases}
x, & x > 0 \\
\alpha\,(e^{x} - 1), & x \le 0
\end{cases}
\tag{3}
\]
where λ and α are fixed parameters, set to ≈1.0507 and ≈1.6732 respectively. SELU is said to provide the properties that lead to SNN. It takes both positive and negative values, which controls the mean, and it has a saturation region in which its derivative approaches zero. Additionally, λ ensures that the slope is larger than one for positive inputs, which controls the variance.
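A minimal sketch of ELU and SELU as described above, using the rounded constants quoted in the text. The function names are illustrative; deep learning frameworks ship their own implementations.

```python
import math

LAMBDA = 1.0507  # approximate value quoted in the text
ALPHA = 1.6732   # approximate value quoted in the text


def elu(x, alpha=ALPHA):
    # Identity for positive inputs, smooth exponential saturation otherwise.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def selu(x, lam=LAMBDA, alpha=ALPHA):
    # SELU is the ELU scaled by lambda.
    return lam * elu(x, alpha)


for value in (-5.0, -1.0, 0.0, 1.0):
    print(value, selu(value))
```

For large negative inputs the output approaches −λα, which is the saturation value that Alphadropout uses in place of zero.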
Moreover, Klambauer et al. introduced a new regularization technique called Alphadropout, which randomly sets inputs to the negative saturation value of SELU. Further, the authors recommended an initialization strategy that samples weights from a Gaussian distribution with mean zero and variance 1/n, where n is the number of inputs to a unit (its incoming weights). Furthermore, for better performance, they recommended standardizing inputs by scaling them to zero mean and unit variance. SNN helps to prevent the problem of vanishing and exploding gradients and allows easier training of deep networks with many layers. The SNN strategy was applied in the DenseNet model as depicted in Fig. 2. This was achieved by replacing ReLU and batch normalization with SNN's activation function, SELU. Additionally, the initialization scheme and Alphadropout were used as suggested by the authors of SNN [17].
In this study, we employ the SNN technique as suggested by Klambauer. We employed the use of SELU activation function, Alphadropout and Initialization scheme in the pruned DenseNet architecture.
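The initialization scheme and input standardization that accompany SELU can be sketched as follows. This is an illustration that assumes n is the fan-in of a layer; the function names and the fixed seed are hypothetical choices made only so that the example is reproducible.

```python
import math
import random


def lecun_normal(fan_in, fan_out, seed=0):
    """Sample a fan_out x fan_in weight matrix from N(0, 1 / fan_in)."""
    rng = random.Random(seed)
    std = math.sqrt(1.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def standardize(values):
    """Scale a list of inputs to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = math.sqrt(var) or 1.0  # avoid division by zero for constant inputs
    return [(v - mean) / std for v in values]


weights = lecun_normal(fan_in=256, fan_out=64)
flat = [w for row in weights for w in row]
sample_var = sum(w * w for w in flat) / len(flat)
print(sample_var, 1.0 / 256)
```

The sample variance of the weights should be close to 1/256, which keeps the variance of the pre-activations near one when the inputs are standardized.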
PlantVillage is an openly and freely available dataset with 54,306 images, covering 26 diseases for 14 crop plants [31]. The images are originally colored images of varied sizes. These images are split into three sets: training, validation, and testing. The first split separated the data into a training set and a testing set with a ratio of 80% to 20%. The test set was a held-out dataset used for evaluation of the models and for prediction. The training data was then further split into training and validation datasets, again with a ratio of 80% to 20%. The training, validation and testing sets contained 34727, 8702 and 10876 samples respectively. A sample of the PlantVillage dataset is shown in Fig. 6.
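The two-stage split can be sketched with the standard library. This is a generic illustration with a hypothetical helper name and seed; it is not expected to reproduce the exact sample counts reported above, which depend on details such as rounding and how the split was stratified.

```python
import random


def split_indices(n, test_frac=0.2, val_frac=0.2, seed=0):
    """Hold out a test set first, then carve a validation set from the rest."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_frac)
    test, rest = idx[:n_test], idx[n_test:]
    n_val = round(len(rest) * val_frac)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


train, val, test = split_indices(54306)
print(len(train), len(val), len(test))
```

Shuffling once and slicing guarantees that the three sets are disjoint and together cover every image.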

Sample images of PlantVillage datasets.
The Swedish-leaf dataset consists of images with 15 different classes. Each class is composed of 75 images. Some species are indistinguishable to the untrained eye. 90% of the dataset was used for training while the rest (test set) for model evaluation. A sample of Swedish-leaf dataset is shown in Fig. 7.

Sample images of Swedish leaf dataset.
LeafSnap [32] is an image dataset that consists of images of leaves with 185 tree species. The images are taken from various sources. It is composed of 23,147 lab images and 7,719 field images. However, in this study, field images were used for training and evaluation of the network. A sample of LeafSnap dataset is shown in Fig. 8.

Sample images of LeafSnap dataset.
The images were processed by first resizing them to 224×224 pixels. The pixel values were divided by 255 so that they lie in the range [0, 1]. The data were normalized using the channel means and standard deviations, as recommended by Klambauer et al. [17], for all the datasets described in section 3-4. Furthermore, the target variable was one-hot encoded so that it could be used in the model.
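The scaling, normalization and label encoding steps can be sketched as below. Resizing is omitted, the order (scaling to [0, 1] before channel normalization) is an assumption, and the channel statistics used in the example are placeholder values rather than statistics from the datasets.

```python
def scale_and_normalize(pixels, mean, std):
    """pixels: list of (r, g, b) tuples with values in 0..255.
    mean/std: per-channel statistics computed on the [0, 1] scale."""
    out = []
    for px in pixels:
        scaled = [v / 255.0 for v in px]
        out.append([(s - m) / d for s, m, d in zip(scaled, mean, std)])
    return out


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


# Placeholder statistics, for illustration only.
normalized = scale_and_normalize([(255, 0, 127.5)],
                                 mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))
target = one_hot(2, num_classes=4)
```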
Training
The training was done for 50 epochs on DenseNet with a varied number of layers 121,161 and 169. Further, different batch sizes were used due to constraints of the models. DenseNet with 121 layers (k = 32) used a batch size of 48 while DenseNet with 169 layers (k = 32) used 32. However, due to the GPU memory constraints, wide DenseNet with 161 layers(k = 48) was trained with the batch size of 16.
Measurement of performance
To determine the learning behavior of the model, the top-1 accuracy, top-5 accuracy, cross-entropy loss, and speed were recorded for each epoch. This helps in determining whether some approaches are overfitting and which ones converge better. Further, for every experiment, the mean top-1 accuracy, mean top-5 accuracy and mean cross-entropy loss were computed on the test dataset (held-out dataset).
The experiments were run across a whole range of configurations with a different number of layers. First, the experiments were done before pruning (unpruned) and after pruning (pruned). We compared the unpruned and pruned behavior using two activation functions SELU and ReLU. Additionally, FLOPs and the number of parameters in the model were used to determine the best model for image-based plant disease identification and species classification.
Therefore, FLOPs, the number of parameters, Top-1 accuracy, Top-5 accuracy, and cross-entropy loss were used for comparison of results across all the different experimental configurations.
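The accuracy and loss metrics used for comparison can be sketched in a few lines. The function names and the toy predictions are illustrative; k = 2 stands in for top-5 because the toy example has only three classes.

```python
import math


def top_k_accuracy(probs, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for p, y in zip(probs, labels):
        top = sorted(range(len(p)), key=lambda c: p[c], reverse=True)[:k]
        hits += y in top
    return hits / len(labels)


def categorical_cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-probability assigned to the true class."""
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, labels)) / len(labels)


probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.4, 0.5, 0.1]]
labels = [0, 1, 0]
top1 = top_k_accuracy(probs, labels, k=1)
top2 = top_k_accuracy(probs, labels, k=2)
loss = categorical_cross_entropy(probs, labels)
```

In this toy case only the first sample is correct at top-1, while all three are correct at top-2, which illustrates why top-5 accuracy stays high even when top-1 accuracy drops.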
Hardware and software
The experiments were performed on single Graphics Processing Unit (GPU) mode with 16 Gb memory and Graphics of Tesla K40c. The operation was done on Ubuntu 16.04 (64 bits) operating system. Python version 3.5 was used together with Keras framework. Further, TensorFlow was used as Keras backend.
Experimental results
An empirical demonstration of the effectiveness of the pruned model was carried out. Summaries of FLOPs, number of parameters (Param), top-1 accuracy (Top-1), top-5 accuracy (Top-5), loss, inference time in seconds (IN.) and training time in seconds (TR) are reported. The experimental data are based on different network and architectural configurations, such as the depth of the network (depth), the growth rate (K) and the activation function (AF). Table 2 summarizes the performance on the PlantVillage dataset, while Tables 3 and 4 show the performance on the LeafSnap and Swedish-leaf datasets respectively.
Results with pruned and unpruned network using PlantVillage dataset
Results with pruned and unpruned network using Leafsnap dataset
From the results in Tables 2–4, it is evident that pruned networks have a substantial reduction in FLOPs and the number of parameters. The FLOPs are reduced by approximately 21%, 17% and 21% for DenseNet-121, DenseNet-161, and DenseNet-169 respectively across all the datasets. On the other hand, the number of parameters is also reduced by approximately 10% on all networks across all the datasets. The reduction in FLOPs and number of parameters is uniform across all the datasets. This, therefore, demonstrates the consistency of the pruned network.
Results with pruned and unpruned network using Swedish_Leaf dataset
DenseNet model with SELU activation functions is more parameter efficient. SELU is 3% more parameter efficient than ReLU. This is because ReLU activation function uses batch normalization which introduces more parameters into the network. Although ReLU is not a complex activation function compared to SELU, the use of batch normalization creates complexity. This makes the network expensive to train and parameter inefficient. SELU further has approximately 4% reduced number of FLOPs compared to ReLU activation function. This is evident from the results in Tables 2–4. In terms of computation time (inference speed and training speed) SELU networks are 2x faster than ReLU.
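The parameter overhead of batch normalization can be illustrated with a simple count. The sketch assumes the usual batch normalization layer, which stores a learned scale and shift plus a running mean and variance for every channel; the helper names and the channel count are hypothetical.

```python
def batchnorm_params(channels):
    # Scale (gamma) and shift (beta) are trainable; the moving mean and
    # variance are tracked statistics stored alongside them.
    return {"trainable": 2 * channels, "non_trainable": 2 * channels}


def selu_params(channels):
    # SELU uses the fixed constants lambda and alpha, so it stores nothing.
    return {"trainable": 0, "non_trainable": 0}


bn = batchnorm_params(128)
print(bn["trainable"] + bn["non_trainable"])  # 512
```

Since a DenseNet applies normalization before every convolution, this per-layer overhead accumulates across the network, and it disappears when SELU replaces ReLU with batch normalization.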
Performance analysis
We also measure the top-1 accuracy, top-5 accuracy, loss and inference speed on a held-out dataset for the pruned and unpruned networks. Additionally, the training speed per epoch is observed. The performance is recorded for the PlantVillage, LeafSnap and Swedish-leaf datasets in Tables 2–4 respectively. The inference time (IN.) and training time (TR) are substantially reduced by pruning. There is a negligible drop in accuracy on some datasets, while on others the accuracy is more or less the same. However, on some datasets the pruned network performs better than the unpruned network in terms of accuracy. This is evident from the results on pruned DenseNet with ReLU shown in Table 4. Top-5 accuracy remains more or less the same across all the experiments.
Analysis of performance on loss
The behavior of the DenseNet-121 with SELU activation function is depicted by Figs. 8–10. The results show the performance in terms of loss on the unpruned network and pruned network on PlantVillage, LeafSnap, and Swedish-leaf datasets. From the graphs, it is evident that pruning reduces overfitting.

DenseNet-121 showing model loss of Unpruned network(Left), and pruned (Right) on PlantVillage dataset.

DenseNet-121 showing model loss of Unpruned network(Left), and pruned (Right) on LeafSnap dataset.
The classification/prediction results for each image are sorted by the confidence level of each output class, with the highest confidence first. The output shows the predicted result and the actual leaf species, where the predicted value is the outcome of the model's prediction. Figures 11–14 show some classification cases of images randomly selected from the testing dataset using the pruned model. They were correctly classified, i.e., their corresponding classes were those returned with the highest certainty by the model.

DenseNet-121 showing model loss of Unpruned network(Left), and pruned (Right) on Swedish-leaf dataset.

An example of correct classifications of Tomato’s Early blight and Late blight and Grapes Black rot and Esca (Black Measles) diseases on the PlantVillage testing dataset.

An example of correct classifications of Peach Bacterial spot of the PlantVillage testing dataset and its visualization of the first convolution layer.

An example of correct classifications of Pinus koraiensis and Pinus taeda leaf species on the LeafSnap testing dataset.

An example of correct classifications of Fagus silvatica and Ulmus glabra leaf species based on the Swedish_leaf testing dataset.
Figure 11 shows the correct classification of plant diseases that are hard to distinguish with the naked eye. Our pruned model is able to classify some of the diseases with a confidence of approximately 1.0 (100%). This demonstrates the effectiveness of the pruned model in terms of classification. Figure 12 shows a correctly classified plant disease and the corresponding visualization of convolution layer 0.
Figures 13 and 14 show the correct classification of plant species that are equally hard to distinguish. This verifies that our pruned model is able to perform classification with a high level of certainty.
DenseNet has shown outstanding performance for image classification [5, 30]. This shows that deeper networks tend to be more accurate than shallow networks. However, deep networks come at a high cost in terms of computation and memory requirements, which puts them out of reach, especially for constrained devices. The networks are characterized by many redundant parameters, which take a longer time to compute and consume a large amount of memory.
Deep learning has gained popularity in agriculture for plant recognition and disease detection. However, most research is focused on the application and accuracy of the networks [1, 5]. We extend the study by Too et al. [5], in which DenseNet was found to be accurate and parameter efficient compared to most networks for the task of disease identification.
DenseNet is a complex architecture with several structural restrictions, so the removal of its filters and redundant feature maps needs careful consideration. A number of ways to remove redundant parameters have been suggested in the literature. However, these pruning techniques are more applicable to straightforward architectures such as VGG net and AlexNet [11–13]. We consider independent pruning of the filters in the convolution layers of the dense blocks, and we chose the filter pruning technique by Li [12] because it does not change the architectural design of DenseNet. In this work, we have therefore shown that the DenseNet architecture can be pruned. Pruning removes filters that are evaluated as having a small effect on the network. The result is a smaller, parameter-efficient network. Additionally, the pruned network is computationally efficient as measured by FLOPs, training speed and inference time.
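The filter-ranking step described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes that the importance of a filter is measured by the sum of its absolute kernel weights (its L1 norm), and it assumes a plain sequential pair of convolution layers. The function names `rank_filters_by_l1` and `prune_smallest_filters` are hypothetical. In a dense block, each layer receives the concatenation of all preceding feature maps, so the kept channel indices would additionally have to be mapped through the concatenation offsets.

```python
import numpy as np


def rank_filters_by_l1(weights):
    """Return filter indices sorted by ascending L1 norm, plus the norms.

    weights has shape (out_channels, in_channels, kernel_h, kernel_w).
    Assumption: a small L1 norm marks a filter with a small effect.
    """
    l1 = np.abs(weights).reshape(weights.shape[0], -1).sum(axis=1)
    return np.argsort(l1, kind="stable"), l1


def prune_smallest_filters(weights, next_weights, n_prune):
    """Drop the n_prune lowest-ranked filters of one convolution layer.

    Removing a filter also removes its output feature map, so the
    matching input channels of the following layer are dropped too.
    """
    order, _ = rank_filters_by_l1(weights)
    keep = np.sort(order[n_prune:])      # surviving filters, original order
    pruned = weights[keep]               # fewer output channels
    pruned_next = next_weights[:, keep]  # fewer input channels downstream
    return pruned, pruned_next, keep


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    conv = rng.normal(size=(8, 3, 3, 3))       # 8 filters on a 3-channel input
    conv_next = rng.normal(size=(16, 8, 3, 3))
    p, pn, kept = prune_smallest_filters(conv, conv_next, n_prune=2)
    print(p.shape, pn.shape, kept)
```

Because only whole filters are removed, the layer types and connectivity stay the same and only the channel counts shrink, which is consistent with the stated reason for choosing this technique.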
From the results on the three datasets (PlantVillage, LeafSnap and Swedish-leaf), it is evident that a pruned network can be used for the classification task while still preserving accuracy. As shown in Figs. 8–10 and Tables 2–4, pruning allows the model to converge more easily and also reduces overfitting. Additionally, the pruned DenseNet model has about a 10% reduction in the number of parameters and up to a 22% reduction in FLOPs. The SNN approach (SELU activation, AlphaDropout and the associated initialization scheme) reduces the number of parameters and FLOPs by approximately 14% and 25% respectively, compared to the original model (DenseNet with ReLU and batch normalization). Although ReLU is a simple activation function, batch normalization adds complexity: it introduces additional parameters that make the network larger. As a result, the pruned DenseNet based on the SNN approach is 2x faster than the initial DenseNet network. Pruning does have a minimal negative effect, seen as a slight drop in accuracy. Nevertheless, the accuracy remains state-of-the-art, with the best model giving a Top-1 accuracy of 99.24%, 86.64% and 97.5% on the PlantVillage, LeafSnap and Swedish-leaf datasets respectively. This accuracy is still better than that of other networks such as VGG net and Inception V4 [5]. Our pruned network also gives about the same accuracy as ResNet-50, ResNet-101 and ResNet-152. However, the ResNet architectures [5] have more parameters: ResNet-50 (50 layers) has 3.7x more parameters than our pruned DenseNet with 121 layers.
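To make the SNN ingredients mentioned above concrete, the following NumPy sketch implements the SELU activation with its standard constants and a LeCun-normal initialization (zero mean, variance 1/fan_in), which is the initialization usually paired with SELU. It is an illustration under stated assumptions rather than the training code used in this work: it uses fully connected layers instead of convolutions for brevity, omits AlphaDropout, and the helper names `lecun_normal` and `forward_stats` are hypothetical.

```python
import numpy as np

# Standard SELU constants from the self-normalizing networks formulation.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x):
    """SELU: scale * x for x > 0, scale * alpha * (exp(x) - 1) otherwise."""
    x = np.asarray(x, dtype=np.float64)
    # Clamp before exp so large positive inputs cannot overflow.
    negative_branch = SELU_ALPHA * (np.exp(np.minimum(x, 0.0)) - 1.0)
    return SELU_SCALE * np.where(x > 0, x, negative_branch)


def lecun_normal(fan_in, fan_out, rng):
    """Weights drawn with zero mean and variance 1 / fan_in."""
    return rng.normal(0.0, np.sqrt(1.0 / fan_in), size=(fan_in, fan_out))


def forward_stats(depth=8, width=256, batch=512, seed=0):
    """Pass standard-normal inputs through `depth` SELU layers and
    return the mean and variance of the final activations."""
    rng = np.random.default_rng(seed)
    h = rng.normal(size=(batch, width))
    for _ in range(depth):
        h = selu(h @ lecun_normal(width, width, rng))
    return float(h.mean()), float(h.var())


if __name__ == "__main__":
    mean, var = forward_stats()
    print(f"mean={mean:.3f}, variance={var:.3f}")
```

The layers in this sketch carry no per-channel normalization statistics or learned scale and shift terms, which is the sense in which replacing batch normalization with SELU can reduce parameters and computation.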
We used the LeafSnap and Swedish-leaf datasets to validate the pruned network. These datasets are used mainly for species recognition based on leaves. The classification results depicted in Figs. 13 and 14 demonstrate that the pruned model is able to classify plant species with a high level of certainty.
Conclusion
In this study, the performance of pruned and unpruned DenseNet with different numbers of layers is evaluated. It is observed that pruning helps to remove redundant and unimportant connections, which makes the network parameter efficient. To further improve efficiency, the SNN method is applied with SELU, AlphaDropout and the associated initialization scheme. It is observed that pruned DenseNet with SNN learns 2x faster than the original DenseNet with ReLU and batch normalization. Additionally, pruning filters reduces the number of parameters and FLOPs by approximately 14% and 25% respectively. Further, the proposed method achieves satisfactory accuracy on the PlantVillage, LeafSnap and Swedish-leaf datasets using a held-out dataset (test dataset). The best model gives a Top-1 accuracy of 99.24%, 86.64% and 97.5% on the PlantVillage, LeafSnap and Swedish-leaf datasets respectively. Although the proposed method is faster and more parameter efficient, there is a negligible drop in accuracy compared to the original model. Therefore, to improve accuracy, quantization is recommended. Additionally, noise induction into the layers can be introduced to boost pruning and accuracy.
Funding
This work was supported by the National Natural Science Foundation of China under Grant 61876010.
Conflict of interest
The authors declare that they have no competing interests.
