Abstract
Convolutional neural networks (CNNs) have recently displayed remarkable performance in automatic classification and medical image diagnosis. However, CNNs fail to recognize images that are rotated or oriented differently from the originals, which limits their performance. This paper presents a new capsule network (CapsNet) based framework, the multi-lane atrous feature fusion capsule network (MLAF-CapsNet), for brain tumor type classification. The MLAF-CapsNet combines atrous convolution and CLAHE: the atrous convolution increases the receptive fields while maintaining spatial representation, whereas CLAHE is used as a base layer that applies an improved adaptive histogram equalization (AHE) to enhance the input images. The proposed method is evaluated using whole-brain tumor and segmented tumor datasets, and the performance on the two datasets is explored and compared. On the original images from the two datasets, the MLAF-CapsNet achieves better accuracies (93.40% and 96.60%) and precisions (94.21% and 96.55%) than the traditional CapsNet (accuracies of 78.93% and 87.30%). On the augmented versions of the two datasets, the proposed method achieves the best accuracies (98.48% and 98.82%) and precisions (98.88% and 98.58%) compared to the traditional CapsNet. Our results indicate that the proposed method can improve brain tumor classification and support radiologists in medical diagnostics.
Introduction
Brain cancers are the third most common cancers among adolescents and young adults [1, 2]. According to [3], brain tumors are a leading cause of cancer-related death worldwide. According to statistics published by the American Brain Tumor Association (ABTA) in 2018, about 80,000 new cases of brain tumors are expected to be diagnosed every year [1]. Detecting the brain tumor type at an early stage is important for both physicians and patients because it helps to devise a treatment plan for the patient. There are different categories of tumors, for example, meningioma, pituitary, glioma, glioblastoma, and astrocytoma. Fig. 1 shows typical examples of brain tumor types. Determining the correct brain tumor type in the early stages is challenging but critical, since it helps physicians to plan treatment precisely and to better predict patients’ response to treatment. Medical image processing is used to detect brain cancer early, provide effective treatments, and increase patients’ survival rates [4]. Classification of tumor types by human inspection, on the other hand, relies heavily on experienced radiologists [5, 6]. This approach is time-consuming, prone to errors, and sometimes costly. Figure 2 shows the incidence of the different brain tumor types as percentages.

Scans showing the three most commonly occurring primary brain tumor types. (a) Meningioma, (b) Glioma, and (c) Pituitary. Best viewed in color.

Statistics showing the incidence rates of brain tumor types. In this work, we focus on the three most common tumors, i.e., (i) Meningioma, (ii) Glioma, and (iii) Pituitary.
In general, tumor classification methods begin with segmentation steps [7–9], which are performed to correctly identify the exact tumor region in the images. Segmentation is followed by the extraction and selection of the relevant features needed for tumor classification (X. Li et al., 2015 [10]). Havaei et al., 2017 [11] proposed a method for brain tumor segmentation based on a two-way CNN, which considers both the pixel properties and the probabilities of the neighboring pixels. One property of this work is that, once the tumor region is segmented, different or similar categories of features can be extracted and fed to the classification layer. Based on this, K. Usman et al., 2017 and Y. Wei et al., 2017 [12, 13] adopted intensity and neighboring pixels as an input vector for training a random forest classifier. In [14], J. Cheng et al. examined the effect of tumor augmentation and showed that it could improve the classification accuracy of brain tumors. Furthermore, the Gray Level Co-occurrence Matrix (GLCM) and Gray Level Run Length Matrices (GLRLM) were used by [15] to extract 18 features for tumor classification using Probabilistic Neural Networks (PNNs) [5]. However, the above-mentioned machine learning approaches to tumor classification have a considerable shortcoming: they require prior knowledge of the feature types that need to be extracted. This considerably reduces their generalization capability and can reduce performance.
Convolutional neural networks (CNNs) [16] have a large learning capacity and can learn the content of an input image without prior knowledge. This makes CNNs an appropriate technique for image classification. Recently, the interest in adopting CNNs for brain tumor classification has increased [17]. Neural networks (NNs) and CNNs have been combined with several pre-processing methods such as data augmentation, and the results indicated that CNNs trained without pre-processing could outperform other methods on axial MRI images. Although CNNs have shown remarkable performance in image processing, they still have some drawbacks. For example, CNNs are translation invariant, so they fail to capture the position of one object relative to another. In addition, CNNs require large amounts of data to generalize well and therefore achieve low accuracy when trained on small datasets.
Sabour et al. (2017) proposed capsule networks (CapsNets) [18] to overcome these problems. Each capsule within the network consists of several neurons. The activity vector of each capsule encodes pose parameters such as position, orientation, scaling, and skewness. CapsNet introduces routing by agreement to replace the pooling layer, where the lower-level capsules predict the output of the higher-level capsules.
Image enhancement has been widely explored and has proven effective for improving segmentation and classification accuracy in vision tasks on x-ray, mammogram, and MR images. MRI images were processed by combining logarithm (LoG) and contrast limited adaptive histogram equalization (CLAHE) filtering for brain tumor segmentation [20, 21]. In their original state, real-time image sequences may not have good viewing quality because of a lack of proper lighting. CLAHE has been used effectively to overcome this problem [22]. The method has been explored in many research areas such as medical images [23], segmentation of objects with ambiguous boundaries [24], retinal image enhancement [25], diagnosis of breast cancer [26, 27], underwater image enhancement [28], ultrasonic well logging image enhancement [29], finger vein image enhancement and pattern extraction [30], brightness preservation, contrast enhancement, and mass segmentation of mammogram images [31], and improvement of the visual quality of fundus images [32].
In this paper, we capitalize on the capabilities of the CapsNet architecture for brain tumor classification. To this end, we present a multi-lane atrous feature fusion capsule network with contrast limited adaptive histogram equalization (CLAHE), which we name MLAF-CapsNet.
In more detail, the contributions of our work are summarized as follows: (i) A new CapsNet-based architecture called MLAF-CapsNet is proposed for brain tumor classification. (ii) For the first time, we use CLAHE as an enhancement layer and explore the impact of the CLAHE layer on the performance of the model. (iii) MLAF-CapsNet adopts atrous convolution for better spatial representation. (iv) We performed extensive experiments on brain MRI and segmented tumor datasets, which are composed of four (4) and three (3) categories, respectively. (v) We show empirically that the proposed MLAF-CapsNet achieves good performance with balanced computational complexity.
The paper is organized as follows: Section 2 introduces the methods of the proposed network. Section 3 presents the experiment details of MLAF-CapsNet. Section 4 presents the experimental results and discusses the relevant findings. Section 5 concludes the paper. Finally, the appendix presents additional experiments using augmented images.
Given a set of $n_{train}$ training magnetic resonance images, we propose a deep learning framework that classifies tumors in MRI images into different classes. Among the available types of brain tumors, we focus on classifying 3 types, namely (i) Meningioma, (ii) Pituitary, and (iii) Glioma. In this paper, a deep learning architecture based on the capsule network was implemented to classify the brain tumor type effectively. This section introduces the common building blocks of the proposed network for the task at hand.
Encoder method
In the encoder block, the images are down-sampled through the network with an MRI slice as input. The architecture consists of 3 convolutional blocks, each containing 3 levels of convolutional components. The first convolutional layer of each block is an atrous convolution layer with a kernel size of 3×3, followed by a max-pooling layer. Each convolutional layer is followed by a rectified linear unit (ReLU) activation function and a batch normalization layer, except the last 3 convolutional layers of blocks 1 and 2. The number of feature channels produced by the convolutional layers increases at each component, i.e., 64, 96, and 128. After the convolutional components, we obtain a feature map with 128 channels. This feature map is forwarded as input to a convolutional layer with a kernel size of 1×1 and 512 output channels. This output feature map is obtained using the proposed multi-layer feature fusion method.
Multi-layer atrous feature fusion
Image enhancement (CLAHE)
Histogram equalization is a commonly adopted image enhancement method because of its simplicity and low computational load. In this paper, contrast-limited adaptive histogram equalization (CLAHE) was used to improve the contrast of the MRI images.
CLAHE is an advanced form of adaptive histogram equalization (AHE) for image enhancement, which works well for biomedical images like MRI and mammograms [25]. It improves image quality while preventing the over-amplification of noise that results from the AHE technique. The method limits the contrast amplification in the neighborhood of each pixel, and the transformation function is derived from this limited histogram to reduce the noise problem.
CLAHE is usually applied as a pre-processing method, which requires additional storage space for the enhanced images. To avoid this, an enhancement CLAHE layer was implemented as a base before the atrous spatial feature pooling (ASFP) blocks. The layer receives input from the initial input layer, processes it, and sends the output to the ASFP block. Figure 6 shows the ASFP architecture, where the output of the enhancement layer is forwarded to the atrous convolutions.
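The contrast-limiting step described above can be illustrated with a minimal sketch for a single image tile. It is not the authors' CLAHE layer: the function name `clipped_equalization_map`, the integer `clip_limit`, and the uniform redistribution of the clipped counts are assumptions made for illustration, and the bilinear interpolation between neighboring tiles used by full CLAHE is omitted.

```python
def clipped_equalization_map(tile, clip_limit, levels=256):
    """Build a grey-level lookup table for one tile (hypothetical helper).

    Steps: histogram -> clip at clip_limit -> redistribute the excess ->
    scaled cumulative distribution.
    """
    # Histogram of the grey levels in the tile.
    hist = [0] * levels
    for row in tile:
        for value in row:
            hist[value] += 1

    # Clip every bin at clip_limit and collect the excess counts.
    excess = 0
    for g in range(levels):
        if hist[g] > clip_limit:
            excess += hist[g] - clip_limit
            hist[g] = clip_limit

    # Spread the excess uniformly over all bins (remainder goes to the lowest bins).
    share, remainder = divmod(excess, levels)
    for g in range(levels):
        hist[g] += share + (1 if g < remainder else 0)

    # The scaled cumulative distribution is the grey-level mapping.
    total = sum(hist)
    lut, running = [], 0
    for g in range(levels):
        running += hist[g]
        lut.append(round((levels - 1) * running / total))
    return lut


# A nearly uniform tile: without clipping, level 0 would be pushed far up
# the grey scale; with clipping, the amplification is limited.
tile = [[0, 0, 0, 0], [0, 0, 0, 7]]
limited = clipped_equalization_map(tile, clip_limit=4, levels=8)
unlimited = clipped_equalization_map(tile, clip_limit=100, levels=8)
```

In this toy tile the dominant grey level is mapped to a lower output value when the histogram is clipped, which is the mechanism that keeps noise in homogeneous regions from being over-amplified.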
Atrous convolution
CNNs have been used for classification, dense prediction, and semantic segmentation tasks [33–40]. In image classification, CNN layers followed by pooling layers progressively reduce the resolution to small feature maps, thereby losing the spatial structure of the feature map. This loss of spatial information limits image classification accuracy and complicates the transfer of the model to downstream applications that require detailed understanding. To overcome this, atrous convolution was proposed [41–44]. The atrous method uses filters with gaps (i.e., holes) between the filter elements to increase the receptive fields.
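The atrous or dilated convolution of a discrete function $F$ with a discrete filter $k$ and dilation factor $l$ can be defined as

$$(F *_l k)(p) = \sum_{s + l t = p} F(s)\, k(t).$$

Let $F_0, F_1, \ldots, F_{n-1}: \mathbb{Z}^2 \to \mathbb{R}$ be discrete functions and let $k_0, k_1, \ldots, k_{n-2}$ be discrete 3×3 filters applied with exponentially increasing dilation:

$$F_{i+1} = F_i *_{2^i} k_i, \quad i = 0, 1, \ldots, n-2.$$

The receptive field of an element $p$ in $F_{i+1}$ is defined as the set of elements in $F_0$ that modify the value of $F_{i+1}(p)$. Taking the size of the receptive field to be the number of these elements, the size of the receptive field of each element in $F_{i+1}$ increases exponentially and can be expressed as $(2^{i+2} - 1) \times (2^{i+2} - 1)$.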
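The exponential growth of the receptive field can be checked numerically. The sketch below assumes a stack of 3×3 filters in which layer i uses dilation 2^i, as in the definition above; the function name `receptive_field_sizes` is hypothetical.

```python
def receptive_field_sizes(num_layers, kernel_size=3):
    """Side length of the receptive field after each layer of a stack whose
    i-th layer (i = 0, 1, ...) uses dilation 2**i."""
    sizes = []
    side = 1  # a single input element sees only itself
    for i in range(num_layers):
        dilation = 2 ** i
        # A dilated kernel widens the field by (kernel_size - 1) * dilation.
        side += (kernel_size - 1) * dilation
        sizes.append(side)
    return sizes


sizes = receptive_field_sizes(4)                # side lengths 3, 7, 15, 31
closed_form = [2 ** (i + 2) - 1 for i in range(4)]
```

Both lists agree, so the side length after layer i matches the closed form 2^(i+2) - 1 while the number of weights per layer stays at nine.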
Figure 3 shows atrous convolution on a 2D image. The red dots represent the inputs to a 3×3 kernel, while the yellow areas show the receptive field captured by each input. In short, the receptive field is the implicit region of the initial input that is captured by an element in a higher layer.

Illustrating the concept of dilated convolution. Dilation exponentially expands the receptive fields without any loss of coverage or resolution. (a) 1-dilated convolution (b) 2-dilated convolution and (c) 4-dilated convolution.
Put simply, atrous convolution is a convolution applied to the input with a defined set of gaps (holes) between the filter elements. Given a 2D input image, an atrous rate of k = 1 corresponds to the standard convolution, k = 2 means skipping one pixel between sampled inputs, and k = 4 means skipping 3 pixels. Figure 3 illustrates these values of k. We can see that the receptive field of the dilated convolution is larger than that of the standard convolution (Fig. 3(a)).
Our architecture was motivated by atrous convolution, which supports an exponential expansion of receptive fields without losing coverage or resolution. The atrous convolution maintains the resolution of the feature maps. Moreover, this method enlarges the field of view of the filters without increasing the number of parameters.
A deep neural network for classification tasks can be improved when the object scale is considered. One method is adopting ASPP [32], which is motivated by spatial pyramid pooling in R-CNN. ASPP uses several parallel atrous convolutional layers to capture features at multiple scales. Similarly, the ASFP method uses several parallel blocks of atrous convolutional layers to obtain the required feature size. The feature maps obtained from the different block layers are fused (concatenated) to form one collective feature. The ASFP block layers consist of 3 × 3 atrous convolution layers (i.e., atrous rates of 2, 4, and 6) and 1 × 1 standard convolution layers.
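The effect of the atrous rate can be made concrete with a small pure-Python sketch. It assumes a single-channel image stored as nested lists, no padding, and the cross-correlation form commonly used by deep learning frameworks; the function name `atrous_conv2d` is illustrative only.

```python
def atrous_conv2d(image, kernel, rate=1):
    """'Valid' 2D atrous convolution: kernel taps are spaced `rate` pixels apart."""
    k = len(kernel)
    span = (k - 1) * rate + 1            # extent of the dilated kernel
    height, width = len(image), len(image[0])
    output = []
    for y in range(height - span + 1):
        row = []
        for x in range(width - span + 1):
            acc = 0
            for i in range(k):
                for j in range(k):
                    # rate = 1 reads adjacent pixels; rate = 2 skips one pixel.
                    acc += kernel[i][j] * image[y + i * rate][x + j * rate]
            row.append(acc)
        output.append(row)
    return output


image = [[5 * y + x for x in range(5)] for y in range(5)]   # values 0..24
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
standard = atrous_conv2d(image, ones, rate=1)   # 3x3 output, each tap adjacent
dilated = atrous_conv2d(image, ones, rate=2)    # 1x1 output covering all 5x5
```

With rate 2 the same nine weights cover a 5×5 window, which is why the field of view grows without additional parameters.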
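As a worked example of the three atrous rates listed above, the spatial extent covered by a dilated kernel can be computed from the standard relation for a kernel of size k and rate r. The symbol k_eff is introduced here only for illustration.

```latex
k_{\mathrm{eff}} = k + (k - 1)(r - 1)
\quad\Rightarrow\quad
r = 2:\ 3 + 2 \cdot 1 = 5, \qquad
r = 4:\ 3 + 2 \cdot 3 = 9, \qquad
r = 6:\ 3 + 2 \cdot 5 = 13.
```

Each 3 × 3 lane therefore covers a 5 × 5, 9 × 9, or 13 × 13 window while still using nine weights per filter, so concatenating the lanes combines context at three different scales.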
Capsule network
In recent deep learning and visual tasks, CNNs have been widely used for feature extraction. However, the convolution operation in CNNs appears too simple to solve complex problems [18]. For example, given an image that has been rotated or oriented differently from the original, a CNN fails to recognize it. The orientation of the components and their relative spatial relationships are not important to a CNN; it only cares about the presence of features.
CapsNet [18] was proposed to alleviate the aforementioned challenges. A capsule is a group of neurons whose activity vector represents the pose parameters of a visual entity, and the length of the vector indicates the existence of that entity. The drawbacks of CNNs are mostly related to the pooling layers. Capsule networks therefore replace pooling layers with a criterion called "routing by agreement." Under this criterion, the output of a capsule in the layer below is sent to all parent capsules in the layer above, but with different coupling coefficients. Each capsule in the lower layer predicts the output of the parent capsules. If the prediction matches the output of the parent capsule, the coupling coefficient between these two capsules is increased. Let $u_i$ be the output of capsule $i$; its prediction for parent capsule $j$ is expressed as

$$\hat{u}_{j|i} = W_{ij} u_i,$$

where $W_{ij}$ is the weight matrix.
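Finally, a non-linear function is used to shrink long vectors to a length close to 1 and short vectors to a length close to 0. This squash function prevents the length of the output vectors from exceeding 1. Equation (6) shows the non-linear squash function:

$$v_j = \frac{\|s_j\|^2}{1 + \|s_j\|^2}\,\frac{s_j}{\|s_j\|}, \qquad (6)$$

where $s_j$ is the total input to capsule $j$ and $v_j$ is its output.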
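The behavior of the squash function can be verified with a short sketch that follows the formulation in [18]. The small constant `eps` is an assumption added here to avoid division by zero for an all-zero input.

```python
import math


def squash(s, eps=1e-9):
    """Rescale vector s so that its length lies in [0, 1) and its direction is kept."""
    norm_sq = sum(x * x for x in s)
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / (norm + eps)
    return [scale * x for x in s]


def length(v):
    return math.sqrt(sum(x * x for x in v))


long_len = length(squash([3.0, 4.0]))     # input length 5   -> about 25/26
short_len = length(squash([0.1, 0.0]))    # input length 0.1 -> about 0.01/1.01
```

The long vector is mapped to a length just below 1 and the short vector to a length close to 0, so the length can be read as the probability that the entity is present.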
In the routing process, the log probabilities are updated based on the agreement between the output $v_j$ of parent capsule $j$ and the prediction $\hat{u}_{j|i}$ made for it by capsule $i$, since two vectors that agree have a large inner product. Therefore, the agreement $a_{ij}$ used to update the log probability and the coupling coefficient is defined as

$$a_{ij} = v_j \cdot \hat{u}_{j|i}.$$
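The routing loop can be sketched in pure Python. This follows the standard routing-by-agreement procedure of [18] with Softmax coupling coefficients, not the authors' Keras implementation; the names `dynamic_routing`, `u_hat`, and `num_iterations` are illustrative assumptions.

```python
import math


def squash(s, eps=1e-9):
    norm_sq = sum(x * x for x in s)
    scale = norm_sq / (1.0 + norm_sq) / (math.sqrt(norm_sq) + eps)
    return [scale * x for x in s]


def softmax(logits):
    top = max(logits)
    exps = [math.exp(b - top) for b in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat, num_iterations=3):
    """u_hat[i][j] is the prediction vector of lower capsule i for parent j."""
    num_in, num_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * num_out for _ in range(num_in)]   # log probabilities (routing logits)
    for _ in range(num_iterations):
        # Coupling coefficients: each lower capsule distributes itself over parents.
        c = [softmax(b_i) for b_i in b]
        # Parent output: squashed, coupling-weighted sum of the predictions.
        v = []
        for j in range(num_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(num_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Agreement a_ij = v_j . u_hat[i][j] raises the logit of agreeing pairs.
        for i in range(num_in):
            for j in range(num_out):
                b[i][j] += sum(v[j][d] * u_hat[i][j][d] for d in range(dim))
    return v, c


# Two lower capsules agree on parent 0 and contradict each other on parent 1.
u_hat = [[[1.0, 0.0], [0.1, 0.0]],
         [[1.0, 0.0], [-0.1, 0.0]]]
v, c = dynamic_routing(u_hat)
```

In this toy input the coupling coefficients shift toward parent 0, where the two predictions agree, while the contradictory predictions for parent 1 cancel out.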
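Capsule $k$ in the last layer is associated with a loss $l_k$. This puts a large loss value on capsules with long output instantiation parameters when the entity does not exist. The loss function $l_k$ is expressed as follows:

$$l_k = T_k \max(0,\, m^+ - \|v_k\|)^2 + \lambda\,(1 - T_k)\max(0,\, \|v_k\| - m^-)^2,$$

where $T_k = 1$ if class $k$ is present and 0 otherwise, $m^+$ and $m^-$ are the margins, and $\lambda$ down-weights the loss for absent classes.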
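A minimal sketch of the margin loss from [18], which this passage appears to describe, is given below. The default values `m_plus=0.9`, `m_minus=0.1`, and `lam=0.5` are those suggested in [18] and are assumptions here, since the settings used in this paper are not stated.

```python
def margin_loss(lengths, target, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Sum of the per-class margin losses.

    lengths[k] is the length of the output vector of class capsule k and
    target is the index of the true class.
    """
    total = 0.0
    for k, length in enumerate(lengths):
        t_k = 1.0 if k == target else 0.0
        # Penalize a short vector for the class that is present ...
        present = t_k * max(0.0, m_plus - length) ** 2
        # ... and a long vector for a class that is absent.
        absent = lam * (1.0 - t_k) * max(0.0, length - m_minus) ** 2
        total += present + absent
    return total


confident_correct = margin_loss([0.95, 0.05, 0.05], target=0)
confident_wrong = margin_loss([0.05, 0.95, 0.05], target=0)
```

A confident correct prediction incurs no loss, whereas a long output vector for an absent class is penalized, matching the description above.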

Dynamic routing procedure. The variables $u_i$, $w_{ij}$, $c_{ij}$ and $v_j$ represent the input capsule, weight matrix, coupling coefficient, and final output after squashing, respectively.
Figure 5 shows the proposed multi-lane atrous feature fusion CapsNet model. The model consists of a contrast limited adaptive histogram equalization (CLAHE) layer, ASFP, capsule, and reconstruction layers. The ASFP is used to achieve a large receptive field and to extract the semantics from the sequence of features learned in the lower-level layers, using multiple atrous convolutional layers with different rates. The input feature from the CLAHE layer is processed by the first atrous convolution of each lane, with dilation rates of 2, 4, and 6, respectively. The feature map is then forwarded to a convolution layer with a kernel size of 1×1. The output feature maps from lane_1_Conv1 and lane_2_Conv1 are concatenated and fused with lane_3_Conv1. The result is processed by lane_1_Conv2 and lane_2_Conv2 with 1×1 convolutions, while the feature map of lane_3_Conv1 is sent separately to lane_3_Conv2.

Proposed MLAF-CapsNet architecture.
The output feature map of the ASFP in the encoder block (Conv4) is forwarded to the capsule layer in the decoder block. The capsule layers are composed of primary capsules (primaryCaps) and tumor capsules (tumorCaps). The primary capsule layer consists of a convolution with a kernel size of 3×3, 512 filters, and a stride of 2. The primaryCaps layer is made of 32 capsules; each capsule is 8D with an H×W feature map computed from the output of the ASFP, where H×W represents the height and width of the feature map. The capsules in the primaryCaps layer are forwarded to the tumorCaps layer using the dynamic routing algorithm. The tumorCaps layer contains one 16D capsule per class, each of which receives input from the primaryCaps layer. In this paper, we experimented on 2 datasets with 4 and 3 classes. The coupling coefficients produced by the Softmax function in Equation (4) were not well distributed. Therefore, we used the Sigmoid function shown in Equation (10).
The Sigmoid function assigns larger coupling coefficients to real features and transfers true features to the next capsule layers’ class.
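The difference between the two choices can be illustrated with a short comparison. The exact form of Equation (10) is not reproduced in this text, so the sketch assumes the standard logistic function applied independently to each routing logit; the logit values are made up for illustration.

```python
import math


def softmax(logits):
    top = max(logits)
    exps = [math.exp(b - top) for b in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits):
    # Each coefficient depends only on its own logit (assumed form of Eq. (10)).
    return [1.0 / (1.0 + math.exp(-b)) for b in logits]


logits = [2.0, 1.5, -1.0]            # hypothetical routing logits of one capsule
c_softmax = softmax(logits)          # coefficients compete and sum to 1
c_sigmoid = sigmoid(logits)          # coefficients are independent, each in (0, 1)
```

With Softmax, two parents with similarly high logits must share the coupling mass, whereas with the Sigmoid both can receive a coefficient close to 1.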
The tumorCaps output is sent to the reconstruction layer to reconstruct the features obtained by the tumorCaps. The features are sent to the decoder layer in the capsule to decode the entity’s property. The decoder consists of 3 fully connected layers with 512, 1024, and 2304 neurons.
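The size of this decoder can be estimated with a short calculation. The sketch assumes that the decoder input is the flattened, masked tumorCaps output (number of classes × 16 values), as in the CapsNet-Keras code the implementation is based on; the function names are hypothetical.

```python
def dense_params(sizes):
    """Weights plus biases of a chain of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


def decoder_params(num_classes, caps_dim=16, hidden=(512, 1024, 2304)):
    # Assumed input size: one 16D capsule per class, flattened after masking.
    return dense_params([num_classes * caps_dim, *hidden])


params_segmented = decoder_params(3)     # three tumor classes
params_whole_brain = decoder_params(4)   # three tumor classes plus normal brain
pixels = 48 * 48                         # size of the down-sampled input image
```

The last layer has 2304 = 48 × 48 neurons, which matches one output value per pixel of a 48 × 48 single-channel image, and under these assumptions the decoder contributes roughly 2.9 million parameters for either dataset.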
Dataset
To evaluate our proposed method, we used the datasets presented in References [14, 45]. Columns 4 to 7 of Table 1 show the whole-brain tumor data and categories, including the total train and validation sets of the original and augmented images, respectively. The tumor dataset is provided as a set of slices and contains 3264 T1-weighted MRI images available at kaggle.com. The images are already split into training and validation folders. There are three types of tumors, meningioma (937 images), glioma (926 images), and pituitary (901 images), together with normal brain (500 images), which gives 4 categories for this study. The images are taken in three planes: axial, coronal, and sagittal. Figure 7(a) shows samples of the different tumor types in the different planes. The segmented tumor dataset is presented by [45–48]. Columns 10 to 12 show the segmented tumor types, including the total train and validation sets of the original and augmented images. It consists of 3064 MRI images with three kinds of brain tumor: meningioma (708 images), glioma (1426 images), and pituitary tumor (930 images). Samples of the segmented tumors are shown in Fig. 7(b). Table 1 shows the dataset information of the whole-brain MRI and segmented tumor images. Section 3.2 describes the data augmentation process.
Data distribution

ASFP of the proposed MLAF-CapsNet architecture.

Typical brain tumor type. (a) Sample magnetic resonance imaging (MRI) of different categories of tumors in different planes. Example of the tumor type is given in each plane. (b) Sample of the segmented tumor type.
This section gives the augmentation procedure, which is based on the general principle of data augmentation in [49]. The magnetic resonance images in the dataset have different sizes. We down-sampled and normalized the original images to 48 × 48 pixels, which reduced the number of trainable parameters, the dimensionality, and the computation. The training data were transformed in two ways. In the first transformation, the images were rotated by two angles, i.e., 90 degrees and 180 degrees. In the second transformation, the images were flipped 4 times in their vertical and horizontal planes. After the data transformation, the whole-brain and segmented tumor training sets increased to 13800 and 17220 images, respectively. Table 1 shows detailed information on the original set of each class and the corresponding augmented set.
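The rotations and flips can be sketched on an image stored as nested lists. This is only an illustration of the four basic transformations; the exact combination and number of flips used to reach the reported set sizes are not specified in the text, so the `augment` helper below is an assumption.

```python
def rotate90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def rotate180(img):
    return [row[::-1] for row in img[::-1]]


def flip_horizontal(img):
    """Mirror left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Mirror top to bottom."""
    return [list(row) for row in img[::-1]]


def augment(img):
    # Hypothetical helper: one rotated or flipped copy per transformation.
    return [rotate90(img), rotate180(img), flip_horizontal(img), flip_vertical(img)]


sample = [[1, 2],
          [3, 4]]
augmented = augment(sample)
```

Each transformation keeps the image size unchanged, so the augmented copies can be fed to the network together with the 48 × 48 originals.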
Implementation details
The experiments in this paper were run on a Windows system with an NVIDIA GeForce GTX 1060 6GB GPU. The code is implemented in Python (Anaconda) using Keras with TensorFlow as the backend. The network was trained for 300 and 200 epochs on the original and augmented images, respectively. The learning rate was set to 0.0001. The batch size was set to 16 for the original images and 32 for the augmented images. We used the Adam algorithm with momentum as the gradient optimizer. The momentum was set to 0.9, and the decay rate was set to $10^{-6}$. The code used for this study is available at https://github.com/aduk4u/Multi-Lane-Atrous-Feature-Fusion-Capsule-Network-with-CLAHE, and it is a modification of the code at https://github.com/XifengGuo/CapsNet-Keras.
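The effect of such a small decay rate on the step size can be sketched as follows. The sketch assumes that the rate of 10^-6 refers to the time-based `decay` argument of older Keras optimizers, which scales the learning rate by 1 / (1 + decay × step); whether the authors used that argument is not stated, and the dictionary keys are illustrative names.

```python
# Hyperparameters as reported in the text (key names are illustrative).
training_config = {
    "learning_rate": 1e-4,
    "momentum": 0.9,          # assumed to correspond to Adam's first-moment rate
    "decay": 1e-6,
    "batch_size": {"original": 16, "augmented": 32},
    "epochs": {"original": 300, "augmented": 200},
}


def decayed_learning_rate(step, base_lr=1e-4, decay=1e-6):
    """Time-based decay, assuming the legacy Keras schedule."""
    return base_lr / (1.0 + decay * step)


lr_start = decayed_learning_rate(0)
lr_after_million = decayed_learning_rate(1_000_000)
```

Under this assumption the learning rate halves only after one million parameter updates, so it stays close to 0.0001 for most of training.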
Results and discussion
Results of the proposed architecture trained using original whole-brain tumor images and segmented tumor images are presented in Tables 2 and 3, respectively, and visualized using the train, validation, and loss curves, confusion matrices, and area under the curve (AUC). The non-white columns represent the actual classes in the confusion matrices, and the non-white rows correspond to the output classes (predicted). The appendix section presents an additional experiment using augmented images of both the whole-brain tumor images and segmented tumor images.
Comparison of accuracy on architectures trained and validated on the original whole-brain tumor dataset as input
Comparison of accuracy on architectures trained and validated on the original segmented tumor data as input
In this section, model comparisons on image type: whole-brain tumor images and segmented tumor images are presented.
Original whole-brain image
Table 2 presents an accuracy comparison of the architectures trained and validated on the whole-brain tumor data, based on model accuracy, average precision, average recall, average F1-score, number of model parameters (# param), and training time. The MLAF-CapsNet with CLAHE achieved 93.40%, which is a 14.47%, 15.40%, and 0.48% improvement over the traditional CapsNet (78.93%), Afshar et al. [5] (78.00%), and Vimal et al. (92.92%), respectively. The MLAF-CapsNet with CLAHE also had an error rate of 6.60%, which is lower than that of the other models. The ablation model, MLAF-CapsNet without CLAHE, achieved 92.39%, which is higher than the traditional CapsNet and the model in [5]. Figure 8 compares the training, validation, and loss curves of the proposed MLAF-CapsNet architecture and the traditional CapsNet model. The confusion matrices in Fig. 9 show that the proposed model produced the best results in terms of accuracy, sensitivity, and specificity. The percentage of correctly classified images is shown on the diagonal. The last column corresponds to the specificity, whereas the last row represents the sensitivity. The overall accuracy is shown at the bottom right. Figure 10 presents the area under the curve (AUC) comparison of the trained models.
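The reported metrics can be derived from a confusion matrix as sketched below. The convention here is rows for actual classes and columns for predicted classes, which is the transpose of the layout described for the figures; the averages are unweighted (macro) means, which is an assumption, and the example counts are made up.

```python
def classification_report(cm):
    """cm[i][j] = number of samples of actual class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[k][k] for k in range(n)) / total
    precisions, recalls, f1_scores = [], [], []
    for k in range(n):
        true_pos = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))   # column sum
        actual_k = sum(cm[k])                           # row sum
        p = true_pos / predicted_k if predicted_k else 0.0
        r = true_pos / actual_k if actual_k else 0.0
        precisions.append(p)
        recalls.append(r)
        f1_scores.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "avg_precision": sum(precisions) / n,
        "avg_recall": sum(recalls) / n,
        "avg_f1": sum(f1_scores) / n,
    }


# Hypothetical two-class counts, not taken from the paper.
report = classification_report([[8, 2],
                                [1, 9]])
```

The error rate is simply one minus the accuracy, which is how an accuracy of 93.40% corresponds to the 6.60% error rate quoted above.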

Training, validation accuracy and loss curve on whole-brain image data. (a) Training and validation accuracy for MLAF-CapsNet with CLAHE, MLAF-CapsNet without CLAHE and the Traditional CapsNet and (b) Training and validation Loss for MLAF-CapsNet with CLAHE, MLAF-CapsNet without CLAHE and the Traditional CapsNet. Best view in color.

Confusion matrices on the whole-brain image.

Visualization of area under curve (AUC). (a) MLAF-CapsNet with CLAHE (b) MLAF-CapsNet without CLAHE and (c) Traditional CapsNet.
Table 3 presents an accuracy comparison of the architectures trained and validated on the segmented tumor data, based on model accuracy, average precision, average recall, average F1-score, number of model parameters (# param), and training time. The proposed MLAF-CapsNet with CLAHE obtained the best result of 96.60%, a 9.30% and 10.04% improvement over the traditional CapsNet and [5], respectively. An ablation study was performed to evaluate the effect of CLAHE in the MLAF-CapsNet model by removing the CLAHE layer, so that the ASFP lanes receive input directly from the input layer without intermediate image enhancement. This variant achieved 93.42% accuracy, which is a 6.12% and 6.86% improvement over the traditional CapsNet and the previous work in [5], respectively. Figure 11 compares the training, validation, and loss curves of the proposed MLAF-CapsNet with CLAHE and the traditional CapsNet architecture. In the confusion matrices shown in Fig. 12, MLAF-CapsNet with CLAHE achieves the best classification with the lowest error rate of 3.40%. The percentage of correctly classified images is presented on the diagonal. The last column represents the specificity, while the last row represents the sensitivity. The total accuracy is shown at the bottom right. The traditional CapsNet produced a high error rate of 12.70%, with more false positives and false negatives than (a) MLAF-CapsNet with CLAHE and (b) MLAF-CapsNet without CLAHE. In Table 3, average precision, recall, and F1-score are reported to account for the imbalance of the tumor classes in the dataset. Figure 13 compares the area under the curve (AUC) of the trained models, for which the proposed MLAF-CapsNet with CLAHE gives promising results.

Training, validation accuracy and loss curve on segmented tumor data. (a) Training and validation accuracy for MLAF-CapsNet with CLAHE, MLAF-CapsNet without CLAHE and the Traditional CapsNet and (b) Training and validation Loss for MLAF with CLAHE, MLAF without CLAHE and the Traditional CapsNet. Best view in color.

Confusion matrices on the segmented tumor.

Visualization of area under curve (AUC). (a) MLAF-CapsNet with CLAHE (b) MLAF-CapsNet without CLAHE and (c) Traditional CapsNet.
The proposed CapsNet-based models are compared on both whole-brain tumor and segmented tumor images. Figure 14 compares the results of the architectures using whole-brain tumor and segmented tumor images as input. In general, a CapsNet model attends to everything in the input image, including the background. Because the whole-brain MRI images were taken in several different planes (axial, coronal, and sagittal), the image background has more variation. Therefore, the ability of the CapsNet to handle whole-brain images is limited compared to segmented tumor images. Nevertheless, the CapsNet-based models in Fig. 14 achieved state-of-the-art results for brain tumor classification.

Comparison of model accuracy for brain and segmented tumor images.
Image enhancement using the CLAHE method provided a well-distributed histogram for the pixels in the image. Furthermore, the tumor was well separated from the surrounding tissue and is clearly visible. Figure 15 shows an example of the enhancement using the CLAHE method. The performance of CLAHE as a base layer indicates that using CLAHE as a layer can provide better feature information and improve network accuracy.

Enhanced image using CLAHE.
Better contextual features improve classification accuracy in deep learning. Figure 16 compares the atrous convolution’s activation maps in the (a) MLAF-CapsNet architecture and (b) the traditional CapsNet. It shows the output feature maps when the input is processed with MLAF-CapsNet and the traditional CapsNet models.

Comparison of convolution feature maps. (a) Feature maps of the atrous convolution from MLAF-CapsNet and (b) Feature map from convolutional layer from the traditional CapsNet.
Figure 16 shows that in the MLAF-CapsNet, the feature maps of the atrous convolution layers have a higher convolution level than those of the convolution layer in the traditional CapsNet. The resolution and contextual features of the (a) atrous convolutions are much enhanced, and the tumors are more clearly visible than in the (b) traditional CapsNet, where the tumors are lost in the feature maps. In the feature maps of the traditional CapsNet, Fig. 16(b), the resolution can be too low to produce a good classification result. In contrast, in Fig. 16(a), the atrous convolutions preserve more details of the tumor, which leads to better classification accuracy.
Tables 2 and 3 show that, on average, the training time of the atrous convolution model is reduced under the same experimental environment. As the number of training epochs increases, the training accuracy of both the MLAF-CapsNet model and the traditional CapsNet model increases, and the training accuracy of the MLAF-CapsNet model is always higher than that of the traditional CapsNet model. Figure 17 presents a sample of reconstructed images from the MLAF-CapsNet. The tumors in the reconstructed brain image (Fig. 17(b)) retain detailed information and are clearly visible.

(a) Reconstructed image from the MLAF-CapsNet and (b) Reconstructed images. Red highlighted rectangle indicates tumor.
This paper proposed a new capsule network-based architecture called the multi-lane atrous feature fusion capsule network (MLAF-CapsNet) for brain tumor classification. The MLAF-CapsNet introduces contrast limited adaptive histogram equalization (CLAHE) as a base layer to enhance the input image before forwarding it to the high-level convolutional layers. The atrous convolutional layers were used with the capsule network to integrate the contextual information of the tumor image. We capitalized on the ability of atrous convolution layers to increase the receptive fields without losing resolution. The MLAF-CapsNet achieved high accuracy (93.40% and 96.60%) on the whole-brain image and segmented tumor datasets, with a lower error rate than the traditional CapsNet (78.93% and 87.30%). Similarly, on the augmented images, the MLAF-CapsNet produced the best results (98.48% and 98.82%) on the whole-brain image and segmented tumor datasets compared with the traditional CapsNet (76.90% and 82.46%) and other proposed works. Furthermore, based on our evaluations and experiments, the models produce better results on the segmented tumor images than on the whole-brain images. Our results show that the MLAF-CapsNet model generalizes across these images and has an efficient execution speed. This indicates that our method can be adopted as a supporting tool for radiologists.
Acknowledgment
This work was supported by National Natural Science Foundation of China (NSFC Grant No. 61550110248) and Sichuan Science and Technology Program (Grant No.2019YFG0190). The authors would like to thank the editor and the reviewers for their helpful suggestions and valuable comments.
