MLAF-CapsNet: Multi-lane atrous feature fusion capsule network with contrast limited adaptive histogram equalization for brain tumor classification from MRI images

Abstract

Convolutional neural networks (CNNs) have recently shown remarkable performance in automatic classification and medical image diagnosis. However, CNNs fail to recognize images that have been rotated or oriented differently from the originals, which limits their performance. This paper presents a new capsule network (CapsNet) based framework, the multi-lane atrous feature fusion capsule network (MLAF-CapsNet), for brain tumor type classification. MLAF-CapsNet combines atrous convolution with CLAHE: the atrous convolution enlarges the receptive fields while maintaining the spatial representation, whereas CLAHE serves as a base layer that uses an improved adaptive histogram equalization (AHE) to enhance the input images. The proposed method is evaluated on a whole-brain tumor dataset and a segmented tumor dataset, and its performance on the two datasets is explored and compared. On the original images of the two datasets, MLAF-CapsNet achieves better accuracy (93.40% and 96.60%) and precision (94.21% and 96.55%) than the traditional CapsNet (accuracy of 78.93% and 87.30%). With augmentation of the two datasets, the proposed method achieves its best accuracy (98.48% and 98.82%) and precision (98.88% and 98.58%), again outperforming the traditional CapsNet. Our results indicate that the proposed method can improve brain tumor classification and support radiologists in medical diagnostics.

Keywords

Brain tumor classification, capsule networks, deep neural network, atrous convolution, dynamic routing

1 Introduction

Brain cancers are the third most common cancers among adolescents and young adults [1, 2]. According to [3], brain tumors are a leading cause of cancer-related death worldwide. According to statistics published by the American Brain Tumor Association (ABTA) in 2018, about 80,000 new cases of brain tumors are expected to be diagnosed every year [1]. Detecting the brain tumor type at an early stage is important for both physicians and patients, because it helps to devise a treatment plan for the patient. There are different categories of tumor, for example, Meningioma, Pituitary, Glioma, Glioblastoma, and Astrocytoma. Fig. 1 shows typical examples of brain tumor types. Determining the correct brain tumor type in the early stages is a challenging task, but it is critical since it helps physicians to set up a precise treatment plan and to better predict the patient's response to treatment. Medical image processing is used to detect brain cancer early, provide effective treatments, and increase patients' survival rates [4]. On the other hand, classification of tumor types by human inspection relies heavily on experienced radiologists [5, 6]. This approach is time-consuming, prone to error, and sometimes costly. Figure 2 shows statistics on the relative incidence of brain cancers.

Fig. 1

Scans showing the three most commonly occurring primary brain tumor types. (a) Meningioma (b) Glioma and (c) Pituitary. Best viewed in color.

Fig. 2

Statistics showing the incidence of brain tumor types. In this work, we focus on the three most frequent tumors, i.e., (i) Meningioma, (ii) Glioma, and (iii) Pituitary.

In general, tumor classification methods begin with segmentation steps [7–9]. These steps are performed to correctly identify the exact tumor region in the images. Segmentation is followed by the extraction and selection of the relevant features needed for tumor classification (X. Li et al., 2015 [10]). Havaei et al. (2017) [11] proposed a method for brain tumor segmentation based on a two-way CNN, which considers both the pixel properties and the probabilities of the neighboring pixels. One property of this approach is that, once the tumor region has been segmented, different or similar categories of features can be extracted and fed to the classification layer. Building on this, K. Usman et al. (2017) and Y. Wei et al. (2017) [12, 13] adopted intensity and neighboring pixels as an input vector and trained a random forest classifier. In [14], J. Cheng et al. reviewed the effect of tumor augmentation and showed that it could improve the classification accuracy of brain tumors. Furthermore, the Gray Level Co-occurrence Matrix (GLCM) and Gray Level Run Length Matrices (GLRLM) were used by [15] to extract 18 features for tumor classification using Probabilistic Neural Networks (PNNs) [5]. However, the above-mentioned machine learning approaches to tumor classification share a considerable shortcoming: they require prior knowledge of the feature types that need to be extracted. This considerably reduces their generalization capability and can reduce performance.

Convolutional neural networks (CNNs) [16] have a large learning capacity and can interpret the content of an input image without prior knowledge. This makes CNNs an appropriate technique for image classification. Recently, the interest in adopting CNNs for brain tumor classification has increased [17]. Neural networks (NNs) and CNNs have been combined with several pre-processing methods such as data augmentation. The results indicated that training CNNs without pre-processing could outperform other methods using axial MRI images. Although CNNs have shown remarkable performance in image processing, they still have some drawbacks. For example, CNNs are translation invariant; thus, they fail to capture the position of one object relative to another. In addition, CNNs require large amounts of data to generalize well and therefore achieve low accuracy when trained on small datasets.

Sabour et al. 2017 have proposed Capsule networks (CapsNets) [18] to overcome these problems. Each capsule within the network consists of several neurons. Each capsule’s activity vector consists of many pose parameters such as position, orientation, scaling, and skewness. CapsNet introduces routing by agreement to replace the pooling layer, where the lower-level capsules can predict the output of the higher-level capsules.

Image enhancement has been widely explored, and has proven effective, for improving segmentation and classification accuracy in vision tasks on x-ray, mammogram, and MR images. MRI images were processed by combining logarithm (LoG) and contrast limited adaptive histogram equalization (CLAHE) filtering for brain tumor segmentation [20, 21]. Image sequences acquired in real time may not have good viewing quality in their original state because of poor lighting. To overcome this problem, CLAHE has been utilized effectively [22]. The method has been explored in many research areas such as medical images [23], segmentation of objects with ambiguous boundaries [24], retinal image enhancement [25], diagnosis of breast cancer [26, 27], underwater image enhancement [28], ultrasonic well logging image enhancement [29], finger vein image enhancement and pattern extraction [30], brightness preservation, contrast enhancement, and mass segmentation of mammogram images [31], and improving the visual quality of fundus images [32].

In this paper, we capitalize on the achievability of the CapsNet architecture for brain tumor classification. For this reason, we present a multi-lane atrous feature fusion capsule network with contrast limited adaptive histogram equalization (CLAHE), which we named multi-lane atrous feature fusion CapsNet (MLAF-CapsNet).

In more detail, the contributions of our work are summarized as follows:

A new CapsNet-based architecture called MLAF-CapsNet is proposed for brain tumor classification.

For the first time, we use CLAHE as an enhancement layer and explore the CLAHE layer's impact on the model's performance. In addition, MLAF-CapsNet adopts atrous convolution for better spatial representation.

We performed extensive experiments on whole-brain MRI and segmented tumor datasets, which are composed of four (4) and three (3) categories, respectively. We show empirically that the proposed MLAF-CapsNet achieves good performance with balanced computational complexity.

The paper is organized as follows: Section 2 introduces the methods of the proposed network. Section 3 presents the experiment details of MLAF-CapsNet. Section 4 presents the experimental results and discusses the relevant findings. Section 5 concludes the paper. Finally, the appendix presents additional experiments using augmented images.

2 Methods

Given a set of n_train training magnetic resonance images, a deep learning framework is proposed for classifying tumors in MRI images into different classes. Among the available types of brain tumors, we focus on classifying three types, namely (i) Meningioma, (ii) Pituitary, and (iii) Glioma. In this paper, a deep learning architecture based on the capsule network was implemented to classify the brain tumor type effectively. This section introduces the common building blocks of the proposed network for the task at hand.

2.1 Encoder method

In the encoder block, the images are down-sampled through the network with an MRI slice as input. The architecture consists of 3 convolutional blocks, each of which contains 3 levels of convolutional components. The first convolutional layer of each block is an atrous convolution layer with a kernel size of 3×3 followed by a max-pooling layer. Each of the convolutional layers is followed by a rectified linear unit (ReLU) activation function and a batch normalization layer, except the last 3 convolutional layers of blocks 1 and 2. The number of feature maps produced by the convolutional layers increases at each component, i.e., 64, 96, and 128. After the convolutional components, we obtain a feature map with 128 channels. This feature map is forwarded as input to a convolutional layer with a kernel size of 1×1 and 512 output channels. This output feature map is obtained using the proposed multi-layer feature fusion method.

2.2 Multi-layer atrous feature fusion

2.2.1 Image enhancement (CLAHE)

Histogram equalization is the most commonly adopted method for image enhancement because of its simplicity and low computational load. In this paper, contrast-limited adaptive histogram equalization (CLAHE) was used to improve the contrast of the MRI images.

CLAHE is an advanced form of adaptive histogram equalization (AHE) for image enhancement, which works well for biomedical images such as MRI and mammograms [25]. It improves image quality while preventing the excessive amplification of noise that results from the AHE technique. The method limits the contrast amplification within each pixel neighborhood, and the transformation function is formed from the limited histogram to reduce the noise problem.

CLAHE is normally used as a pre-processing method, which means that the enhanced images occupy additional space on the storage resource. To overcome this, a CLAHE enhancement layer was implemented as a base before the atrous spatial feature pooling (ASFP) blocks. The layer receives input from the initial input layer, processes it, and sends the output to the ASFP block. Figure 6 shows the ASFP architecture, where the enhancement layer's output is forwarded to the atrous convolutions.
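The contrast-limiting step described above can be illustrated with a minimal NumPy sketch. It covers only the per-tile part of CLAHE (clip the histogram, redistribute the excess, equalize); full CLAHE additionally interpolates between neighboring tiles. The function name `clipped_equalize_tile` and the default `clip_limit` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def clipped_equalize_tile(tile, clip_limit=4.0, n_bins=256):
    """Contrast-limited histogram equalization of one uint8 tile.

    Hypothetical helper: only the per-tile step of CLAHE is shown,
    without the bilinear blending between tiles.
    """
    hist = np.bincount(tile.ravel(), minlength=n_bins).astype(np.float64)
    # Clip limit expressed as a multiple of the mean bin count.
    limit = clip_limit * tile.size / n_bins
    excess = np.sum(np.maximum(hist - limit, 0.0))
    hist = np.minimum(hist, limit)
    # Redistribute the clipped mass uniformly over all bins.
    hist += excess / n_bins
    cdf = np.cumsum(hist)
    cdf /= cdf[-1]
    # The normalized CDF is the gray-level transformation function.
    lut = np.round(cdf * (n_bins - 1)).astype(np.uint8)
    return lut[tile]

# Low-contrast tile with gray levels confined to 100..120.
tile = (np.arange(64 * 64) % 21 + 100).astype(np.uint8).reshape(64, 64)
enhanced = clipped_equalize_tile(tile)
```

Clipping bounds the slope of the transformation function, which is what keeps noise in nearly uniform regions from being over-amplified.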

2.2.2 Atrous convolution

CNNs have been used for classification, dense prediction, and semantic segmentation tasks [33–40]. In image classification, CNN layers followed by pooling layers progressively reduce the resolution to small feature maps, thereby losing the spatial structure of the feature maps. This loss of spatial information limits image classification accuracy and complicates the transfer of the model to downstream applications that require detailed understanding. To overcome this, atrous convolution was proposed [41–44]. The atrous method inserts gaps (i.e., holes) between the filter taps to increase the receptive field.

The atrous or dilation convolution can be defined as

$(F *_{l} k)(p) = \sum_{s + lt = p} F(s)\, k(t)$ (1)

where $*_{l}$ represents an atrous convolution with rate l. The familiar discrete convolution * is simply the 1-atrous convolution: when l = 1 the operation is a standard convolution, and when l > 1 it is an atrous convolution.

Let $F_{0}, F_{1}, \ldots, F_{n-1} : ℤ^{2} → ℝ$ denote discrete functions and let $k_{0}, k_{1}, \ldots, k_{n-2} : Ω_{1} → ℝ$ represent discrete 3×3 filters. The filters are applied with exponentially increasing dilation:

$F_{i+1} = F_{i} *_{2^{i}} k_{i} \quad \text{for } i = 0, 1, \ldots, n-2.$ (2)

The receptive field of an element p in $F_{i+1}$ is defined as the set of elements in $F_{0}$ that modify the value of $F_{i+1}(p)$. Taking the size of the receptive field to be the number of these elements, the receptive field of each element in $F_{i+1}$ grows exponentially with i and has size $(2^{i+2} - 1) × (2^{i+2} - 1)$.
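The receptive-field formula can be checked with a few lines of arithmetic. The sketch below assumes 3×3 filters with dilation rate 2^i at layer i, as in Eq. (2); the function name `receptive_field_sizes` is illustrative.

```python
def receptive_field_sizes(n_layers, kernel=3):
    """Side length of the receptive field after each stacked dilated layer.

    Layer i uses dilation 2**i, so it widens the field by
    (kernel - 1) * 2**i pixels along each axis.
    """
    sizes = []
    rf = 1  # a single element of F_0
    for i in range(n_layers):
        rf += (kernel - 1) * (2 ** i)
        sizes.append(rf)
    return sizes

sizes = receptive_field_sizes(4)
# Each entry matches the closed form 2**(i + 2) - 1 from the text.
matches_closed_form = all(s == 2 ** (i + 2) - 1 for i, s in enumerate(sizes))
```

The field grows exponentially with depth, while the number of filter weights per layer stays fixed at nine.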

Figure 3 shows atrous convolution on a 2D image. The red dots represent the inputs to a 3×3 kernel, while the yellow areas show the receptive field covered by each input. In short, the receptive field is the region of the initial input that implicitly contributes to a value in a deeper layer.

Fig. 3

Illustrating the concept of dilated convolution. Dilation exponentially expands the receptive fields without any loss of coverage or resolution. (a) 1-dilated convolution (b) 2-dilated convolution and (c) 4-dilated convolution.

Put simply, atrous convolution is a convolution applied to the input data with a defined set of holes. Given a 2D image as input, an atrous rate of l = 1 is the standard convolution, l = 2 means skipping one pixel between filter taps, and l = 4 means skipping 3 pixels. Figure 3 illustrates these three rates. We can see that the receptive field of the dilated convolutions is larger than that of the standard convolution (Fig. 3(a)).

Our architecture was motivated by atrous convolution, which supports an exponential expansion of receptive fields without losing coverage or resolution. The atrous convolution maintains the resolution of the feature maps. Moreover, this method enlarges the field of view of the filters without increasing the number of parameters.
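As an illustration of Eq. (1), the following NumPy sketch applies a 3×3 filter to a 2D image with a given atrous rate and zero padding, so the output keeps the input resolution. The function name `atrous_conv2d` and the zero-padding choice are assumptions made for this example.

```python
import numpy as np

def atrous_conv2d(F, k, rate=1):
    """Atrous convolution of Eq. (1): out[p] = sum_t F[p - rate * t] * k[t].

    F is a 2D image, k an odd-sized square filter indexed by offsets
    t in {-r, ..., r}^2. Zero padding keeps the output the same size as F.
    """
    r = k.shape[0] // 2
    pad = r * rate
    Fp = np.pad(F, pad)
    H, W = F.shape
    out = np.zeros((H, W), dtype=np.float64)
    for ti in range(-r, r + 1):
        for tj in range(-r, r + 1):
            # Shifted view of the padded image, i.e. F[p - rate * t] for all p.
            rows = slice(pad - rate * ti, pad - rate * ti + H)
            cols = slice(pad - rate * tj, pad - rate * tj + W)
            out += k[ti + r, tj + r] * Fp[rows, cols]
    return out

# An impulse input makes the footprint of the filter visible.
F = np.zeros((9, 9))
F[4, 4] = 1.0
k = np.arange(1.0, 10.0).reshape(3, 3)
standard = atrous_conv2d(F, k, rate=1)  # 3x3 footprint
dilated = atrous_conv2d(F, k, rate=2)   # 5x5 footprint, still 9 weights
```

With rate 2 the nine weights land on every second pixel of a 5×5 window, which shows how the field of view grows without adding parameters.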

2.2.3 Atrous spatial feature pooling (ASFP)

A deep neural network for classification tasks can be improved when the object scale is considered. One method is to adopt ASPP [32], which is motivated by spatial pyramid pooling in R-CNN. ASPP uses several parallel atrous convolutional layers to capture features at multiple scales. Similarly, the ASFP method uses several parallel blocks of atrous convolutional layers to obtain the required features. The feature maps obtained from the different block layers are fused (concatenated) to form one collective feature. The ASFP block layers consist of 3 × 3 atrous convolution layers (with atrous rates of 2, 4, and 6) and 1 × 1 standard convolution layers.
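The parallel-lane idea can be sketched in NumPy for a single-channel input. This is a simplified illustration under stated assumptions, not the authors' implementation: each lane applies one 3×3 dilated filter (rates 2, 4, and 6) followed by ReLU, the 1×1 convolutions are omitted, and the lane outputs are concatenated along a channel axis. The names `dilated_conv3x3` and `asfp_fuse` are hypothetical.

```python
import numpy as np

def dilated_conv3x3(x, k, rate):
    """3x3 dilated filtering (cross-correlation form) with zero padding."""
    H, W = x.shape
    xp = np.pad(x, rate)
    out = np.zeros((H, W), dtype=np.float64)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * xp[i * rate:i * rate + H, j * rate:j * rate + W]
    return out

def asfp_fuse(x, kernels, rates=(2, 4, 6)):
    """Run one dilated lane per rate and concatenate the lane outputs."""
    lanes = [np.maximum(dilated_conv3x3(x, k, r), 0.0)  # ReLU per lane
             for k, r in zip(kernels, rates)]
    return np.stack(lanes, axis=-1)  # shape (H, W, number of lanes)

rng = np.random.default_rng(0)
image = rng.random((48, 48))
kernels = [rng.standard_normal((3, 3)) for _ in range(3)]
fused = asfp_fuse(image, kernels)  # shape (48, 48, 3)
```

Because every lane preserves the spatial size, the outputs can be concatenated directly, and each channel of the fused map sees the input at a different scale.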

2.2.4 Capsule network

In recent deep learning and visual tasks, CNNs have been widely used for feature extraction. However, the convolution operation in CNNs appears too simple for solving complex problems [18]. For example, when a CNN is given an image that has been rotated or oriented differently, it fails to recognize the original image. The orientation of the components and their relative relationships in space are not taken into account by a CNN; it only cares about the presence of features.

The recently proposed CapsNet [18] aims to alleviate the aforementioned challenges by representing visual entities with capsules. A capsule is a group of neurons whose activity vector represents the pose parameters of an entity, and the length of the vector indicates whether that entity exists. The drawbacks of CNNs are mostly related to the pooling layers. Capsule networks therefore replace pooling layers with a more appropriate criterion called "routing by agreement." Under this criterion, the output from the layer below is sent to all parent capsules in the layer above, but with different coupling coefficients. Each capsule in the lower layer predicts the output of the parent capsules. If the prediction matches the parent capsule's output, then the coupling coefficient for these two capsules is increased. Let $u_{i}$ be the output of capsule i and ${\hat{u}}_{j | i} = W_{ij} u_{i}$ its prediction for parent capsule j. The total input to capsule j is expressed as $s_{j} = \sum_{i} c_{ij} {\hat{u}}_{j | i}$ (3)

Finally, a non-linear function is used to compress long vectors to a length close to 1 and short vectors to a length close to 0, so that the length of the output vector never exceeds 1. Equation (4) shows this non-linear squash function:

$v_{j} = \frac{\| s_{j} \|^{2}}{1 + \| s_{j} \|^{2}} \frac{s_{j}}{\| s_{j} \|},$ (4)

where $s_{j}$ is the input vector to the j-th capsule and $v_{j}$ is the output vector. CapsNet applies this squashing non-linearity to the output vectors ($v_{j}$) in each iteration [19]. The length of $v_{j}$ can therefore be read as a likelihood between 0 and 1: the function shrinks short vectors towards zero and brings long vectors to nearly unit length,

$v_{j} \approx \begin{cases} \| s_{j} \| \, s_{j}, & \| s_{j} \| \to 0 \\ \frac{s_{j}}{\| s_{j} \|}, & \| s_{j} \| \to \infty \end{cases}$ (5)
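A minimal NumPy sketch of the squash function in Eq. (4) is given below. The small constant `eps` is an assumption added for numerical safety when the input is the zero vector; it is not part of the equation.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Squash non-linearity of Eq. (4): keeps direction, maps length into [0, 1)."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / np.sqrt(sq_norm + eps)

short = squash(np.array([0.01, 0.0]))    # length shrinks towards 0
medium = squash(np.array([3.0, 4.0]))    # length 25/26
long_ = squash(np.array([300.0, 400.0])) # length just below 1
```

For an input of length 5 the output length is 25/26, which shows how quickly the function saturates towards 1.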

The log probabilities are updated in the routing process based on the agreement between $v_{j}$ and ${\hat{u}}_{j | i}$, since two vectors that agree have a large inner product. Therefore, the agreement $a_{ij}$ used for updating the log probability and the coupling coefficient is defined as $a_{ij} = v_{j} \cdot {\hat{u}}_{j | i}$ (6)
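Equations (3), (4), and (6) combine into the routing loop sketched below. This is a sketch under stated assumptions rather than the paper's implementation: it uses the Softmax coupling of the original CapsNet [18] and three routing iterations, neither of which is fixed by the text above, and the names `dynamic_routing` and `u_hat` are illustrative.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Squash non-linearity of Eq. (4)."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(b, axis):
    e = np.exp(b - b.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_routing(u_hat, n_iter=3):
    """Routing by agreement.

    u_hat has shape (n_in, n_out, dim) and holds the predictions
    u_hat[i, j] of lower capsule i for parent capsule j.
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                 # routing log probabilities
    for _ in range(n_iter):
        c = softmax(b, axis=1)                  # coupling over the parents
        s = np.einsum("ij,ijd->jd", c, u_hat)   # Eq. (3)
        v = squash(s)                           # Eq. (4)
        a = np.einsum("jd,ijd->ij", v, u_hat)   # Eq. (6), inner products
        b = b + a                               # reward agreement
    return v, c

# Four input capsules agree on parent 0 and disagree on parent 1.
e = np.array([1.0, 0.0, 0.0, 0.0])
u_hat = np.zeros((4, 2, 4))
u_hat[:, 0, :] = e
u_hat[0::2, 1, :] = e
u_hat[1::2, 1, :] = -e
v, c = dynamic_routing(u_hat)
```

In this toy case the coupling coefficients drift towards parent 0, because the predictions for parent 1 cancel out and produce no agreement.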

Capsule k in the last layer is connected with a loss $l_{k}$. This puts a big loss value on capsules with long output instantiation parameters when the entity does not exist. The loss function $l_{k}$ is expressed as follows.

$l_{k} = T_{k} \max (0, m^{+} - \| v_{k} \|)^{2} + λ (1 - T_{k}) \max (0, \| v_{k} \| - m^{-})^{2}$ (7)

where $T_{k}$ is 1 when class k is present, and is 0 otherwise. The $m^{+}$, $m^{-}$, and λ are hyperparameters that are set before the learning process. Figure 4 indicates the dynamic routing procedure.
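The margin loss of Eq. (7) can be written in a few lines. The default values m⁺ = 0.9, m⁻ = 0.1, and λ = 0.5 below are those of the original CapsNet paper [18] and are used here only as an assumption, since this section does not state the values used for MLAF-CapsNet.

```python
import numpy as np

def margin_loss(v_norm, T, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Margin loss of Eq. (7), summed over classes.

    v_norm holds the output capsule lengths ||v_k||, T the one-hot targets.
    """
    present = T * np.maximum(0.0, m_plus - v_norm) ** 2
    absent = lam * (1.0 - T) * np.maximum(0.0, v_norm - m_minus) ** 2
    return float(np.sum(present + absent))

T = np.array([1.0, 0.0, 0.0])                            # class 0 is present
confident = margin_loss(np.array([0.95, 0.05, 0.05]), T)  # no penalty
wrong = margin_loss(np.array([0.10, 0.90, 0.10]), T)      # both terms active
```

In the second case the loss is 0.8² for the missed class plus 0.5 · 0.8² for the long capsule of an absent class, so wrong detections are penalized but down-weighted by λ.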

Fig. 4

Dynamic routing procedure. The variables u_i, w_ij, c_ij and v_j represent the input capsule, the weight matrix, the coupling coefficient and the final output after squashing, respectively.

2.2.5 Proposed architecture

Figure 5 shows the proposed multi-lane atrous feature fusion CapsNet model. The model consists of a contrast limited adaptive histogram equalization (CLAHE) layer, ASFP, capsule, and reconstruction layers. The ASFP is used to achieve a large receptive field and to extract the semantics in the sequence of features learned in the lower-level layers, using several atrous convolutional layers with different rates. The input feature from the CLAHE layer is processed by the first atrous convolution of each block lane, with dilation rates of 2, 4, and 6, respectively. The feature map is then forwarded to a convolution layer with a kernel size of 1×1. The output feature maps from lane_1_Conv1 and lane_2_Conv1 are concatenated and fused with lane_3_Conv1. The result is processed by lane_1_Conv2 and lane_2_Conv2 with 1×1 convolution, while the feature map of lane_3_Conv1 is sent separately to lane_3_Conv2.

Fig. 5

Proposed MLAF-CapsNet architecture.

The output feature map of the ASFP in the encoder block (Conv4) is forwarded to the capsule layer in the decoder block. The capsule layers are composed of primary capsules (primaryCaps) and tumor capsules (tumorCaps). The primary capsule layer consists of a convolution with a kernel size of 3×3, 512 filters, and a stride of 2. The primaryCaps layer is made of 32 capsules; each capsule is 8D with an H×W feature map, which is the output of the ASFP, where H×W represents the height and width of the feature map. The capsules in the primaryCaps layer are forwarded to the tumorCaps layer using the dynamic routing algorithm. The tumorCaps layer contains one 16D capsule per class, and each of these capsules receives input from the primaryCaps layer. In this paper, we experimented on two datasets with 4 and 3 classes, respectively. With the Softmax function originally used for the coupling coefficients [18], the coefficients were not well distributed. Therefore, we used the Sigmoid function shown in equation (8). $c_{ij} = \frac{1}{1 + \exp (- b_{ij})}$ (8)

The Sigmoid function assigns larger coupling coefficients to real features and transfers true features to the next capsule layers’ class.
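The difference between the two coupling functions can be seen in a short NumPy sketch. It assumes the standard logistic form of the Sigmoid, 1 / (1 + exp(−b_ij)), applied element-wise to the routing logits b_ij; the function names are illustrative.

```python
import numpy as np

def softmax_coupling(b):
    """Softmax over the parent capsules: each row of c sums to 1."""
    e = np.exp(b - b.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def sigmoid_coupling(b):
    """Element-wise logistic function: each c_ij depends only on its own b_ij."""
    return 1.0 / (1.0 + np.exp(-b))

# Routing logits of one lower capsule for three parent capsules.
b = np.array([[2.0, 0.0, -2.0]])
c_softmax = softmax_coupling(b)   # parents compete for a total mass of 1
c_sigmoid = sigmoid_coupling(b)   # no competition between parents
```

With Softmax, raising one coefficient necessarily lowers the others, whereas the Sigmoid lets a capsule couple strongly to a parent it agrees with without that constraint.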

The tumorCaps output is sent to the reconstruction layer to reconstruct the features obtained by the tumorCaps. The features are sent to the decoder layer in the capsule to decode the entity’s property. The decoder consists of 3 fully connected layers with 512, 1024, and 2304 neurons.

3 Experiments

3.1 Dataset

To evaluate our proposed method, we used the datasets presented in References [14, 45]. The upper part of Table 1 shows the whole-brain tumor data and categories, including the total train and validation sets of the original and augmented images. The whole-brain tumor dataset is provided as a set of slices and contains 3264 T1-weighted MRI images available at kaggle.com. The images are already split into training and validation folders. There are three types of tumors, meningioma (937 images), glioma (926 images), and pituitary (901 images), together with normal brain images (500 images), which makes 4 categories for this study. The images are taken in three planes: Axial, Coronal, and Sagittal. Figure 7(a) shows samples of the different tumor types, as well as the different planes. The segmented tumor dataset is presented by [45–48]. The lower part of Table 1 shows the segmented tumor types, including the total train and validation sets of the original and augmented images. This dataset consists of 3064 MRI images with three kinds of brain tumor: meningioma (708 images), glioma (1426 images), and pituitary tumor (930 images). Samples of the segmented tumors are shown in Fig. 7(b). Section 3.2 describes the data augmentation process.

Table 1
Data distribution

	Original images		Augmented images	
Tumor type	Train set	Validation set	Train set	Validation set
Whole-brain tumor dataset
Meningioma	822	115	4932	115
Glioma	826	100	4956	100
Pituitary	827	74	4962	74
Normal	395	105	2370	105
Total data	2870	394	17220	394
Segmented brain tumor dataset
Meningioma	534	174	3204	174
Glioma	1058	368	6348	368
Pituitary	708	222	4248	222
Total data	2300	764	13800	764

Fig. 6

ASFP of the proposed MLAF-CapsNet architecture.

Fig. 7

Typical brain tumor type. (a) Sample magnetic resonance imaging (MRI) of different categories of tumors in different planes. Example of the tumor type is given in each plane. (b) Sample of the segmented tumor type.

3.2 Image pre-processing and data augmentation

This section gives the augmentation procedure, which follows the general principle of data augmentation in [49]. The magnetic resonance images in the datasets have different sizes. We down-sampled and normalized the original images to patches of 48 × 48 pixels. This reduced the trainable parameters, dimensionality, and computations. Data transformation was performed on the training data using augmentation in two ways. In the first augmentation, the images were rotated by two angles, i.e., 90 degrees and 180 degrees. In the second transformation, the images were flipped 4 times in their vertical and horizontal planes. After the data transformation, the training data of the whole-brain and segmented tumor images increased to 17220 and 13800 images, respectively. Table 1 shows detailed information on the individual original class sets and their corresponding augmented sets.
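The transformations can be sketched with NumPy as follows. The text does not specify which four flip variants were used, so the sketch shows only the two rotations and the two basic flips. The second part checks the set sizes in Table 1, where each augmented training set is six times the size of the original one. The names used are illustrative.

```python
import numpy as np

def augment(image):
    """Rotations and basic flips of one 2D image (illustrative subset)."""
    return {
        "rot90": np.rot90(image, k=1),
        "rot180": np.rot90(image, k=2),
        "flip_vertical": np.flipud(image),
        "flip_horizontal": np.fliplr(image),
    }

# Placeholder 48x48 patch standing in for a down-sampled MRI slice.
patch = np.arange(48 * 48, dtype=np.float64).reshape(48, 48) / (48 * 48)
views = augment(patch)

# Training-set sizes from Table 1: augmented size = 6 x original size.
ORIGINAL_TRAIN = {"whole_brain": 2870, "segmented": 2300}
AUGMENTED_TRAIN = {name: 6 * n for name, n in ORIGINAL_TRAIN.items()}
```

The factor of six is consistent with two rotated and four flipped copies per training image.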

3.3 Implementation details

The experiments in this paper were run on a Windows system with an NVIDIA GeForce GTX 1060 6GB GPU. The code uses TensorFlow as the backend and is implemented with Keras and Python (Anaconda). The network was trained for 300 and 200 epochs on the original and augmented images, respectively. The learning rate was set to 0.0001. The batch size was set to 16 for the original images and 32 for the augmented images. We used the Adam algorithm with momentum as the gradient optimizer. The momentum was set to 0.9, and the decay rate was set to 10^-6. The code used for this study is available at https://github.com/aduk4u/Multi-Lane-Atrous-Feature-Fusion-Capsule-Network-with-CLAHE, and it is a modification of the code at https://github.com/XifengGuo/CapsNet-Keras.

4 Results and discussion

Results of the proposed architecture trained using original whole-brain tumor images and segmented tumor images are presented in Tables 2 and 3, respectively, and visualized using the train, validation, and loss curves, confusion matrices, and area under the curve (AUC). The non-white columns represent the actual classes in the confusion matrices, and the non-white rows correspond to the output classes (predicted). The appendix section presents an additional experiment using augmented images of both the whole-brain tumor images and segmented tumor images.

Table 2
Comparison of accuracy on architectures trained and validated on the original whole-brain tumor dataset as input

Models	Accuracy [%]	Average Precision [%]	Average Recall [%]	Average F1-score [%]	# Param	Train Time
Traditional CapsNet (Baseline) [18]	78.93	80.68	78.17	79.41	14M	93s
P. Afshar et al. [5] CapsNet	78.00	X	X	X	X	X
Vimal Kurup et al. [50] CapsNet	92.60	X	X	X	X	X
MLAF-CapsNet (with CLAHE) (Ours)	93.40	94.21	93.18	93.69	10M	47s
MLAF-CapsNet (No CLAHE) (Ours)	92.39	93.00	92.04	92.52	10M	47s

Table 3

Comparison of accuracy on architectures trained and validated on the original segmented tumor data as input

Models	Accuracy [%]	Average Precision [%]	Average Recall [%]	Average F1-score [%]	# Param	Train Time
Traditional CapsNet (Baseline) [18]	87.30	86.30	86.67	86.49	12M	59s
P. Afshar et al. [5] CapsNet	86.56	X	X	X	X	X
MLAF-CapsNet (with CLAHE) (Ours)	96.60	96.55	94.50	95.51	9.7M	35s
MLAF-CapsNet (No CLAHE) (Ours)	93.42	92.50	92.65	92.58	9.7M	35s

4.1 Model comparison with State-of-the-Art methods

This section compares the models on the two image types: whole-brain tumor images and segmented tumor images.

4.1.1 Original whole-brain image

Table 2 compares the architectures trained and validated on the whole-brain tumor data in terms of model accuracy, average precision, average recall, average F1-score, number of model parameters (# param), and training time. The MLAF-CapsNet with CLAHE achieved 93.40% accuracy, an improvement of 14.47%, 15.40%, and 0.80% over the traditional CapsNet (78.93%), Afshar et al. [5] (78.00%), and Vimal Kurup et al. [50] (92.60%), respectively. The MLAF-CapsNet with CLAHE also had an error rate of 6.60%, which is lower than that of the other models. The ablation model, MLAF-CapsNet without CLAHE, achieved 92.39%, which is higher than the traditional CapsNet and the model of [5]. Figure 8 compares the training, validation, and loss curves of the proposed MLAF-CapsNet architecture and the traditional CapsNet model. The confusion matrices in Fig. 9 show that the proposed model produced the best results in terms of accuracy, sensitivity, and specificity. The percentage of correctly classified images is shown on the diagonal. The last column corresponds to specificity, whereas the last row represents the sensitivity. The overall accuracy is shown at the bottom right. Figure 10 presents the area under curve (AUC) comparison of the trained models.
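For reference, the metrics reported in the tables can be derived from a confusion matrix as sketched below. The sketch assumes the layout stated in Section 4 (rows are predicted classes, columns are actual classes) and uses a made-up 3-class matrix, not values from the paper. The function name `confusion_metrics` is illustrative.

```python
import numpy as np

def confusion_metrics(cm):
    """Accuracy and per-class precision, recall, F1 from a confusion matrix.

    cm[i, j] is the number of samples predicted as class i whose
    actual class is j (rows: predicted, columns: actual).
    """
    cm = np.asarray(cm, dtype=np.float64)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=1)   # correct / everything predicted as class
    recall = tp / cm.sum(axis=0)      # correct / everything actually in class
    f1 = 2.0 * precision * recall / (precision + recall)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical 3-class example with 10 actual samples per class.
cm = np.array([[8, 1, 0],
               [2, 9, 1],
               [0, 0, 9]])
metrics = confusion_metrics(cm)
macro_precision = metrics["precision"].mean()
macro_recall = metrics["recall"].mean()
```

Averaging the per-class values gives macro-averaged scores, which weight every class equally regardless of its size.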

Fig. 8

Training, validation accuracy and loss curve on whole-brain image data. (a) Training and validation accuracy for MLAF-CapsNet with CLAHE, MLAF-CapsNet without CLAHE and the Traditional CapsNet and (b) Training and validation Loss for MLAF-CapsNet with CLAHE, MLAF-CapsNet without CLAHE and the Traditional CapsNet. Best view in color.

Fig. 9

Confusion matrices on the whole-brain image.

Fig. 10

Visualization of area under curve (AUC). (a) MLAF-CapsNet with CLAHE (b) MLAF-CapsNet without CLAHE and (c) Traditional CapsNet.

4.1.2 Original segmented image

Table 3 compares the architectures trained and validated on the segmented tumor data in terms of model accuracy, average precision, average recall, average F1-score, number of model parameters (# param), and training time. The proposed MLAF-CapsNet with CLAHE obtained the best result of 96.60%, an improvement of 9.30% and 10.04% over the traditional CapsNet and [5], respectively. An ablation study was performed to evaluate the effect of CLAHE in the MLAF-CapsNet model by removing the CLAHE layer, so that the ASFP lanes receive input directly from the input layer without intermediate image enhancement. This variant achieved 93.42% accuracy, which is a 6.12% and 6.86% improvement over the traditional CapsNet and the previous work by [5], respectively. Figure 11 compares the training, validation, and loss curves of the proposed MLAF-CapsNet with CLAHE and the traditional CapsNet architecture. In the confusion matrices shown in Fig. 12, MLAF-CapsNet with CLAHE achieves the best classification with the lowest error rate of 3.40%. The percentage of correctly classified images is presented on the diagonal. The last column represents the specificity, while the last row represents the sensitivity. The total accuracy is shown at the bottom right. The traditional CapsNet produced a high error rate of 12.70%, with more false positives and false negatives than (a) MLAF-CapsNet with CLAHE and (b) MLAF-CapsNet without CLAHE. In Table 3, average precision, recall, and F1-score are reported to account for the imbalance between the tumor classes in the database. Figure 13 compares the area under the curve (AUC) of the trained models, for which the proposed MLAF-CapsNet with CLAHE shows promising results.

Fig. 11

Training, validation accuracy and loss curve on segmented tumor data. (a) Training and validation accuracy for MLAF-CapsNet with CLAHE, MLAF-CapsNet without CLAHE and the Traditional CapsNet and (b) Training and validation Loss for MLAF-CapsNet with CLAHE, MLAF-CapsNet without CLAHE and the Traditional CapsNet. Best viewed in color.

Fig. 12

Confusion matrices on the segmented tumor.

Fig. 13

Visualization of area under curve (AUC). (a) MLAF-CapsNet with CLAHE (b) MLAF-CapsNet without CLAHE and (c) Traditional CapsNet.

4.2 Whole-brain image versus segmented tumor

The proposed CapsNet-based model is compared on both whole-brain tumor and segmented tumor images. Figure 14 compares the results of the architectures using whole-brain tumor and segmented tumor images as input. In general, a CapsNet model attends to everything in the input image, including the background. Because the whole-brain MRI images were taken from several different planes (Axial, Coronal, and Sagittal), the image background has more variation. Therefore, the ability of CapsNet to handle whole-brain images is limited compared to segmented tumor images. Nevertheless, the CapsNet-based models in Fig. 14 achieved state-of-the-art results for brain tumor classification.

Fig. 14

Comparison of model accuracy for brain and segmented tumor images.

4.3 Impact of CLAHE

Image enhancement with the CLAHE method produced a well-distributed intensity histogram across the image. Furthermore, the tumor is well separated from the surrounding tissue and clearly visible. Figure 15 shows an example of the enhancement using the CLAHE method. The performance obtained with CLAHE as a base layer indicates that it provides better feature information and improves network accuracy.
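
The contrast-limiting idea behind CLAHE can be sketched for a single image tile as follows. This is a simplified illustration only: full CLAHE also splits the image into tiles and interpolates bilinearly between neighbouring tile mappings, both of which are omitted here. The clip limit and the pixel values are hypothetical and are not the settings used in this work.

```python
def clipped_equalization_map(tile, clip_limit, levels=256):
    """Build an intensity lookup table for one tile from a clipped histogram."""
    hist = [0] * levels
    for row in tile:
        for value in row:
            hist[value] += 1

    # Clip every bin at clip_limit and collect the excess counts.
    excess = 0
    for i in range(levels):
        if hist[i] > clip_limit:
            excess += hist[i] - clip_limit
            hist[i] = clip_limit

    # Redistribute the excess uniformly over all bins. In this simple scheme a
    # bin may end up slightly above clip_limit again after redistribution.
    share, remainder = divmod(excess, levels)
    for i in range(levels):
        hist[i] += share + (1 if i < remainder else 0)

    # The scaled cumulative distribution becomes the intensity mapping.
    n_pixels = sum(hist)
    lut, cumulative = [], 0
    for count in hist:
        cumulative += count
        lut.append(round((levels - 1) * cumulative / n_pixels))
    return lut


def apply_map(tile, lut):
    """Map every pixel of the tile through the lookup table."""
    return [[lut[value] for value in row] for row in tile]


# Hypothetical low-contrast 4x4 tile with intensities between 100 and 103.
tile = [[100, 100, 101, 101],
        [100, 101, 102, 102],
        [101, 102, 103, 103],
        [100, 101, 102, 103]]
lut = clipped_equalization_map(tile, clip_limit=4)
enhanced = apply_map(tile, lut)
for row in enhanced:
    print(row)
```

Clipping the histogram before equalization bounds the slope of the mapping, which limits how strongly noise in nearly uniform regions is amplified compared with plain adaptive histogram equalization.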

Fig. 15

Enhanced image using CLAHE.

4.4 Impact of atrous

Better contextual features improve classification accuracy in deep learning. Figure 16 compares the atrous convolution’s activation maps in the (a) MLAF-CapsNet architecture and (b) the traditional CapsNet. It shows the output feature maps when the input is processed with MLAF-CapsNet and the traditional CapsNet models.

Fig. 16

Comparison of convolution feature maps. (a) Feature maps of the atrous convolution from MLAF-CapsNet and (b) Feature map from convolutional layer from the traditional CapsNet.

Figure 16 shows that the feature maps of the atrous convolution layers in the MLAF-CapsNet capture more detail than those of the convolution layer in the traditional CapsNet. The resolution and contextual features of the (a) atrous convolutions are much enhanced and the tumors are clearly visible, whereas in the (b) traditional CapsNet the tumors are lost in the feature maps. In the feature maps of the traditional CapsNet, Fig. 16(b), the resolution can be too low to produce a good classification result. On the other hand, in Fig. 16(a), the atrous convolutions retain more details of the tumor, which supports better classification accuracy.
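
To make the receptive-field effect of atrous (dilated) convolution concrete, the following one-dimensional sketch applies the same three-tap kernel with different dilation rates. A kernel of size k with rate r spans k + (k - 1)(r - 1) input positions while using the same number of weights. The kernel, signal, and rates are hypothetical and do not correspond to the MLAF-CapsNet configuration.

```python
def effective_kernel_size(kernel_size, rate):
    """Input span covered by a kernel whose taps are spaced `rate` samples apart."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


def atrous_conv1d(signal, kernel, rate=1):
    """'Valid' 1-D atrous convolution (cross-correlation form, stride 1)."""
    span = effective_kernel_size(len(kernel), rate)
    output = []
    for start in range(len(signal) - span + 1):
        # Taps are read `rate` samples apart instead of from adjacent samples.
        output.append(sum(w * signal[start + j * rate]
                          for j, w in enumerate(kernel)))
    return output


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]  # three weights in every case

for rate in (1, 2, 3):
    print(rate, effective_kernel_size(len(kernel), rate),
          atrous_conv1d(signal, kernel, rate))
```

With zero padding and stride 1, the output keeps the length of the input, so the receptive field grows with the rate without any pooling or downsampling of the feature map.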

The results in Tables 2 and 3 show that, on average, the training time of the atrous convolution model is reduced under the same experimental environment. As the number of training rounds increases, the training accuracy of both the MLAF-CapsNet model and the traditional CapsNet model increases, and the training accuracy of the MLAF-CapsNet model is always higher than that of the traditional CapsNet model. Figure 17 presents a sample of reconstructed images from the MLAF-CapsNet. The tumors in the reconstructed brain image (Fig. 17(b)) retain more detailed information and are very clear.

Fig. 17

(a) Reconstructed image from the MLAF-CapsNet and (b) Reconstructed images. Red highlighted rectangle indicates tumor.

5 Conclusion

This paper proposed a new capsule network-based architecture called the multi-lane atrous feature fusion capsule network (MLAF-CapsNet) for brain tumor classification. The MLAF-CapsNet introduces contrast limited adaptive histogram equalization (CLAHE) as a base layer to enhance the input image before forwarding it to the high-level convolutional layer. The atrous convolutional layers were used with the capsule network to integrate the contextual information of the tumor image. We capitalized on the ability of atrous convolution layers to increase the receptive fields without losing resolution. The MLAF-CapsNet achieved high accuracy (93.40% and 96.60%) on the whole-brain image and segmented tumor datasets, with a lower error rate than the traditional CapsNet (78.93% and 87.30%). Similarly, on the augmented images, MLAF-CapsNet produced the best results (98.48% and 98.82%) on the whole-brain image and segmented tumor datasets, outperforming the traditional CapsNet (76.90% and 82.46%) and other proposed works. Furthermore, based on our evaluations and experiments, the models produce better results on the segmented tumor images than on the whole-brain images. Our results show that the MLAF-CapsNet model generalizes well across images and executes efficiently. This indicates that our method can be adopted as a supporting tool for radiologists.

Acknowledgment

This work was supported by the National Natural Science Foundation of China (NSFC Grant No. 61550110248) and the Sichuan Science and Technology Program (Grant No. 2019YFG0190). The authors would like to thank the editor and the reviewers for their helpful suggestions and valuable comments.

Conflict of interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability: The brain tumor data that support the findings of this study are openly available at

References

1. B.T. Statistics, American Brain Tumor Association. January (2017).

2. Key Statistics for Brain and Spinal Cord Tumors, American Cancer Society, January 4 (2018).

3. Siegel R.L., Miller K.D. and Jemal, Cancer Statistics, A Cancer Journal for Clinicians 67(1) (2017), 7–30.

4. Zhang, Li, Peng and Wang, Improve Glioblastoma Multiforme Prognosis Prediction by Using Feature Selection and Multiple Kernel Learning, IEEE/ACM Transactions on Computational Biology and Bioinformatics 13(5) (2016), 825–835.

5. Afshar, Mohammadi and Plataniotis K.N., Brain Tumor Type Classification via Capsule Network, Proceedings – International Conference on Image Processing, ICIP, (2018), 3129–3133.

6. Ravi, Fabelo, Callic G.M. and Yang G.Z., Manifold Embedding and Semantic Segmentation for Intraoperative Guidance With Hyperspectral Brain Imaging, IEEE Transactions on Medical Imaging 36(9) (2017), 1845–1857.

7. Marsousi, Plataniotis K.N. and Stergiopoulos, An Automated Approach for Kidney Segmentation in Three Dimensional Ultrasound Images, IEEE Journal of Biomedical and Health Informatics 21(4) (2017), 1079–1094.

8. Sanjun, Price C.J., Mancini, Josse, Grogan, Yamamoto A.K., Geva, Leff A.P., Yousry T.A. and Seghier M.L., Automated Identification of Brain Tumors from Single MR Images based on Segmentation with Refined Patient-Specific Priors, Frontiers in Neuroscience (2013).

9. Roy, Nag, Maitra I.K. and Bandyopadhyay S.K., A Review on Automated Brain Tumor Detection and Segmentation from MRI of Brain, CoRR (2013).

10. and Plataniotis K.N., Color Model Comparative Analysis for Breast Cancer Diagnosis using H and E Stained images, Proceedings of SPIE (2015).

11. Havaei, Davy, Warde-Farley, Biard, Courville, Bengio, Pal C., Jodoin and Larochelle, Brain Tumor Segmentation with Deep Neural Networks, Medical Image Analysis 35 (2017), 18–31.

12. Usman and Rajpoot, Brain Tumor Classification from Multi-Modality MRI using Wavelets and Machine Learning, Pattern Analysis and Applications 20(3) (2017), 871–881.

13. Wei, Feng, Liang, Cheng M.-M., Zhao and Yan, Object region mining with adversarial erasing: A simple classification to semantic segmentation approach, in IEEE CVPR, (2017).

14. Cheng, Huang, Cao, Yang, Yun, Wang and Feng, Enhanced Performance of Brain Tumor Classification via Tumor Region Augmentation and Partition, PloS One (2015).

15. Abbadi N.K.E. and Kadhim N.E., Brain Cancer Classification Based on Features and Artificial Neural Network, International Journal of Advanced Research in Computer and Communication Engineering 8(1) (2017).

16. Krizhevsky, Sutskever and Hinton G.E., ImageNet classification with deep convolutional neural networks, in Proc. Adv. Neural Inf. Process. Syst. (NIPS), (2012), 1097–1105.

17. Paul J.S., Plassard A.J., Landman B.A. and Fabbri, Deep learning for brain tumor classification, Proc. SPIE, Med. Imag., Biomed. Appl. Mol., Struct., Funct. Imag. 10137 (2017), Art. no. 1013710. doi:10.1117/12.2254195

18. Sabour, Frosst and Hinton G.E., Dynamic Routing Between Capsules, 31st Conference on Neural Information Processing Systems (NIPS 2017), (2017).

19. Atefeh, Parnian, Konstantinos and Arash N.M., Improved explainability of capsule networks: Relevance path by agreement, 2018 IEEE Global Conference on Signal and Information Processing, GlobalSIP 2018 – Proceedings (2019), 549–553.

20. Thillaikkarasi and Saravanan, An Enhancement of Deep Learning Algorithm for Brain Tumor Segmentation Using Kernel Based CNN with M-SVM, (2019), 1–7.

21. Reza A.L.I.M., Realization of the Contrast Limited Adaptive Histogram Equalization (CLAHE) for Real-Time Image Enhancement, (2004), 35–44.

22. Bhat and T. P. M. S, Adaptive Clip Limit for Contrast Limited Adaptive Histogram Equalization (CLAHE) Of Medical Images Using Least Mean Square Algorithm, 978 (2014), 1259–1263.

23. Aykut, An Improvement on Grab Cut with CLAHE for the Segmentation of the Objects, Springer International Publishing (2018).

24. Setiawan A.W., Mengko T.R., Santoso O.S. and Suksmono A.B., Color Retinal Image Enhancement using CLAHE, 5–7.

25. Kharel, Alsadoon, Prasad P.W.C. and Elchouemi, Early Diagnosis of Breast Cancer Using Contrast Limited Adaptive Histogram Equalization (CLAHE) and Morphology Methods, (2017), 120–124.

26. Kaur, Singh and Vig, Medical Fusion of CLAHE Images Using SWT and PCA for Brain Disease Analysis, Springer Singapore.

27. Mishra, Enhancement of Underwater Images using Improved CLAHE, 5 (2007), 1–2.

28. , Celenk and Wu, An improved algorithm based on CLAHE for ultrasonic well logging image enhancement, Cluster Comput 22(5) (2019), 12609–12618. doi:10.1007/s10586-017-1692-8

29. Ganesan and Rajendran A.J., An Efficient Finger Vein Image Enhancement and Pattern Extraction Using CLAHE and Repeated Line Tracking Algorithm, Springer International Publishing (2020).

30. Gupta and Tiwari, A tool supported approach for brightness preserving contrast enhancement and mass segmentation of mammogram images using histogram modified grey relational analysis, Multidimens Syst Signal Process 28(4) (2017), 1549–1567. doi:10.1007/s11045-016-0432-1

31. Wahid F.F., Sugandhi and Raju, Two Stage Histogram Enhancement Schemes to Improve Visual Quality of Fundus Images, Springer Singapore (2018).

32. Chen L.C., Zhu, Papandreou, Schroff and Adam, Encoder-decoder with atrous separable convolution for semantic image segmentation, in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018). doi:10.1007/978-3-030-01234-2-49

33. Long, Shelhamer and Darrell, Fully Convolutional Networks for Semantic Segmentation, IEEE Trans Pattern Anal Mach Intell (2015). doi:10.1109/TPAMI.2016.2572683

34. Ghiasi and Fowlkes C.C., Laplacian pyramid reconstruction and refinement for semantic segmentation, in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016). doi:10.1007/978-3-319-46487-9-32

35. Lin, Milan, Shen and Reid, RefineNet: Multipath refinement networks for high-resolution semantic segmentation, in Proceedings – 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, (2017). doi:10.1109/CVPR.2017.549

36. Pohlen, Hermans, Mathias and Leibe, Full-resolution residual networks for semantic segmentation in street scenes, in Proceedings – 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, (2017). doi:10.1109/CVPR.2017.353

37. Peng, Zhang, Yu, Luo and Sun, Large kernel matters – Improve semantic segmentation by global convolutional network, in Proceedings – 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, (2017). doi:10.1109/CVPR.2017.189

38. Badrinarayanan, Kendall and Cipolla, SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation, IEEE Trans Pattern Anal Mach Intell (2017). doi:10.1109/TPAMI.2016.2644615

39. Ronneberger, Fischer and Brox, U-net: Convolutional networks for biomedical image segmentation, in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), (2015). doi:10.1007/978-3-319-24574-4-28

40. Zhao, Shi, Qi, Wang and Jia, Pyramid scene parsing network, in Proceedings – 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, (2017). doi:10.1109/CVPR.2017.660

41. Holschneider, Kronland-Martinet, Morlet and Tchamitchian, A Real-Time Algorithm for Signal Analysis with the Help of the Wavelet Transform, (1989).

42. Giusti, Cireşan D.C., Masci, Gambardella L.M. and Schmidhuber, Fast image scanning with deep max-pooling convolutional neural networks, in 2013 IEEE International Conference on Image Processing, ICIP 2013 – Proceedings, (2013). doi:10.1109/ICIP.2013.6738831

43. Sermanet, Eigen, Zhang, Mathieu, Fergus and LeCun, Overfeat: Integrated recognition, localization and detection using convolutional networks, in 2nd International Conference on Learning Representations, ICLR 2014 – Conference Track Proceedings, (2014).

44. Papandreou, Kokkinos and Savalle P.A., Modeling local and global deformations in Deep Learning: Epitomic convolution, Multiple Instance Learning, and sliding window detection, in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, (2015). doi:10.1109/CVPR.2015.7298636

45. Cheng, Yang, Huang, Jiang, Zhou, Yang, Zhao, Feng and Chen, Retrieval of Brain Tumors by Adaptive Spatial Pooling and Fisher Vector Representation, PloS One (2016).

46. and Koltun, Multi-scale context aggregation by dilated convolutions, in ICLR, (2016).

47. Chen L.C., Papandreou, Kokkinos, Murphy and Yuille A.L., Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully-connected crfs, IEEE Transactions on Pattern Analysis and Machine Intelligence PP(99) (2017), 1–1.

48. Chen L.-C., Papandreou, Schroff F.N. and Adam, Rethinking atrous convolution for semantic image segmentation, CoRR abs/1706.05587, (2017).

49. Wong S.C., Gatt, Stamatescu and McDonnell M.D., Understanding data augmentation for classification: When to warp? (2016), arXiv:1609.08764. [Online]. Available:

50. Vimal Kurup, Sowmya and Soman K.P., Effect of Data Pre-processing on Brain Tumor Classification Using Capsulenet, in ICICCT 2019 – System Reliability, Quality Control, Safety, Maintenance and Management, (2020).

51. Sultan H.H., Salem N.M. and Al-Atabany, Multi-Classification of Brain Tumor Images Using Deep Neural Network, IEEE Access 7 (2019), 69215–69225. doi:10.1109/ACCESS.2019.2919122

52. Badža M.M. and Barjaktarović M.C., Classification of brain tumors from MRI images using a convolutional neural network, Appl Sci (2020). doi:10.3390/app10061999

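| Dataset | Tumor type | Original images: train set | Original images: validation set | Augmented images: train set | Augmented images: validation set |
|---|---|---|---|---|---|
| Whole-brain tumor | Meningioma | 822 | 115 | 4932 | 155 |
| Whole-brain tumor | Glioma | 826 | 100 | 4956 | 100 |
| Whole-brain tumor | Pituitary | 827 | 74 | 4962 | 74 |
| Whole-brain tumor | Normal | 395 | 105 | 2370 | 105 |
| Whole-brain tumor | Total data | 2870 | 394 | 17220 | 394 |
| Segmented brain tumor | Meningioma | 534 | 174 | 3204 | 174 |
| Segmented brain tumor | Glioma | 1058 | 368 | 6348 | 368 |
| Segmented brain tumor | Pituitary | 708 | 222 | 4248 | 222 |
| Segmented brain tumor | Total data | 2300 | 764 | 13800 | 764 |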