Abstract
The chest X-ray examination is one of the most important methods for screening and diagnosing many lung diseases, and diagnosis of pneumonia by chest X-ray is a common method used by medical experts. However, chest X-ray images have some defects, such as low contrast, overlapping organs and blurred boundaries, which seriously affect the detection of pneumonia. Therefore, constructing a stable and accurate automatic pneumonia detection model from a large number of chest X-ray images has important medical value and practical significance. In this paper, we propose a novel hybrid system for detecting pneumonia from chest X-ray images: ACNN-RF, an adaptive median filter Convolutional Neural Network (CNN) recognition model based on Random Forest (RF). Firstly, improved adaptive median filtering is employed to remove noise from the chest X-ray image, which makes the image easier to recognize. Secondly, we establish a CNN architecture based on Dropout to extract deep activation features from each chest X-ray image. Finally, we employ an RF classifier based on the GridSearchCV class as the classifier for the deep activation features of the CNN model. This not only avoids over-fitting during training, but also improves the accuracy of image classification. The public chest X-ray image dataset used in the experiment contains 5863 images, of which 4265 pneumonia images and 1574 normal images are used. The average recognition rate of pneumonia reaches 97% with the proposed ACNN-RF. The experimental results show that the ACNN-RF identification system is more effective than previous traditional image identification systems.
Introduction
Medical imaging is one of the most important sources of information in the diagnosis of diseases. Since the German physicist Roentgen discovered X-rays in 1895 [1], various imaging technologies have developed rapidly, for example magnetic resonance imaging (MRI) [2], ultrasound imaging [3], computed tomography (CT) [4] and positron emission computed tomography (PET) [5]. Although these advanced imaging techniques have been widely studied and applied, traditional X-ray diagnosis is still one of the important diagnostic tools in radiology. The chest contains many tissues that provide varied information about the human body, so lung disease, cardiopulmonary coefficient, rib fracture and injury, etc. can be identified and determined from a chest X-ray film. Owing to its low price and weak radiation dose, chest X-ray radiography accounts for more than 40% of all radiographic diagnoses [6], which reflects its important application value in the medical field.
Pneumonia is a disease caused by pulmonary infection. According to pathological morphology, pneumonia can be divided into lobar pneumonia, interstitial pneumonia and bronchopneumonia, and different kinds of pneumonia need different treatment schemes. Chest X-ray (shown in Fig. 1) is the most effective method for the diagnosis of pneumonia and plays a vital role in clinical nursing and epidemiological research [7]. However, some chest X-ray images are very similar. Doctors have to judge according to lung texture, patchy shadow, shadow density, inflammation site and so on, and even experienced doctors are prone to misjudgment and misdiagnosis. Therefore, the identification of pneumonia in chest X-ray images is a challenging task that depends on the personal experience of doctors.

Illustrative examples of chest X-rays. The normal chest X-ray (left) shows clear lungs without any abnormal opacity, whereas the pneumonia-positive chest X-ray shows 'fog'-like discoloration.
With the increasing popularity of computers and the development of imaging technology, people have realized that computers can effectively help doctors diagnose diseases. Pneumonia in X-ray images often appears with low contrast, overlapping organs and blurred boundaries, so detecting pneumonia in chest radiography can be difficult for radiologists [8]. Traditional pneumonia identification systems (as shown in Fig. 2) consist of the following stages: data pre-processing (such as de-noising, color correction, etc.), image segmentation, feature extraction, feature selection and classification. The image segmentation stage affects the accuracy of the whole recognition system [9]. The feature extraction stage extracts different features (such as morphologic, color, texture, etc.). The classification stage often uses machine learning methods (such as SVM, KNN, etc.). However, imbalance in a large amount of data affects the accuracy of the recognition system. Image segmentation errors (such as over-segmentation, under-segmentation, etc.) have a negative effect on the accuracy of the recognition system. Complex features are difficult to extract or take a long time to extract. Meanwhile, traditional identification systems use multi-step classification, which increases system complexity.

A traditional chest X-ray image identification system.
To overcome the above problems, deep learning (DL) [10–12] is widely used in pneumonia recognition systems. It needs neither an image segmentation stage nor a manual feature extraction stage, because the deep features are extracted by the Convolutional Neural Network (CNN) [13, 14]. In recent years, many researchers have used DL technology to detect pneumonia in chest X-rays. Yao et al. [15] used Welch power spectrum estimation and the wavelet transform to obtain statistical values of lung sound signals and then used a BP neural network for identification, but this method tends to produce blocky shadows in the smooth areas of the image. Lakhani et al. [16] used convolutional neural networks to automatically classify tuberculosis. Rajpurkar et al. [17] developed the CheXNet algorithm, which can detect pneumonia from chest X-rays at a level exceeding that of practicing radiologists. Wang et al. [18] released ChestX-ray8, an order of magnitude larger than previous datasets of its kind, and also benchmarked different convolutional neural network architectures pre-trained on ImageNet. Nevertheless, the CNN is a learning method based on the principle of empirical risk minimization. When the training sample is large enough, it learns well, but when the training sample is not large enough, it usually over-fits. Therefore, this paper combines deep learning with machine learning so that the advantages of both complement each other and the final hybrid model performs better than a single classifier.
In this paper, we propose the hybrid ACNN-RF pneumonia recognition model, which uses a CNN based on improved adaptive median filtering (as shown in Section 2.2.1) to extract high-level features from the chest X-ray image. Meanwhile, by comparing different node retention probabilities p of the dropout layer, a relatively suitable CNN is selected, which can effectively reduce the over-fitting of the CNN. In this novel hybrid identification system, the improved adaptive median filtering has a good de-noising effect, high efficiency and a small computational cost, which helps the CNN extract deep features. The extracted image features are then used to train an RF classifier based on the GridSearchCV class (as shown in Section 2.2.2). This identification system not only considers the local and global information of the image features, but also makes full use of the advantages of the RF classification algorithm to address over-fitting and shorten the running time of the program.
In summary, we evaluated the performance of a pre-trained CNN based on adaptive median filtering as a feature extractor for classifying normal and pneumonia images. The major contributions of this paper are as follows. a) We constructed a CNN feature extraction model based on improved adaptive median filtering, which can extract deep features from the chest X-ray image and effectively reduce the run time of the program. b) A dropout layer is added after each pooling layer in the CNN model, and a relatively suitable CNN is selected by comparing different node retention probabilities p of the dropout layer, thus effectively reducing the risk of over-fitting of the CNN model. c) We investigated the novel hybrid identification system ACNN-RF. The design of the RF classifier based on the GridSearchCV class and the GPU-based parallel implementation of the CNN not only avoid over-fitting but also handle the non-linear problem, which greatly shortens the running time of the classification model.
Data processing
In this paper, chest X-ray images (anterior-posterior) were selected from retrospective cohorts of pediatric patients aged one to five years from Guangzhou Women and Children’s Medical Center, Guangzhou. All the chest X-ray images were acquired as part of the patients’ routine clinical care, and all images have been desensitized. For the analysis of chest X-ray images, all chest radiographs were initially screened for quality control by removing all low-quality or unreadable scans. The diagnoses for the images were then classified by two expert physicians. The dataset contains subfolders for each image category (Pneumonia/Normal) [19]. There are 5,863 X-ray images (JPEG) and 2 categories (Pneumonia/Normal). After initializing the data and removing the images that do not meet the input criteria, the number of pneumonia samples is 4265 and the number of normal samples is 1574. The experimental data distribution is shown in Fig. 3.

Experimental data distribution (Normal represents a normal chest X-ray image; Pneumonia represents a chest X-ray image of pneumonia).
The dataset is divided into a training set and a test set in this paper. Figure 3 shows that the data is biased toward pneumonia-positive results. Since the samples in the original dataset are not balanced, we select the training set and test set according to the distribution of positive and negative samples in the dataset. As a result, the positive and negative samples have similar distributions in the test set and the training set (as shown in Table 1), which effectively addresses the problem of data imbalance.
Distribution of training set and test set
To ensure that gradients do not diverge and to accelerate the convergence of network training, the image data is unified and normalized: pixel values in the range 0-255 are converted to the range 0-1. Finally, 70% of the dataset was randomly selected as the training set and 30% as the test set. To ensure the consistency of the dataset and reduce computing time, the input images were re-sampled to 50×50, 64×64, 80×80, 100×100 and 224×224. After the convolution, pooling and fully connected layers, the output feature size is 256. The format of the initialized data is shown in Table 2:
Data initialization
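The normalization and the class-preserving 70/30 split described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `normalize_pixels` and `stratified_split`, the random seed, and the use of NumPy are assumptions, and the label vector is synthetic, built only from the class counts reported in the text.

```python
import numpy as np

def normalize_pixels(images):
    """Scale 8-bit pixel values from the range 0-255 to the range 0-1."""
    return images.astype(np.float32) / 255.0

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split indices so that each class keeps roughly train_fraction in training."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)   # indices of this class
        rng.shuffle(idx)                      # random selection within the class
        cut = int(round(train_fraction * len(idx)))
        train_idx.extend(idx[:cut])
        test_idx.extend(idx[cut:])
    return np.array(train_idx), np.array(test_idx)

# Synthetic labels with the reported class counts: 4265 pneumonia (1), 1574 normal (0).
labels = np.array([1] * 4265 + [0] * 1574)
train_idx, test_idx = stratified_split(labels)

print("pneumonia share, full set :", labels.mean())
print("pneumonia share, train set:", labels[train_idx].mean())
print("pneumonia share, test set :", labels[test_idx].mean())
```

Splitting per class keeps the pneumonia-to-normal ratio nearly identical in both subsets, which is the property the text attributes to Table 1.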
During our experiments, the chest X-ray image dataset was used to train and test the hybrid ACNN-RF classification model. The pneumonia detection algorithms were implemented in Python 3.5. The system platform contained an Intel® Xeon® E5-2620 V3 with eight 16GB DDR4 memory modules (128GB RAM). GPU computation used an NVIDIA Tesla K80 with 128GB RAM.
In this paper, we propose a novel hybrid system for detecting pneumonia from chest X-ray images: ACNN-RF, which is designed to integrate the improved adaptive median filtering, an RF classifier based on the GridSearchCV class and a CNN based on Dropout. We first briefly introduce the improved adaptive median filtering theory in Section 2.2.1, the RF based on the GridSearchCV class in Section 2.2.2 and the CNN structure based on Dropout in Section 2.2.3. Then, the novel hybrid ACNN-RF model is presented in Section 2.2.4. Finally, an analysis of the merits of the ACNN-RF model is given at the end of this section.
Evaluation metrics of model performance
The performance of traditional image identification systems is evaluated with several evaluation parameters such as accuracy (Acc), precision rate (P), recall rate (R), F measure (F) and so on [20]. The performance of the CNN-RF classification model is evaluated using similar metrics in our experiment. The accuracy of the ACNN-RF classification model is obtained with 5-fold cross-validation. Meanwhile, we used the confusion matrix to visualize the performance of the classification model; it contains the True positive (TP), True negative (TN), False positive (FP) and False negative (FN) counts [21]. The averages of several evaluation parameters are used to evaluate the performance of the classification model; for example, F_β is the harmonic average of P and R. The evaluation parameters are defined as follows:
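The formulas are not reproduced here, so the sketch below uses the standard textbook definitions of these metrics in terms of the confusion-matrix counts: Acc = (TP + TN) / (TP + TN + FP + FN), P = TP / (TP + FP), R = TP / (TP + FN), and F_β = (1 + β²)·P·R / (β²·P + R). The function name and the example counts are hypothetical and are not results from this paper; whether the authors used β = 1 is an assumption.

```python
def classification_metrics(tp, tn, fp, fn, beta=1.0):
    """Standard accuracy, precision, recall and F-beta from confusion-matrix counts.

    Assumes tp + fp > 0 and tp + fn > 0 (no zero-division handling).
    """
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_beta": f_beta}

# Hypothetical counts chosen only to exercise the formulas.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With β = 1 the F measure reduces to the harmonic mean of P and R, which matches the description in the text.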
Adaptive median filtering [22] changes the size of the noise detection window to detect noise among the pixels in the neighborhood. For a window in which noise points are detected, the median pixel value is used as the filter output; for a window without noise points, the original value is output. The fundamental purpose of filtering the training set is to remove image noise and make the images easier to distinguish. Compared with linear filtering, nonlinear filtering can remove more image noise while keeping the image clear and realistic. However, a local extremum point is regarded as a noise point, so the window filter that identifies noise points can easily mistake an image edge point for a noise point. The mean filter smooths image noise well. Therefore, combined with the mean filter, the adaptive median filter can remove noise while preserving the details of the original image.
Definition of symbols: S_xy denotes the area covered by the filter window, the center of which is the pixel at column x, row y of the image; f_ij represents the gray value of pixel (i, j) in S_xy; f_max and f_min are the maximum and minimum gray values in S_xy, respectively; f_med represents the median gray value in S_xy; S_max represents the maximum preset window size.
In this paper, a modified mean filter [23] is selected, which removes the pixels with gray values of 0 and 255 from g(s, t) in the neighborhood S_xy, where g(s, t) is an m × n image contaminated by noise. Assuming that there are p pixels with a gray value of 0 and q pixels with a gray value of 255 in g(s, t), the modified mean filter is expressed as follows:
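The formula itself is not reproduced here. One plausible reading of the description, offered as an assumption rather than the authors' exact expression, is that the output is the mean of the remaining mn - p - q pixels: f(x, y) = (sum of g(s, t) over the pixels in S_xy whose value is neither 0 nor 255) / (mn - p - q). A minimal sketch of that reading follows; the function name and the fallback for a window made up entirely of extreme values are hypothetical.

```python
import numpy as np

def modified_mean(window):
    """Mean of the window pixels after discarding the values 0 and 255."""
    window = np.asarray(window, dtype=np.float64)
    valid = window[(window != 0) & (window != 255)]
    if valid.size == 0:
        # Assumption: when every pixel is 0 or 255, fall back to the plain mean.
        return float(window.mean())
    return float(valid.mean())

# A 3 x 3 window with p = 2 zero pixels and q = 2 pixels equal to 255.
window = [[0, 100, 255],
          [110, 120, 0],
          [255, 130, 140]]
print(modified_mean(window))  # mean of 100, 110, 120, 130, 140
```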
Compared with traditional median filtering, adaptive median filtering tries to preserve the details of the image while smoothing non-impulse noise [24]. The main principle of adaptive median filtering is to limit the working area to a rectangular area S_xy centered at point (x, y). The size of S_xy is adjusted according to the conditions encountered during filtering, and the output value of the pixel is replaced by the output of the filter. The mathematical expression of median filtering is defined as follows.
Here, k is the number of pixels in the filter window; n is the ranked gray value sequence; x is the abscissa of the pixel; and y is the ordinate of the pixel.
The method of improved adaptive median filter is as follows:
Assume that the filter window of pixel (i, j) is S_ij; S_max represents the maximum preset window size; f_ij represents the gray value of the pixel; f_max and f_min are the maximum and minimum gray values in the window, respectively; and f_med represents the median gray value in the window.
(1) Monitor the noise of the image pixel values. Calculate A1 = f_med - f_min and A2 = f_med - f_max.
(2) Determine the size of the filter window based on the noise value. The initial window size is 3 × 3. If A1 > 0 and A2 < 0, continue directly to Step 3. Otherwise, increase the size of the window (S_ij + 2) and repeat Step 1. If S_ij > S_max, there are noise points in the window, and the output value of the pixel is the result of the modified mean filter (Formula 1) applied to the S_max neighborhood centered on f_ij.
(3) Filter the image noise. Calculate B1 = f_ij - f_min and B2 = f_ij - f_max. If B1 > 0 and B2 < 0, the pixel keeps its value f_ij, which indicates that the pixel is not a noise point. Otherwise, the output value of the pixel is f_med.
In the algorithm, if f_min < f_med < f_max, then f_med is not impulse noise. Next, it is judged whether f_ij is impulse noise. If neither f_ij nor f_med is impulse noise, then f_ij is preferentially used as the filter output value. The flow chart of the improved adaptive median filtering part of the algorithm is shown in Fig. 4:
Compared with other filtering algorithms, the improved adaptive median filtering algorithm performs better. For different kinds of noise, it resolves the conflict between denoising and protecting unpolluted pixels that arises in other filtering methods. The improved adaptive median filtering algorithm combines good denoising ability with a short running time, which effectively improves operating efficiency.

Flow chart of improved adaptive median filter.
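The three steps above can be sketched in NumPy as follows. This is an illustrative reading of the procedure, not the authors' implementation: the function names, the default s_max = 7, the reflect padding at image borders, and the exact form of the modified mean fallback are all assumptions.

```python
import numpy as np

def modified_mean(window):
    """Mean of the window after discarding 0 and 255 (assumed form of Formula 1)."""
    valid = window[(window != 0) & (window != 255)]
    return float(valid.mean()) if valid.size else float(window.mean())

def improved_adaptive_median(image, s_max=7):
    """Apply the three-step adaptive median filter to a 2-D grayscale image."""
    img = np.asarray(image, dtype=np.float64)
    pad = s_max // 2
    padded = np.pad(img, pad, mode="reflect")  # assumption: reflect at borders
    out = img.copy()
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            ci, cj = i + pad, j + pad
            f_ij = padded[ci, cj]
            size = 3                            # initial 3 x 3 window
            while size <= s_max:
                r = size // 2
                win = padded[ci - r:ci + r + 1, cj - r:cj + r + 1]
                f_min, f_med, f_max = win.min(), np.median(win), win.max()
                if f_min < f_med < f_max:       # Step 2: A1 > 0 and A2 < 0
                    # Step 3: keep f_ij unless it is an extreme value (B1, B2 test).
                    out[i, j] = f_ij if f_min < f_ij < f_max else f_med
                    break
                size += 2                       # enlarge the window
            else:
                # Window exceeded s_max: use the modified mean of the largest window.
                win = padded[ci - pad:ci + pad + 1, cj - pad:cj + pad + 1]
                out[i, j] = modified_mean(win)
    return out

# Constant background with one "salt" and one "pepper" impulse.
noisy = np.full((9, 9), 100.0)
noisy[4, 4] = 255.0
noisy[2, 6] = 0.0
restored = improved_adaptive_median(noisy)
print(restored[4, 4], restored[2, 6])
```

On this toy image both impulses are replaced by the background value while every unpolluted pixel keeps its original value.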
In order to observe the filtering effect of the improved adaptive median filtering algorithm, an experiment was carried out on two-dimensional grayscale X-ray images (Normal/Pneumonia), taking noise with an intensity of 0.4 as an example. The experimental results are shown in Fig. 5.

The chest X-Ray images filtering effect. The results of denoising with improved adaptive median filter for the chest X-ray image are the best, whereas the median filter has the worst filtering effect.
It can be seen from Fig. 5 that the filtering effect of the improved adaptive median filtering algorithm is clearly better than that of the other algorithms. The image becomes clearer after denoising with the improved adaptive median filter algorithm, whereas the median filter algorithm has the worst filtering effect.
We analyzed the performance of the different filter algorithms through specific data analysis. The results of the comparison between improved adaptive median filtering and median filtering are shown in Tables 3 and 4.
Fig. 5 only shows the visual difference between the filtering effects of the algorithms, where the improved adaptive median filtering algorithm is clearly superior to the median filtering algorithm. Tables 3 and 4 allow the performance of the algorithms to be analyzed through specific data. As the noise intensity increases, the peak signal-to-noise ratio (PSNR) of the image decreases continuously. The tables show that the filtering effect of the traditional median filter is clearly worse than that of the improved adaptive median filter. Although some other filtering algorithms are better than the median filtering algorithm, the simplicity, speed and good effect of improved adaptive median filtering are very important for keeping the advantage of the convolutional neural network, which does not require complex pre-processing.
X-ray images (normal) peak signal-to-noise ratio comparison table in different filtering algorithms
X-ray images (pneumonia) peak signal-to-noise ratio comparison table in different filtering algorithms
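The PSNR values in these tables are presumably computed with the standard definition, PSNR = 10·log10(peak² / MSE), where MSE is the mean squared error between the clean reference image and the filtered image. The sketch below assumes 8-bit images (peak = 255); the function name and the toy arrays are illustrative only and do not reproduce any value from Tables 3 and 4.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    ref = np.asarray(reference, dtype=np.float64)
    tst = np.asarray(test, dtype=np.float64)
    mse = np.mean((ref - tst) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))

reference = np.full((4, 4), 100.0)
slightly_off = reference + 25.5   # uniform error of 25.5 gray levels
print(psnr(reference, slightly_off))
```

A higher PSNR means the filtered image is closer to the clean reference, which is why PSNR falls as the noise intensity rises.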
RF (random forest) is an ensemble method based on decision tree classifiers, which can be used for classification and regression [25]. Basic principle: for an unknown sample, each decision tree in the RF classifies the input sample, and a statistical analysis of all decision results is made. Finally, the RF classifier identifies the category of the input sample according to the statistical results.
In this paper, we propose a novel hybrid system for detecting pneumonia from chest X-ray images: ACNN-RF, in which the softmax classifier in the last layer is replaced by an RF classifier, and the 256-node fully connected layer is used to extract deep features from the chest X-ray images. We use the random forest model contained in Sklearn [26] to train a classifier. The RF training algorithm based on the GridSearchCV class is shown in Table 5.
RF training algorithm based on GridSearchCV class
The overall process is as follows. The deep image features extracted by the ACNN are randomly divided into a training set (70%) and a test set (30%). The parameters are then set: we employ the GridSearchCV class of the sklearn.grid_search library, with which the parameters can be adjusted automatically and the average accuracy of each parameter combination can be calculated by cross-validation [27]. The training set is fitted with the fit function of the random forest model, and the test set is then predicted with the predict function.
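A minimal sketch of this process with scikit-learn is given below. Several points are assumptions: the features are synthetic stand-ins for the 256-dimensional ACNN features, the parameter grid is illustrative rather than the grid used in the paper, and `GridSearchCV` is imported from `sklearn.model_selection`, where it lives in current scikit-learn releases (the `sklearn.grid_search` module named in the text was removed in version 0.20).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for 256-dimensional deep features (illustration only).
X = rng.normal(size=(200, 256))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# 70% training set, 30% test set, preserving the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Illustrative grid; the values are assumptions, not the paper's settings.
param_grid = {
    "n_estimators": [20, 50],
    "max_depth": [4, 8],
    "criterion": ["gini", "entropy"],
}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)        # cross-validates every parameter combination
y_pred = search.predict(X_test)     # uses the refitted best estimator

print("best parameters:", search.best_params_)
print("mean CV accuracy of best combination:", search.best_score_)
print("test accuracy:", (y_pred == y_test).mean())
```

`GridSearchCV` refits the best parameter combination on the whole training set by default, so `predict` can be called directly on the search object.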
A Convolutional Neural Network (CNN) is a feed-forward, multi-layer neural network with a deep supervised architecture. A CNN consists of two parts: an automatic feature extractor and a classifier. The feature extractor contains convolution layers, down-sampling layers and feature map (fully connected) layers. The advantage of the CNN is that it can extract features automatically, thus avoiding the difficulty of manual feature extraction. However, with a small dataset or too many training iterations, it inevitably leads to over-learning, also known as “over-fitting”.
Many researchers have studied effective solutions to the over-fitting problem in depth. Hinton et al. [28] first introduced the Dropout method to solve the over-fitting problem in the training of neural networks. Srivastava et al. [29] tried to use Dropout in the convolution layer. Wan et al. [30] proposed the DropConnect method, which improves the delicacy of feature learning and avoids over-fitting more effectively. Wu et al. [31] introduced Dropout into maximum pooling, stochastically suppressing pooling values and then using maximum pooling to select the pooling activation values, which effectively suppresses over-fitting. When Dropout is added to the training of a neural network, some activation units are randomly inhibited. The training result can be regarded as a combination of models built from different neuron subsets, while all neurons are retained at test time, so that the average of all possible trained models is approximated by model averaging. Therefore, the maximum pooling Dropout method is used to solve the over-fitting problem in convolutional neural networks.
In the CNN, the pooling layer samples the features of the convolution layer to reduce the computational complexity and improve the computational efficiency. The usual pooling operations can be expressed as:
After Dropout is introduced into the pooling layer, a masking step is carried out before the pooling operation, so that the activation values of some neurons in the pooling region are set to 0 with a certain probability, and the maximum pooling operation is then carried out over the remaining neurons. The process can be expressed as follows:
Among them,
As shown in Fig. 6, when Dropout is not introduced, the maximum output of each 2*2 pooling region in the feature map is the pooling result of that region. For example, the maximum output of the last pooling region is 4. When Dropout is introduced and the value 4 in the last pooling region is suppressed while 1, 0 and 3 are retained, the maximum pooling result is 3.

Method operational diagram of maximum pooling dropout.
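The worked example above (a pooling region containing 1, 0, 3 and 4) can be reproduced with a few lines of NumPy. This is a sketch under stated assumptions: the function name and the layout of the values inside the 2*2 region are hypothetical, and setting masked activations to 0 presumes non-negative activations such as those produced by a rectifier.

```python
import numpy as np

def max_pool_dropout(region, keep_mask=None):
    """Max over a pooling region; masked-out activations are set to 0 first."""
    region = np.asarray(region, dtype=np.float64)
    if keep_mask is None:                 # no Dropout: plain max pooling
        return float(region.max())
    return float((region * np.asarray(keep_mask)).max())

region = [[1, 0],
          [3, 4]]
keep_mask = [[1, 1],
             [1, 0]]                      # the activation 4 is suppressed

print(max_pool_dropout(region))             # plain max pooling
print(max_pool_dropout(region, keep_mask))  # max pooling after masking

# During training the mask would be drawn at random with retention probability p.
p = 0.75
random_mask = (np.random.default_rng(0).random((2, 2)) < p).astype(int)
print(max_pool_dropout(region, random_mask))
```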
To address the over-fitting problem, we also used the “early stopping” method [32] to find the optimal number of iterations, which limits the number of training iterations needed to minimize the cost function of the model. The AdaMax [33] method (shown in Table 6) was used to optimize the CNN model with a cross-entropy loss function, and an appropriate learning rate was then selected.
AdaMax algorithm
AdaMax is a variant of Adam based on the infinity norm. It can replace the traditional stochastic gradient descent procedure as a first-order optimization algorithm and iteratively updates the weights of the neural network based on the training data.
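For reference, the AdaMax update as published with Adam keeps an exponential moving average m of the gradient and an infinity-norm accumulator u = max(β2·u, |g|), and then updates the weights by (α / (1 - β1^t))·m / u. The sketch below follows that published form; the small `eps` term in the denominator is an addition for numerical safety that is not part of the original algorithm, and the quadratic toy objective is purely illustrative.

```python
import numpy as np

def adamax_step(theta, grad, m, u, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdaMax update; t is the 1-based step index."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    u = np.maximum(beta2 * u, np.abs(grad))      # infinity-norm accumulator
    theta = theta - (lr / (1 - beta1 ** t)) * m / (u + eps)  # eps: added safeguard
    return theta, m, u

# Toy problem: minimize f(theta) = theta^2, whose gradient is 2 * theta.
theta, m, u = 1.0, 0.0, 0.0
for t in range(1, 2001):
    grad = 2.0 * theta
    theta, m, u = adamax_step(theta, grad, m, u, t)

print(theta)
```

Because the step is normalized by the largest recent gradient magnitude, each update is bounded by roughly the learning rate, which makes the method less sensitive to the gradient scale than plain stochastic gradient descent.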
In this paper, the proposed CNN model has two convolutional layers, two max-pooling layers and two fully connected layers. A detailed description of the CNN model for feature extraction is shown in Fig. 7.
As shown in Fig. 7, the proposed CNN has two convolutional layers and two fully connected layers. The input image size of the model is 64*64*3 pixels. In the first convolution layer, the filter size is 3*3, the stride is 1 pixel, and the number of filters is 10. In the second convolution layer, the filter size is 5*5, the stride is 1 pixel, and the number of filters is 8. The max-pooling layers use a pooling window of 2*2 with a stride of 2 pixels. Max-pooling reduces the computational complexity of the upper layer by eliminating non-maxima. The pooled output of the second convolutional layer is fed to the first fully connected layer, which has 256 neurons, and the output of the second fully connected layer feeds into the Softmax classifier. Dropout regularization [34] with dropout ratios of 0.25 and 0.4 is applied to the output of the convolution layer and the first fully connected layer, respectively. We initialized the search range for the learning rate to [1e-5, 5e-2]. At the same time, we also determined the optimal number of feature extraction layers based on experience, so as to improve the accuracy of classification.

CNN for chest X-ray image classification, Softmax function maps the output of multiple neurons into (0, 1) interval, thus completing the task of multi-classification.
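The layer sizes implied by this architecture can be checked with simple arithmetic. The sketch below assumes unpadded ("valid") convolutions, which is consistent with the (m1 - n1 + 1) output size given later in Step 2, and assumes the layer order convolution, pooling, convolution, pooling; neither assumption is stated explicitly for Fig. 7, and the helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1):
    """Output width of an unpadded convolution."""
    return (size - kernel) // stride + 1

def pool_out(size, window=2, stride=2):
    """Output width of a pooling layer."""
    return (size - window) // stride + 1

size = 64                      # 64*64*3 input image
size = conv_out(size, 3)       # first convolution, 3*3 filters, 10 maps
size = pool_out(size)          # first 2*2 max-pooling
size = conv_out(size, 5)       # second convolution, 5*5 filters, 8 maps
size = pool_out(size)          # second 2*2 max-pooling
flattened = size * size * 8    # input width of the first fully connected layer

# Trainable parameters (weights plus biases) under the same assumptions.
conv1_params = 3 * 3 * 3 * 10 + 10
conv2_params = 5 * 5 * 10 * 8 + 8
fc1_params = flattened * 256 + 256

print("final feature map:", size, "x", size, "x 8")
print("flattened features:", flattened)
print("parameters:", conv1_params, conv2_params, fc1_params)
```

Under these assumptions almost all trainable parameters sit in the first fully connected layer, whose 256 outputs are the deep features passed to the classifier.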
In this paper, the architecture of the hybrid ACNN-RF model was designed by replacing the softmax classifier of the CNN model with an RF classifier. We employ the RF as the classifier for the deep activation features of the CNN model. The output unit of the last layer of the CNN is the estimated probability of the input sample. The output of a hidden layer is linearly combined with trainable weights, and a bias is then added to generate the input of the next activation function. Although the output values of the hidden layer have no direct meaning on their own, they are meaningful to the CNN itself and can be used as input features for any other classifier. In addition, we employ an improved adaptive median filter (as shown in Section 2.2.1) to remove image noise, which reduces the training time of the CNN and improves the accuracy of image recognition. We also construct a CNN based on Dropout (as shown in Section 2.2.3), which effectively addresses the problem of over-fitting in the CNN model.
The overall framework of the new hybrid ACNN-RF model is shown in Fig. 8. Firstly, the preprocessed image is sent to the input layer, and the CNN based on Dropout is trained for many epochs until the training process converges. Then, the RF classifier based on the GridSearchCV class (as shown in Section 2.2.2) replaces the output layer.

The overall framework of the ACNN-RF model. The convolution windows are 3*3 and 5*5; maximum pooling is used in the pooling layers; and a Random Forest (RF) classifier is used for image classification.
A CNN can extract eigenvectors without complicated parameter adjustment. However, just as when the human brain detects objects, the presence of noise and other interference in the image has a certain impact on the extracted features. Therefore, if the detected image is clear and free of noise interference, the effectiveness of the eigenvectors can be improved and the time the CNN needs to learn to extract features can be reduced. Based on these ideas, we propose a CNN algorithm with improved adaptive median filtering. As can be seen from the previous section, the improved adaptive median filter has a good denoising effect and high efficiency, so it does not impose a heavy computational burden on the overall algorithm, and it also helps the feature extraction of the CNN.
The hybrid ACNN-RF algorithm steps are as follows:
Step 1: The adaptive median filtering method is used to remove noise from the chest X-ray images in the dataset, which makes the images clear and easy to recognize. Calculate the values of A1, A2, B1 and B2. If A1 > 0 and A2 < 0, and B1 > 0 and B2 < 0, then the value of the pixel is f_ij. Otherwise, the output value of the pixel is f_med.
Step 2: Set the parameters of the CNN. Suppose the size of the filtered image is m1 × m2 and the size of the convolution kernel is n1 × n2, with the step size set to 1. After image convolution, the size of the feature image is (m1 - n1 + 1) × (m2 - n2 + 1). Here, i denotes category i; j denotes the j-th pixel of a picture; and b_i denotes the bias. The extracted image features are defined as follows:
Step 3: The filter output of the convolution kernel in Step 2 is processed by a non-linear activation function. The non-linearity used in the convolutional neural network is the rectifier or thresholding function h(x) = max(0, x), whose graph is a broken (piecewise-linear) line: when x ≤ 0, h(x) = 0; when x > 0, h(x) = x. This makes the convolutional layers act as rectified linear units (ReLU) [35]. Finally, the eigenvectors after the convolution operation are shown as follows:
Step 4: After the image features are obtained by convolution, the main features are extracted by pooling in order to reduce the computational complexity of the network. Generally, there are two kinds of pooling operations: average pooling and maximum pooling [36]. In this paper, the maximum value of each feature is selected by maximum pooling, and the pooling operation is as follows:
Step 5: The extracted eigenvector is used to train the random forest [37]. After meeting the training requirements, the trained model can be used to classify the unknown test sets.
The hybrid ACNN-RF classification model is trained under different parameters to obtain a better feature extraction model and improve the performance of the ACNN-RF classification model. To ensure the consistency of the dataset and reduce computing time, the chest X-ray images were resized to 64*64. The optimum number of iterations was found to be 30 by the “early stopping” method. We initialized the search range for the learning rate to [1e-5, 5e-2] and used cross-validation to determine the best three parameters of the RF to help improve classification. The optimal parameters of the CNN-RF classification model are shown in Table 7.
CNN-RF classification model parameter configuration
In Table 7, the parameter n_estimators denotes the number of decision trees; the parameter criterion denotes the strategy for splitting the sample set (Gini Index or Information Gain); the parameter max_depth denotes the maximum depth of the tree; the parameter min_samples_split denotes the minimum number of samples required for each partition according to attributes; and the parameter min_weight_fraction_leaf denotes the minimum weight required for leaf nodes.
The CNN algorithm based on improved adaptive median filtering has three main stages in the detection of pneumonia: (1) preprocessing the chest X-ray films with the improved adaptive median filtering method; (2) constructing a CNN model based on the maximum pooling Dropout method to extract deep image features; (3) training the RF classifier with the image features extracted by the convolutional neural network to obtain an efficient and stable classification model. The flow chart of the hybrid CNN-RF algorithm based on the improved adaptive median filter (ACNN-RF) is shown in Fig. 9.

The flow chart of the CNN-RF algorithm based on the adaptive median filter (ACNN-RF). The flow chart on the left side is the process of feature extraction, while the flow chart on the right side is the process of image classification.
Results and discussions
In our first experiment, the experiment of input image size selection. The size of input image has an effect on the prediction accuracy of classification model. We investigate the effect of input image resizing as a preprocessing stage in the hybrid ACNN-RF model. In order to determine the size of the input image, the pixel sizes 50×50×3, 64×64×3, 80×80×3, 100×100×3, 224×224×3 are used as input image in the experiment. The ACNN-RF classification model with input image size 64×64×3 achieves the highest accuracy (96.9%), however, the image size re-sampled to 224×224×3, classification accuracy dropped to 93.8% as shown in Fig. 10. According to the experimental results, we determined to sample the size of input image to 64×64×3.
From the results shown in Fig. 10, the classification accuracy of the hybrid ACNN-RF model degrades severely when the images are down-sampled too aggressively. We conclude that an input size that is either too large or too small adversely affects the classification results.

Classification accuracy of the ACNN-RF classification model with different input image sizes.
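The resizing step of this experiment can be sketched as follows. This is a minimal NumPy example that assumes nearest-neighbour resampling and a random array standing in for a chest X-ray; the interpolation method actually used in our pipeline is not specified here, and the helper name resize_nearest is hypothetical.

```python
import numpy as np

def resize_nearest(image, size):
    """Nearest-neighbour resize of an H x W x C array to size x size x C."""
    rows = np.arange(size) * image.shape[0] // size
    cols = np.arange(size) * image.shape[1] // size
    return image[rows[:, None], cols[None, :]]

# Candidate input sizes compared in the experiment.
candidate_sizes = [50, 64, 80, 100, 224]

# Hypothetical stand-in for a 3-channel chest X-ray image.
rng = np.random.default_rng(0)
xray = rng.integers(0, 256, size=(300, 280, 3), dtype=np.uint8)

resized = {s: resize_nearest(xray, s) for s in candidate_sizes}
for s, img in resized.items():
    print(s, img.shape)
```

Each resized copy would then be passed through the same model so that only the input size varies between runs.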
In the second experiment, we study the influence of the improved adaptive median filter, applied in the pre-processing stage, on the performance of the classification model, using the CNN we constructed (Section 2.2.3). As shown in Fig. 11, under the same input, the running time of ACNN (561.8 s) is lower than that of CNN, and the classification accuracy of ACNN (95.6%) is clearly higher than that of CNN. We also studied the impact of Dropout in the CNN on the classification results. Under the same input, the running time and accuracy of CNN (Maxpooling Dropout) and CNN (No Dropout) are compared in Fig. 12. The running time of CNN (Maxpooling Dropout) (921.2 s) is lower than that of CNN (No Dropout), and the classification accuracy of CNN (Maxpooling Dropout) (94.5%) is clearly higher than that of CNN (No Dropout).

Influence of the improved adaptive median filter on the performance of the classification model.

Influence of Maxpooling Dropout on the performance of the classification model.
From the results shown in Figs. 11 and 12, de-noising the input image or introducing Dropout into the pooling layer of the CNN shortens the running time of the program and improves the classification accuracy. We conclude that the improved adaptive median filter introduced in the data preprocessing stage removes image noise effectively, and that max-pooling Dropout effectively improves the classification performance of the CNN.
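The idea of Dropout in the pooling layer can be sketched as follows, under stated assumptions: activations are non-negative (for example after ReLU), a Bernoulli mask is applied to the units of each pooling region during training, and plain max pooling is used at inference as a simplification. The function name, the pool size and the drop probability are hypothetical, and the sketch is not the exact layer of Section 2.2.3.

```python
import numpy as np

def max_pool_dropout(feature_map, pool=2, drop_prob=0.5, rng=None, training=True):
    """Max pooling over pool x pool regions with dropout applied beforehand.

    Assumes non-negative activations, so a dropped (zeroed) unit can never
    exceed a retained one in the max.
    """
    h, w = feature_map.shape
    h, w = h - h % pool, w - w % pool  # crop to a multiple of the pool size
    x = feature_map[:h, :w]
    if training:
        rng = np.random.default_rng() if rng is None else rng
        keep = rng.random(x.shape) >= drop_prob  # True where a unit is retained
        x = x * keep
    # Split into non-overlapping blocks and take the maximum of each block.
    blocks = x.reshape(h // pool, pool, w // pool, pool)
    return blocks.max(axis=(1, 3))

feature_map = np.arange(16, dtype=float).reshape(4, 4)
pooled_eval = max_pool_dropout(feature_map, training=False)
pooled_train = max_pool_dropout(feature_map, rng=np.random.default_rng(0))
print(pooled_eval)
print(pooled_train)
```

Because the mask is drawn per unit, the maximum of a region during training may come from any retained unit, not only the strongest one, which is the source of the regularizing effect.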
In the third experiment, again using the CNN we constructed (Section 2.2.3), we studied the effect of combining deep learning and machine learning on the pneumonia recognition results. The CNN, CNN-SVM (RBF) and CNN-RF classification models were tested on a dataset of the same size. The experimental results are shown in Table 8. The results show that the average F measure and average accuracy of the five classification models are all above 95%. The hybrid ACNN-RF model not only achieves high accuracy (96.85%) but also shortens the running time of the classification model (625.4 s). The main reason is that the RF classifier based on the GridSearchCV class (Section 2.2.2) performs well and does not over-fit as the number of random trees increases. At the same time, RF avoids training the classifier multiple times during classification, so the calculation time of the classification model is greatly shortened and the power consumption is reduced.
Performance comparison of five algorithms under the same input
It is generally believed that deep learning models are slow to train. However, we emphasize that these deep architectures, like all other neural network models, are highly parallelizable, which dramatically reduces the training time when a graphics processing unit (GPU) is used [38]. The running time of the five classification models on the test set is shown in Fig. 13.

Running time of the five classification models.
From the results shown in Fig. 13, the hybrid ACNN-RF model achieved the shortest running time while guaranteeing high classification accuracy. We conclude that the combination of deep learning and machine learning improves the accuracy of pneumonia recognition, and effectively reduces the running time of pneumonia recognition program.
In the last experiment, we compare the hybrid ACNN-RF classification model with the classification model constructed by Kermany et al. [39]. The chest X-ray images in the sample set were normalized and resized to 64×64. Table 9 lists the evaluation parameters of our proposed hybrid ACNN-RF model, calculated under the same input from the generated confusion matrix (Fig. 14). The model achieves a precision of 90%, a recall of 95%, an F measure of 97.7%, a sensitivity of 95%, a specificity of 95.9% and an accuracy of 96.9%. The performance evaluation indices of the hybrid ACNN-RF model are clearly higher than those of the classification model of Kermany et al.
Several evaluation parameters for classification model

Confusion matrix for the hybrid ACNN-RF model performance.
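For reference, the evaluation parameters above follow from the four cells of a binary confusion matrix by their standard definitions, as the short sketch below shows. The counts passed in are hypothetical and chosen only to illustrate the arithmetic; they are not the values of Fig. 14.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard evaluation parameters from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # identical to sensitivity
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f_measure = 2 * precision * recall / (precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f_measure": f_measure,
    }

# Hypothetical counts, not taken from Fig. 14.
metrics = binary_metrics(tp=90, fp=10, fn=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Since the F measure is the harmonic mean of precision and recall, it always lies between the two.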
In order to prove the validity and stability of the hybrid ACNN-RF model, we observed the trend of accuracy and loss of classification model on training set and test set with the number of training epochs. The results are shown in Fig. 15.

Trends of accuracy and loss on the test set with the number of training epochs. Accuracy and cross-entropy loss are each plotted against the training step over the 30 epochs of training of the two-class classifier.
As can be seen from Fig. 15, the accuracy on the test set and training set increases gradually with the number of epochs and finally stabilizes; the loss on the test set and training set decreases gradually with the number of epochs and finally stabilizes. This shows that the hybrid ACNN-RF classification model is effective and stable.
In this paper, the CNN classifier based on Dropout (Section 2.2.3) was used to extract the morphological features of the chest X-ray images, and the improved adaptive median filter (Section 2.2.2) was used to remove noise from the chest X-ray images. We then used the Python Scikit-learn library [40] to construct an RF classifier based on the GridSearchCV class (Section 2.2.2) with automatically tuned parameters. The experimental results show that the hybrid ACNN-RF classification model performs well for image classification: it not only significantly improves the test accuracy but also reduces the training time.
Chest X-ray is an effective method to diagnose pneumonia and plays an important role in clinical nursing and epidemiological research. Although some progress has been made in the study of chest X-ray image categorization, the problem still needs further study, and we will continue related research in future work. Although the adaptive median filter improves the classification accuracy of the CNN, it adds a preprocessing step. We will look for a new way to improve the performance and efficiency of the CNN model without additional preprocessing. In addition, we will study how to visualize the features extracted by the convolutional neural network and verify the performance of our classification model by visual observation.
Acknowledgments
This research is financially supported by the National Natural Science Foundation of China (Grant Nos. 61672470 and 61802350) and the National Key Research and Development Plan (Grant Nos. 2016YFE0100600 and 2016YFE0100300). It is also partially supported by the project of the International Cooperation of Henan Province of China (Grant No. 162102410076).
