Convolutional neural networks (CNNs) often outperform comparable algorithms in image classification. The convolution and sub-sampling layers extract sample features, and weight sharing greatly reduces the number of trainable parameters in the network.
OBJECTIVE:
This paper describes an improved convolutional neural network structure, including the convolution layer, sub-sampling layer and fully connected layer. It also introduces the “yan.mat” data set, which contains images of five kinds of diseases reflected by the blood filaments of the eyeball together with normal eye images, stored in a form that is convenient for computation in MATLAB.
METHODS:
We improve the structure of the classical LeNet-5 convolutional neural network by designing a network with different convolution kernels, a different sub-sampling method and a different classifier, and we use this structure to recognize diseases reflected by ocular blood filaments.
RESULTS:
The experimental results show that the improved convolutional neural network structure performs well on the eye blood filament data set, which indicates that the convolutional neural network has strong classification ability and robustness. The improved structure classifies the diseases reflected by eyeball blood filaments well.
Convolutional neural networks are often used to classify various kinds of pictures [1, 2]. The well-known LeNet-5 network is used to recognize handwritten digits. With the rapid development of medicine, medical examinations produce hundreds of millions of images, and how to process medical images quickly has become a research hotspot [3]. In a routine medical examination, the doctor can learn about the patient’s disease from an examination chart of the blood filaments of the eyeball. If automatic software could complete this operation, it would be a great convenience for doctors and patients. To address this problem, the classic LeNet-5 structure is modified in this paper to better identify the diseases reflected in blood filament images of human eyes [4].
LeNet-5 network construction
The LeNet-5 convolutional neural network has been used in the United States to identify the numbers filled in by bank customers when they handle business. It has high accuracy and is widely used in business [5]. Figure 1 shows the overall structure of LeNet-5; from left to right, it consists of the input layer, C1 convolution layer, S2 sub-sampling layer, C3 convolution layer, S4 sub-sampling layer, C5 convolution layer, F6 fully connected layer, and finally the output layer.
Before the samples are input into the LeNet-5 network, all the original handwritten digit images are normalized to 32 × 32. The C1 convolution layer convolves the input image, for example a handwritten numeral “5”. The convolution kernel size is 5 × 5, and six convolution kernels with different weight parameters process the 32 × 32 image, yielding six 28 × 28 feature maps; the C1 convolution layer is composed of these six feature maps. Using multiple convolution kernels ensures that the network can fully capture stroke features when learning handwritten digit images and can better carry out the subsequent classification. From the size and number of the convolution kernels and the size and number of the feature maps, it follows that the C1 layer has 6 × (5 × 5 + 1) = 156 trainable parameters and 156 × (28 × 28) = 122,304 connections. The small number of C1 parameters shows that the convolutional neural network keeps the number of parameters well under control.
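The C1 arithmetic can be checked with a short Python sketch. The function names are illustrative and not part of the original work; the sketch assumes a convolution without padding and with one bias per kernel, as the counts in the text imply.

```python
def conv_output_size(input_size, kernel_size, stride=1):
    """Side length of a feature map after a convolution without padding."""
    return (input_size - kernel_size) // stride + 1


def conv_layer_counts(input_size, kernel_size, n_kernels, n_input_maps=1):
    """Return (output_size, trainable_parameters, connections) for one layer."""
    out = conv_output_size(input_size, kernel_size)
    # Each kernel has kernel_size^2 weights per input map plus one bias.
    params = n_kernels * (kernel_size * kernel_size * n_input_maps + 1)
    # Every output position uses all weights and the bias of its kernel.
    connections = params * out * out
    return out, params, connections


# C1 of LeNet-5: 32 x 32 input, six 5 x 5 kernels.
print(conv_layer_counts(input_size=32, kernel_size=5, n_kernels=6))
# (28, 156, 122304)
```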
The C1 convolution layer is connected to the S2 sub-sampling layer. The LeNet-5 sub-sampling layer uses mean sampling with a 2 × 2 sampling window. The values in the window are summed, multiplied by a trainable coefficient, and added to a bias; the result is passed through the sigmoid activation function. S2 has the same number of feature maps as C1, and the sampling operation reduces the feature map size to 14 × 14. This layer therefore has 12 trainable parameters and 5880 connections.
The connection between the C3 layer and the S2 layer is special. The network designers adopt a partial connection scheme (“random connection”): each feature map of C3 is connected to one, several or all of the feature maps output by the previous layer. As shown in Table 1, the columns represent the 16 feature maps of the C3 layer and the rows represent the 6 feature maps of the S2 layer. A mark in the table indicates a connection. For example, feature map No. 0 of C3 is generated by convolving and combining feature maps No. 0, No. 1 and No. 2 of S2. This connection scheme not only limits the number of connections but also further extracts features from the feature maps of the previous layer. The C3 layer has 1516 trainable parameters and 151,600 connections.
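The S2 and C3 counts can be reproduced with the following sketch. It assumes the fan-in pattern of the original LeNet-5 design (six C3 maps with three S2 inputs, nine with four, and one with all six) and a C3 feature map size of 10 × 10, which follows from 14 − 5 + 1; neither figure is stated explicitly in the text above, and the variable names are illustrative.

```python
KERNEL = 5  # LeNet-5 convolution kernels are 5 x 5

# S2: one coefficient and one bias per feature map; each 14 x 14 output
# unit reads a 2 x 2 window plus a bias.
s2_params = 6 * 2
s2_connections = 6 * 14 * 14 * (2 * 2 + 1)

# C3: number of S2 maps feeding each of the 16 C3 maps (assumed pattern).
c3_fan_in = [3] * 6 + [4] * 9 + [6]
c3_params = sum(n * KERNEL * KERNEL + 1 for n in c3_fan_in)
c3_connections = c3_params * 10 * 10  # each C3 map is assumed to be 10 x 10

print(s2_params, s2_connections)  # 12 5880
print(c3_params, c3_connections)  # 1516 151600
```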
Figure 1. LeNet-5 structure.
Table 1. Connection mode of layers C3 and S2
Columns (C3 feature maps): 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
Rows (S2 feature maps): 0, 1, 2, 3, 4, 5
The S4 layer is again a sub-sampling layer that uses mean sampling. Because sub-sampling does not change the number of feature maps, mean sampling of the C3 feature maps yields sixteen 5 × 5 feature maps. Each feature map needs one multiplicative coefficient and one bias, so this layer has 32 parameters and 2000 connections.
The C5 layer is a convolution layer in which each unit is fully connected to the previous layer. Because the S4 feature maps are 5 × 5 and the C5 convolution kernels are also 5 × 5, each C5 feature map consists of a single element, that is, its size is 1 × 1. There are 120 such feature maps, and the layer has 48,120 connections. The F6 layer has 84 units that are fully connected to the previous layer, with 10,164 trainable parameters. The last part of the network uses a Gaussian connection and outputs 10 categories, corresponding to the digits 0 to 9. Note that the authors of LeNet-5 do not use Softmax regression as the classifier, which limits the application of the LeNet-5 network to some extent.
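The counts quoted for S4, C5 and F6 follow from the layer sizes given above, as this short sketch shows. The variable names are illustrative, and one bias per unit is assumed.

```python
# S4: 16 maps, one coefficient and one bias each; every 5 x 5 output unit
# reads a 2 x 2 window plus a bias.
s4_params = 16 * 2
s4_connections = 16 * 5 * 5 * (2 * 2 + 1)

# C5: 120 units, each connected to all 16 S4 maps through a 5 x 5 kernel,
# plus one bias per unit.
c5_connections = 120 * (16 * 5 * 5 + 1)

# F6: 84 units, each connected to the 120 C5 outputs plus one bias.
f6_params = 84 * (120 + 1)

print(s4_params, s4_connections)  # 32 2000
print(c5_connections)             # 48120
print(f6_params)                  # 10164
```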
CNNs involve many weight parameters, which must be given initial values before network training starts. The magnitude of the initial values directly affects the convergence speed and classification performance of the network. There is no single best method for choosing the initial values. However, two principles are generally accepted: the assigned values should be small, and the values of the individual parameters should differ from one another.
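One simple way to satisfy both principles is to draw each weight from a narrow uniform interval around zero. The sketch below is only an illustration: the interval limit of 0.05, the seed and the function name are assumptions, not values taken from this work.

```python
import random


def init_weights(n_weights, limit=0.05, seed=0):
    """Draw small random initial weights from [-limit, limit]."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    return [rng.uniform(-limit, limit) for _ in range(n_weights)]


# Example: the 5 x 5 weights of six C1 kernels.
weights = init_weights(5 * 5 * 6)
print(len(weights), max(abs(w) for w in weights) <= 0.05)
```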
Recognition of ocular hemofilament disease
Diseases of the human body can be observed on the surface, whether in a person’s mental state or in the appearance of the skin, and the reaction is most obvious in the eye area. Traditional Chinese medicine theory holds that, because of the exceptionally close relationship between the eye meridians and the organs of the human body, pathological changes of the organs appear directly in the eyeball. They manifest as whether there are blood filaments in the white of the eye, where the filaments appear, their direction, their number, and so on. Therefore, observing the blood filaments in the eyes can reveal the patient’s illness, which is of great help in diagnosing disease.
As shown in Fig. 2, ‘looking at the eye syndrome differentiation’ is a traditional Chinese medicine examination. Experienced practitioners of traditional Chinese medicine generally observe the blood filaments and judge the disease and its severity from the position and characteristics of the filaments in the white of the eye. Relying solely on practitioners to examine patients one by one is time-consuming and laborious. Moreover, the examination results for the same patient can easily differ between doctors, so the examination lacks a specific quantitative standard. Nowadays, many traditional Chinese medicine examinations rely on instruments, such as the stethoscope for heart and lung auscultation, which has played a great role in the development of traditional Chinese medicine, so modernizing ‘looking at the eyes and distinguishing symptoms’ is of great significance. Since the blood filaments of the eyeball can reflect diseases of the body, is it feasible to use a convolutional neural network, with its strong feature extraction and classification abilities, to classify the diseases they reflect? This question is addressed in this study.
Ocular hemofilament disease data set
The eye blood filament pictures were provided by the tutor. The symptoms they reflect are shoulder diseases, stomach diseases, intestinal diseases, reproductive diseases and heart diseases, together with normal eye images, as shown in Table 2.
Table 2. Pictures and diseases of ocular blood filaments

| Disease | Picture | Explanation |
| --- | --- | --- |
| Shoulder diseases | (image) | Hyperemia in the upper part of the white of the eye is characteristic of shoulder pain (the lower left corner is a schematic diagram). |
| Stomach diseases | (image) | In the lower part of the white of the eye, there is cord-like vascular hyperplasia and varicose blood, which is the manifestation of gastric lesions (the upper right corner is a schematic diagram). |
| Intestinal diseases | (image) | Curled blood filament hyperplasia on the inner side of the white of the eye indicates intestinal abnormalities (the lower left corner is a schematic diagram). |
| Reproductive diseases | (image) | The blood filaments in the lower triangle of the white of the eye show hyperemia and hyperplasia, which can be diagnosed as prostatitis, that is, a reproductive disease (the upper right corner is a schematic diagram). |
| Heart diseases | (image) | There is embolism on the left side of the white of the eye, indicating a problem with the blood circulation inside the heart (the lower right corner is a schematic diagram). |
| Normal eyes | (image) | In normal eye images, there are no blood filaments in the white of the eye. |
Figure 2. Eye diagram.
The size of the acquired images varies from 85 × 85 to 360 × 300 because of differences in image acquisition equipment and slight differences in the patient’s eye pose. Some original images contain not only the eye but also the eyebrow; in that case, the image must be cropped appropriately while keeping the eye. Images that are too small or too large are scaled up or down to a suitable size. To facilitate network training, all pictures are processed into squares. After normalization, 1350 samples of size 126 × 126 were obtained. There are five kinds of diseases reflected by blood filaments plus normal eye images, so the data set has 6 label classes. Each sample can be written as {x, y}, with one label value for each of shoulder diseases, stomach diseases, intestinal diseases, reproductive diseases, heart diseases and normal eye images.
MATLAB is used as the experimental platform. To allow MATLAB to read pictures in batches, all pictures are converted into a data set file ending in “.mat”. For example, the label of the training set is denoted train_y; it must first be converted into vector form, and the class k is read using the MATLAB function “char”, that is, train_y_file = char(‘k’); it is then processed with the “decodefile” function, that is, train_y = decodefile(train_y_file, ‘label’); the training set vector train_x, the test set vector test_x and the test set label test_y are handled similarly. With the function “save”, that is, save(‘yan.mat’, ‘test_x’, ‘test_y’, ‘train_x’, ‘train_y’), the pictures are conveniently converted into the “yan.mat” data set file. As shown in Fig. 3, the data set fragment of train_x has 990 “rows” and 126 × 126 = 15876 “columns”, because the normalized picture size is 126 × 126 and 990 pictures are randomly selected from the 1350 samples as the training set.
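The row and column layout of train_x can be illustrated in Python; this is not the MATLAB code used in the experiment. Each 126 × 126 image is flattened into one row of 15876 values, so 990 training images give a 990 × 15876 matrix. The helper names are hypothetical, the row-major pixel order is an assumption, and only three blank images are used to keep the example small.

```python
def flatten_image(image):
    """Turn a 2-D image (a list of rows) into one flat row vector."""
    return [pixel for row in image for pixel in row]


def build_matrix(images):
    """Stack flattened images so that each image becomes one matrix row."""
    return [flatten_image(image) for image in images]


side = 126
# Three blank placeholder images stand in for the real eye pictures.
images = [[[0.0] * side for _ in range(side)] for _ in range(3)]
train_x = build_matrix(images)
print(len(train_x), len(train_x[0]))  # 3 15876
```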
Improving network structure
In a typical convolutional neural network, the input and output layers, the fully connected layer, the convolution layers and the sub-sampling layers are all essential. The main difference between network structures lies in the design of the convolution and sub-sampling layers, because the networks are designed to solve different problems.
Table 3. Design of the convolution and sub-sampling layers in different network structures
As Table 3 shows, when designing a convolutional neural network structure, designers determine the number of convolution layers and sub-sampling layers and the size of the convolution kernels according to the actual problem to be solved. The data in the table also indicate that, for a fixed input image size, the more convolution and sub-sampling layers a network has, the smaller its convolution kernels.
LeNet-5, as a typical convolutional neural network, has succeeded in handwritten numeral recognition, but it is not suitable for images with more texture features, such as those of eyeball blood filaments, so the network needs to be improved. The improvements are as follows:
The number of convolution kernels increases and their size changes. Each convolution kernel of a convolutional neural network can extract one feature. To extract more features of eyeball blood filaments, the number of convolution kernels in the C1 convolution layer is increased from 6 to 10. The input layer image is 126 × 126 and is convolved by the C1 layer using ten convolution kernels of size 7 × 7. Each kernel moves over the input image with a step size of 1, so the size of the convolved image is (126 − 7 + 1) × (126 − 7 + 1) = 120 × 120.
Maximum sampling is used and the window changes. Sub-sampling neither increases nor decreases the number of feature maps, so the S2 sub-sampling layer still outputs 10 feature maps; it uses maximum sampling with a 4 × 4 window. LeNet-5 uses mean sampling, which is very useful for handling noise in handwritten digit pictures with black characters on a white background. The main feature of digit pictures lies in the strokes, while noise in the blank areas affects the learning process of the network. Mean sampling therefore removes noise effectively and retains the handwriting features as far as possible. However, the blood filaments in the eyeball are relatively thin and contrast strongly with the surrounding white of the eye; mean sampling would blur these distinctive characteristics, which is not conducive to accurate classification. Maximum sampling captures the characteristics of blood filaments better and thus helps disease classification. During sub-sampling, if the size of the input feature map is not divisible by the size of the sampling window, the window extends beyond the feature map. To preserve the integrity of the features, the extra rows and columns cannot be deleted, so zeros are added at the edge to allow sampling. The sampling windows do not overlap on the input feature map, so the output feature map size is (120/4) × (120/4) = 30 × 30.
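The difference between the two sampling methods, and the zero padding at the edge, can be seen in a small Python sketch. It assumes that a filament appears as a thin line of high values in a feature map; the toy patch and the function name are illustrative only.

```python
def pool(feature_map, window, mode="max"):
    """Non-overlapping pooling; the bottom and right edges are zero-padded."""
    rows, cols = len(feature_map), len(feature_map[0])
    out_rows = -(-rows // window)  # ceiling division
    out_cols = -(-cols // window)
    pooled = []
    for i in range(out_rows):
        pooled_row = []
        for j in range(out_cols):
            values = []
            for r in range(i * window, (i + 1) * window):
                for c in range(j * window, (j + 1) * window):
                    inside = r < rows and c < cols
                    values.append(feature_map[r][c] if inside else 0.0)
            if mode == "max":
                pooled_row.append(max(values))
            else:
                pooled_row.append(sum(values) / len(values))
        pooled.append(pooled_row)
    return pooled


# A thin line of high values (a stand-in for a blood filament).
patch = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
print(pool(patch, 4, "max"))   # [[1.0]]
print(pool(patch, 4, "mean"))  # [[0.25]]
```

In this toy case maximum sampling keeps the full response of the line, while mean sampling reduces it to a quarter of its value.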
The number of feature maps increases. The C3 convolution layer takes the feature maps output by the S2 layer and has 20 feature maps. Each feature map is locally connected to the feature maps of the previous layer. The layer uses 20 convolution kernels of size 7 × 7, so the feature map size is (30 − 7 + 1) × (30 − 7 + 1) = 24 × 24. The S4 layer has 20 feature maps and uses maximum sampling with a 4 × 4 window. Since the feature maps of the previous layer are 24 × 24, the S4 output feature maps are 6 × 6. F5 is the fully connected layer: as the name implies, each of its nodes is connected to every node of the preceding S4 layer, and it contains 360 nodes.
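The feature map sizes of the improved structure can be traced layer by layer with a few lines of Python. The helper names are illustrative; the sketch assumes stride-1 convolution without padding and non-overlapping sampling windows, as described above.

```python
def conv_size(size, kernel, stride=1):
    """Feature map side length after a convolution without padding."""
    return (size - kernel) // stride + 1


def pool_size(size, window):
    """Side length after non-overlapping sampling (edge padded with zeros)."""
    return -(-size // window)  # ceiling division


size = 126
shapes = [("input", 1, size)]
size = conv_size(size, 7)
shapes.append(("C1", 10, size))
size = pool_size(size, 4)
shapes.append(("S2", 10, size))
size = conv_size(size, 7)
shapes.append(("C3", 20, size))
size = pool_size(size, 4)
shapes.append(("S4", 20, size))

for name, n_maps, side in shapes:
    print(f"{name}: {n_maps} maps of {side} x {side}")
```

The trace gives 120 × 120 for C1, 30 × 30 for S2, 24 × 24 for C3 and 6 × 6 for S4, matching the sizes stated in the text.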
The classifier is improved. LeNet-5 does not use Softmax regression as the classifier; instead it connects a fully connected layer F6 after C5 and uses Euclidean radial basis function units as the output layer. The improved structure removes the F6 layer: F5 is fully connected to a Softmax layer, which serves as the classifier and outputs the classification results. The number of disease categories recognized from eye blood filaments is 6, so Softmax outputs 6 different results. Fig. 4 shows the training process of the convolutional neural network on the sample set, including random selection of samples, initialization of parameter values, the convolution operation, the maximum sub-sampling operation, and back propagation to update the weights.
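A Softmax output layer with six classes can be sketched as follows. The class order, the class names in the list and the score values are made up for illustration; only the number of classes comes from the text.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical class order and made-up scores from the last layer.
CLASSES = ["shoulder", "stomach", "intestinal", "reproductive", "heart", "normal"]
scores = [0.2, -1.0, 0.5, 2.0, 1.5, -0.3]

probs = softmax(scores)
predicted = CLASSES[probs.index(max(probs))]
print(predicted)  # reproductive
```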
Figure 4. Training flow of the convolutional neural network.
The improved convolutional neural network algorithm is trained on the “yan.mat” data set. The training steps are as follows:
(1) Before “yan.mat” enters the improved CNN, the network weights are given initial values, including the connection weights w and the biases b.
(2) To improve training efficiency, samples are fed into the network in batches rather than one at a time.
(3) The convolution operations of the convolution layers and the sampling operations of the sub-sampling layers are carried out alternately, and the classifier after the fully connected layer then produces the output results.
(4) If the error meets the requirement of the network designer or the preset number of iterations has been completed, training ends and the final classification results are output. Otherwise, training continues.
(5) Back propagation: following the principle of minimizing the error, the error is propagated layer by layer from the output layer back to the input layer, and the updated network weights are calculated.
(6) Return to step (2) and train the network again.
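The control flow of these training steps can be sketched in Python. To keep the example short, a single logistic unit trained with a mean squared error stands in for the CNN; the toy data, learning rate, error threshold and function names are all assumptions made for illustration and are not the settings used in the experiment.

```python
import math
import random


def forward(w, b, x):
    """Logistic unit used here as a stand-in for the CNN forward pass."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def mean_squared_error(w, b, xs, ys):
    return sum((forward(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, batch_size=4, max_iters=500, target_error=0.02, lr=1.0, seed=0):
    rng = random.Random(seed)
    # Step (1): small, distinct initial weights; the bias starts at zero.
    w = [rng.uniform(-0.05, 0.05) for _ in xs[0]]
    b = 0.0
    for iteration in range(max_iters):
        # Step (2): take one batch of samples instead of a single sample.
        start = (iteration * batch_size) % len(xs)
        bx, by = xs[start:start + batch_size], ys[start:start + batch_size]
        # Step (3): forward pass to obtain the outputs.
        outputs = [forward(w, b, x) for x in bx]
        error = sum((o - y) ** 2 for o, y in zip(outputs, by)) / len(bx)
        # Step (4): stop once the error requirement is met.
        if error <= target_error:
            break
        # Step (5): propagate the error back and update the weights.
        for o, x, y in zip(outputs, bx, by):
            delta = 2.0 * (o - y) * o * (1.0 - o) / len(bx)
            w = [wj - lr * delta * xj for wj, xj in zip(w, x)]
            b -= lr * delta
        # Step (6): the loop returns to step (2) with the next batch.
    return w, b


# Toy data: two classes interleaved so that every batch contains both.
xs = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9],
      [0.0, 0.1], [1.0, 0.9], [0.1, 0.0], [0.9, 1.0]]
ys = [0, 1, 0, 1, 0, 1, 0, 1]
w, b = train(xs, ys)
print(mean_squared_error(w, b, xs, ys))
```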
Applying a CNN to classification and recognition involves two major processes: the training stage and the testing stage. The testing stage checks whether the network has converged and whether the model achieves good recognition accuracy on a group of test set pictures that are independent of the training set. The main difference from the training stage is that the network no longer updates its weights. The testing steps are as follows:
(1) After training, the model has learned the characteristics of the training set samples, and the parameters of each layer are fixed.
(2) The input layer samples are replaced by data from the test set to start the test.
(3) The data propagate from the input layer to the output layer, passing alternately through the convolution and sub-sampling layers, and the network outputs the classification result for each sample.
(4) The classification results are compared with the test sample labels, and the number of correctly classified samples is recorded to calculate the accuracy.
(5) Check whether all test samples have been classified. If so, compute the final classification accuracy; otherwise, continue from step (2).
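The testing steps amount to a loop that classifies every test sample with fixed parameters and counts the correct results. In the sketch below, the function toy_predict and the four test samples are hypothetical stand-ins for the trained network and the real test set.

```python
def evaluate(predict, test_x, test_y):
    """Classify every test sample with a fixed model and return the accuracy."""
    correct = 0
    for sample, label in zip(test_x, test_y):
        # No weights are updated here; the prediction is only compared
        # with the label.
        if predict(sample) == label:
            correct += 1
    return correct / len(test_y)


def toy_predict(sample):
    """Stand-in for the trained network: a fixed decision rule."""
    return 1 if sum(sample) > 1.0 else 0


test_x = [[0.1, 0.2], [0.9, 0.8], [0.7, 0.6], [0.2, 0.3]]
test_y = [0, 1, 0, 0]
print(evaluate(toy_predict, test_x, test_y))  # 0.75
```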
Experiment and analysis
The experiment was run in MATLAB on a machine with an i3 CPU and 4 GB of memory. The diseases identified from blood filaments in the experiment fall into five categories; together with normal eye images there are six categories of pictures, so the classifier is set to k = 6. After normalizing all pictures, a total of 1350 grayscale samples of size 126 × 126 were obtained. To ensure rigorous test results, the training set and the test set are independent of each other, with no intersection: 990 of the 1350 samples are randomly selected as the training set, and the remaining 360 samples form the test set. The distribution of each disease and of normal eye images in the training and test sets is shown in Fig. 5. As the figure shows, reproductive diseases and heart diseases have more samples in both sets, while gastric diseases have relatively few.
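A random split into disjoint training and test sets of these sizes can be sketched in Python as follows. The seed and the function name are illustrative; the experiment itself was carried out in MATLAB.

```python
import random


def split_indices(n_samples, n_train, seed=0):
    """Randomly split sample indices into disjoint training and test sets."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(1350, 990)
print(len(train_idx), len(test_idx))   # 990 360
print(set(train_idx) & set(test_idx))  # set()
```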
Figure 5. The quantity of each disease and normal eye image.
Sample size and recognition rate
The sample size used in this paper is 126 × 126, which is larger than the 32 × 32 handwritten digit samples used in the LeNet-5 network. To explore the influence of sample size on the recognition rate of the convolutional neural network while keeping the experiment simple, only two kinds of eye blood filament samples were selected, those reflecting reproductive disease and heart disease. There are 210 training samples and 79 test samples for reproductive disease, and 224 training samples and 82 test samples for heart disease. In this experiment, the original 126 × 126 samples were reduced with the same ratio in length and width to 94 × 94, 62 × 62 and 46 × 46, and a group of samples of size 156 × 156 was added for comparison. The numbers of samples for reproductive disease and heart disease remain unchanged. The training sets of each size were input into the improved convolutional neural network structure for training until the network converged and the recognition rate essentially stopped changing; the test set samples were then tested. The training time and test time were recorded separately. The results are listed in Table 4.
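Table 4. Experimental results of different sample sizes

| Sample size | 46 × 46 | 62 × 62 | 94 × 94 | 126 × 126 | 156 × 156 |
| --- | --- | --- | --- | --- | --- |
| Training time/s | 335.5 | 477.2 | 949.3 | 1041.4 | 1837.6 |
| Test time/s | 3.1 | 4.9 | 6.3 | 8.7 | 10.4 |
| Test accuracy/% | 55.48 | 62.32 | 70.21 | 81.57 | 77.63 |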
As Table 4 shows, within a certain range, the larger the sample size, the better the classification accuracy. Larger images contain more feature information, so the convolutional neural network takes more time to extract features and train weights. The test time is far shorter than the training time, mainly because most of the training time is spent on weight updates, which are not needed in the test phase. When the sample size is 126 × 126, the accuracy is 81.57%; at 156 × 156 it falls to 77.63%, which indicates that this size is not suitable for the improved structure. When the sample size is 46 × 46, the recognition accuracy is only 55.48%, which is meaningless for a network that only has to distinguish two classes. Therefore, to improve the recognition accuracy of a given network, the input samples should be as large as the suitable size range allows.
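The observations drawn from Table 4 can be checked directly from its values. The sketch below copies the table into a Python dictionary (the variable names are illustrative) and reports the most accurate sample size and the ratio of training time to test time for each size.

```python
# Values copied from Table 4: size -> (training time/s, test time/s, accuracy/%).
table4 = {
    46: (335.5, 3.1, 55.48),
    62: (477.2, 4.9, 62.32),
    94: (949.3, 6.3, 70.21),
    126: (1041.4, 8.7, 81.57),
    156: (1837.6, 10.4, 77.63),
}

# Sample size with the highest test accuracy.
best_size = max(table4, key=lambda size: table4[size][2])
print(best_size)  # 126

# Ratio of training time to test time for each sample size.
ratios = {size: train_s / test_s for size, (train_s, test_s, _) in table4.items()}
for size, ratio in ratios.items():
    print(size, round(ratio, 1))
```

For every size, training takes roughly two orders of magnitude longer than testing.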
The benefit of a larger sample size for recognition accuracy is also evident from visual inspection of the pictures. Table 5 shows pictures of size 126 × 126, 94 × 94, 62 × 62 and 46 × 46 and compares the two diseases reflected by the blood filaments of the eyeball, reproductive disease and heart disease. At 126 × 126, the blood filaments for reproductive disease are distributed across the white of the eye, are evenly shaped and extend to the area near the iris; the blood filaments for heart disease are located on the left side of the white of the eye, where they are more concentrated, while those in the middle of the white of the eye are relatively few and faint. As the sample size decreases from 94 × 94 to 62 × 62, the blood filament characteristics of the two become more and more blurred. At 46 × 46, it is difficult for the naked eye to distinguish the two pictures, and the blood filament features have essentially disappeared. A convolutional neural network classifies images by extracting the features of the input images, learning different features for samples with different labels during training. Once the features in the input image are very blurred or essentially indistinguishable, the network cannot update its weights well and has difficulty converging; training then fails and the recognition accuracy is naturally low.
Table 5. The difference of visual perception of pictures of different sizes

| Sample size | Reproductive diseases | Heart diseases | Explanation |
| --- | --- | --- | --- |
| 126 × 126 | (image) | (image) | The location and shape of blood filaments in reproductive diseases and heart diseases can be observed clearly |
| 94 × 94 | (image) | (image) | Although the blood filaments are blurred, they are sufficient for observation |
| 62 × 62 | (image) | (image) | It is difficult to distinguish the two images |
| 46 × 46 | (image) | (image) | The blood filaments of the two images are almost invisible, and the images look like the same picture |
Why, then, does the handwritten digit data set with its small sample size achieve high recognition accuracy in the LeNet-5 network? Because of the particular nature of handwritten digits, an input sample size of only 32 × 32 is enough to recognize them, whereas the characteristics of blood filaments in the eye are difficult to see at this size. For the same LeNet-5 network, a small sample size also helps to reduce training time. The MNIST database contains tens of thousands of samples, so training takes a long time, and 32 × 32 samples are therefore more appropriate for training the LeNet-5 network.
Conclusion
This paper describes an improved convolutional neural network structure, including the convolution layer, sub-sampling layer and fully connected layer. It also introduces the “yan.mat” data set, which contains images of five kinds of diseases reflected by the blood filaments of the eyeball together with normal eye images, stored in a form that is convenient for computation in MATLAB. In addition, the paper explores the impact of sample size on the classification results and finds that, within a certain range, the larger the sample size, the better the recognition accuracy of the network. The number of iterations of the convolutional neural network is also studied: only with sufficient iterations can the network classify the test set images well. Finally, the paper shows that the improved convolutional neural network structure performs well on the eye blood filament data set, which indicates that the convolutional neural network has strong classification ability and robustness. It is worth noting that, if enough samples can be collected, the recognition accuracy of the improved convolutional neural network can be improved further.
Footnotes
Conflict of interest
None to report.
References
1. Daoxi, Huosheng, Pan, et al. Pose estimation-dependent identification method for field moth images using deep learning architecture. Biosystems Engineering, 2015.
Wang K, Guo P, Luo AL. A new automated spectral feature extraction method and its application in spectral classification and defective spectra recovery. Monthly Notices of the Royal Astronomical Society, (4): 4311–4324.
4. Arora R, Rai PK, Raman B. Deep feature-based automatic classification of mammograms. Medical and Biological Engineering and Computing, 2020, 58(6): 1199–1211.
5. Cui Y, Zhang G, Liu Z, et al. A deep learning algorithm for one-step contour aware nuclei segmentation of histopathological images. Medical and Biological Engineering and Computing, 2018, 67(dec): 1–8.
6. Yen YS, Sun HM. An Android mutation malware detection based on deep learning using visualization of importance from codes. Microelectronics Reliability, 2019, 93(FEB.): 109–114.
7. D’Isanto A, Polsterer KL. Photometric redshift estimation via deep learning – Generalized and pre-classification-less, image based, fully probabilistic redshifts. Astronomy and Astrophysics, 2018, 609.
8. Xu J, Deng Z, Song Q, et al. Multi-UAV counter-game model based on uncertain information. Applied Mathematics and Computation, 2020, 366: 124–129.
9. Heng LU, Xiao FU, Chao L, et al. Cultivated land information extraction in UAV imagery based on deep convolutional neural network and transfer learning. Journal of Mountain Science, 2017, 14(4): 731–741.
10. Xiangyu Z, Friedrich F, Franz K, et al. Optimization of OpenStreetMap building footprints based on semantic information of oblique UAV images. Remote Sensing, 2018, 10(4): 624–630.
11. Wang Y, Luo X, Luo L, et al. UAV tracking based on saliency detection. Soft Computing, 2020: 1–14.
12. Steinhardt U, Volk M. Meso-scale landscape analysis based on landscape balance investigations: problems and hierarchical approaches for their resolution. Ecological Modelling, 2003, 168(3): 251–265.
13. Li Y, Gong J, Ibrahim AN, et al. Orchard identification using landform and landscape factors based on a spatial-temporal classification framework. International Journal of Remote Sensing, 2014, 35(5–6): 2118–2135.
14. Wang L, Zhang Z. Automatic detection of wind turbine blade surface cracks based on UAV-taken images. IEEE Transactions on Industrial Electronics, 2017, 64(9): 7293–7303.