Abstract
Manual analysis and interpretation of microscopic images of thin blood smears for malaria diagnosis is a tedious and challenging task. This paper aims to develop a computer-assisted system for quantification of erythrocytes in microscopic images of thin blood smears. The proposed method consists of preprocessing, segmentation, morphological filtering, cell separation and clump cell segmentation. The major issues that must be addressed to enhance the performance of the system are cell separation (i.e. classification of isolated and clump erythrocytes) and clump cell segmentation. Geometric features, namely cell area, compactness ratio and aspect ratio, are used to define the feature set. The performance of the system in classifying isolated and clump erythrocytes is evaluated for three classifiers: Naive Bayes, k-NN and SVM. The clump erythrocytes are then segmented using marker-controlled watershed with H-minima as internal marker. The experimental results show that the proposed model provides satisfactory results, with an accuracy of 98.02% in comparison to the state-of-the-art method.
Introduction
Malaria is a serious infectious disease caused by parasites belonging to the genus Plasmodium. Plasmodium malariae, Plasmodium vivax, Plasmodium ovale and Plasmodium falciparum are the four different species of the parasite. According to the World malaria report 2014, 3.3 billion people in 97 countries and territories are at risk of being infected with malaria, and Plasmodium falciparum is responsible for most deaths from malaria. In developing countries, manual microscopic examination of peripheral blood smears is considered the gold standard method for laboratory malaria diagnosis [1]. The accuracy of erythrocyte detection in blood smears depends on the expertise of the microscopist, so manual examination is time consuming and can lead to erroneous results.
Some of the disadvantages of microscopic examination are its low sensitivity at low parasite levels, its time-consuming nature and frequent incorrect detection of the parasite. Malaria parasitaemia is estimated from microscopic images of stained blood smears. In thin blood smear examination, parasitaemia estimation involves two essential steps: (a) erythrocyte quantification and (b) infected erythrocyte detection. Erythrocyte quantification is one of the most important tasks for parasitaemia estimation [2]. Overestimation or underestimation of the erythrocyte count leads to a corresponding error in the estimated parasitaemia level. The image analysis process for erythrocyte quantification has to deal with difficulties such as the presence of artefacts and the presence of both isolated and overlapping (clump) erythrocytes.
Several techniques are available in the literature to segment erythrocytes from a complicated background. S.S. Devi et al. reviewed erythrocyte segmentation techniques [3]. Several threshold-based segmentation techniques, such as the mean method, P-tile method, histogram dependent technique (HDT), edge maximization technique (EMT) and visual technique, have been reported [4]. Di Ruberto [5] introduced a morphology-based cell segmentation which uses a non-flat disk-shaped structuring element to enhance the roundness and compactness of the erythrocytes, and thereby improve the accuracy of the classical watershed algorithm, and a flat disk-shaped structuring element to separate overlapping cells. Tek [6] applied a modification of the original watershed algorithm, called the minimum area watershed transform, as the initial operator for cell segmentation. The circle Radon transform is then applied to locate the cell centres, and the final segmentation is obtained by applying the original marker-controlled watershed transform to the output of the Radon transform, with markers obtained from the regional maxima. Ross et al. [7] developed an automatic system to diagnose malaria from thin blood smears based on an image processing approach. In this method, erythrocytes are detected using morphological and thresholding techniques. Features such as color, texture and geometry are then generated to classify normal and infected erythrocytes. Microscopic imaging based automatic malaria diagnosis has also been discussed. In this approach, erythrocytes were segmented using the watershed transform, and each watershed region was examined further: if a region is too small to represent a cell, it is merged with the neighboring region with which it shares the longest border [8]. Supervised pixel classification has been performed in different color models, i.e. RGB, normalized RGB, HSV and YCbCr, in order to segment erythrocytes and parasites in microscopic images [9]. Sharif et al. [10] proposed an approach for erythrocyte segmentation to automate the counting of erythrocytes (red blood cells). The method consists of YCbCr color conversion, masking, morphological operations and watershed segmentation. The combination of YCbCr color conversion and morphological operations is used to remove the white blood cell (WBC) nuclei present in the microscopic image of the thin blood smear. After removing artifacts and WBCs from the image, only erythrocytes are left as foreground objects. The resulting erythrocytes then undergo marker-controlled watershed to obtain the segmented erythrocytes. Hari et al. [11] proposed a system to separate and count the blood components (i.e. white blood cells, erythrocytes and platelets) based on geometrical features and distance transform watershed segmentation.
Yu et al. [12] proposed a technique for analysis and recognition of cell phases based on features such as gray scale, shape and geometrical features. Here, a global thresholding method is used to obtain the foreground region. Line segments, critical points, convexity and concavity are then used to analyze the shape of the image contour. The method gives both the phase and the dynamic tracing of cell nuclei. Chen et al. [13] proposed an automatic segmentation and classification of erythrocytes in peripheral blood smears based on an 8-connection chain code technique. Three steps, namely image binarization, morphological filtering and erythrocyte separation (i.e. isolated and overlapping), were performed as the preprocessing phase. Overlapping and isolated erythrocytes were separated by automatic thresholding, with the mean plus standard deviation of the erythrocyte area used as the threshold criterion. Vromen et al. [14] discussed an automatic erythrocyte segmentation model for scanning electron microscope images based on a contour tracing approach. The authors reported a second-order polynomial model with a Bayesian approach to obtain smooth boundaries, and an ellipse fitting procedure to reduce noise in the contours. A color image segmentation based on a seeded region growing algorithm has also been proposed. In this method, the histogram of each band is analyzed to obtain the seeds automatically: an image pixel is considered a seed if its gray value in each band falls within some interval [15]. Boon et al. [16] performed a comparative analysis of different segmentation techniques for erythrocyte microscopic images, i.e. gray level thresholding, RGB color thresholding, color matching, edge detection operators, filtering operators, the Gradient-in method, morphological operators, HSL (hue, saturation, lightness) and pattern matching. It was observed that no single method gives the best erythrocyte segmentation result. Berge et al. [17] discussed an algorithm based on iterative analysis which can identify and count the erythrocytes in blood smear images. Here, the RGB image was converted to a gray scale image. Median filtering and morphological closing with a disk-shaped structuring element of radius 2 were then performed to remove unwanted artifacts. Boundary extraction, curvature calculation and points of maximum curvature were used to split the erythrocytes that were in clumps.
From the literature, it has been observed that the existing methods have certain limitations, such as no consideration of the presence of artifacts or of the splitting of erythrocyte clumps, problems in separating isolated and clump erythrocytes, and consideration of only the erythrocytes that lie inside the image and not those touching the image boundaries. Thus, in this paper, an erythrocyte quantification system has been developed which consists of pre-processing, segmentation, cell separation and clump cell segmentation.
The contributions of the paper are as follows: (a) Database collection: as microscopic images of thin blood smears are not freely available for academic research, we have collected our own database from the registered pathological clinic of Cachar district, Assam, India. The details of the database are shown in Table 1. (b) An erythrocyte quantification method for microscopic images of thin blood smears has been proposed. (c) In the cell separation method, the geometric features cell area, compactness ratio and aspect ratio have been used as the feature set, and the performance of the system has been evaluated using the Naive Bayes, k-NN and SVM classifiers. From the experimental analysis, it has been observed that SVM with the proposed three features provides better results than the other classifiers. (d) Clump erythrocyte segmentation based on marker-controlled watershed with H-minima as internal marker has been proposed, and a comparative analysis has been performed between the H-minima based marker-controlled watershed and the classical watershed for clump erythrocyte segmentation.
The paper is organized as follows. Section 2 explains the proposed method for erythrocyte quantification, which consists of segmentation, morphological filtering, cell separation and clump cell segmentation. Section 3 presents the experimental results of the proposed method. Section 4 concludes the paper and outlines the scope for future work.
Proposed method
A general overview of the system for erythrocyte segmentation in microscopic images of thin blood smears is shown in Fig. 1. In our system, erythrocytes are segmented using the global thresholding technique, followed by morphological filtering to remove unwanted artifacts. Separation of isolated and clump cells using different classifier models, clump cell segmentation and the overall erythrocyte count are the main contributions of the proposed system.
Microscopic image database
The microscopic images of thin blood smears collected from the registered pathological clinic of Cachar district, Assam, India have been used for the experimental analysis of the proposed method for erythrocyte segmentation. The details of the collected database are shown in Table 1.
Pre-processing
Due to the staining variability of thin blood smears and the effect of the camera light source, microscopic images of thin blood smears exhibit non-uniform illumination. Here, the non-uniform illumination is corrected using the adapted gray world normalization method [8]. The normalized RGB image is converted to a gray scale image, and median filtering with a 5 × 5 kernel is then applied to remove unwanted artifacts.
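The pre-processing chain can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it uses the basic gray world rule (each channel is scaled so that its mean equals the mean over all channels), not the adapted variant of [8], and the function names and the BT.601 grayscale weights are choices made for this sketch rather than details taken from the paper.

```python
from statistics import median

def gray_world(rgb):
    """Basic gray world: scale each channel so its mean equals the overall mean.
    Assumption: this is the plain rule, not the adapted variant cited as [8]."""
    pixels = [p for row in rgb for p in row]
    means = [sum(p[c] for p in pixels) / len(pixels) for c in range(3)]
    target = sum(means) / 3.0
    gains = [target / m if m > 0 else 1.0 for m in means]
    return [[tuple(min(255.0, p[c] * gains[c]) for c in range(3)) for p in row]
            for row in rgb]

def to_gray(rgb):
    """Luma-weighted grayscale conversion (BT.601 weights, an assumption)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]

def median_filter(img, size=5):
    """Median filter with a size x size window; border pixels use the
    neighbours that fall inside the image."""
    h, w, k = len(img), len(img[0]), size // 2
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - k), min(h, y + k + 1))
                      for i in range(max(0, x - k), min(w, x + k + 1))]
            row.append(median(window))
        out.append(row)
    return out
```

Images are represented as nested lists to keep the sketch dependency-free; an array library would normally be used for full-size micrographs.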
Segmentation
Erythrocyte segmentation is performed using a global thresholding method [18]. The algorithm presumes that the image contains two distinct classes of pixels, as is evident from a bi-modal histogram. The threshold value t for separating foreground from background is selected by minimizing the intra-class (within-class) variance of the foreground and background classes. The original microscopic image and the histogram of the microscopic image of blood cells are shown in Fig. 2 (a) and (b) respectively. The general formula for the threshold that minimizes the intra-class variance is given as follows:
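In its standard form (Otsu's method, which the description matches), the intra-class variance is commonly written as $\sigma_w^2(t) = \omega_0(t)\,\sigma_0^2(t) + \omega_1(t)\,\sigma_1^2(t)$, where $\omega_0, \omega_1$ are the class probabilities and $\sigma_0^2, \sigma_1^2$ the class variances on either side of $t$. A minimal sketch of the threshold search follows. It uses the equivalent formulation of maximising the between-class variance; the function names, the 256-level assumption and the convention that cells are darker than the background are assumptions of this sketch.

```python
def otsu_threshold(gray, levels=256):
    """Return the threshold t minimising the within-class variance
    (found by maximising the equivalent between-class variance)."""
    hist = [0] * levels
    for row in gray:
        for v in row:
            hist[int(v)] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_between = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # intensity sum of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t

def binarize(gray, t):
    """Foreground = pixels at or below t (assumes cells darker than background)."""
    return [[1 if v <= t else 0 for v in row] for row in gray]
```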
After the erythrocytes are segmented from the background, some of the erythrocyte centers are segmented as background. Such a background region surrounded by a connected border of foreground pixels is called an erythrocyte hole. Artifacts present in the microscopic image are also segmented as foreground. The image is binarized to obtain the desired cell regions for complete analysis, and morphological image processing is then applied. The main aim of morphological image processing is to fill the erythrocyte holes and to remove the undesired artifacts from the binary image obtained after the segmentation process.
Two compound filtering operations that combine dilation and erosion are used, i.e. closing and opening. In opening, erosion is followed by dilation, whereas in closing the order is reversed. Both operations are controlled by a shape called the structuring element. In this system, erythrocyte holes are filled using a flood-fill operation on the binary image of the microscopic blood smear [19–21]. In the hole-filled binary image, an area opening operation is then performed to remove unwanted artifacts. A disk-shaped structuring element with a radius of about 25 percent of the average radius of the foreground regions obtained after segmentation is selected for morphological opening. Figure 2(c) and (d) show the binary image before and after morphological filtering respectively. The filtered image contains both isolated and clump erythrocytes, which need to be separated. The proposed cell separation process is described in the next section.
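The two clean-up steps can be illustrated with a short, dependency-free sketch: hole filling by flood-filling the background from the image border, and an area opening that discards small connected components. The function names, the connectivity choices and the `min_area` parameter are assumptions of this sketch; the disk-based morphological opening mentioned above is not reproduced here.

```python
from collections import deque

def fill_holes(mask):
    """Set to 1 every background pixel that is not 4-connected to the border."""
    h, w = len(mask), len(mask[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and mask[y][x] == 0:
                outside[y][x] = True
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 0 and not outside[ny][nx]:
                outside[ny][nx] = True
                queue.append((ny, nx))
    return [[0 if outside[y][x] else 1 for x in range(w)] for y in range(h)]

def area_open(mask, min_area):
    """Remove 8-connected foreground components smaller than min_area pixels."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 1 and not seen[y][x]:
                seen[y][x] = True
                component, queue = [], deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    component.append((cy, cx))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and mask[ny][nx] == 1 and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                if len(component) < min_area:
                    for cy, cx in component:
                        out[cy][cx] = 0
    return out
```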
Cell separation
For the erythrocyte counting system, isolated and clump erythrocytes need to be separated in the filtered binary image. The isolated cells are counted directly, while clump cells are processed further to find the exact number of cells in each clump. Several studies have used the mean and standard deviation of the erythrocyte areas to separate isolated and clump erythrocytes [7, 13]. However, a threshold based on the mean plus standard deviation of the erythrocyte area has limitations. In order to improve the performance of cell separation, the area, compactness ratio and aspect ratio of the erythrocytes are used to define the feature set for classifying isolated and clump erythrocytes. The main steps of cell separation are feature extraction and classification.
Feature extraction
The main purpose of feature extraction is to extract information from the filtered binary image which can discriminate between isolated and clump cells. The area and circularity of clump cells deviate much more than those of isolated cells. Hence, cell area, compactness ratio and aspect ratio have been used to characterize the isolated and clump erythrocytes [20, 21]. The compactness ratio and aspect ratio describe the roundness of the cell. The three extracted features are shown in Table 2. Area of the erythrocyte: Consider the function $(X_u, Y_v)$ described in the blood cell label map of an N × N image. The area in pixels of the nth cell is then given by
Compactness Ratio: The compactness ratio is another feature used to measure the roundness of a cell. It takes a value of 1 for a perfectly circular cell and is therefore close to 1 for isolated cells. It can be defined by
Aspect Ratio: The third feature is the aspect ratio, which is the ratio of the major axis to the minor axis of the ellipse fitted to the boundary of the extracted erythrocyte. It can be defined as
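As an illustration of the three features, the sketch below computes them for a cell given as a closed polygonal contour. The formulas are assumptions chosen to be consistent with the descriptions above: the area is the enclosed area (shoelace formula), the compactness ratio is taken as $4\pi A / P^2$ (equal to 1 for a circle), and the aspect ratio is the axis ratio of the ellipse with the same second moments as the contour points. The function name is hypothetical.

```python
import math

def geometric_features(contour):
    """Area, compactness ratio and aspect ratio of a closed polygonal contour.
    Expects at least three non-collinear (x, y) points."""
    n = len(contour)
    area2 = 0.0      # twice the signed area (shoelace formula)
    perimeter = 0.0
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]
        area2 += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    area = abs(area2) / 2.0
    compactness = 4.0 * math.pi * area / perimeter ** 2  # assumed definition

    # Axis ratio of the ellipse with the same second moments as the contour points.
    mx = sum(x for x, _ in contour) / n
    my = sum(y for _, y in contour) / n
    sxx = sum((x - mx) ** 2 for x, _ in contour) / n
    syy = sum((y - my) ** 2 for _, y in contour) / n
    sxy = sum((x - mx) * (y - my) for x, y in contour) / n
    half_trace = (sxx + syy) / 2.0
    root = math.sqrt(((sxx - syy) / 2.0) ** 2 + sxy ** 2)
    minor = half_trace - root
    aspect = math.sqrt((half_trace + root) / minor) if minor > 0 else float("inf")
    return area, compactness, aspect
```

For a near-circular contour both ratios are close to 1, while an elongated outline such as two touching cells gives a lower compactness and a higher aspect ratio, which is what makes the features discriminative.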
2.5.2.1. Naive Bayes based classification. Naive Bayes is a supervised machine learning method that classifies data based on the probability distribution over a set of classes [22]. Two distributions, i.e. normal and kernel, were tried in the training phase to obtain the better model. Finally, 4-fold cross validation was performed in the testing phase with the distribution that provided the better training accuracy.
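A minimal sketch of a Naive Bayes classifier with the normal (Gaussian) distribution is given below for illustration. It covers only the normal case, not the kernel distribution, and the function names and the small variance floor are assumptions of this sketch.

```python
import math

def fit_gaussian_nb(X, y):
    """Estimate the prior and the per-feature mean and variance of each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small floor keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(y), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Return the label with the highest log-posterior for feature vector x."""
    def log_posterior(params):
        prior, means, variances = params
        score = math.log(prior)
        for v, m, s2 in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return score
    return max(model, key=lambda label: log_posterior(model[label]))
```

Each feature vector would hold the cell area, compactness ratio and aspect ratio of one connected component.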
2.5.2.2. k-NN based classification. The k Nearest Neighbor (k-NN) technique is a non-parametric classification method [23]. k-NN was trained with different values of k, i.e. 1, 3 and 5. In the testing phase, the value of k that provided the highest training accuracy was used for 4-fold cross validation.
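The k-NN decision rule is short enough to state directly in code. The sketch below assumes Euclidean distance and a simple majority vote; the function name is hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training samples closest to x (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

Because the cell area is numerically much larger than the two ratios, it dominates an unscaled Euclidean distance; standardising the features before calling `knn_predict` is a common precaution, although the text does not state whether scaling was applied.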
2.5.2.3. SVM based classification. SVM is a machine learning method which classifies data into two classes by finding an optimal hyperplane that maximizes the margin of the decision boundary. In order to classify the data into two classes, the input data $(x_i, y_i)$ is mapped into a higher dimensional feature space using the operator $\varphi(x)$, where $x_i \in \mathbb{R}^n$ and $y_i \in \{0, 1\}$. For mapping the input data $(x_i, y_i)$ into the higher dimensional space with the non-linear operator $\varphi(x)$, the optimal hyperplane can be computed as a decision surface:
The coefficients $\alpha_i$ and $b$ can be obtained by solving the quadratic programming (QP) problem:
The kernel function which gave the best training performance was chosen as the kernel for testing. 4-fold cross validation was performed in the testing phase.
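For reference, the textbook soft-margin formulation that matches the description of the decision surface and the QP problem is sketched below. It is stated under two assumptions that are not taken from the text: the class labels are recoded from $\{0, 1\}$ to $\{-1, +1\}$, and $C$ denotes a regularization parameter. $K$ is the kernel induced by $\varphi$, and $N$ is the number of training samples.

```latex
% Decision surface (labels assumed recoded to y_i \in \{-1, +1\})
f(x) = \operatorname{sgn}\!\left( \sum_{i=1}^{N} \alpha_i \, y_i \, K(x_i, x) + b \right),
\qquad K(x_i, x_j) = \varphi(x_i)^{\top} \varphi(x_j)

% Dual quadratic programming problem for the coefficients \alpha_i
\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
\quad \text{subject to} \quad
\sum_{i=1}^{N} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C
```

The bias $b$ is then recovered from any support vector with $0 < \alpha_i < C$. A polynomial kernel, for example, has the form $K(x_i, x_j) = (x_i^{\top} x_j + c)^d$.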
The classical watershed segmentation is based on a flooding simulation from all the regional minima present in the image, which leads to over-segmentation due to local irregularities of the gradient and noise. There are two main approaches to solve this over-segmentation problem. The first is to merge adjacent regions according to some predefined criteria after applying the watershed, but it is very difficult to choose the merging criteria [27]. The second is the marker-controlled watershed. A marker is a connected component present in an image. Marker-controlled watershed segmentation requires a set of internal markers, each of which represents the existence of an object. To extract the object markers correctly and thereby segment the overlapping erythrocytes, the regional minima of the H-minima transform are used as internal markers. The H-minima transform is a powerful mathematical tool to suppress unwanted minima [28]. Applying the H-minima transform to the inverse distance image can drastically decrease over-segmentation. The H-minima transform is given by
The total set of segmented erythrocytes is obtained by combining the isolated erythrocytes with the erythrocytes segmented from the clumps.
Performance of cell separation
As no database of microscopic images of thin blood smears is publicly available for academic research, the proposed method is evaluated on our own collected clinical database. The collected database consists of 812 erythrocytes, which include both isolated and clump erythrocytes. Two sets of experiments have been carried out to check the efficiency of the proposed method. In the first, the best parameter of each classifier for classifying isolated and clump erythrocytes was selected. To choose the best parameter, the performance was examined in terms of training accuracy along with testing accuracy.
Once the best parameter of each classifier was chosen, 4-fold cross validation was applied to all the classifiers, i.e. Naive Bayes, k-NN and SVM. In 4-fold cross validation, the data are divided into four subsets; in each round, one subset is used as the test set and the remaining three as the training set. From the filtered binary image containing both isolated and clump erythrocytes, as shown in Fig. 2(d), we extracted the three proposed features, i.e. cell area, compactness ratio and aspect ratio. The performance of all the classifiers in classifying the erythrocytes was then evaluated. The experimental results on our collected database are listed in Tables 3 to 8. Here, 812 erythrocytes were used for the experimental analysis, 609 for training and 203 for testing.
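The split sizes follow directly from the 4-fold scheme: 812 samples give four folds of 203, so each round trains on 609 samples and tests on 203. A minimal sketch of such a split is shown below; the function name, the shuffling and the fixed seed are assumptions of this sketch.

```python
import random

def k_fold_indices(n_samples, k=4, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx
```

The overall cross-validation accuracy is then the mean of the per-fold test accuracies.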
Table 3 shows the training and testing performance of the Naive Bayes classifier with different distribution functions. From Table 3, it has been observed that the kernel distribution function provides the better training and testing accuracy, 91.78% and 89.65% respectively. Once the parameter was obtained, 4-fold cross validation was performed. From the 4-fold cross validation result of Naive Bayes listed in Table 4, it has been observed that the classifier provides an overall erythrocyte separation accuracy of 92.36%. Similarly, the training and testing performance of k-NN with different values of k was evaluated, as shown in Table 5. From the analysis, it has been observed that k = 3 provides the best training (95.23%) and testing (94.08%) performance. Hence, k = 3 was used for the 4-fold cross validation of the k-NN classifier listed in Table 6. The overall performance of the 4-fold cross validation for k-NN is 95.43%. From Tables 4 and 6, it has been observed that k-NN outperforms Naive Bayes by 3.07 percentage points.
Further, the training and testing performance of the SVM with different kernel functions was evaluated and is listed in Table 7. From Table 7, it has been observed that the polynomial kernel function provides the best performance, with a training accuracy of 97.53% and a testing accuracy of 96.05%. Finally, 4-fold cross validation of the SVM with the selected parameter resulted in an overall accuracy of 97.78%, as listed in Table 8. From Tables 4, 6 and 8, SVM shows an improvement of 5.42 and 2.35 percentage points over Naive Bayes and k-NN respectively. From the comparative analysis of the classifiers shown in Fig. 3, it has been observed that the SVM performed better than the other classifiers. The performance of the classifiers for separating erythrocytes into isolated and clump may be ranked as SVM > k-NN > Naive Bayes. The clump erythrocytes separated from the isolated erythrocytes by the SVM classifier were then segmented into individual cells.
Performance of erythrocyte segmentation
Once the clump erythrocytes were obtained from the cell separation process, they were segmented using the marker-controlled watershed segmentation technique. Here, the regional minima of the H-minima transform are used as internal markers, as this segments the clump erythrocytes properly by suppressing unwanted noise. The total set of segmented erythrocytes is obtained by combining the isolated cells with the clump cells segmented by the marker-controlled watershed. The overall accuracy of the erythrocyte segmentation is obtained using Equation (11). On manual counting, a total of 812 erythrocytes was observed, where both isolated and clumped erythrocytes were present. When each clump was counted as the individual erythrocytes it contains, the count rose to 912. The proposed algorithm obtained a total of 894 individual erythrocytes. Hence, the proposed algorithm achieved a segmentation accuracy of 98.02%. The proposed method has also been compared with the existing state-of-the-art method. The quantitative analysis of the proposed method in comparison to classical methods is shown in Fig. 4. From the performance comparison shown in Table 9, it can be concluded that the proposed system provides satisfactory results for erythrocyte segmentation, with an error rate of 1.98%. The segmentation errors are mainly due to some of the highly overlapping erythrocytes present in the microscopic images of thin blood smears. The segmented erythrocytes may be used to analyze morphological properties and detect abnormalities in the erythrocytes.
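The reported figures can be checked with a short calculation. The sketch below assumes that the accuracy is the count-based agreement between the automatic and the manual cell count; this is one plausible reading and not necessarily the exact form of Equation (11). The function name is hypothetical.

```python
def segmentation_accuracy(manual_count, automatic_count):
    """Percentage agreement between the automatic and the manual cell count
    (assumed definition, not necessarily Equation (11))."""
    error = abs(manual_count - automatic_count) / manual_count
    return 100.0 * (1.0 - error)

accuracy = segmentation_accuracy(912, 894)  # counts reported in the text
```

With 912 manually counted and 894 automatically segmented erythrocytes, this evaluates to about 98.03%, which agrees with the reported 98.02% to within rounding.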
Conclusion
In this paper, erythrocyte segmentation for quantification in microscopic images of thin blood smears has been proposed. The main aim of the proposed system is to segment the erythrocytes into individual cells, which can be used for the analysis and quantification of the cells in the microscopic images. In comparison to manual estimation by a clinical expert, the proposed method is a more efficient, accurate and fast process. The proposed technique consists of segmentation of the foreground cells, morphological filtering to remove unwanted artifacts, separation of isolated and clump cells, and clump erythrocyte segmentation. In the cell separation process, cell area, compactness ratio and aspect ratio have been used to characterize the isolated and clump erythrocytes, and the performance of the Naive Bayes, k-NN and SVM classifiers has been evaluated. From the experimental analysis, the SVM classifier with the proposed features provides satisfactory results, with an accuracy of 97.78%, in comparison to the other classifiers. Hence, SVM based cell separation has been used in the proposed system. The technique has the advantage of also considering the erythrocytes at the image boundaries for erythrocyte quantification. Clump erythrocytes were segmented using the H-minima based marker-controlled watershed technique. The overall system accuracy is 98.02% for erythrocyte segmentation. In future work, segmentation of highly overlapping cells may be addressed. Moreover, the segmented erythrocytes can be analyzed to detect abnormalities in the morphological features of the erythrocytes.
