An automated eye disease prediction system using bag of visual words and support vector machine

Abstract

This paper proposes a prediction system that identifies eye diseases such as glaucoma and diabetic retinopathy. The system processes images captured by a fundus camera connected to a computer. The acquired fundus images are fed into the prediction system, which can be deployed in the cloud, and the system identifies the type of disease. Together, the camera and the cloud service form a cyber-physical system. Underdeveloped countries that lack the necessary infrastructure can use this service when the system is deployed in the cloud. To identify these diseases, ophthalmologists extract parameters manually from the fundus image, which is a difficult task. Hence, this research work attempts to automate feature extraction from fundus images and to predict eye diseases from the extracted features. The literature shows that many research works focus on the binary classification of a single disease. In this paper, a novel classification methodology is proposed that helps experts and clinicians classify diabetic retinopathy, glaucoma, and healthy eye images more accurately. The proposed system is designed with the following phases: i) image acquisition, ii) image enhancement, iii) local feature extraction using Speeded Up Robust Features (SURF), iv) Bag of Features/Visual Words (BoF/BoVW) obtained through k-means clustering of local features, and v) classification using an Error-Correcting Output Code (ECOC) linear SVM. The results show that the proposed BoVW-based classification achieved a maximum accuracy of 92% when compared with recent state-of-the-art literature.

Keywords

Fundus image, image enhancement, bag of features, support vector machine, classification

1 Introduction

Changes in lifestyle and work stress have made the younger generation vulnerable to health conditions such as high blood pressure and high blood sugar. Diabetes Mellitus is a health condition in which the blood sugar level is above the prescribed limit. The number of patients with Diabetes Mellitus is poised to grow drastically in the near future. Diabetes Mellitus leads to an eye condition called Diabetic Retinopathy (Fig. 1). If untreated, diabetic retinopathy causes permanent blindness.

The anterior and posterior chambers of the eye are filled with aqueous humor, a transparent watery fluid that maintains the intraocular pressure and is continuously produced by the ciliary processes. The fluid is drained from the anterior chamber into the blood vessels through the Canal of Schlemm. If the drainage of the fluid is blocked, pressure builds in the eye, leading to a medical condition called Glaucoma (Fig. 2). Both Glaucoma and Diabetic Retinopathy are major causes of permanent vision loss.

Fundus photography captures an image of the back of the eye, i.e., the fundus. Fundus images are ocular documentation that records the appearance of a patient’s retina. Clinically, these images are used to examine and diagnose diabetic retinopathy and glaucoma. The blood vessels and neurons of the retina are analyzed for diagnosing diabetic retinopathy, and the optic disc and cup of the eye are analyzed for diagnosing glaucoma.

Manual examination of these images may lead to human errors and these errors can be reduced by analyzing the images using advanced image processing algorithms. Diagnosis of Glaucoma is done by segmenting optic cup and optic disc from the fundus image. Extraction of blood vessels, exudates, and microaneurysms will help in diagnosing diabetic retinopathy.

Fig.1

Diabetic retinopathy (Image courtesy: http://www.eagleeyecentre.com.sg/service/diabetic-retinopathy/).

Fig.2

Glaucoma (Image courtesy: http://pgheyemds.com/our-services/glaucoma-treatment-laser-surgery/).

This research focuses on developing a classification methodology that can be deployed in the cloud so that clinicians can predict the type of eye disease remotely. The architecture of the system in a cloud environment is given in Fig. 3. The workflow of the prediction system is given in Fig. 4. In the proposed system, fundus images are acquired using fundus photography and given to the prediction system. The images are enhanced, and local features are extracted using SURF. After the extraction of local features, clustering is performed. The Bag of Visual Words, or Bag of Features, is formed from the cluster centers. This Bag of Features is given as input for the classification of diseases.

Fig.3

Architecture of the proposed system: 1) the captured fundus image is given to the prediction system, 2) the result of the classification is given back to the hospital, 3) & 4) hospitals with access to the cloud can utilize the system, 5) images of different diseases are stored in the database to train the system, and the database is updated as new images are given as input from different sources.

Fig.4

Workflow of the prediction system.

Fig.5

(a) Fundus image of healthy eye (Left eye). (b) Fundus image of healthy eye (Right eye).

The proposed classification approach is implemented and validated with images from the High Resolution Fundus (HRF) image database [1]. The dataset has a total of 45 images of healthy eyes, glaucoma, and diabetic retinopathy, each with a resolution of 3504×2336 pixels. This dataset is chosen because of its image quality and proper categorization. Sample fundus images of healthy eye, glaucoma, and diabetic retinopathy are shown in Figs. 5, 6, and 7.

1.1 Related works

With advanced image processing and machine learning algorithms, diagnosis of eye related diseases has become an emerging field of research and some significant contributions that established groundwork for the proposed approach are given:

Fig.6

Fundus image of Diabetic Retinopathy.

Fig.7

Fundus image of Glaucoma.

Detection of Glaucoma: Liu et al. proposed ARGALI, an automated Cup-to-Disc Ratio (CDR) measurement scheme based on level-set image processing, for automatic glaucoma risk assessment from fundus images [2]. Liu et al. framed a system architecture named AGLAIA that uses active contours for segmentation of the optic disc and optic cup [3]. Bock et al. proposed a probabilistic two-stage classification to extract a Glaucoma Risk Index (GRI) to detect glaucoma. It combines generic feature types compressed by an appearance-based dimension reduction [4]. Their method has an accuracy of 88%. Muramatsu et al. proposed a method that uses the retinal nerve fiber layer defect (NFLD), which is a major sign of glaucoma [5]. Madhusudhan et al. proposed a segmentation process that uses multi-thresholding, active contours, and region growing for segmenting the disc and cup; the CDR calculated from this segmentation was used to diagnose glaucoma [6]. Optic disc and optic cup segmentation has also been done using superpixel classification [7]. That method uses histograms and center-surround statistics for classifying each pixel as disc or non-disc. For the optic cup, location information is used along with histograms and center-surround statistics. Anusorn et al. proposed a method to extract the optic cup and disc [8]. The optic disc was segmented using edge detection and a variational level-set. To detect the optic cup, color component analysis and a threshold level-set method were used. Salam et al. proposed a methodology that fuses CDR with hybrid textural and intensity features [9]. The methodology achieves an accuracy of 92%. Salam et al. proposed a novel algorithm that uses structural and non-structural features to diagnose glaucoma [10]. It also introduces a suspect class in case of a conflict between the decisions from structural and non-structural features. Glaucoma is identified based on the CDR, and finding the CDR requires precise segmentation of the optic disc and cup from the fundus image. A multi-label deep learning architecture called M-Net and polar transformation are used for segmenting the optic disc and optic cup from the fundus image [11]. The accuracy achieved using CDR to diagnose glaucoma was 89%.

Diabetic Retinopathy (DR): It is caused by a medical condition called Diabetes Mellitus. Segmentation of exudates, hemorrhages, and microaneurysms from the fundus image is required to diagnose DR. Gardner et al. proposed Artificial Neural Networks (ANN) that detect vessels, exudates, and hemorrhages [12]. Using ANN, DR was detected with a sensitivity of 88.4% and a specificity of 83.5%. Csurka et al. proposed a new approach based on a Bag of Keypoints for visual categorization [13]. This method uses vector quantization of affine-invariant descriptors of image blocks and k-means clustering for generating the bag of keypoints. This bag of keypoints is used as input by classifiers for categorization. Fleming et al. used a multi-scale morphological process to detect candidate exudates [14]. The likelihood of a candidate being classified as exudate, drusen, or background is decided based on local properties. Optimally adjusted morphological operators are used for detecting exudates, which are a primary sign of diabetic retinopathy [15]. Acharya et al. proposed an SVM classifier that uses higher-order spectra to detect DR stages with a sensitivity of 82% and a specificity of 88% [16]. Ophthalmologists recognize DR based on features such as blood vessel area, exudates, hemorrhages, microaneurysms, and texture [17]. Sujatha et al. proposed a fuzzy-based Multiple Dictionary Bag of Words in which a dictionary is built using a soft clustering algorithm [18]. Welikala et al. proposed a system for automated detection of DR that uses a standard line operator and a modified line operator to generate the vessel maps [19]. Local morphological features are extracted from the two vessel maps to create two separate feature sets. Classification was performed individually on these feature sets using a Support Vector Machine (SVM), and the system combines the two individual classification outcomes to produce the final decision. The system produced a sensitivity of 86.2% and a specificity of 94.4%. Shinomiya et al. proposed the use of Fuzzy C-Means clustering along with the Scale Invariant Feature Transform for generating the Bag of Features, which can be used for classification [20]. Azzopardi et al. proposed a filter called B-COSFIRE (Bar-Combination Of Shifted Filter Responses) for vessel delineation that can be used in DR classification [21]. Gulshan et al. used a deep learning algorithm for automatic detection of DR and macular edema in retinal fundus images [22]. Abramoff et al. used Convolutional Neural Network-based anatomy and lesion detectors to extract features from the fundus images [23]. These features were given as input to a fusion classifier implemented using random forests. Costa et al. proposed a Bag of Visual Words (BoVW) approach that uses local features to learn a visual dictionary and creates mid-level representations of the image using the visual dictionary [24]. The generated mid-level representations are used for identifying DR in the fundus image.

From the literature survey, it is inferred that many of the research works perform binary classification, i.e., the classification only checks whether or not the given fundus image shows a single disease such as glaucoma. In this paper, a multi-class classification is implemented that takes the bag of visual words extracted from fundus images of glaucoma, diabetic retinopathy, and healthy eyes as input.

This article is organized as follows: Section 2 presents the methods used for classification; Section 3 gives the proposed classification approach; Section 4 comprises the results and discussion; and Section 5 concludes the research work with its future scope.

2 Materials and methods

2.1 Image acquisition

Fundus photography: The retina converts images into electric impulses and sends them to the brain. Details of the retina can be captured using fundus photography [25]. An image of the retina is acquired with a fundus camera by illuminating the retina with imaging light rays passed through the pupil. An ophthalmic photographer focuses and aligns the camera to capture the retina of the patient. As the photographer presses the shutter release, a flash is fired to capture the fundus. The captured images are stored for further analysis and classification of diseases.

2.2 Image enhancement

Image enhancement is the process of boosting image quality through noise removal and contrast improvement so that the enhanced image is suitable for further analysis. Prominent enhancement techniques include contrast stretching, histogram equalization, bit plane slicing, and intensity transformation. In this research work, contrast stretching is employed to improve the quality of the images, which helps in extracting fine features from the image.

2.3 Bag of Features (or) Visual words

BoF comprises the following steps: i) extraction of local features from the image, and ii) building the vocabulary. Local features of the image are extracted using the SURF descriptor. From the extracted local features, the key points or predominant features are selected using k-means clustering and the vocabulary is built.

2.3.1 SURF

Speeded Up Robust Features (SURF) is a scale- and rotation-invariant detector and descriptor proposed by Bay et al. [26]. It is reported to outperform other feature extractors such as the Scale Invariant Feature Transform (SIFT), automatic scale selection, the cascade filtering approach, and Maximally Stable Extremal Regions (MSER) with respect to repeatability, distinctiveness, and robustness. It is also claimed that SURF features are extracted from images much faster than other local descriptors. The SURF detector finds interest points using a basic Hessian matrix approximation. It makes use of integral images, which were popularized by Viola et al. [27].

i) Integral images

Integral images are used in this descriptor to support fast computation of box convolution filters. The integral image $I_{Σ} (x)$ at a location $x = (x, y)^{T}$ represents “the sum of all pixels of the input image I within a rectangular region formed by the origin and x”. It is given in Equation (1). $I_{Σ} (x) = \sum_{i = 0}^{i \leq x} \sum_{j = 0}^{j \leq y} I (i, j)$ (1)

Once the integral image has been computed, the sum of the intensities within any upright rectangular area of the image can be obtained from the integral-image values at the four corners of the rectangle, independently of its size.
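As a minimal sketch of Equation (1), assuming NumPy and a single-channel image, the integral image can be built with two cumulative sums and then queried for the sum over any upright rectangle. The function names `integral_image` and `box_sum` are illustrative and do not come from the paper.

```python
import numpy as np

def integral_image(img):
    """Entry (y, x) holds the sum of img[:y + 1, :x + 1], as in Equation (1)."""
    return img.astype(np.float64).cumsum(axis=0).cumsum(axis=1)

def box_sum(ii, top, left, bottom, right):
    """Sum of the original image over rows top..bottom and columns left..right (inclusive)."""
    total = ii[bottom, right]
    if top > 0:
        total -= ii[top - 1, right]          # remove the strip above the rectangle
    if left > 0:
        total -= ii[bottom, left - 1]        # remove the strip left of the rectangle
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]       # add back the corner removed twice
    return total

# Small illustrative image (values 0..15), not taken from the HRF dataset.
img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
centre_sum = box_sum(ii, 1, 1, 2, 2)         # sum of the central 2x2 block
```

The cost of `box_sum` does not depend on the rectangle size, which is why box filters of growing size remain cheap to evaluate.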

ii) Hessian Matrix based interest points

The Hessian matrix of an image and its properties give details about the corners and blob-like structures of the image. Blob-like structures are detected at locations where the determinant of the Hessian matrix is maximum. Consider a point x = (x, y) in an image I; the Hessian matrix $H (x, σ)$ at scale σ is defined as follows: $H (x, σ) = \begin{bmatrix} L_{xx} (x, σ) & L_{xy} (x, σ) \\ L_{xy} (x, σ) & L_{yy} (x, σ) \end{bmatrix}$ (2) where $L_{xx} (x, σ)$ is the convolution of the second order Gaussian derivative $\frac{\partial^{2}}{\partial x^{2}} g (σ)$ with the image I at point x, and similarly for $L_{xy} (x, σ)$ and $L_{yy} (x, σ)$. In general, Gaussians are optimal for scale space analysis, but they must be cropped and discretized during implementation [28, 29]. The detected interest points should be repeatable when the images are rotated. Cropping and discretization may affect the property of repeatability, but the loss in accuracy is negligible. Repeatability loss is maximum under image rotations at angles which are odd multiples of $\frac{π}{4}$ . It is minimum when the images are rotated at angles which are multiples of $\frac{π}{2}$ . The square shape of the filters is the major cause of the variation in repeatability. Computations involved in finding the determinant of the Hessian matrix at different scales and sizes of box filters are explained in Bay et al. [26].
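The role of the Hessian determinant as a blob detector can be illustrated with a small sketch. It assumes NumPy and, as a stand-in for the box-filter approximation used by SURF, estimates the second derivatives with repeated central differences on an image that is already smooth. The function name and the synthetic blob are illustrative only.

```python
import numpy as np

def hessian_determinant(smoothed):
    """Determinant of the 2x2 Hessian at every pixel of an already-smoothed image."""
    d_y, d_x = np.gradient(smoothed)     # first derivatives along rows (y) and columns (x)
    d_yy, d_yx = np.gradient(d_y)        # derivatives of d_y along y and x
    d_xy, d_xx = np.gradient(d_x)        # derivatives of d_x along y and x
    return d_xx * d_yy - d_xy * d_yx

# Synthetic Gaussian blob centred in a 21x21 image (standard deviation 3 pixels).
y, x = np.mgrid[-10:11, -10:11]
blob = np.exp(-(x ** 2 + y ** 2) / (2 * 3.0 ** 2))

det = hessian_determinant(blob)
peak = np.unravel_index(np.argmax(det), det.shape)   # location of the strongest response
```

For this synthetic blob the determinant is largest at the blob centre, which is the behaviour the interest point detector relies on.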

iii) Scale space representation

Interest points in the image are identified at different scales of the image. It will be helpful in comparing the interest points at different scales. Image pyramids (Fig. 8) are used to implement the scale spaces.

Fig.8

Image pyramid with different scales (Image Courtesy: https://en.wikipedia.org/wiki/Pyramid_(image_processing)).

These images at different scales are subtracted to get the Difference of Gaussian images where the blobs and edges can be identified easily [30]. Box filter of any size is applied to the original image and to the subtracted images at different scales. Hence, scale-space of the image is analyzed by up-scaling the size of the filter instead of reducing the size of the image as shown in the Fig. 9.

Fig.9

Up-scaling the filter size (Image Courtesy: Bay et al. [26]).

As the images are not down-sampled, there is no aliasing. High-frequency components may be lost when the image is zoomed out to different scales, but box filters preserve these high frequencies, which can limit scale invariance. The scale space shown in the image pyramid (Figs. 8 and 9) is divided into octaves. Generally, in each octave, the resolution of the image is halved or doubled to fetch the finer features from the image. In the SURF descriptor, however, the image resolution remains the same and the increment in filter size is doubled at each new octave. The entire scale space is sub-divided into a fixed number of octaves. As integral images are discrete in nature, “the minimum scale difference between two subsequent scales depends upon the length (l₀) of the positive or negative lobes of the partial second order derivative in the direction of derivative either x or y”.
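The growth of the box filters across octaves can be sketched as follows. The numbers follow the scheme described by Bay et al. [26] as understood here (a 9×9 base filter corresponding to σ = 1.2, a size step of 6 in the first octave, and a step that doubles with each octave); the function names and default arguments are illustrative assumptions.

```python
def surf_filter_sizes(n_octaves=3, n_layers=4, base_size=9):
    """Box-filter side lengths per octave; the size step doubles from octave to octave."""
    sizes = []
    start = base_size
    for octave in range(n_octaves):
        step = 6 * 2 ** octave
        sizes.append([start + layer * step for layer in range(n_layers)])
        start += step                      # next octave starts at this octave's second filter
    return sizes

def filter_scale(size, base_size=9, base_sigma=1.2):
    """Approximate Gaussian scale represented by a box filter of the given side length."""
    return base_sigma * size / base_size

sizes = surf_filter_sizes()
scales = [[round(filter_scale(s), 2) for s in octave] for octave in sizes]
```

With the defaults, the filter sizes are 9, 15, 21, 27 in the first octave, 15, 27, 39, 51 in the second, and 27, 51, 75, 99 in the third, while the image itself is never resized.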

iv) Interest point localization

Interest points in the image and over the scales are localized by applying non-maximum suppression in a 3×3×3 neighborhood. The fast variant proposed by Neubeck et al. is used [31]. The maxima of the determinant of the Hessian matrix are interpolated in scale and image space [32]. This interpolation is mandatory because the difference between the first layers of consecutive octaves is relatively large.
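The idea of 3×3×3 non-maximum suppression can be sketched with a brute-force loop over a stack of response maps, assuming NumPy. This is not the fast variant of Neubeck et al. [31]; it only shows which positions survive. The function name, the `threshold` parameter, and the toy response stack are illustrative.

```python
import numpy as np

def local_maxima_3x3x3(responses, threshold=0.0):
    """Return (scale, row, col) indices that are strict maxima of their 3x3x3 neighbourhood."""
    peaks = []
    n_scales, n_rows, n_cols = responses.shape
    for s in range(1, n_scales - 1):
        for r in range(1, n_rows - 1):
            for c in range(1, n_cols - 1):
                value = responses[s, r, c]
                if value <= threshold:
                    continue
                block = responses[s - 1:s + 2, r - 1:r + 2, c - 1:c + 2]
                # Strict maximum: the centre is the only entry reaching the block maximum.
                if value == block.max() and np.count_nonzero(block == value) == 1:
                    peaks.append((s, r, c))
    return peaks

# Toy stack of three response maps with one dominant peak and a weaker neighbour.
responses = np.zeros((3, 5, 5))
responses[1, 2, 2] = 5.0
responses[1, 2, 3] = 3.0
peaks = local_maxima_3x3x3(responses)
```

Only the dominant peak is kept; the weaker neighbour is suppressed because a larger value lies inside its 3×3×3 neighbourhood.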

2.4 k-means clustering

Local features extracted from the image using the SURF descriptor are given as input to k-means clustering [33, 34]. k-means clustering is an unsupervised learning algorithm that partitions the given data into k clusters and defines k centers, one for each cluster. Initially, the centers are chosen randomly, preferably far away from one another, and the clustering is then refined iteratively. Each point in the dataset is associated with the center closest to it. When all the points are associated with their closest center, the first level of clustering is done. The cluster centers are then recalculated and the data points are reclustered. Iteration stops when the cluster centers no longer move. In BoF, the local features are given as input to the clustering algorithm and the resulting clusters form the vocabulary built for further classification. The algorithmic representation of k-means clustering is given below:

Algorithm: k-means clustering
Input: Local features extracted from the image
Output: Clusters with predominant features as cluster centers (Vocabulary)
Let X = {x₁, x₂, x₃, x₄,⋯, x_n} be the set of data points and G = {g₁, g₂, g₃, g₄,⋯, g_c} be the set of centers.
Step 1: Select “c” cluster centers randomly.
Step 2: Calculate the distance between each data point and each cluster center.
Step 3: Associate each data point with the cluster center whose distance is minimum.
Step 4: Recalculate the cluster centers using the following Equation (2):
$g_{i} = \frac{1}{c_{i}} \sum_{j = 1}^{c_{i}} x_{j}$ (2)
where c_i is the number of data points in the ith cluster and x_j ranges over the data points of that cluster.
Step 5: Recalculate the distance between each data point and the new cluster centers.
Step 6: If no data point is reassigned, stop the process; otherwise, repeat the process from Step 3.
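The steps above can be sketched in a few lines of NumPy. This is a minimal illustration on toy two-dimensional points rather than SURF descriptors; the function name, the fixed random seed, and the handling of empty clusters are assumptions made for the example.

```python
import numpy as np

def kmeans(points, n_clusters, max_iter=100, seed=0):
    """Plain k-means: returns (centres, labels) for an (n_points, n_dims) array."""
    rng = np.random.default_rng(seed)
    # Step 1: pick distinct data points as the initial centres.
    centres = points[rng.choice(len(points), size=n_clusters, replace=False)].astype(float)
    labels = np.full(len(points), -1)
    for _ in range(max_iter):
        # Steps 2-3: squared Euclidean distance to every centre, then nearest-centre assignment.
        dists = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 6: stop when no point changes cluster.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each centre to the mean of its members (an empty cluster keeps its centre).
        for i in range(n_clusters):
            members = points[labels == i]
            if len(members) > 0:
                centres[i] = members.mean(axis=0)
    return centres, labels

# Two well-separated toy groups of points.
points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centres, labels = kmeans(points, n_clusters=2)
```

The distance matrix here is held in memory at once, which is fine for a toy example; for millions of descriptors and thousands of centres the distances would have to be computed in chunks.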

2.5 Support Vector Machine (SVM)

In this research work, support vector machine is used for classifying the images of glaucoma, diabetic retinopathy and healthy fundus images. Basically, SVM is designed as a binary classifier. Decision tree algorithm C4.5 [35], CART [36], and binary concept learning with distributed output learning are some of the approaches to solve the multi-class problems. SVM algorithm is extended to solve multi-class problems by employing Error Correcting Output Codes (ECOC) for distributed output representation [37]. The extended SVM algorithm with ECOC is used for the classification of the images with its vocabulary.

The ECOC model reduces a multi-class problem to a set of binary classification problems. If there are 3 classes, then three SVM classifiers are employed for classification. The ECOC model requires proper coding and decoding schemes to decide the number of learners to be used in the classification. Coding determines the classes on which each learner is trained, and decoding aggregates the results from the binary classifiers. There are many coding schemes, such as one-versus-all (OVA), one-versus-one (OVO), binary complete, ternary complete, ordinal, dense random, and sparse random. The coding and decoding schemes are selected based on the number of classes, and the multi-class problem is solved accordingly.

3 Proposed classification approach

The proposed methodology to classify the fundus images comprises the following steps: i) contrast stretching, ii) image flipping, iii) SURF, iv) k-means clustering, and v) SVM-ECOC.

3.1 Contrast stretching

Fundus images are acquired from the HRF image dataset [1] and enhanced using the contrast stretching technique, which normalizes the contrast level throughout the image. Pixel intensities are normalized such that they span the required range of values. The global maximum and minimum intensity values of the RGB image are 255 and 0 respectively. The maximum and the non-zero minimum intensity values of the image are considered as the local maximum and minimum respectively. With these maxima and minima, the contrast of the input image is enhanced pixel by pixel through Equation (3). $Enhanced_{pixel} = (inputPixel - min_{local}) \times \frac{max_{global} - min_{global}}{max_{local} - min_{local}} + min_{global}$ (3) where $max_{global} = 255$ and $min_{global} = 0$ . The result of contrast stretching is shown in Fig. 10.
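Equation (3) can be sketched with NumPy as below. The sketch applies one pair of local extrema to the whole array and clips values that fall below the output range (for instance, zero-valued background pixels), both of which are assumptions made for the example; the function name is illustrative.

```python
import numpy as np

def contrast_stretch(img, out_min=0.0, out_max=255.0):
    """Stretch intensities so that [non-zero minimum, maximum] maps onto [out_min, out_max]."""
    img = img.astype(np.float64)
    local_min = img[img > 0].min()         # non-zero minimum; assumes at least one non-zero pixel
    local_max = img.max()
    if local_max == local_min:
        return np.clip(img, out_min, out_max).astype(np.uint8)   # flat image: nothing to stretch
    stretched = (img - local_min) * (out_max - out_min) / (local_max - local_min) + out_min
    # Pixels below the local minimum (e.g. zeros) would become negative, so clip them.
    return np.rint(np.clip(stretched, out_min, out_max)).astype(np.uint8)

# Tiny illustrative image with intensities between 0 and 150.
sample = np.array([[0, 50], [100, 150]], dtype=np.uint8)
enhanced = contrast_stretch(sample)
```

After stretching, the non-zero minimum of the sample maps to 0 and its maximum maps to 255, so the intensities span the full output range.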

3.2 Image flipping

The HRF image dataset comprises fundus images of healthy eyes, glaucoma, and diabetic retinopathy. It holds images of both left and right eyes. When a mixture of left-eye and right-eye images is given as input, it may lead to misclassification. Based on this intuition, all the images are flipped to a common orientation (either left or right) and then given as input for the next phase.
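A minimal sketch of the flipping step, assuming NumPy arrays and a hypothetical `is_left_eye` flag supplied with each image (the text does not say how laterality is determined):

```python
import numpy as np

def to_common_orientation(img, is_left_eye):
    """Mirror left-eye images horizontally so that all images share the right-eye layout."""
    return np.fliplr(img) if is_left_eye else img

# One-row toy "image" to make the mirroring visible.
row = np.array([[1, 2, 3]])
flipped = to_common_orientation(row, is_left_eye=True)
unchanged = to_common_orientation(row, is_left_eye=False)
```

`np.fliplr` reverses the column axis only, so it works unchanged for grayscale (H×W) and RGB (H×W×3) arrays.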

3.3 SURF descriptor

The SURF descriptor explained in Section 2.3.1 is applied to the flipped images. From the 45 enhanced, flipped fundus images of glaucoma, diabetic retinopathy, and healthy eyes, 23,021,280 features are extracted. From these 23,021,280 features, 80% of the strong features are retained from each category, giving 11,417,024 strong features in total.

Fig.10

(a) Original image. (b) Contrast enhanced image.

3.4 k-means clustering

k-means clustering is an unsupervised learning algorithm, and the steps involved in the clustering process are explained in Section 2.4. The algorithm is applied to the features extracted from the images as follows:

Step 1: The dataset comprising 11,417,024 data points is given as input.

Step 2: From the 11,417,024 data points, 10,000 cluster centers are selected randomly.

Step 3: The Euclidean distance is calculated between the cluster centers and the data points.

Step 4: Each data point is associated with the cluster center closest to it.

Step 5: Cluster centers are recalculated using Equation (2).

Step 6: Step 3 is repeated to calculate the distance between the new cluster centers and the data points, which are reassigned accordingly.

Step 7: If no data points are reassigned, the process stops; otherwise, it repeats from Step 5.

In the experiments conducted with the HRF dataset, the maximum number of iterations is fixed at 100, and the clustering process converged within 27 to 30 iterations. The local features are clustered, and the resulting cluster centers form the bag of visual words that is used for classification. The bag of visual words is plotted as a histogram in Fig. 11.
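How an image can be turned into a bag-of-visual-words histogram is sketched below, assuming NumPy, a vocabulary of cluster centers, and the descriptors of one image. Normalizing the histogram to sum to one is an assumption made for the example, and the function name and toy data are illustrative.

```python
import numpy as np

def bovw_histogram(descriptors, vocabulary):
    """Count how often each visual word is the nearest cluster centre to a descriptor."""
    # Squared Euclidean distance from every descriptor to every visual word.
    dists = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = dists.argmin(axis=1)                       # index of the nearest visual word
    counts = np.bincount(words, minlength=len(vocabulary)).astype(np.float64)
    return counts / counts.sum()                       # normalised word frequencies

# Toy vocabulary with two words and four two-dimensional descriptors.
vocabulary = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[0.0, 1.0], [1.0, 0.0], [9.0, 9.0], [0.0, 0.0]])
histogram = bovw_histogram(descriptors, vocabulary)
```

Each image is thereby represented by a fixed-length vector with one entry per visual word, regardless of how many descriptors were extracted from it, and this vector is what a classifier can take as input.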

Fig.11

Vocabulary of the image features.

3.5 SVM-ECOC

An ensemble of SVM learners is designed with ECOC. The basics of SVM and ECOC are explained in Section 2.5. In this paper, SVM-ECOC is used for solving a problem with three classes. To build an ensemble that classifies three classes, the prominent binary learner SVM, the one-versus-one (OVO) coding scheme, and a loss-based decoding scheme with binary loss g are chosen. In the OVO coding scheme, if there are k classes, then $\frac{k (k - 1)}{2}$ learners are involved in the classification process. Here, the number of classes is $k = 3$ , so the number of learners is $\frac{k (k - 1)}{2} = \frac{3 (3 - 1)}{2} = 3$ .

Here, a separate SVM learner is assigned to each pair of the classes glaucoma, diabetic retinopathy, and healthy. The OVO coding for fundus image classification is given in Table 1, which shows the classes on which each binary learner is trained.

Table 1
Proposed OVO coding for Fundus image classification

	Learner 1	Learner 2	Learner 3
Class 1: DR	1	1	0
Class 2: Glaucoma	–1	0	1
Class 3: Healthy	0	–1	–1

In this scheme, learner-1 considers class-1 as positive observations, class-2 as negative observations, and class-3 is ignored. Likewise, learner-2 considers class-1 as positive observations, class-3 as negative observations and class-2 is ignored. Learner-3 considers class-2 as positive observations, class-3 as negative observations and class-1 is ignored.
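The coding in Table 1 can be generated for any number of classes with a short sketch. It uses only the Python standard library; the function name is illustrative, and the learner ordering (class pairs in lexicographic order) is an assumption that happens to reproduce Table 1 for three classes.

```python
from itertools import combinations

def ovo_coding_matrix(n_classes):
    """Rows are classes, columns are learners: +1 positive, -1 negative, 0 ignored."""
    pairs = list(combinations(range(n_classes), 2))    # one learner per pair of classes
    matrix = [[0] * len(pairs) for _ in range(n_classes)]
    for learner, (positive, negative) in enumerate(pairs):
        matrix[positive][learner] = 1
        matrix[negative][learner] = -1
    return matrix

coding = ovo_coding_matrix(3)   # rows: DR, Glaucoma, Healthy
```

For k classes the matrix has k(k − 1)/2 columns, so three classes give three learners, each of which ignores exactly one class.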

Let CM be the coding matrix with elements $cm_{kl}$ , let $score_{l}$ be the predicted classification score for the positive class of learner l, and let g be the binary loss function. A new observation is assigned to the class that minimizes the aggregated loss over the L learners [37], as given in Equation (4). $\hat{k} = \arg\min_{k} \frac{\sum_{l = 1}^{L} | cm_{kl} | g (cm_{kl}, score_{l})}{\sum_{l = 1}^{L} | cm_{kl} |}$ (4)
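Equation (4) can be illustrated with a small standard-library sketch that uses the coding of Table 1. The text does not specify the binary loss g, so a hinge loss, g(y, s) = max(0, 1 − ys)/2, is assumed here purely for illustration; the function names and the example scores are hypothetical.

```python
CODING_MATRIX = [[1, 1, 0], [-1, 0, 1], [0, -1, -1]]   # rows: DR, Glaucoma, Healthy (Table 1)
CLASS_NAMES = ["DR", "Glaucoma", "Healthy"]

def hinge_loss(code, score):
    """Assumed binary loss g: zero when the score agrees with the code by a margin of 1."""
    return max(0.0, 1.0 - code * score) / 2.0

def decode(scores, coding_matrix=CODING_MATRIX, loss=hinge_loss):
    """Index of the class with the smallest loss averaged over the learners that use it."""
    averaged = []
    for row in coding_matrix:
        weight = sum(abs(code) for code in row)        # learners with code 0 are ignored
        total = sum(abs(code) * loss(code, score) for code, score in zip(row, scores))
        averaged.append(total / weight)
    return min(range(len(averaged)), key=averaged.__getitem__)

# Hypothetical scores from learners 1-3 (positive means the learner's positive class).
predicted = CLASS_NAMES[decode([2.0, 1.5, -0.3])]
```

In this example learners 1 and 2 both vote strongly for their positive class, DR, so DR has zero averaged loss and is selected; the score of learner 3, which ignores DR, does not affect the loss for DR.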

Escalera et al. [38] claimed that ECOC models provide better classification accuracy than other multi-class classification models [37, 39]. Of the 45 images in total, 33 are used for training and 12 for testing. The results of classification are tabulated in Table 2. From Table 2, it is inferred that a classification accuracy of 92% is achieved.

Table 2

Classification accuracy achieved with enhanced, flipped images with 10,000 cluster centers

Known	Predicted
	DR	Glaucoma	Healthy
DR	0.89	0.05	0.06
Glaucoma	0.07	0.90	0.03
Healthy	0.01	0.01	0.98

4 Results and discussion

An ensemble multi-class learner composed of linear SVMs [40] is proposed, and the images are classified based on the Bag of Visual Words obtained from the SURF descriptor and k-means clustering. The experiments are conducted with images from the HRF image dataset [1], which comprises 45 images (15 glaucoma, 15 diabetic retinopathy, and 15 healthy images). The input data is categorized in separate folders and given as a data store for processing. In the feature extraction process, images are given as input in random order during each iteration. Local features are extracted using SURF, and clustering is done with a minimum of 500 and a maximum of 10,000 clusters. The extracted features are given as input for classification. From the data store, 33 images are used for training and 12 images for testing. The classification results obtained with the minimum and maximum numbers of cluster centers are explored here. The different cases of the experiments conducted are as follows:

Case 1: The raw fundus images are given as input and clustering is performed with 500 cluster centers. Extracted features are given for classification and the results are tabulated in Table 3. From Table 3, it is inferred that a classification accuracy of 75% is achieved.

Table 3
Classification accuracy achieved with raw fundus images with 500 cluster centers

Known	Predicted
	DR	Glaucoma	Healthy
DR	0.50	0.25	0.25
Glaucoma	0.00	0.75	0.25
Healthy	0.00	0.00	1.00

Case 2: Fundus images are enhanced by contrast stretching and clustering is performed with 500 clusters. Extracted features are given for classification and the results are tabulated in Table 4. From Table 4, it is inferred that 58% of classification accuracy is achieved.

Table 4

Classification accuracy achieved with enhanced fundus images with 500 clusters

Known	Predicted
	DR	Glaucoma	Healthy
DR	0.75	0.00	0.25
Glaucoma	0.25	0.50	0.25
Healthy	0.50	0.00	0.50

Case 3: Fundus images are enhanced by contrast stretching and the enhanced images are flipped uniformly. Clustering is performed on the flipped images with the limit of 500 cluster centers. Extracted features or cluster centers are given for classification and the results are tabulated in Table 5. From Table 5, it is inferred that 83% of classification accuracy is achieved.

Table 5

Classification accuracy achieved with flipped, enhanced fundus images with 500 clusters

Known	Predicted
	DR	Glaucoma	Healthy
DR	0.75	0.25	0.00
Glaucoma	0.25	0.75	0.00
Healthy	0.00	0.00	1.00

Case 4: The raw fundus images are given as input and the cluster centers are increased to 10,000. These 10,000 features are given for classification and the results are tabulated in Table 6. From Table 6, it is inferred that 67% of classification accuracy is achieved.

Table 6

Classification accuracy achieved with raw fundus images with 10000 clusters

Known	Predicted
	DR	Glaucoma	Healthy
DR	0.50	0.00	0.50
Glaucoma	0.00	0.50	0.50
Healthy	0.00	0.00	1.00

Case 5: Fundus images are enhanced using contrast stretching and clustering is implemented with 10000 cluster centers. Extracted features are given for classification and the results are tabulated in Table 7. From Table 7, it is inferred that 75% of classification accuracy is achieved.

Table 7

Classification accuracy achieved with enhanced fundus images with 10000 clusters

Known	Predicted
	DR	Glaucoma	Healthy
DR	0.75	0.00	0.25
Glaucoma	0.00	0.75	0.25
Healthy	0.00	0.25	0.75

Case 6: Fundus images are enhanced by contrast stretching and the enhanced images are flipped uniformly. Clustering is performed on the flipped images with the limit of 10,000 cluster centers. Extracted features or cluster centers are given for classification and the results are tabulated in Table 2.

In the experiments, for each case, the learner is trained with different randomly chosen sets of images, and the maximum accuracy achieved is quoted here. In the literature, dichotomous classification is implemented for a single disease. Hence, the multi-class classification results of the proposed work cannot be compared directly with those dichotomous classifications. Instead, the performance analysis is done by also implementing cases 1 to 6 with the Decision Tree SVM (DT-SVM) classifier [41]; evaluation metrics such as accuracy, sensitivity, and specificity are calculated and given in Table 8.
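The per-class metrics can be computed from a confusion matrix as sketched below, using only the Python standard library. The counts are the Table 3 proportions multiplied by four, under the assumption that the 12 test images are split evenly into four per class; the function name is illustrative.

```python
def per_class_metrics(confusion):
    """confusion[i][j] = number of class-i images predicted as class j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                            # class k predicted as another class
        fp = sum(confusion[r][k] for r in range(n)) - tp       # other classes predicted as k
        tn = total - tp - fn - fp
        results.append({
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
        })
    overall = sum(confusion[k][k] for k in range(n)) / total
    return overall, results

# Case 1 (Table 3) as counts, assuming four test images per class: DR, Glaucoma, Healthy.
case1_counts = [[2, 1, 1], [0, 3, 1], [0, 0, 4]]
overall, metrics = per_class_metrics(case1_counts)
```

Under that assumption, the overall accuracy is 0.75 and the sensitivity and specificity values agree, after rounding, with the ECOC-SVM Case 1 row of Table 8; the per-class precision values (1.00, 0.75, 0.67) coincide with the per-class Accuracy columns of that row.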

From Table 8, it is inferred that:

ECOC – SVM performed better than DT – SVM across all three classes

DT – SVM performs better with respect to the true negative rate, i.e., specificity

Both ECOC – SVM and DT – SVM classify the healthy eye images correctly, with a maximum of 100% accuracy

Classification done by ECOC-SVM with enhanced, flipped fundus images and 10,000 features gave consistent results for all three classes, with an overall accuracy of 92%.

The limitation encountered while conducting these experiments is the exponential increase in processing time as the number of cluster centers increases. Even though the processing time increases exponentially, the classification accuracy achieved is promising when the number of cluster centers is increased.

In future, the clustering process can be optimized to reduce the processing time, and an ensemble can be constructed with different learners.

Table 8

Performance analysis

Classifier	Case	Overall accuracy (%)	Accuracy			Sensitivity			Specificity
			DR	G	H	DR	G	H	DR	G	H
ECOC-SVM	Case 1	75	1.00	0.75	0.67	0.50	0.75	1.00	1.00	0.88	0.75
	Case 2	58	0.50	0.50	0.50	0.75	0.50	0.50	0.63	1.00	0.75
	Case 3	83	0.75	0.75	1.00	0.75	0.75	1.00	0.88	0.88	1.00
	Case 4	67	1.00	0.50	0.50	0.50	0.50	1.00	1.00	1.00	0.50
	Case 5	75	1.00	0.75	0.60	0.75	0.75	0.75	1.00	0.87	0.75
	Case 6	92	0.92	0.90	0.91	0.89	0.90	0.98	0.96	0.97	0.95
DT-SVM	Case 1	72	0.80	0.67	0.63	0.67	0.67	0.83	0.92	0.91	0.75
	Case 2	61	0.63	0.58	0.62	0.41	0.58	0.83	0.87	0.79	0.75
	Case 3	81	1.00	0.91	0.84	0.58	0.91	0.91	1.00	0.79	0.91
	Case 4	69	0.71	0.75	0.73	0.41	0.75	0.91	0.91	0.79	0.83
	Case 5	72	0.83	0.83	0.73	0.41	0.83	0.91	0.95	0.79	0.83
	Case 6	91	0.83	0.83	0.92	0.91	0.83	0.98	0.91	0.99	0.95

DR – Diabetic Retinopathy; G – Glaucoma; H – Healthy; DT – Decision Tree.

5 Conclusion

In this research work, a system is developed that classifies fundus images as glaucoma, diabetic retinopathy, or healthy using Bag of Visual Words (Bag of Features). The proposed system achieved a maximum classification accuracy of 92%. The system can be deployed in the cloud so that clinicians around the world can use it remotely. The experiments are conducted with images from the HRF image database, which are shuffled randomly and given for classification. To achieve better classification accuracy, prominent features are acquired from the images using SURF and k-means clustering. Across the test cases of the experiment, classification with 10,000 cluster centers chosen from contrast-enhanced and flipped images achieved the best classification accuracy of 92%. The classification accuracy is also compared across the test cases for ECOC-SVM and Decision Tree-based SVM (DT-SVM). As most research works are carried out as dichotomous classification, their results cannot be compared directly with this multi-class classification problem. Therefore, ECOC-SVM is replaced with DT-SVM in the proposed approach and a performance analysis is done. The limitation encountered during implementation is the exponential increase in processing time as the number of cluster centers increases. In future, this can be addressed by optimizing the clustering process. Also, new ensembles with different supervised and unsupervised learners can be developed for solving multi-class problems.

Acknowledgement

The authors would like to thank the Department of Science and Technology, India, for their financial support through the Fund for Improvement of S&T Infrastructure (FIST) programme (SR/FST/ETI-349/2013). We also sincerely thank SASTRA Deemed to be University for providing an excellent infrastructure to carry out the research work.

References

1. Odstrcilik, Kolar, Budai, Hornegger, Jan, Gazarek, Kubena, Cernosek, Svoboda, Angelopoulou, Retinal vessel segmentation by improved matched filtering: evaluation on a new high-resolution fundus image database, IET Image Processing 7(4) (2013), 373–383.

2. Liu, Wong D.W., Lim J.H., Li, Tan N.M., Zhang, Wong T.Y., Lavanya, ARGALI: An automatic cup-to-disc ratio measurement system for glaucoma analysis using level-set image processing, In Proceedings of 13th International Conference on Biomedical Engineering, 2009, Springer, Berlin, Heidelberg, 559–562.

3. Liu, Wong D.W., Tan T.M., Zhang, Yin, Cheng, Lee B.H., Liang, Lim J.H., Li, Wong T.Y., Aglaia system architecture for glaucoma diagnosis, In Proceedings of 2nd APSIPA Annual Summit Conference, 2010, 657–661.

4. Bock, Meier, Nyúl L.G., Hornegger, Michelson, Glaucoma risk index: automated glaucoma detection from color fundus images, Medical Image Analysis 14(3) (2010), 471–481.

5. Muramatsu, Hayashi, Sawada, Hatanaka, Hara, Yamamoto, Fujita, Detection of retinal nerve fiber layer defects on retinal fundus images for early diagnosis of glaucoma, Journal of Biomedical Optics 15(1) (2010), 016021.

6. Madhusudhan, Malay, Nirmala S.R., Samerendra, Image processing techniques for glaucoma detection, In Proceedings of International Conference on Advances in Computing and Communications, 2011, Springer, Berlin, Heidelberg, 365–373.

7. Cheng, Liu, Xu, Yin, Wong D.W., Tan N.M., Tao, Cheng C.Y., Aung, Wong T.Y., Superpixel classification based optic disc and optic cup segmentation for glaucoma screening, IEEE Transactions on Medical Imaging 32(6) (2013), 1019–1032.

8. Anusorn C.B., Kongprawechnon, Kondo, Sintuwong, Tungpimolrut, Image processing techniques for glaucoma detection using the cup-to-disc ratio, Thammasat International Journal of Science and Technology 18(1) (2013), 22–34.

9. Salam A.A., Akram M.U., Wazir, Anwar S.M., Majid, Autonomous Glaucoma detection from fundus image using cup to disc ratio and hybrid features, In Proceedings of IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), 2015, 370–374.

10. Salam A.A., Khalil, Akram M.U., Jameel, Basit, Automated detection of glaucoma using structural and nonstructural features, SpringerPlus 5(1) (2016), 1519.

11. Fu, Cheng, Xu, Wong D.W., Liu, Cao, Joint Optic Disc and Cup Segmentation Based on Multi-label Deep Network and Polar Transformation, IEEE Transactions on Medical Imaging 37(7) (2018), 1597–1605.

12. Gardner G.G., Keating, Williamson T.H., Elliott A.T., Automatic detection of diabetic retinopathy using an artificial neural network: a screening tool, British Journal of Ophthalmology 80(11) (1996), 940–944.

13. Csurka, Dance, Fan, Willamowski, Bray, Visual categorization with bags of keypoints, In Workshop on Statistical Learning in Computer Vision, ECCV 1(1–22) (2004), 1–2.

14. Fleming A.D., Philip, Goatman K.A., Williams G.J., Olson J.A., Sharp P.F., Automated detection of exudates for diabetic retinopathy screening, Physics in Medicine & Biology 52(24) (2007), 7385.

15. Sopharak, Uyyanonvara, Barman, Williamson T.H., Automatic detection of diabetic retinopathy exudates from non-dilated retinal images using mathematical morphology methods, Computerized Medical Imaging and Graphics 32(8) (2008), 720–727.

16. Acharya, Chua C.K., Ng E.Y., Yu, Chee, Application of higher order spectra for the identification of diabetes retinopathy stages, Journal of Medical Systems 32(6) (2008), 481–488.

17. Faust, Acharya, Ng E.Y., Suri K.S., Algorithms for the automated detection of diabetic retinopathy using digital fundus images: a review, Journal of Medical Systems 36(1) (2012), 145–157.

18. Sujatha K.S., Keerthana, Priya S.S., Kaavya, Vinod, Fuzzy based multiple dictionary bag of words for image classification, Procedia Engineering 38 (2012), 2196–2206.

19. Welikala R.A., Dehmeshki, Hoppe, Tah, Mann, Williamson T.H., Barman S.A., Automated detection of proliferative diabetic retinopathy using a modified line operator and dual classification, Computer Methods and Programs in Biomedicine 114(3) (2014), 247–261.

20. Shinomiya, Hoshino, Bag of Features Based on Feature Distribution Using Fuzzy C-Means, In International Conference on Human-Computer Interaction, Springer, Cham, 2014, 546–550.

21. Azzopardi, Strisciuglio, Vento, Petkov, Trainable COSFIRE filters for vessel delineation with application to retinal images, Medical Image Analysis 19(1) (2015), 46–57.

22. Gulshan, Peng, Coram, Stumpe M.C., Wu, Narayanaswamy, Venugopalan, Widner, Madams, Cuadros, Kim, Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs, JAMA 316(22) (2016), 2402–2410.

23. Abramoff M.D., Lou, Erginay, Clarida, Amelon, Folk J.C., Niemeijer, Improved automated detection of diabetic retinopathy on a publicly available dataset through integration of deep learning, Investigative Ophthalmology & Visual Science 57(13) (2016), 5200–5206.

24. Saine P.J., Tyler M.E., Ophthalmic Photography: Retinal Photography, Angiography, and Electronic Imaging, Butterworth-Heinemann, Boston, 2002.

25. Costa, Campilho, Convolutional bag of words for diabetic retinopathy detection from eye fundus images, IPSJ Transactions on Computer Vision and Applications 9(1) (2017), 10.

26. Bay, Ess, Tuytelaars, Van Gool, Speeded-up robust features (SURF), Computer Vision and Image Understanding 110(3) (2008), 346–359.

27. Viola, Jones, Rapid object detection using a boosted cascade of simple features, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition 1 (2001), I–I.

28. Koenderink J.J., The structure of images, Biological Cybernetics 50 (1984), 363–370.

29. Lindeberg, Scale-space for discrete signals, PAMI 12(3) (1990), 234–254.

30. Lowe D.G., Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision 60(2) (2004), 91–110.

31. Neubeck, Van Gool, Efficient non-maximum suppression, In Proceedings of 18th International Conference on Pattern Recognition, IEEE 3 (2006), 850–855.

32. Brown, Lowe D.G., Invariant features from interest point groups, BMVC (2002), 4.

33. MacQueen, Some methods for classification and analysis of multivariate observations, In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability 1(14) (1967), 281–297.

34. Hartigan J.A., Wong M.A., Algorithm AS 136: A k-means clustering algorithm, Journal of the Royal Statistical Society, Series C (Applied Statistics) 28(1) (1979), 100–108.

35. Quinlan J.R., C4.5: Programs for Empirical Learning, Morgan Kaufmann, San Francisco, CA, 1993.

36. Breiman, Friedman J.H., Olshen R.A., Stone C.J., Classification and Regression Trees, Wadsworth International Group, 1984.

37. Dietterich, Bakiri, Solving multiclass learning problems via error-correcting output codes, Journal of Artificial Intelligence Research 2 (1995), 263–286.

38. Escalera, Pujol, Radeva, Separability of ternary codes for sparse designs of error-correcting output codes, Pattern Recognition Letters 30(3) (2009), 285–297.

39. Lorena A.C., Carvalho A.C., Evolutionary design of multiclass support vector machines, Journal of Intelligent & Fuzzy Systems 18(5) (2007), 445–454.

40. Hong, Multimodal brain-computer interface combining synchronously electroencephalography and electromyography, Journal of Intelligent & Fuzzy Systems 2017, 1–8.
