Abstract
The development of Computer-Aided Design (CAD) and Computer-Aided Manufacturing (CAM) systems in the past decade has led to a remarkable advance in biomedical applications and devices. Particularly, CAM and CAD systems are employed in medical engineering, robotic surgery, clinical medicine, dentistry and other biomedical areas. Hence, the accuracy and precision of the CAD and CAM systems are extremely important for proper treatment. This work suggests a new CAD system for brain image classification by analyzing Magnetic Resonance Images (MRIs) of the brain. Firstly, we use the proposed Downsized Rank Kernel Partial Least Squares (DR-KPLS) as a feature extraction technique. Then, we perform the classification using Support Vector Machines (SVM) and we validate with a k-fold cross validation approach. Further, we utilize the Tabu search metaheuristic approach in order to determine the optimal parameter of the kernel function. The proposed algorithm is entitled DR-KPLS+SVM. The algorithm is tested on the OASIS MRI database. The proposed kernel-based classifier is found to be better performant than the existing methods.
Introduction
A Computer Aided Design (CAD) system is an automated process to give a second objective diagnosis and to assist in medical-image analysis [1]. The main CAD system purpose is to distinguish whether the tumor is malignant or benign [2]. A lot of research has recommended to incorporate the CAD systems in diagnostic procedures, indicating that they ameliorate the diagnostic decision since they decrease inter-observer variation and provide a quantitative support for clinical decisions like biopsy [2]. Precisely, CAD systems have been proven efficient in the diagnostic process; these systems significantly reduce unneeded biopsies and thoracotomy operations [3].
To identify malignant tumors, the traditional architecture of CAD systems is built out of two fundamental steps: dimensionality reduction, which includes feature extraction or feature selection, and classification [4]. Both steps are very delicate and require to be treated independently at first and then combined in a global CAD system [5, 28]. Researchers and engineers, alike, have considered the step of dimensionality reduction as the key step for an accurate classification [27]. Extracting and reducing features resides in determining the most discriminating features of each image. The extracted features are usually associated with the pertinent directions in the distribution of data. Features are then reduced by projecting each point into the most relevant orientation [6].
In a classification task [7], an object is assigned to one of the predefined classes [8]. A number of different classification tasks can be found in the field of radiology, such as: classification of an image or examination to determine the presence or absence of an abnormality; classification of benign or malignant anomalies; classification of cancerous lesions according to their histopathological and genomic characteristics; prognosis; and classification for the purpose of organizing radiological data [9–11]. Often, we encounter the problem of redundancy of unnecessary information which causes the deterioration of the classifier [12], hence the idea of reducing the information contained in each MRI image [13].
In this paper, we propose a new kernel method based on KPLS for feature extraction from MRIs. We propose a new Downsized Rank KPLS (DR-KPLS) for CAD systems, entitled DR-KPLS+SVM. Our approach considers only the observations that approximate the retained important components to produce a downsized kernel matrix. Our contribution is mainly a modification in the KPLS algorithm to serve as a feature reduction method in an MRI classification system. The simple and elegant proposed DR-KPLS technique insures at the same time, to have a good classification accuracy and to minimize the computation time. The effectiveness and robustness of the proposed work were proven with experiments on real MRIs. The DKPLS-SVM approach is tested successfully on real MRI database. We present the results of the DR-KPLS+SVM classifier applied to the OASIS dataset. The results are found to be excellent compared to other well-known kernel-based algorithms. The proposed method took advantage of the traditional KPLS algorithm and present a reduced version of the feature reduction approach to perform significant reduction of the number of features. We select a downsized number of observations among the initial measurement variables of the data matrix using a simple yet an effective technique. The defined observations are utilized as a new information matrix.
In this work, the optimized DR-KPLS consists in computing the optimized parameters of DR-KPLS to improve the performances of the proposed CAD system. Thus, a metaheuristic technique is chosen to determine the optimal value. The considered technique is the Tabu Search (TS) algorithm [14]. The method is addressed using two objective functions; the error rate and the downsized size.
The paper is organized as follows: in section 2 we present the methodology utilized in this paper. In section 3, we give an idea about the basic concept of PLS and KPLS. In section 4, we present the proposed DR-KPLS model and we introduce the concept of optimization with the TS algorithm. In section 5 we present the utilized classifier in this work. In section 6 we give the experimental results. Finally, in section 7 we conclude the paper.
Methodology
In this paper, the final proposed CAD system involves three steps; (1) data acquisition and preprocessing of brain Magnetic Resonance Images (MRI), (2) feature extraction utilizing the proposed DR-KPLS, (3) classification with an SVM classifier. The block diagram of the three phases is given in Fig. 1 and the sequential flow of this work is presented in Fig. 2.

Block diagram of the proposed work.

Sequential flow of the proposed work.
PLS
The PLS method selects a group of vectors, named Latent Components (LCs), from an original input/output data space and constracts linear multivariable regression conficguration. Hence, the input matrix X ∈ RN × m comprises N samples with m process variables, and the output matrix Y ∈ RN × J contains N observations with J quality variables.
The PLS samples intends to determine a set of vectors (LCs); T = [t1 ; t2 ; . . . ; t l ] and U = [u1 ; u2 ; . . . ; u l ]. The LCs give the closest representation of the variations in the input and output observations [15]. PLS technique projects the input and the output matrices to a low-dimensional space with a certain number l of Latent Variables (LVs). The PLS method illustrates the X and Y matrices as follows:
where P = [p1 ; p2 ; . . . ; p l ] and Q = [q1 ; q2 ; . . . ; q l ] are respectively the loadings for X and Y. Therefore, matrices E and F are the PLS residuals corresponding to the input matrix X and to the output matrix Y. We determine the number of LCs using the Cross Validation (CV) approach. CV helps the classifier to make better predictions on other independent datasets [16].
The PLS model consists mainly in extracting each pair of the corresponding LCs as a linear combination of the input and output variables [17]. World. H. [18] presented the basic form of the PLS approach: This latter was built upon a Nonlinear Iterative Partial Least Squares (NIPALS) algorithm. During each iteration, the weight vectors
w
i
and
c
i
are calculated. They introduce the weights of the input and output projections:
During each iteration, variables u i , t i , p i and q i are stored each in a column matrix. For each iteration, the results of the previous iteration are used as the input, to assure that the scores are extracted from the input/output matrices.
The standard phases of PLS can be summarized in four steps shown in Algorithm 1 and the organigram presented in Fig. 3 illustrates the fundamental steps of PLS.

Flowchart of PLS.
Algorithm 1:
Mean-center and scale X and Y. Use the NIPALS algorithm and compute variables
u
i
,
t
i
,
p
i
and
q
i
. Reduce X and Y by subtracting the computed latent vectors from them. Return to step 2 and compute the next latent vector.
The PLS method is mainly restricted by its linearity. Kernel methods have come with the answer for this problem by mapping the data to a nonlinear high-dimensional feature space [15]. These methods determine the nonlinear LVs with an approximately linear computational cost.
As a result, the PLS approach can be applied to linear systems in combination with the kernel trick. This new model is the Kernel PLS (KPLS) [19], which is defined in the feature space. It extends linear PLS to its nonlinear kernelized form [19]. Moreover, the purpose of the KPLS method is to map the input variables x i , with i = [1 ; 2 ; . . . ; N], into a feature space F utilizing a nonlinear transformation, as shown in equation 4:
Under the previous conditions, and taking into account the curse of dimensionality, there is no possible way of computing the nonlinear mapping of the unfolded samples of data.
To overcome this problem, we define a Mercer kernel k(.,.) as the inner product of two mapped samples [20]:
where Φ (x i ) ∈ R 1×D, [i=1;2;..;N], and D is the dimension of the feature space. The kernel function must satisfy the Mercer–s theorem. As stated in equation 5, the Gram matrix K ∈ R N×N is calculated as follows:
with Φ (X) = [φ (x1) T , φ (x2) T , . . . , φ (x N ) T ].
There are several kernel functions. The two commonly used kernels are the polynomial kernel and the Radial Basis Function (RBF) kernel (Table 1) [20]. The kernel parameters are fixed using the cross-validation approach.
Two common kernels with their formula and parameters
The kernelized PLS approaches that take into consideration the optimization of data matrices were given by Lindgren et al. [21]. First, mean centering in a high-dimensional space is performed. Then, the Gram matrix K is calculated as follows:
where 1 n is a vector of ones with an N length, and I n is an identity matrix with N dimensions. At this stage, we consider an adjusted form of the PLS algorithm.
Instead of scaling the weight vectors W and C, score vectors T and U are scaled to a unit norm [22]. Furthermore, we assure a deflation phase based on the rank-one reduction in the K and Y matrices using a new extracted score vector T. Actually, K and Y are calculated as follows:
where I n is an identity matrix with N dimensions.
Once the kernel is computed, we calculate the prediction outputs on training and testing samples. The prediction outputs that correspond to the training samples are written as:
The prediction outputs that correspond to the testing samples are written as:
where K t denotes the kernel matrix of the test samples.
The organigram presented in Fig. 4 depicts the fundamental steps of KPLS and the standard phases of KPLS can be summarized in ten steps presented in Algorithm 2.

Flowchart of KPLS.
Algorithm 2:
Compute the kernel matrix and then center. Set i=1, K1 = K, Y1 = Y. Randomly initialize u
i
as any column of Y
i
.
If t
i
converges go to (7), else return to (3). Deflate K and Y. Repeat (3) to (6) to extract further latent variables. Obtain the cumulative matrices T and U.
In this paper, a new kernel method is proposed. This method reduces the feature matrix in order to reduce the characteristics of MRIs.
In this section, we propose a kernel method, with reduced rank, (DR-KPLS) for the reduction of the characteristics of MRI images. The principle of this method is to eliminate all dependencies between the variables in the characteristic space and to preserve a reduced set of data represented by X r = [x1, x2, . . . , x r ] T ∈ R r×m, where r represents the number of observations retained.
The CAD system, based on the DR-KPLS method that we propose, consists of two main stages [23]: Identification of the offline model during the learning phase. Evaluation of the model online, during the test phase.
The DR-KPLS method consists in recording, each time, the new observations rich in information, in the downsized matrix X r . Let us consider that the system works for N0 instants under normal conditions. At t0, the initial downsized matrix is described as follows: X r = [x1] ∈ R 1×m.
At each instant, we collect new information x t , we compute the kernel vector k x t associated with this information and we update the kernel matrix as follows:
Then we calculate the rank of the new updated matrix. The value found leads to two different situations:
The matrix has a full rank: this information shows that there is an independence between the data projected in the space of characteristics. In this situation, the following observation will be added to the downsized matrix.
The matrix has not a full rank: this information shows that there is a dependency between the data projected in the space of characteristics. In this situation, the reduced matrix remains intact, and the kernel matrix is brought back to the previous state.
After evaluation of all the observations, we obtain a downsized data matrix, we then calculate the kernel matrix as follows:
Finally, we calculate the DR-KPLS model. The main steps of the proposed algorithm are presented in Algorithm 3, and the flowchart in Fig. 5 summarizes the proposed DR-KPLS method.

Flowchart of DR-KPLS.
Algorithm 3:
Normalize and calculate the kernel vector k
x
t
of each new observation x
t
∈ R
m
obtained. Update kernel matrix K
r
. Test the rank of K
r
. If the row is complete go to the next step if not cancel the last update and return to step (1). Update the downsized data matrix X
r
. Normalize the downsized training data. Construct and center the downsized kernel matrix (Obtaining the Gram matrix G
r
). Set the reference model based on the DR-KPLS method. Evaluate the performance of the classification system.
An optimized kernel function is the key to an optimized KPLS method, in general, and an optimized DR-KPLS, in particular. To ensure a DR-KPLS algorithm with a reduced complexity and an efficient generalization, we utilize in this work the Tabu search (TS) algorithm. Since the aim of the study is to build a fast and efficient classifier, we opte for the RBF kernel as a first step [24]. As a second step, we optimize σ the parameter of this kernel since it exerts a considerable influence on the generalization ability of the DR-KPLS approach. In fact, σ has an effect on the partitioning outcome in the feature space. If σ is too small, it will lead to under-fitting, if it is too large, it will lead to over-fitting. The optimal kernel parameter for the DR-KPCA approach is defined as the one that can improve the classification rate. For MRI classification, minimizing the ER in the classification process is the greatest performance criterion. Therefore, the TS algorithm is applied to optimize σ when applying the DR-KPLS method.
In this work, the initial solution of the TS algorithm is determined in order to optimize the DR-KPLS model. First, the initial solution is randomly chosen taking into consideration the constraints of parameter σ. This latter is initially chosen to be in [2-6, 26]. The solution is calculated by appending the nearest unutilized neighbor values of σ which ameliorates the best classification accuracy rate.
The process is repeated until all the neighbors are visited. Finally, the optimal value of σ is fixed.
SVM based classifier
Supervised learning is the application of learning methods on labeled data. The classifier is designed to predict the class of new images. In this paper, after selecting the significant features from MRIs, we perform classification to evaluate the accuracy rate of the classification process. Researchers have proposed different classifiers in the past decades. One of the most known ones is SVM. This latter partitions the data into two classes:
((a1, b2) , . . . , (a i , b i ) , . . . , (a N , b N ))
where a i is in F D (a D-dimensional feature space) and b i is in {-1, + 1} representing the class label with i ∈ [1, N] [24].
This classifier builds a hyperplane separating the classes using a kernel function K. The data with a feature vector belonging to one side are considered as in the first class (denoted as -1), and the other data are considered as in second class (denoted as +1) [24].
SVM was originally created for binary classification. Later, it was enlarged to multiclass classification. In this work, binary SVM and multiclass SVM are utilized. The scheme of the work is as follows: (1) First, we train the SVM in an offline phase, and (2) then we test the model in an online phase.
Experiments
OASIS dataset
In this section we present the results of the experiments conducted on the MRIs taken from the "Open Access Series of Imaging Studies (OASIS)" dataset [25, 26]. OASIS was made available by Dr. Randy Buckner at the Howard Hughes Medical Institute (HHMI) at Harvard University, the Neuroinformatics Research Group (NRG) at Washington University School of Medicine, and the Biomedical Informatics Research Network (BIRN).
The OASIS dataset consists of 416 subjects aged 18 to 96. For each subject, three or four individual T1-weighted MRI scans obtained in single scan sessions are included. The subjects are all right-handed and include both men and women. One hundred of the included subjects over the age of 60 were clinically diagnosed with very mild to moderate Alzheimer’s Disease (AD).
Dementia status were staged utilizing the clinical dementia rating (CDR) scaling. AD determination and diagnosis was based on clinical information that the patient was facing gradual onset and progression of decline in memory/mind capacities. Specifically, CDR scales dementia according to six fields: memory, orientation, judgment and problem solving, function in community affairs, home and hobbies, and personal care. From the person’s score at each field, a general CDR score was derived from individual ratings in each domain. A final CDR score of 0 expresses no dementia and a CDR of 0.5, 1, 2, and 3 represent, respectively, very mild, mild, moderate, and severe dementia [24, 25].
The previous procedure was accomplished and granted by the OASIS dataset using the CDR scaling, as presented in Table 2. The OASIS database does not contain any images for severe dementia (CDR = 3).
CDR scaling
CDR scaling
We use in this work samples of 198 individuals: 98 individual subjects presenting Non-Cognitive Impairment (NCI) and 100 subjects clinically diagnosed with AD. All the used subjects are over the age of 60. The characteristics of the utilized set of MRIs are given in Table 3, and the general characteristics of the OASIS database are given in Table 4. Some images from the OASIS dataset are presented in Fig. 6.
General database characteristics
Specificities of OASIS database

(1) Patient with no dementia / CDR=0, (2) Patient with very mild AD / CDR=0.5, (3) Patient with mild AD / CDR=1, (4) Patient with moderate AD / CDR=2.
For the evaluation of this algorithm, we used 198 MRI from the OASIS database. In order to define the DR-KPLS model, we opted for the RBF kernel (Table 1). The scaling factor σ associated to this kernel was chosen utilizing the TS algorithm as presented in the previous section. An SVM classifier was combined with the proposed dimensionality reduction method to classify the images. In order to generalize the proposed algorithm to other independent databases, we used the Cross-Validation (CV) technique. A 10-CV was utilized in this work. During our multiclass classification experiments, the one-against-all approach was used and no dementia class (with CDR = 0) was fixed as the reference class. Final results, after 10-CV is applied, are presented in Table 5 and Fig. 7. The results of the first run of CV are given Table 6.
Confusion matrix of DR-KPLS method
Confusion matrix of DR-KPLS method

Barchart of comparison of DR-KPLS+SVM for OASIS database.
Results of the proposed DR-KPLS method tested on the OASIS database
The values presented in the Table 6 indicate that the precision, sensitivity and specificity of the proposed method are respectively 90.80%, 94.5% and 84% for DR-KPLS using SVM as a classifier.
From the tables 5 and 6 we deduce that our proposed classifier based on the new kernel suggested method as a dimensionality reduction approach gives promising results, since it gives better results than the other well know methods used by researchers to reduce dimensions.
We compared our approach to these methods in terms of computation time as well. The training time and the testing time are presented in Table 7. From Table 7 we can conclude that the proposed scheme is faster than the other approaches in both training time and testing time on the OASIS database.
Total CPU time for training and testing for OASIS database
In this section, we give the findings of the DR-KPLS+SVM classifier tested to the OASIS database. The results are excellent compared to other well-known kernel-based algorithms in terms of classification accuracy and computation time.
This work proposes a new efficient method for dimensionality reduction. The proposed DR-KLPS method was combined with an SVM classifier to build the final scheme. The OASIS dataset was used the test the performances of the classification system. In order to obtain the best results, the kernel parameter was optimized using the TS algorithm. Experiments conducted on the OASIS database showed that the proposed DR-KLPS+SVM classifier gives the best accuracy and the minimal computational cost compared to other known dimensionality reduction methods.
In future work, we will be developing a dimensionality reduction method based on a multiple-kernel learning.
Footnotes
Acknowledgments
In this study, MRIs from the OASIS dataset has been used. Data were provided [in part] by OASIS [OASIS: Longitudinal: Principal Investigators: D. Marcus, R, Buckner, J. Csernansky, J. Morris; P50 AG05681, P01 AG03991, P01 AG026276, R01 AG021910, P20 MH071616, U24 RR021382].
