Recognition of Parkinson’s disease using a hybrid feature selection approach

Abstract

Accurate and efficient recognition of Parkinson’s disease (PD) is one of the prominent issues in healthcare. Various methods have been proposed in the literature to address this problem; however, existing methods lack accuracy in recognizing PD and suffer from efficiency problems. To overcome these limitations, this paper presents a machine-learning-based model for PD recognition. Specifically, a hybrid feature selection algorithm is designed by integrating the Relief and ant colony optimization (ACO) algorithms to select the relevant features for training the model. A support vector machine (SVM) is then trained and tested on the selected features to achieve optimal classification accuracy, and the K-fold cross-validation technique is employed to evaluate the hyper-parameter values of the model. Experimental results on a real-world Parkinson’s disease dataset reveal that the proposed system outperforms baseline competitors, achieving 99.50% accuracy on the selected features. Given this high performance, we recommend the proposed method for the recognition of PD.

Keywords

Relief; ant colony optimization; Parkinson’s disease recognition; feature selection algorithm; classification; machine learning

1 Introduction

Parkinson’s disease (PD) is, after Alzheimer’s disease, one of the most serious neurodegenerative diseases in the world, and many people worldwide suffer from it [62]. PD is a progressive, long-term degenerative disorder of the central nervous system that mainly affects older people. Its major symptoms are movement impairments such as tremor, slowed movement, rigid muscles, impaired posture and balance, loss of automatic movements, and changes in speech and writing [56]. In PD, the affected cells do not supply a consistent flow of dopamine to the motor system. Vocal impairment is one of the initial symptoms of PD [18]: patients have difficulty speaking, for example with abnormal volume and pronunciation [57]. These vocal disorders can therefore be assessed for early PD diagnosis, and diagnosing and monitoring PD through speech signals is an accurate and effective approach. Neurologists often use the outcome data of voice recordings to support the diagnosis of PD and to reach conclusive remarks. New diagnostic criteria for PD have been published; they constitute the first PD diagnostic criteria of the Movement Disorder Society, and their goal is to support standardized clinical research [56]. PD is typically recognized through techniques such as empirical assessment and examination of the patient’s medical records [12]. Such methods, including invasive ones, are not reliable in terms of accuracy and efficacy, and the Institute of Medicine has reported that existing diagnostic systems do not yet identify PD correctly [2]. To overcome these limitations, a reliable method is needed that can recognize PD and help prevent it. In this connection, machine learning methods play a vital role in the recognition, prevention, and treatment of PD [4]. 
In the literature [15, 70], various machine-learning-based methods have been developed to diagnose PD. Little et al. [42] proposed a method to identify PD using voice signal data from 23 PD patients and eight healthy subjects; an SVM was used to classify PD patients and healthy people, and the method achieved 91.4% accuracy. In another study [70], 132 features were derived from dysphonic speech signals, and feature selection (FS) algorithms such as LASSO, Relief, mRMR, and LLBFS were used to select 10 of the 132 features, which were then used to classify PD patients and healthy people. In contrast, Sakar et al. [57] collected multiple voice recordings from 40 subjects (20 with PD and 20 without). Twenty-six speech recordings, containing everyday expressions, words, numbers, and vowels, were collected for each subject and processed with the Praat acoustic analysis software [30]. The leave-one-subject-out (LOSO) and S-LOO validation techniques were then used to evaluate the performance of K-NN and SVM classifiers [31]. The study in [7] proposed an ML-based method that uses speech signals to diagnose PD; FS algorithms such as Relief, LLBS, LASSO, and mRMR were deployed, and the method achieved high accuracy. Sakar et al. [58] developed a diagnostic system using SVM and achieved 92.75% accuracy. Similarly, Der et al. [40] proposed a PD diagnosis model that integrates fuzzy-based non-linear transformation techniques with SVM and achieved 93.47% accuracy. Andre et al. [63] proposed a PD detection system based on evolutionary methods and optimum-path forest classifiers. 
The system of Andre et al. [63] achieved 84.01% accuracy. Chai et al. [6] designed a new intelligent architecture to detect PD, in which SVM and Relief were integrated with a bacterial foraging optimization algorithm, obtaining high accuracy. Emary et al. [19] developed a technique using fuzzy logic, K-NN, and PCA to diagnose PD and achieved 96.07% accuracy. Duffy [18] designed a PD detection method using PSO and an enhanced fuzzy K-NN (FKNN) and obtained 97.47% accuracy. Gok [23] proposed a PD diagnostic system employing a Rotation Forest Ensemble (RFE) K-NN classifier, which achieved 98.46% accuracy. Das [15] compared the classification performance of ANN, logistic regression (LR), and decision tree (DT) classifiers; ANN outperformed LR and DT, achieving 92.9% accuracy. Finally, the authors of [51] proposed a PD diagnosis system using the mRMR feature selection algorithm with a complex-valued ANN classifier, which achieved 98.12% accuracy.

In light of the explored literature, we conclude that a highly intelligent diagnosis system is needed for the successful diagnosis of PD. To design such systems, current studies [10, 62] have used various classification algorithms, such as logistic regression [74], support vector machine (SVM) [13], K-NN [74], DT [73], naive Bayes (NB) [53], and ANN [74]. Among these classifiers, SVM performed best. The classification performance of a classifier can be improved by choosing an appropriate feature selection method, because irrelevant features can degrade the classification performance and increase the computational complexity of the model. Well-known feature selection and parameter optimization algorithms, including Relief [71], mRMR [52], LASSO [68], LLBFS [64], genetic algorithm (GA) [33], particle swarm optimization (PSO) [41], whale optimization algorithm (WO) [67], fruit fly optimization (FFO) [60], differential flower pollination [29], and bacterial foraging optimization (BFO) [6], have been used in existing works to select feature subsets. However, these methods are limited in their ability to choose appropriate features and therefore suffer from inadequate PD detection and efficiency problems. Table 1 summarizes the PD diagnosis techniques proposed in the literature, together with their advantages and limitations, to clarify the motivation for our proposed method.

Table 1
Summary of the previous methods

Reference	Proposed method	FS method	Advantages	Limitations	Accuracy (%)
[42]	PD diagnosis method using SVM	-	High accuracy.	Computationally complex due to the use of all features.	91.4
[70]	PD detection method	LASSO, Relief, mRMR, and LLBFS	High performance due to feature selection.	Computationally complex.	-
[7]	PD diagnosis using ML algorithms	Relief, LLBS, LASSO, and mRMR	Computationally less complex.	Low performance.	-
[58]	PD diagnosis using SVM	-	Computationally less complex.	Low accuracy.	92.75
[40]	PD detection method using fuzzy-based non-linear transformation techniques integrated with SVM	-	Computationally less complex.	Low accuracy.	93.47
[63]	PD detection method using evolutionary-based methods and optimum-path forest classifiers	-	Short running time.	Low prediction accuracy.	84.01
[6]	Relief-bacterial-foraging-optimization-SVM	Relief	High accuracy.	Not efficient.	-
[23]	PD diagnostic system employing the Rotation Forest Ensemble (RFE) K-NN classifier	-	High accuracy.	Computationally complex.	98.46
[51]	mRMR-ANN	mRMR	Excellent accuracy.	High execution time.	98.12
[28]	PD detection method using ML models (Least Square, SVM, PNN, and General Regression Neural Network (GRNN))	-	High performance.	Computationally complex.	-
[49]	TSCA	Two-stage approach for feature selection	Low computation time.	Low prediction accuracy.	86.20
[6]	RF-BFO-SVM	RF-BFO	Low computation time.	Low accuracy.	97.42
[26]	L1-norm SVM and CPD	L1-norm SVM-based feature selection	High prediction performance.	Computationally complex.	99

To overcome the aforementioned problems, this paper proposes a novel integrated method based on Relief and ant colony optimization (ACO) to select a list of appropriate features. The Relief algorithm computes the weights of the features and ranks them accordingly, and ACO then selects the list of best features. The integration of both algorithms performs better than Relief or ACO alone. Finally, we train and test an SVM classifier on the selected features to predict PD patients. The contributions of this paper are as follows:

Firstly, a hybrid method based on Relief and ACO is proposed for the selection of appropriate features. The Relief algorithm assigns a weight to each feature in the feature set, ranks the features by weight, and thereby determines their relevance. The ACO algorithm then optimizes the feature weights and chooses the most relevant features for accurate classification.

Secondly, the performance of SVM is evaluated using the selected features. The results reveal that SVM produces better results on the ACO-selected features than on the original feature set. Additionally, SVM with Relief-ACO outperforms SVM with Relief alone and with ACO alone.

Thirdly, we performed extensive experiments on a real-world dataset, which demonstrate that the proposed diagnosis method (Relief-ACO-SVM) achieves high accuracy at a low computation cost compared to its counterparts.

The rest of the paper is organized as follows. Section 2 briefly discusses the dataset, Relief, ACO, SVM, the cross-validation method, the performance evaluation metrics, and the proposed architecture. Section 3 presents the experimental results and discussion. Section 4 concludes the paper and presents future research directions.

2 Materials and methods

Before presenting the technical details and basic concepts of the proposed model, Table 2 briefly summarizes the common symbols and notation used in this paper.

Table 2
Mathematical symbols and notations used in the paper

Symbol	Description
D	Data set
h	Number of instances in the data set
F	Features in the data set
S	Reduced feature set
Y	Output class labels
b	Bias (offset from the origin)
w	d-dimensional coefficient vector
n	Number of support vectors
i	i-th instance in the data set
x_i	i-th sample of the data set
y_i	Target label of x_i
R	Training set
T	Test set
C	Regularization parameter that balances low training and testing errors against generalization of the classifier to unseen data
γ	Gaussian kernel parameter defining how far the influence of a single training example reaches; low values mean ‘far’ and high values mean ‘close’
V, V^-	Old and new feature values
k	Index of the ant
δ_ij	Heuristic
η	Heuristic desirability
φ	Feature subset length
S_k	Subset of features found by ant k
θ	Parameter controlling the relative feature weight
K	Number of nearest samples to a target sample
τ	Pheromone
w_i	Weight of the i-th feature
EER	Error term (equal error rate)
E	Average performance
C	Complexity

2.1 Dataset

In this research work, we used the Parkinson’s dataset obtained from the machine learning repository, which is available online [43]. The dataset has been used in several research works [6, 49] for the detection of Parkinson’s disease. It is a multivariate dataset of neurologists’ assessment outcomes with 23 attributes and 195 voice samples. The original study that published the dataset also presented feature extraction methods for general voice disorders. The voice recordings of 31 people, including 23 people with Parkinson’s disease (16 males and 7 females) and 8 healthy controls (3 males and 5 females), were used in the study. Each column of the dataset corresponds to a particular voice measure, and each row corresponds to one of the 195 voice recordings from these individuals. The subjects ranged from 46 to 85 years of age, with a mean age of 65.8 and a standard deviation of 9.8. The main aim of the dataset is to discriminate people with Parkinson’s disease from healthy people based on differences in their vowel vocalization, as given by the “status” column, which is set to 0 for healthy subjects and 1 for PD. For each subject, an average of six phonations of a vowel was recorded, each up to 36 seconds long, giving 195 samples in total. The phonations were recorded in an Industrial Acoustics Company (IAC) sound-treated booth using a calibrated microphone placed 8 cm from the mouth, and the voice signals were recorded directly to a computer using the Computerized Speech Laboratory.

2.2 Classification using support vector machine

SVM is a supervised classification algorithm that has been adopted for both binary and multi-class classification problems [6, 9, 59]. In a binary classification problem, the instances are separated by the hyperplane $w^{T}x + b = 0$, where w is a d-dimensional coefficient vector normal to the hyperplane, b is the bias (offset from the origin), and x represents the data points. Training an SVM amounts to finding w and b. In the linear case, w is obtained by solving the Lagrangian of the margin-maximization problem; the data points lying on the maximum-margin boundary are called support vectors. The solution for w can then be expressed as in Equation (1):

$w = \sum_{i=1}^{n} \alpha_{i} y_{i} x_{i}$ (1)

where n is the number of support vectors, α_i are the Lagrange multipliers, and y_i is the target label of instance x_i. The bias b can be calculated from any support vector using $y_{i}(w^{T}x_{i} + b) - 1 = 0$. In the non-linear case, the kernel trick is employed, and the decision function is computed as in Equation (2):

$f(x) = \mathrm{sgn}\left(\sum_{i=1}^{n} \alpha_{i} y_{i} K(x_{i}, x) + b\right)$ (2)

Kernel functions are positive semi-definite functions that satisfy Mercer’s condition [74]. Examples are the polynomial kernel

$K(x, x_{i}) = \left(x^{T} x_{i} + 1\right)^{d}$ (3)

where d is the polynomial degree, and the Gaussian kernel

$K(x, x_{i}) = \exp\left(-\gamma \lVert x - x_{i} \rVert^{2}\right)$ (4)

Two hyper-parameters must therefore be set in the SVM model: the regularization parameter C and the Gaussian kernel parameter γ.
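To make Equations (2) to (4) concrete, the following is a minimal sketch, in plain Python, of how a trained SVM evaluates a new sample. It assumes that the support vectors, their multipliers α_i, labels y_i ∈ {−1, +1}, and the bias b have already been obtained by training; the function names and the toy values are illustrative, not taken from the paper. The parameter C only affects training, so it does not appear here.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def polynomial_kernel(x: Vector, xi: Vector, d: int = 3) -> float:
    """Equation (3): K(x, x_i) = (x^T x_i + 1)^d."""
    return (sum(a * b for a, b in zip(x, xi)) + 1.0) ** d


def gaussian_kernel(x: Vector, xi: Vector, gamma: float = 0.5) -> float:
    """Equation (4): K(x, x_i) = exp(-gamma * ||x - x_i||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-gamma * squared_distance)


def svm_decision(
    x: Vector,
    support_vectors: Sequence[Vector],
    alphas: Sequence[float],
    labels: Sequence[int],
    b: float,
    kernel: Callable[[Vector, Vector], float],
) -> int:
    """Equation (2): f(x) = sgn(sum_i alpha_i * y_i * K(x_i, x) + b).

    Labels are assumed to be -1 / +1; a score of exactly 0 is mapped to +1.
    """
    score = sum(
        alpha * y * kernel(sv, x)
        for sv, alpha, y in zip(support_vectors, alphas, labels)
    ) + b
    return 1 if score >= 0 else -1


# Toy model: one support vector per class, Gaussian kernel with gamma = 0.5.
svs = [[0.0, 0.0], [2.0, 2.0]]
print(svm_decision([1.8, 1.9], svs, [1.0, 1.0], [-1, 1], 0.0, gaussian_kernel))  # 1
print(svm_decision([0.1, 0.2], svs, [1.0, 1.0], [-1, 1], 0.0, gaussian_kernel))  # -1
```

In practice, a library such as scikit-learn's `sklearn.svm.SVC` (with `kernel="rbf"`, `C`, and `gamma`) performs the training; the sketch only shows what the learned decision function computes.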

2.3 Feature selection technique

Suppose the feature set x has cardinality n. Feature selection is a discrete optimization problem of choosing m out of the n features, where m ≤ n [45]. Irrelevant features affect both the performance and the execution time of a classifier, so eliminating them from the feature set is critically important. Conventional feature selection techniques are divided into filter and wrapper methods. In wrapper methods, a search algorithm is applied to the feature set, and each candidate feature subset is evaluated, typically with the classifier itself. Filter methods use the same kind of search process as wrappers but evaluate subsets with filter metrics such as class separability, inter-class distance, error probability, probabilistic distance, consistency, and entropy [46].

2.3.1 Pre-processing and relief algorithm

Pre-processing transforms the data into a useful representation that the classifier can use conveniently, so it is required for proper training and testing of the model. Common pre-processing methods include deleting instances with missing feature values, standard scaling, and min-max scaling. Standard scaling standardizes each feature to a mean of 0 and a variance of 1, whereas min-max scaling maps all feature values into the range between 0 and 1 [38]. The mathematical forms of min-max normalization and z-score normalization are given in Equations (5) and (6), respectively:

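$V^{-} = \left(\frac{V - \min}{\max - \min}\right)\left(\mathrm{new}_{\max} - \mathrm{new}_{\min}\right) + \mathrm{new}_{\min}$ (5)

$V^{-} = \frac{V - \mathrm{mean}}{\mathrm{std\_dev}}$ (6)

where V and V^- are the old and new feature values, respectively; min and max are the minimum and maximum of the feature; new_min and new_max bound the target range (0 and 1 for min-max scaling); and mean and std_dev are the mean and standard deviation of the feature.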
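Equations (5) and (6) are applied column by column. Below is a minimal sketch for a single feature column; the function names are illustrative. It assumes the population standard deviation (the convention scikit-learn's `StandardScaler` also uses) and maps a constant column to new_min or 0, since the equations would otherwise divide by zero.

```python
import statistics
from typing import List, Sequence


def min_max_scale(values: Sequence[float], new_min: float = 0.0, new_max: float = 1.0) -> List[float]:
    """Equation (5): rescale one feature column to [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature: Equation (5) would divide by zero
        return [new_min for _ in values]
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def z_score(values: Sequence[float]) -> List[float]:
    """Equation (6): standardize one feature column to mean 0 and variance 1."""
    mean = statistics.mean(values)
    std_dev = statistics.pstdev(values)  # population standard deviation
    if std_dev == 0:  # constant feature
        return [0.0 for _ in values]
    return [(v - mean) / std_dev for v in values]


column = [2.0, 4.0, 6.0]
print(min_max_scale(column))  # [0.0, 0.5, 1.0]
print(z_score(column))        # approximately [-1.2247, 0.0, 1.2247]
```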

The Relief algorithm [71] assigns a weight to each feature and updates these weights iteratively. Relief iterates over m training samples; in each iteration, it randomly selects a target sample X_k (R_k in Algorithm 1) and updates the feature score vector w [61]. Here diff(i, ·, ·) denotes the difference between the values of feature i in two samples. For the nearest hit X_h (the closest sample of the same class), the update rule is:

$w[i] = w[i] - \frac{\mathrm{diff}(i, X_{k}, X_{h})}{mn}$ (7)

For the nearest miss X_m (the closest sample of a different class y), the update rule is given in Equation (8):

$w[i] = w[i] + \frac{P_{y}}{1 - P_{y_{k}}} \frac{\mathrm{diff}(i, X_{k}, X_{m})}{mn}$ (8)

where $\frac{P_{y}}{1 - P_{y_{k}}}$ weights the miss by the prior probability of its class y relative to all classes other than the target’s class y_k. Alternatively, in the basic form of Relief with one nearest hit and one nearest miss, the update can be written as:

$w[i] = w[i] - (X_{i} - \mathrm{nearHit}_{i})^{2} + (X_{i} - \mathrm{nearMiss}_{i})^{2}$ (9)
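The basic update of Equation (9) can be sketched as follows. This is a minimal, illustrative version rather than the authors' implementation: it assumes numeric features already scaled to [0, 1] (for example with Equation (5)), uses Euclidean distance to find a single nearest hit and nearest miss, and is a simplified, single-neighbor variant of the Relief loop in Algorithm 1. The name `relief_weights` and the toy data are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence


def relief_weights(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    m: Optional[int] = None,
    seed: int = 0,
) -> List[float]:
    """Basic Relief (Equation (9)): one nearest hit and one nearest miss per target.

    m is the number of sampled target instances (defaults to all h instances).
    Every class is assumed to contain at least two instances.
    """
    h, n_features = len(X), len(X[0])
    rng = random.Random(seed)
    targets = rng.sample(range(h), h if m is None else m)
    w = [0.0] * n_features  # all weights start at zero
    for k in targets:
        hits = [j for j in range(h) if j != k and y[j] == y[k]]
        misses = [j for j in range(h) if y[j] != y[k]]
        near_hit = min(hits, key=lambda j: math.dist(X[k], X[j]))
        near_miss = min(misses, key=lambda j: math.dist(X[k], X[j]))
        for i in range(n_features):
            # Penalize differences to the same class, reward differences to the other class.
            w[i] += -(X[k][i] - X[near_hit][i]) ** 2 + (X[k][i] - X[near_miss][i]) ** 2
    return w


# Feature 0 separates the classes; feature 1 is noise.
X = [[0.0, 0.5], [0.1, 0.9], [0.9, 0.4], [1.0, 0.8]]
y = [0, 0, 1, 1]
weights = relief_weights(X, y)
ranking = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
print(weights, ranking)  # approximately [3.2, -0.6] and [0, 1]
```

The resulting ranking is what the hybrid method passes on to ACO; the discriminative feature receives a large positive weight and the noisy one a negative weight.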

2.3.2 Ant colony optimization

Ant colony optimization (ACO) was designed to address hard combinatorial optimization problems [34]; it tackles them by imitating the foraging behavior of real ants [34]. The first ACO algorithm, the Ant System, was introduced in [16], and it was subsequently modified in [20, 21]. ACO follows a computational model of real ant colonies, so the rules followed by each artificial agent are inspired by those of real ants, and the technique used here belongs to the ACO family of algorithms [47]. ACO algorithms are mostly used for optimization problems whose properties can be defined as follows [1, 17].

In general, an ACO algorithm can be applied to any combinatorial problem for which the following can be defined:

Appropriate problem representation. The problem should be represented as a graph consisting of vertices and the links between them.

Heuristic desirability η of edges. There should be an appropriate heuristic measure of the "goodness" of the paths between linked nodes.

Construction of feasible solutions. There should be a mechanism by which possible solutions can be constructed efficiently.

Pheromone updating rule. The pheromone levels on the edges must be updated using an appropriate evaporation rule. A typical approach is to select the n best ants and update the paths they chose accordingly.

Probabilistic transition rule. It defines the probability that an ant moves from one vertex to another in the graph.

2.3.3 Ant colony optimization for the selection of features

One of the prominent problems is the selection of an adequate feature set. Suppose the original feature set contains n features. The aim is to create a feature subset of size s, where s < n, that maintains the accuracy of the original feature set. Unlike routing problems, however, feature selection has no inherent notion of a path.

Note that a partial solution does not define any ordering among its components, so the next component to be chosen is not necessarily influenced by the last component added to the partial solution [3, 39]. Furthermore, solutions to the feature selection problem do not necessarily have the same size. These issues must be addressed before the ACO algorithm can be applied to feature selection. We address the first one by redefining the graph representation as follows.

2.3.4 Ant graph representation

In this section, the feature selection (FS) problem is formulated as an ACO problem. It is described using a graph in which features are represented by nodes, and the links (i.e., edges) between nodes denote the choice of the next feature. To search for the optimal feature subset, an ant traverses the graph, visiting a small number of nodes until a traversal stopping criterion is satisfied. Figure 1 illustrates this setup.

Fig.1

Ant colony optimization-based representation of the feature selection problem.

The graph is fully connected, so any unvisited feature can be chosen next. If the ant is at vertex f1, it can choose feature f2 by employing the transition rule. In the same way, it chooses further features (here f3 and f4) using the transition rule, as depicted in the figure. When it arrives at f4, the current subset {f1, f2, f3, f4} satisfies the traversal stopping criterion, so the ant finishes the traversal and outputs this subset as the selected features for data reduction [36]. This reformulation allows the transition and pheromone update rules of standard ACO algorithms to be applied, with pheromone and heuristic values now associated with each feature (node) rather than with the edges.

2.3.5 Heuristic desirability

The heuristic desirability of traversing between features can be any subset evaluation function, such as an entropy-based measure [35], the Fisher discrimination rate [22], or the rough-set dependency measure [50]. The heuristic desirability and the node pheromone levels are combined in the probabilistic transition rule, which gives the probability that ant k, currently at feature i, includes feature j in its solution at time step t, as expressed in Equation (10):

$p_{ij}^{k}(t) = \begin{cases} \dfrac{[\tau_{ij}(t)]^{\alpha}\,[\eta_{ij}]^{\beta}}{\sum_{l \in J_{i}^{k}} [\tau_{il}(t)]^{\alpha}\,[\eta_{il}]^{\beta}} & \text{if } j \in J_{i}^{k} \\ 0 & \text{otherwise} \end{cases}$ (10)

Here J_i^k denotes the set of feasible features that ant k can still add to its partial solution; τ_ij(t) and η_ij are, respectively, the pheromone value and the heuristic desirability associated with choosing feature j (in the node-based representation, they depend only on j); and α and β are parameters that determine the relative importance of the pheromone values and the heuristic information. The transition probability thus balances pheromone intensity τ against heuristic information η, which in turn balances exploitation and exploration: the search prefers actions that have proven effective in the past.

Nevertheless, to identify such actions, the search must also try actions it has not yet observed, i.e., it must explore the search space. To balance this exploration-exploitation trade-off, an appropriate pair of parameters α and β must be chosen. If α = 0, no pheromone information is exploited and past search experience is ignored, so the search becomes a stochastic greedy search. If β = 0, the potential benefit of moves (the heuristic information) is ignored.
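A minimal sketch of the transition rule in Equation (10) follows. It assumes the node-based representation of Section 2.3.4 (so τ and η are stored per feature) and strictly positive heuristic values; the function names are illustrative. It also shows a roulette-wheel choice of the next feature and the special cases α = 0 and β = 0 discussed above.

```python
import random
from typing import Dict, Sequence, Set


def transition_probabilities(
    tau: Sequence[float],
    eta: Sequence[float],
    visited: Set[int],
    alpha: float = 1.0,
    beta: float = 1.0,
) -> Dict[int, float]:
    """Equation (10): p_j is proportional to tau_j^alpha * eta_j^beta over unvisited features j."""
    feasible = [j for j in range(len(tau)) if j not in visited]
    scores = {j: (tau[j] ** alpha) * (eta[j] ** beta) for j in feasible}
    total = sum(scores.values())
    return {j: score / total for j, score in scores.items()}


def choose_next_feature(probs: Dict[int, float], rng: random.Random) -> int:
    """Roulette-wheel selection: sample one feature according to its probability."""
    features = list(probs)
    return rng.choices(features, weights=[probs[j] for j in features])[0]


# Pheromone favors feature 2, the heuristic favors feature 1.
tau = [1.0, 1.0, 3.0]
eta = [1.0, 3.0, 1.0]
print(transition_probabilities(tau, eta, visited={0}))             # {1: 0.5, 2: 0.5}
print(transition_probabilities(tau, eta, visited={0}, alpha=0.0))  # {1: 0.75, 2: 0.25} (heuristic only)
print(transition_probabilities(tau, eta, visited={0}, beta=0.0))   # {1: 0.25, 2: 0.75} (pheromone only)
print(choose_next_feature(transition_probabilities(tau, eta, {0}), random.Random(0)))
```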

2.3.6 Rule of pheromone update

After all ants have completed their solutions, pheromone evaporation is triggered on every node, and each ant k deposits an amount of pheromone $\Delta\tau_{i}^{k}(t)$ on each node it has used, as given in Equation (11):

$\Delta\tau_{i}^{k}(t) = \begin{cases} \varphi\,\gamma(S^{k}(t)) + \theta\,\dfrac{n - |S^{k}(t)|}{n} & \text{if } i \in S^{k}(t) \\ 0 & \text{otherwise} \end{cases}$ (11)

Here, S^k(t) is the feature subset found by ant k at iteration t, |S^k(t)| is its length, γ(S^k(t)) is the classifier performance on that subset, and n is the total number of features. The pheromone is thus updated according to both the classifier performance and the feature subset length. The parameters φ and θ tune the relative weight of classifier performance and feature subset length, with φ = 1 − θ and θ ∈ [0, 1].

Because the two parameters weight different criteria, they have different effects on the selected feature set. In this experiment, we assign more weight to classifier performance than to subset length, setting φ = 0.8 and θ = 0.2. The pheromone of every vertex is then updated according to Equations (12) and (13):

$\tau_{ij}(t + 1) = (1 - \rho)\,\tau_{ij}(t) + \Delta\tau_{ij}(t)$ (12)

$\Delta\tau_{ij}(t) = \sum_{k=1}^{n} \frac{\gamma'(S^{k})}{|S^{k}|}$ (13)

where the k-th term contributes only if ant k traversed (i, j) and is 0 otherwise, γ'(S^k) is the evaluation of the subset S^k, and n here denotes the number of ants. The decay constant 0 ≤ ρ ≤ 1 simulates the evaporation of the pheromone, and S^k is the subset of features found by ant k.
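The following minimal sketch combines the deposit of Equation (11) with the evaporation step of Equation (12) for node-based pheromone; that is, the Δ term of Equation (12) is taken to be the sum of the Equation (11) deposits of all ants. That combination, the value ρ = 0.2, and the function name `update_pheromone` are assumptions for illustration, while φ = 0.8 and θ = 0.2 follow the setting above.

```python
from typing import Iterable, List, Sequence, Tuple


def update_pheromone(
    tau: Sequence[float],
    ant_results: Iterable[Tuple[Sequence[int], float]],
    rho: float = 0.2,
    phi: float = 0.8,
    theta: float = 0.2,
) -> List[float]:
    """Evaporate pheromone on every feature, then add each ant's deposit.

    ant_results holds (selected feature indices, classifier performance in [0, 1]).
    """
    n = len(tau)  # total number of features
    delta = [0.0] * n
    for subset, performance in ant_results:
        # Equation (11): reward good performance and short subsets.
        amount = phi * performance + theta * (n - len(subset)) / n
        for i in subset:
            delta[i] += amount
    # Equation (12): tau_i(t + 1) = (1 - rho) * tau_i(t) + delta_i(t)
    return [(1 - rho) * t + d for t, d in zip(tau, delta)]


tau = update_pheromone([1.0, 1.0, 1.0, 1.0], [([0, 1], 0.9), ([1, 2, 3], 0.6)])
print([round(t, 3) for t in tau])  # [1.62, 2.15, 1.33, 1.33]
```

Feature 1, chosen by both ants, accumulates the most pheromone, while the small, accurate subset {0, 1} earns a larger per-feature deposit than the longer, less accurate one.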

2.3.7 ACO process

The ACO process generates n ants and places them on the graph. Each ant starts from a random feature; the number of ants may equal the total number of features in the data, so that each ant initiates its path construction from a different feature. From this initial setup, the ants traverse the nodes probabilistically until they meet a stopping criterion, and the resulting subsets are evaluated. If the algorithm has found an optimal solution or has run for the required number of iterations, it halts and outputs the best feature subset; otherwise, the pheromone is updated and the process is repeated with a new set of ants [32, 55].
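Putting the pieces together, the ACO process can be sketched as the loop below. This is an illustrative sketch under explicit assumptions, not the authors' code: one ant starts from each feature, an ant stops once its subset reaches a fixed size (a simple stand-in for the stopping criterion), subsets are scored by a caller-supplied `evaluate` function returning a value in [0, 1] (in the paper this role is played by the classifier), and pheromone is updated with Equations (11) and (12). The names `aco_feature_selection` and `evaluate`, and all default parameter values, are hypothetical.

```python
import random
from typing import Callable, FrozenSet, Sequence, Tuple


def aco_feature_selection(
    n_features: int,
    evaluate: Callable[[FrozenSet[int]], float],
    eta: Sequence[float],
    subset_size: int,
    n_iterations: int = 30,
    alpha: float = 1.0,
    beta: float = 1.0,
    rho: float = 0.2,
    phi: float = 0.8,
    theta: float = 0.2,
    seed: int = 0,
) -> Tuple[FrozenSet[int], float]:
    """Return the best feature subset found and its score (eta must be positive)."""
    rng = random.Random(seed)
    tau = [1.0] * n_features  # uniform initial pheromone on every feature
    best_subset, best_score = frozenset(), float("-inf")
    for _ in range(n_iterations):
        results = []
        for start in range(n_features):  # one ant per starting feature
            subset = {start}
            while len(subset) < subset_size:  # assumed stopping criterion
                feasible = [j for j in range(n_features) if j not in subset]
                weights = [(tau[j] ** alpha) * (eta[j] ** beta) for j in feasible]
                subset.add(rng.choices(feasible, weights=weights)[0])  # Equation (10)
            chosen = frozenset(subset)
            score = evaluate(chosen)
            results.append((chosen, score))
            if score > best_score:
                best_subset, best_score = chosen, score
        # Evaporation and deposit, Equations (12) and (11).
        tau = [(1 - rho) * t for t in tau]
        for chosen, score in results:
            amount = phi * score + theta * (n_features - len(chosen)) / n_features
            for i in chosen:
                tau[i] += amount
    return best_subset, best_score


# Toy objective: features 0 and 1 are the informative ones.
informative = {0, 1}
best, score = aco_feature_selection(
    n_features=5,
    evaluate=lambda s: len(s & informative) / len(informative),
    eta=[3.0, 3.0, 1.0, 1.0, 1.0],
    subset_size=2,
)
print(sorted(best), score)
```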

2.3.8 Proposed integrated feature selection algorithm

The proposed algorithm is designed for optimal feature subset selection. The Relief algorithm uses feature weights for proper feature selection. Initially, all feature weights are set to zero. In every iteration, a random instance X is selected together with the nearest instances of each class. The near-hit and near-miss lists are generated, and the weight vector w is updated according to Equations (7) and (8). Hence, the weight of a feature increases when its value agrees with nearby instances of the same class more than with nearby instances of other classes, and decreases otherwise.

The Relief algorithm selects relevant features based on their weights. However, the weights alone are not enough to select an adequate feature list, because other control parameters are also necessary, such as the number of features, the number of instances in the dataset, and the weight threshold used to select or reject each feature. Furthermore, computing the feature weights is time-consuming. In the ACO algorithm, on the other hand, it is difficult to construct the graph and assign ants to its nodes adequately, because computing node weights is expensive in terms of space. Both algorithms are therefore integrated for suitable feature selection: in the proposed Relief-ACO method, the feature weights computed by the Relief algorithm are incorporated into ACO to select an optimal feature set. ACO optimizes the weight of each feature, while the control parameters α, β, and γ regulate the influence of the feature weights. The pseudocode of the proposed algorithm is given in Algorithm 1.

Fig.2

Ant colony optimization for feature selection.

Algorithm 1: Integrated Relief-ACO FS Algorithm
Data: Data set D, original feature set F, number of instances in the data set h, target instance R_k (the k-th instance), target output class label y, feature weights w[i]
Result: Reduced feature set S.
1	Begin
2	Initialize all feature weights: w[i] := 0.0;
3	For k := 1 to h do steps 4 to 8; // until the required number of samples is reached
4	Randomly select a target sample R_k;
5	Determine the nearest hit samples (same class as R_k) and the nearest miss samples (different class) and store them in the nearest list;
6	Select a sample from the nearest list and remove it from the list;
7	If the sample selected in step 6 belongs to the same class as R_k, update w for all features using Equation (7); otherwise, update it using Equation (8);
8	If no instances remain in the nearest list, proceed; otherwise, return to step 6;
9	End For;
10	Initialization: set the ant population size and the maximum number of iterations, assign a node to each feature, and set the pheromone intensity of each node;
11	Create a new ant population;
12	For every ant, assign a random feature and mark it as visited;
13	For all ants, do steps 14 and 15;
14	From the non-visited features, choose the feature with the highest probability score computed using Equation (9) and mark it as visited. In Equation (9), J_k is the set of non-visited features, τ_i(t) and η_i are the pheromone value and the heuristic desirability associated with feature i, respectively, and two exponents weight the heuristic information and the pheromone value [37];
15	Repeat step 14 until the ant reaches its threshold, given by Equation (14):
$\text{Ant\_threshold} = \varphi\, e^{-FN/N} + w\, e^{-EER}$ (14)
where FN is the cardinality of the feature subset selected by the ant, N is the total number of features, and φ and w are parameters that control the effect of the subset size and of the EER, respectively, with φ + w = 1 [55];
16	Compute the pheromone deposited by the ants using Equation (15):
$\Delta\tau(i, k) = \alpha \left(\frac{100 - EER_k}{100}\right) + \beta \left(\frac{FN - fn(k)}{fn}\right) + \gamma\, w$ (15)
where Δτ(i, k) is the amount of pheromone deposited by ant k on feature i, EER_k is the equal error rate of the system using the features found by ant k, and α, β, and γ are parameters, with α + β + γ = 1, that control the relative weight of classifier performance, feature subset length, and feature weight, respectively;
17	Update the pheromone intensity of the features selected in step 14;
18	If the stopping condition is met or the maximum number of iterations is reached, go to step 19; otherwise, go to step 10;
19	Finish;
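The ACO part of Algorithm 1 can be sketched as three small functions. Equation (9) is not reproduced in this section, so `selection_probabilities` assumes the standard ACO transition rule p_i = τ_i^a η_i^b / Σ_{j∈J_k} τ_j^a η_j^b over the unvisited set J_k. `ant_threshold` implements Equation (14) with w = 1 − φ. For the subset-length term of Equation (15), `pheromone_deposit` assumes the reading (N − |S_k|)/N, i.e. the fraction of features the ant discarded; this normalization is an assumption, and all function and argument names are hypothetical.

```python
import math

def selection_probabilities(pheromone, heuristic, unvisited, a=1.0, b=1.0):
    """Assumed form of Eq. (9): p_i = tau_i^a * eta_i^b / sum over J_k."""
    scores = {i: (pheromone[i] ** a) * (heuristic[i] ** b) for i in unvisited}
    total = sum(scores.values())
    return {i: s / total for i, s in scores.items()}

def ant_threshold(subset_size, n_features, eer, phi):
    """Eq. (14): phi * exp(-FN / N) + w * exp(-EER), with w = 1 - phi."""
    w = 1.0 - phi
    return phi * math.exp(-subset_size / n_features) + w * math.exp(-eer)

def pheromone_deposit(eer_k, subset_size, n_features, feature_weight,
                      alpha, beta, gamma):
    """Eq. (15) under an assumed subset-length term (N - |S_k|) / N:
    alpha*(100 - EER_k)/100 + beta*(N - |S_k|)/N + gamma*w_i."""
    return (alpha * (100.0 - eer_k) / 100.0
            + beta * (n_features - subset_size) / n_features
            + gamma * feature_weight)

# Step 14 picks the unvisited feature with the highest probability score.
probs = selection_probabilities({0: 1.0, 1: 2.0, 2: 1.0},
                                {0: 0.5, 1: 0.5, 2: 2.0}, unvisited=[0, 1, 2])
next_feature = max(probs, key=probs.get)
```

Greedy selection follows the wording of step 14; many ACO variants instead sample the next feature in proportion to these probabilities, which keeps the search from converging too early.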

2.4 K-fold cross validation

For model selection, we adopted K-fold cross-validation. The data are split into K equal parts; in each iteration, K − 1 parts are used for training and the remaining part for testing. The process is repeated K times so that every part serves as the test set exactly once, and the model performance is averaged over the K runs. In our simulations, K = 10, so 90% of the dataset is used for training and 10% for testing in each fold, and the average over the 10 folds is reported, as in [26]. The estimated performance E_i is computed on each fold i, and the overall performance E of the model is the average over all folds, given by Equation (16): $E = \frac{1}{10} \sum_{i = 1}^{10} E_{i}$ (16)
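The fold construction and the average in Equation (16) can be sketched as follows. This is an illustrative sketch, not the authors' code: `evaluate(train_idx, test_idx)` is a hypothetical placeholder for fitting the classifier on the training indices and returning its score E_i on the test indices, and the shuffle with a fixed seed is an assumption.

```python
import random
from statistics import mean

def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the indices and split them into k folds whose sizes differ by at most one."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for f in range(k):
        size = base + (1 if f < extra else 0)
        folds.append(idx[start:start + size])
        start += size
    return folds

def cross_validate(n_samples, evaluate, k=10, seed=0):
    """Eq. (16): E = (1/k) * sum_i E_i, where E_i is the score on fold i
    after training on the other k - 1 folds."""
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(evaluate(train_idx, test_idx))
    return mean(scores), scores
```

With the 195 instances used here and k = 10, five folds hold 20 samples and five hold 19. A stratified split, which keeps the PD/healthy ratio equal in every fold, is a common refinement for imbalanced medical data.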

2.5 Performance evaluation metrics

To evaluate the performance of our model, we employed accuracy, sensitivity, specificity, Matthews correlation coefficient (MCC), and F1-score [14, 44], where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. These metrics are defined in Equations (17)–(21): $Accuracy\ (Acc) = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%$ (17) $Sensitivity\ (Sn) = \frac{TP}{TP + FN} \times 100\%$ (18) $Specificity\ (Sp) = \frac{TN}{TN + FP} \times 100\%$ (19)

$MCC = \frac{(TP)(TN) - (FP)(FN)}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \times 100\%$ (20)

$F1\text{-}score = 2 \times \frac{precision \times recall}{precision + recall}$ (21)

where precision = TP / (TP + FP) and recall is the sensitivity defined in Equation (18).
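Equations (17)–(21) can be computed directly from the four confusion-matrix counts. The sketch below is illustrative; `classification_metrics` is a hypothetical name, it returns every metric as a percentage to match the tables, and it assumes the model predicts at least one positive so that precision is defined.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Eqs. (17)-(21) from confusion-matrix counts, returned as percentages."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sen = tp / (tp + fn)              # sensitivity, also the recall in Eq. (21)
    spe = tn / (tn + fp)              # specificity
    precision = tp / (tp + fp)        # assumes tp + fp > 0
    f1 = 2 * precision * sen / (precision + sen)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {name: 100 * value for name, value in
            [("Acc", acc), ("Sen", sen), ("Spe", spe), ("MCC", mcc), ("F1", f1)]}

print(classification_metrics(tp=45, tn=40, fp=5, fn=10))
```

Unlike accuracy, MCC stays informative when the classes are imbalanced, as they are in the PD dataset, which is why it is reported alongside accuracy.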

2.6 The proposed relief-ACO-SVM method

The detailed working process of the proposed Relief-ACO-SVM is given in Fig. 3. The main steps and procedures are given as follows:

Algorithm 2: Relief-ACO-SVM based method for Parkinson’s disease recognition
1	Begin
2	Pre-process the data with the standard scaler and the min-max scaler;
3	Weight and rank the features with Relief using Equations (7) and (8); select features and optimize parameters with Relief-ACO;
4	For j = 1 to K, evaluate the model using K-fold cross-validation:
5	Use K − 1 folds as the training set and the remaining fold as the test set;
6	Train the model on the K − 1 training folds using the selected features and the optimized parameter pair (C, γ), and validate it on the held-out fold using the same features; repeat steps 5 and 6; End For; calculate the average accuracy over the 10 folds;
7	Check the performance of the best model on the test data;
8	Finish
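Step 6 of Algorithm 2 depends on choosing the pair (C, γ) by cross-validated accuracy. The following is a minimal grid-search sketch under stated assumptions. `select_svm_params` and the callback `fold_scores(C, gamma)` are hypothetical: the callback stands in for training the SVM on K − 1 folds and returning the K per-fold accuracies. The candidate values in the usage note are taken from Tables 6 and 7 purely for illustration.

```python
from itertools import product
from statistics import mean

def select_svm_params(C_values, gamma_values, fold_scores):
    """Return the (C, gamma) pair with the highest mean cross-validated score.

    `fold_scores(C, gamma)` is a hypothetical callback returning the K
    per-fold accuracies obtained with that hyperparameter pair."""
    best_pair, best_score = None, float("-inf")
    for C, gamma in product(C_values, gamma_values):
        score = mean(fold_scores(C, gamma))
        if score > best_score:
            best_pair, best_score = (C, gamma), score
    return best_pair, best_score
```

For example, `select_svm_params([1, 2, 10], [0.005, 0.009, 0.019], fold_scores)` evaluates all nine pairs. To keep the final test estimate unbiased, this search should run inside the training folds only (nested cross-validation).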

Fig.3

Flowchart of the proposed PD recognition method.

3 Experimental results and discussion

In the following section, we discuss the experimental setup and experimental results.

3.1 Experimental setup

In this study, experiments were conducted from different perspectives of PD diagnosis. First, feature subsets were selected with the Relief, ACO, and Relief-ACO methods. The classification performance of SVM on the dataset was then checked with K-fold cross-validation, with K set to 10. Finally, classifier performance was compared on the full feature set and on the selected feature subsets. All performance evaluation metrics were computed automatically. The simulations and experiments were implemented in Python on a PC with an Intel(R) Core(TM) i5-2410M CPU at 3.10 GHz and 4 GB RAM.

3.2 Experimental results

3.2.1 Pre-processing of the dataset

Pre-processing techniques, namely deletion of samples with missing feature values, standard scaling, and min-max scaling, were applied to the dataset so that the classifier could be trained and tested effectively. These statistical techniques are also essential for a basic understanding of the dataset. Six instances were deleted because of missing values in feature columns, leaving 195 instances with 22 real-valued attributes and one output class label. Descriptive statistics of the features are reported in Table 3. Figure 4 shows a histogram for each feature: each plot is based on a single feature and shows the frequency of its values, which helps in understanding the range and distribution of the data. Figure 5 is a heat map, a two-dimensional representation of data in which colors represent values; it gives a quick visual summary of the correlations among the features and makes a complex dataset easier to understand at a glance.
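The three pre-processing operations can be sketched column by column as follows. This is illustrative only: missing values are assumed to be represented as `None`, the helper names are hypothetical, and the text does not state the order in which the two scalers are applied, so each scaler is shown separately.

```python
from statistics import mean, pstdev

def drop_incomplete_rows(rows):
    """Remove samples with any missing value (represented here as None)."""
    return [r for r in rows if all(v is not None for v in r)]

def min_max_scale(column):
    """Map a column linearly onto [0, 1]; a constant column becomes all zeros."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]

def standard_scale(column):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma if sigma else 0.0 for v in column]

def scale_columns(rows, scaler):
    """Apply a column scaler to every feature column of a row-major dataset."""
    columns = [scaler(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*columns)]
```

In a cross-validated setting, the scaling statistics (min, max, mean, standard deviation) are best computed on the training folds and then reused on the test fold, so no information leaks from the test data.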

Table 3
Descriptive statistics of the dataset features

Label	Feature Name	Min-Max	Mean ± Standard deviation
S1	MDVP.Fo(Hz)	88.33-60.105	154.3 ± 41.4
S2	MDVP.Fhi(Hz)	102.145-92.03	197.2 ± 91.5
S3	MDVP.Flo(Hz)	7.5-239.2	116.4 ± 43.6
S4	MDVP.Jitter(%)	0.00168-0.03316	0.006 ± 0.01
S5	MDVP.Jitter(Abs)	0.000007-0.00026	0.000044 ± 0.000035
S6	MDVP.RAP	0.000680-0.02144	0.003306 ± 0.002968
S7	MDVP.PPQ	0.00092-0.01958	0.003306 ± 0.002968
S8	Jitter.DDP	0.00204-0.06433	0.00992 ± 0.008903
S9	MDVP.Shimmer	0.00954-0.11908	0.029709 ± 0.018857
S10	MDVP.Shimmer(dB)	0.085-1.302	0.283 ± 0.194877
S11	Shimmer.APQ3	0.00455-0.05647	0.016 ± 0.010
S12	Shimmer.APQ5	0.0057-0.0794	0.018 ± 0.02
S13	MDVP.APQ	0.00719-0.13778	0.024081 ± 0.016947
S14	Shimmer.DDA	0.02337-0.1047	0.06004 ± 0.029933
S15	NHR	0.00065-0.31482	0.024847 ± 0.040418
S16	HNR	8.441-33.047	21.885974 ± 4.425764
S17	RPDE	0.256570-0.685151	0.498536 ± 0.103942
S18	D2	1.423287-3.671155	2.381826 ± 0.382799
S19	DFA	0.574282-0.825288	0.718099 ± 0.055336
S20	spread1	-7.964984-2.434031	5.684397 ± 1.090208
S21	spread2	0.006274-0.450493	0.226510 ± 0.083406
S22	PPE	0.044539-0.527367	0.206552 ± 0.090119

Fig.4

The visual Features representation of PD dataset.

Fig.5

Heat map of the correlations among the features of the dataset.

3.2.2 Feature selection by relief, ACO, relief-ACO, filter based LLBFS, PCA and embedded based LASSO algorithms

In this section, the experimental results of the feature selection algorithms Relief, ACO, and Relief-ACO are reported and discussed in detail. The features selected by the Relief algorithm are S1, S2, S3, S5, S6, S7, S9, S11, S12, S13, S14, S15, S16, S17, S18, S19, S21, and S22. The features selected by the ACO algorithm are S1, S2, S3, S5, S7, S10, S11, S13, S14, S15, S16, S17, S18, S19, S21, and S22. Similarly, the features selected by the Relief-ACO algorithm are S1, S2, S3, S16, S19, S17, S10, S11, S7, S18, S21, S22, and S14. None of these three algorithms selected feature S20 (spread1), which indicates that S20 has little impact on the recognition of PD. The feature sets selected by Relief, ACO, and Relief-ACO are listed in Table 4 and shown graphically in Figs. 6, 7, and 8, respectively. The features selected by the filter-based LLBFS [65], PCA [72], and embedded LASSO [69] algorithms are also listed in Table 4. The parameters used by ACO are given in Table 5.

Table 4
Feature Selection by Relief, ACO, Relief-ACO, Filter based LLBFS, PCA and Embedded Based LASSO Algorithms

Relief	ACO	Relief-ACO	LLBFS	PCA	LASSO
S1	S2	S2	S1	S1	S1
S2	S1	S1	S2	S2	S2
S6	S3	S3	S3	S3	S3
S13	S11	S16	S5	S5	S4
S3	S5	S19	S6	S6	S5
S9	S13	S17	S8	S9	S6
S712	S15	S10	S12	S13	S16
S22	S16	S11	S18	S16	S10
S21	S6	S7	S22	S20	S19
S19	S7	S18	S4	S18	S22
S14	S14	S21	S15	S17	S18
S17	S18	S22	S7	S8	S9
S18	S19	S14	S14	S15	S11
S16	S17	S4	S19	S11	S12
S5	S10		S16	S21	S20
S15	S21		S17	S22	S13
	S22		S21	S4	S21
			S13	S7
			S9	S14
			S10

Table 5

List of parameters of the ACO Algorithm

Parameter name and values	Parameter description
Number of iterations = 100	Number of times the graph is traversed in search of the optimal feature subset.
Size of population = 100	Number of ants created to traverse the graph.
Initial pheromone = 1	Initial pheromone intensity assigned to each node.
Crossover probability = 1	Probability of applying the crossover operator.
Mutation probability = 1	Probability of applying the mutation operator.
Number of clusters = 300	Number of clusters created in the whole graph.
φ = 0.1	Parameter that controls the effect of the feature subset length (Equation (14)).
K = 4	Number of ants in each cluster.
α = 0.2, β = 0.1, γ = 0.6	Control parameters of Equation (15), which control the feature weights at each node.
ρ = -0.1	Decay constant that simulates the evaporation of the pheromone.

Fig.6

FS by Relief algorithm.

Fig.7

FS by ACO algorithm.

Fig.8

FS by Relief-ACO algorithm.

3.2.3 Classification performance without feature selection

A support vector machine (SVM) was applied in these experiments for Parkinson’s disease recognition. K-fold cross-validation with K = 10 was used for hyperparameter tuning and best-model selection. With the RBF kernel and hyperparameter values C = 2 and γ = 0.019, SVM achieved, on the full feature set under 10-fold cross-validation, a classification accuracy of 95%, a specificity of 94%, and a sensitivity of 100%. SVM (linear) on the full feature set with the hyperparameter pair (C = 2, γ = 0.019) performed better than the RBF kernel, obtaining average 10-fold accuracy, specificity, and sensitivity of 97%, 96%, and 100%, respectively. These results are given in Table 6, and the classification results of SVM (RBF) and SVM (linear) are shown in Fig. 9. The computation time on the full feature set is also reported in Table 6.
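The two kernels compared here differ in how they measure similarity between samples, and γ enters only the RBF kernel, K(x, z) = exp(−γ‖x − z‖²). The linear kernel is the plain dot product and depends on C alone through the SVM objective. The sketch below shows the two kernel functions; the function names are illustrative.

```python
import math

def linear_kernel(x, z):
    """K(x, z) = <x, z>, the plain dot product."""
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)
```

A small γ such as 0.019 makes the RBF similarity decay slowly with distance, giving a smooth decision boundary. A large γ makes each support vector influence only its immediate neighbourhood and raises the risk of overfitting.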

Fig.9

Classification performance of the SVM without feature selection.

Table 6

Classifier performance without feature selection

Classifier	Parameter	Performance evaluation metrics
	(C, γ)	Acc (%)	Spec (%)	Sen (%)	MCC(%)	Exe Time(s)
SVM(RBF)	1, 0.015	93	91	90	94	0.53
	2, 0.019	96	94	100	95	0.64
	1, 0.015	85	88	79	95	0.64
SVM(Linear)	1, 0.015	95	94	100	94	0.56
	10, 0.019	97	96	100	94	0.27
	1, 0.009	89	95	81	95	0.27

3.2.4 Classification performance of SVM on FS selected by relief, ACO, and relief-ACO, filter based LLBFS, PCA and embedded based LASSO algorithms

To recognize PD from reduced feature subsets, several experiments were conducted. In these experiments, the feature sets selected by the Relief, ACO, and Relief-ACO algorithms were used to train and test the SVM classifier. SVM with different kernels was used for classification, with optimized hyperparameter values of C and γ. K-fold cross-validation was applied for hyperparameter tuning and best-model selection: each of the K subsets acted as an independent holdout test set for the model trained on the remaining K − 1 subsets. Because all test sets are independent, averaging over the folds improves the reliability of the results. The results for the features selected by Relief, ACO, and Relief-ACO with the SVM classifier are reported in Table 7; they show that the classifier performs better on the reduced feature sets than on the full feature set. According to Table 7, Relief-SVM with the RBF kernel achieved 97.10% accuracy, 95% specificity, 98% sensitivity, 97% MCC, and 97% F1-score, with an execution time of 0.072 seconds, while Relief-SVM (linear) achieved 97.10% accuracy, 96% specificity, 99% sensitivity, 98% MCC, and 97% F1-score, with an execution time of 0.162 seconds. SVM also performed well on the features selected by ACO: ACO-SVM (RBF) achieved 98.20% accuracy, 99% specificity, 97% sensitivity, 97% MCC, and 98% F1-score, with an execution time of 0.061 seconds. SVM (linear) on the ACO-selected features reached an accuracy of 98.90%. Finally, Relief-ACO-SVM achieved 99% accuracy, 100% specificity, and 100% sensitivity with hyperparameter values C = 10 and γ = 0.005 using the RBF kernel.

Table 7
SVM Classifier performance on selected features sets Selected by Relief, ACO, and Relief-ACO, Filter based LLBFS, PCA and Embedded Based LASSO Algorithms

FS algorithm	Classifier	Parameter	Performance evaluation metrics
		(C, γ)	Acc (%)	Spe (%)	Sen (%)	MCC (%)	F1-score (%)	Exe Time (s)
Relief	SVM(RBF)	10, 0.0009	97.10	95	98	97	97	0.07
ACO		10, 0.007	97.70	96	99	98	97	0.06
Relief-ACO		10, 0.009	98.20	99	97	97	98	0.06
LLBFS		1, 0.009	97.51	88	90	96	97	0.09
PCA		1, 0.009	96.40	89	96	86	96	0.12
LASSO		1, 0.009	94.10	99	89	90	95	0.07
Relief	SVM(Linear)	10, 0.009	98.90	98	96	98	98	0.05
ACO		10, 0.005	99.00	100	100	95	99	0.02
Relief-ACO		10, 0.005	99.50	100	100	97	99	0.01
LLBFS		1, 0.009	98.01	83	98	95	98	0.08
PCA		1, 0.009	97.40	85	93	96	95	0.12
LASSO		1, 0.009	96.10	97	89	93	96	0.09

Similarly, with the same hyperparameter values and a linear kernel, Relief-ACO-SVM achieved 99.50% accuracy, 100% specificity, and 100% sensitivity. The accuracy of SVM (linear) on the features selected by Relief-ACO is higher than the accuracy of SVM on the features selected by the other algorithms. The 99.50% accuracy of Relief-ACO-SVM indicates that the hybrid feature selection algorithm chooses more suitable features for PD recognition. A specificity of 100% shows that the system correctly identified the healthy subjects, and a sensitivity of 100% shows that it correctly recognized the PD patients. The classification accuracies of Relief-SVM, ACO-SVM, and Relief-ACO-SVM are shown in Fig. 10, which demonstrates that Relief-ACO-SVM is more accurate than Relief-SVM and ACO-SVM. The execution time of Relief-ACO-SVM is also lower than those of Relief-SVM and ACO-SVM; the execution times of SVM with the three feature selection algorithms are shown in Fig. 11. Fig. 12 compares the execution time of SVM on the full feature set and on the feature set selected by Relief-ACO. The classification performance of SVM on the feature sets selected by the filter-based LLBFS, PCA, and embedded LASSO algorithms is also reported in Table 7. From the analysis of all experimental results, we suggest that the Relief-ACO-SVM method is suitable for effective PD recognition, and that the proposed hybrid feature selection algorithm (Relief-ACO) selects relevant features and handles the feature selection problem effectively.

Fig.10

Accuracy of the SVM on selected features.

Fig.11

Execution time of the SVM on selected features sets.

Fig.12

Execution time of the SVM on the full features set and on features set selected by Relief-ACO.

3.2.5 Classification performance of other ML classifiers on FS sets selected by relief, ACO, and relief-ACO, filter based LLBFS, PCA and embedded based LASSO algorithms

The classification performance of other ML classifiers, namely Logistic Regression (LR), K-Nearest Neighbor (K-NN), Decision Tree (DT), Naive Bayes (NB), and Random Forest (RF), was also checked on the feature sets selected by Relief, ACO, Relief-ACO, filter-based LLBFS, PCA, and embedded LASSO, with different hyperparameter values for these classifiers. Their performance is reported in Table 8. According to Tables 7 and 8, SVM performs better than the LR, K-NN, DT, NB, and RF classifiers and is therefore the more suitable classifier for distinguishing PD patients from healthy people. Relief-ACO-SVM also outperforms Relief-SVM, ACO-SVM, and SVM combined with the LLBFS, PCA, and LASSO feature selection algorithms. Furthermore, the Relief-ACO algorithm is more suitable for selecting adequate features from the PD dataset.

Table 8
Classifier LR, K-NN, DT, NB and RF performance on selected features sets Selected by Relief, ACO, and Relief-ACO, Filter based LLBFS, PCA and Embedded Based LASSO Algorithms

FS algorithm	Classifier	Parameter	Performance evaluation metrics
		(C, K)	Acc (%)	Spe (%)	Sen (%)	MCC (%)	F1-score (%)	Exe Time (s)
Relief	LR	10, -	90.01	95	95	93	95	0.10
	K-NN	-, 7	89.44	92	94	95	94	0.12
	DT	-, -	90.33	93	96	94	93	0.21
	NB	-, -	88.50	99	99	97	97	0.23
	RF	-, -	91.30	95	95	91	95	0.22
ACO	LR	10, -	93.56	95	96	96	95	0.09
	K-NN	-, 5	87.21	96	97	97	98	0.10
	DT	-, -	96.60	94	97	96	99	0.09
	NB	-, -	94.10	99	88	95	97	0.06
	RF	-, -	93.50	95	98	97	97	0.05
Relief-ACO	LR	1, -	96.50	95	93	96	97	0.08
	K-NN	-, 5	85.30	95	98	97	94	0.05
	DT	-, -	97.10	98	98	97	97	0.09
	NB	-, -	95.89	95	98	97	97	0.07
	RF	-, -	96.10	96	99	77	97	0.08
LLBFS	LR	10, -	92.11	99	93	94	92	0.02
	K-NN	-, 5	87.50	90	91	92	88	0.05
	DT	-, -	94.33	93	86	91	91	0.14
	NB	-, -	87.90	96	94	93	87	0.16
	RF	-, -	90.50	95	91	95	90	0.19
PCA	LR	10, -	94.09	95	90	92	93	0.03
	K-NN	-, 9	87.30	88	71	96	88	0.10
	DT	-, -	90.33	91	81	90	90	0.12
	NB	-, -	86.60	92	90	91	87	0.22
	RF	-, -	96.30	91	92	99	97	0.19
LASSO	LR	10, -	94.01	100	90	94	94	0.07
	K-NN	-, 7	86.58	98	92	99	87	0.06
	DT	-, -	95.33	92	82	92	95	0.10
	NB	-, -	86.94	96	94	93	87	0.16
	RF	-, -	95.10	91	921	96	95	0.18

3.2.6 Recognition of PD using a back-propagation neural network (BPNN)

To compare the performance of the machine learning models with a neural network model, we used a back-propagation neural network (BPNN) for the classification problem. Different numbers of hidden layers, hidden neurons, learning rates, and epochs were tried to obtain the best classification results. The BPNN architectures BPNN1, BPNN2, and BPNN3 are given in Table 9. These networks were trained and validated on the full feature set and on the feature set selected by Relief-ACO. According to Table 9, BPNN2 performs best, with 96.00% classification accuracy, which is lower than that of the traditional machine learning classifiers: SVM achieved 99.50% accuracy on the selected feature set (Table 7). Deep neural networks do not require a separate feature selection step, because they learn the important features automatically, which is a major advantage. In our experiments, however, they performed worse than the machine learning models, because deep neural networks need many training instances to be trained effectively, and the 195 instances in our dataset are insufficient. Therefore, ML models are more suitable for small datasets.
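The idea behind back-propagation can be shown with a deliberately small network. The sketch below is an illustration under assumptions, not one of the BPNN1–BPNN3 configurations: Table 9 does not list layer sizes, so it uses one hidden ReLU layer (ReLU matches the activation in Table 9) and a sigmoid output trained with per-sample gradient descent on binary cross-entropy. `train_bpnn`, its defaults, and the toy data in the usage note are hypothetical.

```python
import math
import random

def train_bpnn(X, y, hidden=8, lr=0.1, epochs=500, seed=0):
    """Train a 1-hidden-layer ReLU network with a sigmoid output by back-propagation.

    Uses per-sample gradient descent on binary cross-entropy and returns a
    function mapping a feature vector to P(class = 1)."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = [0.0]  # stored in a list so the nested forward() sees updates

    def forward(x):
        z1 = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(W1, b1)]
        h = [max(0.0, z) for z in z1]  # ReLU activation
        logit = sum(w * v for w, v in zip(W2, h)) + b2[0]
        return z1, h, 1.0 / (1.0 + math.exp(-logit))

    for _ in range(epochs):
        for x, t in zip(X, y):
            z1, h, p = forward(x)
            d_out = p - t  # gradient of cross-entropy w.r.t. the output logit
            for j in range(hidden):
                # Back-propagate through W2[j] (before updating it) and the ReLU.
                d_hidden = d_out * W2[j] * (1.0 if z1[j] > 0 else 0.0)
                W2[j] -= lr * d_out * h[j]
                for i in range(n_in):
                    W1[j][i] -= lr * d_hidden * x[i]
                b1[j] -= lr * d_hidden
            b2[0] -= lr * d_out

    return lambda x: forward(x)[2]
```

For instance, `train_bpnn([[-2.0], [-1.0], [1.0], [2.0]], [0, 0, 1, 1])` learns a one-dimensional threshold. With 22 input features and only about 160 training instances, even this small network has far more weights than a linear SVM, which is consistent with the observation above that neural models are at a disadvantage on this dataset.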

Table 9
Training parameters for BPNNs

Network	BPNN1	BPNN2	BPNN3
Training instances	160	160	160
Validating instances	35	35	35
Learning rate	0.0110	0.0001	0.0101
Activation function	relu	relu	relu
Epochs	200	600	700
Training Time(s)	120	150	200
Accuracy (%)	94.00	96.00	91.00

3.2.7 Statistical analysis for predictive models comparison

We applied McNemar’s test to compare the performance of the predictive models statistically. The null hypothesis is H₀: n₀₁ = n₁₀, i.e., the two models have the same accuracy, and the alternative hypothesis is H₁: n₀₁ ≠ n₁₀, i.e., the two models differ in accuracy, where n₀₁ and n₁₀ are the numbers of test samples misclassified by only one of the two models. To decide between the hypotheses, we computed the test statistic and p-value for each model using McNemar’s test. The significance level is α = 0.05 for all experiments, corresponding to a 95% confidence level. The null hypothesis is accepted or rejected as follows: if p > α, H₀ cannot be rejected and the models show no significant difference; if p ≤ α, H₀ is rejected in favour of H₁, and the models perform differently when trained on the particular training set R. The p-values computed for each model are reported in Table 10. The p-value of SVM is 0.04, which is below α, so H₀ is rejected and SVM differs significantly in accuracy. The p-values of LR, K-NN, DT, NB, RF, and BPNN are 0.19, 0.30, 0.29, 0.35, 0.33, and 0.31, respectively, all above α, so H₀ cannot be rejected for these models. Hence, among the compared models, only SVM shows a statistically significant difference in accuracy.
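McNemar’s test uses only the discordant counts n₀₁ and n₁₀. A common form is the chi-square statistic with Edwards’ continuity correction, χ² = (|n₀₁ − n₁₀| − 1)² / (n₀₁ + n₁₀), with one degree of freedom. The sketch below assumes this corrected form; the exact variant used for Table 10 is not stated. It uses the fact that the chi-square survival function with one degree of freedom equals erfc(√(χ²/2)). The function name `mcnemar` is illustrative.

```python
import math

def mcnemar(n01, n10, continuity=True):
    """McNemar's chi-square test on the discordant counts of two classifiers.

    n01: samples misclassified only by model A; n10: only by model B.
    Returns (statistic, p_value) for 1 degree of freedom."""
    n = n01 + n10
    if n == 0:
        return 0.0, 1.0  # the models never disagree: no evidence of a difference
    diff = abs(n01 - n10)
    if continuity:
        diff = max(diff - 1, 0)  # Edwards' continuity correction
    stat = diff ** 2 / n
    # Chi-square survival function with 1 dof: P(X > s) = erfc(sqrt(s / 2))
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p
```

For example, 10 versus 2 discordant samples give χ² = 49/12 and a p-value just below 0.05, so H₀ would be rejected at α = 0.05. Equal counts give p = 1.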

Table 10
p-values of the predictive models

Predictive model	Ac (%)	p-value
SVM	99.50	0.04
LR	96.50	0.19
k-NN	85.00	0.30
DT	97.10	0.29
NB	95.89	0.35
RF	96.10	0.33
BPNN	96.00	0.31

On the basis of these statistical results, we conclude that SVM significantly outperforms its counterparts in terms of accuracy, and thus the proposed Relief-ACO-SVM method is suitable for PD recognition. Selecting more appropriate features with the Relief-ACO FS algorithm and classifying them with SVM helped the model diagnose PD effectively. The proposed FS algorithm selected MDVP.Fhi(Hz), MDVP.Fo(Hz), MDVP.Flo(Hz), HNR, DFA, RPDE, MDVP.Shimmer(dB), Shimmer.APQ3, MDVP.PPQ, D2, spread2, PPE, and Shimmer.DDA. The feature spread1 was not selected, as it has less impact on the prediction of PD. In addition to jitter and HNR, nonlinear features are employed to discriminate PD. A highly relevant nonlinear feature is RPDE, which quantifies the uncertainty in the estimation of the pitch period and was identified by the proposed method as relevant for PD detection. Nonlinear features are not only effective for PD detection but can also help identify other voice-related diseases. In short, the proposed method can be useful for detecting PD, especially in its early stages.

3.2.8 Performance comparison with baseline methods

In this section, we compare the performance of the proposed method (R-ACO-SVM) with other baseline methods. Table 11 lists the accuracy of various counterparts alongside that of our model. These studies used various techniques to classify PD patients and healthy people. The proposed classification system R-ACO-SVM achieved the highest accuracy, 99.50%. Additionally, the specificity of R-ACO-SVM is 100%, which shows that the system effectively identified the healthy people, and its sensitivity is 100%, which is significant for the diagnosis of PD subjects. The execution time of the proposed method is 0.01 seconds. The performance of the proposed method can be credited to the use of relevant features.

Table 11
Performance comparison of our method with other counterparts

Reference	Method	Accuracy (%)	Ex-Time(s)
[49]	TSCA	86.20	670
[6]	RF-BFO-SVM	97.42	-
[51]	mRMR-ANN	98.12	-
[26]	L1-Norm-SVM and CPD	99	0.22
our method	R-ACO-SVM	99.50	0.01

4 Conclusion and future work

Parkinson’s disease is a serious human disease from which many people around the world suffer, so a reliable technique is required for its accurate recognition. In this article, we proposed a reliable Parkinson’s disease recognition method based on machine learning. Specifically, SVM was applied to classify Parkinson’s disease patients and healthy subjects, an integrated Relief and ACO method was adopted to select relevant features, and K-fold cross-validation was used to select optimal hyperparameter values for the best model. The performance of the proposed model was assessed with several evaluation metrics. The experimental results demonstrate that R-ACO-SVM correctly classifies Parkinson’s disease patients and healthy subjects. The high performance of our method is due to the adequate features selected by the hybrid feature selection algorithm (Relief-ACO); the feature set selected by Relief-ACO proved to contain critically significant features that detect Parkinson’s disease more accurately than those selected by other feature selection algorithms. The proposed R-ACO-SVM method achieved 99.50% accuracy and can easily be used in healthcare organizations. In future work, deep neural network techniques will be used to classify Parkinson’s disease patients and healthy people, because deep neural networks select appropriate features automatically, whereas machine learning algorithms require a separate feature selection algorithm. The proposed method will also be applied to other datasets for the detection of similar diseases. Because treatment and recovery after diagnosis are essential, we will also work on disease treatment and recovery in the future.

Data Availability

The dataset used in this research work is available from the UCI Machine Learning Repository.

Conflicts of interest

The authors declare that they have no conflicts of interest.

Footnotes

Acknowledgment

This work was supported by the National Natural Science Foundation of China (Grant No. 61370073), the National High Technology Research and Development Program of China (Grant No. 2007AA01Z423), the project of Science and Technology Department of Sichuan Province.

References

1. Aghdam M.H., Ghasem-Aghaee, Basiri M.E., Text feature selection using ant colony optimization, Expert Systems with Applications 36 (2009), 6843–6853.

2. Ball J.R., Balogh, Improving diagnosis in health care: highlights of a report from the national academies of sciences, engineering, and medicine, Annals of Internal Medicine 164 (2016), 59–61.

3. Blum, Dorigo, The hyper-cube framework for ant colony optimization, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 34 (2004), 1161–1172.

4. Bodrova T.A., Kostyushev D.S., Antonova E.N., Slavin, Gnatenko D.A., Bocharova M.O., Legg, Pozzilli, Paltsev M.A., Suchkov S.V., Introduction into pppm as a new paradigm of public health service: an integrative view, EPMA Journal 3 (2012), 16.

5. Bonabeau, Dorigo, Theraulaz, Swarm intelligence: from natural to artificial systems, Oxford University Press, 1999.

6. Cai, Gu, Chen H.L., A new hybrid intelligent framework for predicting parkinson’s disease, IEEE Access 5 (2017), 17188–17200.

7. Cantürk, Karabiber, A machine learning system for the diagnosis of parkinson’s disease from speech signals and its application to multiple speech signal types, Arabian Journal for Science and Engineering 41 (2016), 5049–5059.

8. Chang C.C., Lin C.J., Libsvm: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology (TIST) 2 (2011a), 27.

9. Chang C.C., Lin C.J., Libsvm: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology (TIST) 2 (2011b), 27.

10. Chen H.L., Wang, Ma, Cai Z.N., Liu W.B., Wang S.J., An efficient hybrid kernel extreme learning machine approach for early diagnosis of parkinson’s disease, Neurocomputing 184 (2016), 131–144.

11. Chen H.L., Yang, Liu D.Y., A support vector machine classifier with rough set-based feature selection for breast cancer diagnosis, Expert Systems with Applications 38 (2011), 9014–9022.

12. National Collaborating Centre for Chronic Conditions (Great Britain), Parkinson’s disease: national clinical guideline for diagnosis and management in primary and secondary care, Royal College of Physicians, 2006.

13. Cristianini, Shawe-Taylor, et al., An introduction to support vector machines and other kernel-based learning methods, Cambridge University Press, 2000.

14. da Cruz L.B., Souza J.C., de Sousa J.A., Santos A.M., de Paiva A.C., de Almeida J.D.S., Silva A.C., Junior G.B., Gattass, Interferometer eye image classification for dry eye categorization using phylogenetic diversity indexes for texture analysis, Computer Methods and Programs in Biomedicine (2019), 105269.

15. Das, A comparison of multiple classification methods for diagnosis of parkinson disease, Expert Systems with Applications 37 (2010), 1568–1572.

16. Dorigo, Optimization, learning and natural algorithms, PhD Thesis, Politecnico di Milano, 1992.

17. Dorigo, Maniezzo, Colorni, et al., Ant system: optimization by a colony of cooperating agents, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 26 (1996), 29–41.

18. Duffy, Motor speech disorders: substrates, differential diagnosis, and management, 2nd edition, St. Louis, MO: Mosby, 2005.

19. Emary, Zawbaa H.M., Hassanien A.E., Binary ant lion approaches for feature selection, Neurocomputing 213 (2016), 54–65.

20. Gambardella L.M., Dorigo, Ant-q: A reinforcement learning approach to the traveling salesman problem, in: Machine Learning Proceedings 1995, Elsevier, pp. 252–260.

21. Gambardella L.M., Dorigo, Solving symmetric and asymmetric tsps by ant colonies, in: Proceedings of IEEE International Conference on Evolutionary Computation, IEEE (1996), pp. 622–627.

22. Gao H.H., Yang H.H., Wang X.Y., Ant colony optimization based network intrusion feature selection and detection, in: 2005 International Conference on Machine Learning and Cybernetics, IEEE (2005), 3871–3875.

23. Gök, An ensemble of k-nearest neighbours algorithm for detection of parkinson’s disease, International Journal of Systems Science 46 (2015), 1108–1112.

24. Haq A.U., Li, Memon M.H., Khan, Din S.U., Ahad, Sun, Lai, Comparative analysis of the classification performance of machine learning classifiers and deep neural network classifier for prediction of parkinson disease, in: 2018 15th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), IEEE (2018a), pp. 101–106.

25. Haq A.U., Li J.P., Memon M.H., Khan, Ud Din, A novel integrated diagnosis method for breast cancer detection, Journal of Intelligent Fuzzy Systems (2019).

26. Haq A.U., Li J.P., Memon M.H., Malik, Ahmad, Ali, Nazir, Ahad, Shahid, et al., Feature selection based on l1-norm support vector machine and effective recognition system for parkinson’s disease using voice recordings, IEEE Access 7 (2019b), 37718–37734.

27. Haq A.U., Li J.P., Memon M.H., Nazir, Sun, A hybrid intelligent system framework for the prediction of heart disease using machine learning algorithms, Mobile Information Systems (2018b).

28. Hariharan, Polat, Sindhu, A new hybrid intelligent system for accurate detection of parkinson’s disease, Computer Methods and Programs in Biomedicine 113 (2014), 904–913.

29. Hoang N.D., Bui D.T., Liao K.W., Groutability estimation of grouting processes with cement grouts using differential flower pollination optimized support vector machine, Applied Soft Computing 45 (2016), 173–186.

30. Howell, When technology is too hot, too cold or just right, The Emerging Learning Design Journal 5 (2017), 2.

31. Hsu C.W., Lin C.J., A comparison of methods for multiclass support vector machines, IEEE Transactions on Neural Networks 13 (2002), 415–425.

32. Huang C.L., Aco-based hybrid classification system with feature subset selection and model parameters optimization, Neurocomputing 73 (2009), 438–448.

33. Huang C.L., Wang C.J., A ga-based feature selection and parameters optimization for support vector machines, Expert Systems with Applications 31 (2006), 231–240.

34. Huang G.R., Cao X.B., Wang X.F., An ant colony optimization algorithm based on pheromone diffusion, Acta Electronica Sinica 32 (2004), 865–868.

35. Jensen, Combining rough and fuzzy sets for feature selection, 2005.

36. Kanan H.R., Faez, Hosseinzadeh, Face recognition system using ant colony optimization-based selected features, in: 2007 IEEE Symposium on Computational Intelligence in Security and Defense Applications, IEEE (2007), pp. 57–62.

37. Kashef, Nezamabadi-pour, An advanced aco algorithm for feature subset selection, Neurocomputing 147 (2015), 271–279.

38. Kotsiantis, Kanellopoulos, Pintelas, Data preprocessing for supervised leaning, International Journal of Computer Science 1 (2006), 111–117.

39. Leguizamon, Michalewicz, A new version of ant system for subset problems, in: Proceedings of the 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406), IEEE (1999), pp. 1459–1464.

40. Li D.C., Liu C.W., Hu S.C., A fuzzy-based data transformation for feature extraction to increase classification performance with small medical data sets, Artificial Intelligence in Medicine 52 (2011), 45–52.

41. Lin S.W., Ying K.C., Chen S.C., Lee Z.J., Particle swarm optimization for parameter determination and feature selection of support vector machines, Expert Systems with Applications 35 (2008), 1817–1824.

42. Little, McSharry, Hunter, Spielman, Ramig, Suitability of dysphonia measurements for telemonitoring of parkinson’s disease, Nature Precedings (2008), 1.

43. Little, Parkinson’s disease dataset, UCI Machine Learning Repository (2019).

44. Memon M.H., Li J.P., Haq A.U., Memon M.H., Zhou, Breast cancer detection in the iot health environment using modified recursive feature selection, Wireless Communications and Mobile Computing (2019).

45. Mladenić, Feature selection for dimensionality reduction, in: International Statistical and Optimization Perspectives Workshop "Subspace, Latent Structure and Feature Selection", Springer (2005), pp. 84–102.

46. Molina L.C., Belanche, Nebot À., Feature selection algorithms: A survey and experimental evaluation, in: 2002 IEEE International Conference on Data Mining, Proceedings, IEEE (2002), pp. 306–313.

47. Montemanni, Gambardella L.M., Rizzoli A.E., Donati A.V., A new algorithm for a dynamic vehicle routing problem based on ant colony system, in: Second International Workshop on Freight Transportation and Logistics (2003), pp. 27–30.

48. Mourao-Miranda, Bokde A.L., Born, Hampel, Stetter, Classifying brain states and determining the discriminating activation patterns: support vector machine on functional mri data, NeuroImage 28 (2005), 980–995.

49. Naranjo, Pérez C.J., Martín, Campos-Roca, A two-stage variable selection and classification approach for parkinson’s disease detection by using voice recording replications, Computer Methods and Programs in Biomedicine 142 (2017), 147–156.

50. Pawlak, Imprecise categories, approximations and rough sets, in: Rough sets, Springer (1991), pp. 9–32.

51. Peker, Sen, Delen, Computer-aided diagnosis of parkinson’s disease using complex-valued neural networks and mrmr feature selection algorithm, Journal of Healthcare Engineering 6 (2015), 281–302.

52. Peng, Long, Ding, Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy, IEEE Transactions on Pattern Analysis & Machine Intelligence (2005), 1226–1238.

53. Pernkopf, Bayesian network classifiers versus selective k-nn classifier, Pattern Recognition 38 (2005), 1–10.

54. Qureshi A.U.H., Larijani, Mtetwa, Javed, Ahmad, et al., Rnn-abc: A new swarm optimization based technique for anomaly detection, Computers 8 (2019), 59.

55. Rashno, Ahadi S.M., Kelarestaghi, Text-independent speaker verification with ant colony optimization feature selection and support vector machine, in: 2015 2nd International Conference on Pattern Recognition and Image Analysis (IPRIA), IEEE (2015), pp. 1–5.

56. Postuma R.B., Berg D., Adler C.H., et al., The new definition and diagnostic criteria of parkinson’s disease, The Lancet Neurology.

57. Sakar B.E., Isenkul M.E., Sakar C.O., Sertbas, Gurgen, Delil, Apaydin, Kursun, Collection and analysis of a parkinson speech dataset with multiple types of sound recordings, IEEE Journal of Biomedical and Health Informatics 17 (2013), 828–834.

58. Sakar C.O., Kursun, Telediagnosis of parkinson’s disease using measurements of dysphonia, Journal of Medical Systems 34 (2010), 591–599.

59. Sánchez A V.D., Advanced support vector machines and kernel methods, Neurocomputing 55 (2003), 5–20.

60. Shen, Chen, Yu, Kang, Zhang, Li, Yang, Liu, Evolving support vector machines using fruit fly optimization for medical data classification, Knowledge-Based Systems 96 (2016), 61–75.

61. Silva A.M.D., Feature selection, Springer 13 (2015), 1–13.

62. Singh, Pillay, Choonara Y.E., Advances in the treatment of parkinson’s disease, Progress in Neurobiology 81 (2007), 29–44.

63. Spadoto A.A., Guido R.C., Carnevali F.L., Pagnin A.F., Falcão A.X., Papa J.P., Improving parkinson’s disease identification through evolutionary-based feature selection, in: 2011 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE (2011), pp. 7857–7860.

64. Sun, Todorovic, Goodison, Local-learning-based feature selection for high-dimensional data analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (2009a), 1610–1626.

65. Sun, Todorovic, Goodison, Local-learning-based feature selection for high-dimensional data analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (2009b), 1610–1626.

66. Tahir, Ahmad, Shah S.A., Morison, Skelton D.A., Larijani, Abbasi Q.H., Imran M.A., Gibson R.M., Wifreeze: Multiresolution scalograms for freezing of gait detection in parkinson’s leveraging 5g spectrum with deep learning, Electronics 8 (2019), 1433.

67. Tharwat, Gabel, Hassanien A.E., Classification of toxicity effects of biotransformed hepatic drugs using optimized support vector machine, in: International Conference on Advanced Intelligent Systems and Informatics, Springer (2017), pp. 161–170.

68. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society: Series B (Methodological) 58 (1996a), 267–288.

69. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society: Series B (Methodological) 58 (1996b), 267–288.

70. Tsanas, Little M.A., McSharry P.E., Ramig L.O., Nonlinear speech analysis algorithms mapped to a standard metric achieve clinically useful quantification of average parkinson’s disease symptom severity, Journal of the Royal Society Interface 8 (2010), 842–855.

71. Urbanowicz R.J., Meeker, La Cava, Olson R.S., Moore J.H., Relief-based feature selection: introduction and review, Journal of Biomedical Informatics 85 (2018), 189–203.

72. Vidal, Ma, Sastry S.S., Principal component analysis, in: Generalized principal component analysis, Springer (2016), pp. 25–62.

73. Wagacha P.W., Induction of decision trees, Foundations of Learning and Adaptive Systems (2003), 12.

74. Wu, Kumar, Quinlan J.R., Ghosh, Yang, Motoda, McLachlan G.J., Ng, Liu, Philip S.Y., et al., Top 10 algorithms in data mining, Knowledge and Information Systems 14 (2008), 1–37.



Table 9
Training parameters for the BPNs

Network | BPN1 | BPN2 | BPN3

Training instances | 160 | 160 | 160

Validating instances | 35 | 35 | 35

Learning rate | 0.0110 | 0.0001 | 0.0101

Activation function | ReLU | ReLU | ReLU

Epochs | 200 | 600 | 700

Training time (s) | 120 | 150 | 200

Accuracy (%) | 94.00 | 96.00 | 91.00

Table 10
P-values of the predictive models

Predictive model | Accuracy (%) | p-value

SVM | 99.50 | 0.04

LR | 96.50 | 0.19

k-NN | 85.00 | 0.30

DT | 97.10 | 0.29

NB | 95.89 | 0.35

RF | 96.10 | 0.33

BPNN | 96.00 | 0.31

Table 11
Performance comparison of our method with other counterparts

Reference | Method | Accuracy (%) | Execution time (s)

[49] | TSCA | 86.20 | 670

[6] | RF-BFO-SVM | 97.42 | -

[51] | mRMR-ANN | 98.12 | -

[26] | L1-Norm-SVM and CPD | 99 | 0.22

Our method | R-ACO-SVM | 99.50 | 0.01