Feature selection by a distance measure method of subnormal and non-convex fuzzy sets

Abstract

Distance measures of fuzzy sets have been developed for feature selection and for finding redundant features in decision-making, prediction, and classification problems. Fuzzy sets are commonly assumed to be normal and convex. This paper extends the general fuzzy set definitions to subnormal and non-convex fuzzy sets, which represent uncertain knowledge more precisely by weighting fuzzy membership functions. A distance measure method for subnormal and non-convex fuzzy sets is proposed for embedded feature selection. Constructing fuzzy membership functions and extracting fuzzy rules play a critical role in fuzzy classification systems. The weighted fuzzy membership functions prevent the combinatorial explosion of fuzzy rules in multiple fuzzy rule-based systems. The proposed method was validated by a comparison with two other methods. It achieved higher training and test accuracies, 97.95% and 93.98%, respectively, than the other two methods.

Keywords

Embedded feature selection, subnormal fuzzy sets, non-convex fuzzy sets, distance measures, bounded sum, fuzzy neural networks

1 Introduction

Decision-making, prediction, and classification problems dealing with uncertainty in nature are complex and hard to define at the human cognitive level. Fuzzy sets, introduced by Zadeh [1] to characterize the vagueness of uncertain knowledge, enable humans to interface with computers. A fuzzy set uses linguistic variables (LVs) and membership functions to define input–output attributes for inference of logical reasoning. The construction methods of fuzzy membership functions determine the fuzzy logic system’s performance by tuning parameters or linguistic hedges of the fuzzy membership functions. A fuzzy logic system can initially be designed by experts directly constructing fuzzy membership functions of LVs. However, methods to optimize fuzzy membership functions for the automatic tuning of fuzzy models are required. Neuro-fuzzy systems have been suggested for developing fuzzy logic systems as an alternative to neural network systems because they are powerful in automatic learning with high-performance gains without loss of readability.

The adaptive network-based fuzzy inference system [2] and Kosko’s adaptive fuzzy associative memory [19] are neuro-fuzzy approaches to approximate the automatic formation of the linguistic rule base. Zadeh [3] introduced powering modifiers called linguistic hedges that emphasize the importance of fuzzy sets. The linguistic hedges were applied to an adaptive neuro-fuzzy classifier that simplifies the characteristics of overlapping classes with feature selection using the scaled conjugate gradient algorithm [4, 5].

Feature selection [6, 7] is another major area: when redundant or irrelevant features exist, reducing the dimensionality of the feature vector lowers the computational complexity and increases the discriminating ability of fuzzy logic systems. To represent knowledge in fuzzy systems, the selected features must preserve their original meaning. Feature selection methods are categorized into three types: filter, wrapper, and embedded methods [6]. Filter methods select features in a preprocessing step without considering the machine learning classifier, and wrapper methods repeatedly evaluate candidate feature subsets with the classifier; in contrast, embedded methods perform feature selection while the classifier itself is being trained. The embedded methods are less computationally intensive than the wrapper methods and have greater interaction with the classifier than the filter methods [8].

To discriminate between fuzzy sets for feature evaluation, distance measures have been applied to select features in the machine learning preprocessing step [9, 10]. Decision-making or inference to formalize human reasoning has been emulated by fuzzy sets that are implicitly normal and convex LVs with a triangular or trapezoidal shape or another type of curve, including linguistic hedges. Subnormal and non-convex fuzzy sets carry rich aggregated information about LVs and are appropriate for modeling at a human cognitive level. Existing distance measures for subnormal and non-convex fuzzy sets are based on the traditional α-cut [11] or vertical slice approaches [9]; more appropriate methods are required for accurate assessment. This paper proposes a distance measure method to select the most salient features for supervised learning in classification problems.

Generally, inter-class distance measures for evaluating feature selection for learning perform better than probabilistic measures [12]. This paper introduces a new embedded feature selection method called distance–difference ratio (DDR) that efficiently measures distances between subnormal and non-convex membership functions learned from a neuro-fuzzy classifier.

For example, a low temperature is not of much interest in the summer. Therefore, the maximum value of the low-temperature variable does not need to be a fuzzy value. However, a feverish temperature in the summer is significant and requires a higher weight for the high-temperature linguistic variable. This example shows that subnormal LVs may contain more information than normal LVs. The subnormal LVs of an attribute, with their respective suprema, are represented by weighted fuzzy membership functions (WFMs) [13]. The bounded sum of WFMs (BSWFM), which combines the subnormal LVs into one by the bounded sum operation of fuzzy set theory and forms a convex or non-convex fuzzy set, was introduced in the neural network with weighted membership function (NEWFM) model [13]. Because the BSWFM merges all WFMs into one for each attribute, the number of fuzzy rules grows only linearly as attributes are added to a system, which allows efficient inference and greater accuracy. The simplicity of the BSWFM, with good interpretability and accuracy, overcomes a weakness of fuzzy modeling stated in the Principle of Incompatibility [14]: interpretability and accuracy are conflicting attributes, and one overrides the other. Consequently, ranking the features by the DDR distance measure between the BSWFMs of each feature contributes to selecting features for better accuracy or decision-making.

This paper verifies the concept of WFM in computational aspects for fuzzy classification systems. In addition, the new approach of a feature selection method based on the BSWFM is suggested. Parkinson’s disease classification tasks using the proposed feature selection in this paper are demonstrated. The results show that the proposed feature selection approach achieves an accuracy comparable to or better than that of state-of-the-art techniques, with a reduced computation time for the embedded method.

2 Definitions for BSWFM

Fuzzy sets introduced by Zadeh [1] have been extended in many fields with advanced notions and operations. Regular fuzzy sets do not fully capture the inference of human reasoning. In contrast, the subnormal and non-convex BSWFMs can emulate human reasoning because they reflect a flexible human cognitive level. The main purpose of this paper is to demonstrate that the BSWFMs have powerful feature selection characteristics. The definitions and operations of fuzzy sets for explaining BSWFM are as follows:

(1) Subnormal

Let ${\bar{μ}}_{A}$ represent the supremum of a membership function $μ_{A}$ over the universe of discourse $U$ such that ${\bar{μ}}_{A} = \sup_{y \in U} μ_{A} (y) .$

A fuzzy set A is said to be subnormal if ${\bar{μ}}_{A} < 1$; A is normal if ${\bar{μ}}_{A} = 1$ [3].

(2) Convex and Non-convex

A fuzzy set A is convex if and only if the sets $Γ_{α}$, defined by $Γ_{α} = \{x \mid f_{A} (x) \geq α\}$, are convex for all α in the interval (0, 1] [1]. A fuzzy set A is non-convex if it is not convex. The complement of $Γ_{α}$ is $M_{α} = \{x \mid f_{A} (x) < α\}$.

(3) Bounded Sum of Subnormal Fuzzy Sets

The bounded sum [15] of subnormal fuzzy sets A, B, ... , and N with membership functions $μ_{A}, μ_{B}, \dots, μ_{N}$ is the fuzzy set whose membership function is given by $μ_{A \oplus B \oplus \dots \oplus N} (u) = \min \{1, μ_{A} (u) + μ_{B} (u) + \dots + μ_{N} (u)\}, \forall u \in U$

(4) Weighted Fuzzy Membership Functions (WFMs)

Fuzzy membership functions learned by NEWFM are called WFMs; they are subnormal and convex, with triangular shapes. The height of a WFM ${\hat{μ}}_{A}$ , hgt( ${\hat{μ}}_{A}$ ), is the supremum of the membership grades of ${\hat{μ}}_{A}$ . NEWFM adapts a set of WFMs corresponding to the LVs of a feature. Under the learning scheme of NEWFM, the x-axis positions b, c, and d of hgt( ${\hat{μ}}_{A}$ ), hgt( ${\hat{μ}}_{B}$ ), and hgt( ${\hat{μ}}_{C}$ ) in Fig. 1, respectively, coincide with the left or right end-points of the neighboring WFMs. For example, ${\hat{μ}}_{A}$ and ${\hat{μ}}_{B}$ share the point c: the right end of ${\hat{μ}}_{A}$ and the x-axis position of hgt( ${\hat{μ}}_{B}$ ) are both at c.

Fig. 1

An example of a BSWFM $({\tilde{μ}}_{A \oplus B \oplus C})$ constructed by three WFMs $({\hat{μ}}_{A}, {\hat{μ}}_{B}, {\hat{μ}}_{C})$ for a feature.

(5) Bounded Sum of Weighted Fuzzy Membership Functions (BSWFMs)

A feature is represented by WFMs (usually 5 ± 2) according to their LVs. Through the operation of bounded sum of subnormal fuzzy sets, the WFMs are combined into one fuzzy membership function such that ${\tilde{μ}}_{A \oplus B \oplus \dots \oplus N} (u) = \min \{1, {\hat{μ}}_{A} (u) + {\hat{μ}}_{B} (u) + \dots + {\hat{μ}}_{N} (u)\}, \forall u \in U$

A BSWFM constructed by three WFMs $({\hat{μ}}_{A}, {\hat{μ}}_{B}, {\hat{μ}}_{C})$ in Fig. 1 can be represented by ${\tilde{μ}}_{A \oplus B \oplus C} = (a, b, c, d, e; hgt ({\hat{μ}}_{A}), hgt ({\hat{μ}}_{B}), hgt ({\hat{μ}}_{C}))$
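The construction above can be sketched in a few lines of Python. This is a minimal illustration, not the NEWFM implementation: the helper names `triangle` and `bounded_sum`, the breakpoints a to e, and the three heights are all hypothetical values chosen for the example. It assumes triangular WFMs whose apexes sit on the neighboring WFMs' end-points, as described for Fig. 1.

```python
def triangle(left, apex, right, height):
    """Return a triangular WFM with support (left, right) and the given height."""
    def mu(u):
        if u <= left or u >= right:
            return 0.0
        if u <= apex:
            return height * (u - left) / (apex - left)
        return height * (right - u) / (right - apex)
    return mu


def bounded_sum(*wfms):
    """Bounded sum of membership functions: min(1, sum of the grades)."""
    def mu(u):
        return min(1.0, sum(f(u) for f in wfms))
    return mu


# Hypothetical breakpoints a < b < c < d < e and heights (illustration only).
a, b, c, d, e = 0.0, 0.25, 0.5, 0.75, 1.0
h_a, h_b, h_c = 0.6, 0.3, 0.8

# Each apex coincides with an end-point of the neighboring WFMs.
mu_a = triangle(a, b, c, h_a)
mu_b = triangle(b, c, d, h_b)
mu_c = triangle(c, d, e, h_c)
bswfm = bounded_sum(mu_a, mu_b, mu_c)

# At each apex only one WFM is non-zero, so the BSWFM equals that height.
values_at_apexes = [bswfm(b), bswfm(c), bswfm(d)]
# Between two apexes the BSWFM interpolates linearly between their heights.
midpoint_value = bswfm((b + c) / 2)
```

Under these assumptions the BSWFM is fully described by the five breakpoints and three heights, which matches the compact representation $(a, b, c, d, e; hgt ({\hat{μ}}_{A}), hgt ({\hat{μ}}_{B}), hgt ({\hat{μ}}_{C}))$.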

–BSWFM is subnormal and either convex or non-convex:

Assume that a feature has three WFMs ( ${\hat{μ}}_{A}, {\hat{μ}}_{B}, {\hat{μ}}_{C}$ ) and that hgt( ${\hat{μ}}_{B}$ ) is the maximum among hgt( ${\hat{μ}}_{A}$ ), hgt( ${\hat{μ}}_{B}$ ), and hgt( ${\hat{μ}}_{C}$ ). If $hgt ({\hat{μ}}_{B}) < 1$ , then at the apex position c of ${\hat{μ}}_{B}$ , where ${\hat{μ}}_{A} (c) = {\hat{μ}}_{C} (c) = 0$ , we have ${\tilde{μ}}_{A \oplus B \oplus C} (c) = \min \{1, {\hat{μ}}_{A} (c) + {\hat{μ}}_{B} (c) + {\hat{μ}}_{C} (c)\} = {\hat{μ}}_{B} (c) = hgt ({\hat{μ}}_{B}) < 1 .$ Because neighboring WFMs share end-points, the BSWFM is piecewise linear between the apexes, so this value is also its supremum. Therefore, the BSWFM, ${\tilde{μ}}_{A \oplus B \oplus C}$ , is subnormal, and in this case it is convex. Conversely, the BSWFM is non-convex exactly when the middle WFM is lower than both of its neighbors, that is, when $hgt ({\hat{μ}}_{B}) < \min \{hgt ({\hat{μ}}_{A}), hgt ({\hat{μ}}_{C})\} .$
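The subnormality and convexity conditions can be checked numerically. The sketch below is an illustration under one assumption: the BSWFM is piecewise linear through the points (a, 0), (b, hgt A), (c, hgt B), (d, hgt C), (e, 0), so inspecting the grades at those vertices is sufficient. The function names and the example heights are hypothetical.

```python
def is_subnormal(grades):
    """A sampled fuzzy set is subnormal when its largest grade is below 1."""
    return max(grades) < 1.0


def is_convex_fuzzy(grades, tol=1e-12):
    """Sampled fuzzy-convexity check.

    A fuzzy set on the real line is convex when no grade dips below both
    some grade to its left and some grade to its right.
    """
    n = len(grades)
    prefix_max = [0.0] * n  # largest grade among positions 0..i
    suffix_max = [0.0] * n  # largest grade among positions i..n-1
    running = float("-inf")
    for i in range(n):
        running = max(running, grades[i])
        prefix_max[i] = running
    running = float("-inf")
    for i in range(n - 1, -1, -1):
        running = max(running, grades[i])
        suffix_max[i] = running
    return all(
        grades[i] >= min(prefix_max[i], suffix_max[i]) - tol for i in range(n)
    )


# Vertex grades (0, hgt A, hgt B, hgt C, 0) for two hypothetical features.
middle_highest = [0.0, 0.5, 0.9, 0.7, 0.0]  # hgt B is the maximum
middle_lowest = [0.0, 0.6, 0.3, 0.8, 0.0]   # hgt B < min(hgt A, hgt C)

convex_case = is_convex_fuzzy(middle_highest)
nonconvex_case = is_convex_fuzzy(middle_lowest)
```

With these example heights, both sets are subnormal, the first is convex, and the second is non-convex because the middle apex lies below both neighbors.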

(6) Bounded Difference [11]

The difference of a pair ${\tilde{μ}}_{A}$ and ${\tilde{μ}}_{B}$ is measured by ${\tilde{μ}}_{A ⊖ B} (u) = \max \{0, {\tilde{μ}}_{A} (u) - {\tilde{μ}}_{B} (u)\}, \forall u \in U$

(7) Bounded Distance

An operator ø for the bounded distance of a pair ${\tilde{μ}}_{A}$ and ${\tilde{μ}}_{B}$ is defined as ${\tilde{μ}}_{A ø B} (u) = \max \{0, | {\tilde{μ}}_{A} (u) - {\tilde{μ}}_{B} (u) |\} = {\tilde{μ}}_{A ⊖ B} (u) + {\tilde{μ}}_{B ⊖ A} (u), \forall u \in U$

(8) Bounded Intersection [1]

The bounded intersection of two BSWFMs ${\tilde{μ}}_{A}$ and ${\tilde{μ}}_{B}$ is defined as ${\tilde{μ}}_{A \cap B} (u) = \min \{{\tilde{μ}}_{A} (u), {\tilde{μ}}_{B} (u)\}, \forall u \in U$
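Definitions (6) to (8) are pointwise operations, so they can be illustrated on two BSWFMs sampled at the same points of U. The following sketch assumes such a shared sampling; the function names and the sample grades are hypothetical.

```python
def bounded_difference(mu_a, mu_b):
    """Pointwise max(0, A(u) - B(u)) for two equally sampled BSWFMs."""
    return [max(0.0, x - y) for x, y in zip(mu_a, mu_b)]


def bounded_distance(mu_a, mu_b):
    """Pointwise |A(u) - B(u)|, which is never negative."""
    return [abs(x - y) for x, y in zip(mu_a, mu_b)]


def bounded_intersection(mu_a, mu_b):
    """Pointwise min(A(u), B(u))."""
    return [min(x, y) for x, y in zip(mu_a, mu_b)]


# Hypothetical grades of two BSWFMs on the same five sample points.
grades_a = [0.0, 0.6, 0.3, 0.8, 0.0]
grades_b = [0.0, 0.2, 0.7, 0.4, 0.0]

diff_ab = bounded_difference(grades_a, grades_b)
diff_ba = bounded_difference(grades_b, grades_a)
dist = bounded_distance(grades_a, grades_b)
inter = bounded_intersection(grades_a, grades_b)

# The bounded distance equals the sum of the two bounded differences,
# because at every point at most one of the differences is non-zero.
identity_holds = all(
    abs(d - (p + q)) < 1e-12 for d, p, q in zip(dist, diff_ab, diff_ba)
)
```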

3 Feature selection by distance measure of BSWFMs

Decision-making or inference to formalize human reasoning has been emulated by fuzzy sets that are implicitly normal and convex LVs with a triangular or trapezoidal shape or another type of curve, including linguistic hedges. Subnormal and non-convex BSWFMs carry rich aggregated information about LVs and are appropriate for modeling at a human cognitive level. Existing distance measures for subnormal and non-convex fuzzy sets are based on the traditional α-cut [11] or vertical slice approaches [9], and more appropriate methods are required for accurate assessment. This paper proposes a distance measure method to select the most salient features in classification problems.

3.1 Distance-difference ratio (DDR) method for feature selection

Classification performance with respect to accuracy and speed in learning systems is significantly affected by the selection of salient features. This paper suggests a feature selection method, named DDR, that measures a distance using bounded distance and bounded difference of a pair of BSWFMs. The purpose of the DDR method is to find the distinguishing features among all input features that have the strongest power of classification.

Let ${\tilde{μ}}_{A}$ and ${\tilde{μ}}_{B}$ be a pair of BSWFMs of a feature for two classes sharing the leftmost and rightmost points, respectively. DDR measures the degree of discriminative power of a feature using a pair of BSWFMs ${\tilde{μ}}_{A}$ and ${\tilde{μ}}_{B}$ , as shown in Fig. 2. The higher value of DDR indicates a higher discriminative power for classification.

Fig. 2

Example of ${\tilde{μ}}_{A ø B} = {{\tilde{μ}}_{A ⊖ B} + {\tilde{μ}}_{B ⊖ A}}$ and ${\tilde{μ}}_{A \cap B}$ for a pair of ${\tilde{μ}}_{A}$ and ${\tilde{μ}}_{B}$ .

For a pair of ${\tilde{μ}}_{A}$ and ${\tilde{μ}}_{B}$ of an input feature, the regions of bounded differences, bounded distances, and bounded intersection are illustrated in Fig. 2. DDR is measured by the following equation $DDR ({\tilde{μ}}_{A}, {\tilde{μ}}_{B}) = \left( \frac{{\tilde{μ}}_{A ø B}}{{\tilde{μ}}_{A \cap B}} \right) \Big/ \left( \frac{1}{1 + e^{- \left| \frac{{\tilde{μ}}_{A ⊖ B} - {\tilde{μ}}_{B ⊖ A}}{{\tilde{μ}}_{A \cap B}} \right|}} \right)$ where ${\tilde{μ}}_{A ø B}$ is the bounded distance, ${\tilde{μ}}_{A ⊖ B}$ and ${\tilde{μ}}_{B ⊖ A}$ are the bounded differences, and ${\tilde{μ}}_{A \cap B}$ is the bounded intersection of ${\tilde{μ}}_{A}$ and ${\tilde{μ}}_{B}$ , each taken as the size of the corresponding region in Fig. 2.

This paper asserts that the DDR feature selection method for supervised learning problems can be achieved on the distance measure between a pair of BSWFMs with the following hypothesis.

A salient feature for a pair of BSWFMs ${\tilde{μ}}_{A}$ and ${\tilde{μ}}_{B}$ has the following properties:

a high ratio of ${\tilde{μ}}_{A ø B}$ to ${\tilde{μ}}_{A \cap B}$

a low ratio of $| {\tilde{μ}}_{A ⊖ B} - {\tilde{μ}}_{B ⊖ A} |$ to ${\tilde{μ}}_{A \cap B}$

The DDR combines both properties into a single value that is used to evaluate each feature.
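A possible numerical reading of the DDR equation is sketched below. It rests on an assumption that the paper does not spell out in this section: each term of the equation is taken as the area of the corresponding region, approximated by the trapezoidal rule on uniformly sampled grades. The function names, the grid step, and the sample grades are hypothetical, and returning infinity for disjoint sets is a choice made only for this sketch.

```python
import math


def area(grades, step):
    """Trapezoidal area under uniformly sampled membership grades."""
    return step * (sum(grades) - 0.5 * (grades[0] + grades[-1]))


def ddr(mu_a, mu_b, step):
    """Distance-difference ratio of two equally sampled BSWFMs."""
    diff_ab = area([max(0.0, x - y) for x, y in zip(mu_a, mu_b)], step)
    diff_ba = area([max(0.0, y - x) for x, y in zip(mu_a, mu_b)], step)
    inter = area([min(x, y) for x, y in zip(mu_a, mu_b)], step)
    if inter == 0.0:
        # Assumption for this sketch: disjoint sets are maximally separable.
        return math.inf
    distance_ratio = (diff_ab + diff_ba) / inter
    balance_ratio = abs(diff_ab - diff_ba) / inter
    # Logistic term: close to 0.5 when the two differences are balanced.
    penalty = 1.0 / (1.0 + math.exp(-balance_ratio))
    return distance_ratio / penalty


# Hypothetical grades of two class-wise BSWFMs on a grid with step 0.25.
grades_a = [0.0, 0.6, 0.3, 0.8, 0.0]
grades_b = [0.0, 0.2, 0.7, 0.4, 0.0]
score = ddr(grades_a, grades_b, step=0.25)
```

Because the logistic term lies between 0.5 and 1, the score under this reading always falls between one and two times the distance ratio, and it is largest when the two bounded differences are equal, which agrees with the two properties listed above.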

3.2 Feature selection method in NEWFM

DDR feature selection is an embedded method that integrates with the training process in NEWFM. The method is based on backward selection by deleting the feature with the smallest DDR value in each iteration until the classification accuracy does not improve.

- Procedure for DDR Feature Selection

// Let ${\tilde{μ}}_{i, c}$ be the BSWFM of f_i (i-th feature) for class c, where c = 1 or 2.

do {
    GCA = 0    // GCA: global classification accuracy
    while (GCA < DA) {    // DA: desired accuracy predefined by the user; decrease it when the loop does not terminate
        LMCA = 0    // LMCA: local maximum classification accuracy
        for (i = 1 to n) {    // n: number of features
            Best_i = 0    // Best_i: accumulated best cases of f_i
            Worst_i = 0    // Worst_i: accumulated worst cases of f_i
        }
        for (i = 1 to m) {    // m: number of trainings
            LCA = 0    // LCA: local classification accuracy
            construct a pair of ${\tilde{μ}}_{i, 1}$ and ${\tilde{μ}}_{i, 2}$ using NEWFM
            for (j = 1 to n)    // n: number of features
                adjust ${\tilde{μ}}_{j, 1}$ and ${\tilde{μ}}_{j, 2}$ for f_j according to input_j by NEWFM
            for (k = 1 to n)    // n: number of features
                calculate ${DDR}_{k} ({\tilde{μ}}_{k, 1}, {\tilde{μ}}_{k, 2})$
            calculate LCA using NEWFM
            if (LCA > LMCA) then LMCA = LCA    // store greater LCA into LMCA
        }
        if (LMCA > GCA) then GCA = LMCA    // update GCA by LMCA
        Best_b = Best_b + 1 such that f_b has the highest ${DDR}_{b} ({\tilde{μ}}_{b, 1}, {\tilde{μ}}_{b, 2})$ among all features
        Worst_w = Worst_w + 1 such that f_w has the lowest ${DDR}_{w} ({\tilde{μ}}_{w, 1}, {\tilde{μ}}_{w, 2})$ among all features
    }
    delete f_m that has min(Best_m - Worst_m) among all features    // backward feature selection
} until (GCA cannot be increased by deleting the feature f_m)
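The backward-elimination idea of the procedure can be sketched in Python as follows. This is a simplified illustration, not the NEWFM procedure itself: it omits the desired-accuracy loop, and the callback `train_and_score` is a hypothetical stand-in for one NEWFM training run that returns a classification accuracy and one DDR value per active feature. The toy scorer and its numbers are invented purely to exercise the loop.

```python
def backward_select(features, train_and_score, runs=3):
    """Drop the feature with the smallest (Best - Worst) tally each round,
    stopping when the best accuracy of a round no longer improves."""
    active = list(features)
    best_subset, best_accuracy = list(active), float("-inf")
    while active:
        best = dict.fromkeys(active, 0)   # times a feature had the top DDR
        worst = dict.fromkeys(active, 0)  # times a feature had the lowest DDR
        round_accuracy = float("-inf")
        for _ in range(runs):
            accuracy, ddr = train_and_score(active)
            round_accuracy = max(round_accuracy, accuracy)
            best[max(active, key=lambda f: ddr[f])] += 1
            worst[min(active, key=lambda f: ddr[f])] += 1
        if round_accuracy <= best_accuracy:
            break  # deleting the last feature did not help
        best_subset, best_accuracy = list(active), round_accuracy
        if len(active) == 1:
            break
        weakest = min(active, key=lambda f: best[f] - worst[f])
        active.remove(weakest)
    return best_subset, best_accuracy


def toy_scorer(active):
    """Hypothetical deterministic stand-in for a NEWFM training run."""
    ddr_table = {"f1": 3.0, "f2": 0.2, "f3": 1.5}
    if "f3" not in active:
        accuracy = 0.7
    elif "f2" in active:
        accuracy = 0.8
    else:
        accuracy = 0.9
    return accuracy, {f: ddr_table[f] for f in active}


selected, accuracy = backward_select(["f1", "f2", "f3"], toy_scorer)
```

In this toy run the feature with the lowest DDR is removed first, which raises the accuracy; removing a second feature lowers it, so the loop stops and keeps the two-feature subset.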

4 Experimental results

4.1 Dataset

The dataset used in the experiment was obtained from the UCI repository of machine learning databases [16]. The dataset was composed of 195 sustained vowel phonations from 31 subjects, 23 of whom were Parkinson’s disease (PD) patients. The time since diagnosis ranged from less than one year to 28 years, and the patients were between 45 and 85 years of age. Each subject provided an average of six recordings. The length of each recording ranged from 1 to 36 s, and the phonations were recorded as described in [17].

4.2 Experiment

The dataset was normalized to the range [0, 1] in three steps. In the first step, min-max normalization was applied: $y_{i, j} = \frac{x_{i, j} - \min [x_{1, j} : x_{k, j}]}{\max [x_{1, j} : x_{k, j}] - \min [x_{1, j} : x_{k, j}]},$ where $x_{i, j}$ is the ith sample of the jth feature, k is the number of samples of the jth feature, and $y_{i, j}$ is the normalized value of $x_{i, j}$ in the first step. In the second step, the values from the first step were transformed into z-scores by the following equation. $z_{i, j} = \frac{y_{i, j} - \mathrm{average} [y_{1, j} : y_{k, j}]}{\mathrm{standard\ deviation} [y_{1, j} : y_{k, j}]}$ In the third step, the z-score $z_{i, j}$ was rescaled to the range [0, 1] using the following sigmoid formula. $r_{i, j} = \frac{1}{1 + e^{- z_{i, j}}}$
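The three normalization steps can be sketched for a single feature column as follows. Two points are assumptions of this sketch rather than statements from the experiment: the population standard deviation is used in the z-score, and the sigmoid is taken in its standard logistic form 1 / (1 + e^(-z)), which is what maps z-scores into the unit interval. The function name and the sample values are hypothetical.

```python
import math
import statistics


def normalize_feature(values):
    """Min-max scale, z-score, then squash one feature column with a sigmoid.

    Assumes the column is not constant, so both the range and the
    standard deviation are non-zero.
    """
    lo, hi = min(values), max(values)
    scaled = [(x - lo) / (hi - lo) for x in values]   # step 1: min-max
    mean = statistics.fmean(scaled)
    std = statistics.pstdev(scaled)                   # population std (assumed)
    z_scores = [(y - mean) / std for y in scaled]     # step 2: z-score
    return [1.0 / (1.0 + math.exp(-z)) for z in z_scores]  # step 3: sigmoid


# Hypothetical raw values of one feature.
raw = [1.0, 2.0, 3.0, 4.0, 5.0]
rescaled = normalize_feature(raw)
```

The output preserves the ordering of the raw values, and a value equal to the column mean maps to 0.5.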

NEWFM, which is a neuro-fuzzy learning algorithm, was used as the classifier distinguishing between normal and PD. The feature selection method using DDR was embedded in NEWFM. We performed experiments by removing features one by one according to the result of DDR for each iteration. Initially, all 22 features were used for classification, resulting in 22 DDRs being obtained. Figure 3(a) and 3(b) represent the BSWFMs with the best and worst DDRs, respectively.

Fig. 3

Examples of DDRs (the x-axis is the range of $r_{i, j}$, and the y-axis is the value of the weighted fuzzy membership function).

In Fig. 3(a), a pair of BSWFMs has the best DDR. It has a higher ratio of ${\tilde{μ}}_{A ø B}$ to ${\tilde{μ}}_{A \cap B}$ and a lower ratio of $| {\tilde{μ}}_{A ⊖ B} - {\tilde{μ}}_{B ⊖ A} |$ to ${\tilde{μ}}_{A \cap B}$ than the pair of BSWFMs in Fig. 3(b). Moreover, the feature in Fig. 3(b) produces the smallest value when the variable “Worst” is subtracted from the variable “Best” for each feature according to the “Procedure for DDR Feature Selection”. Consequently, the feature used in Fig. 3(b) is determined to be the one that needs to be removed. In the second iteration of the experiment, after the removal of that feature, the experiment is performed on the remaining 21 features. The experiment continues by reducing the number of features in each iteration. Table 1 displays the results of the experiment. Even with reduced features, the training accuracy remains almost unchanged. We performed the test using the ten-fold cross-validation method. The highest average test accuracy was 93.98%, obtained with 19 features.

Table 1

The results of training and test after selecting features for the experiment

The number of features	Training Acc (%)	Test Acc (%)
22	98	91.81
20	97.95	86.55
19	97.95	93.98
17	97.95	89.47

Our experimental results compared with [17] and [18] are shown in Table 2. The result of the comparison demonstrates that the degree of accuracy in the proposed method was higher than those obtained for the other methods.

Table 2

Comparison of accuracy for DDR and the other two methods

Classifier	Feature selection method	Test method	Training accuracy (%)	Test accuracy (%)
SVM [17]	Based on probability density	Bootstrap	91.40
Minimum Distance Classifier [18]	GP-EM detector	Ten-fold CV	95.06	93.12
NEWFM	DDR	Ten-fold CV	97.95	93.98

These results show that the features selected by DDR have a greater effect on accuracy than the removed features, which indicates that DDR works well for feature selection. DDRs are automatically generated from BSWFMs in NEWFM during testing. They provide a logical and mathematical basis for automatically extracting features. Moreover, DDR is based on subnormal and non-convex fuzzy functions; to our knowledge, there have been no previous experiments with such functions. The experiment in [17] was based on the data distribution estimated using probabilities, whereas the experiments in this paper obtain results from real data through learning rather than through a statistical method. The experimental results are significant because using the actual data for classification improves accuracy by reflecting the real meaning of the data. The non-convex fuzzy function resembles the graph of a probability density function, but it can be adjusted more finely and nonlinearly by learning.

5 Conclusion

In this paper, we propose DDR for feature selection that is a distance measure method of subnormal and non-convex BSWFMs. After each iteration, DDRs were calculated from the overlapped BSWFMs generated for each feature, and the features were then ranked according to those DDR values. Some features were removed by their rank, and the next iteration was then performed on the remaining features. The iterations were repeated until the accuracy did not improve.

The proposed feature selection method was compared with two other methods, one based on probability density and the other on genetic programming.

The results of the experiment demonstrated that a training accuracy of 97.95% was maintained even when the features were reduced. The test accuracy was highest, at 93.98%, when 19 features were used. Moreover, compared with the other two methods, the method using DDR demonstrated higher training and test accuracies of 97.95% and 93.98%, respectively.

We observed that the accuracy was maintained even though the features were reduced in our experiment, indicating that valid features were selected via DDR. The neuro-fuzzy classifier NEWFM, which generates a linear number of fuzzy rules, produces as many BSWFMs per feature as there are classes. Each BSWFM is a subnormal and possibly non-convex fuzzy set obtained through learning. Subnormal and non-convex fuzzy sets can include more information and are more natural than convex fuzzy sets because they can be adjusted finely and nonlinearly by learning to reflect the distribution of the actual data, which reduces error. The difference area and distance area generated from the BSWFMs of the classes are based on the distribution of the data included in each class. It is therefore reasonable to use these concepts for selecting features and classifying data, and our experiments support this.

This study may inform work on automatic feature extraction techniques that reflect the characteristics of the data itself across various types of datasets. The experiment suggests that non-convex fuzzy sets containing rich aggregated information have a positive effect on accuracy.

Acknowledgments

This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2021-2017-0-01630) supervised by the IITP (Institute for Information & communications Technology Promotion).

This research was supported by the Bio & Medical Technology Development Program of the National Research Foundation (NRF) funded by the Ministry of Science & ICT (2017M3A9E2072689).

References

1. Zadeh L.A., Fuzzy sets, Information and Control 8 (1965), 338–353.

2. Jang J.S.R., ANFIS: adaptive-network based fuzzy inference system, IEEE Trans. Syst. Man and Cybernetics 23(3) (1993), 665–685.

3. Zadeh L.A., A fuzzy-set-theoretic interpretation of linguistic hedges, Journal of Cybernetics 2(3), 4–34.

4. Cetisli Bayram, Development of an adaptive neuro-fuzzy classifier using linguistic hedges: part 1, Expert Syst. Appl. 37(8) (2010), 6093–6101.

5. Cetisli Bayram, The effect of linguistic hedges on feature selection: Part 2, Expert Systems with Applications 37(8) (2010), 6102–6108.

6. Guyon Isabelle and Elisseeff Andre, An introduction to variable and feature selection, Journal of Machine Learning Research 3 (2003), 1157–1182.

7. Liu and Yu, Towards integrating feature selection algorithms for classification and clustering, IEEE Transactions on Knowledge and Data Engineering 17(4) (2005), 491–502.

8. Saeys, Inza and Larranaga, A review of feature selection techniques in bioinformatics, Bioinformatics 23(19) (2007), 2507–2517.

9. Zwick, Carlstein and Budescu D.V., Measures of similarity among fuzzy concepts: A comparative analysis, International Journal of Approximate Reasoning 1(2) (1987), 221–242.

10. Nahook Hassan Nosrati and Eftekhari Mahdi, A feature selection method based on ∩ - fuzzy similarity measures using multiobjective genetic algorithm, International Journal of Soft Computing and Engineering (IJSCE) 3(2) (2013), 37–41.

11. Zadeh L.A., “Calculus of fuzzy restrictions,” in Fuzzy Sets and Their Applications to Cognitive and Decision Processes, L.A. Zadeh, K. Fu, K. Tanaka, and M. Shimura (Eds.), Academic Press, New York, 1975.

12. Piramuthu, Evaluating feature selection methods for learning in data mining applications, Eur. J. Oper. Res. 156 (2004), 483–494.

13. Lim J.S., Finding features for real-time premature ventricular contraction detection using a fuzzy neural network system, IEEE Trans. on Neural Networks 20(3) (2009), 522–527.

14. Zadeh L.A., Outline of a new approach to the analysis of complex systems and decision processes, IEEE Trans. Syst. Man. Cybernet. 3 (1976), 28–44.

15. Giles, Lukasiewicz logic and fuzzy theory, Int. J. Man-Mach. Studies 8 (1976), 313–317.

16. UCI Repository of Machine Learning. http://archive.ics.uci.edu/ml/datasets/Parkinsons/

17. Little Max A., Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease, IEEE Trans. Biomed. Eng. 56, 1015–1022.

18. Guo Pei-Fang, Bhattacharya Prabir and Kharma Nawwaf, Advances in Detecting Parkinson’s Disease, ICMB 2010, LNCS 6165 (2010), 306–314.

19. Kosko, Neural Networks and Fuzzy Systems: A Dynamical Systems Approach to Machine Intelligence, Prentice Hall, Englewood Cliffs, New Jersey, 1992.

20. Feature selection for machine learning classification problems: a recent overview

21. Little Max A., McSharry Patrick E., Hunter Eric J., Spielman Jennifer and Ramig Lorraine O., Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease, IEEE Trans. Biomedical Engineering 56 (2009), 1015–1022.

22. Liu Xue-cheng, Entropy, distance measure and similarity measure of fuzzy sets and their relations, Fuzzy Sets and Systems 52 (1992), 305–318.

23. Newman D.J., Hettich, Blake C.L. and Merz C.J., UCI Repository of machine learning databases. Irvine, CA: University of California, Department of Information and Computer Science. (2007).

24. Svec, Popolo and Titze, Measurement of vocal doses in speech: experimental procedure and signal processing, Logoped Phoniatr Vocol 28 (2003), 181–192.

25. Takagi and Sugeno, Fuzzy identification of systems and its applications to modeling and control, IEEE Trans. Syst. Man Cybern. 15 (1985), 116–132.

26. Newman and Liu E.T., Perspective on BRCA1, Breast Disease 10 (1998), 3–10.

27. Pilkey D.F., Happy conservation laws, in: Neural Stresses, J. Frost, ed., Controlled Press, Georgia, 1995, pp. 332–391.

28. Wilson, Active vibration analysis of thin-walled beams, Ph.D. Dissertation, University of Virginia, 1991.