Affective state recognition using audio cues

Abstract

This paper presents a technique to detect six affective states of an individual using audio cues. Bi-spectral features extracted from the entire speech signal and from the voiced part of speech are used to create feature vectors. K-Nearest Neighbor (KNN) and Simple Logistic (SL) classifiers are used for classification. The eNTERFACE audio-visual emotional speech corpus, which covers six archetypal affective states (fear, anger, disgust, sad, happy and surprise), is considered. The performance of the system is analyzed for features obtained from the voiced part of speech and for features obtained from the entire speech signal. The proposed work is the first of its kind in affect computation in which a compact set of 13 Bi-spectral features extracted from the voiced speech segments yields promising performance. An improvement of 8.46% – 27.6% in recognition rate is achieved with the proposed methodology compared to existing approaches that use emotion samples from the same speech corpus.

Keywords

Bi-spectral, voiced speech, affective state recognition

1. Introduction

Affect is a physiological term used to describe feeling or emotion. The affective state of a human is characterized by a change in feeling tone and by physiological changes in behavior. Factors responsible for a change in affective state include self-esteem, confidence, abuse, physical ill health, and family loss or breakup [1]. Different modalities can be used to recognize the affective state of a human, including speech, facial expressions, blood volume pulse, facial electromyography, galvanic skin response, visual aesthetics and their combinations [2, 3]. Among these modalities, speech is widely used since it provides information about a person’s affective state as well as their demands and intentions. Applications of the proposed work include medicine, e-learning, monitoring, entertainment, law and marketing. In medicine, it can be used to analyze a patient’s feelings about a treatment and to understand a client’s affective state during counseling, so that better treatment and counseling can be provided.

A speech emotion recognition system has four main units: Speech Corpus, Pre-processing, Feature Extraction and Classification. The Speech Corpus is a collection of speech samples of different emotions such as anger, disgust, fear, happiness, sadness and surprise. Many speech corpora on human affective state are available, such as AFEW, SAVEE and eNTERFACE [4]. The pre-processing unit consists of windowing, framing and pre-emphasis [3]; pre-emphasis is used to enhance the high frequencies in the signal. The feature extraction unit extracts speech features such as wavelet, spectral and temporal features. Pitch, short-time energy and Zero Crossing Rate (ZCR) are time-domain features, whereas Mel Frequency Cepstral Coefficients (MFCC), Bi-spectral features and spectral roll-off are frequency-domain features [5, 6]. Classification is the final unit. The extracted features are the input to the classifier, which determines the affective state of the speech signal. Examples of classifiers are Simple Logistic, K-Nearest Neighbors (KNN) and Sequential Minimal Optimization (SMO) [7]. Compensation techniques such as Cepstral Mean Normalization and speaker- and stress-information-based compensation improve the recognition rate [8].

Various methods have been proposed in the literature to recognize the affective state of a human. H. Hermansky and N. Morgan extracted relative spectral (RASTA) features based on perceptual linear prediction (PLP) and used a Hidden Markov Model (HMM) based Gaussian Mixture Model [9]. M. Swain et al. extracted prosodic features such as pitch, zero crossing, Teager Energy Operator, jitter, shimmer and log energy together with MFCC features, and used HMM and Support Vector Machine (SVM) classifiers [10]. Schuller et al. [11] suggested an approach that extracts combinations of Low-Level Descriptors (LLDs) and uses a Support Vector Regressor (SVR). Metallinou et al. [12] extracted Mel-Frequency Cepstral Coefficients, Mel-frequency bank coefficients, pitch, energy and their first derivatives and used an HMM. Eyben et al. [13] extracted acoustic LLDs and used a Long Short-Term Memory Recurrent Neural Network (LSTM-RNN). Sayedelahl et al. [14] extracted short-time energy, MFCC and fundamental frequencies and used an SVR with a radial basis function kernel. Rosas et al. [15] extracted pause duration, pitch, intensity and loudness and used an SVM with a linear kernel.

This work aims to recognize the affective state of a person using Bi-spectral features from the entire speech signal and from the voiced part of speech. Simple Logistic and K-Nearest Neighbor classifiers are used for classification. The system performance obtained with each classifier on the Bi-spectral feature set is analyzed.

The paper is organized as follows: Section 2 discusses the proposed algorithm, Section 3 gives the experimental background, Section 4 explains the experimental results and analysis, Section 5 provides a comparison with existing works and Section 6 gives the conclusion and future directions.

2. Proposed framework for affective computing using audio cues

The proposed model consists of three stages: Pre-processing, Feature Extraction and Classification. The proposed work, using features determined from the entire speech signal and from the voiced part of the speech signal, is illustrated in Fig. 1.

Fig.1

Block diagram of proposed model (a) Using Entire Speech signal (b) Using Voiced Speech.

2.1 Speech corpus

eNTERFACE is the corpus considered in this work. It is an audio-visual corpus with emotion samples from six affective states: anger, disgust, fear, happy, sad and surprise. The corpus consists of emotion samples recorded from forty-two subjects from fourteen different countries. Here, the audio samples are extracted from the video, giving 1293 audio samples in total. The affective states anger, disgust, fear, sad and surprise each contain 216 audio samples and the happy state contains 213 audio samples. The recordings were made at a sampling frequency of 44.1 kHz [16].

2.2 Pre-processing

Voiced sections of the speech signal are extracted in the pre-processing stage. Voiced speech is produced by the vibration of the vocal cords. As observed in Fig. 2, the voiced part of speech is periodic in nature.

Fig.2

(a) Entire speech signal (b) Voiced speech.

The pitch parameter is used to extract the voiced speech. Pitch determines the sound quality, is controlled by the vibrations causing the sound, and plays a prominent role in detecting the voiced sections of speech. If pitch can be determined from a speech segment, the segment is voiced. The voiced part of the speech signal is identified by finding its endpoints using the Zero Crossing Rate (ZCR) and energy. The ZCR indicates the rate at which the signal changes from positive to negative and vice versa; it is low for voiced speech and high for unvoiced speech. Energy is also able to separate the voiced and unvoiced portions of speech: voiced speech has high energy whereas unvoiced speech has low energy. The speech signal is divided into segments by framing. Each segment is then multiplied by a Hamming window to reduce the discontinuities at both ends of the segment. Next, the average power of each frame is computed to obtain the Short-Time Energy (STE). A high STE together with a low ZCR indicates that the segment of speech is voiced [17].

The process for extracting the voiced speech segments is depicted in Fig. 3. Initially, framing is done to divide the speech signal into several segments. A Hamming window of size 32 ms is applied to smooth the edges of each frame. Since the sampling frequency is 44.1 kHz, each frame consists of 1411 samples. Next, endpoints are detected to find the voiced part of the speech. Endpoint detection gives the beginning and end of the relevant partitions and is accomplished by calculating the Zero Crossing Rate and energy of the signal.

Fig.3

Block diagram for extracting voiced part of speech.
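The frame-level voiced/unvoiced decision described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: the function names, the non-overlapping framing and the two thresholds (`ste_ratio`, `zcr_max`) are assumptions chosen for the example.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def voiced_mask(x, fs=44100, frame_ms=32, ste_ratio=0.1, zcr_max=0.1):
    """Flag frames with high short-time energy and low zero-crossing rate.

    ste_ratio and zcr_max are illustrative thresholds, not values from the paper.
    """
    frame_len = int(fs * frame_ms / 1000)            # 1411 samples at 44.1 kHz
    frames = frame_signal(x, frame_len, frame_len)   # non-overlapping frames (assumption)

    # Zero-crossing rate: fraction of adjacent sample pairs whose sign differs.
    signs = np.sign(frames)
    signs[signs == 0] = 1
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

    # Short-time energy: average power of the Hamming-windowed frame.
    windowed = frames * np.hamming(frame_len)
    ste = np.mean(windowed ** 2, axis=1)

    # Voiced: energy above a fraction of the maximum AND low zero-crossing rate.
    return (ste > ste_ratio * ste.max()) & (zcr < zcr_max)
```

Calling `voiced_mask(x)` on a recording returns one Boolean per 32 ms frame; runs of `True` values mark candidate voiced segments whose first and last frames act as endpoints.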

2.3 Feature extraction

The Bi-spectrum is the two-dimensional Fourier transform of the third-order cumulant function: $P(f_1, f_2) = E[X(f_1) X(f_2) X^{*}(f_1 + f_2)]$ (1) where P(f_1, f_2) is the Bi-spectrum at the frequency pair (f_1, f_2), X(f) is the Fourier transform of the signal, * denotes the complex conjugate and E[.] denotes expectation [18]. The Bi-spectrum of a speech signal contains redundant data. Hence, the Bi-spectral features are computed over the non-redundant region (Ω) shown in Fig. 4.

The frequencies shown in Fig. 4 are normalized by the Nyquist frequency. Equations (2)–(14) define the Bi-spectral features.

Fig.4

Non-redundant region.
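Equation (1) can be estimated by replacing the expectation with an average over frames. The sketch below is one common direct (FFT-based) estimator, not necessarily the one used in this work; the frame length `nfft`, the Hamming window, the mean removal and the triangular region 0 ≤ f_2 ≤ f_1, f_1 + f_2 ≤ Nyquist taken for Ω are assumptions made for illustration.

```python
import numpy as np

def bispectrum(x, nfft=64, hop=None):
    """Direct bispectrum estimate, averaged over frames.

    Returns B[k1, k2] ~ E[X(k1) X(k2) X*(k1 + k2)] for 0 <= k1, k2 < nfft // 2.
    """
    hop = hop or nfft
    n_frames = 1 + (len(x) - nfft) // hop
    half = nfft // 2
    k = np.arange(half)
    sum_idx = k[:, None] + k[None, :]        # k1 + k2 <= nfft - 2, so no wrap-around
    win = np.hamming(nfft)
    B = np.zeros((half, half), dtype=complex)
    for i in range(n_frames):
        seg = x[i * hop : i * hop + nfft]
        X = np.fft.fft((seg - seg.mean()) * win)
        B += X[k][:, None] * X[k][None, :] * np.conj(X[sum_idx])
    return B / n_frames

def nonredundant_mask(half):
    """Assumed non-redundant region: 0 <= k2 <= k1 and k1 + k2 <= half (Nyquist bin)."""
    k = np.arange(half)
    k1, k2 = k[:, None], k[None, :]
    return (k2 <= k1) & (k1 + k2 <= half)
```

For a signal containing components at bins 10, 4 and 14 whose phases satisfy φ_3 = φ_1 + φ_2, the magnitude of `bispectrum(x)` restricted to `nonredundant_mask(32)` peaks at the bin pair (10, 4), which is the phase-coupling information that second-order spectra cannot show.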

The mean magnitude of the Bi-spectrum is given by Equation (2): $Amp = \frac{1}{n} \sum_{\Omega} |P(f_1, f_2)|$ (2) where n is the number of points in the region Ω [19]. The weighted center of Bi-spectrum (WCOB) is calculated using Equations (3)–(6): $g_{1m} = \frac{\sum_{\Omega} l \, P(l, m)}{\sum_{\Omega} P(l, m)}$ (3) $g_{2m} = \frac{\sum_{\Omega} m \, P(l, m)}{\sum_{\Omega} P(l, m)}$ (4) $g_{3m} = \frac{\sum_{\Omega} l \, |P(l, m)|}{\sum_{\Omega} |P(l, m)|}$ (5) $g_{4m} = \frac{\sum_{\Omega} m \, |P(l, m)|}{\sum_{\Omega} |P(l, m)|}$ (6) where l and m are the frequency bin indices within the region, g_{1m} and g_{2m} denote the WCOB, and g_{3m} and g_{4m} denote the WCOB computed from the absolute values of the Bi-spectrum [20].

Entropy describes the regularity or irregularity of the signal. In this study, the entropy features are computed using Equations (7)–(9) [20]. $E_1 = -\sum_{n} b_i \log b_i$, where $b_i = \frac{|P(f_1, f_2)|}{\sum_{\Omega} |P(f_1, f_2)|}$ (7) $E_2 = -\sum_{n} b_i \log b_i$, where $b_i = \frac{|P(f_1, f_2)|^{2}}{\sum_{\Omega} |P(f_1, f_2)|^{2}}$ (8) $E_3 = -\sum_{n} b_i \log b_i$, where $b_i = \frac{|P(f_1, f_2)|^{3}}{\sum_{\Omega} |P(f_1, f_2)|^{3}}$ (9)

The sum of the logarithmic amplitudes of the Bi-spectrum is given by: $T_1 = \sum_{\Omega} \log(|P(f_1, f_2)|)$ (10)

The sum of the logarithmic amplitudes of the diagonal elements of the Bi-spectrum is given by: $T_2 = \sum_{\Omega} \log(|P(f_d, f_d)|)$ (11)

The first- and second-order spectral moments of the amplitudes of the diagonal elements, and the remaining feature T_5, are given by: $T_3 = \sum_{d=1}^{N} d \, \log(|P(f_d, f_d)|)$ (12) $T_4 = \sum_{d=1}^{N} (d - T_3)^{2} \log(|P(f_d, f_d)|)$ (13) $T_5 = \sum_{\Omega} \sqrt{l^{2} + m^{2}} \, |P(l, m)|$ (14)

Bi-spectral features are obtained from the entire speech signal and from the voiced part of the speech signal. For each recording in the corpus, thirteen features are extracted from the entire speech signal and thirteen features from the voiced part, and the results are compared. The objective behind extracting only 13 features is to improve affective state recognition with a minimal feature set.
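As a rough sketch of how Equations (2)–(14) map to code, the function below computes the eleven real-valued features from a Bi-spectrum array `B` and a Boolean mask of Ω. It is illustrative only: the function name, the small constant `eps` added inside the logarithms and the choice of diagonal bins d ≥ 1 for T_2 to T_4 are assumptions. g_{1m} and g_{2m} are left out because Equations (3) and (4), as written, operate on the complex-valued P and therefore give complex results.

```python
import numpy as np

def bispectral_features(B, mask, eps=1e-12):
    """Real-valued features of Eqs. (2), (5)-(14) over the region given by mask."""
    l, m = np.nonzero(mask)          # frequency-bin indices (l, m) inside Omega
    mag = np.abs(B[l, m])

    def entropy(power):
        # Eqs. (7)-(9): entropy of the normalised |P|^power distribution.
        b = mag ** power
        b = b / b.sum()
        return -np.sum(b * np.log(b + eps))

    # Diagonal elements P(f_d, f_d) inside Omega, d = 1..N (d = 0 skipped, assumption).
    d = np.arange(1, B.shape[0])
    d = d[mask[d, d]]
    log_diag = np.log(np.abs(B[d, d]) + eps)
    t3 = np.sum(d * log_diag)

    return {
        "Amp": mag.mean(),                              # Eq. (2)
        "g3m": np.sum(l * mag) / np.sum(mag),           # Eq. (5)
        "g4m": np.sum(m * mag) / np.sum(mag),           # Eq. (6)
        "E1": entropy(1),                               # Eq. (7)
        "E2": entropy(2),                               # Eq. (8)
        "E3": entropy(3),                               # Eq. (9)
        "T1": np.sum(np.log(mag + eps)),                # Eq. (10)
        "T2": np.sum(log_diag),                         # Eq. (11)
        "T3": t3,                                       # Eq. (12)
        "T4": np.sum((d - t3) ** 2 * log_diag),         # Eq. (13)
        "T5": np.sum(np.sqrt(l ** 2 + m ** 2) * mag),   # Eq. (14)
    }
```

A quick sanity check is a flat Bi-spectrum (|P| = 1 everywhere in Ω): the mean magnitude is 1, T_1 is 0 and all three entropies equal log n, the maximum possible for n points.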

2.4 Classification

The classification is performed using Simple Logistic and K-Nearest Neighbor classifier.

2.4.1 Simple logistic

The Simple Logistic classifier yields better results when the signal-to-noise ratio is low. Simple Logistic regression models assume numerical input variables with a Gaussian distribution. The classifier performs regularization using a ridge estimator. It combines the coefficients into a regression function, which is transformed using the logistic function; the coefficients are learned during training and are then minimized in order to reduce the size of the model. Simple Logistic classifiers are symmetric multinomial logistic regression models and have a built-in attribute selection technique [5].

2.4.2 K-Nearest Neighbors (KNN)

The KNN classifier classifies instances based on the similarity between them; its output is easy to interpret and it requires little computational time. It is a widely used pattern recognition algorithm. It is a lazy learning algorithm in which the function is approximated locally [21].

The KNN algorithm can also be used to estimate continuous variables. One such implementation uses an inverse-distance-weighted average of the K nearest neighbors. The algorithm is as follows.

KNN computes the Euclidean distance between the target instance and the samples.

KNN orders the samples based on the calculated distances.

KNN chooses a heuristically optimal number K of nearest neighbors based on the calculation of the root mean square.

KNN calculates the inverse-distance-weighted average over the K nearest multivariate neighbors.
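The steps above can be sketched for the classification case as an inverse-distance-weighted vote. This is a hedged illustration, not the Weka implementation used in the experiments: the function name is hypothetical, K is passed in as a parameter instead of being selected heuristically, and the weighted average is replaced by a weighted vote over class labels.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3, eps=1e-12):
    """Classify x by an inverse-distance-weighted vote of its k nearest samples."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)

    # Step 1: Euclidean distance between the target instance and every sample.
    dists = np.sqrt(np.sum((X_train - x) ** 2, axis=1))

    # Step 2: order the samples by distance and keep the k nearest.
    nearest = np.argsort(dists)[:k]

    # Step 3: weight each neighbour by the inverse of its distance
    # (eps avoids division by zero for an exact match).
    weights = 1.0 / (dists[nearest] + eps)

    # Step 4: sum the weights per class and return the heaviest class.
    classes = np.unique(y_train)
    scores = np.array([weights[y_train[nearest] == c].sum() for c in classes])
    return classes[np.argmax(scores)]
```

With inverse-distance weighting, one very close neighbor can outweigh several distant ones: for training points at 0, 3 and 3.2 on a line with labels 1, 0 and 0, a query at 0.1 is assigned label 1 even though two of its three neighbors carry label 0.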

3. Experimental setup and evaluation metrics

Thirteen Bi-spectral features are derived from the voiced speech and from the entire speech signal. The thirteen features form the feature vector, so the input given to the classifier has dimension 1293 × 13. Recall, Precision, False Positive (FP) rate, F-measure, True Positive (TP) rate and Receiver Operating Characteristic (ROC) area are the performance metrics considered for evaluating the system.

Recall is the fraction of relevant instances that are retrieved out of the complete set of relevant instances. It indicates the recognition rate of the classification.

Precision, also called the positive predictive value (PPV), is the ratio of TP to the number of predicted positives. $Precision = \frac{TP}{\text{predicted positive}}$ (15)

The harmonic mean of recall and precision is called the F-measure. $F = 2 \cdot \frac{precision \cdot recall}{precision + recall}$ (16)

The FP rate is the probability that a negative instance is falsely predicted as positive. The TP rate is the probability that a positive instance is correctly predicted. The Receiver Operating Characteristic (ROC) curve is a plot of the TP rate against the FP rate at various thresholds.
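For reference, the rates described in words above can be written in the usual one-versus-rest form. The symbols FN (false negatives) and TN (true negatives) are notation introduced here for illustration; TP and FP denote the counts of true and false positives for the class in question.

```latex
\begin{align*}
\text{Recall} = \text{TP rate} &= \frac{TP}{TP + FN} \\
\text{FP rate} &= \frac{FP}{FP + TN}
\end{align*}
```

With these definitions the denominator of Equation (15), the number of predicted positives, is TP + FP.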

The ROC curve for the disgust state is given in Fig. 5. Since the curve lies towards the upper left corner, it indicates that the classifier has good predictive capability.

Fig.5

Example of ROC curve.

Classification is performed using the Weka toolkit. The analysis is done with five-fold cross-validation. The feature vectors computed for the voiced part of speech are labeled Feature Set-1 and those computed for the entire speech signal are labeled Feature Set-2.

4. Experimental results and analysis

First, the affective state is recognized with Feature Set-1 using the Simple Logistic and K-Nearest Neighbor classifiers. The performance achieved by each classifier is evaluated using the six performance metrics discussed in the previous section. To understand the effect of using only the voiced part of speech, Bi-spectral features are also extracted from the entire speech signal, i.e. without separating the voiced part (Feature Set-2). Results for Feature Set-1 and Feature Set-2 are computed individually and tabulated in Table 1.

Table 1
Performance Evaluation using proposed algorithm

Performance metrics	KNN, Feature Set-1	KNN, Feature Set-2	Simple Logistic, Feature Set-1	Simple Logistic, Feature Set-2
Recall	84.7%	23.0%	82.83%	28.2%
Precision	84.7%	22.9%	83.1%	22.7%
F-measure	84.7%	23.0%	82.8%	21.3%
TP rate	84.7%	23.0%	82.8%	28.2%
FP rate	3.1%	15.4%	3.4%	14.4%
ROC area	91.7%	53.4%	93.8%	62.6%

An improvement of 56.5 percentage points in recognition rate (recall) is achieved using Feature Set-1 (84.7% with KNN) compared to Feature Set-2 (28.2% with Simple Logistic). KNN gives the better performance with Feature Set-1, whereas Simple Logistic is better suited to Feature Set-2.

To understand the distribution of, and the confusion between, the affective states, the confusion matrix for the best result obtained with Feature Set-1 is given in Table 2 and that for Feature Set-2 in Table 3.

Table 2

Confusion matrix for Feature Set-1 using KNN Classifier

Affective state	a	b	c	d	e	f
a = anger	181	24	6	3	0	2
b = disgust	17	166	17	15	1	0
c = fear	4	13	184	15	0	0
d = happy	2	7	17	174	13	0
e = sad	0	2	0	9	190	15
f = surprise	0	1	0	0	12	203

Table 3

Confusion matrix for Feature Set-2 for Simple Logistic Classifier

Affective state	a	b	c	d	e	f
a = anger	129	10	9	9	26	33
b = disgust	74	4	7	17	67	47
c = fear	64	1	5	9	94	43
d = happy	112	2	16	13	31	39
e = sad	17	2	6	4	152	35
f = surprise	76	2	8	11	57	62

Table 2 shows that, of the six basic affective states, surprise achieved the highest recognition rate of 94.4%, while disgust was the least recognized state with a recognition rate of 76.9%.

Table 3 shows that, of the six basic affective states, sad achieved the highest recognition rate and disgust was the least recognized state. The results obtained using Feature Set-1 show a large improvement over those obtained with Feature Set-2. The change in the recognition rate of each affective state between Feature Set-1 and Feature Set-2 is shown in Fig. 6.

Fig.6

Bar graph showing performance comparison of affective states.
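The Simple Logistic, Feature Set-2 column of Table 1 can be reproduced from the counts in Table 3. The sketch below assumes that the rows of Table 3 are the actual states and the columns the predicted states, and that the summary figures are averages over the six states weighted by the number of samples per state; the function and variable names are illustrative.

```python
import numpy as np

# Table 3: rows = actual state, columns = predicted state (assumed orientation),
# in the order anger, disgust, fear, happy, sad, surprise.
TABLE_3 = np.array([
    [129, 10,  9,  9,  26, 33],
    [ 74,  4,  7, 17,  67, 47],
    [ 64,  1,  5,  9,  94, 43],
    [112,  2, 16, 13,  31, 39],
    [ 17,  2,  6,  4, 152, 35],
    [ 76,  2,  8, 11,  57, 62],
])

def weighted_metrics(cm):
    """Per-class recall plus support-weighted recall, precision, F-measure and FP rate."""
    cm = np.asarray(cm, dtype=float)
    support = cm.sum(axis=1)            # samples per actual state
    predicted = cm.sum(axis=0)          # samples per predicted state (must be non-zero)
    tp = np.diag(cm)
    fp = predicted - tp

    recall = tp / support                               # TP rate per state
    precision = tp / predicted                          # Eq. (15) per state
    f_measure = 2 * precision * recall / (precision + recall)   # Eq. (16) per state
    fp_rate = fp / (cm.sum() - support)                 # FP / (FP + TN) per state

    w = support / support.sum()                         # weight = share of samples
    return {
        "recall_per_class": recall,
        "recall": float(np.sum(w * recall)),
        "precision": float(np.sum(w * precision)),
        "f_measure": float(np.sum(w * f_measure)),
        "fp_rate": float(np.sum(w * fp_rate)),
    }
```

Under these assumptions `weighted_metrics(TABLE_3)` gives a recall of 28.2%, a precision of 22.7%, an F-measure of 21.3% and an FP rate of 14.4%, matching Table 1, and the per-state recall is highest for sad (152 of 216) and lowest for disgust (4 of 216).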

The recognition rate of each affective state is better with Feature Set-1 than with Feature Set-2. This implies that the Bi-spectral features extracted from the voiced part of speech are more capable of detecting the affective states than the features obtained from the entire speech signal.

5. Comparison with state of the art

The proposed model is compared in Table 4 with other affective state recognition techniques evaluated on the same speech corpus.

Table 4
Comparison of proposed work with previous works

Authors	Features extracted	Gender independent	Recognition rate (%)
S. Sahoo et al. [22]	MFCC	No	Male: 57, Female: 51
H. Fayek et al. [23]	LLD	Yes	60.53
J. Yan et al. [24]	INTERSPEECH-2010	Yes	76.23
S. Zhalehpour et al. [25]	MFCC and PLP	Yes	72.95
Gajsek et al. [26]	LLD, MFCC, HNR, ZCR	Yes	62.9
Proposed work	Bi-spectral	Yes	84.68

The proposed method shows an improvement in recognition rate of 8.46% – 27.6% over the existing works. Among the existing works, J. Yan et al. [24] achieved the highest recognition rate.

The performance obtained for each affective state using the proposed model is therefore compared with that of the model proposed by J. Yan et al. [24] in Fig. 7. The recognition rates for the disgust, happy, fear and surprise states differ between the two models. Higher order statistics (HOS) reveal more information about deviations from Gaussianity and about non-linearity, which cannot be obtained using second-order techniques. Thus, the proposed system can be used to improve the recognition rate of the disgust, happy, fear and surprise states.

Fig.7

Bar graph showing recognition rate comparison of each affective state for proposed model and J. Yan et al. [24].

6. Conclusion and future directions

An investigation of affective state detection on the eNTERFACE speech corpus, using Bi-spectral features from the entire speech signal and from voiced speech segments with KNN and Simple Logistic classifiers, is presented. Five-fold cross-validation is used for classification. Voiced speech with the KNN classifier obtained a higher recognition rate than with Simple Logistic. The recognition rate of the proposed model using features derived from voiced segments is 84.7%, an improvement of 8.46% – 27.6% over previous works on the same speech corpus. Thus, the method achieved a better recognition rate even with a compact set of features, which indicates the robustness of the Bi-spectral features.

This work can be applied to analyze the patient’s response to the treatment and to understand the client’s affective state during counseling. Based on this analysis, better treatment and counselling can be provided to the patients.

In future, speech can be combined with other modalities such as facial expression, galvanic skin response and skin temperature to detect the affective state. Adding Bi-coherence features, and extracting Bi-spectral and Bi-coherence features from glottal signals, may increase the system performance. Deep learning architectures can be applied for classification. Finally, this work can be evaluated on other speech corpora comprising other affective states such as boredom, trust and anticipation.

References

1. http://canwetalk.ca/about-mental-illness/factors-affectingmental-health/

2. J.V. Sloten, Verdonck, Nyssen and Haueisen, Influence of mental stress on heart rate and heart rate variability, International Federation for Medical and Biological Engineering Proceedings, 2008, pp. 1366–1369.

3. Bakker, Pechenizkiy and Sidorova, What's your current stress level? Detection of stress patterns from GSR sensor data, In Proceedings of ICDM, 2011, pp. 573–580.

4. C.H. Wu, J.C. Lin and W.L. Wei, Survey on audiovisual emotion recognition: Databases, features, and data fusion strategies, APSIPA Trans Sig Inf Process, 2014.

5. Lalitha, Mudupu, B.V. Nandyala and Munagala, Speech emotion recognition using DWT, in Proc Int Conf Comput Intell Comput Res Madurai, India, 2015, pp. 20–23.

6. Lalitha and Tripathi, Emotion detection using perceptual based speech features, 2016 IEEE Annual India Conference (INDICON), Bangalore, 2016, pp. 1–5.

7. Sonia, S.D. Peter and Poulose Jacob, Performance of different classifiers in speech recognition, International Journal of Research in Engineering and Technology 2(4) (2013), 590–597.

8. Senthil Raja and Dandapat, Speaker recognition under stressed condition, Int J Speech Technol 13 (2010), 141.

9. Hermansky and Morgan, Rasta processing of speech, IEEE Transactions on Speech and Audio Processing 2(4) (1994), 578–589.

10. Swain, Sahoo, Routray, Kabisatpathy and J.N. Kundu, Study of feature combination using HMM and SVM for multilingual Odiya speech emotion recognition, Int J Speech Technol 18(3) (2015), 1–7.

11. Schuller, Valstar, Eyben, Cowie and Pantic, AVEC 2012 - the continuous audio/visual emotion challenge, in Proc of Int Audio/Visual Emotion Challenge and Workshop (AVEC), ACM ICMI, 2012.

12. Metallinou, Wollmer, Katsamanis, Eyben, Schuller and Narayanan, Context-sensitive learning for enhanced audiovisual emotion classification, IEEE Trans Affective Comput (2012), 184–198.

13. Eyben, Petridis, Schuller and Pantic, Audiovisual vocal outburst classification in noisy acoustic conditions, ICASSP, 2012, pp. 5097–5100.

14. Sayedelahl, Araujo and M.S. Kamel, Audio-visual feature-decision level fusion for spontaneous emotion estimation in speech conversations, Int Conf Multimedia and Expo Workshops, 2013, pp. 1–6.

15. V.P. Rosas, Mihalcea and L.P. Morency, Multimodal sentiment analysis of Spanish online videos, IEEE Intell Syst, 2013, pp. 38–45.

16. Martin, Kotsia, Macq and Pitas, The eNTERFACE'05 Audio-Visual Emotion database, 22nd International Conference on Data Engineering Workshops (ICDEW'06), Atlanta, GA, USA, 2006, pp. 8–8.

17. Bachu, Kopparthi, Adapa and Barkana, Voiced/Unvoiced Decision for Speech Signals Based on Zero-Crossing Rate and Energy, Elleithy K. (eds), Advanced Techniques in Computing Sciences and Software Engineering, Springer, Dordrecht, 2010.

18. Muthuswamy, D.L. Sherman and N.V. Thakor, Higher-order spectral analysis of burst patterns in EEG, Biomedical Engineering, IEEE Transactions on 46 (1999), 92–99.

19. T.-T. Ng, S.-F. Chang and Sun, Blind detection of photomontage using higher order statistics, International Symposium on Circuits and Systems, 2004, IEEE, Vol. 685, 2004, pp. V-688–V-691.

20. Du, Dua, R.U. Acharya and C.K. Chua, Classification of epilepsy using high-order spectra features and principle component analysis, Journal of Medical Systems 36 (2012), 1731–1743.

21. R.K. Gowda, Nimbalker, Lavanya, Lalitha and Tripathi, Affective computing using speech processing for call centre applications, 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Udupi, 2017, pp. 766–771.

22. Sahoo and Routray, Emotion recognition from audiovisual data using rule based decision level fusion, 2016 IEEE Students' Technology Symposium (TechSym), Kharagpur, 2016, pp. 7–12.

23. Fayek, Lech and Cavedon, Towards Real-time Speech Emotion Recognition using Deep Neural Networks, in ICSPCS, Cairns, Australia, 2015, pp. 1–6.

24. Yan, Zheng, Xu, Lu, Li and Wang, Sparse kernel reduced-rank regression for bimodal emotion recognition from facial expression and speech, IEEE Transactions on Multimedia 18(7) (2016), 1319–1329.

25. Zhalehpour, Onder, Akhtar and Eroglu Erdem, BAUM-1: A spontaneous audio-visual face database of affective and mental states, IEEE Transactions on Affective Computing 8(3) (2017), 300–313.

26. Gajsek, Struc, Mihelic, et al., Multi-modal emotion recognition using canonical correlations and acoustic features, Pattern Recognition (ICPR), International Conference on IEEE, 2010, pp. 4133–4136.
