Fuzzy logic based associative classifier for slow learners prediction

Abstract

Education is a collective intelligence system in which a group of people, ranging from students to management, think and work together to achieve the institution’s goals. The primary goal of every institution is to accomplish excellent end-semester examination results. A good result is achieved through proper training given by the educators in response to the performance of students in the examinations. Training incurs cost, whereas students’ performance is unpredictable. Outlier analysis has been applied to education systems in recent decades to predict students’ uncertain behavior in learning activities, and the predictions are used to alert the education system. A Fuzzy Logic System (FLS) can handle such uncertainties in learning activities. The major issues that affect the accuracy of fuzzy-based outlier detection methods are fixing an appropriate membership function and validating the fuzzy rules before extracting outliers. To remedy these issues, the proposed Fuzzy Temporal Outlier Detection (FTOD) method detects outliers from mid-semester examination results using a fuzzy logic based associative classifier with optimal membership functions. The resultant outliers distinguish the slow learners from spurious-slow learners with higher accuracy than the existing FARIM and modified-FARIM algorithms. Thus, educators can provide cost-effective training to enrich the slow learners’ cognition so that they score high in end-semester examinations.

Keywords

Education system, outlier detection, fuzzy association rules, optimal membership function, associative classification, lift measure

1. Introduction

Collective Intelligence (CI) is a shared or group intelligence that emerges from the collaboration, coordinated efforts and competition of many individuals. CI can be used to solve many societal problems, ranging from conventional domains like Medicine and Education to newer business fields like Facebook and Trading. CI in the education system, which is achieved through the collective efforts of various stakeholders like students, educators and management, is used to attain the institutional goals and to compete with other institutions.

The fundamental goal of an education system is to achieve excellent examination results economically. The most important learning objects of the education system, like training and assessments, help the educators to understand how the learning behavior of the students affects the end-semester examination results [1, 2]. A few of the existing systems study the students’ performance in the end-semester examinations [1, 4], which is helpful for taking remedial measures. But “Prevention is better than cure”. Hence, the proposed method, Fuzzy Temporal Outlier Detection (FTOD), considers the students’ behavior and performance in mid-semester examinations rather than end-semester examinations.

Weak students fail in most of the courses and have to work hard to pass. Slow learners, in contrast, are the students who fail or pass at the border level in a few courses of all the mid-semester examinations and can be motivated to pass through proper training. The spurious-slow learners are the students who fail intentionally in a few courses of one or two mid-semester examinations because of their lethargic attitude. In the current trend, students do not take the mid-semester examinations as seriously as the end-semester examinations. Hence, a group of spurious-slow learners appears in the mid-semester examination results, which misleads the education system in achieving its goal. In any case, the educators have to provide ample training for the slow learners and the weak students to achieve good results. But the spurious-slow learners usually do not support this kind of training due to their attitude. Hence, to provide the training cost-effectively, the educators have to segregate the actual slow learners from the spurious-slow learners.

Outliers are the small group of observations (e.g., slow learners) which deviate from the major group of normal observations (e.g., average learners) in a data set and seem to belong to a different mechanism [5, 6]. The outliers are low-frequency patterns and may be Noise (e.g., spurious-slow learners) or Anomaly (e.g., actual slow learners), which are classified as weak and strong outliers respectively based on their deviation (outlier score) from the normal observations. “Usually, Noise possesses less outlier score than Anomaly [7]”.

In the current decade, there is an escalating demand among researchers for outlier analysis on temporal data like educational data to study the unexpected behavior of students in various learning activities. Education data are temporal, quantitative (numeric) and fuzzy in nature and have to be partitioned (categorized) before the mining process [8]. Such partitioning leads to uncertain values near the boundaries [9]. The uncertainty in the boundary values (e.g., the overlap of slow learners and average learners) that arises while partitioning numerical or quantitative data is handled realistically by a Fuzzy Logic System (FLS) [10, 11]. Hence, an FLS is helpful in segregating the slow learners from the average learners and spurious-slow learners at an almost clear-cut boundary.

Since an FLS adopts “IF-THEN” rules, it is an extension of conventional rule-based systems in which the antecedent and the consequent parts of the rules constitute a collection of fuzzy logic statements instead of real or categorical values. Association rules are the basic and foremost conventional rule-based system used to understand the relationship between the antecedent and the consequent [12]. Using Fuzzy Association Rules (FARs) to extract outliers from education data requires addressing the following issues:

Fixing appropriate Membership Function (MF) a priori, to handle the uncertainty in the boundary values while partitioning the numerical data into ranges before the mining process.

Arriving at the required predefined threshold values (which vary based on applications and data distribution) for the interesting measures like support and confidence to generate infrequent FARs.

Performing validation on enormously generated infrequent FARs before extracting outliers to achieve high classification accuracy.

Fixing appropriate outlier measure (varies from application to application) to extract outliers from the infrequent FARs.

FTOD makes use of the following special features to handle the issues listed above.

Based on the distribution of the marks scored by the students in each course, the proposed algorithm dynamically generates the optimal MFs, which are unique for each numerical attribute, i.e., for each course. To the best of our knowledge, this approach has not been suggested by other authors.

The threshold values for all the measures which are used to extract outliers are dynamically fixed as suggested by the authors [13].

To handle the temporal nature of educational data and to avoid unnecessary generation of invalid outliers (spurious-slow learners), FTOD makes use of the Train and Test approach [14]. That is, FTOD detects outliers in one snapshot of data (first mid-semester results, the training phase) and evaluates these outliers on the other snapshots of data (second and third mid-semester results, the testing phase).

To extract the outliers, FTOD makes use of the unexpected/surprising measure – Lift [15, 16].

We have compared the performance of FTOD with that of ‘FARIM’ [1] and our previous work ‘modified-FARIM’ [17] methods. These methods have adopted static and also single/common MF for fuzzification and the measure - ‘Rank’ for extracting the outliers. The experimental results prove that FTOD detects minimal, pre-validated and accurate outliers with higher classification accuracy than the FARIM and modified-FARIM methods.

The rest of the paper is organized in such a way that Section 2 deals with the motivation, survey and preliminaries for the proposed work. Section 3 discusses in detail the framework and algorithm of the proposed work. Section 4 details the experimental results of the proposed work against the existing methods. Finally, in Section 5 the proposed work is concluded with future research direction.

2. Motivation and preliminaries

This section details the motivation of the proposed method, the literature survey done, the problem formulation and the preliminaries required to implement the proposed method - FTOD.

2.1. Motivation

In the majority of autonomous institutions, students’ curricular performance is evaluated internally based on the mid-semester examination marks (25% to 50%) and externally based on the end-semester examination mark (50% to 75%). Among the three (or four) mid-semester examinations conducted, the institution considers the best two (or three) for the internal mark evaluation. Hence, students tend to attend only two (among three) or three (among four) mid-semester examinations with full effort and simply sit the other (perhaps for the sake of attendance) without proper preparation. This can be seen in Fig. 1, which is derived from the performance of a batch of students in a semester.

The students’ lethargic attitude makes them fail intentionally or score low marks in at least one or two mid-semester examinations (notice the third mid-semester results in Fig. 1). Thus, the failure list of the mid-semester examinations also includes average and fast learners (spurious-slow learners) in addition to the actual slow learners, which leads to misinterpretation of the slow learners’ list. Fig. 1 confirms this: a few of the spurious-slow learners have moved to the Average or Fast Learners group in the end-semester examination results, and the slow learners’ count has been reduced.

Fig.1

Mid-semester vs. end-semester performance.

The educator will arrange extra training for all the slow learners (inclusive of the spurious-slow learners) to improve the results of the end-semester examinations. This leads to unnecessary resource consumption and therefore added cost. Hence, differentiating the slow learners from spurious-slow learners is essential for educators to provide cost-effective training.

2.2. Literature survey

Education data is subjected to data mining techniques such as classification, clustering, association rule mining, sequential mining, etc., to extract knowledge for various stakeholders [18], and this has been extended to learning management systems by using Moodle data as a case study [19, 20]. The authors identified the relationship between the failed courses of the students using association rules [19] generated by the Apriori algorithm, a frequent pattern-based approach. Also, association rules based on an evolutionary algorithm were extracted to identify the relationship between the training offered in the classrooms and the corresponding assessments [20]. The physical manipulative environment [21], when used by the students during their learning activities, resulted in a better outcome than the virtual environment. It has been verified that the dynamical aspects which affect the behavior, cognition, and correctness of school students (who appear for the board examinations) are directly associated with the learning outcomes, i.e., the end-of-year examinations [22]. The effectiveness of collective intelligence has been demonstrated by adding collaborative learning features to the existing individual learning models among the students [23]. Cheng et al. [24] modified the existing MFH-SPAM, a sequential pattern mining algorithm, to characterize the learning behavior of the students. By doing so, they found that the modified method was able to identify important behavior-based patterns that were missed by the traditional method, which helped them characterize the students based on their learning behavior. Another set of authors developed a Discrimination Aware Classifier model [25] using association rules and a decision tree separately. Both classifiers were able to classify the educational data and produced a small set of classification rules with better understandability and improved accuracy.

2.2.1. Outlier detection using association rules

Association rule mining (Apriori algorithm) is the first and foremost frequent pattern based mining technique [12, 26]. Even though this method was introduced to study the correlation among frequently co-occurring items, its application soon moved towards the study of correlation among infrequently occurring items [2]. To prune the patterns from the frequent/infrequent patterns based on user interest, the method requires a set of measures called ‘interesting measures’ [15]. Unexpected temporal association rules have been generated by the frequent pattern-based approach to alert systems in the stock market [27]. In that work, the normal behavior of the objects in the stock market database is discovered using Temporal Association Rules (TARs). Then the relationship among the features of the TARs over time is identified using quasi-functional dependency, and finally, a predefined outlier pruning measure called ‘dependency degree’ [27] is used to extract the outliers. Also, the stock splits that occur during a particular period, which alert the stakeholders about their investment, are identified using TARs with ‘residual leverage’ as the outlier pruning measure [28]. Unexpected episodes that indicate adverse drug reactions in the medical domain are identified using TARs [29–31]. Preetha et al. proposed a non-parametric FP-Growth algorithm to detect outliers [32]. All the works mentioned above make use of association rules along with static predefined threshold values for the outlier pruning measures used to detect outliers.

2.2.2. Temporal outlier mining

Educational data is temporal because most of its information is valid for a particular time. Hence, most educational data mining processes adopt temporal mining techniques [1, 17]. Temporal mining is the technique used to discover knowledge from temporal data and sequence data, where the time and the order of events are taken into account respectively [27–31]. A TAR spans many intervals. Discovering all such rules is irrelevant and time-consuming. To solve this problem, the author of [14] adopted a ‘Train and Test approach’ to prune irrelevant rules while predicting heart disease.

2.2.3. Outlier detection using FLS on education data

Handling an engineering problem requires both objective knowledge (a mathematical model) and subjective knowledge (e.g., linguistic information), the latter of which cannot be quantified by traditional mathematics alone. These two forms of knowledge can be handled simultaneously by an FLS [33]. Educational data combine quantified (numeric) data like marks and placement scores with ranking (ordinal) data like results. Hence, to handle the boundary problem while partitioning the numeric data, the authors have adopted an FLS to generate exceptional TARs [1, 17]. A fuzzy-based Apriori algorithm called Fuzzy Apriori Rare Itemset Mining (FARIM) is used to detect fuzzy specific rare itemsets from education data to discover a learning problem, namely weak student identification [1]. The FARIM algorithm [1] converts the numerical data into a set of five fuzzy linguistic terms before performing the mining process. ‘Rank’ is proposed as the outlier pruning measure in FARIM for pruning the specific rare cases from the rare itemsets. This measure is calculated for all possible linguistic terms generated during the early stage of the algorithm itself, i.e., during the infrequent set generation, and is stored along with its corresponding itemset. This consumes a lot of memory space. In the modified-FARIM algorithm [17], the same outlier pruning measure ‘Rank’ is computed in the later phase of the algorithm, i.e., only for the rare itemsets that are pruned from the infrequent itemsets. By doing so, the authors have shown that modified-FARIM consumes less memory space. Both the FARIM [1] and modified-FARIM [17] algorithms adopted an Apriori-based FLS with a static Membership Function (MF) for categorizing the numerical data in the preprocessing step and Rank as the outlier pruning measure.

2.2.4. Existing optimal MFs generation methods

The objective of adopting optimal MFs to generate fuzzy rules is to improve accuracy. The optimization of MFs can be achieved by adopting heuristic methods, hybrid methods, fuzzy clustering algorithms, neural networks, genetic algorithms, etc. [34]. Ketata et al. adjusted the initial static MF to arrive at an optimal MF by merging the fuzzy sets of similar rules [35]. A fuzzy neural network method was adopted by Wu et al. [36] to generate the fuzzy rules automatically. In this method, without any predefined static MF, the fuzzy sets were chosen randomly, and the width of the fuzzy sets was later adjusted based on the output error of the existing MF. However, the method required several predefined parameters. Chen et al. [37] and Liao et al. [38] adopted a variant of the fuzzy c-means algorithm to generate MFs automatically. Even though the fuzzy c-means algorithm is unsupervised and can control the shape of the fuzzy sets, it is sensitive to noise or outliers. Also, Alcalá-Fdez et al. [39] arrived at the optimal MF by making use of a genetic learning method. Here the authors adjusted the peak value of each fuzzy set back and forth by a lateral displacement of −0.5 to +0.5 to achieve optimal fuzzy sets. All the above-discussed methods require initial fuzzy sets or a few predefined parameters to generate or adjust the final optimal MFs.

2.3. Problem formulation

To separate the slow learners from the spurious-slow learners and average learners, the proposed method FTOD adopts an FLS along with the associative classification method (a variant of association rule mining) with the train and test approach. Also, FTOD makes use of optimal MFs for categorizing the numerical data. FTOD aims at detecting the outliers by an FLS-based Apriori algorithm, using Lift as the outlier pruning measure. The features of FTOD and their advantages are:

(1) The minimum user-defined threshold value for the interesting measures.
    - Minimize user interaction.

(2) Arriving at dynamically computed threshold values for the interesting measures used to detect outliers.
    - No need for domain expert’s guidance.

(3) The FLS adopts dynamically generated unique optimal MFs for categorizing each numerical attribute.
    - No need to fix an initial static MF.
    - No need to have a single MF for all the numerical attributes.
    - Based on the distribution of data, the lower and upper boundary values of each MF will vary.
    - Able to classify the slow learners from average learners more accurately.

(4) Outlier detection using the train and test approach.
    - Reduce the irrelevant infrequent patterns.
    - Help in locating slow learners from spurious-slow learners.

Among these features, the 3rd (a unique feature of the proposed method - FTOD) and 4th features helped us in improving the classification accuracy of FTOD by eliminating the identification of spurious-slow learners and average learners as slow learners.

2.4. Preliminaries

This section describes the basic requirements like the dataset considered, the proposed MFs for the FLS and the interesting measures used to detect the outliers using FTOD method.

2.4.1. Dataset used

To evaluate the proposed approach, education data from Thiagarajar College of Engineering’s web portal TCENet [40], an automated academic process management system, is considered. The dataset used is the score details of third-year undergraduate students of the Computer Science and Engineering Department. It gives the score details of 120 students in 6 theory courses (Statistics and Graph Theory - SG, Databases Principles and Design - DB, Multi-core Architecture - MA, Computer Networks Principles - CN, Web Programming - WP, Software Design - SD) for three mid-semester examinations conducted at various points in a semester. To pass a mid-semester examination, a student should score 50% or above in all the courses. The three mid-semester examination results are considered as separate snapshots. The outliers detected from one snapshot of data are verified and validated against the outliers detected from the other snapshots of data using the train and test approach. The proposed algorithm has been tested on three different batches of students. A sample of the dataset is given in Table 1. The ‘RESULT’ field is considered as the target attribute.

Table 1
Sample education data set

R.No	SG	DB	MA	CN	WP	SD	RESULT
1	50	36	48	36	52	54	FAIL
2	58	52	44	36	54	48	FAIL
3	94	76	88	62	86	88	PASS
4	100	80	84	68	90	100	PASS

2.4.2. Measures used

Subjective measures like support and confidence [15] are used to evolve more interesting rules at the earlier stage of the data mining process [12, 41]. The threshold value for the ‘maximum Support’, which is used to prune the infrequent patterns at each level, is computed based on the Dynamic Minimum Support (DMS) and Collective Minimum Support (CMS) [13]. Let the antecedent and consequent parts of the rules be A and B respectively, and let D be the given set of transactions. Since FTOD makes use of the associative classification technique, the consequent part B of the rule is restricted to the target attribute. Then, as per the definition given in [12, 42], the fuzzy support and fuzzy confidence for the Fuzzy Class Association Rule (FCAR) A ⟶ B are calculated as in Equations (1) and (2) respectively.

$Support (A \to B) = \frac{\sum_{x_{t} \in D} \mu_{AB} (x_{t})}{| D |}$ (1)

$Confidence (A \to B) = \frac{\sum_{x_{t} \in D} \mu_{AB} (x_{t})}{\sum_{x_{t} \in D} \mu_{A} (x_{t})}$ (2)

where μ_A (x_t) represents the individual matching degree of the antecedent part of the rule among the transaction x_t and μ_AB (x_t) is the combined matching degree of the antecedent and consequent of the rule among the transaction x_t.
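Equations (1) and (2) can be illustrated with a minimal Python sketch. It assumes that the matching degree of a multi-item antecedent is the minimum of its item memberships (the min t-norm, which the text does not specify) and that the class consequent is crisp, so μ_AB equals μ_A when the class label matches and 0 otherwise. The function names are hypothetical. The data are the SG memberships of the four sample students in Table 3 with their RESULT from Table 1.

```python
from typing import Dict, List, Tuple

# A fuzzified transaction: membership degree per linguistic item plus a crisp class label.
Transaction = Tuple[Dict[str, float], str]


def matching_degree(items: Dict[str, float], antecedent: List[str]) -> float:
    """mu_A(x_t): minimum membership over the antecedent items (min t-norm, assumed)."""
    return min(items.get(item, 0.0) for item in antecedent)


def fuzzy_support(data: List[Transaction], antecedent: List[str], label: str) -> float:
    """Equation (1): sum of mu_AB over all transactions divided by |D|."""
    joint = sum(matching_degree(items, antecedent) for items, cls in data if cls == label)
    return joint / len(data)


def fuzzy_confidence(data: List[Transaction], antecedent: List[str], label: str) -> float:
    """Equation (2): sum of mu_AB divided by sum of mu_A."""
    joint = sum(matching_degree(items, antecedent) for items, cls in data if cls == label)
    cover = sum(matching_degree(items, antecedent) for items, _ in data)
    return joint / cover if cover > 0 else 0.0


# SG memberships from Table 3, RESULT from Table 1.
data: List[Transaction] = [
    ({"SG.VL": 1.0}, "FAIL"),
    ({"SG.VL": 0.846, "SG.L": 0.154}, "FAIL"),
    ({"SG.VH": 1.0}, "PASS"),
    ({"SG.VH": 1.0}, "PASS"),
]

support = fuzzy_support(data, ["SG.VL"], "FAIL")
confidence = fuzzy_confidence(data, ["SG.VL"], "FAIL")
print(round(support, 4), round(confidence, 4))
```

For the rule SG.VL ⟶ FAIL on these four rows, the numerator is 1 + 0.846, so the support is 1.846 / 4 and the confidence is 1 because every student with a non-zero SG.VL degree has failed.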

A few of the objective measures capture unexpectedness and are also actionable in certain cases [41]. Among the various objective measures, FTOD needs the Surprising or Unexpected measures [15, 41]. The most relevant and important objective measure used to prune the rare, unexpected or deviating rules from the infrequent FCARs is ‘Lift’. “Lift measures how many times more often the patterns occur together than expected if they were statistically independent. Also, lift is susceptible to noise in small databases. Rare patterns with low counts (low support) which perchance occur a few times (or only once) together can produce enormous lift values” [43]. This property, the enormous value for low-support and rarely occurring patterns, motivated us to use ‘Lift’ as the outlier pruning measure, because the outliers are infrequent (low support) and rare observations with a large deviation (high outlier score) in characteristics. The Lift value for the fuzzy rule A → B is given in Equation (3).

$Lift (A \to B) = \frac{P (AB)}{P (A) P (B)}$ (3)
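The behavior quoted above can be seen in a small Python sketch of Equation (3). The probabilities are hypothetical and chosen only to contrast a rare pattern with a common one. The function `lift` and its parameter names are illustrative and do not come from the source.

```python
import math


def lift(p_ab: float, p_a: float, p_b: float) -> float:
    """Equation (3): P(AB) / (P(A) * P(B)); values above 1 mean A and B co-occur more than chance."""
    if p_a <= 0 or p_b <= 0:
        raise ValueError("P(A) and P(B) must be positive")
    return p_ab / (p_a * p_b)


# Hypothetical rare pattern: A matches 2.5% of the students, all of whom fail,
# while only 10% of all students fail.
rare = lift(p_ab=0.025, p_a=0.025, p_b=0.10)

# Hypothetical common pattern: A matches half of the students and 6% match A and fail.
common = lift(p_ab=0.06, p_a=0.50, p_b=0.10)

print(rare, common)
assert math.isclose(rare, 10.0)
assert math.isclose(common, 1.2)
```

Because Confidence(A → B) = P(AB) / P(A), the same value can be obtained as Confidence(A → B) / P(B), which is convenient when support and confidence have already been computed for a rule.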

Since the outliers generated are in the form of FCARs, the two measures ‘Rule Coverage’ and ‘Rule Accuracy’ are used to assess the quality of the fuzzy rules [44]. Let N_covers be the number of transactions covered by the FCARs, N_correct be the number of transactions correctly classified by the FCARs and |D| be the total number of transactions in the database. Then the coverage and accuracy of the FCARs are defined as in Equations (4) and (5) respectively.

$Coverage (A \to B) = \frac{N_{covers}}{| D |}$ (4)

$Accuracy (A \to B) = \frac{N_{correct}}{N_{covers}}$ (5)
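Equations (4) and (5) can be sketched in Python as follows. The sketch assumes that a rule covers a transaction when every antecedent item has a non-zero membership degree. The text does not define coverage for fuzzy data, so this criterion and the function name are illustrative.

```python
from typing import Dict, List, Tuple

Transaction = Tuple[Dict[str, float], str]


def coverage_and_accuracy(
    data: List[Transaction], antecedent: List[str], label: str
) -> Tuple[float, float]:
    """Return (coverage, accuracy) of the rule antecedent -> label, as in Equations (4) and (5)."""
    # Assumption: a transaction is covered when all antecedent memberships are above zero.
    covered = [cls for items, cls in data if all(items.get(i, 0.0) > 0.0 for i in antecedent)]
    n_covers = len(covered)
    n_correct = sum(1 for cls in covered if cls == label)
    coverage = n_covers / len(data)
    accuracy = n_correct / n_covers if n_covers else 0.0
    return coverage, accuracy


# SG memberships of the sample students in Table 3, RESULT from Table 1.
data: List[Transaction] = [
    ({"SG.VL": 1.0}, "FAIL"),
    ({"SG.VL": 0.846, "SG.L": 0.154}, "FAIL"),
    ({"SG.VH": 1.0}, "PASS"),
    ({"SG.VH": 1.0}, "PASS"),
]

cov, acc = coverage_and_accuracy(data, ["SG.VL"], "FAIL")
print(cov, acc)
```

On these four rows the rule SG.VL ⟶ FAIL covers two transactions and classifies both correctly.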

2.4.3. Proposed MFs

The slow learners have to be correctly distinguished from the spurious-slow and average learners. For the correct classification of these categories, the boundary value plays a major role. This is achieved by choosing an appropriate MF for categorizing (fuzzification) the marks of each course.

The Education Database is subjected to fuzzification using triangular MFs with five linguistic terms {very low (VL), low (L), medium (M), high (H) and very high (VH)}. FTOD does not use a fixed and single MF for all the courses. Some courses may be quite easy, and the students will pass by scoring high marks. The students who have failed in such courses are the slow learners. But some courses will be tough for most of the students, and the failures in this case include average learners in addition to the slow learners. Due to variation in the difficulty of the courses and the attitude of the students towards the faculty who handle the course, the marks scored by the students will vary. Hence, using a common or fixed MF for fuzzification is not recommended for correct classification, and FTOD makes use of a separate MF for each course. The MF for each course is decided on the fly by scanning the minimum and maximum scores in the corresponding course. By this method, every course has its own optimal MF for fuzzification. The boundary values generated by the proposed method and the corresponding linguistic terms of each course are listed in Table 2, and a sample MF used by FTOD is given in Fig. 2.

Table 2
Boundary values for all courses

SUBJECT	VERY LOW	LOW	MEDIUM	HIGH	VERY HIGH
SG	[48.00–65.33]	[56.66–74.00]	[65.33–82.66]	[74.00–91.33]	[82.66–100]
DB	[36.00–56.00]	[46.00–66.00]	[56.00–76.00]	[66.00–86.00]	[76.00–96.00]
MA	[44.00–62.66]	[53.33–72.00]	[62.66–81.33]	[72.00–90.66]	[81.33–100]
CN	[36.00–55.33]	[45.66–65.00]	[55.33–74.66]	[65.00–84.33]	[74.66–94.00]
WP	[32.00–54.00]	[43.00–65.00]	[54.00–76.00]	[65.00–87.00]	[76.00–98.00]
SD	[48.00–65.33]	[56.66–74.00]	[65.33–82.66]	[74.00–91.33]	[82.66–100]

Fig.2

Sample MF generated for SG course.
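The boundary values in Table 2 are consistent with dividing the range between the minimum and maximum mark of a course into six equal steps and letting each of the five fuzzy sets span two consecutive steps. This rule is inferred from the table and is not stated in the text, so the following Python sketch should be read as an assumption. The function name `mf_boundaries` is hypothetical.

```python
from typing import Dict, Sequence, Tuple

TERMS = ("VL", "L", "M", "H", "VH")


def mf_boundaries(
    min_mark: float, max_mark: float, terms: Sequence[str] = TERMS
) -> Dict[str, Tuple[float, float]]:
    """Lower and upper boundary of each fuzzy set for one course.

    Assumed rule (inferred from Table 2): step = (max - min) / (number of terms + 1),
    and the i-th set spans [min + i * step, min + (i + 2) * step].
    """
    if max_mark <= min_mark:
        raise ValueError("max_mark must exceed min_mark")
    step = (max_mark - min_mark) / (len(terms) + 1)
    return {
        term: (min_mark + i * step, min_mark + (i + 2) * step)
        for i, term in enumerate(terms)
    }


# DB marks range from 36 to 96 and SG marks from 48 to 100 according to Table 2.
db = mf_boundaries(36, 96)
sg = mf_boundaries(48, 100)
for term in TERMS:
    print(term, db[term], tuple(round(v, 2) for v in sg[term]))
```

For DB this yields exactly the intervals listed in Table 2. For SG the computed values agree with the table up to the truncation of the second decimal (for example 56.666… is printed as 56.66 in the table).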

3. Experiment

The target outliers are the slow learners and not the spurious-slow learners. Slow learners are not only those who have failed in more than two courses but also those who have passed with border marks in more than two courses. The border mark scorers are at risk of failing in the end-semester examination. The FTOD algorithm identifies such students as outliers through the following steps:

Pre-processing

Optimal MF generation

Fuzzification

Outlier detection

Infrequent pattern generation

Infrequent FCARs generation

Outlier Pruning

Verification & Validation of Outliers

Post-pruning

Redundant outlier pruning

3.1. Framework for mining slow learners

The proposed method FTOD adopts a frequent pattern (Apriori algorithm) based associative classification technique to mine FCARs from the mid-semester examination results. These FCARs, in turn, are helpful in spotting the outliers (slow learners). FTOD makes use of dynamically generated unique triangular MFs for the fuzzification of each numerical attribute. An initial user-defined value is given as input to generate the threshold value for the subjective measure ‘Support’, which is used to prune the infrequent patterns in each level of infrequent pattern generation. Then the infrequent FCARs with high confidence (usually 0.9 and above) are generated from these infrequent patterns using the other subjective measure, ‘confidence’, along with the ‘support’ measure. Then the outliers are pruned from the FCARs using the outlier pruning measure ‘Lift’. The threshold for lift is computed dynamically. Similarly, outliers are pruned for the second mid-semester marks, and the common outliers are validated with the third mid-semester marks by following the train and test approach to detect the exact outliers. The framework used to detect outliers is depicted in Fig. 3.

3.2. Algorithm for mining slow learners

Fig.3

Architecture to mine temporal outliers.

As per the procedure given in Algorithm 1 and Subroutine 1, initially, based on the distribution of data for the numerical attributes, the optimal MFs are generated for each course separately. Next, the numerical data present in the entire data set (all the three mid-semester marks) is fuzzified using the dynamically generated optimal MFs. Thus, the real value of the courses is now converted into its corresponding fuzzy values. The real value of the data for the course ‘SG’ given in Table 1 is fuzzified and listed in Table 3 as a sample.

Table 3

Part of the fuzzified data set

SG VERY LOW	SG LOW	SG MEDIUM	SG HIGH	SG VERY HIGH
1	0	0	0	0
0.846	0.154	0	0	0
0	0	0	0	1
0	0	0	0	1
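The degrees in Table 3 can be reproduced with a small Python sketch under the following assumptions, all inferred from Tables 2 and 3 rather than stated in the text: the range between the minimum and maximum mark is split into six equal steps, each fuzzy set peaks at the midpoint of its interval, and the VL and VH sets are treated as shoulders that stay at 1 beyond their peaks. The function name `fuzzify` is hypothetical.

```python
from typing import Dict, Sequence

TERMS = ("VL", "L", "M", "H", "VH")


def fuzzify(
    mark: float, min_mark: float, max_mark: float, terms: Sequence[str] = TERMS
) -> Dict[str, float]:
    """Membership degree of a mark in each linguistic term of one course."""
    if max_mark <= min_mark:
        raise ValueError("max_mark must exceed min_mark")
    n = len(terms)
    step = (max_mark - min_mark) / (n + 1)
    # Peak of the i-th triangle sits one step above its lower boundary.
    peaks = [min_mark + (i + 1) * step for i in range(n)]
    degrees = {}
    for i, term in enumerate(terms):
        if i == 0 and mark <= peaks[0]:
            degree = 1.0  # left shoulder of the lowest set (assumed)
        elif i == n - 1 and mark >= peaks[-1]:
            degree = 1.0  # right shoulder of the highest set (assumed)
        else:
            degree = max(0.0, 1.0 - abs(mark - peaks[i]) / step)
        degrees[term] = degree
    return degrees


# SG marks of the four sample students in Table 1; SG ranges from 48 to 100 (Table 2).
sg_marks = [50, 58, 94, 100]
table3 = [{t: round(d, 3) for t, d in fuzzify(m, 48, 100).items()} for m in sg_marks]
for row in table3:
    print(row)
```

With these assumptions, the SG mark 58 receives the degrees 0.846 for VL and 0.154 for L, and the marks 50, 94 and 100 receive full membership in VL, VH and VH respectively, matching Table 3.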

The fuzzified dataset of first mid-semester marks (Training Data) is given as input for the outlier detection phase along with the initial support, to generate the infrequent patterns. Using the initial support the dynamic support threshold is computed in each level as suggested by the authors [13]. The infrequent patterns pruned in each level are consolidated to form the entire set and used to generate the infrequent FCARs. The threshold for the rule support is computed based on the mean value of the actual support of all the FCARs generated from the infrequent patterns. Then, the outliers are pruned from the infrequent FCARs by making use of the ‘Lift’ measure.
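The level-wise generation of infrequent patterns can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation. The dynamic threshold of [13] is not reproduced, so the hypothetical hook `max_support(k)` falls back to the initial support at every level. The support of an itemset is assumed to be the mean over transactions of the minimum item membership, and patterns with zero support are dropped, which is also an assumption. The class label is encoded as an item with degree 1.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Optional, Set


def itemset_support(data: List[Dict[str, float]], itemset: FrozenSet[str]) -> float:
    """Fuzzy support of an itemset: mean of the minimum membership per transaction (assumed)."""
    return sum(min(row.get(i, 0.0) for i in itemset) for row in data) / len(data)


def infrequent_patterns(
    data: List[Dict[str, float]],
    initial_support: float,
    max_support: Optional[Callable[[int], float]] = None,
) -> Set[FrozenSet[str]]:
    """Collect patterns whose support lies below the maximum support threshold at each level."""
    if max_support is None:
        # Placeholder for the dynamic threshold of [13]: constant initial support.
        def max_support(k: int) -> float:
            return initial_support

    items = sorted({i for row in data for i in row})
    level = {
        frozenset([i])
        for i in items
        if 0.0 < itemset_support(data, frozenset([i])) < initial_support
    }
    found = set(level)
    k = 2
    while level:
        # Join two (k-1)-patterns when their union has exactly k items.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        level = {c for c in candidates if 0.0 < itemset_support(data, c) < max_support(k)}
        found |= level
        k += 1
    return found


# SG memberships from Table 3 with the class label added as a crisp item.
data = [
    {"SG.VL": 1.0, "RESULT=FAIL": 1.0},
    {"SG.VL": 0.846, "SG.L": 0.154, "RESULT=FAIL": 1.0},
    {"SG.VH": 1.0, "RESULT=PASS": 1.0},
    {"SG.VH": 1.0, "RESULT=PASS": 1.0},
]
patterns = infrequent_patterns(data, initial_support=0.6)
for p in sorted(patterns, key=lambda s: (len(s), sorted(s))):
    print(sorted(p), round(itemset_support(data, p), 4))
```

The initial support of 0.6 is chosen only so that this tiny sample produces patterns at several levels. Patterns that contain a RESULT item are the ones from which class association rules can later be formed.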

Algorithm 1: Fuzzy Temporal Outlier Detection (FTOD)
Input: Education Database - D, Initial Support - S_i
Output: A set of Temporal Outliers – Outlier
Notations Used: $σ_{dynsup}^{R}$ - Max. Support, S_a - Actual Support, IFP - Infrequent Patterns,
C_k - Candidate patterns of size k, L_k - Infrequent patterns of size k,
Outlier_tr – Outliers pruned from training data, Outlier_ts – Outliers pruned from testing data.
Optimal MF Generation: /* Generation of dynamic MF */
1. for (i = course1; i != ∅; i++) do begin
2. Assign the lower and upper boundary of the MF by the minimum and maximum marks of the course.
3. Generate MF with five fuzzy sets {very low (VL), low (L), medium (M), high (H) and very high (VH)}.
4. end for;
Fuzzification: /* Converting numerical data into linguistic terms using the optimal MF generated */
5. D(SG, DB, MA, CN, WP, SD) ⟶ D^T(SG.VL, SG.L, SG.M, SG.H, SG.VH, …, SD.H, SD.VH, RESULT)
Infrequent Pattern Generation: /* using APRIORI algorithm */
6. C₁ = {SG.VL, SG.L, SG.M, SG.H, SG.VH, …, SD.H, SD.VH, RESULT};
7. L₁ = Set of all C₁ for which S_a(C₁) < S_i;
8. IFP = L₁;
9. for (k = 2; L_{k-1} != ∅; k++) do begin
10. C_k = candidate patterns generated from L_{k-1};
11. L_k = Set of all C_k for which S_a(C_k) < $σ_{dynsup}^{R}$;
12. IFP = IFP ∪ L_k;
13. end for;
14. return IFP;
Outlier Detection: /* Pruning low support patterns with high confidence and high value for outlier pruning measure */
15. Outlier_tr = Outlier_pruning(IFP);
Verification and Validation of Outliers: /* Generate Outlier_ts as per the steps 1…14 */
16. Outlier = Outlier_tr ∩ Outlier_ts;
Post Pruning: /* Redundant Outliers if any are eliminated */

Subroutine 1: Outlier_pruning
Input: Infrequent patterns - IFP
Output: A set of outliers (FCAR_pruned)
Notations Used: R_Conf – Rule Confidence, R_Sup – Rule Support, R_Lift – Rule Lift, FCAR – Fuzzy Class Association Rule,
IFP – Infrequent patterns, FCAR_pruned – Outliers, Supp_threshold – Max. Support,
Lift_threshold – Min. Lift.
Infrequent FCARs Generation: /* Generating fuzzy rules from infrequent patterns */
1. FCAR = ∅;
2. for all patterns in IFP do
3. if D^T(RESULT) is contained in the pattern then generate an FCAR of the form A ∧ B ⟶ CLASS; /* where A, B ∈ D^T(SG.VL, SG.L, SG.M, …, SD.H, SD.VH) and CLASS ∈ D^T(RESULT) */
4. end if;
5. calculate R_Sup, R_Conf and R_Lift for the FCAR;
6. FCAR = FCAR ∪ {the generated FCAR};
7. end for;
8. Supp_threshold = Mean(∑ R_Sup(FCAR));
9. Lift_threshold = Mean(∑ R_Lift(FCAR));
Outlier Pruning: /* Pruning outliers from FCARs */
10. FCAR_pruned = ∅;
11. for all rules in FCAR do
12. FCAR_pruned = FCAR_pruned ∪ FCAR (with R_Sup <= Supp_threshold, R_Lift >= Lift_threshold and R_Conf > 90%);
13. end for;
14. return FCAR_pruned;

The threshold value for the Lift measure is fixed as the mean of the actual Lift values of all the IFCARs. The outliers thus detected are validated against the outliers detected in the same manner from the testing data (second and third mid-semester marks). Only the outliers that are valid in both the training and the testing phases are retained. In the post-pruning phase, the IFCARs with a high confidence value and without duplication are taken as the final set of outliers.
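
The pruning and validation steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the `Rule` class, its field names and the helper names `prune_outliers` and `validate` are hypothetical, and the supports are assumed to have been computed already from the fuzzified data.

```python
from dataclasses import dataclass
from statistics import mean
from typing import FrozenSet, List


@dataclass(frozen=True)
class Rule:
    """A fuzzy class association rule, antecedent -> label (hypothetical layout)."""
    antecedent: FrozenSet[str]
    label: str
    sup_rule: float   # support of antecedent and class together (R_Sup)
    sup_ante: float   # support of the antecedent alone
    sup_class: float  # support of the class alone

    @property
    def confidence(self) -> float:
        return self.sup_rule / self.sup_ante

    @property
    def lift(self) -> float:
        return self.confidence / self.sup_class


def prune_outliers(rules: List[Rule], min_conf: float = 0.9) -> List[Rule]:
    """Keep rules with low support, high lift and confidence above min_conf."""
    if not rules:
        return []
    support_threshold = mean(r.sup_rule for r in rules)  # dynamic max. support
    lift_threshold = mean(r.lift for r in rules)          # dynamic min. lift
    return [
        r for r in rules
        if r.sup_rule <= support_threshold
        and r.lift >= lift_threshold
        and r.confidence > min_conf
    ]


def validate(train_outliers: List[Rule], test_outliers: List[Rule]) -> List[Rule]:
    """Keep training outliers whose antecedent and class also appear in testing."""
    test_keys = {(r.antecedent, r.label) for r in test_outliers}
    return [r for r in train_outliers if (r.antecedent, r.label) in test_keys]
```

Because both thresholds are means over the candidate rules themselves, no user-supplied value is needed for support or lift; only the 90% confidence bound from step 12 of Subroutine 1 appears as a parameter.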

The outliers generated by FTOD by nature have the highest Lift values (maximum outlier score) and maximum confidence (almost 100%), and hence do not require a user-defined threshold value for the confidence measure.

4. Results and discussion

The experimental setup for evaluation comprises a Pentium IV processor at 2.8 GHz, a CPU clock of 450 MHz, 512 MB of RAM and a 40 GB hard disk. The software used is jdk1.7.0 on Windows XP/Windows 7.

FTOD has been subjected to various experiments. First, the outliers detected by the FLS based FTOD are evaluated against the outliers detected by the crisp boundary method, to understand the importance of the FLS in handling the fuzzy nature of the boundary values while categorizing the numerical data. Next, the performance of FTOD is measured against existing similar fuzzy outlier detection methods, namely FARIM and modified-FARIM, in terms of time, space, scalability, classification accuracy and the interpretability of the outliers generated. The following subsections discuss these in detail.

4.1. Analysis through crisp and fuzzy boundaries

The fundamental algorithm used in the proposed method, FTOD, is the Apriori algorithm, tuned to evolve Class Association Rules (CARs) rather than conventional Association Rules (ARs). Since ARs are not suitable for classification, CARs are used for classifying the slow learners. The method used to generate CARs is called Associative Classification (AC). Hence, the Apriori algorithm is used to generate CARs by adopting both the crisp and the fuzzy based methods for categorizing the course (numerical) attributes.
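
As an illustration of the level-wise, Apriori-style generation of infrequent patterns on fuzzified data, a minimal Python sketch is given here. Several choices in it are assumptions and are not taken from the paper: the fuzzy support of a pattern is computed with the minimum t-norm, patterns with zero support are discarded, a pattern holds at most one fuzzy set per attribute, and pattern size is capped by a hypothetical `max_size` parameter. The dynamic maximum support is passed in as `max_sup` instead of being computed.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

Record = Dict[str, float]  # fuzzy item such as "CN.VL" -> membership degree


def fuzzy_support(itemset: Iterable[str], records: List[Record]) -> float:
    """Average over all records of the smallest membership among the items."""
    items = list(itemset)
    total = sum(min(rec.get(item, 0.0) for item in items) for rec in records)
    return total / len(records)


def infrequent_patterns(records: List[Record], initial_sup: float,
                        max_sup: float, max_size: int = 3) -> Set[FrozenSet[str]]:
    """Collect patterns whose fuzzy support is positive but below the threshold."""
    items = sorted({item for rec in records for item in rec})
    # Level 1: single items below the user-defined initial support
    level = {frozenset([i]) for i in items
             if 0.0 < fuzzy_support([i], records) < initial_sup}
    found = set(level)
    k = 2
    while level and k <= max_size:
        # Join step: unions of two (k-1)-patterns that give a k-pattern
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Assumption: at most one fuzzy set per attribute ("CN", "DB", "RESULT", ...)
        candidates = {c for c in candidates
                      if len({item.split(".")[0] for item in c}) == k}
        # Levels k >= 2 use the dynamically fixed maximum support
        level = {c for c in candidates
                 if 0.0 < fuzzy_support(c, records) < max_sup}
        found |= level
        k += 1
    return found
```

Patterns that contain a class item such as `RESULT.FAIL` are the ones from which class association rules can then be formed.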

The performance of the proposed method, FTOD, which adopts an FLS based AC, is evaluated against the conventional AC with crisp boundaries (discretization). The crisp and fuzzy based methods are compared on the following metrics: execution time, heap space used by the algorithm to detect the outliers, the number of outliers detected, the number of instances covered by the outliers (coverage) and the classification accuracy of the outliers detected. The experimental results obtained by both methods are tabulated in Table 4.

Table 4
Performances of crisp vs. fuzzy approaches

Parameters	Conventional AC	Proposed FTOD
1. Threshold value dynamically generated for Lift measure	0.325	0.94
2. Outlier Generated by Lift measure	15	6
3. Coverage (%)	8.1	7.55
4. Accuracy (%)	72.1	85.12
5. Heap space used to generate outliers (Megabytes)	10.323	60.628
6. Time used to generate outliers (seconds)	10,823,890	63,572,904

From the results in Table 4, it is understood that even though the time and the memory space consumed by FTOD to detect the outliers are about six times those of the crisp conventional AC method, the accuracy of the outliers generated by FTOD improves by 13 percentage points. The improvement in classification accuracy is due to the natural boundary handling property of the FLS. This can be seen from the threshold values dynamically fixed for the two methods. The threshold value of the Lift measure in FTOD is almost one, whereas that of the conventional AC is less than 0.5. As per the characteristics of ‘Lift’, a higher value, i.e., greater than one, indicates rarely occurring patterns [43]. Hence, the infrequent FCARs (IFCARs) with a Lift value greater than the FTOD threshold (0.94) are naturally outliers, whereas the IFCARs generated by the conventional AC with a Lift value greater than 0.325 include noise as well. That is why FTOD generates fewer outliers than the conventional AC and, as a consequence, its accuracy is higher. Thus, the boundary problem faced during the preprocessing step is shown to affect the quality of the outliers detected.
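
For reference, the Lift of a class association rule is commonly written as below. This is the standard textbook form; the symbols sup and conf are used here for illustration and may differ from the notation in the earlier sections of the paper.

```latex
\mathrm{lift}(A \rightarrow C)
  = \frac{\mathrm{conf}(A \rightarrow C)}{\mathrm{sup}(C)}
  = \frac{\mathrm{sup}(A \cup C)}{\mathrm{sup}(A)\,\mathrm{sup}(C)}
```

A value of 1 corresponds to statistical independence of the antecedent A and the class C, and values above 1 mean that A and C occur together more often than expected under independence. As a hypothetical example, a rule with confidence 0.9 for a class whose support is 0.3 has a Lift of 0.9 / 0.3 = 3.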

By nature, outliers are few in number and possess high outlier scores. This is demonstrated by the outliers detected by FTOD. From the results in Table 4 and the above discussion, it is clear how the FLS contributes to the improvement in classification accuracy.

4.2. Analysis through the performance of FTOD

FTOD makes use of Lift as the outlier pruning measure to detect outliers. The results of FTOD in classifying the slow learners are compared with those of existing similar outlier detection algorithms, FARIM [1] and modified-FARIM [17] (our previous work). Both FARIM and modified-FARIM use the same outlier pruning measure, ‘RANK’. The modified-FARIM adopts the same logic as FARIM except for the stage at which the outlier pruning measure is applied. In FARIM, the ‘RANK’ measure is used at an early stage of the algorithm, as discussed in Section 2.2.3, whereas modified-FARIM uses it at a later stage. All three algorithms, ‘FARIM’, ‘modified-FARIM’ and the proposed ‘FTOD’, use the FLS based Apriori method to detect the outliers using FCARs. These algorithms are evaluated on the education data [40], and the results are tabulated in Table 5.

Table 5
Results of proposed FTOD vs. existing similar algorithms

Parameters	FARIM	Modified-FARIM	Proposed - FTOD
1. User-defined thresholds	Min sup = 0.0	Initial Support = 0.2	Initial Support = 0.2
	Max sup = 0.2	Min Conf = 0.9
	Max rank = 0.2
	Min Conf = 0.9
2. Thresholds fixed dynamically	–	Max sup = 0.484	Max sup = 0.484
		Max Rank = 0.58	Min Lift = 9.4
3. Nature of MF used	•Static triangular MF	•Static triangular MF	•Dynamic triangular MF
	•Common for all courses	•Common for all courses	•Varies for each course
4. Boundary values fixed for MF	VL = 0–70	VL = 0–70	VL = 36.0–55.33
	L = 65–80	L = 65–80	L = 45.66–65.0
	M = 70–90	M = 70–90	M = 55.33–74.66
	H = 80–95	H = 80–95	H = 65.0–84.33
	VH = 90–100	VH = 90–100	VH = 74.66–94.0 (for CN course alone)
5. No. of infrequent Patterns generated	2,660	1,153	4,522
6. Heap space used to generate outliers	62.37 MB	8.01 MB	38.98 MB
7. Time used to generate outliers	24 Sec.	41 Sec.	22 Sec.
8. No. of outliers generated	10	5	14
9. Coverage (%)	6.49	14.93	7.45
10. Accuracy (%)	80	55.37	85.12
11. Nature of the Instances covered by the outliers	•Students failed in 3 courses with very low marks.	•Students failed in 3 to 4 courses with very low marks.	•Students failed in 3 and 4 courses with border marks.
	•Students passed with 50 to 66 marks in 3 courses.	•Students passed with 50 to 70 marks in 3 courses.	•Students passed with 50 to 60 marks in 3 to 5 courses.
12. Merits	•Moderate execution time.	•Least number of infrequent patterns.	•Least execution time
	•Moderate accuracy.	•Least memory usage.	•Highest classification accuracy.
13. Limitations	•Highest memory space usage. •Pre-defined threshold values for the measures.	•Highest execution time. •Least classification accuracy	•More number of infrequent pattern generation because of high threshold value dynamically fixed for the support measure.

Fig.4

Sample outliers detected.

The results in Table 5 show that FTOD consumes less heap space and less time than FARIM to generate the outliers; compared with modified-FARIM, it uses more heap space but less execution time. FARIM requires pre-defined thresholds for all the interestingness measures to detect outliers, whereas the proposed approach uses only the initial support as a pre-defined threshold. Even a minimum confidence is not required for FTOD, because the FCARs pruned by the Lift measure have almost 100% confidence by default. These results indicate that FTOD, with Lift as the outlier pruning measure, detects more appropriate outliers with less memory and less execution time than FARIM, and with less execution time than modified-FARIM.

Even though FTOD generates a larger number of infrequent patterns because of the dynamically computed threshold for the support measure, the suggested outlier pruning measure ‘Lift’ can prune irrelevant rules and noise from the wide set of infrequent rules generated. This demonstrates the efficiency of the Lift measure.

By examining the instances covered by the outliers, it is understood that both FARIM and modified-FARIM predict the failed students (with very low marks, i.e., weak students) as outliers, whereas the proposed FTOD method predicts the students who have failed with border marks (slow learners) as outliers. FARIM and modified-FARIM also predict the average learners (students who secured FIRST CLASS, 60% to 70%) as slow learners. In contrast, the outliers detected by FTOD predict only the students who have passed with border marks (between 50% and 60%) as slow learners, which is our goal. This is possible with FTOD because of the dynamic MFs, which are computed uniquely for each course.

Hence, it can be concluded that FTOD with ‘Lift’ as the outlier pruning measure can detect a wide set of outliers (slow learners) from the enormous number of infrequent patterns generated, with the least execution time.

4.3. Analysis through the outliers detected

The outliers detected by the FTOD, FARIM and modified-FARIM algorithms were closely examined, and it was found that the boundary values of the numerical attributes (courses) in the outliers vary. Hence, the interpretation of the rules also differs. A few samples of the common outliers detected by these algorithms are listed in Fig. 4. From the outliers listed, it can be noted that for the courses CN and DB, the students who have scored first class (>60%) are also classified as slow learners by FARIM and modified-FARIM, where ‘Rank’ is used as the outlier pruning measure with static MFs. In the case of the proposed FTOD algorithm, for the same courses, only those who have scored second class (<60%) are classified as slow learners. In the institution [40] considered for this case study, the courses CN and DB are considered difficult by the students. Hence, those who have passed in first class need not be considered slow learners; instead, they are considered average learners. For the course MA (which the students consider comparatively easy), however, those who have scored 65% are classified as slow learners by FTOD. This is possible because the optimal MFs are computed dynamically and separately for each course. It can also be noticed from the data set that the marks scored by the students in MA are better than those in CN and DB. This can be verified from Table 2 by noting the lowest boundary values fixed for these courses.

Also, the scalability of the outliers detected by FTOD is measured in terms of their coverage and classification accuracy. For this, data samples of various sizes are used for evaluation. From the graphs in Figs. 5 and 6, it is understood that the coverage of the outliers detected by FTOD keeps decreasing, whereas their accuracy keeps increasing. Since an outlier is rare in occurrence, its coverage is low. Hence, even though the outliers detected by FTOD are moderate in coverage, they classify the slow learners more accurately.

Fig.5

Coverage of the outliers detected.

Fig.6

The accuracy of the outliers detected.

To understand the extensibility of FTOD, it is also tested on the PIMA Indian Diabetes Dataset [45], to identify the patients with a pre-diabetic condition among those whose test outcome for diabetes is negative (no diabetes). The experiments have been carried out with various outlier pruning measures along with the ‘Lift’ measure and with two different MFs [46]. The results showed that the ‘Lift’ measure along with triangular MFs with five linguistic terms is suitable for classifying the pre-diabetic patients. Thus, pre-diabetic patients can be warned to take care of their health and protect themselves from diabetes in due course.

5. Conclusion

Fuzzy based Temporal Outlier Detection (FTOD), which adopts an Apriori-based associative classification technique, has been proposed in this work. FTOD detects temporal outliers from an education data set containing the students’ performance in three mid-semester examinations conducted at various periods of a semester. FTOD uses dynamically computed optimal triangular MFs for fuzzification. Each numerical attribute (course) has its own MF, which is the uniqueness of the proposed work. The rare or infrequent patterns are then generated from the fuzzified data using the train and test approach, with a dynamically computed threshold value for the ‘support’ measure. The infrequent FCARs with high confidence values are derived from the rare patterns. Finally, the infrequent FCARs with more deviation (high outlier score) are pruned as outliers using the ‘Lift’ measure. The threshold value for the Lift measure is computed dynamically from the actual Lift values of the infrequent FCARs.

From the experiments carried out, it is found that FTOD consumes moderate space and the least execution time, and detects more accurate outliers than the existing similar methods FARIM and modified-FARIM. Because fuzzy logic is used in categorizing the numerical attributes, the proposed method detects more accurate outliers than the conventional crisp (discretization) method. Hence, it is concluded that by adopting dynamic optimal MFs for fuzzification, a train and test approach based associative classification technique for infrequent FCAR generation, and the unexpectedness measure Lift for outlier pruning, FTOD distinguishes the slow learners from spurious-slow learners and average learners with 5% to 30% better classification accuracy than the FARIM and modified-FARIM algorithms.

Based on the literature survey and the observed experimental results, two issues faced by FTOD are suggested as future research directions. The first is to reduce the enormous number of infrequent patterns generated as a result of adopting fuzzy logic. The second is to improve the classification accuracy by adopting MFs other than the triangular MF.

Acknowledgments

The authors would like to extend their sincere and soulful thanks to their institution - Thiagarajar College of Engineering for providing the support and data for this research.

References

1. C.H. Weng, Mining fuzzy specific rare itemsets for education data, Knowledge-Based Systems 24(5) (2011), 697–708.
2. Romero, J.R. Romero, J.M. Luna and Ventura, Mining rare association rules from e-learning data, In Educational Data Mining, 2010.
3. Kaur, Singh and G.S. Josan, Classification and prediction based data mining algorithms to predict slow learners in the education sector, Procedia Computer Science 57 (2015), 500–508.
4. B.K. Baradwaj and Pal, Mining educational data to analyze students' performance, arXiv preprint arXiv:1201.3417, 2012.
5. D.M. Hawkins, Identification of outliers, London: Chapman and Hall, 1980, p. 11.
6. Han, Pei and Kamber, Data mining: Concepts and techniques, Elsevier, 2011.
7. C.C. Aggarwal, An Introduction to Outlier Analysis, In Outlier Analysis, Springer, Cham, 2017, pp. 1–34.
8. Srikant and Agrawal, Mining quantitative association rules in large relational tables, In ACM SIGMOD Record 25(2) (1996), 1–12.
9. C.M. Kuok, Fu and M.H. Wong, Mining fuzzy association rules in databases, ACM SIGMOD Record 27(1) (1998), 41–46.
10. W.H. Au and K.C. Chan, Mining fuzzy association rules in a bank-account database, IEEE Transactions on Fuzzy Systems 11(2) (2001), 238–248.
11. C.T. Dhanya and D.N. Kumar, Data mining for evolving fuzzy association rules for predicting monsoon rainfall of India, Journal of Intelligent Systems 18(3) (2009), 193–210.
12. Agrawal, Imielinski and Swami, Mining association rules between sets of items in large databases, In ACM SIGMOD Record 22(2) (1993), 207–216.
13. C.K. Selvi and Tamilarasi, Mining association rules with dynamic and collective support thresholds, International Journal of Engineering and Technology 1(3) (2009), 236–240.
14. Ordonez, Association rule discovery with the train and test approach for heart disease prediction, IEEE Transactions on Information Technology in Biomedicine 10(2) (2006), 334–343.
15. Geng and H.J. Hamilton, Interestingness measures for data mining: A survey, ACM Computing Surveys (CSUR) 38(3) (2006), 1–32.
16. Verma, S.D. Khan, Maiti and O.B. Krishna, Identifying patterns of safety related incidents in a steel plant using association rule mining of incident investigation reports, Safety Science 70 (2014), 89–98.
17. A.M. Rajeswari, Sridevi and Deisy, Outliers detection on educational data using fuzzy association rule mining, In Proceedings of International Conference on Advanced in Computer Communication and Information Science, 2014, pp. 1–9.
18. Romero, Ventura, Pechenizkiy and R.S. Baker (eds.), Handbook of educational data mining, CRC Press, 2010.
19. Romero, Ventura and Garcia, Data mining in course management systems: Moodle case study and tutorial, Computers and Education 51(1) (2008), 368–384.
20. J.M. Luna, Romero, J.R. Romero and Ventura, An evolutionary algorithm for the discovery of rare class association rules in learning management systems, Applied Intelligence 42(3) (2015), 501–513.
21. Bumbacher, Salehi, Wierzchula and Blikstein, Learning environments and inquiry behaviors in science inquiry learning: How their interplay affects the development of conceptual understanding in physics, International Educational Data Mining Society (2015).
22. M.O.Z. San Pedro, E.L. Snow, R.S. Baker, D.S. McNamara and N.T. Heffernan, Exploring dynamical assessments of affect, behavior, and cognition and math state test achievement, International Educational Data Mining Society (2015).
23. J.K. Olsen, Aleven and Rummel, Predicting student performance in a collaborative learning environment, International Educational Data Mining Society (2015).
24. Ye, J.R. Segedy, J.S. Kinnebrew and Biswas, Learning behavior characterization with multi-feature, hierarchical activity sequences, International Educational Data Mining Society (2015).
25. Luo, Koprinska and Liu, Discrimination-aware classifiers for student performance prediction, International Educational Data Mining Society (2015).
26. Agrawal and Srikant, Fast Algorithms for Mining Association Rules in Large Databases, In Proc 20th Int Conf Very Large Data Bases, VLDB, 1994, pp. 478–499.
27. Bruno and Garza, TOD: Temporal outlier detection by using quasi-functional temporal dependencies, Data and Knowledge Engineering 69(6) (2010), 619–639.
28. A.M. Rajeswari, Deisy, Abirami Nachammai and G.V. Aishwarya, Temporal Outlier Detection on Quantitative Data using Unexpectedness Measure, In Intelligent Systems Design and Applications (ISDA), 2012, pp. 420–424.
29. Jin, Chen, He, G.J. Williams, Kelman and C.M. O'Keefe, Mining unexpected temporal associations: Applications in detecting adverse drug reactions, IEEE Transactions on Information Technology in Biomedicine 12(4) (2008), 488–500.
30. Jin, Chen, He, Kelman and C.M. O'Keefe, Signaling potential adverse drug reactions from administrative health databases, IEEE Transactions on Knowledge and Data Engineering 22(6) (2010), 839–853.
31. Ji, Ying, Dews, Mansour, Tran, R.E. Miller and R.M. Massanari, A potential causal association mining algorithm for screening adverse drug reactions in postmarketing surveillance, IEEE Transactions on Information Technology in Biomedicine 15(3) (2011), 428–437.
32. Preetha and Radha, Enhanced outlier detection method using association rule mining technique, International Journal of Computer Applications 42(7) (2012), 1–6.
33. K.H. Lee, An extension of association rules using fuzzy sets, IFSA, 1997.
34. M.E. Cintra, H.A. Camargo and M.C. Monard, A study on techniques for the automatic generation of membership functions for pattern recognition, In Congresso da Academia Trinacional de Ciencias (C3N), 1, 2008, pp. 1–10.
35. Ketata, Bellaaj, Chtourou and M.B. Amer, Adjustment of membership functions, generation and reduction of fuzzy rule base from numerical data, Malaysian Journal of Computer Science 20(2) (2007), 147–169.
36. Wu, M.J. Er and Gao, A fast approach for automatic of fuzzy rules by generalized dynamic fuzzy networks, IEEE Transactions on Fuzzy Systems 9(4) 594.
37. M.S. Chen and S.W. Wang, Fuzzy clustering analysis for optimizing fuzzy membership functions, Fuzzy Sets and Systems 103(2) (1999), 239–254.
38. T.W. Liao, A.K. Celmins and R.J. Hammell II, A fuzzy c-means variant for the generation of fuzzy term sets, Fuzzy Sets and Systems 135(2) (2003), 241–257.
39. Alcala-Fdez, Alcala, M.J. Gacto and Herrera, Learning the membership function contexts for mining fuzzy association rules by using genetic algorithms, Fuzzy Sets and Systems 160(7) (2009), 905–921.
40. TCE, 2016.
41. McGarry, A survey of interestingness measures for knowledge discovery, The Knowledge Engineering Review 20(1) (2005), 39–61.
42. Alcala-Fdez, Alcala and Herrera, A fuzzy association rule-based classification model for high-dimensional problems with genetic rule selection and lateral tuning, IEEE Transactions on Fuzzy Systems 19(5) (2011), 857–872.
43. Hahsler, A probabilistic comparison of commonly used interest measures for association rules, Available online: http://michael.hahsler.net/research/association_rules/measures.html, 2015.
44. Han, Pei and Kamber, Data mining: Concepts and techniques, 3rd ed., Elsevier, 2012, p. 356.
45. Kaggle, https://www.kaggle.com/uciml/pima-indians-diabetes-database/data, 2018.
46. A.M. Rajeswari, Sumaiya Sidhika, Kalaivani and Deisy, Prediction of Prediabetes using Fuzzy Logic based Association Classification, International Conference on Inventive Communication and Computational Technologies, 2018, pp. 780–785.
