Abstract
This paper presents a novel approach to post-processing of association rules based on the idea of meta-learning. A subsequent association rule mining step is applied to the results of “standard” association rule mining. We thus obtain “rules about rules”, which can help us better understand the association rules generated in the first step. We define various types of such meta-rules and report some experiments on benchmark data from the UCI Machine Learning Repository as well as on data from atherosclerosis risk domain. When evaluating the proposed method, we use the LISp-Miner system.
Introduction
Concept description is one of the typical data mining tasks. According to CRISP-DM methodology concept description “…aims at understandable descriptions of concepts or classes” [10]. Concept description is thus similar to classification in the sense that there are predefined classes (given by the values of the target attribute) we are interested in. But unlike classification, concept description focuses on understandability, not on classification accuracy of the discovered knowledge. Therefore association rules, decision rules or decision trees are preferred tools to model these concepts.
In our paper, we focus on concept description using association rules. Association rules have been proposed by R. Agrawal in the early 1990s as a tool for the so-called market basket analysis [1]. An association rule has the form of an implication
where
says that customers who buy products
This idea of association rules can be applied to any data in a tabular, attribute-value form. Data describing values of attributes can be analyzed in order to find associations between conjunctions of attribute-value pairs (categories). Let us denote these conjunctions as Ant (antecedent) and Suc(succedent), and rewrite the association rule as
When using association rules for concept description, Suc will be a category of the target attribute.
We can again characterize the “strength” of an association rule by support and confidence. Now support is the estimate of the probability
In association rule discovery, the task is to find all syntactically correct rules
There are a number of algorithms capable of performing this task. The probably best-known algorithm apripri proceeds in two steps. All frequent itemsets are found in the first step during breadth-first search in the space of all frequent itemsets. This search is controlled using the so-called downward closure property: if a pattern of length
The search space of all possible itemsets (or conjunctions of categories) can be very large. For
The main problem when using association rules for data mining is their interpretation. Usually we end up with a huge number of associations and each of them might be interesting for the domain expert or end-user. So a kind of automatic support for the interpretation in the form of post-processing association rules would be of great help. We present some ideas in this direction and show their experimental evaluation using LISp-Miner, a data mining toolbox for mining different types of rules, which is under development at the University of Economics, Prague [19, 21].
The rest of this paper is organized as follows. Section 2 reviews work related to the problem of post-processing of association rules, Section 3 gives an overview of the GUHA method and the 4ft rules, Section 4 introduces the concept of association meta-rules and describes how they can be obtained using the 4ft-Miner procedure, Section 5 summarizes the experimental evaluation of the proposed approach on some benchmark datasets from the UCI Machine Learning Repository, Section 6 shows the experimental evaluation of the proposed approach on data from atherosclerosis risk domain, and Section 7 concludes the paper. This paper is an extended version of a paper presented at the IDA 2013 Conference [7].
Related work
Various approaches have been proposed in the past to post-process a large list of found associations. Baesens et al. [4] distinguish four types of such post-processing: pruning, where redundant rules are identified and removed from the rule list; summarization, where rules are summarized into more general or abstract concepts that are easier to understand by the user; grouping, where rules are grouped together on various levels of abstraction; and visualization. As visualization, together with filtering or selection, is a rather standard option in most systems, we will not discuss this approach in this paper. The other types of post-processing can be carried out either based on the found association rules only or using some domain knowledge.
An application of deduction rules to post-process the results of the GUHA method is described in [18]. A deduction rule has the form
where
saying that if
saying that if all examples satisfying
A similar idea, but applied to “Agrawal-like” association rules, can be found in the work of Jorge et al. [15]. The authors define a number of operators that can transform a rule into a set of rules. Antecedent generalization or antecedent least general generalization are examples of operators that can be applied to association rules created to describe a concept. In the former case, a rule
Toivonen et al. understand redundancy of rules on the semantic level, i.e., as rules that cover the same examples, and they define a rule cover as a minimal set of rules that covers all examples [23]. This rule cover can thus be understood as the reduced set of rules presented to the user. To find the rule cover, they propose a method very similar to the set covering algorithm for creating decision rules. The standard set covering algorithm works in an iterative way: in each iteration it creates a rule and removes covered examples from the data. The process is terminated when there are no uncovered examples. Here, the found association rules are first sorted in a decreasing order of supports. Then, in each iteration a rule (with highest support) is moved to the next rule and examples covered by this rule are removed from the data; this process is again terminated when there are no examples to be covered by a rule not yet added to the rule cover. This paper also describes clustering of association rules that have the same succedent; the distance between two rules is again defined semantically, i.e., as the number of examples covered by only one of the rules.
KEX is an algorithm that learns rules in the form
Both semantical (i.e., based on number of instances covered by a rule) and syntactical (i.e., based on the lists of attribute-value pairs that occur in the rules) clustering of association rules can be found in [21]. A similarity of association rules is defined there as
where
Zaki [25] proposes a method that only mines for non-redundant association rules. He defines a rule
A set of reduction techniques for redundant rules was proposed and implemented by Ashrafi et al. [3]. Their techniques are based on the generalization/specification of the antecedent/succedent of the rules. The authors distinguish between redundant rules with a fixed antecedent and redundant rules with a fixed succedent. A rule
There is a third possibility, namely, post-processing the rules using some domain knowledge, usually in the form of an ontology. So, e.g., An et al. use expert-supplied taxonomy (is-a hierarchy) of items for clustering the discovered association rules with respect to the taxonomic similarity. The taxonomy has a form of a tree where upper-level nodes represent generalizations of their children. Both leaf and non-leaf nodes of this taxonomy can be present in the antecedent and succedent of a rule. Each node of this taxonomy is associated with a pair of numbers (horizontal position, vertical position) representing the relative position of the node in the taxonomy tree. A rule is then represented by the average value of these two numbers computed for nodes from antecedent and by the average value of these two numbers computed for nodes from the succedent. This representation is then used during clustering [2]. Domingues and Rezende iteratively scan the itemset rules and update a taxonomy that is then used to generalize the association mining results. A rule generalization is done by substituting an item in the rule (in antecedent or succedent) by its parent w.r.t. to the taxonomy tree. The contingency table is updated accordingly. Repeatedly occurring rules are pruned out [11]. Marinica and Guillet propose an interactive post-processing approach to pruning and filtering the discovered rules. Their approach consists of two main parts: creating a knowledge base and post-processing. The knowledge base consists of an ontology (which expresses general knowledge about the domain); and the post-processing task iteratively applies a set of filters (based on the knowledge base) to the extracted rules in order to extract interesting rules [17].
GUHA method and LISp-Miner
Basic principles of the GUHA Method
GUHA is an original Czech method of exploratory data analysis which originated in the 1960s. Its principle is to offer all interesting facts following from the given data with regard to the given problem. The book [13] introduces the main principles of this method, a general theory of mechanized hypothesis formation based on mathematical logic and statistics. Hypotheses defined and studied in that book are relations between two conjunctions derived from values of attributes (columns) of an analyzed data table. Various types of relations between these conjunctions are used, including relations corresponding to statistical hypothesis testing. The original GUHA procedure ASSOC understands the knowledge pattern (called hypothesis) as the expression
where Ant (called antecedent), Suc (called succedent) and Cond (called condition) are conjunctions of literals and
The relation
Fourfold contingency table
Fourfold contingency table
The basic idea of the GUHA method is to find all hypotheses (knowledge patterns) that are true in the analyzed data in the above-described sense. A GUHA procedure generates (in a top-down way) each particular pattern and tests if it is true in the analyzed data, again in the above-mentioned sense. The output of the procedure consists of all such patterns that do not immediately follow from other, simpler output patterns.
Since the 1960s, the GUHA method has been implemented several times on different computer platforms. We will describe the GUHA procedures which are currently implemented in the LISp-Miner system in the next Section. These procedures differ in the types of knowledge patterns that are produced on the output.
LISp-Miner, a freely available system that has, since 1996, been developed at the University of Economics, Prague, implements various GUHA procedures that mine for different types of (mostly rule-like) knowledge patterns.
LISp-Miner is oriented on work with categorical attributes. We can distinguish between binary attributes (that can have only two values, e.g., the attribute “gender”), nominal attributes (that can have more than two values and the values are not ordered, e.g., the attribute “color”), and ordinal attributes (that can have more than two values but their values are ordered, e.g., the attributes “education” or “military rank”). Numeric attributes must be discretized (turned into intervals) in advance before running any of the mining procedures. The data preprocessing module LMDataSource can be used to do this – the module offers an equidistant discretization (the created intervals have the same width values), equifrequent discretization (the created intervals contain roughly the same numbers of examples) or discretization where the intervals are defined by the user.
By now, LISp-Miner consists of 10 data mining procedures: 4ft-Miner (derived from the original GUHA procedure ASSOC), SD4ft-Miner, AC4ft-Miner, KL-Miner, CF-Miner, SDKL-Miner, SDCF-Miner, KEX, ETree-Miner, and MCluster-Miner. The 4ft-Miner is based on the original GUHA procedure ASSOC from the 1960s, the other procedures have been designed during the development of the LISp-Miner system. Most of the procedures mine for various types of rule-like patterns – this makes LISp-Miner more focused on a particular type of models than standard data mining tools. The core of the LISp-Miner system includes the procedures mentioned above. Each procedure is realized using two modules: the data processing module Task (e.g., 4ft-Task) is used to run the analysis; and the data interpretation module Results (e.g., 4ft-Results) is used to display, sort or select the resulting rules. In addition, LISp-Miner also implements (in the DataSource module) a variety of data transformation and data preprocessing methods that can be used to select attributes for the specific data mining tasks, create derived attributes, or discretize numeric attributes [22]. There is also one administration module, LMAdmin, which assigns data and meta-data to a given task (see Fig. 1 for a scheme of LISp-Miner). The concept of meta-data allows (like in RapidMiner, IBM SPSS Modeler or SAS EM) the user to store and reuse inputs for the tasks, as well as the results obtained during the analysis. Both data and meta-data are stored using a database (MS Access or MySQL are usually used).
Scheme of the LISp-Miner system.
In our study, we will focus only on the 4ft-Miner procedure. An interested reader should refer to [20, 22] for more information about the other LISp-Miner procedures. The 4ft-Miner procedure mines for knowledge patterns that can be understood as 4ft association rules (4ft rules for short) in the form
where Ant (antecedent), Suc (succedent) and Cond (condition) are cedents and
Implemented 4ft quantifiers
Table 2 shows the implemented quantifiers (types of 4ft rules). This table also shows the relation that the quantifiers must fulfill to consider the 4ft rule to be true. Here
Unlike in the original GUHA procedure ASSOC, where Ant, Suc and Cond were conjunctions of literals, Ant, Suc and Cond in 4ft-Miner are conjunctions of partial cedents, where each partial cedent is a conjunction or disjunction of literals and a literal is defined as
one category; this is simply a single value of an attribute a subset of a given length (e.g., city(London, Paris) is a literal that contains a subset of length 2), an interval of a given length (e.g., age (10–20, 20–30) or age (0–10, 10–20) are intervals of length 2), a cyclic interval (e.g., if values of age are 0–10, 10–20, 20–30 and 30–40, then age (30–40, 0–10, 10–20) is a cyclic interval), a cut, i.e., an interval of a given length that contains the boundary value (e.g., age (0–10, 10–20) is a cut of length 2 but age (10–20, 20–30) is not a cut), a left cut, a cut that contains the first value (e.g., age (0–10, 10–20)), a right cut, a cut that contains the last value (e.g., age (30–40) or age (20–30, 30–40)).
While subsets can be created for all attributes, creating intervals and cuts only makes sense for ordinal attributes.
When comparing this notion of association rules (we will call them 4ft rules) with the “standard” understanding, we will find that:
a 4ft rule not only consists of an antecedent Ant and a succedent Suc but can also contain a condition Cond; this condition is generated during the rule learning process as well. 4ft rules offer a more expressive syntax for Ant, Suc and eventually Cond. If, e.g., the analyzed data contain attribute
4ft rules offer more types of relations between Ant and Suc, as shown in Table 2; we can search not only for implications (based on standard definitions of support and confidence of a rule), but also for equivalences or statistically based relations.
When generating a rule, the system starts to generate an antecedent (as a conjunction of cedents) and then generates all possible succedents to this antecedent. If the condition should occur in the rules as well, it is generated after fixing the antecedent and succedent parts of the rule. The generating of conjunctions proceeds in a depth-first way. The user-given parameters are:
the definition of partial cedents for Ant, Suc and Cond respectively; each partial cedent is defied by the type of formula (conjunction/disjunction), min. and max. number of literals, and a list of possible literals (each literal is defined by the corresponding attribute, type of the coefficient, minimal and maximal length of the coefficient, specification if the literal should be used without negation, with negation or both with/without negation and specification if the literal is basic – i.e., if it must occur in the cedent or if the literal is the so-called remaining one), the maximal length of Ant, Suc and Cond, a type of relation other task parameters, e.g., how to handle missing values.
To simplify the parameter setting, reasonable default values are assigned to these parameters. So the default settings for 4ft-Miner correspond to the standard association rules (the quantifier is set to a founded implication with parameter
When using 4ft-Miner for concept description, we should prefer the relations expressing implications with the meaning “IF examples have some values of input attributes, THEN they belong to concept C”. So we should choose from among the so-called implicational quantifiers. A quantifier is implicational iff
is true for all fourfold contingency tables (
When looking at the specifications for Ant, Suc and Cond, we get:
Cond need not be specified
Suc should be the concept specification, i.e., the corresponding value of the target attribute we should decide about the complexity of Ant; here the simplest case is to use only distinct values of input attributes
The experiments described in this paper were carried out using the founded implication quantifier and using only positive literals with a coefficient of length 1. Figure 2 shows a screenshot of the input pane with the input parameters for the 4ft-Miner procedure running on the Monk1 data. Here Ant of the generated rules consists of a conjunction of attribute-value pairs (categories) created from all available input attributes (the length of the conjunction is not restricted), Suc is set to the target concept Class(+), Cond is not used, the relation between Ant and Suc is a founded implication (this corresponds to the “standard” association rule) with minsup
Screenshot with input setting for 4ft-Miner applied to Monk1 data.
The inspiration of our method comes from the area of meta-learning. Meta learning is a subfield of machine learning where automatic learning algorithms are applied to meta-data about machine learning experiments. The most widely used approaches to meta-learning (or combining classifiers) are bagging, boosting and stacking [5]. In bagging, each classifier in the ensemble votes with equal weight when classifying a new example; in order to promote model variance, the bagging trains each classifier in the ensemble using a randomly-drawn subset of the training set. In boosting, the ensemble of classifiers is built incrementally by training each new classifier to emphasize the training instances that previous classifiers miss-classified. In stacking, a meta-classifier is built on top of the results of the so-called “base” classifiers that are each separately trained to classify the data.
We propose application of the association rule mining algorithm to the set of “original” association rules obtained as a result of a particular data mining task. This idea thus follows the stacking concept, used to combine classifiers, which however has not been presented yet for descriptive tasks. The input to the proposed meta-learning step will comprise association rules, encoded in a way suitable for the association rule mining algorithm; the result will be a set of association meta-rules uncovering relations between various characteristics in the original set of rules.
We introduced two types of association meta-rules: qualitative and quantitative, and frequent cedents (as an analogy to frequent itemsets) in [8]. Qualitative meta-rules represent the meta-knowledge in the form “if original association rules contain a conjunction Ant, then they also contain the conjunction Suc”, i.e., qualitative rules have the form
Quantitative meta-rules represent the meta-knowledge in the form “if original association rules contain a conjunction Ant, then they have quantitative characteristics
Frequent cedents represent the meta-knowledge in the form “conjunction Ant frequently occurs in the original association rules”. With respect to the concept description task, these cedents represent a meta-knowledge about frequent co-occurrence of specific categories in the concept description.
When using the meta-rules and frequent cedents to condense the concept description based on association rules, we get:
qualitative meta rules should be interpreted as “if the concept is described using literals from Ant, it is also described using literals from Suc”, quantitative meta rules should be interpreted as “if the concept is described using literals from Ant, then this description is “strong” in the sense of values of quantitative characteristics from Q”, frequent cedents should be interpreted as “the concept is frequently described using literals from Ant”.
Encoding of the original rules as data (for the meta-learning step) is the key problem in our approach. We propose four different representation schemes in the original paper [8]. Ant and Suc can be encoded using either (1) binary attributes, where each attribute represents one possible literal, or (2) using the attributes from the original data set. In both cases we can (or need not) also consider whether the literal occurs in Ant or Suc. We can thus consider four different representation schemes. So, to encode an example rule
when using the encoding based on binary attributes without distinguishing between Ant and Suc, this rule will be represented using the categories Body_o(true), Head_o(true), and Class_+(true) when using the encoding based on original attributes without distinguishing between Ant and Suc, this rule will be represented using the categories Body(o), Head(o), and Class(+) when using the encoding based on binary attributes and distinguishing between Ant and Suc, this rule will be represented using the categories Ant_Body_o(true), Ant_Head_o(true), and Suc_Head_o(true) when using the encoding based on original attributes and distinguishing between Ant and Suc, this rule will be represented using the categories Ant_Body(o), Ant_Head(o), and Suc_Class(+).
In all four examples above, the notation
The representation schemes mentioned above can directly be used in situations when literals consist only of attribute-value pairs, i.e., for “standard” association rules. When using the richer syntax of cedents as defined in LISp-Miner, we have to transform the resulting rules into rules containing only attribute-value pairs. To do this, we first transform a negative literal into positive one by creating a coefficient (set of values) from all values not occurring in the negative literal. Then we change a rule containing a literal with set of values into a set of rules with a single category using the following tautology:
So, e.g., a 4ft rule
must be first transformed (by transforming the negative literal
assuming that the possible values for attribute Head are {r,o,s}, and then each literal containing a list of such values must be transformed into literals containing a single value. After this step, the newly created rules are
Such transformation of original 4ft rules can only be applied when searching for qualitative meta-rules and frequent cedents as there is no straightforward way to decompose the quantitative characteristics of the original 4ft rule (e.g., support or confidence) into quantitative characteristics of the transformed rules.
Another open question concerning the representation of a rule is whether categories not occurring in the original rule should be treated as missing or as “negative” ones. In the first approach, attributes not used in the rule will be encoded using a missing value code. In the second approach, when using the binary representation, categories not used in the rule will get the value “false”, and when using the original attributes, categories not used in the rule will get a new special value interpreted as “not used”. Our initial experiments show that using a missing value code is more suitable as it will prevent the meta-learning step from generating a great number of meta-rules about non-occurrence of literals in the original rules. This option also corresponds to the original notion of association rules where only items that do occur in the market baskets are taken into consideration.
In the experiments reported in this paper, we:
Search only for qualitative meta-rules and frequent cedents. Represent a rule using the original attributes. This formally leads to the same structure of data table as for the original data (i.e., the table representing the association rules has the same columns as the table representing the data). Example 4ft rules for Monk1 data
Example 4ft rules for Monk1 data encoded as data
Used a missing value code to represent categories not occurring in the original rule. This option is more suitable as it will prevent the meta-learning step from generating a great number of meta-rules about non-occurrence of literals in the original rules; this option also corresponds to the original notion of association rules where only items that do occur in the market baskets are taken into consideration.
Our approach is not limited to binary classification problems where the data can be considered as examples or counter-examples of a single concept. We can use this approach to data containing examples of an arbitrary number of concepts (such data is represented by the Iris dataset in our experiments reported in Section 4). In such a case, when creating the association meta-rules using LISp-Miner we can set the condition Cond to take the values of the target attribute and thus obtain meta-rules only from the original rules covering a specific concept.
We will illustrate the concept of association meta-rules using the Monk1 dataset from the UCI repository [24]. This dataset consists of 123 examples and following six attributes: head_shape, body_shape, smile, holding, jacket_color, tie (these attributes are the input ones), and class (this is a binary target attribute). Running 4ft-Miner with the input parameters maxlenA
Example 4ft meta-rules for Monk1 data
Example frequent cedents for Monk1 data
Notice that, among the meta-rules and cedents, there are rules with syntax similar to the original association rules. But their interpretation is of course different. Let us compare the association rule
the meta-rule
and the frequent cedent
The association rule says that the concept Class(+) can be described using the conjunction Body(o)
We can also see a significant “compression” of the list of the built association rules; while we obtained 34 4ft rules, we have only 14 4ft meta-rules (this represents a reduction to 41% of the number of association rules) and 13 frequent cedents (this represents a reduction to 38% of the number of association rules). We use these numbers to evaluate the results for other data as presented in Sections 5 and 6 as well.
Basic characteristics of the used UCI benchmark data
Results on UCI benchmark data
Our initial experiments were carried out using some benchmark data sets from the UCI Machine Learning Repository [24]. The characteristics of the data (number of examples, number of attributes and number of concepts) are shown in Table 7. Table 8 summarizes the results. The first column in this table shows the numbers of 4ft rules that were created in the first step. Here we were looking for strong concept descriptions, so we set the parameters minsup
Relation between relative number of meta-rules and frequent cedents on UCI data.
The chart in Fig. 3 shows the relationship between the relative number of meta-rules (horizontal axis) and the relative number of cedents (vertical axis). We can see that the experimental results can be divided into two groups. The percentage of both meta-rules and cedents is very low (below 20% for meta-rules and below 10% for cendents) in the first group, the percentage of meta/rules and cedents is close to 50% in the second group. The percentage of meta-rules is slightly higher than the percentage of frequent cedens in all the experiments. The correlation between these two measures is 0.9174.
While we were interested only in quantitative assessment of our method when carrying out the experiments on the UCI benchmark data, here we will also assess the results qualitatively, by interpreting the found meta-rules. To do this, we will use data for which domain experts are available.
In the early 1970s, a project of extensive epidemiological study of atherosclerosis primary prevention was developed under the name “National Preventive Multifactor Study of Heart Attacks and Strokes” in the former Czechoslovakia. The aims of the study were:
to identify atherosclerosis risk factors prevalence in the population considered to be the most endangered by possible atherosclerosis complications (i.e., middle-aged men), to follow the development of these risk factors and their impact on the examined men’s health, especially with respect to atherosclerotic cardiovascular diseases (CVD), to study the impact of complex risk factors intervention on their development and cardiovascular morbidity and mortality, 10–12 years into the study, to compare risk factors profile and health of selected men who originally did not show any atherosclerosis risk factors with a group of men showing risk factors from the beginning of the study.
Atherosclerosis is a slow, complex disease that typically starts in childhood and often progresses when people grow older. In some people it progresses rapidly, even in their third decade of age. Many scientists think it begins with damage to the innermost layer of the artery. Atherosclerosis involves the slow buildup of deposits of fatty substances, cholesterol, body cellular waste products, calcium, and fibrin (a clotting material in the blood) in the inside lining of an artery. The buildup (referred as a plaque) with the formation of the blood clot (thrombus) on the surface of the plaque can partially or totally block the flow of blood through the artery. If either of these events occurs and blocks the entire artery, a heart attack, stroke or other life-threatening events may result. Research shows the benefits of reducing the controllable risk factors for atherosclerosis: high blood cholesterol (level of LDL cholesterol over 100 mg/dL), cigarette smoking and exposure to tobacco smoke, high blood pressure (blood pressure over 140/90 mm Hg), diabetes mellitus, obesity (BMI over 25), or physical inactivity.
The study included data of 1417 men born between 1926–1937 and living in the center of Prague. The men were divided according to the presence of risk factors (RF), overall health conditions and ECG result into the following three groups: normal (a group of men showing no RF defined above), risk (group of men with at least one RF defined above) and pathological (group of men with a manifested cardiovascular disease). Long-term observation of patients was based on following the men from the normal and risk groups (randomly divided into intervened risk group – RGI and control risk group – RGC). The men from the pathological group were excluded from further observation. Table 9 shows the distribution of men in the initial groups.
Number of men in different groups
STULONG is the data set concerning this longitudinal study of the risk factors of the atherosclerosis. Four data files have been created when transforming the collected data into electronic form1
The study was realized at the 2nd Department of Medicine, 1st Faculty of Medicine of Charles University and Charles University Hospital, U nemocnice 2, Prague 2 (head. Prof. M. Aschermann, MD, SDr, FESC), under the supervision of Prof. F. Boudník, MD, ScD, with collaboration of M. Tomečková, MD, PhD and Ass. Prof. J. Bultas, MD, PhD. The data were transferred to the electronic form by the European Centre of Medical Informatics, Statistics and Epidemiology of Charles University and Academy of Sciences (head. Prof. RNDr. J. Zvárová, DrSc).
the file ENTRY consists of a data table of 1417 objects (patients) and values of 224 attributes obtained from entry examinations, the file CONTROL contains results of observation of 66 attributes recorded during the follow-up control examinations that took almost 20 years (10572 records), the file LETTER contains some additional information (values of 62 attributes) about the health status of 403 men that was collected by a postal questionnaire, the file DEATH contains data about causes of death of 389 patients who died during the study (values of 5 attributes).
Summary of the ENTRY data table
We use only the file ENTRY to test our approach to create comprehensive concept description. Table 10 shows the summary of the attributes presented in this file. This data has been analyzed using LISp-Miner with the aim to find interesting relationships between attributes from different groups [9]. Now we are interested in concept description, where the concept to be described is the group a man belongs to, i.e., normal group, risk group or pathological group. So we run the 4ft-Miner procedure repeatedly for different concepts set to be the succedent of the rules and we allow antecedent to be composed of any of the remaining attributes. We however restrict the maximal length of antecedent to 4, and the minimal confidence to vary between 0.8 and 0.9. We then run the subsequent meta learning step on the found rules. Table 11 summarizes the results in the same way as described in Section 4. But we also try to interpret the created meta-rules.
Results on ENTRY data table
The most frequently occurring cedents for the pathological group are related to long-term smoking, high diastolic blood pressure, diabetes, and chest pain of angina pectoris type. Surprisingly, ictus was typically not diagnosed in this group. If an original 4ft rule refers to chest pain of angina pectoris type, it also refers to high diastolic blood pressure, or to moderate physical activity or to moderate alcohol consumption.
The most frequently occurring cedents for the risky group are related to a high level of cholesterol and heavy smoking, while asthma, diabetes or myocardial infarct was typically not diagnosed. Interestingly, normal values of chest pain and asthma are related to a high level of cholesterol in the original 4ft rules.
The most frequently occurring cedents for the normal group are related to low diastolic blood pressure, low systolic blood pressure, short time smoking or non-smoking, university education, no hypertension, no hyperlipidemia, no ictus or no myocardial infarction. If an original 4ft rule refers to no smoking or short time smoking it also refers to low diastolic pressure or university education or no hypertension.
In association rule mining, the interpretation of the found rules can be very frustrating for the domain experts, as they have to inspect and consider every rule. Hence, different post-processing techniques have been proposed to simplify this step. The post-processing typically has the form of filtering out redundant rules or “merging” rules into more general ones. The post-processing can either be based only on the found association rules or can exploit some domain knowledge. We present a novel method for post-processing association rules that were created for concept description. Our approach is based on the idea of meta-learning: a subsequent association rule mining step is applied to the results of “standard” association rule mining. So the rules found during the first step are used as data for subsequent learning. Doing this, we obtain “rules about rules” that, in a condensed form, represent the knowledge found using the association rules generated in the first step. No domain knowledge is used in the post-processing itself; the participation of domain expert is of course crucial for interpreting the post-processing results but the meta-learning runs without any intervention of the expert.
We carried out several experiments on both benchmark and real data sets. We focused on qualitative meta-rules and frequent cedents as we think that these types of meta-rules can be more informative for the domain experts. The reported experiments support our working hypothesis that the number of the meta-rules will be significantly smaller than the number of original rules. Thus the interpretation of the meta-rules by domain expert will be significantly less time-consuming and less difficult compared to the interpretation of the original association rules. Additionally, the meta-rules can give the user a summarized interpretation of the original rules. Our experimental results also show that there is no straightforward relationship between the size of the analyzed data (number of objects, number of attributes), the number of 4ft rules and the number of 4ft meta-rules. High relative size of the set of 4ft meta-rules just means low redundancy in the set of 4ft rules, and low relative size of the set of 4ft meta-rules means high redundancy in the set of 4ft rules. Moreover, low redundancy in the set of 4ft rules is usually related to a smaller size of this set of rules. Anyway, the correlation was high between the reduction rate for the number of 4ft meta-rules and that for frequent cedents.
Our future work will focus on other types of rules (more complex syntax of cedents, and other types of relationship between Ant and Suc) that can generated by the 4ft-Miner procedure for mining for concept descriptions both during the standard mining step and the meta-learning step.
Footnotes
Acknowledgments
This paper was prepared with the support of Institutional Funds for Support of Long-Term Development of Science and Research at the Faculty of Informatics and Statistics of University of Economics, Prague.
