Abstract
In this research, we introduce a new reasoning procedure that applies interval type-2 fuzzy sets to a rule induction process. The proposal allows information granulation, which led to good experimental results. We introduce decision tables whose elements are interval type-2 fuzzy sets, which greatly generalizes the information. Next, by applying a corresponding rule induction procedure, we make it possible to generate fuzzy rulebases for type-2 fuzzy inference models directly from benchmark data. We believe that this reasoning approach may be a suitable solution for research issues such as classification or ranking procedures, as well as for determining knowledge for fuzzy inference models. The proposed method was tested on a classification problem using medical benchmark data.
Introduction
A number of fuzzy reasoning approaches have been introduced to solve different practical tasks and data analysis issues. The fuzzy techniques used in these applications may be divided into two basic groups: those related to the classical type-1 fuzzy sets introduced by Zadeh [32], and those related to newer concepts concerning type-2 fuzzy sets [12, 13]. In practice, interval type-2 fuzzy sets [11], which are a special case of type-2 fuzzy sets, are commonly used because of their reduced computational cost. What is more, many publications demonstrate that interval type-2 fuzzy concepts handle uncertainties better than type-1 fuzzy approaches [2, 26]. Interval type-2 fuzzy sets are especially useful when it is difficult to determine the exact membership functions of the fuzzy sets applied.
On the other hand, data discovery techniques are widely used to deal with information issues. Data discovery is considered to be the computational process of discovering patterns in data, which allows knowledge to be induced. Information systems and rough sets [17–19] are applied in order to represent knowledge. The combination of fuzzy techniques and data discovery seems very attractive for the analysis of ‘real-world’ data. Important applications of these theories combined may be found, for example, in medicine, concerning breast cancer diagnosis [16].
In contrast to existing approaches, in our research we aim to apply type-2 fuzzy sets in a data discovery procedure through corresponding information granulation. Information granulation combined with fuzzy sets has already been introduced as well [33]. For instance, in [20] feature analysis through information granulation and fuzzy sets was introduced, where features are considered to be granular rather than numeric.
Our approach differs by assuming that the values of decision tables may be interpreted as fuzzy sets, which allows a fuzzy rulebase to be generated by data mining. Next, a transformation into interval type-2 fuzzy sets is proposed, which allows reasoning based on type-2 fuzzy sets to be applied.
The paper is organized as follows: in Section 2, the preliminaries of type-2 fuzzy sets and Pawlak’s information systems based data discovery are briefly explained, in Section 3 a combination of data exploration and type-2 fuzzy sets is proposed, in Section 4 some experiments are carried out and finally, Sections 5 and 6 draw discussion and conclusions.
This paper is an extended version of recent research [24], published in the ICCCI’2018 international conference proceedings. We have significantly extended our work by providing a general methodology which is not limited to a specific application, and by adding research and experiments with interval type-2 fuzzy sets.
Theoretical background
In this section, the basics of type-2 fuzzy reasoning [12, 29] and of rule induction based on Pawlak’s information systems [21] are briefly explained.
Interval type 2 fuzzy sets
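A type-2 fuzzy set, denoted $\tilde{A}$, is characterized by a type-2 membership function $\mu_{\tilde{A}}(x, u)$, where $x \in X$ and $u \in J_x \subseteq [0, 1]$, i.e.:
$$\tilde{A} = \{((x, u), \mu_{\tilde{A}}(x, u)) \mid \forall x \in X,\ \forall u \in J_x \subseteq [0, 1]\}, \quad 0 \le \mu_{\tilde{A}}(x, u) \le 1.$$
Currently, interval type-2 (IT2) fuzzy sets [12], as a special case of type-2 fuzzy sets, are the most widely used because of their acceptable computational complexity and easy interpretation. In this case, the amplitude of $\mu_{\tilde{A}}(x, u)$ equals 1 for all $x \in X$ and all $u \in J_x$.
Uncertainty about $\tilde{A}$ is conveyed by the union of all its primary memberships $J_x$, which is called the footprint of uncertainty (FOU) of $\tilde{A}$:
$$FOU(\tilde{A}) = \bigcup_{x \in X} J_x.$$
The size of an FOU (the corresponding surface) is directly related to the uncertainty that is conveyed by an interval type-2 fuzzy set; consequently, an FOU with more area is more uncertain than one with less area.
The upper membership function and lower membership function of $\tilde{A}$ are two type-1 membership functions that bound the FOU. They are denoted $\overline{\mu}_{\tilde{A}}(x)$ and $\underline{\mu}_{\tilde{A}}(x)$, respectively.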
An illustration of the above definitions is given in Fig. 1.

An interval type-2 fuzzy set

Information flow within an IT2 fuzzy controller.
Figure 2 shows the information flow within an IT2 fuzzy system (a fuzzy controller being the classic example). It is very similar to its type-1 analogue. The major difference is that, because of the IT2 fuzzy sets used in the rulebase, the outputs of the inference engine are IT2 fuzzy sets, and a type reducer [8, 11] must be applied to convert them into type-1 fuzzy sets before defuzzification can be performed.
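Below, we give a brief description of the basic steps of the computations in an IT2 fuzzy system [29]. Let us consider the rulebase of an IT2 fuzzy system consisting of N rules taking the following form:
$$\tilde{R}^n: \text{IF } x_1 \text{ is } \tilde{X}_1^n \text{ and } \ldots \text{ and } x_I \text{ is } \tilde{X}_I^n, \text{ THEN } y \text{ is } Y^n, \quad n = 1, \ldots, N,$$
where $\tilde{X}_i^n$ ($i = 1, \ldots, I$) are IT2 fuzzy sets and $Y^n = [\underline{y}^n, \overline{y}^n]$ is an interval.
Assuming an input vector $\mathbf{x}' = (x'_1, \ldots, x'_I)$, the computations involve the following steps:
1) Compute the membership intervals of $x'_i$ on each $\tilde{X}_i^n$: $[\underline{\mu}_{\tilde{X}_i^n}(x'_i), \overline{\mu}_{\tilde{X}_i^n}(x'_i)]$, $i = 1, \ldots, I$, $n = 1, \ldots, N$.
2) Compute the firing interval of the n-th rule:
$$F^n(\mathbf{x}') = [\underline{\mu}_{\tilde{X}_1^n}(x'_1) \times \cdots \times \underline{\mu}_{\tilde{X}_I^n}(x'_I),\ \overline{\mu}_{\tilde{X}_1^n}(x'_1) \times \cdots \times \overline{\mu}_{\tilde{X}_I^n}(x'_I)] \equiv [\underline{f}^n, \overline{f}^n].$$
3) Perform type reduction to combine the firing intervals $F^n(\mathbf{x}')$ with the rule consequents $Y^n$, which yields the interval $[y_l, y_r]$. The $y_l$ and $y_r$ may be computed by the Karnik-Mendel algorithms [8, 11] or their variants [27, 28].
4) Compute the defuzzified output as:
$$y = \frac{y_l + y_r}{2}.$$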
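The computation steps of an IT2 fuzzy system described above can be sketched as follows. This is only an illustration under stated assumptions: the product t-norm is assumed for the firing interval, and the end points y_l and y_r are found by an exhaustive search over all switch points instead of the iterative Karnik-Mendel procedure (the result is the same interval, at a higher cost for large rulebases). The function names are hypothetical.

```python
def firing_interval(memberships):
    """Combine per-input membership intervals (lower, upper) with the product t-norm."""
    lower, upper = 1.0, 1.0
    for mu_lo, mu_hi in memberships:
        lower *= mu_lo
        upper *= mu_hi
    return lower, upper


def _bound(firing, centers, want_min):
    """Search all switch points for one end point of the type-reduced interval."""
    order = sorted(range(len(centers)), key=lambda n: centers[n])
    best = None
    for k in range(len(order) + 1):
        num = den = 0.0
        for pos, n in enumerate(order):
            f_lo, f_hi = firing[n]
            # Minimum: rules with small centers get the upper firing level.
            # Maximum: rules with large centers get the upper firing level.
            use_upper = (pos < k) if want_min else (pos >= k)
            w = f_hi if use_upper else f_lo
            num += w * centers[n]
            den += w
        if den == 0.0:
            continue  # no rule fires for this configuration
        value = num / den
        if best is None or (value < best if want_min else value > best):
            best = value
    return best


def it2_output(firing, consequents):
    """Return (y_l, y_r, y) for firing intervals and consequent intervals (y_lo, y_hi)."""
    y_l = _bound(firing, [c[0] for c in consequents], want_min=True)
    y_r = _bound(firing, [c[1] for c in consequents], want_min=False)
    return y_l, y_r, (y_l + y_r) / 2.0


# Two hypothetical rules: firing intervals and consequent intervals.
firing = [(0.2, 0.6), (0.5, 0.9)]
consequents = [(1.0, 2.0), (3.0, 4.0)]
y_l, y_r, y = it2_output(firing, consequents)
```

The sketch assumes that at least one rule fires with a non-zero upper firing level; otherwise no end point can be computed.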
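In the 1980s and 1990s, Zdzislaw Pawlak introduced the fundamentals of information systems [17] and rough sets [18]. An information system (IS) is defined by the following elements: IS = (U, A, V, f), where U is a non-empty finite set of objects, A is a non-empty finite set of attributes, V is the set of attribute values and f : U × A → V is the information function. For any X ⊆ U and B ⊆ A, the lower and upper approximations of X with respect to B collect the objects that, judged only by the attributes in B, certainly and possibly belong to X, respectively.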
The above approximations define a rough set. The concept of rough sets has been extended in the direction of fuzzy theory as well, by introducing fuzzy rough sets [3, 15] with many applications.
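As an illustration of the rough set notions referred to above, the following minimal sketch computes the lower and upper approximations of a set of objects from the indiscernibility classes induced by a subset of attributes. The table layout (a dictionary of rows) and all names are assumptions made for this example.

```python
from collections import defaultdict


def indiscernibility_classes(table, attributes):
    """Group objects that share the same values on the chosen attributes."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(obj)
    return list(classes.values())


def approximations(table, attributes, target):
    """Return the (lower, upper) approximation of the object set `target`."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(table, attributes):
        if cls <= target:      # class lies entirely inside the target set
            lower |= cls
        if cls & target:       # class overlaps the target set
            upper |= cls
    return lower, upper


# Hypothetical toy information system with attributes a and b.
table = {
    "x1": {"a": 0, "b": 1},
    "x2": {"a": 0, "b": 1},
    "x3": {"a": 1, "b": 0},
    "x4": {"a": 1, "b": 1},
}
lower, upper = approximations(table, ["a", "b"], {"x1", "x3"})
```

Here x1 and x2 are indiscernible, so x1 belongs only to the upper approximation of the target set, while x3 belongs to both.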
Information systems can be interpreted as decision tables if a decision attribute is introduced. With this assumption, a decision making approach was introduced by A. Skowron and Z. Suraj [21], which generates a set of rules for any decision attribute value. The procedure consists of the following steps:
1) Define an information system with a decision attribute,
2) Eliminate object ‘conflicts’ (i.e. objects with the same information function values but different decision values) by applying lower or upper approximation precision analysis,
3) Provide an attribute reduct using the discernibility matrix,
4) Apply a rule induction algorithm to the new information system so defined (after completing steps 2 and 3) and thus define a set of rules for each decision attribute value, which correctly cover the decision problem.
Below, the above procedures are explained in more detail and with corresponding examples, starting with step 2.
For an input decision table DT = (U, A ∪ { a* } , V ∪ Va*, f), where a* denotes the decision attribute, the considered rule induction procedure consists of the following steps:
Step 2. Eliminate object conflicts
Let X ⊆ U (X ≠ Ø) and B ⊆ A (B ≠ Ø). The lower approximation precision of the set X with respect to the subset B is defined as follows:
Algorithm 1:
Input: DT
Output: DT \{objects which cause conflicts}
Let x_i and x_j be in ‘conflict’ (x_i, x_j ∈ U) and let f(x_i, a*) = d1, f(x_j, a*) = d2, then:
1) Define X1 =df {x ∈ U : f(x, a*) = d1}, X2 =df {x ∈ U : f(x, a*) = d2} (X1, X2 ⊆ U),
2) Calculate γ_B(X1) and γ_B(X2) (B ⊆ A), next eliminate the object that is a member of the set with the lower approximation precision.
Step 3. Define corresponding reduct over the set of attributes
Calculate the so-called attribute reduct, which is defined as the set of the most distinguishing attributes in DT. Reduct calculation is an interesting problem in itself, with several proposed solutions [5]. In our research, for simplicity of calculation, we used a greedy approach which simply counts the occurrences of the most distinguishing attributes in DT (see Algorithm 2 below). In order to apply it, however, we first need to calculate the discernibility matrix M(DT) of the information system:
Algorithm 2:
Input: M(DT)
Output: Reduct ⊆ A
while (∃[mi,j]∈M(DT) [mi,j]≠ Ø) do
{
1) Count the number of occurrences of each attribute in matrix M(DT),
2) Select the attribute with the highest number of occurrences,
3) Set the attribute selected in step 2 as an element of the Reduct,
4) Erase all elements of M(DT) that include the chosen attribute.
}
Exception – 2):
If multiple attributes have the highest number of occurrences, then select one of them randomly.
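A minimal sketch of Algorithm 2 is given here for illustration. Two assumptions are made: the discernibility matrix is built in its decision-relative form (only pairs of objects with different decision values are compared), and ties are broken alphabetically instead of randomly so that the example is reproducible. The table layout and the decision attribute name "d" are hypothetical.

```python
from collections import Counter
from itertools import combinations


def discernibility_matrix(table, attributes, decision):
    """Map each object pair with different decisions to the attributes that differ."""
    matrix = {}
    for (xi, ri), (xj, rj) in combinations(table.items(), 2):
        if ri[decision] != rj[decision]:
            matrix[(xi, xj)] = {a for a in attributes if ri[a] != rj[a]}
    return matrix


def greedy_reduct(matrix):
    """Pick the most frequent attribute until no non-empty entry is left."""
    entries = [set(e) for e in matrix.values() if e]
    reduct = []
    while entries:
        counts = Counter(a for e in entries for a in e)
        top = max(counts.values())
        # Deterministic tie-break (assumption); the text selects randomly.
        chosen = min(a for a, c in counts.items() if c == top)
        reduct.append(chosen)
        # Erase every entry that contains the chosen attribute.
        entries = [e for e in entries if chosen not in e]
    return reduct


# Hypothetical decision table with condition attributes a, b, c and decision d.
table = {
    "x1": {"a": 0, "b": 0, "c": 1, "d": 0},
    "x2": {"a": 1, "b": 0, "c": 0, "d": 1},
    "x3": {"a": 0, "b": 1, "c": 1, "d": 0},
    "x4": {"a": 1, "b": 1, "c": 1, "d": 1},
}
matrix = discernibility_matrix(table, ["a", "b", "c"], "d")
reduct = greedy_reduct(matrix)
```

In this toy table the attribute a occurs in every matrix entry, so a single greedy step is enough.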
Step 4. Rule induction
Generate the set of decision rules that correctly cover the decision problem (see algorithm 3, below).
Algorithm 3:
Input: DT
Output: Corresponding Decision Rules
To generate the optimal set of rules over a decision table, the following steps should be taken:
Generate the corresponding matrices M_k (k = 1, . . . , n ; n = |U|). The matrices are constructed using M(DT) and are used to define the so-called ‘implicants’ for each object,
Define the object implicants,
Define the target set of decision rules.
Presenting the first point of algorithm 3 in more detail:
Let c_ij be the elements of M(DT),
For each k = 1, . . . , n :
If i ≠ k then
If (c_kj ≠ Ø) and (
where B ⊆ A and
For clarity of the presentation, a simple example of the above algorithm 3 is considered below.
Example 1:
Let us consider the following decision table (extended with the corresponding
The corresponding discernibility matrix (which is symmetric) takes the form:
Next, by applying Algorithm 3 and considering point 1, we can generate the matrix M1 (k = 1) :
as:
Next, we can determine the set of ‘object implicants’ from each matrix:
Implicant 1: derived from M1, i.e. related to object x1 : x1 ⇒ (a ∨ c) ∧ a
We may simplify this implication by applying the corresponding Boolean algebra reduction rules: x ∧ x = x = x ∨ x, x ∧ (x ∨ y) = x and x ∨ (x ∧ y) = x.
Therefore, implicant 1: x1 ⇒ a,
Implicant 2: derived from M2 : x2 ⇒ c,
Implicant 3: derived from M3 : x3 ⇒ a,
Implicant 4: derived from M4 : x4 ⇒ a ∧ c,
Implicant 5: derived from M5 : x5 ⇒ a ∨ b.
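The simplification of implicant 1 can be checked mechanically. The sketch below is an illustration only: it represents an implicant as a conjunction of disjunctive clauses (each clause a set of attribute names, which is an assumed encoding) and applies the idempotence and absorption rules quoted above.

```python
def absorb(clauses):
    """Simplify a conjunction of disjunctive clauses.

    Duplicated clauses collapse (x AND x = x) and any clause that strictly
    contains another one is dropped (x AND (x OR y) = x).
    """
    unique = {frozenset(c) for c in clauses}
    return {c for c in unique if not any(other < c for other in unique)}


# Implicant 1 before simplification: (a OR c) AND a
implicant_1 = absorb([{"a", "c"}, {"a"}])
```

The result contains the single clause {a}, which corresponds to x1 ⇒ a.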
The object implicants can be considered as indications concerning which attributes are strongly related to which objects.
Finally, using the implicants defined, we can generate the target set of rules derived from the considered decision table. Each rule concerns one decision value and is defined as the sum of the implicants of all objects related to that decision. For example, concerning decision value ‘0’, we have f(x1, a*) = f(x3, a*) = f(x5, a*) = 0, and therefore:
Rule1 =df f(x1, a) ∨ f(x3, a) ∨ (f(x5, a) ∨ f(x5, b)) ⇒ (decision : 0),
by analogy:
Rule2 =df f(x2, c) ⇒ (decision : 1),
Rule3 =df f(x4, a) ∧ f(x4, c) ⇒ (decision : 2).
The only disadvantage of this decision making approach is that the induced rules are very crisp; e.g. Rule2 is strictly interpreted as follows: if an object of type x2 has an information function value for the attribute ‘c’ exactly equal to ‘0’, then make decision ‘1’.
The above-mentioned disadvantage of the rule induction process lies at the basis of the research proposed. It is enough to involve fuzzy sets in the process of rule induction to achieve information generalisation. Therefore, the new proposal is to modify the information function introduced by defining all attribute values as fuzzy sets, which in fact constitutes information granulation. What is more, for any numerical attribute the basic fuzzy sets low, medium and high can be defined directly from the data, assuming Gaussian type distributions.
Therefore, for any attribute
With these assumptions, an exemplary fuzzy information system could take the following form:
Fuzzy information system decision table
with the interpretation of the information function for an example object/attribute pair (object1 and attribute1) as follows: f_fuzzy(object1, attribute1) = low, meaning that f(object1, attribute1) has the highest degree of membership in the fuzzy set low, defined over the domain of attribute1.
The decision attribute values can be defined as fuzzy sets as well, representing the degree of decision. So, under this interpretation of a fuzzy information system, the induced rules are very simple to transform into fuzzy rules.
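A minimal sketch of how a numerical attribute could be replaced by the labels low, medium and high is shown here. The parametrisation is an assumption made for illustration (Gaussian membership functions centred at the minimum, mean and maximum of the attribute, with the population standard deviation as a shared spread); it is not taken from the definitions of this paper.

```python
import math
import statistics


def gaussian(x, mean, sigma):
    """Gaussian membership degree of x."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def fit_fuzzy_sets(values):
    """Hypothetical parametrisation: centres at min, mean and max, shared spread."""
    sigma = statistics.pstdev(values)  # must be non-zero, i.e. values must vary
    return {
        "low": (min(values), sigma),
        "medium": (statistics.fmean(values), sigma),
        "high": (max(values), sigma),
    }


def fuzzify(x, fuzzy_sets):
    """Return the label whose membership degree at x is the highest."""
    return max(fuzzy_sets, key=lambda name: gaussian(x, *fuzzy_sets[name]))


# Hypothetical values of one attribute over seven objects.
values = [1, 2, 5, 5, 6, 9, 10]
fuzzy_sets = fit_fuzzy_sets(values)
labels = [fuzzify(v, fuzzy_sets) for v in values]
```

Each entry of the decision table is then the label returned by `fuzzify`, which corresponds to the interpretation of f_fuzzy given above.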
For example, if a rule is induced as follows:
f(object1, attribute1) ∧ f(object2, attribute1) ⇒ decision: D1,
it can be naturally transformed into the fuzzy rule:
IF (f(object1, attribute1) is ‘low’) ⊗ (f(object2, attribute1) is ‘medium’) THEN (decision is D1).
Example 2:
Suppose that we want to sort a set of images with respect to a certain colour criterion: a set of violet/blue images of flowers is to be sorted with respect to the degree of ‘violet colour’. Let us assume that the fuzzy term ‘very violet’ corresponds to ‘dark-blue’ colours and the fuzzy term ‘low violet’ corresponds to ‘bright-violet’ colours. The goal is to find the most ‘violet’ images. Under these assumptions, the decision table below can be introduced for a certain learning set (Table 2).
fuzzy values
F1 (Feature1) = mean(S), HSV colour model.
F2 (Feature2) = mean(R), RGB colour model.
F3 (Feature3) = mean(G), RGB colour model.
F4 (Feature4) = mean(mean(|B-R|)), RGB colour model.
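The four features listed above could be computed as in the following sketch. It assumes that an image is given as a flat list of (R, G, B) pixels with channel values in [0, 255] and that saturation is expressed in [0, 1]; the scaling used in the original experiments is not stated. For a rectangular image, the double mean in F4 equals the mean over all pixels.

```python
import colorsys
from statistics import fmean


def colour_features(pixels):
    """Return (F1, F2, F3, F4) for an iterable of (R, G, B) pixels in [0, 255]."""
    pixels = list(pixels)
    # F1: mean saturation in the HSV colour model.
    f1 = fmean(colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1] for r, g, b in pixels)
    # F2 and F3: mean red and mean green channel values.
    f2 = fmean(r for r, _, _ in pixels)
    f3 = fmean(g for _, g, _ in pixels)
    # F4: mean absolute difference between the blue and red channels.
    f4 = fmean(abs(b - r) for r, _, b in pixels)
    return f1, f2, f3, f4


# Hypothetical two-pixel image: pure blue and white.
features = colour_features([(0, 0, 255), (255, 255, 255)])
```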
Considering the values of the information function, for instance: f(image1, feature1)=‘medium’, since:
This assumption, along with the fuzzification concept applied (see Section 2.4), implies a very important property of our method: the decision table is filled with values without the need for expert knowledge, relying only on the dataset used.
Next, by applying the proposed concept, which combines rule induction and fuzzy reasoning, it is possible to generate numerical values in accordance with the Mamdani inference model [10]. The values correspond to the input recognition criteria (see Table 3).
In order to apply type-2 fuzzy sets in the presented methodology, a very simple extension is proposed. After generating the type-1 rulebase from a given decision table, we change the fuzzification functions used by modifying the standard deviation parameter of the ‘medium’ fuzzy set, and thereby define the bounds of the FOU for the corresponding type-2 fuzzy set.
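The idea of obtaining the FOU bounds by varying the standard deviation can be sketched as follows. The sketch assumes a Gaussian membership function with a fixed mean and two spreads, where the narrower one gives the lower membership function and the wider one the upper membership function; the parameter names and values are hypothetical.

```python
import math


def gaussian(x, mean, sigma):
    """Gaussian membership degree of x."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def it2_membership(x, mean, sigma_narrow, sigma_wide):
    """Membership interval (lower, upper) of x for a Gaussian FOU with uncertain spread."""
    if not 0 < sigma_narrow <= sigma_wide:
        raise ValueError("expected 0 < sigma_narrow <= sigma_wide")
    return gaussian(x, mean, sigma_narrow), gaussian(x, mean, sigma_wide)


# Hypothetical 'medium' set centred at 5 with spreads 1 and 2.
lower, upper = it2_membership(6.0, 5.0, 1.0, 2.0)
```

With equal spreads the interval collapses to a single value, which recovers the type-1 case.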
Achieved results for a sample test set
(a) Type-1 fuzzification proposal. (b) Type-2 fuzzification proposal.
Using the above extension to type-2 fuzzy sets, we are able to combine the rule induction mechanism introduced with type-2 fuzzy inference.
Therefore, we gain the possibility to develop computer systems with an embedded reasoning mechanism, which start from data ordered in a decision table and are implemented as decision support systems, fuzzy controllers or ranking systems based on type-2 fuzzy inference.
We have already successfully applied the combination of rule induction with type-1 fuzzy sets, following the main concept proposed, in various applications. In [23], a decision support system for histopathological diagnosis of HER2 (human epidermal growth factor receptor 2) breast cancer using Pawlak’s information system and Mamdani type fuzzy control was proposed. The main idea of this application in the medical image domain was very similar to Example 2: it provides the degree of HER2 overexpression by analyzing histopathology images, which is very important information in cancer therapy.
Recently, the same idea was used to build a recommender system [24] using a completely different type of data (not image data) and with a different goal. The application aimed to predict the degree of subjective customer satisfaction on the basis of the customer’s previous reviews.
All this supports the generality of the method proposed.
The extension to interval type-2 fuzzy sets and reasoning proposed in this paper was applied to medical data in a classification problem. The Breast Cancer Wisconsin Data Set from the UCI repository has been used to test the accuracy of our method.
Benchmark data description
The Breast Cancer Wisconsin (Original) dataset is a well-known dataset provided by the University of California Irvine (UCI) Machine Learning Repository. The dataset consists of 699 data instances, with 65.5% classified as benign and 34.5% as malignant. The classification is based on the FNA (Fine Needle Aspirate) test. From the test, nine different visually assessed attributes were extracted and each was assigned an integer value between 1 and 10. Table 4 shows the attribute information of the dataset.
Description of the benchmark attributes

Steepness and midpoint parameters changes.
This dataset presents a binary classification problem: classify a presented case as benign or malignant. A lot of research has been carried out with this dataset, with good results. B. Fatima and Ch. M. Amine [4] reached 98.25% accuracy using a neuro-fuzzy inference approach. They combined a neural network and fuzzy theory to present a hybrid learning algorithm. T. Sridevi and A. Murugan [22] presented an intelligent classifier based on K-means clustering and rough sets. They applied rough set theory in combination with the K-means algorithm to find the minimal optimal reduct subset of attributes. Classification by means of an SVM was performed using these features, and the best accuracy they obtained was 99%. A. Onan [16] presented a fuzzy-rough nearest neighbour classifier combined with consistency-based subset evaluation and instance selection for automated diagnosis of breast cancer. It used fuzzy-rough instance selection to eliminate those instances that are considered useless or erroneous, and in the classification phase the fuzzy-rough nearest neighbour algorithm was used.
Our classification approach consists of three main phases:
1) Fuzzification of the input dataset. We fuzzified the values of the attributes with respect to three different fuzzy sets: low, medium and high (see Sections 2.4 and 2.5).
2) Pawlak’s information system. We create a decision table with values defined as interval type-2 fuzzy sets. Next, using the corresponding methodology assumptions of Pawlak’s information systems and the rule induction procedure, we automatically define the rules to be applied by our inference mechanism (see Section 2.3).
3) Type-2 fuzzy inference mechanism. The extracted rules are considered over type-2 fuzzy sets and the Karnik-Mendel algorithm is used to obtain a crisp output (see Section 2.2). Then a corresponding threshold value (see Table 5 below) is applied to classify instances as benign or malignant.
The core of the decision support method presented in this publication is our proposal to combine rule induction based on Pawlak’s information systems with a type-2 fuzzy control mechanism. This process gives a numerical output that can be interpreted by a decision support system. A threshold value is chosen to decide whether a tumour is benign or malignant (i.e. IF threshold ≤ output THEN benign, ELSE malignant).
With the method’s specification clarified, the important issue here is to define the rule consequences in the rulebase used as type-1 fuzzy sets, according to the Mamdani concept. The function used to model our conclusions is the logistic function, a common sigmoid curve defined as:
$$f(x) = \frac{1}{1 + e^{-k(x - x_0)}},$$
where $k$ is the steepness of the curve and $x_0$ is its midpoint.
The above figure shows in blue the function used to represent the conclusion ‘high aggressiveness’ and in red the conclusion ‘low aggressiveness’. In order to choose appropriate function parameters, a corresponding data analysis experiment was performed. We split the benchmark dataset into learning and validation sets, 80% and 20% respectively. Different computations were performed, testing steepness from 0.2 to 2.0 with a step size of 0.2 and midpoint from -4 to 3 with a step size of 1. The parameters with the best results were used for the next experiments, in our case: steepness = 1 and midpoint = 2.
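The computation of the needed $Y^n$ intervals (see Section 2.2, Equation 6), which define the corresponding rule consequences, was performed by applying the inverse functions of the above logistic functions (the logit functions), i.e.:
$$x = x_0 + \frac{1}{k} \ln\frac{p}{1 - p},$$
where $p \in (0, 1)$ is the membership value, $k$ is the steepness and $x_0$ is the midpoint.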
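The relation between the logistic function and its inverse can be sketched as follows, using the reported parameters (steepness = 1, midpoint = 2) as defaults. The helper `consequent_interval`, which maps a range of membership values to an interval on the output axis, is a hypothetical illustration; how the bounds of each interval were actually chosen is not specified here.

```python
import math


def logistic(x, steepness=1.0, midpoint=2.0):
    """Logistic membership function with the given steepness and midpoint."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))


def logit(p, steepness=1.0, midpoint=2.0):
    """Inverse of `logistic`: the x at which the curve reaches p, for 0 < p < 1."""
    return midpoint + math.log(p / (1.0 - p)) / steepness


def consequent_interval(p_low, p_high, steepness=1.0, midpoint=2.0):
    """Hypothetical helper: map a membership range to an output interval."""
    return logit(p_low, steepness, midpoint), logit(p_high, steepness, midpoint)


# Output interval covered by membership values between 0.25 and 0.75.
interval = consequent_interval(0.25, 0.75)
```

Because the logistic curve is symmetric about its midpoint, the interval obtained for complementary membership values is centred at the midpoint.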
Once we have intervals that define the rule consequences and corresponding type-2 fuzzification of the data used (see Fig. 3 (b)), the Karnik-Mendel algorithm (see Section 2.2, Equation 12) is applied to compute crisp outputs.
In order to provide proper experiments, we split the set randomly into two parts:
1) Benchmark dataset: it is used to perform a 10-fold cross validation (see Table 5) and contains 70% of the instances. Here, the training aims to find the best threshold values and the best rulebase (in terms of the introduced fuzzy rule induction process, see Sections 2.4 and 3).
2) Test dataset: the remaining 30% of the dataset. The test set was never used during the training phase, as it simulates ‘real world’ data and should give a good understanding of how well the algorithm will perform in the real world.
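The experimental protocol described above can be sketched as follows. The function name, the fixed seed and the way the folds are formed (every tenth shuffled index) are assumptions made for the example; only the 70%/30% split and the number of folds come from the text.

```python
import random


def split_and_fold(n_instances, holdout=0.3, n_folds=10, seed=0):
    """Randomly split instance indices into a benchmark part (with folds) and a test part."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    n_test = round(n_instances * holdout)
    test, benchmark = indices[:n_test], indices[n_test:]
    # Fold i takes every n_folds-th benchmark index, starting at position i.
    folds = [benchmark[i::n_folds] for i in range(n_folds)]
    return benchmark, test, folds


benchmark, test, folds = split_and_fold(699)
```

In each cross-validation round, one fold serves for validation and the remaining nine for rule induction and threshold selection, while the test indices stay untouched.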
The threshold value and rules from the best results on the Benchmark dataset are used by our inference mechanism.
The accuracy average of the 10-fold cross validation over the Benchmark dataset is
Finally, the test dataset was processed using the rules extracted from fold 4, which achieved 100% accuracy, together with the threshold value from this fold (which, additionally, does not deviate much from the average value over all folds). Table 7 shows the results obtained in this case.
Note that an adjustment of the threshold value can decrease or increase the PPV and NPV. If we decrease the threshold value, we may increase the NPV and probably decrease the PPV. The opposite would happen if we increased the threshold value.
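For reference, the positive and negative predictive values discussed above can be computed from the predicted and true labels as in this sketch. Treating ‘malignant’ as the positive class is an assumption made for the example.

```python
def predictive_values(y_true, y_pred, positive="malignant"):
    """Return (PPV, NPV, accuracy) for two equally long label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # A predictive value is undefined when no case was predicted in that class.
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    accuracy = (tp + tn) / len(pairs)
    return ppv, npv, accuracy


# Hypothetical labels for five cases.
y_true = ["malignant", "malignant", "benign", "benign", "benign"]
y_pred = ["malignant", "benign", "benign", "benign", "malignant"]
ppv, npv, accuracy = predictive_values(y_true, y_pred)
```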
For experimental purposes we have decided to use the same threshold value as in our best result. However, in a real case an expert could decide whether to increase or decrease this value.
10-fold Cross Validation over Benchmark dataset
Comparison with other classification methods
Results obtained with the test dataset
The main advantages of our proposal are the combination of type-2 fuzzy set reasoning with a rule induction procedure, and a fuzzification procedure which defines decision tables directly from data without the need for expert knowledge. This allows a rulebase of type-2 fuzzy rules to be generated automatically. By defining decision table values as fuzzy sets, we achieve information granulation, which implies generalization. All these improvements allow our proposal to be used in a variety of applications based on ‘real world’ data analysis, such as classification, ranking procedures and fuzzy control.
However, there are some disadvantages, which indicate directions for further research:
1) The proposed data fuzzification will certainly not give good results for all data. We assume a normal distribution of the features, which will not hold in some cases. We expect good performance in biomedical domains, because natural processes often follow Gaussian type distributions. Nevertheless, it should be noted that we apply this assumption only to define membership functions. These functions do not have to be related to the real specifics of the data distribution. What is more, taking the data distribution into account would not change the general assumptions of our proposal; it would only require a more data-dependent definition of the fuzzy sets applied.
2) The proposed method does not remove the requirement to define appropriate rule consequences. Therefore, optimization procedures should be applied for given data in order to determine the corresponding parameters. Nevertheless, it should also be noted that this last issue is a common problem in all fuzzy reasoning models.
Conclusions
In this research, a new reasoning approach is presented. The proposal combines data discovery and interval type-2 fuzzy sets. A corresponding data fuzzification procedure is introduced as well. Our proposal may be applied in various applications which use ‘real world’ data characterized by incomplete, inexact, uncertain or vague information. The fuzzy sets used as values in decision tables provide information granulation. The data exploration applied induces fuzzy rules. Additionally, a transformation of the induced rulebase into type-2 fuzzy rules is introduced, which allows type-2 inference to be applied and the achieved results to be improved. The proposed concept should be considered a general approach and may find applications in classification and ranking procedures as well as in fuzzy control. The results obtained from the experiments on medical benchmark data demonstrate the quality of the concept.
