Flow graphs as data structures for inducing classifiers

Abstract

This paper describes empirical research based on a data structure, named Flow Graph (FG), that can be induced from a supervised training data set. A FG can be approached as a weighted and labeled digraph that summarizes a given supervised training set for the purpose of analysis. FGs can also be used as a repository of the information embedded in training sets, supporting the extraction of classification rules and, consequently, the definition of classifiers. The work reviews FGs and related concepts as originally proposed, i.e., as a structure for modeling discrete data, and proposes a customization for dealing with continuous data. The customization consists of a pre-processing step in which a discretization process is carried out, in a two-step hybrid approach named HFG (Hybrid Flow Graph). Several experiments focusing on the classifiers extracted from HFGs were conducted, and their results were analyzed with respect to both the values of some metrics associated with the induced digraph-based structure and the performance of the classifier extracted from the structure. The experiments used 19 diversified datasets, and the classification results were compared with those obtained by classifiers induced using four other algorithms, namely J48, Naïve Bayes, k-Nearest-Neighbor and Support Vector Machine.

Keywords

Flow graphs, extended flow graphs, data structures, supervised machine learning algorithms, data discretization, hybrid systems

1. Introduction

The concept of Flow Graph (FG) was proposed by Pawlak [46] as a mathematical formalism suitable for representing, exploring and analyzing some characteristics associated with a set $X=\{x_{1},x_{2},\ldots,x_{N}\}$ of $N$ supervised discrete data instances. According to Pawlak’s conception, such a set can typically be characterized as a training set in a supervised machine learning environment (see [18, 19, 26, 32, 39]), where each instance $x_{i}\in X$ ( $1\leqslant i\leqslant N$ ) is described by values associated with a fixed set of $M$ attributes { $A_{1}$ , $A_{2}$ , …, $A_{M}$ } and, also, by a class, which is one out of $K$ available classes { $c_{1}$ , $c_{2}$ , …, $c_{K}$ }.

The formalism and the procedures for summarizing training sets as FGs aim at easing and promoting the analysis of the flow distribution of attribute values that describe the training data instances, taking into account their classes. The FG data structure also supports the extraction of decision rules, which can be used to classify new data instances that have no associated class. As stated in several works related to FGs cited in this paper, some options for representing data (e.g., those based on the Bayes rule [12]) have a probabilistic interpretation, since their results are strongly linked to probability. The results of analyses performed taking into account an induced FG structure, in turn, have a deterministic interpretation, since they effectively reflect the data flow and not just a probability related to the flow.

It cannot be forgotten, however, that like any inductive approach to data/knowledge representation, the training set has a deep impact on the suitability of FGs, when used either as a structure for data analysis or for supporting the extraction of decision rules, the latter being the focus of the work described in this paper.

Based on the available literature, it is clear that the formalism supporting the induction of original FGs and, subsequently, the extraction of decision rules from them, is not as popular as several other propositional supervised algorithms, such as those that induce Decision Trees [22, 23], Neural Networks [3, 40], Decision Rules [33], etc. Perhaps one of the reasons for the low popularity of FGs is the fact that several works found in the literature, particularly those that formally introduce the FG approach and discuss theoretical issues, restrain themselves to the use of discrete-valued data, which confines the use of FGs mostly to artificially generated data.

The research work reported in this paper is an extension of the work described in [8], where a proposal for adapting the original FG for representing and handling continuous data has been described. The proposal, named HFG (Hybrid Flow Graph), allows the use of FG structures as the source of data information for extracting classifiers in continuous domains, and has been implemented as a computational system, the EFLOWG (Extended Flow Graph), which allows the pre-processing of training data attribute values using discretization algorithms [37, 41]. This paper mainly explores the FG and HFG structures by (a) enlarging and refining the literature review on FGs, (b) extending the number of experiments conducted and presented in [8] and (c) discussing the impact (as far as classifiers are concerned), of the order in which attributes are placed in FG-based structures.

The remainder of the paper is organized as follows. Section 2 presents a brief survey of research works related to FGs. Section 3 introduces the main concepts related to FGs, as well as a high level description of how to construct such a graph (actually, a digraph) to summarize a given supervised training data set. Section 4 presents a small example of the process involved in creating a FG as originally proposed, i.e., from discrete training data. Section 5 briefly presents the default discretization algorithm used as a pre-processing step for the construction of HFGs. Section 6 approaches the extraction of classifiers from FG-based structures and Section 7 describes the main technical characteristics and functionalities of the computational system EFLOWG, which was developed with the main intent of conducting experiments involving the use of HFGs as data structures for extracting classifiers. Section 8 describes a set of experiments using classifiers extracted from HFGs as well as those induced by C4.5 [23], in its J48 version, Naïve Bayes, k-Nearest-Neighbor and Support Vector Machine, all available in the Weka environment [10] (see [35, 38, 40] for details about the four algorithms). Section 9 discusses the impact of the order of the attributes that describe training instances on induced FG-based structures. Section 10, the last section of this paper, presents a few conclusions based on what has been achieved and considers some suggestions for continuing the research work done so far.

In the text, depending on the context, FG refers to the algorithm or the structure that represents a conventional flow graph, HFG refers to the algorithm or the structure that represents the extended flow graph and EFLOWG is the computational system that implements both algorithms, as part of a friendly computational environment for using them.

2. A brief literature review on the use of flow graphs

As mentioned before the so-called Flow Graph (FG), as proposed by Pawlak in several articles (see, for instance, [46, 47, 48, 49, 50, 51, 52, 54]), can be approached as a data structure that aims to summarize a set of supervised data instances and, at the same time, to act as a repository for the extraction of classifiers, in the context of the knowledge domain related to the data it summarizes. In this way, FGs can be considered structures that support decision algorithms and therefore, an important part of the set of Machine Learning (ML) techniques and algorithms (see [4, 18, 23, 35, 40]).

Several definitions of flow graphs can be found in the literature that are associated with concepts other than the one defined by Pawlak, i.e., an acyclic digraph (summary graph) that represents a set of data instances. This is the case, for example, of the concept of control flow graphs, proposed by Allen [13, 14, 15, 16], which is a graph-based representation of the paths that can be traversed by the code of a program during its execution, mainly aimed at code optimization. A variant of it, known as data flow analysis [27], has been established as a technique for evaluating and possibly ensuring the reliability of computer programs.

FGs also differ fundamentally from the flow graphs proposed by Ford and Fulkerson in [28]. The Ford-Fulkerson algorithm (or method) determines the maximum flow in a flow network, where a flow network is a directed graph in which each arc has an associated capacity and receives a flow; the flow volume in an arc cannot exceed its capacity.

As mentioned in the previous section, research work and empirical studies having FGs as their main goal are not as widespread as those related to the most popular algorithms in the ML area, such as decision tree based algorithms and clustering algorithms. Nevertheless, most FG-related articles found in the literature provide sufficient information to understand the main concepts, which are usually presented in a didactic way, invariably accompanied by examples that ease comprehension.

In spite of not introducing the concept of FG yet, the main ideas involved in the definition of the FG structure are already embedded in the text presented in [45], where rough sets [43, 44, 53], decision algorithms and the Bayes’ theorem have been considered in relation to their relationships and where some of the examples discussed were later revisited in research works specifically related with FGs.

Since its proposal, in 2003 [46], it can be observed that the definition of FGs has evolved and has been refined over time; also, a few other concepts have been proposed and incorporated into the main structure of a FG. The work described in [46] defines a relationship between flow graphs and decision algorithms, showing that the information flow in a decision algorithm can be represented as a flow in the FG and also, that the flow is subjected to the Bayes’ formula, approached with a deterministic meaning.

The work presented in [48] is a short paper which has as main goal to stress the relationship between the Bayes’ theorem and the concept of rough sets. It also can be seen as a summary of some of the previous works by the same author, where FGs are defined and their use is exemplified. The work, however, starts from the definition of decision algorithms and discusses the relationships between decision algorithms, FGs and the Bayes’ theorem. The short paper approaches FGs as a derivative from decision rules and not the other way round, as usually can be found in later publications on FGs.

The work reported in [50], as suggested by its author, can be seen as a continuation of a series of investigations that were conducted on the relationship between decision algorithms and the Bayes’ theorem. It focuses mainly on Łukasiewicz’s ideas concerning the relationship between multivalued logic, probability and the Bayes’ theorem. Although the focus is on decision rules, the formal concept of FG is not considered in the work.

In [52] several applications of the use of FGs in various knowledge domains have been presented, aiming at approaching different aspects of the formalism as a tool for analyzing data, as summarized next.

The example named Smoking and Cancer explores the probabilistic aspect of data analysis and discusses the relationship between statistical and FG-based approaches. In the example a table concerning 60 people who smoke (or not) and who have cancer (or not) is considered. The total number of people who do not have cancer is 50; this group is divided into 40 non-smokers and 10 smokers. The total number of people who have cancer is 10, divided into 7 non-smokers and 3 smokers. The resulting FG has two layers: (a) the first layer is defined by attribute smoker and has two nodes, no and yes, each associated with one of the two possible values of attribute smoker and (b) the second layer is defined by attribute cancer and has two nodes, no and yes, each associated with one of the two possible values of attribute cancer. The four arcs in the FG connect: (a) node smoker $=$ no with node cancer $=$ no, (b) node smoker $=$ no with node cancer $=$ yes, (c) node smoker $=$ yes with node cancer $=$ no and (d) node smoker $=$ yes with node cancer $=$ yes. In the example the author also drew conclusions based on the inverse FG, a concept not used in the work described in this paper.
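As a rough illustration of the counts quoted above, the two-layer FG of the example can be written down directly as a table of arc weights, from which the flow leaving each smoker node and reaching each cancer node follows by summation. The sketch below is only one possible encoding; the dictionary layout and the names `arcs`, `outflow`, `inflow` and `total_flow` are choices made here and are not taken from [52].

```python
from collections import defaultdict

# Arc weights taken from the Smoking and Cancer counts quoted in the text:
# (smoker value, cancer value) -> number of people.
arcs = {
    ("no", "no"): 40,
    ("no", "yes"): 7,
    ("yes", "no"): 10,
    ("yes", "yes"): 3,
}

outflow = defaultdict(int)  # flow leaving each node of layer "smoker"
inflow = defaultdict(int)   # flow reaching each node of layer "cancer"
for (smoker, cancer), weight in arcs.items():
    outflow[smoker] += weight
    inflow[cancer] += weight

total_flow = sum(arcs.values())  # number of people summarized by the FG

print(dict(outflow))  # {'no': 47, 'yes': 13}
print(dict(inflow))   # {'no': 50, 'yes': 10}
print(total_flow)     # 60
```

The totals of the second layer (50 and 10) reproduce the group sizes stated in the example, which is a quick consistency check on the arc weights.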

The other seven examples discussed in [52] investigate: (1) the relationship between the color of eyes, the color of hair and the nationality; (2) three industrial plants that produce three different products, investigated in relation to whether their products are defective (or not); (3) the relationship between shape and size, as well as between size and color, in a playing block data domain; (4) customer preference in buying cars, in a scenario that considers three models of cars sold to three disjoint groups of customers by four different car dealers (also discussed in [46]); (5) a voting situation involving voters grouped in three disjoint age groups, who belong to one of three different social classes and voted for one of four political parties (also discussed in [47]); (6) a promotion campaign for a product being released in the market, based on FGs constructed using three groups of potential customers split by age (young, middle-age and old) as well as by their place of residence (town, village and rural) and their answer to the question whether they would (or would not) buy the product; and (7) paint demand and supply concerning cars.

Considering that no investigations related to the complexity of the procedures involved in FGs had been dealt with so far, Butz and co-workers, in [2], conducted an analysis of the traditional FG inference algorithm (classifier) establishing that its time complexity is exponential with respect to the number of nodes in the FG. In the same work the authors propose a new FG-based algorithm, that exploits factorization in a FG, which has polynomial time complexity.

In [34] the authors approach rule learning by establishing ordinal prediction, based on rough sets, soft-computing and implicitly using FGs. They redefine the ordinal prediction for each decision rule, generally stated as “If $C$ is $c_{1}$ then $D$ is $d_{1}$ ”, to be “If $C$ is in the range between $c_{1}$ and $c_{2}$ then $D$ is in the range between $d_{1}$ and $d_{2}$ ”.

For exemplifying many of the concepts related to FGs the author in [54] uses data from a group of 1,000 patients put to test for evaluating the effectiveness of a certain drug. Patients are grouped according to the presence (or not) of a particular disease, their age (old, middle-age or young) and the test results (positive or negative). The FG induced from the data was then analyzed, searching for insights based on the relationships between different groups of patients.

In [42] the authors attempted to discover possible trends and needs related to business aviation, in order to support government decision making in anticipation of a possible deregulation in the near future. For that purpose the study employed a combination of rough sets, flow graphs and formal concept analysis. The results have shown that the combined approaches were well suited for analyzing both the market potential for business aviation and the needs of business aviation customers, prior to the industry’s deregulation.

The paper by Lisowski and Czyzewski [25] explores the use of FG extensions in video surveillance systems, especially in distributed multi-camera systems, for tracking objects between cameras with non-overlapping fields of vision.

Chan and Tsumoto, in their paper [1], investigate FGs induced using multiset decision tables. The concepts of rough multisets and information multisystems were introduced in [24]. Multiset decision tables are data tables represented by multisets, as defined in [6].

3. Defining and constructing FGs

The definition of flow graph (FG) involves the definition of directed graphs (or digraphs); it is important, therefore, to present both concepts, graphs and digraphs, before presenting the formal definition of FGs.

As defined in [20], a graph $G=(V,E)$ consists of two finite sets in which (1) $V\neq\emptyset$ is the set of vertices (or nodes) of the graph and (2) $E$ is the set of edges of the graph. Each edge $a\in E$ represents an unordered pair of nodes of $V$ , noted by ( $u$ , $v$ ), where $u,v\in V$ . The unordered pair representing the edge $a$ , $a=(u,v)$ , indicates that, through $a$ , the node $v$ can be reached from node $u$ and vice versa. It is important to note that if the set $E=\emptyset$ , the graph is called the null graph.

If, instead of unordered pairs ( $u$ , $v$ ), edges are defined by ordered pairs of nodes, noted by $\langle u,v\rangle$ , they are called directed edges (or arcs) and the structure is a particular graph named digraph. The arc $a=\langle u,v\rangle$ indicates that $a$ has its origin at node $u$ and destination at node $v$ , but not vice versa. When modeling data using graphs, it is often necessary to assign values to both, nodes and edges (arcs). Particularly, when values are assigned to edges (arcs), such graphs (digraphs) are also referenced as weighted graphs (digraphs).

As defined in [46, 49], a flow graph $G$ is denoted by $G=(N,\beta,\phi)$ , where: (1) $N\neq\emptyset$ is a finite set of nodes, (2) $\beta\subseteq N\times N$ is a finite set of arcs that connect nodes $x\in N$ and (3) $\phi:\beta\rightarrow R^{+}$ is a function that associates with every arc $a\in\beta$ of $G$ a positive real number.

In a flow graph, an arc is represented by the ordered pair $\langle x,y\rangle$ , where $x$ is the source node and $y$ is the target node of the arc and $x,y\in N$ . In the terminology associated with FGs, it is also said that given an arc $\langle x,y\rangle$ , the node $x$ is an input of node $y$ and that node $y$ is an output of node $x$ .
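The triple $G=(N,\beta,\phi)$ maps naturally onto a small data structure. The following is a minimal sketch, assuming that arcs are stored as ordered pairs used as dictionary keys; the class name `FlowGraph` and its methods are hypothetical and only mirror the terminology above (arc weights, inputs and outputs of a node).

```python
from dataclasses import dataclass, field


@dataclass
class FlowGraph:
    """G = (N, beta, phi): nodes, arcs and a positive weight per arc."""
    nodes: set = field(default_factory=set)
    phi: dict = field(default_factory=dict)  # arc (x, y) -> positive weight

    def add_arc(self, x, y, weight):
        # phi associates a positive real number with every arc.
        if weight <= 0:
            raise ValueError("arc weights must be positive")
        self.nodes.update((x, y))
        self.phi[(x, y)] = weight

    def inputs(self, y):
        """Nodes x such that <x, y> is an arc."""
        return {x for (x, target) in self.phi if target == y}

    def outputs(self, x):
        """Nodes y such that <x, y> is an arc."""
        return {y for (source, y) in self.phi if source == x}


g = FlowGraph()
g.add_arc("u", "v", 2.0)
g.add_arc("u", "w", 1.0)
print(sorted(g.outputs("u")))  # ['v', 'w']
print(sorted(g.inputs("v")))   # ['u']
print(("v", "u") in g.phi)     # False: arcs are ordered pairs
```

Using the ordered pair itself as the key makes the direction of an arc explicit: the arc from `u` to `v` does not imply an arc from `v` to `u`.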

Given a set of supervised data instances $X$ (training set), $|X|=N$ , each instance described by $M$ attributes, $A_{1}$ , $A_{2}$ , …, $A_{M}$ , and an associated class (from a set with $K$ classes), the construction of a FG that represents $X$ can be carried out by the following 5-step procedure:

(1) For each attribute $A_{i}$ ( $1\leqslant i\leqslant M$ ) that describes instances, a set of distinct nodes is created, where each node represents a possible value of $A_{i}$ in $X$ (the set of nodes associated with an attribute is referred to as a layer). A training set described by $M$ attributes and an associated class will be represented by a FG containing $M+1$ layers, where the first $M$ layers, $L_{1}$ , $L_{2}$ , …, $L_{M}$ , are composed of nodes representing the possible values of attributes $A_{1}$ , $A_{2}$ , …, $A_{M}$ in $X$ , respectively. The class attribute is represented by layer $L_{M+1}$ , which has as many nodes as there are class values (i.e., $K$ ). In short, each one of the $M$ attribute layers has as many nodes as there are different values of the attribute associated with the layer. Layers $1$ and $M+1$ are known as input layer and output layer, respectively.
(2) Consider that each attribute $A_{i}$ ( $1\leqslant i\leqslant M$ ) has as possible values the elements in set { $A_{i_{1}}$ , $A_{i_{2}}$ , …, $A_{i_{ji}}$ }. Therefore, the layer $L_{i}$ that represents attribute $A_{i}$ ( $1\leqslant i\leqslant M$ ) will have $|\{A_{i_{1}},A_{i_{2}},\ldots,A_{i_{ji}}\}|$ nodes.
(3) Nodes in a given layer $L_{i}$ ( $1\leqslant i\leqslant M$ ) can relate to one or more nodes in layer $L_{i+1}$ , depending on the values that attributes $A_{i}$ and $A_{i+1}$ , which define the two layers, respectively, have in $X$ . The relationship between nodes $x\in L_{i}$ and $y\in L_{i+1}$ is represented in the FG under construction by the arc $\langle x,y\rangle$ .
(4) For each arc $\langle x,y\rangle$ constructed in (3), the number of occurrences, in the training set, of the relationship between the two attribute values (i.e., $x$ and $y$ ) that define the arc is counted. Such number is denoted by $\phi(\langle x,y\rangle)$ and is associated with the arc as a weighting label.
(5) Two numeric values, inflow ( $\sigma_{+}(x)$ ) and outflow ( $\sigma_{-}(x)$ ), are associated with each node $x$ constructed in (1) and (2). The inflow of $x$ represents the sum of the weights of all arcs that reach $x$ and the outflow, the sum of the weights of all arcs that leave $x$ . The inflow of nodes in layer $L_{1}$ is always zero and the outflow of nodes in layer $L_{M+1}$ is always zero. All nodes $x$ belonging to internal layers are such that $\sigma_{+}(x)=\sigma_{-}(x)=\sigma(x)$ , where $\sigma(x)$ is named the throughflow of $x$ . These values are used as labels associated with each node.

Once the set of nodes and set of arcs have been constructed and the associated labels and weights have been assigned to nodes and arcs, the construction of the basic version of the FG is finished. Finally, the flow of the FG, noted by $\phi$ (FG), is defined as the number of data instances ( $N$ ) summarized by the FG.
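The 5-step procedure can be sketched in a few lines of code. The sketch below is illustrative only and is not the EFLOWG implementation: the function name `build_flow_graph`, the identification of a node by the pair (layer index, value) and the toy training set are all assumptions introduced here.

```python
from collections import Counter


def build_flow_graph(instances):
    """Build layers, arc weights, inflow and outflow from a training set.

    Each instance is a tuple (A_1, ..., A_M, class). A node is identified
    by (layer index, value), so equal values in different layers stay
    distinct.
    """
    n_layers = len(instances[0])
    # Steps (1)-(2): one layer per attribute plus one for the class,
    # with one node per distinct value found in the training set.
    layers = [sorted({row[i] for row in instances}) for i in range(n_layers)]
    # Steps (3)-(4): arcs between adjacent layers, weighted by the number
    # of instances in which the two values occur together.
    phi = Counter()
    for row in instances:
        for i in range(n_layers - 1):
            phi[((i, row[i]), (i + 1, row[i + 1]))] += 1
    # Step (5): inflow and outflow of every node.
    inflow, outflow = Counter(), Counter()
    for (x, y), weight in phi.items():
        outflow[x] += weight
        inflow[y] += weight
    return layers, phi, inflow, outflow


# Hypothetical toy training set: two attributes and a class.
toy = [("a", "p", "c1"), ("a", "q", "c1"), ("b", "q", "c2"), ("b", "q", "c2")]
layers, phi, inflow, outflow = build_flow_graph(toy)

print(layers)                                # [['a', 'b'], ['p', 'q'], ['c1', 'c2']]
print(phi[((1, "q"), (2, "c2"))])            # 2
print(inflow[(1, "q")], outflow[(1, "q")])   # 3 3
print(sum(outflow[(0, v)] for v in layers[0]))  # 4
```

In the toy run, the internal node `q` has equal inflow and outflow (its throughflow), and the total outflow of the input layer equals the number of instances, i.e., the flow of the FG.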

A normalized version of FG can be obtained by dividing each label/weight by the value given by $\phi$ (FG). Many derived concepts associated with an FG can be found in the literature, usually based on the two basic concepts, the inflow and the outflow associated with each node in the FG. The formal definitions of concepts such as coverage, certainty, strength, connections and many more can be found in the several references at the end of this paper, particularly references [46, 50, 52].

4. Exemplifying the induction of a FG

Consider a training set having 14 data instances, $X=\{x_{1},\ldots,x_{14}\}$ , each of them described by 3 attributes, $A_{1}$ , $A_{2}$ and $A_{3}$ , where the value of $A_{3}$ is the associated class of the instance. Table 1 describes the 14 instances in $X$ . Considering that each attribute of $X$ defines a layer in the FG, the FG to be constructed should have three layers, as Fig. 1 shows.

Based on the description of $X$ shown in Table 1 and considering that the values assigned to attribute $A_{1}$ are {1, 2, 3, 4, 5, 7}, to $A_{2}$ are {1, 2, 3, 4, 5, 8, 9, 10} and to $A_{3}$ are {1, 2, 3}, the three layers to be constructed will have, respectively, 6, 8 and 3 nodes.

Table 1
Training set with 14 instances described by 3 attributes, $A_{1}$ , $A_{2}$ and $A_{3}$ , where $A_{3}\in\{1,2,3\}$ represents the instances associated class

	$A_{1}$	$A_{2}$	$A_{3}$		$A_{1}$	$A_{2}$	$A_{3}$
$x_{1}$	1	1	1	$x_{8}$	4	8	2
$x_{2}$	2	1	1	$x_{9}$	2	9	2
$x_{3}$	5	3	3	$x_{10}$	5	2	3
$x_{4}$	2	10	2	$x_{11}$	7	5	3
$x_{5}$	5	5	3	$x_{12}$	2	2	1
$x_{6}$	4	9	2	$x_{14}$	7	4	3

In Table 1 the instance $x_{1}$ is the only instance in $X$ with attribute values $A_{1}=1$ and $A_{2}=1$ . The relation between the node representing the value 1 for attribute $A_{1}$ and the node representing the value 1 for attribute $A_{2}$ defines an arc $\langle 1,1\rangle$ between the corresponding nodes in layers $A_{1}$ and $A_{2}$ , respectively, with an associated weight of 1 ( $\phi(\langle 1,1\rangle)=1$ ). Similarly, there are two instances in $X$ ( $x_{7}$ and $x_{8}$ ) described by $A_{2}=8$ and $A_{3}=2$ , so an arc $\langle 8,2\rangle$ is established between the two nodes, belonging to the second and third layers, respectively, with an associated weight of 2 ( $\phi(\langle 8,2\rangle)=2$ ).

The procedure is repeated for all pairs of nodes belonging to adjacent layers (the first node of the pair in layer $J$ and the second in layer $J+1$ ), which are represented in the training set. At the end of the process, the FG is constructed and it can be observed that its structure, the way it has been built, condenses the information contained in the training set and thus, it can be used for data analysis, as well as for the extraction of decision rules, among others.

By dividing each value associated with nodes and arcs of a FG by the number of summarized instances, a normalized version of the flow graph, where the associated values are in the interval [0, 1], is obtained. The graph presented in Fig. 1 has not been normalized.
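The normalization amounts to a single division per label. As a small sketch, the two arc weights worked out above ( $\phi(\langle 1,1\rangle)=1$ and $\phi(\langle 8,2\rangle)=2$ ) are normalized below for $N=14$ ; the function name `normalize` and the string keys used to identify the arcs are hypothetical.

```python
def normalize(weights, n_instances):
    """Divide every arc weight by the number of summarized instances."""
    return {arc: w / n_instances for arc, w in weights.items()}


# The two arc weights discussed in the text, for a training set with N = 14.
weights = {("A1=1", "A2=1"): 1, ("A2=8", "A3=2"): 2}
normalized = normalize(weights, 14)

for arc, value in normalized.items():
    print(arc, round(value, 4))
```

The same division applies to the inflow and outflow labels of the nodes, so every normalized value falls in the interval [0, 1].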

Figure 1.

The FG structure (not normalized) induced from the set of data instances in Table 1.

5. The pre-processing phase – discretizing continuous-valued attributes

In the experiments described in Section 8, the Minimum Description Length Principle Cut (MDLPC) criterion, proposed and described in [41], has been used as the first step of the HFG proposal, for dealing with continuous-valued attributes. The MDLPC criterion is one of the most commonly used discretization algorithms in supervised machine learning. It relies on two concepts, entropy and the information gain measure, to determine the best cut point for splitting a continuous interval.

The algorithm is recursive and has time complexity of $O(N\log N)$ , where $N$ is the number of instances in the input data set. For finding a suitable partition of the range of values associated with a particular continuous attribute, the information gain measure is used to evaluate splitting point candidates in that range.

The splitting point candidate with the largest information gain is evaluated according to the MDLPC criterion. If the splitting point candidate passes the MDLPC criterion, it becomes a cut point and the discretization method is recursively executed for both subsets of values induced by the new cut point. Otherwise, the algorithm ends.

Let $X=\{x_{1},\ldots,x_{N}\}$ be a given data set with $N$ instances, each described by $M$ attributes, $A_{1}$ , $A_{2}$ , …, $A_{M}$ , and an associated class, from a set of $K$ possible classes, $\{C_{1},C_{2},\ldots,C_{K}\}$ . Consider $A$ one of the $M$ attributes and consider its range of values in $X$ . Let CP be a cut point that splits the range of $A$ and, consequently, splits $X$ . The gain in splitting $X$ into two subsets, $X_{1}$ and $X_{2}$ , such that $|X_{1}|=N_{1}$ and $|X_{2}|=N_{2}$ , where $X_{1}$ has the instances whose values of attribute $A$ are less than or equal to CP and $X_{2}$ has the instances whose values of attribute $A$ are greater than CP, is given by Eq. (1).

$\displaystyle\text{Gain}(A,\textit{CP},X)=\text{entropy}(X)-(N_{1}/N)\times\text{entropy}(X_{1})-(N_{2}/N)\times\text{entropy}(X_{2})$ (1)

For calculating the entropy of set $X$ , it is necessary to calculate the prior probability of each one of the $K$ classes in $X$ . The probability of class $C_{i}$ , noted by $p(C_{i})$ , $1\leqslant i\leqslant K$ , is given by dividing the number of instances of class $C_{i}$ by the total number of instances in $X$ , i.e., $N$ . Eq. (2) shows how the entropy of a set $X$ is calculated.

$\displaystyle\text{entropy}(X)=-\sum_{i=1}^{K}p(C_{i})\log_{2}p(C_{i})$ (2)

Taking into account the attribute $A$ , the MDLPC criterion establishes that a cut point candidate CP is a cut point if and only if Eq. (3) holds.

$\displaystyle\text{Gain}(A,\textit{CP},X)>\frac{\log_{2}(N-1)}{N}+\frac{\Delta(A,\textit{CP},X)}{N}$ (3)

The expression $\Delta(A,\textit{CP},X)$ in Eq. (3) is defined by Eq. (4) where $Y_{1}$ and $Y_{2}$ represent the number of classes in $X_{1}$ and $X_{2}$ , respectively.

$\displaystyle\Delta(A,\textit{CP},X)=\log_{2}(3^{K}-2)-[K\times\text{entropy}(X)-Y_{1}\times\text{entropy}(X_{1})-Y_{2}\times\text{entropy}(X_{2})]$ (4)

The MDLPC criterion only splits an interval into two subintervals at a time. To split an interval into more than two subintervals, the procedure should be recursively called, until a cut point candidate no longer satisfies Eq. (3).
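The recursive procedure governed by Eqs (1) to (4) can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in EFLOWG: all logarithms are taken in base 2; the number of classes in Eq. (4) is counted on the subset being split at each recursive call; cut point candidates are the midpoints between distinct adjacent sorted values; and the entropies are recomputed naively, so the sketch does not attain the $O(N\log N)$ bound. The function names `entropy` and `mdlpc_cut_points` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Class entropy of a list of labels, in bits (Eq. 2)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mdlpc_cut_points(values, labels):
    """Recursively collect the cut points accepted by the MDLPC criterion."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    xs = [v for v, _ in pairs]
    ys = [c for _, c in pairs]
    n = len(ys)
    if n < 2:
        return []
    ent = entropy(ys)
    best = None  # (gain, index of the first element of X2)
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # equal values cannot be separated by a cut point
        gain = (ent - (i / n) * entropy(ys[:i])
                - ((n - i) / n) * entropy(ys[i:]))  # Eq. (1)
        if best is None or gain > best[0]:
            best = (gain, i)
    if best is None:
        return []
    gain, i = best
    left, right = ys[:i], ys[i:]
    k, k1, k2 = len(set(ys)), len(set(left)), len(set(right))
    delta = math.log2(3 ** k - 2) - (
        k * ent - k1 * entropy(left) - k2 * entropy(right))  # Eq. (4)
    if gain <= math.log2(n - 1) / n + delta / n:  # Eq. (3) does not hold
        return []
    cut = (xs[i - 1] + xs[i]) / 2  # midpoint: an assumption of this sketch
    return (mdlpc_cut_points(xs[:i], left) + [cut]
            + mdlpc_cut_points(xs[i:], right))


# Two well separated classes yield a single cut point.
print(mdlpc_cut_points([1, 2, 3, 4, 10, 11, 12, 13],
                       [0, 0, 0, 0, 1, 1, 1, 1]))  # [7.0]
# Alternating classes: no candidate satisfies Eq. (3).
print(mdlpc_cut_points([1, 2, 3, 4], [0, 1, 0, 1]))  # []
```

In the first call the best candidate has gain 1, well above the threshold of Eq. (3), and both resulting subsets are pure, so the recursion stops after one cut. In the second call the attribute is left as a single interval.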

6. A method for extracting classifiers from FGs

It is possible to extract, from a flow graph, a set of decision rules representing a classifier, as proposed and described in [46, 48, 50]. In this section the original method used to extract classifiers from FGs is reviewed and two modifications are considered.

Given a flow graph FG, a complete path [ $x, y$ ] in FG is a path starting at node $x\in I(\text{FG})$ and ending at node $y\in O(\text{FG})$ , that visits one node in each internal layer of FG; $I$ (FG) and $O$ (FG) represent the input and the output layers of the FG, respectively. According to Pawlak [46], a complete path [ $x, y$ ] can be considered a decision rule $x^{*}\rightarrow y$ , where $x^{*}$ represents the set of sequential conditions where a condition is a test on values of attributes associated with the input and internal layers that define the digraph, and $y$ is the node associated with the decision attribute value, in the output layer.
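The enumeration of complete paths can be sketched as a layer-by-layer extension of partial paths. The code below is illustrative only; the function name `complete_paths` and the encoding of a node as (layer index, value) are assumptions. The fragment used as input contains only the arcs through node $A_{2}=2$ of the example in Section 4, whose four complete paths correspond to paths 3, 6, 11 and 12 of Table 2.

```python
def complete_paths(layers, arcs):
    """Enumerate complete paths: one node per layer, adjacent nodes joined by an arc.

    layers: list of lists of node values, input layer first, output layer last.
    arcs:   set of ((layer index, value), (layer index + 1, value)) pairs.
    """
    paths = [[v] for v in layers[0]]
    for i in range(1, len(layers)):
        paths = [p + [v] for p in paths for v in layers[i]
                 if ((i - 1, p[-1]), (i, v)) in arcs]
    return paths


# Fragment of a three-layer FG: values 2 and 5 of A1, value 2 of A2,
# and classes 1 and 3.
layers = [[2, 5], [2], [1, 3]]
arcs = {((0, 2), (1, 2)), ((0, 5), (1, 2)),
        ((1, 2), (2, 1)), ((1, 2), (2, 3))}

paths = complete_paths(layers, arcs)
for p in paths:
    # Conditions x* are all nodes but the last; the last node is the decision y.
    print(p[:-1], "->", p[-1])
# [2, 2] -> 1
# [2, 2] -> 3
# [5, 2] -> 1
# [5, 2] -> 3
```

Because adjacent arcs are combined freely, a complete path may pair a condition part and a decision that never occur together in any training instance.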

Consider again the data set given in Table 1 and the FG shown in Fig. 1, both in Section 4. The set of complete paths in the FG and their corresponding decision rules, are presented in Table 2. The notation employed in the table i.e., $[x,y]_{i}$ , represents a unique complete path from $x$ to $y$ , identified by subscript $i$ .

The method for extracting classifiers as described in [46], however, allows the extraction of both unsupported and inconclusive decision rules. In this paper unsupported decision rules are defined as rules whose conditional part does not match any data instance; possibly, they have been induced as the result of over-generalization. To detect unsupported decision rules, each decision rule extracted from the FG has its quality assessed based on its classification performance over the training set. Decision rules detected as unsupported are removed from the set of rules that represents the classifier.

Table 2
Complete paths of the FG in Fig. 1 and associated decision rules $x^{*}\rightarrow y$ , where #path: counter associated with a particular path. The subscript in $[x,y]_{i}$ refers to one of possibly various complete paths from $x$ to $y$

#path	[ $x, y$ ]	$x^{*}\rightarrow y$	#path	[ $x, y$ ]	$x^{*}\rightarrow y$
1	$[1,1]_{1}$	1, 3 $\rightarrow$ 1	9	$[4,2]_{2}$	4, 9 $\rightarrow$ 2
2	$[2,1]_{1}$	2, 1 $\rightarrow$ 1	10	$[4,2]_{3}$	4, 10 $\rightarrow$ 2
3	$[2,1]_{2}$	2, 2 $\rightarrow$ 1	11	$[5,1]_{1}$	5, 2 $\rightarrow$ 1
4	$[2,2]_{1}$	2, 9 $\rightarrow$ 2	12	$[5,3]_{1}$	5, 2 $\rightarrow$ 3
5	$[2,2]_{2}$	2, 10 $\rightarrow$ 2	13	$[5,3]_{2}$	5, 3 $\rightarrow$ 3
6	$[2,3]_{1}$	2, 2 $\rightarrow$ 3	14	$[5,3]_{3}$	5, 5 $\rightarrow$ 3
7	$[3,2]_{1}$	3, 8 $\rightarrow$ 2	15	$[7,3]_{1}$	7, 4 $\rightarrow$ 3
8	$[4,2]_{1}$	4, 8 $\rightarrow$ 2	16	$[7,3]_{2}$	7, 5 $\rightarrow$ 3

Let CL be a classifier represented by the set of decision rules given in Table 2. By inspecting the rules in CL, it can be observed that the rule associated with #path 1, given as 1, 3 $\rightarrow$ 1, does not classify any instance of the original data set and, therefore, can be removed from CL. The same happens with the decision rule associated with #path 6, given as 2, 2 $\rightarrow$ 3.

Inconclusive decision rules are characterized as rules that share the same conditions, but have different conclusions. The removal of inconclusive decision rules must be performed after the removal of unsupported decision rules, to prevent supported rules from being removed too early.

To remove inconclusive rules, the set of decision rules is analyzed and, for each pair of decision rules that have the same conditions but different decisions, the rule with the lowest associated strength value is removed from the classifier. The strength factor associated with a complete path [ $x, y$ ] is given as the product of the number of instances that have value $x$ for their first (input) attribute and the certainty factor of the complete path, which is calculated as the product of the certainty factors associated with each arc that defines the path.

Considering again the CL classifier used as an example, no inconclusive rules remain after the removal of the unsupported rules. At the end of the process just described, each rule in CL is supported (i.e., it classifies at least one instance of the training set) and conclusive (i.e., there is no ambiguity in the decision to be made). The CL classifier initially induced has thus been reduced in size and, since it contains only supported and conclusive decision rules (as far as the training set is concerned), it is expected to be more likely to classify new data instances correctly.
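The two-stage pruning just described can be sketched as follows. This is a minimal illustration, not the EFLOWG implementation: the rule representation (a dictionary with `conditions`, `decision` and `strength`), the function name `prune_rules` and the reading of "supported" as "some training instance matches both the conditions and the decision" are assumptions made for the example.

```python
def prune_rules(rules, training_set):
    """Remove unsupported rules first, then inconclusive ones.

    rules: list of dicts with keys 'conditions' (tuple of attribute values),
           'decision' (class label) and 'strength' (number).
    training_set: list of (conditions, class_label) pairs.
    Both layouts are hypothetical and chosen only for this sketch.
    """
    # Stage 1: keep only supported rules, i.e. rules matched by at least
    # one training instance (same conditions and same class).
    observed = set(training_set)
    supported = [r for r in rules
                 if (r["conditions"], r["decision"]) in observed]

    # Stage 2: among rules sharing the same conditions, keep the one with
    # the highest strength (ties keep the rule seen first; an assumption).
    strongest = {}
    for rule in supported:
        key = rule["conditions"]
        if key not in strongest or rule["strength"] > strongest[key]["strength"]:
            strongest[key] = rule
    return list(strongest.values())


# Toy data, not taken from the paper.
toy_rules = [
    {"conditions": (2, 2), "decision": 1, "strength": 0.6},
    {"conditions": (2, 2), "decision": 3, "strength": 0.2},
    {"conditions": (1, 3), "decision": 1, "strength": 0.1},
]
toy_training = [((2, 2), 1), ((2, 2), 3), ((2, 2), 1)]
pruned = prune_rules(toy_rules, toy_training)
```

Running stage 1 before stage 2 matters: if an unsupported rule happened to have the higher strength, removing inconclusive rules first would discard its supported competitor.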

Table 3

Data descriptions, where DS: data set identification, #NI: no. of data instances, #NA: no. of attributes, #NG: no. of groups and G_id $=$ #NI: no. of data instances per group, where G_id is the group identification

DS	#NI	#NA	#NG	G_id $=$ #NI
Ruspini (Ru)	75	2	4	1 $=$ 20, 2 $=$ 23, 3 $=$ 17, 4 $=$ 15
Mouse-Like (ML)	1,000	2	3	1 $=$ 200, 2 $=$ 200, 3 $=$ 600
Spherical_6_2 (Sp)	300	2	6	1 $=$ 50, 2 $=$ 50, 3 $=$ 50, 4 $=$ 50, 5 $=$ 50, 6 $=$ 50
3MC (MC)	400	2	3	1 $=$ 120, 2 $=$ 170, 3 $=$ 110
Long square (LS)	900	2	6	1 $=$ 147, 2 $=$ 155, 3 $=$ 150, 4 $=$ 148, 5 $=$ 150, 6 $=$ 150
Iris (Ir)	150	4	3	1 $=$ 50, 2 $=$ 50, 3 $=$ 50
Fossil (Fo)	87	6	3	1 $=$ 40, 2 $=$ 34, 3 $=$ 13
Aggregation (Ag)	788	3	7	1 $=$ 45, 2 $=$ 170, 3 $=$ 102, 4 $=$ 273, 5 $=$ 34, 6 $=$ 130, 7 $=$ 34
Flame (Fl)	240	3	2	1 $=$ 87, 2 $=$ 153
Seeds (Se)	210	8	3	1 $=$ 70, 2 $=$ 70, 3 $=$ 70
Ecoli (Ec)	336	8	8	1 $=$ 143, 2 $=$ 77, 3 $=$ 52, 4 $=$ 35, 5 $=$ 20, 6 $=$ 5, 7 $=$ 2, 8 $=$ 2
Blood transfusion (BT)	748	5	2	1 $=$ 574, 2 $=$ 174
Haberman’s survival (HS)	306	4	2	1 $=$ 225, 2 $=$ 81
Vertebral column (VC)	310	6	3	1 $=$ 60, 2 $=$ 150, 3 $=$ 100
User knowledge model (UK)	403	5	4	1 $=$ 50, 2 $=$ 129, 3 $=$ 122, 4 $=$ 102
Caesarian section (CS)	80	6	2	1 $=$ 34, 2 $=$ 46
Glass (Gl)	214	10	7	1 $=$ 70, 2 $=$ 76, 3 $=$ 17, 4 $=$ 0, 5 $=$ 13, 6 $=$ 9, 7 $=$ 29
Wine (Wi)	178	14	3	1 $=$ 59, 2 $=$ 71, 3 $=$ 48
Cryotherapy (Cr)	90	7	2	1 $=$ 42, 2 $=$ 48

7. Some functionalities of the computational system EFLOWG

As mentioned before, a computational system named EFLOWG (Extended FLOW Graph) was developed to provide an environment to carry out experiments related to the use of flow graph concepts and structures, as well as the methods investigated, particularly with focus on the FG extended version for dealing with continuous data, referred to as HFG.

The EFLOWG computational system is a desktop system developed in the Java programming language and implemented and used on a Windows 10 platform. The system has an intuitive interface, which makes it easy to learn and use. Among the EFLOWG characteristics and functionalities, the following nine stand out:

(1) Input data instance sets should be given to the system as ARFF files;
(2) The system identifies attributes, attribute types and values associated with attributes;
(3) The information area provided by the system, regarding the set of data instances, is updated in real time;
(4) Facilities for identifying and dealing with missing attribute values and outliers are provided by the system. In particular, it offers three choices for dealing with the problem of missing attribute values;
(5) The system offers the implementation of three methods to be used for data discretization;
(6) The system allows the user to sort the attributes in a particular way, at his/her choice, or to sort them using their associated Information Gain value;
(7) It allows the induction of flow graphs and the extraction of classifiers, and the corresponding evaluation, in two distinct ways:

(a) Division by percentage: training and testing sets are defined based on a percentage of instances provided by the user; and
(b) Cross-validation: runs a cross-validation process based on an integer value $k$ provided by the user.

(8) The system makes available an output screen that displays detailed information about the FG-based induced structures and the results of the experiments performed; and
(9) The system keeps track of and stores the experiments performed, for future analysis and comparison.
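The two evaluation modes listed in item (7) can be sketched as follows. This is only an illustration of the general techniques: the function names, the shuffling with a fixed seed and the round-robin assignment of instances to folds are assumptions, since the text does not describe how EFLOWG builds its partitions.

```python
import random


def percentage_split(instances, train_pct, seed=0):
    """Split instances into (training, testing) using a percentage."""
    rng = random.Random(seed)
    shuffled = list(instances)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_pct / 100)
    return shuffled[:cut], shuffled[cut:]


def k_fold_partitions(instances, k, seed=0):
    """Yield k (training, testing) pairs; each fold is the test set once."""
    rng = random.Random(seed)
    shuffled = list(instances)
    rng.shuffle(shuffled)
    # Round-robin assignment keeps fold sizes within one instance of each other.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, folds[i]


# Toy usage with 75 instance identifiers.
data = list(range(75))
train, test = percentage_split(data, 80)
splits = list(k_fold_partitions(data, 5))
```

With 75 instances, an 80% split gives 60 training and 15 testing instances, and each of the five folds also holds 15 instances.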

The experimental results presented and discussed in Section 8, as well as the transformations applied to the data used in the experiments, were obtained using the computational system EFLOWG.

Table 4
Structure characteristics obtained in the 5-fold experiments using the HFG approach having as input data each one of the 19 data sets in Table 3, where DS: data set identification, #nodes: average number of nodes of the induced HFG, #arcs: average number of arcs of the induced HFG, size: #nodes $+$ #arcs and #complete paths: number of complete paths in the HFG. Corresponding values of standard deviations are shown between parentheses

DS	#nodes	#arcs	size	#complete paths
Ru	12 (0.00)	12 (2.24)	24	10 (3.61)
ML	11 (0.00)	19 (2.24)	30	30 (4.47)
Sp	14 (0.00)	12 (0.00)	26	10 (0.00)
MC	12 (0.00)	15 (0.00)	27	11 (0.00)
LS	13 (0.00)	15 (2.00)	28	15 (2.24)
Ir	15 (0.00)	26 (1.73)	41	60 (8.49)
Fo	26 (2.00)	55 (2.65)	81	419 (13.04)
Ag	20 (0.00)	49 (2.00)	69	88 (2.24)
Fl	8 (0.00)	12 (0.00)	20	11 (0.00)
Se	30 (0.00)	72 (2.45)	102	4,126 (892.26)
Ec	25 (2.00)	46 (4.00)	71	1,519 (559.93)
BT	9 (0.00)	10 (0.00)	19	8 (0.00)
HS	6 (0.00)	7 (0.00)	13	4 (0.00)
VC	19 (0.00)	39 (1.41)	58	566 (62.61)
UK	17 (0.00)	34 (0.00)	51	124 (0.00)
CS	15 (0.00)	32 (2.00)	47	117 (10.82)
Gl	35 (0.00)	67 (1.73)	102	26,989 (5647.27)
Wi	43 (2.46)	110 (7.08)	153	1,658,880 (6237.54)
Cr	13 (0.00)	18 (0.00)	31	48 (0.00)

Table 5
Classification performance of sets of rules extracted from HFGs, where DS: data set identification, #TDR: average number of total decision rules, #DR: average number of consistent decision rules, #cc/#total (SD): average number of correct classifications/average number of instances (standard deviation) and %acc: average percentage of accuracy values

DS	#TDR	#DR	#cc/#total (SD)	%acc
Ru	10	7	14/15 (1.10)	97.33%
ML	30	12	184/200 (12.42)	92.00%
Sp	10	6	59/60 (3.85)	98.33%
MC	11	9	71/80 (5.18)	88.50%
LS	15	9	148/180 (9.32)	82.22%
Ir	60	35	28/30 (3.16)	93.33%
Fo	419	419	16/17 (2.00)	91.95%
Ag	88	34	158/158 (0.00)	100.00%
Fl	11	8	39/48 (6.87)	82.08%
Se	4,126	1724	39/42 (7.01)	92.86%
Ec	559	235	60/68 (6.54)	88.24%
BT	8	4	113/150 (11.22)	75.53%
HS	4	2	51/61 (8.37)	75.53%
VC	566	226	48/62 (5.90)	77.10%
UK	124	56	68/80 (10.33)	84.12%
CS	117	58	10/16 (2.45)	62.50%
Gl	26,989	73	42/43 (4.60)	97.67%
Wi	1,658,880	137	36/37 (2.24)	97.22%
Cr	48	24	14/18 (5.73)	77.78%

Table 6
Average of accuracy rate results obtained by HFG (HFG), J48 (J48), Naïve Bayes (NB), K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) algorithms for data sets in Table 3. Best results are bold faced

DS	HFG	J48	NB	KNN	SVM
Ru	97.33%	98.67%	98.67%	100.00%	100.00%
ML	92.00%	97.90%	98.30%	98.20%	98.80%
Sp	98.33%	100.00%	100.00%	100.00%	100.00%
MC	88.50%	100.00%	100.00%	100.00%	100.00%
LS	82.22%	99.89%	99.78%	99.89%	99.78%
Ir	93.33%	94.00%	96.00%	94.00%	96.67%
Fo	91.95%	100.00%	100.00%	100.00%	100.00%
Ag	100.00%	99.62%	99.33%	99.33%	95.68%
Fl	82.08%	97.91%	95.83%	100.00%	88.33%
Se	92.86%	91.90%	90.95%	92.38%	93.33%
Ec	88.24%	82.44%	85.12%	80.95%	83.03%
BT	75.53%	77.27%	75.13%	71.65%	76.20%
HS	75.53%	70.92%	75.16%	68.95%	73.52%
VC	77.10%	81.94%	83.22%	76.12%	75.16%
UK	84.12%	93.05%	89.58%	84.86%	92.80%
CS	62.50%	62.50%	62.50%	57.75%	60.00%
Gl	97.67%	97.20%	84.58%	90.19%	79.90%
Wi	97.22%	92.70%	97.17%	95.50%	98.31%
Cr	77.78%	90.00%	83.33%	88.89%	90.00%

Table 7
Results obtained using attribute sequences defined by the Information Gain (IG), in ascending and descending order, where DS: data set identification; %ASC: accuracy obtained using the IG in ascending order; %DESC: accuracy obtained using the IG in descending order, and %ORIG: accuracy obtained using the original sequence of attributes that describes the data instances. Best results are bold faced

DS	%ASC	%DESC	%ORIG	DS	%ASC	%DESC	%ORIG
Ru	97.33%	97.33%	97.33%	Ec	82.14%	78.33%	88.24%
ML	90.20%	89.12%	92.00%	BT	75.53%	72.20%	75.53%
Sp	99.67%	99.67%	98.33%	HS	75.53%	99.67%	75.53%
MC	88.33%	85.15%	88.50%	VC	80.65%	99.75%	77.10%
LS	99.56%	99.56%	82.22%	UK	82.16%	80.56%	84.12%
Ir	94.00%	94.00%	93.33%	CS	73.75%	94.00%	62.50%
Fo	88.51%	88.51%	91.95%	Gl	97.67%	88.51%	97.67%
Ag	99.75%	99.75%	100.00%	Wi	90.19%	94.22%	97.22%
Fl	95.00%	95.00%	82.08%	Cr	82.22%	95.00%	77.78%
Se	91.81%	88.23%	92.86%

8. Experiments, results and discussion

As introduced earlier in this paper, the proposed HFG approach, implemented by the computational system EFLOWG, is a two-step process. Its first step uses the discretization method presented in Section 5 which, in continuous domains, favors the induction of smaller FGs than the bulky ones induced by the original FG approach. The digraph induced by the HFG approach also allows the extraction of a classifier, composed of a set of decision rules, as discussed in Section 6.

This section presents and analyzes the results of experiments performed with the HFG approach and the EFLOWG system, having as input training instances from 19 data sets, most of them downloaded from the UCI ML Repository [5] and some from other sources, such as [11, 17, 21, 29, 36]. The main characteristics of the 19 data sets are described in Table 3.

For each data set $X$ in Table 3, the adopted methodology implemented a 5-fold cross-validation process, by sequentially going through the following steps:

(1) For each set $X_{i}$ ( $1\leqslant i\leqslant 19$ ) in Table 3 the corresponding attributes were discretized, producing the discretized set versions $DX_{i}$ ( $1\leqslant i\leqslant 19$ ).
(2) For each set $DX_{i}$ ( $1\leqslant i\leqslant 19$ ) a 5-fold cross-validation process was conducted:

(2.1) $DX_{i}$ was partitioned into five parts, i.e., $\{DX_{i_{1}},DX_{i_{2}},DX_{i_{3}},DX_{i_{4}},DX_{i_{5}}\}$. In each one of the 5 steps of the 5-fold cross-validation, four subsets were used for inducing the HFG; the set of decision rules embedded in the HFG was then extracted and evaluated using the fifth subset, which was not used for training. The structure of the induced HFG was then measured in terms of its number of nodes, arcs, existing paths from the initial nodes to the ending nodes, and extracted decision rules.
(2.2) At the end of the 5-fold cross-validation, the average and corresponding standard deviation of the classification rates of the sets of rules extracted from the induced HFGs were calculated, taking into account the values obtained in each one of the five steps of the process.
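Step (2.2) can be sketched as follows. The function name `summarize_folds` and the toy counts are illustrative assumptions; the text does not state whether the sample or the population standard deviation was used, so the sample version (`statistics.stdev`) is assumed here.

```python
from statistics import mean, stdev


def summarize_folds(correct_per_fold, total_per_fold):
    """Return (mean, sample standard deviation) of per-fold accuracy in %."""
    rates = [100.0 * correct / total
             for correct, total in zip(correct_per_fold, total_per_fold)]
    return mean(rates), stdev(rates)


# Toy counts for five folds of 15 test instances each (not from the paper).
avg_acc, sd_acc = summarize_folds([15, 14, 15, 14, 15], [15] * 5)
```

Averaging per-fold rates, as done here, coincides with pooling all test instances only when the folds have equal size.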

Table 4 shows the values of the four characteristics used for evaluating the flow graph structures induced by the processes that implement the Hybrid Flow Graph (HFG) approach, for each data set in Table 3.

The characteristics are: the number of nodes, the number of arcs, the size (given by the number of nodes plus the number of arcs) and the number of complete paths found in the structure. The values in the table are the averages and corresponding standard deviations (SDs) of the measurements of these characteristics over the 5-fold cross-validation process. The original sequence of attributes (i.e., the attribute order) that describes the instances in each original data set was maintained for the experiments.
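The four structural characteristics can be computed from an arc list, as in the sketch below. It assumes a hypothetical representation in which the graph is acyclic, arcs are `(source, target)` pairs, and the input-layer and output-layer nodes are given explicitly; node labels such as `"a1=lo"` are invented for the example.

```python
from functools import lru_cache


def structure_metrics(arcs, input_nodes, output_nodes):
    """Count nodes, arcs, size (nodes + arcs) and complete paths."""
    nodes = set(input_nodes) | set(output_nodes)
    successors = {}
    for source, target in arcs:
        nodes.update((source, target))
        successors.setdefault(source, []).append(target)
    outputs = set(output_nodes)

    @lru_cache(maxsize=None)
    def paths_from(node):
        # A complete path ends as soon as an output-layer node is reached.
        if node in outputs:
            return 1
        return sum(paths_from(nxt) for nxt in successors.get(node, []))

    complete_paths = sum(paths_from(node) for node in input_nodes)
    return {"nodes": len(nodes), "arcs": len(arcs),
            "size": len(nodes) + len(arcs), "complete_paths": complete_paths}


# Toy graph: two attribute layers followed by a class layer.
toy_arcs = [("a1=lo", "a2=lo"), ("a1=lo", "a2=hi"), ("a1=hi", "a2=hi"),
            ("a2=lo", "c=1"), ("a2=hi", "c=1"), ("a2=hi", "c=2")]
metrics = structure_metrics(toy_arcs, ["a1=lo", "a1=hi"], ["c=1", "c=2"])
```

Memoizing the count per node avoids enumerating paths one by one, which matters because the number of complete paths can grow multiplicatively with the number of layers while the size grows only additively; in Table 4 the Wi structure has size 153 but 1,658,880 complete paths.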

Table 5 presents information about the performance of the classifiers extracted from the HFGs whose structure characteristics are shown in Table 4. As discussed before, when extracting the set of rules from HFGs, EFLOWG only considers supported and conclusive rules, i.e., rules that classify at least one instance and are not ambiguous. Unsupported and inconclusive rules are not present in the classifier (as described in Section 6).

Table 8
Accuracies of classifiers when using two different sequences of attributes, where DS: data set identification, BAS: sequence of attributes that produced the best result, OAS: original attribute sequence, %acuBAS: accuracy obtained using BAS, %acuOAS: accuracy obtained using OAS

DS	Type	Attribute sequence	%acuBAS	%acuOAS
Ru	BAS	$x, y$	97.33%	97.33%
	OAS	$x, y$
ML	BAS	$x, y$	88.27%	92.00%
	OAS	$x, y$
Sp	BAS	$y, x$	99.67%	98.33%
	OAS	$x, y$
MC	BAS	$x, y$	99.75%	88.50%
	OAS	$x, y$
LS	BAS	a1, a0	99.56%	82.22%
	OAS	a0, a1
Ir	BAS	sepalwidth, sepallength, petallength, petalwidth	92.00%	93.33%
	OAS	sepallength, sepalwidth, petallength, petalwidth
Fo	BAS	at0, at3, at1, at5, at4, at2	88.51%	91.95%
	OAS	at0, at1, at2, at3, at4, at5
Ag	BAS	$y, x$	99.75%	100.00%
	OAS	$x, y$
Fl	BAS	$y, x$	95.00%	82.08%
	OAS	$x, y$
Se	BAS	at2, at5, at1, at0, at3, at6, at4	90.80%	92.86%
	OAS	at0, at1, at2, at3, at4, at5, at6
Ec	BAS	alm2, aac, alm1, gvh, chg, mcg, lip	82.14%	88.24%
	OAS	mcg, gvh, lip, chg, aac, alm1, alm2
BT	BAS	recency_in_months, frequency_times, monetary, time_in_months	71.20%	75.53%
	OAS	recency_in_months, frequency_times, monetary, time_in_months
HS	BAS	age_when_operated, operated_in, pos_axi_nodes	74.16%	75.53%
	OAS	age_when_operated, operated_in, pos_axi_nodes
VC	BAS	pelvic_incidence, pelvic_radius, lumbar_lordosis_angle, degree_spondylolisthesis, pelvic_tilt, sacral_slope	75.65%	77.10%
	OAS	pelvic_incidence, pelvic_tilt, lumbar_lordosis_angle, sacral_slope, pelvic_radius, degree_spondylolisthesis
UK	BAS	stg, peg, str, lpr, scg	81.23%	84.12%
	OAS	stg, scg, str, lpr, peg
CS	BAS	heart, age, delivery_number, delivery_time, blood	73.75%	62.50%
	OAS	age, delivery_number, delivery_time, blood, heart
Gl	BAS	Ca, K, Fe, ID, Mg, RI, Ba, Si, Al, Na	97.67%	97.67%
	OAS	ID, RI, Na, Mg, Al, Si, K, Ca, Ba, Fe
Wi	BAS	alcohol, malic_acid, ash, alcalinity_of_ash, magnesium, total_phenols, flavanoids, nonflavanoid_phenols, proanthocyanins, color_intensity, hue, OD280_OD315, prolin	97.19%	97.22%
	OAS	alcohol, malic_acid, ash, alcalinity_of_ash, magnesium, total_phenols, flavanoids, nonflavanoid_phenols, proanthocyanins, color_intensity, hue, OD280_OD315, prolin
Cr	BAS	sex, type, time, n_warts, age, area	75.22%	77.78%
	OAS	sex, age, time, n_warts, type, area

For instance, for the data set identified as Gl in Table 5, the total number of extracted rules was 26,989 and, after the removal of unsupported and/or inconclusive rules, the number went down to 73, implying that 99.73% of the extracted rules were unsupported or inconclusive. For the data set identified as Wi, only 0.008% of the original rules extracted from its corresponding HFG were supported and conclusive. In Table 5, #TDR stands for the original number of decision rules extracted from the HFG and #DR stands for the number of consistent (i.e., supported and conclusive) decision rules.
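The two percentages quoted above follow directly from the #TDR and #DR columns of Table 5, as the short check below shows (the helper name `retained_pct` is introduced here for illustration).

```python
def retained_pct(total_rules, kept_rules):
    """Percentage of extracted rules that survive the pruning."""
    return 100.0 * kept_rules / total_rules


# Values taken from Table 5 (#TDR and #DR).
gl_retained = retained_pct(26989, 73)      # Gl data set
wi_retained = retained_pct(1658880, 137)   # Wi data set

gl_removed = 100.0 - gl_retained
print(f"Gl: {gl_removed:.2f}% removed, Wi: {wi_retained:.3f}% retained")
```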

Although performances of classifiers induced from HFGs were quite acceptable for most data sets, in the CS data set the results (accuracy rate of 62.50%) were far from being satisfactory; in this case, a further investigation about the reasons for this outcome needs to be conducted.

Aiming at a further investigation of HFG performances, the results produced by EFLOWG as well as those obtained using the J48, Naïve Bayes, K-Nearest Neighbor and Support Vector Machine, from implementations available in the Waikato Environment for Knowledge Analysis (Weka 3.8) [10], in the 19 data sets, are shown in Table 6. Details about the four algorithms can be seen in references [21, 35, 38, 40].

As can be seen in Table 6, the results obtained by the HFGs in several data domains are not as good as those obtained by the other four algorithms. The HFG approach had the best (or tied best) performance in only 5 out of the 19 data sets. However, it can be observed in Table 6 that, in most cases, the results obtained by HFGs were close to those obtained by the other four algorithms; although FG-based results were in most cases not the best ones, they can still be considered good results, except in the CS data set. Considering that the other four algorithms are well-established and have received continuous research investment since they were proposed, the results obtained using HFGs are encouraging enough to justify pursuing this line of research. A statistical analysis is still needed, however, to verify whether the differences among the results of the five algorithms are indeed relevant.
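The count of data sets in which HFG reaches the best accuracy can be reproduced from Table 6 with a few lines. The values below are transcribed from Table 6; the dictionary layout and the function name `best_or_tied` are illustrative assumptions, and a tie with another algorithm is counted as a best result.

```python
# Accuracies (%) from Table 6, in the order HFG, J48, NB, KNN, SVM.
TABLE6 = {
    "Ru": (97.33, 98.67, 98.67, 100.00, 100.00),
    "ML": (92.00, 97.90, 98.30, 98.20, 98.80),
    "Sp": (98.33, 100.00, 100.00, 100.00, 100.00),
    "MC": (88.50, 100.00, 100.00, 100.00, 100.00),
    "LS": (82.22, 99.89, 99.78, 99.89, 99.78),
    "Ir": (93.33, 94.00, 96.00, 94.00, 96.67),
    "Fo": (91.95, 100.00, 100.00, 100.00, 100.00),
    "Ag": (100.00, 99.62, 99.33, 99.33, 95.68),
    "Fl": (82.08, 97.91, 95.83, 100.00, 88.33),
    "Se": (92.86, 91.90, 90.95, 92.38, 93.33),
    "Ec": (88.24, 82.44, 85.12, 80.95, 83.03),
    "BT": (75.53, 77.27, 75.13, 71.65, 76.20),
    "HS": (75.53, 70.92, 75.16, 68.95, 73.52),
    "VC": (77.10, 81.94, 83.22, 76.12, 75.16),
    "UK": (84.12, 93.05, 89.58, 84.86, 92.80),
    "CS": (62.50, 62.50, 62.50, 57.75, 60.00),
    "Gl": (97.67, 97.20, 84.58, 90.19, 79.90),
    "Wi": (97.22, 92.70, 97.17, 95.50, 98.31),
    "Cr": (77.78, 90.00, 83.33, 88.89, 90.00),
}


def best_or_tied(table, algo_index=0):
    """Data sets where the chosen algorithm reaches the row maximum."""
    return [ds for ds, accs in table.items() if accs[algo_index] == max(accs)]


hfg_wins = best_or_tied(TABLE6)
```

Here `hfg_wins` contains Ag, Ec, HS, CS and Gl, with CS being a three-way tie with J48 and Naïve Bayes. Such a count says nothing about statistical significance, which is why a proper test over the 19 data sets is still called for.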

It is important to mention that the synthetic bi-dimensional data sets used in the experiments described in this section are typically used in experiments related to clustering, and their instances have no associated class. For the experiments conducted, however, the instances of the synthetic data sets were assigned a class corresponding to the group they belong to in a clustering identified by a visual analysis of the plot of each data set.

9. Investigating the impact of the attribute order on induced FGs

During the research work it was noticed that different sequences of attributes (i.e., different orders of appearance of the attributes) can produce different results. An empirical investigation, based on three experiments, was carried out to verify the impact of different sequences of attributes on the results obtained.

The first two experiments used the Information Gain (IG) measure (presented in Section 5) to establish the order of the attributes in two different sequences. The first sequence was obtained by sorting the attributes in ascending order of their IG values, and the second by sorting them in descending order of their IG values. Classification results using the ascending, descending and original order of the attributes for describing the training instances are shown in Table 7.
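The IG-based ordering can be sketched as follows for already discretized attributes. The sketch assumes the standard entropy-based definition of Information Gain, which may differ in detail from the formulation given in Section 5; the function names and the toy data are illustrative.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the classes minus the entropy left after splitting on values."""
    total = len(labels)
    by_value = {}
    for value, label in zip(values, labels):
        by_value.setdefault(value, []).append(label)
    remainder = sum(len(subset) / total * entropy(subset)
                    for subset in by_value.values())
    return entropy(labels) - remainder


def order_attributes(rows, labels, names, descending=False):
    """Sort attribute names by their IG (ascending unless descending=True)."""
    gains = {name: information_gain([row[i] for row in rows], labels)
             for i, name in enumerate(names)}
    return sorted(names, key=lambda name: gains[name], reverse=descending)


# Toy data: A1 determines the class, A2 is uninformative.
toy_rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
toy_labels = [0, 0, 1, 1]
ascending = order_attributes(toy_rows, toy_labels, ["A1", "A2"])
descending = order_attributes(toy_rows, toy_labels, ["A1", "A2"], descending=True)
```

In the toy data, A1 has an IG of 1 bit and A2 an IG of 0, so the ascending order places A2 first and the descending order places A1 first.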

Based on the results shown in Table 7, it can be observed that the orderings (ascending or descending), when they contributed to the accuracy of the induced rules, made only a mild contribution, in approximately half of the domains. The same can be said about maintaining the original sequence of attributes which, in spite of producing rules with better accuracy in 9 out of 19 data sets, yielded values close to those obtained with the ordered sequences (in ascending or descending order).

For the third experiment, it is worth mentioning that the EFLOWG system can identify the $n!$ different sequences of attributes that can be obtained from a set of $n$ attributes. For the experiment, 50 of the $n!$ sequences were randomly selected and used for inducing classifiers. The sequence of attributes that produced the classifier with the best accuracy was considered the best sequence. The results obtained by both the classifiers induced using the best sequence of attributes and the classifiers induced using the original sequence of attributes that describes the data set are presented in Table 8.
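The selection of candidate sequences can be sketched as follows. The text does not say how data sets with fewer than 50 possible sequences (for example, $2!=2$ for the bi-dimensional sets) were handled; the sketch assumes that all sequences are used in that case. The names `sample_sequences` and `best_sequence`, and the `evaluate` callable that returns an accuracy for a sequence, are hypothetical.

```python
import random
from itertools import permutations
from math import factorial


def sample_sequences(attributes, limit=50, seed=0):
    """Return up to `limit` distinct attribute orderings."""
    if factorial(len(attributes)) <= limit:
        # Few enough orderings: enumerate them all (an assumption).
        return list(permutations(attributes))
    rng = random.Random(seed)
    seen, chosen = set(), []
    while len(chosen) < limit:
        candidate = tuple(rng.sample(attributes, len(attributes)))
        if candidate not in seen:
            seen.add(candidate)
            chosen.append(candidate)
    return chosen


def best_sequence(sequences, evaluate):
    """Pick the ordering whose classifier scores highest under `evaluate`."""
    return max(sequences, key=evaluate)


two_attr = sample_sequences(["x", "y"])
five_attr = sample_sequences(list("abcde"))
```

Sampling permutations one at a time avoids materializing all $n!$ orderings, which would be impractical for the data sets with ten or more attributes.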

Based on the three experiments performed it can be said that the use of the original sequence of the attributes produced, in the majority of the experiments, better results when compared to those obtained with sequences considered as best attribute sequences. However, it was not possible to identify a pattern when ordering the attribute sequence that guarantees the best classification results.

10. Conclusions and future work

FGs can be defined as knowledge representation structures and are mostly used as a mathematical tool for the analysis of information flows in information networks represented by digraphs. FG-based structures can also be approached as a special type of database in which, instead of storing information about individual objects, statistical characteristics of the objects are represented and stored as an information flow distribution. The relations between the attributes that describe the data are established through a chain of sequenced arcs that starts at the input layer and ends at the output layer, and each such chain is interpreted as a decision rule. The structure of a FG as a whole can be interpreted as a decision algorithm composed of several decision rules. Thus, based on the extraction of the distribution of the many flows embedded in a FG, it is possible to induce a classifier.

As previously pointed out in this paper, although one can find in the literature research works related to the use of FGs, the formalism and its associated procedures are not as frequently used in applications as the most popular machine learning algorithms are.

FGs, as originally proposed, are not able to handle continuous-valued attributes and, as such, have a very limited scope of use in real-world applications. This paper describes an extension of the original FG structure into a structure named Hybrid Flow Graph (HFG), suitable for dealing with continuous-valued attributes. The HFG structure is induced using a process that first implements a procedure for discretizing the input data set, followed by an inductive procedure that constructs the flow graph. In continuous-valued domains an induced HFG is structurally much more condensed than the FG induced by the original approach and can better summarize the data flow distribution.

The results obtained with the use of a computational system, EFLOWG, that implements the induction of FGs and the extraction of classifiers from FG-based structures can be considered promising. In approximately half of the data domains used in the experiments, although FG-based results were below those obtained by some of the other four algorithms they were compared with, in most cases the values obtained by HFGs were close to those obtained by the best-performing algorithm. This can be considered a positive result, given that not much research has been invested in improving FGs yet, as far as the literature in the area is concerned. Although the results of the experiments conducted on attribute ordering were not conclusive, we believe that more research should be conducted in relation to the digraph structure itself, which does not allow for much flexibility in relation to connecting attribute values sequentially.

The rigidity of the current approach for inducing FG-based structures could be relaxed by implementing a genetic-algorithm-based process that would conveniently (depending on the data) compose path structures that do not necessarily involve all attributes present in the data. FG-based structures could also be improved by using feature selection algorithms [7, 30], and could benefit from dealing with outliers [31], as the first step of the process of inducing such structures and, from them, classifiers. The EFLOWG computational system was developed during the research work described in [9].

Acknowledgments

The authors thank UNIFACCAMP and CNPq for their support, as well as the anonymous reviewers for their suggestions, which helped improve our previous work on FGs. The second author is grateful to CAPES for the scholarship during the period of his studies. This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior – Brasil (CAPES) – Finance Code 001.

References

1. Chan C.-C. and Tsumoto, On learning decision rules from flow graphs, in: Proc of the 2007 Annual Meeting of the North American Fuzzy Information Processing Society (NAFIPS 2007), v. 5, 2007, pp. 655–658.

2. Butz C.J., Yan and Yang, An efficient algorithm for inference in rough set flow graphs, Transactions on Rough Sets, Peters J.F. and Skowron, eds, LCCS 4100, (2006), 102–122.

3. Bishop C.M., Neural Networks for Pattern Recognition, Oxford University Press, UK, 2005.

4. Bishop C.M., Pattern Recognition and Machine Learning, Springer-Verlag Publishing, Berlin, 2006.

5. Dua and Graff, UCI Machine Learning Repository, http://archive.ics.edu/ml. University of California, School of Information and Computer Science, Irvine, CA, 2019.

6. Knuth D.E., The Art of Computer Programming, v. III, Addison-Wesley, USA, 1973.

7. Santoro D.M. and Nicoletti M.C., Investigating a wrapper approach for selecting features using constructive neural networks, in: Proc International Conference on Information Technology: Coding and Computing (ITCC 2005), 2005, pp. 77–82.

8. Rodrigues E.C. and Nicoletti M.C., Extending flow graphs for handling continuous-valued attributes, in: Hybrid Intelligent Systems (HIS 2018), Advances in Intelligent Systems and Computing, v. 923, Madureira, Abraham, Gandhi and Varela, eds, Springer, Cham.

9. Rodrigues E.C., Flow graphs as data structures for representing and extracting information, M. Sc. dissertation, UNIFACCAMP, 2018 (in Portuguese).

10. Frank, Hall M.A. and Witten I.A., The WEKA Workbench. Online Appendix for Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann Publishers, USA, 2016.

11. Ruspini E.H., Numerical methods for fuzzy clustering, Information Science 2 (1970), 319–350.

12. Hruschka E.R., Jr., Nicoletti M.C., Oliveira V.A. and Bressan G.M., BayesRule: A Markov-blanket based procedure for extracting a set of probabilistic rules from Bayesian classifiers, International Journal of Hybrid Intelligent Systems 5(2) (2008), 83–96.

13. Allen F.E., Program optimization, Annual Review in Automatic Programming 5 (1969), 239–307.

14. Allen F.E., Control flow analysis, SIGPLAN Notices 5(7) (1970), 1–19.

15. Allen F.E., A basis for program optimization, in: Proc IFIP Congress, North Holland Publ Co., Amsterdam, 1972, pp. 385–390.

16. Allen F.E., Interprocedural data flow analysis, in: Proc IFIP Congress, North Holland Publ Co., Amsterdam, 1974, pp. 398–402.

17. Chernoff, The use of faces to represent points in n-dimensional space graphically, Tech Report no. 71, Department of Statistics, Stanford University, Stanford, CA, USA, 1971.

18. Witten I.H., Frank and Hall M.A., Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann Publishers, USA, 2011.

19. Williams J.B. and Zhang, Combining affective intelligence with learning to improve action selection in decision-making agents, International Journal of Hybrid Intelligent Systems, Pre-press, (2018), 1–27.

20. Clark and Holton D.A., A First Look at Graph Theory, (2nd Ed), World Scientific, USA, 1998.

21. Handl and Knowles, Multiobjective clustering with automatic determination of the number of clusters, Tech Rep TR-COMPSYSBIO-2004-02, UMIST, UK, 2004.

22.

Quinlan

J.R.

, Induction of decision trees, Machine Learning 1 (1986), 81–106.

23.

Quinlan

J.R.

, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, USA, 1993.

24.

Grzymala-Busse

J.W.

, Learning from examples based on rough multisets, in: Proc of the Second International Symposium on Methodologies for Intelligent Systems, 1987, pp. 325–332.

25.

Lisowski

and Czyzewski

, Pawlak’s flow graph extensions for video surveillance systems, in: Proc of the Federated Conference on Computer Science and Information Systems, v. 5, 2015, pp. 81–87.

26.

Mondal

, Application design and analysis of different hybrid intelligent techniques, International Journal of Hybrid Intelligent Systems 13(3–4) (2016), 173–181.

27.

Fosdick

L.D.

Osterweil

L.J.

, Data flow analysis in software reliability, Computing Surveys 8(3) (1976), 305–330.

28.

Ford

L.R.

and Fulkerson

D.R.

, Maximal flow through a network, Canadian Journal of Mathematics 8 (1956), 399–404.

29.

M.C.

Chou

C.H.

Hsieh

C.C.

, Fuzzy c-Means algorithm with a point symmetry distance, International Journal of Fuzzy Systems 7(4) (2005), 175–181.

30. Nicoletti M.C. and Santoro D.M., The influence of search mechanisms in feature subset selection processes, Intelligent Decision Technologies 2(4) (2008), 231–238.

31. Suri N.N.R.R., Murty M.N. and Athithan, A ranking-based algorithm for detection of outliers in categorical data, International Journal of Hybrid Intelligent Systems 11(1) (2014), 1–11.

32. Latifa, Feraoun, Batouche and Abraham, Arabic text detection using ensemble machine learning, International Journal of Hybrid Intelligent Systems 14(4) (2018), 233–238.

33. Clark and Niblett, The CN2 induction algorithm, Machine Learning 3 (1989), 261–283.

34. Pattaraintakorn, Cercone and Naruedomkul, Rule learning: Ordinal prediction based on rough sets and soft-computing, Applied Mathematics Letters 19 (2006), 1300–1307.

35. Duda R.O., Hart P.E. and Stork D.G., Pattern Classification, (2nd Ed), John Wiley and Sons, Inc., USA, 2001.

36. Bandyopadhyay and Maulik, Genetic clustering for automatic evolution of clusters and application to image classification, Pattern Recognition 35 (2002), 1197–1208.

37. García, Luengo, Sáez J.A., López and Herrera, A survey of discretization techniques: taxonomy and empirical analysis in supervised learning, IEEE Transactions on Knowledge and Data Engineering 25(4) (2013), 734–750.

38. Russell and Norvig, Artificial Intelligence: A Modern Approach, (3rd Ed), Pearson Publishing Ltd., USA, 2009.

39. Ludermir T.B., Prudêncio R.B.C. and Zanchettin, Feature and algorithm selection with hybrid intelligent techniques, International Journal of Hybrid Intelligent Systems 8(3) (2011), 115–116.

40. Mitchell T.M., Machine Learning, McGraw-Hill, USA, 1997.

41. Fayyad U.M. and Irani K.B., Multi-interval discretization of continuous-valued attributes for classification learning, in: Proc of the International Joint Conference on Artificial Intelligence, 1993, pp. 1022–1029.

42. Yang Y.-P.O., Shich H.-M., Tzeng G.-H., Yen and Chan C.C., Combined rough sets with flow graph and formal concept analysis for business aviation decision-making, Journal of Intelligent Information Systems 36 (2011), 347–366.

43. Pawlak, Rough Sets: Theoretical Aspects of Reasoning About Data, Kluwer, London, 1991.

44. Pawlak, Grzymala-Busse, Slowinski and Ziarko, Rough sets, Communications of the ACM 38(11) (1995), 89–95.

45. Pawlak, Rough sets, decision algorithms and Bayes’ theorem, European Journal of Operational Research 136 (2002), 181–189.

46. Pawlak, Flow graphs and decision algorithms, in: Lecture Notes in Artificial Intelligence, v. 2639, Wang et al., eds, Springer-Verlag Publishing, Berlin, 2003, pp. 1–10.

47. Pawlak, Probability, truth and flow graphs, Electronic Notes in Theoretical Computer Science 82(4) (2003), 1–9.

48. Pawlak, Decision algorithms and flow graphs: A rough set approach, Journal of Telecommunications and Information Technology 3 (2003), 98–101.

49. Pawlak, Flow graphs – a new paradigm for data mining and knowledge discovery, in: JAIST Forum 2004 – Technology Creation Based on Knowledge Science: Theory and Practice, jointly with The 5th International Symposium on Knowledge and Systems Science (Proc of the KSS2004), 2004, pp. 147–153.

50. Pawlak, Decision rules and flow networks, European Journal of Operational Research 152 (2004), 184–190.

51. Pawlak, Data analysis and flow graphs, Journal of Telecommunications and Information Technology 3 (2004), 1–5.

52. Pawlak, Flow graphs and data mining, in: Transactions on Rough Sets III, Lecture Notes in Computer Science, v. 3400, Peters J.F. and Skowron, eds, Springer-Verlag Publishing, Berlin, 2005, pp. 1–36.

53. Pawlak and Skowron, Rudiments of rough sets, Information Sciences 177 (2007), 3–27.

54. Pawlak, Flow Graphs – a new paradigm for intelligent data analysis, Warsaw University of Technology Digital Library, 2010, 1–28.

Table 1
Training set with 14 instances described by 3 attributes, $A_{1}$, $A_{2}$ and $A_{3}$, where $A_{3}\in\{1,2,3\}$ represents the class associated with each instance

Table 2
Complete paths of the FG in Fig. 1 and associated decision rules $x^{*}\rightarrow y$, where #path is the counter associated with a particular path. The subscript in $[x,y]_{i}$ refers to one of possibly various complete paths from $x$ to $y$