Dynamic programming based fuzzy partition in fuzzy decision tree induction

Abstract

Fuzzy decision trees are one of the most popular extensions of decision trees for symbolic knowledge acquisition by fuzzy representation. In most fuzzy decision tree learning methods, the number of fuzzy partitions is given in advance, that is, the same number of fuzzy items is used for every condition attribute. In this study, a dynamic programming-based partition criterion for fuzzy items is designed in the framework of fuzzy decision tree induction. The proposed criterion applies an improved version of a dynamic programming algorithm used in scheduling problems to establish an optimal number of fuzzy items for each condition attribute. Then, based on these fuzzy partitions, a fuzzy decision tree is constructed in a top-down recursive way. A comparative analysis against several traditional decision trees verifies the feasibility of the proposed dynamic programming based fuzzy partition criterion. Furthermore, under the same framework of fuzzy decision trees, the proposed fuzzy partition solution can obtain a higher classification accuracy than some configurations that use the same number of fuzzy items for every attribute.

Keywords

Fuzzy decision trees; Fuzzy partition; Dynamic programming; Fuzzy items

1 Introduction

Decision trees are a well-known method that uses past knowledge to describe the process of decision making. A decision tree is made up of decision rules in the form of internal nodes and leaf nodes [1]. Each internal node can be split into one or more child nodes to make a decision, and each leaf node is associated with a class label or an outcome. Decision trees transform a complex decision process into a set of simpler decisions to classify a given object with an easily understandable representation [2]. Many extensions have been derived from the classical decision trees because of their higher accuracy [3], fewer parameters [4], and better comprehensibility (compared to some other classification models) [5, 6]. Among these extensions, fuzzy decision trees are one of the most popular approaches. They combine symbolic decision trees with the approximate reasoning offered by fuzzy representation [7]. The intent is to exploit the complementary advantages of the comprehensibility of decision trees and the ability of fuzzy representation to handle uncertain information [8–10].

Based on some splitting mechanism, fuzzy decision trees recursively partition the training data in a top-down way into several subsets with similar or identical outputs. In particular, fuzzy information theory, as one of these splitting mechanisms, has exerted a widespread influence on the growth of fuzzy decision trees. In 1992, Weber et al. [11] presented a well-known fuzzy ID3 algorithm by modifying the information gain measure, which is used to split a node for fuzzy representation. Using fuzzy sets defined for all attributes by a user, Umanol et al. [12] designed a new algorithm based on the probability of membership values to generate a fuzzy decision tree from numerical data. Ichihashi et al. [13] extracted fuzzy reasoning rules viewed as fuzzy partitions, where an algebraic method to facilitate incremental learning was also employed. As knowledge inference must be newly defined in fuzzy decision trees, Janikow et al. [8] studied several alternatives for fuzzy decision trees based on a rule-based system and fuzzy control [14]. Then, a new splitting criterion developed with a fuzzy cumulative distribution function was designed by Qi [15]. The criterion adopted minimum classification information entropy to select expanded attributes in the fuzzy decision tree model. In addition, some extended fuzzy decision trees [16, 17] were established in the framework of axiomatic fuzzy set logic by improving the membership functions used in computing fuzzy entropy. There are also some recently proposed fuzzy decision trees [6, 18–21]. However, the number of fuzzy partitions in many fuzzy decision tree approaches is predefined, which means that the same number of fuzzy items is forced on all the tree nodes. To some extent, such fixed partitions limit the representation ability and the classification accuracy.

This study proposes a dynamic programming based fuzzy partition (DPFP) criterion for fuzzy decision trees. The most important characteristic of this paper is that it generates an optimal number of fuzzy items (fuzzy numbers or fuzzy sets) for each condition attribute during fuzzy decision tree induction. This solution avoids predefining the number of fuzzy partitions, as is done in many fuzzy decision trees. In other words, the fuzzy partitions used in this paper are flexible and realistic. The proposed DPFP algorithm processes datasets from the viewpoint of dynamic programming problems. We improve the objective function of a dynamic programming method for scheduling problems so that the samples belonging to the same partition become more compact. For different condition attributes, the numbers of partitions are different, and the mean value of the samples in each partition is applied to define fuzzy items. Finally, a fuzzy decision tree is constructed based on these fuzzy items. The main contributions of this paper include: (1) establishing a dynamic programming based fuzzy partition criterion to make the partitions more suitable for the uncertain information of fuzzy representation; (2) constructing a dynamic programming based fuzzy decision tree with higher classification accuracy than some fuzzy decision trees based on fixed partitions.

The remainder of this paper is organized as follows: Section 2 gives a brief introduction to the relevant background. Section 3 presents a dynamic programming based fuzzy partition criterion, called DPFP, which determines the best splitting points for each condition attribute. Then, the dynamic programming based fuzzy decision tree learning algorithm (DP-FDT) is proposed in Section 4. Section 5 provides our experimental results. Finally, Section 6 concludes this paper and describes the direction of future work.

2 Related work

In this section, we briefly review a dynamic programming algorithm for semi-continuous batch scheduling problems.

2.1 Semi-continuous batch scheduling problems

In [22], Tang et al. presented a new kind of batch scheduling problem. In this problem, the jobs are processed on a machine in batch mode, where the jobs enter and leave the machine one by one and semi-continuously. The machine can handle more than one job simultaneously. The maximum number of jobs that can be held at a time in the machine is defined as the capacity of the machine. A new batch can be processed only when the processing of the previous batch is completed. For jobs in the same batch, there is a basic processing time that is defined by the longest processing time among the jobs in the batch. This problem arises in the heating operation of tube-billets in the steel industry [22].

To refer to the problem being studied in a concise manner, the notation of [23] is used to describe batching machine scheduling. This notation consists of three fields, α|β|γ, where α specifies the machine environment (single machine (α = 1)); β specifies the job characteristics such as preemption, precedence relations, batching problems, and so on; and γ denotes the optimality criterion. Suppose there are n independent jobs J₁, J₂, . . . , J_n to be processed on a machine with capacity C. The processing time of job J_j is denoted by p_j. All data are assumed to be deterministic [22] since the processing time is known a priori from the product specification. Adding the symbol c-batch and the symbol C to the β-field represents the fact that the machine being scheduled is a semi-continuous batching machine and that the capacity of the machine is C, respectively. The notation can be summarized as follows:

1|c-batch, C|C_max: Minimize the makespan on a single semi-continuous batching machine with capacity C.

To solve the problem 1|c-batch, C|C_max, there is an O(n²)-time dynamic programming algorithm [22], shown in Algorithm 1. In the algorithm, f (k) denotes the minimum makespan of a partial schedule containing jobs J₁ through J_k, and B_k represents the batch containing the k-th job. Example 1 illustrates the calculation in detail.

Algorithm 1: Dynamic programming algorithm for 1|c-batch, C|C_max.
Input: The capacity of a machine C; n independent jobs.
Output: The optimal schedule $\{B_{1}^{'}, B_{2}^{'}, \ldots, B_{m}^{'}\}$.
1 Order the jobs by processing time p₁ ≥ . . . ≥ p_n and denote the ordered jobs as J₁, J₂, . . . , J_n.
2 Set f (0) = 0, B₀ = ∅, r₀ = 0.
3 Calculate f (k), r_k, B_k, k = 1, 2, 3, . . . , n, according to the recurrence:
$f (k) = \min_{r_{k - 1} \leq i \leq k - 1} \{f (i) + p_{i + 1, k}\},$ (1)
$r_{k} = \arg\min_{r_{k - 1} \leq i \leq k - 1} \{f (i) + p_{i + 1, k}\},$ (2)
$B_{k} = \{J_{r_{k} + 1}, \ldots, J_{k}\},$ (3)
where $p_{i + 1, k} = p_{i + 1} (1 + \frac{k - i - 1}{C})$.
4 Start from B_n and find the corresponding optimal schedule $\{B_{1}^{'}, B_{2}^{'}, \ldots, B_{m}^{'}\}$ by backtracking.
5 return $\{B_{1}^{'}, B_{2}^{'}, \ldots, B_{m}^{'}\}$.

Example 1. Consider an example with n = 10 and C = 5, where the processing times of the ordered jobs J₁, J₂, . . . , J₁₀ are shown in Table 1.

Table 1
An example with 10 independent jobs and a machine capacity of 5

Jobs	J₁	J₂	J₃	J₄	J₅	J₆	J₇	J₈	J₉	J₁₀
p_j	10	10	3	1.8	1	1	1	1	1	1

According to the initial condition, we can get f (0) =0, r₀ = 0, B₀ =∅.

At k = 1, i ∈ [r_k-1, k - 1] = [0, 0] = {0}, when i = 0, B₀ is divided from the 0-th position, which means J₁ forms a single batch, that is, B₁ = {J₁}. As the processing time of batch B₁ is p_1,1 = p₁ = 10, the total processing time is f (0) + p_1,1 = 10. Therefore, f (1) = f (0) + p_1,1 = 10, r₁ = 0, B₁ = {J₁}.

At k = 2, i ∈ [r_k-1, k - 1] = [0, 1] = {0, 1}, when i = 0, the batch B₁ is divided from the 0-th position, which means J₂ and B₁ form one batch, that is, B₂ = {J₁, J₂}. As the processing time of batch B₂ is $p_{1, 2} = p_{1} (1 + \frac{| B_{2} | - 1}{C}) = 12$ , for the current batch, the total processing time is f (0) + p_1,2 = 12. When i = 1, the batch B₁ is divided from the 1-st position, which means that J₂ forms a single batch, that is, B₂ = {J₂}. As the processing time of batch B₂ is p_2,2 = p₂ = 10, for current batch, the total processing time is f (1) + p_2,2 = 20. Therefore, f (2) = min {f (0) + p_1,2 ; f (1) + p_2,2} = min {12 ; 20} =12, r₂ = 0, B₂ = {J₁, J₂}.

At k = 3, i ∈ [r_k-1, k - 1] = [0, 2] = {0, 1, 2}. When i = 0, the batch B₂ is divided from the 0-th position, which means J₃ and B₂ form one batch, that is, B₃ = {J₁, J₂, J₃}. As the processing time of batch B₃ is $p_{1, 3} = p_{1} (1 + \frac{| B_{3} | - 1}{C}) = 14$ , for the current batch, the total processing time is f (0) + p_1,3 = 14. When i = 1, the batch B₂ is divided from the 1-st position, which means that J₃ and J₂ form one batch, that is, B₃ = {J₂, J₃}. As the processing time of batch B₃ is $p_{2, 3} = p_{2} (1 + \frac{| B_{3} | - 1}{C}) = 12$ , for the current batch, the total processing time is f (1) + p_2,3 = 22. When i = 2, the batch B₂ is divided from the 2-nd position, which means J₃ forms a single batch, that is, B₃ = {J₃}. As the processing time of batch B₃ is p_3,3 = p₃ = 3, for the current batch, the total processing time is f (2) + p_3,3 = 15. Therefore, f (3) = min {f (0) + p_1,3 ; f (1) + p_2,3 ; f (2) + p_3,3} = min {14 ; 22 ; 15} =14, r₃ = 0, B₃ = {J₁, J₂, J₃}.

Similarly, the following results can be computed:

f (4) =15.6, r₄ = 2, B₄ = {J₃, J₄}.

f (5) =16.16, r₅ = 3, B₅ = {J₄, J₅}.

f (6) =16.52, r₆ = 3, B₆ = {J₄, J₅, J₆}.

f (7) =16.88, r₇ = 3, B₇ = {J₄, J₅, J₆, J₇}.

f (8) =17.2, r₈ = 4, B₈ = {J₅, J₆, J₇, J₈}.

f (9) =17.4, r₉ = 4, B₉ = {J₅, J₆, J₇, J₈, J₉}.

f (10) =17.6, r₁₀ = 4, B₁₀ = {J₅, J₆, J₇, J₈, J₉, J₁₀}.

Finally, the optimal value is 17.6 and the last optimal batch is B₁₀. By backtracking, we can get the corresponding optimal schedule: {J₁, J₂}, {J₃, J₄}, {J₅, J₆, J₇, J₈, J₉, J₁₀}.
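The recurrence of Algorithm 1 and the worked example above can be reproduced with a short script. The following is a minimal Python sketch, not the implementation of [22]; the function name `batch_schedule` and the returned tuple layout are assumptions made for illustration.

```python
def batch_schedule(times, capacity):
    """Sketch of Algorithm 1 for 1|c-batch, C|C_max (recurrences (1)-(3)).

    Returns (f, r, batches), where batches holds 1-based job indices
    of the jobs after sorting by non-increasing processing time.
    """
    p = sorted(times, reverse=True)      # step 1: p_1 >= ... >= p_n
    n = len(p)
    f = [0.0] * (n + 1)                  # f[k]: minimum makespan for jobs 1..k
    r = [0] * (n + 1)                    # r[k]: the batch ending at job k starts at job r[k] + 1
    for k in range(1, n + 1):
        best_val, best_i = None, None
        for i in range(r[k - 1], k):     # r_{k-1} <= i <= k-1
            # processing time of batch {J_{i+1}, ..., J_k}; p[i] is p_{i+1} (0-based list)
            batch_time = p[i] * (1 + (k - i - 1) / capacity)
            candidate = f[i] + batch_time
            if best_val is None or candidate < best_val:
                best_val, best_i = candidate, i
        f[k], r[k] = best_val, best_i
    # step 4: backtrack from B_n to recover the schedule
    batches, k = [], n
    while k > 0:
        batches.append(list(range(r[k] + 1, k + 1)))
        k = r[k]
    batches.reverse()
    return f, r, batches


# Data of Table 1 with capacity C = 5
f, r, batches = batch_schedule([10, 10, 3, 1.8, 1, 1, 1, 1, 1, 1], 5)
print(round(f[10], 2), batches)
```

With the data of Table 1, the sketch returns the schedule {J₁, J₂}, {J₃, J₄}, {J₅, . . . , J₁₀} with a makespan of 17.6, in agreement with the example.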

3 An improved dynamic programming algorithm and its application in fuzzy partition

In this section, an improved dynamic programming algorithm based on a new optimal objective function is introduced. Then, the fuzzy partition criterion DPFP based on this improved dynamic programming algorithm is proposed. Let X be a set of samples with M condition attributes $A = \{A_{i}\}_{i = 1}^{M}$ and one decision attribute D. For a sample x in X, the attribute value of x on condition attribute A_i is denoted by v (x, A_i).

3.1 An improved dynamic programming algorithm

First, we describe how a data set X can be handled by adapting the dynamic programming algorithm (Algorithm 1). For each condition attribute A_i in data set X, there are |X| attribute values, that is, v (x₁, A_i), v (x₂, A_i), . . ., v (x_|X|, A_i). We make three assumptions: (1) the samples are treated as |X| independent jobs J₁, J₂, . . . , J_|X|; (2) the |X| attribute values are regarded as the corresponding processing times p₁, p₂, . . . , p_|X|; (3) the parameter C of Algorithm 1, which represents the capacity of the machine, is treated as a given parameter. Based on these assumptions, Algorithm 1 can be applied to each condition attribute of data set X. The detailed process flow is illustrated in Fig. 1.

Fig. 1

Applying dynamic programming algorithm to attribute values.

In this work, our target is to partition the |X| condition attribute values to produce fuzzy representations. To make the samples in the same fuzzy representation more compact, we employ a new objective function to improve Algorithm 1. Suppose the |X| attribute values are ordered as p₁ ≥ . . . ≥ p_|X| and denoted as J₁, J₂, . . . , J_|X|. For each alternative batch B_k containing the k-th job (suppose the first job in B_k is J_{r_k+1}), we define ∥diff (B_k)∥ as follows: $\lVert \mathrm{diff} (B_{k}) \rVert = \sum_{i = r_{k} + 1}^{k - 1} | p_{i + 1} - p_{i} | .$ (4) Here, diff (B_k) denotes the vector of differences between consecutive elements in B_k and the symbol ∥·∥ denotes the 1-norm. Because the values are sorted in non-increasing order, this sum equals the gap between the largest and the smallest value in B_k. We add ∥diff (B_k)∥ to the original objective function in Algorithm 1 to generate a new objective function.

The improved dynamic programming algorithm based on the new objective function is given in Algorithm 2, where α ∈ [0, 1] is a parameter and α * |X| can be regarded as the capacity C of the machine. Example 2 shows that the improved dynamic programming algorithm can make the final partitions more compact than the original dynamic programming algorithm.

Algorithm 2: An improved dynamic programming algorithm.
Input: A preset parameter α; |X| condition attribute values.
Output: The optimal partitions $\{B_{1}^{'}, B_{2}^{'}, \ldots, B_{m}^{'}\}$.
1 Order the attribute values as p₁ ≥ . . . ≥ p_|X| and denote the original indices as J₁, J₂, . . . , J_|X|.
2 Set f (0) = 0, B₀ = ∅, r₀ = 0.
3 Calculate f (k), r_k, B_k, k = 1, 2, 3, . . . , |X|, according to the recurrence:
$f (k) = \min_{r_{k - 1} \leq i \leq k - 1} \{f (i) + p_{i + 1, k} + \lVert \mathrm{diff} (B_{k}) \rVert\},$ (5)
$r_{k} = \arg\min_{r_{k - 1} \leq i \leq k - 1} \{f (i) + p_{i + 1, k} + \lVert \mathrm{diff} (B_{k}) \rVert\},$ (6)
$B_{k} = \{J_{r_{k} + 1}, \ldots, J_{k}\},$ (7)
where $p_{i + 1, k} = p_{i + 1} (1 + \frac{k - i - 1}{α * | X |})$ and ∥diff (B_k)∥ is defined by Equation 4.
4 Start from B_|X| and find the corresponding optimal partitions $\{B_{1}^{'}, B_{2}^{'}, \ldots, B_{m}^{'}\}$ by backtracking.
5 return $\{B_{1}^{'}, B_{2}^{'}, \ldots, B_{m}^{'}\}$.
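To make the effect of the added term concrete, the steps of Algorithm 2 can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the authors' Java implementation: the function name `improved_partition` is hypothetical, the term ∥diff (B_k)∥ is evaluated for each candidate batch {J_{i+1}, . . . , J_k} inside the minimum, and partitions are returned as lists of attribute values rather than sample indices.

```python
def improved_partition(values, alpha):
    """Sketch of Algorithm 2: DP partition with the extra 1-norm difference term.

    values: attribute values of one condition attribute
    alpha:  parameter in (0, 1]; alpha * len(values) plays the role of capacity C
    Returns the partitions as lists of values, sorted in non-increasing order.
    """
    p = sorted(values, reverse=True)            # step 1
    n = len(p)
    capacity = alpha * n
    f = [0.0] * (n + 1)
    r = [0] * (n + 1)
    for k in range(1, n + 1):
        best_val, best_i = None, None
        for i in range(r[k - 1], k):            # r_{k-1} <= i <= k-1
            batch_time = p[i] * (1 + (k - i - 1) / capacity)
            # Equation 4 for the candidate batch {J_{i+1}, ..., J_k} (0-based slice p[i:k])
            diff_norm = sum(abs(p[j + 1] - p[j]) for j in range(i, k - 1))
            candidate = f[i] + batch_time + diff_norm
            if best_val is None or candidate < best_val:
                best_val, best_i = candidate, i
        f[k], r[k] = best_val, best_i
    # step 4: backtrack from the batch that ends at the last value
    partitions, k = [], n
    while k > 0:
        partitions.append(p[r[k]:k])
        k = r[k]
    partitions.reverse()
    return partitions


# Toy data (not from the paper): three visibly separated groups of values
print(improved_partition([1, 10, 2, 9, 1, 10], alpha=1.0))
```

On this toy input the sketch separates the values into {10, 10, 9}, {2} and {1, 1}; the difference term penalizes batches that span a wide range of values.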

Example 2. Consider the iris data set from the UCI Machine Learning Repository [24], which contains 150 instances with 4 condition attributes and one decision attribute. In this example, we only consider the attribute named petallength, which has 150 attribute values. The two dynamic programming algorithms (Algorithm 1 and Algorithm 2) are applied to these 150 attribute values, respectively. The final partitions are shown in Fig. 2. From the distribution of the samples in each partition, we can see that the middle partition generated by the original dynamic programming algorithm is dispersed because of two samples (the left image). In fact, these two samples are closer to the bottom partition. Improving the objective function resolves this problem (the right image).

Fig. 2

The final partitions of two dynamic programming algorithms on iris dataset.

3.2 The dynamic programming based fuzzy partition algorithm

Based on the improved dynamic programming algorithm, a partition algorithm named DPFP is designed. The purpose of the DPFP algorithm is to search for the optimal splitting points so that a different number of fuzzy items can be generated for each condition attribute during the induction of the fuzzy decision tree. For a condition attribute A_i, suppose there are M_i optimal partitions, that is, $\{B_{1}^{'}, B_{2}^{'}, \ldots, B_{M_{i}}^{'}\}$ , which are obtained from Algorithm 2. Then, we take the average value of each batch $B_{m}^{'}$ (m = 1, 2, . . . , M_i) as the corresponding splitting point to define the related fuzzy item. The detailed process of the designed DPFP algorithm can be found in Algorithm 3.
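The averaging step described above can be sketched as follows. This is a hedged illustration only: the function name `splitting_points` is hypothetical, and returning the points in ascending order is an assumption made so that they can be used directly as ordered membership function parameters.

```python
def splitting_points(partitions):
    """Return the mean of each non-empty partition, in ascending order.

    partitions: list of lists of attribute values, e.g. the batches
    produced by a partition routine for one condition attribute.
    Ascending order is an assumption of this sketch.
    """
    means = [sum(batch) / len(batch) for batch in partitions if batch]
    return sorted(means)


# Toy partitions (not from the paper)
print(splitting_points([[10, 10, 9], [2], [1, 1]]))
```

The number of returned points equals the number of partitions, which is how the partition count determines the number of fuzzy items for the attribute.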

4 The dynamic programming based fuzzy decision trees

In this section, the forms of fuzzy items related to fuzzy decision tree induction are first introduced. Then, fuzzy entropy and fuzzy information gain are briefly reviewed. Finally, the dynamic programming based fuzzy decision tree, called DP-FDT, is established. Let X be a set of samples with M condition attributes $A = \{A_{i}\}_{i = 1}^{M}$ and one decision attribute D. Suppose there are M_i splitting points $\{{ms}_{m, i}\}_{m = 1}^{M_{i}}$ generated by applying Algorithm 3 to attribute A_i; then the number of fuzzy items defined on attribute A_i is M_i. The fuzzy items are denoted as $\{A_{i}^{m}\}_{m = 1}^{M_{i}}$ . The detailed construction process is illustrated in Fig. 3.

Fig. 3

Constructing dynamic programming based fuzzy decision trees.

4.1 The form of fuzzy items

Many forms of fuzzy items have been commonly studied, such as the asymmetric trapezoidal form and the triangular form [25, 26]. In this paper, the membership functions of the fuzzy items are of mixed types, as illustrated by Fig. 4 in Example 3, where the first fuzzy item uses the right trapezoidal form, the last fuzzy item uses the left trapezoidal form, and the middle fuzzy items take the triangular form.

For an attribute A_i, we treat the splitting points $\{{ms}_{m, i}\}_{m = 1}^{M_{i}}$ as the parameters of fuzzy items $\{A_{i}^{m}\}_{m = 1}^{M_{i}}$ , which play a pivotal role in the fuzzy decision trees. According to the forms of membership functions, the fuzzy item $A_{i}^{1}$ is right trapezoidal and characterized by two parameters ms_{1,i} and ms_{2,i}. The fuzzy items $A_{i}^{m}$ (m = 2, 3, . . . , M_i - 1) are triangular and characterized by three parameters ms_{m-1,i}, ms_{m,i} and ms_{m+1,i}. The fuzzy item $A_{i}^{M_{i}}$ is left trapezoidal and has two parameters ms_{M_i-1,i} and ms_{M_i,i}. As shown in Example 3, the fuzzy items $A_{i}^{1}$ and $A_{i}^{4}$ are characterized by ms_{1,i}, ms_{2,i} and ms_{3,i}, ms_{4,i}, respectively. The fuzzy items $A_{i}^{2}$ and $A_{i}^{3}$ are characterized by ms_{m,i} (m = 1, 2, 3) and ms_{m,i} (m = 2, 3, 4), respectively. Suppose a is an attribute value of attribute A_i and $μ_{A_{i}^{m}} (a)$ represents the membership degree of a in fuzzy item $A_{i}^{m}$ ; then $μ_{A_{i}^{1}} (a)$ , $μ_{A_{i}^{m}} (a)$ (m = 2, 3, …, M_i - 1) and $μ_{A_{i}^{M_{i}}} (a)$ can be computed as follows:

$μ_{A_{i}^{1}} (a) = \begin{cases} 1, & a \leq {ms}_{1, i} \\ \frac{{ms}_{2, i} - a}{{ms}_{2, i} - {ms}_{1, i}}, & {ms}_{1, i} < a \leq {ms}_{2, i} \\ 0, & a > {ms}_{2, i} \end{cases}$ (9)

$μ_{A_{i}^{m}} (a) = \begin{cases} 0, & a \leq {ms}_{m - 1, i} \\ \frac{a - {ms}_{m - 1, i}}{{ms}_{m, i} - {ms}_{m - 1, i}}, & {ms}_{m - 1, i} < a \leq {ms}_{m, i} \\ \frac{{ms}_{m + 1, i} - a}{{ms}_{m + 1, i} - {ms}_{m, i}}, & {ms}_{m, i} < a \leq {ms}_{m + 1, i} \\ 0, & a > {ms}_{m + 1, i} \end{cases}$ (10)

$μ_{A_{i}^{M_{i}}} (a) = \begin{cases} 0, & a \leq {ms}_{M_{i} - 1, i} \\ \frac{a - {ms}_{M_{i} - 1, i}}{{ms}_{M_{i}, i} - {ms}_{M_{i} - 1, i}}, & {ms}_{M_{i} - 1, i} < a \leq {ms}_{M_{i}, i} \\ 1, & a > {ms}_{M_{i}, i} \end{cases}$ (11)
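Equations 9 to 11 translate directly into code. The sketch below is a minimal Python illustration with a hypothetical function name `memberships`; it assumes at least two splitting points, given in strictly increasing order.

```python
def memberships(a, ms):
    """Membership degrees of value a in the M fuzzy items of one attribute.

    ms: splitting points ms_1 < ms_2 < ... < ms_M (strictly increasing, M >= 2;
        both conditions are assumptions of this sketch).
    Returns [mu_1(a), ..., mu_M(a)] following Equations 9, 10 and 11.
    """
    M = len(ms)
    mu = [0.0] * M
    # Equation 9: first item, right trapezoidal
    if a <= ms[0]:
        mu[0] = 1.0
    elif a <= ms[1]:
        mu[0] = (ms[1] - a) / (ms[1] - ms[0])
    # Equation 10: middle items, triangular
    for m in range(1, M - 1):
        if ms[m - 1] < a <= ms[m]:
            mu[m] = (a - ms[m - 1]) / (ms[m] - ms[m - 1])
        elif ms[m] < a <= ms[m + 1]:
            mu[m] = (ms[m + 1] - a) / (ms[m + 1] - ms[m])
    # Equation 11: last item, left trapezoidal
    if a > ms[-1]:
        mu[-1] = 1.0
    elif a > ms[-2]:
        mu[-1] = (a - ms[-2]) / (ms[-1] - ms[-2])
    return mu


# Four hypothetical splitting points, as in a four-item partition
print(memberships(3.0, [1.0, 2.0, 4.0, 8.0]))   # [0.0, 0.5, 0.5, 0.0]
```

With these forms, any value between the first and last splitting point belongs to at most two neighbouring fuzzy items, and its two membership degrees sum to one.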

Example 3. Suppose the splitting points generated by Algorithm 3 are ms_{1,i}, ms_{2,i}, ms_{3,i} and ms_{4,i}. The number of fuzzy items is 4 and the one-dimensional attribute space of A_i is described in terms of four fuzzy items “small”, “medium small”, “medium big” and “big”. The forms of the membership functions of the fuzzy items $A_{i}^{1}$ , $A_{i}^{2}$ , $A_{i}^{3}$ and $A_{i}^{4}$ are shown in Fig. 4.

Fig. 4

Triangular and trapezoidal forms of fuzzy items.

4.2 Fuzzy entropy and fuzzy information gain used in fuzzy decision trees

There are many attribute selection criteria in fuzzy decision trees. The most common criterion is the information gain used in the Fuzzy ID3 algorithm [11, 12]. The Fuzzy ID3 algorithm is a well-known decision tree algorithm that extends the Iterative Dichotomiser 3 (ID3) algorithm [1]. Both use an information-theoretic measure of entropy to select the splitting attribute such that the difference between the information contained in a given node and that contained in its child nodes is maximized [10].

Let β^N denote the set of fuzzy items appearing on the path leading to a tree node N, and let $μ_{β^{N}} (x)$ represent the membership degree of a sample x ∈ X at the node N. If $β^{N} = \{A_{i}^{m} \mid i \in \{1, 2, \ldots, M\}, m \in \{1, 2, \ldots, M_{i}\}\}$ , we define $μ_{β^{N}} (x)$ by the following formula: $μ_{β^{N}} (x) = \prod_{A_{i}^{m} \in β^{N}} μ_{A_{i}^{m}} (v (x, A_{i})),$ (12) where v (x, A_i) is the attribute value of x on A_i, and $μ_{A_{i}^{m}} (v (x, A_{i}))$ is defined by Equation 9, Equation 10 or Equation 11.

Then, the fuzzy entropy [8] can be calculated as follows: $E^{N} (X) = \sum_{c = 1}^{C} - \frac{P_{c}^{N}}{P^{N}} \log_{2} \frac{P_{c}^{N}}{P^{N}},$ (13) $P_{c}^{N} = \sum_{x \in X_{c}} μ_{β^{N}} (x),$ (14) $P^{N} = \sum_{x \in X} μ_{β^{N}} (x),$ (15) where X_c is the set of samples belonging to the c-th class in X (C is the total number of classes) and $μ_{β^{N}} (x)$ is defined by Equation 12. In particular, $X = β_{δ}^{N}$ is the δ-cut set (δ ∈ (0, 1)) of β^N, defined as: $β_{δ}^{N} = \{x \in X \mid μ_{β^{N}} (x) > δ\} .$ (16)

Finally, the fuzzy information gain [8] for each attribute is computed as: ${IG}^{N} (X, A_{i}) = E^{N} (X) - \sum_{p = 1}^{P} \frac{P^{N_{p}}}{\sum_{q = 1}^{P} P^{N_{q}}} E^{N_{p}} (X),$ (17) $P^{N_{p}} = \sum_{x \in X_{p}} μ_{β^{N_{p}}} (x),$ (18) where $A_{i}^{p}$ (p = 1, 2, . . . , P) denotes the P fuzzy numbers for attribute A_i, $N_{p} = β^{N} \cup A_{i}^{p}$ , and $X_{p} = β_{δ}^{N_{p}}$ is the δ-cut set (δ ∈ (0, 1)) of the fuzzy set $β^{N_{p}}$ , which is described by Equation 16.
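Equations 12 to 18 can be sketched in a few lines of Python. The code below is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, node memberships are passed as plain lists, the entropy of each child is evaluated on that child's own δ-cut set, and an empty cut set is given entropy 0.

```python
import math


def fuzzy_entropy(mu, labels, delta):
    """Equations 13-16: entropy over the delta-cut set {x : mu(x) > delta}."""
    class_mass = {}
    for weight, label in zip(mu, labels):
        if weight > delta:                               # Equation 16
            class_mass[label] = class_mass.get(label, 0.0) + weight   # P_c^N
    total = sum(class_mass.values())                     # P^N
    if total == 0:
        return 0.0                                       # assumption: empty cut set
    return -sum((pc / total) * math.log2(pc / total) for pc in class_mass.values())


def fuzzy_information_gain(mu_parent, item_memberships, labels, delta):
    """Equations 17-18 for one attribute.

    mu_parent:        membership of each sample at node N
    item_memberships: one list per fuzzy item A_i^p with mu_{A_i^p}(v(x, A_i))
    """
    children = []
    for item_mu in item_memberships:
        mu_child = [a * b for a, b in zip(mu_parent, item_mu)]    # Equation 12
        mass = sum(w for w in mu_child if w > delta)              # Equation 18
        children.append((mass, fuzzy_entropy(mu_child, labels, delta)))
    total_mass = sum(mass for mass, _ in children)
    parent_entropy = fuzzy_entropy(mu_parent, labels, delta)
    if total_mass == 0:
        return 0.0                                       # assumption: no sample survives the cut
    return parent_entropy - sum(mass / total_mass * e for mass, e in children)


# Toy node with four samples and two classes (not from the paper)
labels = ["a", "a", "b", "b"]
gain = fuzzy_information_gain([1, 1, 1, 1], [[1, 1, 0, 0], [0, 0, 1, 1]], labels, 0.5)
print(gain)   # 1.0: the two fuzzy items separate the classes completely
```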

4.3 The dynamic programming based fuzzy decision tree

Following the traditional construction process, four aspects are considered in advance when inducing DP-FDT: (1) predefining fuzzy items: defining or appointing the forms of fuzzy items; (2) splitting rule: selecting an attribute to generate the tree’s branches; (3) stopping criterion: determining when the tree’s growth should be stopped; and (4) labeling rules: assigning the class label for leaf nodes. Algorithm 4 summarizes the overall flow in detail.

As shown in Algorithm 4, to predefine the fuzzy items, the proposed Algorithm 3 is first applied to determine the splitting points for each attribute. Then, the fuzzy items are defined according to the forms introduced in Section 4.1.

For the splitting rule, fuzzy information gain, which is widely used as a node splitting criterion in fuzzy decision trees, is employed to evaluate feature quality. During decision making, the attribute that adds the most information about the decision is selected first [27]. A larger amount of information gain means a larger decrease in entropy. The growth of the tree is guided by the maximum information gain: at each step of tree construction, the attribute with the greatest information gain is selected as the best splitting attribute for the next split of the tree nodes.

Regarding stopping criteria, there are three cases for stopping in the proposed DP-FDT algorithm, as described in Algorithm 4. The first case is that all the samples covered by a tree node belong to the same class, so the entropy is zero. In this case, there is no need to split the node on the corresponding decision level. The second case is that all the attributes have been used in the path from the root to a tree node, so the growth of the tree should be stopped at this node. The final stopping criterion is the information gain. If the maximum information gain at a tree node is zero or negative, then the growth of the tree is stopped.

For the labeling rules, if all the samples in a leaf node belong to the same class, then that class label is assigned to the leaf node. Otherwise, the leaf node is assigned the label of the class that contains most of the samples covered by the node, a rule that is widely discussed and used in many decision tree classifiers.
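The three stopping cases and the majority labeling rule described above can be summarized in a small sketch. The function names `should_stop` and `leaf_label` and their arguments are hypothetical; in particular, counting each covered sample once (rather than weighting by membership degree) is an assumption of this illustration.

```python
from collections import Counter


def should_stop(labels, unused_attributes, max_gain):
    """Return True if any of the three stopping cases holds at a node.

    labels:            class labels of the samples covered by the node
    unused_attributes: attributes not yet used on the path from the root
    max_gain:          largest information gain over the unused attributes
    """
    same_class = len(set(labels)) <= 1          # case 1: entropy is zero
    no_attributes_left = not unused_attributes  # case 2: all attributes used
    no_gain = max_gain <= 0                     # case 3: gain is zero or negative
    return same_class or no_attributes_left or no_gain


def leaf_label(labels):
    """Majority class among the covered samples (each sample counted once)."""
    return Counter(labels).most_common(1)[0][0]


print(should_stop(["a", "b"], ["A1"], 0.2))   # False: the node can still be split
print(leaf_label(["a", "b", "b"]))            # b
```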

5 Experimental studies

In this section, several experimental studies are described. First, Section 5.1 introduces the computing environment and the datasets. In Section 5.2, we investigate the impact of parameter α on the number of fuzzy items. Furthermore, Section 5.3 provides a detailed comparative analysis of the classification accuracy with several traditional decision tree classifiers. Finally, in Section 5.4, the proposed DPFP criterion is compared with two fuzzy item partitioning criteria in terms of classification accuracy, the number of leaves and the depth of the tree, all within the same construction framework of fuzzy decision trees.

5.1 Computing environment and datasets

In the experimental studies, the proposed DPFP and DP-FDT algorithms are implemented in Java programming language. The data structures of WEKA [28] and the related software package are utilised (Version 3.6.9). All the experiments are conducted on a machine with a Pentium 5 3.30 GHz processor, 8 GB of RAM, and Ubuntu 14.04.1 LTS (64 Bit) operating system.

To test the behavior of our approach in real-world applications, we choose 24 datasets from the UCI Machine Learning Repository [24], as shown in Table 2. The collected datasets are related to life sciences, social sciences, and medical sciences. As benchmarks, these datasets have been widely used in the literature. In the experiments, all the results are average values over ten runs of 10-fold cross-validation.

Table 2
The detailed information of the used data sets

No.	Data sets	Numerical attributes	Nominal attributes	Classes	Instances
1	australian	6	8	2	690
2	auto_mpg	5	2	3	392
3	BLOGGER	0	5	2	100
4	blood	4	0	2	748
5	breast-cancer-w-d	30	0	2	569
6	breast-cancer-w-o	0	9	2	699
7	breast-cancer-w-p	33	0	2	198
8	breast-cancer	0	9	2	286
9	breast-tissue	9	0	6	106
10	car	0	6	4	1,728
11	creditapproval	6	9	2	690
12	diabetes	8	0	2	768
13	ecoli	7	0	8	336
14	fertility	9	0	2	100
15	haberman	2	1	2	306
16	heart-statlog	13	0	2	270
17	ILPD	9	1	2	281
18	iris	4	0	3	150
19	lenses	4	0	3	24
20	new-thyroid	5	0	3	215
21	pima-indians-diabetes	8	0	2	768
22	sonar	60	0	2	208
23	spectf-heart	44	0	2	267
24	wine	13	0	3	178

5.2 The impact of parameter α on the number of fuzzy items

In this experiment, our goal is to study the impact of parameter α on the number of fuzzy items for the data sets. From the form of fuzzy items discussed in this paper, it follows that the number of splitting points generated by the proposed DPFP algorithm (Algorithm 3) is equal to the number of fuzzy items used in the proposed DP-FDT algorithm. Therefore, to study how the number of fuzzy items on each condition attribute changes when the parameter α is varied, the proposed Algorithm 3 is employed. The detailed changes can be found in Fig. 5, where the Y-axis is the parameter α, varying from 0.1 to 1.0 with a step length of 0.1, the X-axis gives the indices of the condition attributes, and different colors represent different numbers of fuzzy items.

Fig. 5

The impacts of parameter α on the number of fuzzy items.

From Fig. 5, three observations can be made: (1) for a fixed α, the number of fuzzy items differs across condition attributes; (2) as α increases within a certain range, the number of fuzzy items on a condition attribute decreases slightly; and (3) when parameter α is large enough, the number of fuzzy items shows no obvious change for some condition attributes.

The underlying reasons for these results can be understood through a review of the proposed DPFP algorithm. The DPFP algorithm relies on an improved dynamic programming algorithm, where the parameter α ∗ N (where N is the number of attribute values or the number of jobs) can be regarded as the capacity C of a machine used in the steel industry. For a given machine, the capacity is fixed. By applying the dynamic programming algorithm, different jobs may form different optimal batch schedules, which means different attribute values may have different splitting points in Algorithm 3. For multiple machines with different capacities, when the capacity of one machine is small, the optimal batch schedules of the same jobs may differ because of the capacity. However, when the capacity increases to a certain degree, the same jobs may have the same optimal batch schedules. Therefore, it can be concluded that, for a given parameter α, the proposed DPFP algorithm can divide different condition attributes into different numbers of fuzzy partitions.

5.3 The comparison with non-fuzzy decision trees

To evaluate the classification accuracy of the proposed DP-FDT approach, DP-FDT is compared with several traditional decision tree classifiers, including C4.5 decision tree (C4.5) [29], Multi-class alternating decision tree (LAD) [30], Simple cart (SC) [31], Random decision tree (RT) [32] and Random forest (RF) [32]. More detailed comparisons with some fuzzy decision trees are described in the next section, so only some classic non-fuzzy decision trees are discussed here. All these existing decision tree classifiers are implemented by the Weka machine learning toolkit [33] (version 3.6.9), and all the parameters used in these methods are set to their defaults. In the proposed DP-FDT algorithm, the parameters α and δ are preset to 0.1 and 0.5, respectively. The employed datasets are the 24 benchmark data sets listed in Table 2.

This study mainly focuses on the comparative analysis of classification accuracy. The experimental results are shown in Table 3, where all the results are average values over ten runs of 10-fold cross-validation. Results marked in bold indicate that the corresponding traditional method obtains a higher accuracy than the proposed DP-FDT algorithm.

Table 3
The comparative results on classification accuracy with non-fuzzy decision trees

Data sets	C4.5	LAD	SC	RT	RF	DP-FDT
australian	0.8190	0.8025	0.8328	0.7828	0.7725	0.8380
auto_mpg	0.7644	0.6221	0.8110	0.7220	0.7266	0.6919
BLOGGER	0.7470	0.7940	0.7210	0.8090	0.8100	0.8450
blood	0.7755	0.7126	0.7541	0.7071	0.7092	0.7438
breast-cancer-w-d	0.9313	0.9104	0.9267	0.9283	0.9288	0.9292
breast-cancer-w-o	0.9359	0.9430	0.9329	0.9223	0.9252	0.9445
breast-cancer-w-p	0.7111	0.6595	0.7159	0.6643	0.6715	0.7389
breast-cancer	0.6606	0.6973	0.6563	0.6420	0.6623	0.6460
breast-tissue	0.6449	0.5643	0.6418	0.6295	0.6301	0.6095
car	0.9342	0.9008	0.9730	0.8388	0.8325	0.9381
creditapproval	0.8206	0.7967	0.8257	0.7762	0.7790	0.8451
diabetes	0.7345	0.6799	0.7105	0.6920	0.6948	0.6982
ecoli	0.8150	0.6997	0.8201	0.7736	0.7483	0.7712
fertility	0.8420	0.7660	0.8340	0.7830	0.7820	0.7990
haberman	0.6861	0.6915	0.6813	0.6420	0.6389	0.7039
heart-statlog	0.7567	0.6978	0.7581	0.7307	0.7241	0.7785
ILPD	0.6828	0.6840	0.6883	0.6734	0.6841	0.6951
iris	0.9427	0.9353	0.9373	0.9320	0.9340	0.9793
lenses	0.7883	0.8500	0.7650	0.7317	0.7533	0.7617
new-thyroid	0.9230	0.8853	0.9266	0.9280	0.9315	0.8953
pima-indians-diabetes	0.7302	0.6720	0.7148	0.6891	0.6912	0.7009
sonar	0.7109	0.6611	0.7053	0.7064	0.7038	0.7409
spectf-heart	0.7462	0.7131	0.7338	0.7314	0.7335	0.7982
wine	0.9268	0.8478	0.8934	0.9167	0.9067	0.9283
Avg.	0.7929	0.7578	0.7900	0.7647	0.7656	0.7925
Better than DP-FDT	(11/24)	(2/24)	(11/24)	(4/24)	(4/24)

From Table 3, we can see that, compared with the proposed DP-FDT algorithm, the traditional C4.5, LAD, SC, RT and RF classifiers obtain better performance on 11, 2, 11, 4 and 4 of the 24 data sets, respectively. In terms of the average testing accuracy over all data sets, only the C4.5 algorithm is slightly higher than the proposed DP-FDT algorithm (0.7929 versus 0.7925).

To give a statistical analysis of the results, the Friedman test [34] is employed to verify whether there is any significant difference among them, where the null hypothesis of the Friedman test is that the tested classifiers are equivalent. Since the test involves N = 24 data sets and k = 6 classifiers, the Friedman statistic F_F follows the F-distribution with k − 1 = 5 and (k − 1)(N − 1) = 115 degrees of freedom, and the critical value at the significance level 0.05 is 2.2932. As the F_F statistic for the results in Table 3 is 12.3307, which is greater than 2.2932, the null hypothesis is rejected, which means that there is a significant difference among these classifiers.
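The statistic described above can be sketched in a few lines of Python. This is a minimal illustration of the F_F form of the Friedman test as given in [34], not the code used in the experiments; the function names and the toy accuracy table are assumptions made for the example.

```python
def average_ranks(scores):
    """Average rank of each classifier over all data sets (rank 1 = best)."""
    n, k = len(scores), len(scores[0])
    totals = [0.0] * k
    for row in scores:
        # Column indices sorted from highest to lowest accuracy.
        order = sorted(range(k), key=lambda j: -row[j])
        i = 0
        while i < k:
            j = i
            # Extend the group while the next score ties with the current one.
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            tied_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
            for t in range(i, j + 1):
                totals[order[t]] += tied_rank
            i = j + 1
    return [t / n for t in totals]


def friedman_ff(scores):
    """Return (F_F, (df1, df2)) for an N x k table of accuracies."""
    n, k = len(scores), len(scores[0])
    ranks = average_ranks(scores)
    chi2 = 12.0 * n / (k * (k + 1)) * (
        sum(r * r for r in ranks) - k * (k + 1) ** 2 / 4.0
    )
    denom = n * (k - 1) - chi2
    # A zero denominator means every data set ranks the classifiers identically.
    ff = float("inf") if denom == 0 else (n - 1) * chi2 / denom
    return ff, (k - 1, (k - 1) * (n - 1))


# Hypothetical accuracies: 4 data sets (rows) x 3 classifiers (columns).
toy = [
    [0.90, 0.85, 0.80],
    [0.75, 0.70, 0.65],
    [0.88, 0.80, 0.84],
    [0.60, 0.66, 0.55],
]
ff, dof = friedman_ff(toy)
print(round(ff, 4), dof)
```

Applied to a 24 × 6 table such as Table 3, the same function returns the degrees of freedom (5, 115) quoted in the text; F_F is then compared with the critical value of the corresponding F-distribution.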

Given this significant difference, we use the Nemenyi post hoc test [34] to evaluate the pairwise differences, where the critical difference (CD) is 1.5392 at the significance level 0.05. Table 4 lists the pairwise differences (in absolute value) of the average performance ranks for classification accuracy. A pairwise difference greater than CD implies a statistically significant difference in performance. From Table 4, the following conclusions can be drawn about the proposed DP-FDT algorithm: (1) the proposed DP-FDT algorithm performs significantly better than the traditional LAD, RT and RF classifiers; and (2) the C4.5 and SC classifiers are not significantly different from the proposed DP-FDT algorithm.

Table 4

Pairwise differences of average performance ranks for classification accuracy

Classifiers	C4.5	LAD	SC	RT	RF	DP-FDT
C4.5	—	—	—	—	—	—
LAD	2.2500	—	—	—	—	—
SC	0.3750	1.8750	—	—	—	—
RT	2.2083	0.0417	1.8333	—	—	—
RF	1.7500	0.5000	1.3750	0.4583	—	—
DP-FDT	0.0833	2.3333	0.4583	2.2917	1.8333	—
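The critical difference and the comparison against Table 4 can be reproduced with a short sketch. It assumes the usual Nemenyi formula CD = q_α · sqrt(k(k + 1) / (6N)) from [34] with the tabulated value q_0.05 = 2.850 for k = 6 classifiers; the variable names are illustrative.

```python
import math

# Critical value q_0.05 of the two-tailed Nemenyi test for k = 6 classifiers,
# as tabulated in [34]; a different k needs a different tabulated value.
Q_ALPHA_K6 = 2.850


def nemenyi_cd(q_alpha, k, n):
    """Critical difference for k classifiers compared over n data sets."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))


# Rank differences against DP-FDT, copied from the last row of Table 4.
diff_vs_dpfdt = {
    "C4.5": 0.0833,
    "LAD": 2.3333,
    "SC": 0.4583,
    "RT": 2.2917,
    "RF": 1.8333,
}

cd = nemenyi_cd(Q_ALPHA_K6, k=6, n=24)
significant = sorted(name for name, d in diff_vs_dpfdt.items() if d > cd)
print(round(cd, 4), significant)
```

With these inputs the sketch yields the CD of 1.5392 used above and flags LAD, RT and RF as the classifiers whose rank difference from DP-FDT exceeds it.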

Based on the above analysis, the proposed DP-FDT approach obtains a classification accuracy similar to that of the existing traditional tree models, and is even better than some of them, namely the LAD, RT and RF classifiers.

5.4 The comparison with fuzzy decision trees

The goal of this experiment is to compare the proposed DPFP criterion with two other partitioning criteria for fuzzy items within the same induction framework of fuzzy decision trees. Both of these criteria require a fixed number of fuzzy items to be predefined for each condition attribute. In the first criterion, the number of splitting points used for the fuzzy items is equal to 3 [7, 17], where the first splitting point is the minimal attribute value, the second is the mean value, and the third is the maximal attribute value. In the second criterion, the number of splitting points is equal to the number of classes [6, 35], where the cut points are calculated as follows: ${cp}_{i} = \mathit{min} + \frac{(\mathit{max} - \mathit{min})(i - 1)}{NC - 1},$ (19) where $i = 1, \ldots, NC$, min is the minimal attribute value, max is the maximal attribute value, and NC is the number of classes.
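The two fixed criteria can be written down directly. The following is a minimal sketch of the min/mean/max rule and of Eq. (19); the function names and the sample attribute values are hypothetical and not taken from the paper.

```python
def three_point_cuts(values):
    """Fuzzy items = 3: minimum, mean and maximum of the attribute values."""
    return [min(values), sum(values) / len(values), max(values)]


def class_count_cuts(values, n_classes):
    """Fuzzy items = Class: NC equally spaced cut points, following Eq. (19)."""
    if n_classes < 2:
        raise ValueError("Eq. (19) needs at least two classes (NC - 1 > 0)")
    lo, hi = min(values), max(values)
    return [
        lo + (hi - lo) * (i - 1) / (n_classes - 1)
        for i in range(1, n_classes + 1)
    ]


# Hypothetical values of one condition attribute.
attribute = [0.0, 2.0, 10.0]
print(three_point_cuts(attribute))     # [0.0, 4.0, 10.0]
print(class_count_cuts(attribute, 3))  # [0.0, 5.0, 10.0]
```

Both rules depend only on the range (and mean) of the attribute, so every condition attribute receives the same number of cut points regardless of how its values are distributed, which is the limitation the DPFP criterion is meant to address.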

In this experiment, the three partitioning criteria for fuzzy items are denoted Fuzzy items=DP, Fuzzy items=3, and Fuzzy items=Class. It should be noted that the same framework is employed to construct the trees; only the partitioning of fuzzy items differs. The parameter δ varies from 0.1 to 1.0 with a step length of 0.1 during the induction. The parameter α used in DP-FDT is 0.1. In more detail, we are interested in evaluating the impact of these criteria on the classification accuracy, the number of leaves, and the depth of the constructed trees. The detailed results on 12 UCI data sets can be found in Figs. 6, 7, and 8.

Fig. 6

The impact of different partitioning criteria on the classification accuracy as δ varies.

Fig. 7

The impact of different partitioning criteria on the number of leaves as δ varies.

Fig. 8

The impact of different partitioning criteria on the depth of the tree as δ varies.

As the curves in Fig. 6 show, in most cases the decision trees based on the proposed DPFP criterion achieve higher classification accuracy than those based on the other two partitioning criteria. Regarding the curves in Fig. 7 and Fig. 8, there are no obvious differences among the three partitioning criteria. Meanwhile, as the parameter δ increases, all the curves show a decreasing tendency, which is in accordance with the intent of parameter δ. In conclusion, the proposed dynamic programming based fuzzy partition criterion has an advantage over the two fixed partition criteria in classification accuracy, without an obvious change in tree size.

5.5 The analysis on artificial data sets

In this section, we further investigate the proposed DPFP criterion in terms of the classification accuracy, the number of leaves and the depth of the tree on artificial data sets in fuzzy decision trees. During the experiments, the parameters α and δ are preset to 0.1 and 0.5, respectively. Each generated artificial data set contains three categories (each with 200 instances), where each category follows a normal distribution, with means μ = 2, μ = 6 and μ = 10, respectively. Based on these artificial data sets, the experiment focuses on three aspects: the dimensionality, the noise intensity and the extreme outliers. All results are averages over ten runs of 10-fold cross-validation.
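A data set of this kind could be generated as in the following sketch. It assumes that all attributes of a category are drawn independently from the same normal distribution N(μ, σ), which the text does not state explicitly; the function name, the seed, and the dictionary layout are likewise illustrative. The dimensionalities and standard deviations are the ones used in the dimensionality and noise experiments of this section.

```python
import random


def make_gaussian_classes(dim, sigma=1.0, means=(2.0, 6.0, 10.0),
                          n_per_class=200, seed=0):
    """Return a list of (features, label) pairs with one Gaussian per class.

    Assumption: every attribute of a class is sampled independently from
    N(mean of that class, sigma).
    """
    rng = random.Random(seed)
    data = []
    for label, mu in enumerate(means):
        for _ in range(n_per_class):
            features = [rng.gauss(mu, sigma) for _ in range(dim)]
            data.append((features, label))
    return data


# Dimensionality experiment: sigma = 1, dimensionality 2, 4, ..., 12.
dimensionality_sets = {d: make_gaussian_classes(dim=d) for d in (2, 4, 6, 8, 10, 12)}

# Noise experiment: dimensionality 4, sigma from 1.0 to 1.5.
noise_sets = {s: make_gaussian_classes(dim=4, sigma=s)
              for s in (1.0, 1.1, 1.2, 1.3, 1.4, 1.5)}

print(len(dimensionality_sets[2]), len(noise_sets[1.5][0][0]))
```

Each generated set holds 600 instances (200 per category); fixing the seed keeps the sets reproducible across runs.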

For the dimensionality of the data, we generate 6 artificial data sets in which the standard deviation σ of the normal distribution is 1 for all categories and the dimensionality is 2, 4, 6, 8, 10 and 12, respectively. The experimental results can be found in Fig. 9. The results show that the classification accuracy, the number of leaves and the depth of the tree tend to be stable as the dimensionality increases.

Fig. 9

The comparative results on different dimensionality.

For the noise intensity, there are also 6 artificial data sets, where the dimensionality is fixed to 4 and the standard deviations for all categories are σ = 1.0, σ = 1.1, σ = 1.2, σ = 1.3, σ = 1.4 and σ = 1.5, respectively. The artificial data set with standard deviation σ = 1.0 is treated as the benchmark data set. As the standard deviation increases, the noise intensity increases gradually. To cope with the added noise relative to the benchmark data set, the number of leaves and the depth of the trees show an upward trend, while the classification accuracy is only slightly reduced (by about 2.4%). Thus, the proposed method has some degree of noise immunity. The detailed experimental results are presented in Fig. 10.

Fig. 10

The comparative results on different noise intensity.

For the extreme outliers, we first generate a data set with standard deviation σ = 1 for all categories and the dimensionality fixed to 4. Then, the maximum value of each dimension (condition attribute) is enlarged by factors of 2, 4, 6, 8 and 10, respectively, to create extreme outliers. Replacing the original values with these extreme outliers produces the remaining 5 data sets. Fig. 11 describes the comparative results in detail. We find that the different extreme outliers have relatively little effect on the performance results.

Fig. 11

The comparative results on different extreme outliers.
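The outlier construction used for Fig. 11 can be sketched as follows. The sketch assumes that exactly one value per condition attribute (the first occurrence of the column maximum) is replaced by its enlarged version; the function name and the tiny example matrix are hypothetical.

```python
def inject_extreme_outliers(rows, factor):
    """Return a copy of rows in which each column's maximum is multiplied by factor.

    Assumption: if the maximum occurs more than once in a column, only its
    first occurrence is replaced.
    """
    out = [list(r) for r in rows]  # copy so the base data set stays intact
    for j in range(len(out[0])):
        i_max = max(range(len(out)), key=lambda i: out[i][j])
        out[i_max][j] = out[i_max][j] * factor
    return out


# Hypothetical base data set with two condition attributes.
base = [[1.0, 5.0], [3.0, 2.0], [2.0, 4.0]]
variants = {f: inject_extreme_outliers(base, f) for f in (2, 4, 6, 8, 10)}
print(variants[4])  # [[1.0, 20.0], [12.0, 2.0], [2.0, 4.0]]
```

Because only one value per attribute changes, the bulk of each attribute's distribution is untouched, which is consistent with the small effect on the results reported above.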

6 Conclusion

In this paper, a dynamic programming based fuzzy partition criterion for fuzzy items, namely DPFP, is established, mainly for use in fuzzy decision tree induction. The proposed DPFP algorithm can produce different numbers of fuzzy items for different condition attributes, which resolves the fixed partitioning problem that occurs in some fuzzy decision trees. A new objective function is used in the dynamic programming problem underlying DPFP, which makes the partitions more compact. In addition, the dynamic programming based fuzzy decision tree learning algorithm DP-FDT is proposed. The experimental studies investigate the number of fuzzy items obtained for different condition attributes. Compared with several traditional decision tree models, the proposed DP-FDT algorithm achieves comparable classification accuracy. Moreover, within the same fuzzy decision tree framework, the DP-FDT algorithm achieves a higher classification accuracy than the fixed partitioning criteria in most cases.

It is worth pointing out that the proposed DP-FDT algorithm introduces a new parameter α. How to optimize α is a meaningful direction for future research. In addition, the proposed DPFP criterion may reduce training-time efficiency, since the dynamic programming algorithm needs more time to search for the optimal partitions. However, the additional time is tolerable in this study.

Acknowledgments

This work is supported by the Research Foundation for Advanced Talents of Henan University of Technology (No. 2019BS007), the Open Fund of Key Laboratory of Grain Information Processing and Control (Henan University of Technology), Ministry of Education (No. KFJJ-2020-112), and the National Natural Science Foundation of China under Grants (Nos. 62006071, 61773352).

References

1. Quinlan J.R., Induction of decision trees, Machine Learning 1(1) (1986), 81–106.

2. Mitra, Konwar K.M. and Pal S.K., Fuzzy decision tree, linguistic rules and fuzzy knowledge-based network: generation and evaluation, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 32(4) (2002), 328–339.

3. Lim T.-S., Loh W.-Y. and Shih Y.-S., An empirical comparison of decision trees and other classification methods, Technical Report, Citeseer (1998).

4. Gehrke, Ganti, Ramakrishnan and Loh W.-Y., Boat optimistic decision tree construction, in ACM SIGMOD Record 28(2), ACM (1999), 169–180.

5. Mehta, Agrawal and Rissanen, Sliq: A fast scalable classifier for data mining, in International Conference on Extending Database Technology, Springer (1996), 18–32.

6. Wang, Liu, Pedrycz and Zhang, Fuzzy rule based decision trees, Pattern Recognition 48(1) (2015), 50–59.

7. Wang, Li, Yan and Chen, A survey of fuzzy decision tree classifier methodology, in Fuzzy Information and Engineering, Springer (2007), 959–968.

8. Janikow C.Z., Fuzzy decision trees: issues and methods, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 28(1) (1998), 1–14.

9. Zhang H.G., Wang Y.C. and Song, Absolute stabilization of singular systems with ferromagnetic hysteresis nonlinearity, Science China Information Sciences 56(7) (2013), 1–14.

10. Altay and Cinar, Fuzzy decision trees, in Fuzzy Statistical Decision-Making, Springer (2016), 221–261.

11. Weber, Fuzzy-id3: a class of methods for automatic knowledge acquisition, in The Second International Conference on Fuzzy Logic and Neural Networks (1992), 265–268.

12. Umano, Okamoto, Hatono, Tamura, Kawachi, Umedzu and Kinoshita, Fuzzy decision trees by fuzzy id3 algorithm and its application to diagnosis systems, in International Conference on Fuzzy Systems, IEEE (1994), 2113–2118.

13. Ichihashi, Shirai, Nagasaka and Miyoshi, Neurofuzzy id3: a method of inducing fuzzy decision trees with linear programming for maximizing entropy and an algebraic method for incremental learning, Fuzzy Sets and Systems 81(1) (1996), 157–167.

14. Zhang H.G., Zhang J.L., Yang G.H. and Luo Y.H., Leader-based optimal coordination control for the consensus problem of multi-agent differential games via fuzzy adaptive dynamic programming, IEEE Transactions on Fuzzy Systems 23(1) (2015), 152–163.

15. A new partition criterion for fuzzy decision tree algorithm, in Intelligent Information Technology Application, Workshop on, IEEE (2007), 43–46.

16. Liu and Pedrycz, The development of fuzzy decision trees in the framework of axiomatic fuzzy set logic, Applied Soft Computing 7(1) (2007), 325–342.

17. Liu, Feng and Pedrycz, Extraction of fuzzy rules from fuzzy decision trees: An axiomatic fuzzy sets (afs) approach, Data & Knowledge Engineering 84 (2013), 1–25.

18. Wang, Dong and Yan, Maximum ambiguity-based sample selection in fuzzy decision tree induction, IEEE Transactions on Knowledge and Data Engineering 24(8) (2012), 1491–1505.

19. Shukla S.K. and Tiwari M.K., Ga guided cluster based fuzzy decision tree for reactive ion etching modeling: a data mining approach, IEEE Transactions on Semiconductor Manufacturing 25(1) (2012), 45–56.

20. Kumar, Hanmandlu and Gupta, Fuzzy binary decision tree for biometric based personal authentication, Neurocomputing 99 (2013), 87–97.

21. Gadomer Ł. and Sosnowski Z.A., Fuzzy random forest with c–fuzzy decision trees, in IFIP International Conference on Computer Information Systems and Industrial Management, Springer (2016), 481–492.

22. Tang and Zhao, Scheduling a single semi-continuous batching machine, Omega 36(6) (2008), 992–1004.

23. Lageweg B.J., Lawler E.L., Lenstra J.K. and Rinnooy Kan A.H.G., Computer aided complexity classification of deterministic scheduling problems, Organic Letters 46(31) (1981), 1521–1524.

24. Dheeru and Karra Taniskidou, UCI machine learning repository (2017). [Online]. Available: http://archive.ics.uci.edu/ml.

25. Nazarko and Zalewski, The fuzzy regression approach to peak load estimation in power distribution systems, IEEE Transactions on Power Systems 14(3) (1999), 809–814.

26. Chen C.-H., Li A.-F. and Lee Y.-C., A fuzzy coherent rule mining algorithm, Applied Soft Computing 13(7) (2013), 3422–3428.

27. Pach F.P., Abonyi, Nemeth and Arva, Supervised clustering and fuzzy decision tree induction for the identification of compact classifiers, in 5th International Symposium of Hungarian Researchers on Computational Intelligence, Budapest, Hungary, Citeseer (2004).

28. Frank E., Hall M.A. and Witten I.H., The WEKA Workbench, Online Appendix for Data Mining: Practical Machine Learning Tools and Techniques, Fourth Edition, Morgan Kaufmann (2016).

29. Quinlan J.R., C4.5: Programs for machine learning, Morgan Kaufmann (1993).

30. Holmes, Pfahringer, Kirkby, Frank and Hall, Multiclass alternating decision trees, in European Conference on Machine Learning, Springer (2002), 161–172.

31. Breiman, Friedman J.H., Olshen and Stone C.J., Classification and regression trees, Biometrics 40(3) (1984), 358.

32. Breiman, Random forests, Machine Learning 45(1) (2001), 5–32.

33. Witten I.H., Frank, Hall M.A. and Pal C.J., Data Mining: Practical machine learning tools and techniques, Morgan Kaufmann (2016).

34. Demšar, Statistical comparisons of classifiers over multiple data sets, Journal of Machine Learning Research 7 (2006), 1–30.

35. Wang and Mendel J.M., Generating fuzzy rules by learning from examples, IEEE Transactions on Systems, Man, and Cybernetics 22(6) (1992), 1414–1427.

