Abstract
The fuzzy rule base is essential to the performance of fuzzy systems. However, because practical industrial applications involve many uncertain effects and a great deal of noise, the Wang-Mendel (WM) algorithm may extract bad fuzzy rules or fuzzy rules with low confidence, which degrade model performance. Moreover, the efficiency of the WM algorithm is affected by the scale of the dataset. To address these issues, this paper proposes an improved WM algorithm that uses a clustering algorithm to optimize the samples before the fuzzy system is trained. The proposed method also improves accuracy by using the weighted distance among samples to extract a complete fuzzy rule base, and it adaptively calculates the number of fuzzy partitions and the standard deviation of the Gaussian membership function for each variable. Experiments on a single-input single-output model, the Mackey-Glass chaotic time series, and a yacht hydrodynamics dataset demonstrate that the proposed method performs well.
Introduction
Zadeh [14] first proposed fuzzy theory in 1965. Following his lead, fuzzy theory has been gradually improved and applied in many fields [4, 28]. It is difficult for many traditional modeling methods to provide an exact description of industrial processes because practical applications involve many uncertain effects [2], such as time variation [12, 27], high dimensionality [20], uncertainty [11], and nonlinearity [18]. Therefore, the concept of fuzziness is typically introduced into the construction of a reasonable, traceable model [24]. Training a robust and highly accurate fuzzy model has thus become a significant problem. Research has demonstrated that, in the same manner as an artificial neural network [5], the fuzzy system is a universal approximator [3, 16].
A fuzzy rule base is typically central to building a fuzzy system [8]. A better rule base can effectively improve the performance of fuzzy modeling [23]. Thus, extracting fuzzy rules from samples is key to generating a fuzzy system. Fuzzy clustering, heuristic algorithms, genetic algorithms, neural network methods, and many other methods have already been used for fuzzy rule discovery [6, 8]. Among these, the Wang-Mendel (WM) algorithm proposed by Wang and Mendel [17, 19] is one of the effective methods for generating fuzzy rules. The WM algorithm discovers fuzzy rules simply and effectively using a look-up table, and it has been widely used in this field because it is simple, practical, and requires no prior knowledge [30]. However, extracting one rule from each sample leads to a high dependency on the samples, which often leaves the fuzzy rule base lacking completeness and robustness, especially when processing small sample sets that contain noise data [30, 32]. Thus, the quality and scale of the sample data heavily affect the performance of the WM algorithm [31]. When large-scale data are processed, efficiency can be improved to some degree by using a clustering algorithm [10]. Moreover, research also shows that samples are correlated with one another and that exploiting this correlation is an effective way to improve the WM algorithm [7, 29]. Nonetheless, these improved WM algorithms require the fuzzy regions to be predefined.
Inspired by these results, this paper introduces sample set optimization and sample correlation to further improve the WM algorithm; the resulting method is called the CWM algorithm. It builds on the fast search and find of density peaks (FSFDP) clustering algorithm [1] and on sample correlation. The CWM algorithm first optimizes the data using the FSFDP clustering algorithm, which reduces both the scale of the dataset and the amount of noise data. Furthermore, the subtractive clustering method (SCM) [22] is used to adaptively adjust and optimize the antecedent and consequent parameters because of its simplicity and effectiveness. Moreover, sample correlation is used to improve completeness, robustness, and accuracy. To some extent, sample correlation also helps to reduce the interference of noise data during fuzzy rule discovery.
The rest of the paper is structured as follows: Some background and related work are briefly introduced in Section 2. In Section 3, the improved WM algorithm with the FSFDP clustering algorithm and sample correlation is described, which generates a complete and robust fuzzy rule base with high accuracy. In Section 4, the experimental results and analyses for the proposed algorithm are described. The study’s conclusions are provided in Section 5.
Background and related work
Basic framework of the WM algorithm
The fuzzy rule generation procedure of the WM algorithm consists of the following steps:
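As a minimal sketch of the classic WM procedure as it is commonly described (partition each variable into fuzzy regions, generate one rule per sample, assign each rule a degree, and keep the highest-degree rule among conflicting ones), the following Python code may help. It assumes Gaussian membership functions with given centers and spreads; the function names `gaussian`, `best_region`, and `wm_rule_base` and the toy data are illustrative and do not come from the paper.

```python
import math

def gaussian(x, center, sigma):
    """Gaussian membership degree of x in a fuzzy set centered at `center`."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))

def best_region(x, centers, sigma):
    """Return (index, degree) of the fuzzy region in which x has the highest membership."""
    degrees = [gaussian(x, c, sigma) for c in centers]
    idx = max(range(len(centers)), key=degrees.__getitem__)
    return idx, degrees[idx]

def wm_rule_base(samples, in_centers, out_centers, in_sigmas, out_sigma):
    """samples: list of (inputs, output). Returns {antecedent: (consequent, degree)}."""
    rules = {}
    for inputs, output in samples:
        antecedent, degree = [], 1.0
        for x, centers, sigma in zip(inputs, in_centers, in_sigmas):
            idx, mu = best_region(x, centers, sigma)
            antecedent.append(idx)
            degree *= mu  # rule degree is the product of all membership degrees
        consequent, mu_out = best_region(output, out_centers, out_sigma)
        degree *= mu_out
        key = tuple(antecedent)
        # Conflict resolution: among rules with the same antecedent, keep the highest degree.
        if key not in rules or degree > rules[key][1]:
            rules[key] = (consequent, degree)
    return rules

# Toy data: the last sample contradicts the third one (same input region, different output).
centers = [-1.0, 0.0, 1.0]
samples = [((-0.9,), -1.0), ((0.1,), 0.0), ((0.95,), 1.0), ((0.9,), -1.0)]
rules = wm_rule_base(samples, [centers], centers, [0.4], 0.4)
print(rules)
```

In this toy run the contradictory sample loses only because its degree happens to be lower; a noisy sample with a higher degree would replace the correct rule.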
Clustering using the FSFDP algorithm
The FSFDP algorithm can effectively cluster the sample set, and it is only sensitive to the local density, which makes it robust for managing large-scale data problems.
Given $N$ samples, the local density $\rho_i$ of the $i$-th sample is obtained as follows:
Additionally, its distance is obtained as follows:
The cluster centers, which are the samples that have both a high local density and a large distance, are selected by inspecting the decision graph.
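The two FSFDP quantities can be sketched as follows. This sketch assumes the cutoff-kernel form from the original FSFDP formulation, in which the local density counts the samples closer than a cutoff distance `d_c`, and the distance of a sample is its minimum distance to any sample of higher density (the densest sample takes its maximum distance by convention). Whether the paper uses this kernel or a smooth variant cannot be recovered from the text, and the function name is illustrative.

```python
import math

def fsfdp_density_and_distance(points, d_c):
    """Compute local density rho_i and distance delta_i for each point (cutoff kernel)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # rho_i: number of other points closer than the cutoff distance d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # A point with no denser neighbor takes its largest distance by convention.
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta

# Two tight groups and one isolated point.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,), (10.0,)]
rho, delta = fsfdp_density_and_distance(points, d_c=0.15)
for p, r, d in zip(points, rho, delta):
    print(p, r, round(d, 2))
```

On the decision graph, the middle point of each group has both a high density and a large distance, which marks it as a center candidate, whereas the isolated point has a large distance but zero density, which marks it as a likely outlier.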
The SCM algorithm is a density-based clustering algorithm that is simple and effective [22, 26]. The first cluster center is the sample that has the largest density of surrounding samples. The next cluster and its center are then determined after all samples within the radius of this point are removed. The process is repeated until every sample lies within the radius of a cluster center. Because the SCM does not need the number of cluster centers to be predefined, it is suitable for adaptively obtaining the fuzzy regions of each input of the WM algorithm.
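A simplified sketch of subtractive clustering is given below. It follows the usual potential-based formulation, in which each sample receives a potential from its neighbors, the sample with the highest potential becomes a center, and the potential around that center is then suppressed. The suppression radius of 1.5 times the cluster radius and the stopping threshold `stop_ratio` are illustrative assumptions, not values taken from the paper, and the full method's accept and reject tests are omitted.

```python
import math

def subtractive_clustering(points, radius, stop_ratio=0.15):
    """Simplified subtractive clustering; returns the list of cluster centers."""
    alpha = 4.0 / radius ** 2
    beta = 4.0 / (1.5 * radius) ** 2  # wider suppression radius around a chosen center

    def sq(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Initial potential: a density measure that is high where many samples are close.
    potential = [sum(math.exp(-alpha * sq(p, q)) for q in points) for p in points]
    first_peak = max(potential)
    centers = []
    while True:
        k = max(range(len(points)), key=potential.__getitem__)
        peak = potential[k]
        if peak < stop_ratio * first_peak:
            break  # the remaining potential is too low to justify another center
        centers.append(points[k])
        # Subtract the influence of the new center so nearby samples are not chosen again.
        potential = [pot - peak * math.exp(-beta * sq(p, points[k]))
                     for p, pot in zip(points, potential)]
    return centers

data = [(0.0,), (0.05,), (0.1,), (1.0,), (1.05,), (0.95,)]
centers = subtractive_clustering(data, radius=0.5)
print(sorted(centers))
```

The number of centers is not specified in advance; it follows from the radius, so a smaller radius tends to produce more centers.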
Improved WM algorithm based on the clustering algorithms and sample correlation (CWM)
Problem description and cause analysis of completeness and robustness in the WM algorithm
Suppose that each input variable $x_i$ ($i = 1, 2, \cdots, n$) can be divided into $M_i$ fuzzy sets and the output variable $y$ into $K$ fuzzy sets. Thus, $L = \prod_{i=1}^{n} M_i$, where $A_n = \{A_n^1, A_n^2, \cdots, A_n^{M_n}\}$ and $B = \{B_1, B_2, \cdots, B_K\}$. A complete fuzzy rule base should form an $L \times (n + 1)$ matrix.
The following can be concluded: Once , the fuzzy rule base is incomplete because more data are required to cover every fuzzy subinterval. Once , the fuzzy rule base will also lack completeness when the data distribution is uneven because some fuzzy subspaces may lose sample coverage. Once , the fuzzy rule base constructed using the WM algorithm should have high completeness when the data distribution is even.
where $N$ is the number of samples and $M_i$ is the number of partitions of the $i$-th input variable.
A rule is extracted from each sample using the WM algorithm. Ambiguities can be eliminated by calculating the confidence of each rule, which is regarded as the strength of the rule. However, if the noise data have sufficient confidence, then wrong rules may be preserved or, sometimes, invalid rules may be extracted, which results in a lack of robustness. Take two conflicting rules as an example: if the output value of the noise sample has a higher degree of membership, then the rule extracted from the noise has the higher strength and becomes the final rule; that is, the correct rule is eliminated and the bad fuzzy rule remains.
In this section, the extracted fuzzy rule is expressed in IF-THEN form. The $l$-th fuzzy rule, Rule($l$), is as follows:
The description of the procedure of the CWM algorithm is as follows:
where $m$ is the number of input variables, $A_i$ is the $i$-th fuzzy set, and $\mu$ is the Gaussian membership function.
where $sc$ is the sample correlation; a larger $sc$ means a stronger correlation.
The flowchart of the CWM algorithm is shown in Fig. 1.
Data optimization based on the FSFDP algorithm
The reasons for choosing the FSFDP algorithm are as follows: The FSFDP algorithm is robust to data density. For large datasets, the FSFDP algorithm is robust with respect to the choice of $d_c$; that is, a change of $d_c$ does not noticeably influence the clustering results. The algorithm is robust to the data distribution, for example, when the class sizes are unbalanced, when many classes overlap heavily, or when the distributions in the feature space are nonspherical. The algorithm is robust to the metric: a nonlinear mapping of the data does not affect the clustering result. The performance of the FSFDP algorithm is not susceptible to the intrinsic dimensionality. The algorithm has few parameters to predefine. The algorithm has low computational complexity [13].
After preprocessing using the FSFDP algorithm, noise data can be removed. Moreover, the scale of data is reduced. Furthermore, the distance is used to calculate the sample correlation.
Adaptive parameters selection based on the SCM algorithm
The reasons for choosing the SCM algorithm [9] are as follows: The SCM algorithm is very fast: its time complexity is linear in the dimension of the data and quadratic in the number of data points. The SCM algorithm can be used to optimize the parameters of the CWM algorithm because it adaptively obtains the cluster centers, which can serve as the partitions of the WM algorithm. Moreover, the resulting standard deviation can be used for the Gaussian membership function.
Because the SCM algorithm is sensitive to noise data, it can be used only after data preprocessing, which is why the FSFDP algorithm is first used to optimize the original data, as described in Section 3.3.1. After this phase, the fuzzy regions and the standard deviation can be obtained effectively according to the sample distribution.
Improved fuzzy rule base based on sample correlation
The sample correlation is defined as:
The reasons for choosing sample correlation for the proposed method are as follows: Sample correlation improves the completeness of the WM algorithm. Because completeness can be guaranteed by traversing every fuzzy region of every input variable, and sample correlation is involved in generating the consequents, sample correlation can, to some extent, enhance completeness. Sample correlation improves the robustness of the WM algorithm. It applies a weighted average over the samples when calculating the confidence of each fuzzy rule, so the influence of noise data is reduced; that is, the fuzzy rule base is more robust. Sample correlation enhances the accuracy of the CWM algorithm. Because sample correlation reduces the effect of noise data and is used to obtain the average output, the accuracy is improved.
After the use of sample correlation, the completeness, robustness, and the accuracy are, to some extent, improved.
The applicable scope and conditions for the proposed algorithm are as follows: The algorithm is suitable for low-dimensional problems. To satisfy completeness, the number of fuzzy rules is $\prod_{i=1}^{n} M_i$, which grows exponentially as the dimension increases, so the computation becomes very expensive. An alternative approach is to reduce the dimension of the raw dataset before generating the fuzzy rule base. The algorithm is suitable for small-scale datasets. For a large-scale dataset, the CWM algorithm extracts useless rules that are also involved in the computation of the consequents and of the output values, so the efficiency of the CWM algorithm is severely affected. Preprocessing of the data requires human intervention because the clusters of the FSFDP algorithm need to be selected in the decision graph. The SCM algorithm requires the cluster radius to be predefined. Specifying a smaller cluster radius generally yields more (smaller) clusters in the data, whereas specifying a larger cluster radius generally yields fewer (larger) clusters. Thus, the way in which the radius is set has a great influence on the results.
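As a worked illustration of this growth, the size of a complete rule base is the product of the numbers of fuzzy partitions of the input variables. The partition counts below are the ones reported in the experiments of this paper (seven fuzzy subsets per input for the Mackey-Glass model, and 4 and 14 for the two yacht inputs); the use of four inputs for the Mackey-Glass model is inferred from the five reported spread values, and the function name is illustrative.

```python
import math

def complete_rule_count(partitions):
    """Number of rules in a complete rule base: product of the input partition counts."""
    return math.prod(partitions)

print(complete_rule_count([7, 7, 7, 7]))  # 2401, four inputs with seven fuzzy sets each
print(complete_rule_count([4, 14]))       # 56, two inputs with 4 and 14 fuzzy sets
```

Adding one more seven-partition input would multiply the first count by seven, which is why reducing the dimension before rule generation matters.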
Experiments and analyses
Evaluation index
In this section, four typical evaluation indexes are introduced to evaluate the performance of the WM algorithm, COWM algorithm, FCAWM algorithm, and proposed algorithm: mean absolute error (MAE), root mean square error (RMSE), completeness of the WM algorithm, and the robustness of the WM algorithm (MAC). Completeness and MAC were defined in Section 3.1.
MAE is defined for comparison between the prediction value and target value:
Furthermore, RMSE is used to evaluate performance and is defined as
By these definitions, smaller RMSE and MAE values indicate better performance, and a larger MAC indicates a more robust fuzzy system. Moreover, a high completeness of the fuzzy rule base is required.
In this section, a single-input single-output dataset is used to train the fuzzy system. To compare the results in different cases, four situations are considered for completeness and outliers. The largest MAC, lowest MAE, and lowest RMSE among the four algorithms are shown in bold in Table 2.
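The two error indexes can be computed as in the following sketch, which assumes the standard definitions: MAE is the mean of the absolute differences between target and predicted values, and RMSE is the square root of the mean squared difference. The function names and the sample values are illustrative.

```python
import math

def mae(targets, predictions):
    """Mean absolute error between target and predicted values."""
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)

def rmse(targets, predictions):
    """Root mean square error between target and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets))

targets = [1.0, 2.0, 3.0]
predictions = [1.0, 2.0, 5.0]
print(mae(targets, predictions), rmse(targets, predictions))
```

Because RMSE squares each error before averaging, a single large error (such as one caused by a noise sample) raises RMSE more than it raises MAE.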
Dataset description and organization
The 41 samples that were used for training were collected from the target model in Equation (15), where x = [-1, -0.95, ⋯, 0.95, 1]. Data points t42 - t45 are noise samples.
Table 1 shows the training set.
The training samples and noise samples are shown in Fig. 2.
To compare completeness, robustness and accuracy in different situations, the following four cases are introduced.
In this section, the number of fuzzy regions of each variable and the parameter σ of the Gaussian function are adaptively calculated using the SCM algorithm. The SCM algorithm specifies a range of influence of 0.1 for all data dimensions. After the calculation, the input variable is divided into eleven fuzzy subsets (A1, A2, ⋯ , A11) and the output variable into nine fuzzy subsets (B1, B2, ⋯ , B9). The parameter σ of the Gaussian function is 0.0707 for the input variable and 0.1615 for the output variable.
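The paper does not state how σ is derived from the range of influence. One convention used in some subtractive clustering implementations, assumed here for illustration only, is σ = r (max − min) / √8, where r is the range of influence and max and min bound the variable. With r = 0.1 and the input range [-1, 1] it gives a value that matches the reported input spread; the function name is illustrative.

```python
import math

def scm_sigma(radius, lower, upper):
    """Gaussian spread from a cluster radius: radius * (upper - lower) / sqrt(8)."""
    return radius * (upper - lower) / math.sqrt(8.0)

sigma_x = scm_sigma(0.1, -1.0, 1.0)
print(round(sigma_x, 4))  # 0.0707
```

Under the same assumption, the spread scales linearly with both the radius and the range of the variable, so variables with a wider range receive wider membership functions.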
Experimental results of single-input single-output models
Figure 3 shows the comparative experimental results among the WM algorithm, COWM algorithm, FCAWM algorithm, and proposed algorithm using the RMSE evaluation index in the aforementioned four cases.
Table 2 compares the WM algorithm, COWM algorithm, FCAWM algorithm, and proposed algorithm in terms of completeness, robustness, MAE, and RMSE for the aforementioned four cases.
Analysis for single-input single-output models
In summary, Table 2 and Fig. 3 show that if the dataset is complete and without noise, the WM algorithm performs better than the COWM and FCAWM algorithms in both MAE and RMSE, but worse in MAC. This is because, with a small dataset, the COWM algorithm, FCAWM algorithm, and CWM algorithm may overfit. For the other three cases, the proposed method achieves the largest MAC and the smallest MAE and RMSE, whereas the WM algorithm has the smallest MAC and the largest MAE and RMSE. This demonstrates that the CWM algorithm performs better on the MAC, MAE, and RMSE indexes, with high completeness, for the single-input single-output model.
Multiple-input single-output model
Dataset description and organization
Time series prediction is an important practical application. Hence, the Mackey-Glass chaotic time series is introduced to demonstrate the validity of the CWM algorithm.
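For readers who want to reproduce a comparable benchmark, the sketch below generates a Mackey-Glass series and builds lagged samples. Everything specific in it is an assumption, since the paper's settings are not given in this text: the commonly used parameters (delay 17, coefficients 0.2 and 0.1, exponent 10), a simple Euler integration with step 0.1, a constant initial history, and four lagged inputs x(t-18), x(t-12), x(t-6), x(t) predicting x(t+6). The function names are illustrative.

```python
def mackey_glass(n_points, tau=17, a=0.2, b=0.1, x0=1.2, dt=0.1):
    """Integrate the Mackey-Glass delay equation with Euler steps; sample once per time unit."""
    steps_per_unit = round(1.0 / dt)
    delay = tau * steps_per_unit
    history = [x0] * (delay + 1)  # constant initial history (assumption)
    series = []
    for step in range(n_points * steps_per_unit):
        x_t, x_lag = history[-1], history[-1 - delay]
        # dx/dt = a * x(t - tau) / (1 + x(t - tau)^10) - b * x(t)
        x_next = x_t + dt * (a * x_lag / (1.0 + x_lag ** 10) - b * x_t)
        history.append(x_next)
        if (step + 1) % steps_per_unit == 0:
            series.append(x_next)
    return series

def make_samples(series, lags=(18, 12, 6, 0), horizon=6):
    """Build (inputs, output) pairs: inputs x(t - lag) for each lag, output x(t + horizon)."""
    start, stop = max(lags), len(series) - horizon
    return [([series[t - lag] for lag in lags], series[t + horizon])
            for t in range(start, stop)]

series = mackey_glass(200)
samples = make_samples(series)
print(len(series), len(samples))
```

Each sample then has four inputs and one output, which is consistent with the five spread values reported for this experiment.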
In this section, each input variable and the output variable are adaptively divided into seven fuzzy subsets using the SCM algorithm. The parameter σ of the Gaussian function is also adaptively calculated using the SCM algorithm: 0.0411, 0.0412, 0.0413, 0.0413, and 0.0414 for the input variables and output variable, respectively. The SCM algorithm specifies the size of the cluster as 0.1 in each of the data dimensions.
The experiments were implemented on a system with the specifications provided in Table 3.
Experimental results for multiple-input single-output models
Fig. 4 compares the WM algorithm, COWM algorithm, FCAWM algorithm, and proposed algorithm in terms of RMSE, whereas Table 4 compares the four algorithms in terms of the fuzzy rule base (completeness), robustness, MAE, RMSE, and the computation time used to construct the fuzzy system.
Analysis for multiple-input single-output models
Table 4 and Fig. 4 show that the WM, COWM, FCAWM, and CWM algorithms all perform very well in predicting the Mackey-Glass chaotic time series, but the proposed method outperforms the other three in MAE, RMSE, and MAC, which demonstrates the effectiveness of the CWM algorithm. However, the WM algorithm takes the least time to construct a fuzzy system because it does not compute the correlation between each sample and the remaining samples. The FCM algorithm used in the FCAWM algorithm reduces the scale of the original data, but running it costs slightly more time than processing the removed data would, which is why the FCAWM algorithm takes slightly more time than the COWM algorithm. In the CWM algorithm, the clusters of the FSFDP algorithm are selected manually from the decision graph, which takes some time. Despite this, the FSFDP algorithm effectively removes the noise data and some atypical samples, so the time the CWM algorithm needs to build the fuzzy system is less than that of the COWM and FCAWM algorithms. Furthermore, the results show that the introduction of the FSFDP algorithm and sample correlation improved the performance of the CWM algorithm in terms of approximation ability, completeness, and robustness.
Prediction for the sailing yacht application
Dataset description and organization
The raw data are taken from the UCI Yacht Hydrodynamics Data Set [25], which includes 308 data points consisting of six input variables (Froude number, length-displacement ratio, prismatic coefficient, length-beam ratio, beam-draught ratio, and longitudinal position of the center of buoyancy), and one output variable (residuary resistance per unit weight of displacement). The present authors made some adjustments to the dataset for convenience.
First, only the beam-draught ratio and the Froude number are used to predict the residuary resistance per unit weight of displacement; that is, the dataset has two inputs and one output. Then, only the input variables are normalized using Equation (17). Finally, the first 100 samples are used to train the model, and the remaining samples are used to test it.
The fuzzy partitions for all variables are adaptively divided into 4, 14, and 2 fuzzy subsets and the parameter σ in the Gaussian function is adaptively calculated as 0.1529, 0.1138, and 1.9159, respectively, using the SCM algorithm for the input variables and output variable. The SCM algorithm specifies the size of the cluster as 0.1 in each of the data dimensions.
Experimental results for predicting the sailing yacht application
Table 5 compares the WM algorithm, COWM algorithm, FCAWM algorithm, and proposed algorithm in terms of the fuzzy rule base (completeness), robustness, MAE, and RMSE.
Analysis for predicting the sailing yacht application
Table 5 shows that the proposed algorithm outperforms the WM algorithm, COWM algorithm, and FCAWM algorithm in MAE, RMSE, and MAC, which demonstrates the effectiveness of the proposed method. The proposed algorithm performs well even when only two of the six input variables are used; that is, the results show that the introduction of the FSFDP algorithm and sample correlation improved the performance of the CWM algorithm in approximation ability, completeness, and robustness, which demonstrates the usability of the CWM algorithm in practical applications.
Conclusion
The construction of the fuzzy rule base is essential in fuzzy systems. In this paper, the CWM algorithm was proposed. The introduction of sample correlation improved the WM algorithm and enhanced the fuzzy rule base in terms of completeness, robustness, and accuracy. The FSFDP algorithm was used to optimize the original samples, that is, to remove noise samples and reduce the size of the sample set. Experiments demonstrated that the proposed model effectively enhances the accuracy of the prediction process and improves its robustness.
In future work, the density among samples, or a combination of distance and density, can be used depending on the dataset in the practical application. Furthermore, combining the proposed method with support vector machines (SVMs) will be considered, with the SVM used as the consequent, to make full use of the advantages of both. Moreover, an effective feature reduction method will be sought because the efficiency of the CWM algorithm is sensitive to the dimension.
Acknowledgment
This work was supported by the National Natural Science Foundation of China under Grants 61572204 and 51305142, and by the Subsidized Project for Cultivating Postgraduates’ Innovative Ability in Scientific Research of Huaqiao University under Grant 1400414002.
