Abstract
In this study, we develop a novel clustering method with double fuzzy factors to enhance the performance of the granulation-degranulation mechanism, and on this basis we design a fuzzy rule-based model and demonstrate that it is an enhanced one. The essence of the developed scheme is to optimize the construction of the information granules so as to improve the performance of fuzzy rule-based models. In the design process, a prototype matrix is defined to express the Fuzzy C-Means based granulation-degranulation mechanism in a clear manner. We assume that the dataset degranulated from the formed information granules is equal to the original numerical dataset. From this assumption, a clustering method with double fuzzy factors is derived, and a detailed mathematical proof of the proposed approach is presented. Subsequently, on the basis of the enhanced version of the granulation-degranulation mechanism, we design a granular fuzzy model. The design focuses on an efficient application of fuzzy clustering to build the information granules used in fuzzy rule-based models. Comprehensive experimental studies demonstrate the performance of the proposed scheme.
Introduction
Fuzzy clustering is one of the most commonly used approaches for revealing and visualizing the structure of data by forming information granules [1]. Fuzzy C-Means (FCM) is the most representative and widely used fuzzy clustering method [2–6]. When information granules are constructed with the FCM clustering approach, the Granulation-Degranulation Mechanism (GDM) is regarded as an effective tool to evaluate and optimize their quality [7]. The FCM-based GDM is also exploited in Takagi-Sugeno (T-S) fuzzy models [8–13]. Fuzzy models can be considered as mappings from information granules expressed in the input and output spaces [8]. Thus, one can improve fuzzy rule-based models by optimizing the quality of the FCM-based GDM (that is, of the information granules).
The main objective of this paper is to clarify the essence of the GDM and to study rule-based models in the setting of fuzzy sets. From a design point of view, the study revolves around optimizing the GDM with an augmented clustering method so as to improve the quality of fuzzy rule-based models.
In the design process, we define a prototype matrix to express the FCM-based granulation-degranulation mechanism in a clear manner. We assume that a dataset degranulated from the formed information granules is equal to the original numerical dataset. Then, a clustering method with double fuzzy factors is derived and also demonstrated to be an augmented version of fuzzy clustering, which is very different from other enhanced versions [14–16].
In a nutshell, the objective of this study is to design a double fuzzy factor based clustering approach through optimizing the mechanism of granulation-degranulation. By considering the mappings mentioned above, a granular fuzzy model is designed with the double fuzzy GDM.
The paper is organized as follows. The FCM-based GDM and an enhanced version are introduced in Section II. An augmented granular fuzzy model based on the double fuzzy factor clustering is discussed in Section III. A series of experiments is reported in Section IV. Section V concludes the study.
FCM-Based GDM and an enhanced version
In this section, we briefly recall the FCM-based GDM and develop an enhanced version.
A. FCM-based GDM
With the use of the FCM method, a dataset
The degranulation involves numerical data reconstruction on the basis of the constructed information granules and is a reverse process of the granulation, which can be expressed as:

Figure 1. FCM-based granulation-degranulation mechanism.
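To make the two steps concrete, the following is a minimal sketch of granulation and degranulation written with the standard textbook FCM formulas (membership update and membership-weighted reconstruction). It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy data, and the choice m = 2 are hypothetical, and the prototypes are fixed by hand rather than learned.

```python
import math


def fcm_memberships(data, prototypes, m):
    """Granulation: standard FCM membership of every point in every cluster."""
    power = 2.0 / (m - 1.0)
    partition = []
    for x in data:
        dists = [math.dist(x, v) for v in prototypes]
        if min(dists) == 0.0:
            # The point sits on a prototype: share full membership among hits.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            partition.append([h / sum(hits) for h in hits])
        else:
            partition.append(
                [1.0 / sum((di / dj) ** power for dj in dists) for di in dists]
            )
    return partition  # partition[k][i]: membership of point k in cluster i


def degranulate(partition, prototypes, m):
    """Degranulation: rebuild each point as a membership-weighted prototype mean."""
    dim = len(prototypes[0])
    rebuilt = []
    for row in partition:
        weights = [u ** m for u in row]
        total = sum(weights)
        rebuilt.append(
            [sum(w * v[j] for w, v in zip(weights, prototypes)) / total
             for j in range(dim)]
        )
    return rebuilt


def reconstruction_error(data, rebuilt):
    """Sum of squared distances between original and reconstructed points."""
    return sum(math.dist(x, y) ** 2 for x, y in zip(data, rebuilt))


# Hypothetical toy data: two points lie on the prototypes, one lies between them.
data = [[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]]
prototypes = [[0.0, 0.0], [1.0, 0.0]]
partition = fcm_memberships(data, prototypes, m=2.0)
rebuilt = degranulate(partition, prototypes, m=2.0)
print(reconstruction_error(data, rebuilt))
```

In this toy case the two points that coincide with prototypes are reconstructed exactly, while the third point is pulled onto the segment between the prototypes; the reconstruction error measures exactly this loss, which is the quantity the GDM is used to evaluate.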
B. An enhanced version of GDM
For the purpose of analysis, we establish the following formulas to express the FCM-based mechanism of granulation-degranulation.
From the above analysis, we require the reconstructed dataset to be equal to the original dataset, that is
According to (3), (4) and (7), we have
Let Lambda = ΓΘ [
After extensive investigation, we find that the above problem can be solved by setting a fuzzy factor increment for the partition matrix when updating the prototype matrix in the FCM method. In what follows, we present a proof for the proposed scheme.
Let
With the increase of m, the elements of the jth row vector in Γ can be described as follows:
Similarly, with the increase of m, the elements of the ith row vector in
Thus, the elements in Lambda present such characteristics:
In addition, with double fuzziness factors the fuzzy clustering can converge rapidly, since at each iteration the proposed method achieves better reconstruction performance than the FCM method. In FCM, the complexity of computing the prototypes is O(CNn) and the complexity of computing the partition matrix is O(CNn²) at each iteration. The proposed double fuzzy factor clustering (DFCM) differs from FCM only in the fuzziness factor used when updating the prototype matrix, so its per-iteration cost is the same. Thus, the overall complexity of the proposed DFCM is less than or equal to that of the FCM method.
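The iteration described above can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes that the fuzzy factor increment Δm enters as a larger exponent (m + Δm) on the memberships in the prototype update, while the partition matrix is still updated with the base factor m. The function names, the toy data, the initial prototypes, and the default values of `m`, `delta_m`, and `iterations` are all hypothetical.

```python
import math


def update_partition(data, prototypes, m):
    """Membership update with the base fuzzification factor m (as in FCM)."""
    power = 2.0 / (m - 1.0)
    partition = []
    for x in data:
        dists = [math.dist(x, v) for v in prototypes]
        if min(dists) == 0.0:
            # The point sits on a prototype: share full membership among hits.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            partition.append([h / sum(hits) for h in hits])
        else:
            partition.append(
                [1.0 / sum((di / dj) ** power for dj in dists) for di in dists]
            )
    return partition


def update_prototypes(data, partition, exponent):
    """Weighted means of the data, with memberships raised to `exponent`."""
    dim = len(data[0])
    prototypes = []
    for i in range(len(partition[0])):
        weights = [row[i] ** exponent for row in partition]
        total = sum(weights)
        prototypes.append(
            [sum(w * x[j] for w, x in zip(weights, data)) / total
             for j in range(dim)]
        )
    return prototypes


def dfcm(data, init_prototypes, m=2.0, delta_m=1.0, iterations=50):
    """Assumed DFCM loop: memberships use m, prototypes use m + delta_m."""
    prototypes = [list(v) for v in init_prototypes]
    for _ in range(iterations):
        partition = update_partition(data, prototypes, m)
        prototypes = update_prototypes(data, partition, m + delta_m)
    # Recompute the partition so it matches the final prototypes.
    return update_partition(data, prototypes, m), prototypes


# Hypothetical one-dimensional data with two well-separated groups.
data = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
partition, prototypes = dfcm(data, init_prototypes=[[0.0], [5.0]])
print(prototypes)
```

Setting `delta_m=0.0` reduces the loop to ordinary FCM, which makes it easy to compare the two variants on the same data. Each iteration performs the same two updates as FCM, so the per-iteration cost is unchanged.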
As clustering-based fuzzy models are considered to be mappings from information granules expressed in the input and output spaces [8, 13], the fuzzy rule-based models can be enhanced through an augmented version of the GDM. In this section, we discuss a granular-based model based on the proposed DFCM-based GDM.
Given a collection of input-output data (
The structure of the rules forming the T-S fuzzy model is described as:
Through consideration of the partition matrix produced by the proposed double fuzzy factor based clustering method, the output y_i of the fuzzy model for the input variable
Obviously, the vector
Commonly, the root mean square error (RMSE), described in detail in [22–23], is employed to evaluate the quality of granular models.
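For reference, the RMSE in its standard form is the square root of the mean squared difference between targets and model outputs. The short sketch below assumes that this standard definition is the one intended in [22–23]; the function name and the sample values are hypothetical.

```python
import math


def rmse(targets, predictions):
    """Root mean square error between two equally long sequences."""
    if len(targets) != len(predictions) or len(targets) == 0:
        raise ValueError("targets and predictions must be non-empty and equally long")
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical targets and model outputs.
error = rmse([0.0, 0.0], [3.0, 4.0])
print(error)
```

A lower value indicates that the model outputs lie closer to the targets; a value of zero means exact agreement.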
In this section, we compare the proposed DFCM-based granular model with the FCM-based granular rule-based model designed by Cui and Pedrycz [22] and with another granular fuzzy model proposed by Hu et al. [12]. For the fuzzy clustering methods, the values of the fuzzification factor are taken from 1.1 to 3.1. For the DFCM method, the fuzzy increment (Δm) is taken from 0 to 4. For each set of parameters, the construction of each fuzzy model is repeated 10 times and the mean RMSE is recorded.
As the most commonly used index for granular fuzzy models, the RMSE is used in the experiments. A synthetic dataset and six well-known publicly available datasets (Yacht hydrodynamics, Istanbul stock exchange, Concrete compressive strength, Airfoil self-noise, Energy efficiency, and Turkiye student evaluation) from the UCI repository [24] are used. The Yacht hydrodynamics dataset contains six features and 308 instances: (1) longitudinal position of the center of buoyancy, (2) prismatic coefficient, (3) length-displacement ratio, (4) beam-draught ratio, (5) length-beam ratio, and (6) Froude number. The Istanbul stock exchange dataset has eight attributes and 536 instances; more specifically, it includes returns of the Istanbul Stock Exchange together with seven other international indexes (SP, DAX, FTSE, NIKKEI, BOVESPA, MSCE_EU, MSCI_EM) from Jun 5, 2009 to Feb 22, 2011. The Concrete compressive strength dataset has nine features and 1,030 instances: (1) cement, (2) blast furnace slag, (3) fly ash, (4) water, (5) superplasticizer, (6) coarse aggregate, (7) fine aggregate, (8) age, and (9) concrete compressive strength. The Airfoil self-noise dataset has six features and 1,503 instances: (1) frequency (in hertz), (2) angle of attack (in degrees), (3) chord length (in meters), (4) free-stream velocity (in meters per second), (5) suction side displacement thickness (in meters), and (6) scaled sound pressure level (in decibels). The Energy efficiency dataset contains 768 instances with eight features (relative compactness, surface area, wall area, roof area, overall height, orientation, glazing area, and glazing area distribution) and two responses (heating load and cooling load). The Turkiye student evaluation dataset contains a total of 5,820 evaluation scores, each comprising 28 course-specific questions and 5 additional attributes.
The synthetic dataset is generated from the nonlinear function $y = -5e^{-0.5x}\sin(1.25x + 2.8)$, to which Gaussian white noise with a signal-to-noise ratio of 0 dB is added [25]. Figure 2 illustrates the generated input–output data.

Figure 2. Synthetic input–output data.
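A dataset of this kind could be generated along the following lines. The sketch is illustrative only: the input range [0, 10], the number of points, the even spacing of the inputs, and the seed are assumptions that the text does not specify. A signal-to-noise ratio of 0 dB is interpreted as noise power equal to the average power of the clean signal.

```python
import math
import random


def make_synthetic(n_points=200, x_min=0.0, x_max=10.0, snr_db=0.0, seed=0):
    """Sample y = -5 exp(-0.5 x) sin(1.25 x + 2.8) and add Gaussian white noise."""
    rng = random.Random(seed)
    # Evenly spaced inputs on [x_min, x_max] (an assumption, see above).
    xs = [x_min + (x_max - x_min) * k / (n_points - 1) for k in range(n_points)]
    clean = [-5.0 * math.exp(-0.5 * x) * math.sin(1.25 * x + 2.8) for x in xs]
    # Noise variance follows from SNR = 10 log10(signal power / noise power).
    signal_power = sum(y * y for y in clean) / n_points
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(noise_power)
    noisy = [y + rng.gauss(0.0, sigma) for y in clean]
    return xs, clean, noisy


xs, clean, noisy = make_synthetic()
print(len(xs), min(noisy), max(noisy))
```

At 0 dB the noise is as strong as the signal itself, so the noisy outputs scatter widely around the underlying curve.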
In the experiments, each dataset is randomly divided into a training subset (60% of the data) and a testing subset (40% of the data). The number of prototypes varies from 2 to 10 for the synthetic dataset and from 2 to 6 for the publicly available datasets. The training and testing subsets are each normalized to [0, 1]. The t-test with α = 0.05 (95% confidence) [22] is used to assess whether the differences between the testing errors generated by the two fuzzy models are statistically significant. Figures 3 to 5 visualize the quality of the two granular models in terms of the produced prediction bounds. For the publicly available datasets, the RMSE results are summarized in Tables 1 to 6.
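The data preparation described above can be sketched as below. This is a minimal illustration under assumptions: the function names, the seed, and the toy rows are hypothetical, and each subset is scaled with its own column-wise minimum and maximum, which is one reading of normalizing the two subsets separately.

```python
import random


def split_train_test(rows, train_fraction=0.6, seed=0):
    """Randomly split rows into training and testing subsets."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def min_max_normalize(rows):
    """Scale every column of `rows` to [0, 1] using that subset's own range."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        # A constant column has no range; map it to 0.0 to avoid division by zero.
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical dataset: ten rows with two columns each.
rows = [[float(i), 2.0 * i] for i in range(10)]
train, test = split_train_test(rows)
train_scaled = min_max_normalize(train)
test_scaled = min_max_normalize(test)
print(len(train_scaled), len(test_scaled))
```

Repeating the split with different seeds gives the repeated runs over which the mean RMSE is computed.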

Figure 3. Comparing performance of the model.

Figure 4. Prediction bounds produced by rule-based models with FCM.

Figure 5. Prediction bounds produced by rule-based models with DFCM.
Table 1. Experimental results obtained for the Yacht hydrodynamics dataset
Table 2. Experimental results obtained for the Istanbul stock exchange dataset
Table 3. Experimental results obtained for the Concrete compressive strength dataset
Table 4. Experimental results obtained for the Airfoil self-noise dataset
Table 5. Experimental results obtained for the Energy efficiency dataset
Table 6. Experimental results obtained for the Turkiye student evaluation dataset
It is noticeable that the RMSEs (for both the training and testing subsets) of all the datasets are reduced by using the proposed DFCM, as the membership matrix of the testing subset is obtained on the basis of the prototypes of the training subset. The granular fuzzy model of [12] also provides improvements on most datasets. For some datasets it works better than the proposed model, but in most cases the proposed model provides a lower RMSE.
In the DFCM, the prototype matrix is updated with a larger fuzziness factor, which is proved to enhance the quality of the fuzzy clustering (through optimizing the membership matrix). Thus, with the trained model and the optimized membership matrix, the testing error of the granular fuzzy model is significantly reduced.
In this paper, we propose a clustering approach with double fuzzy factors to optimize the GDM. With the augmented version of the GDM, we also design a granular fuzzy rule-based model. In the design process, we establish a novel set of mathematical models to express the GDM. A clustering method with double fuzzy factors is derived by optimizing these models, and the granular fuzzy model built on it is discussed and shown to be an augmented one.
This research opens a specific way of improving the GDM and its applications, and it also poses an interesting problem: how can a closed-form solution for the fuzzy factor be obtained? Future work will therefore consider building a cost function of the fuzzy factor and deriving its closed-form solution, which may open up a new research direction. Future work will also apply the double fuzzy factors to multi-agent systems and to social networks with different data types as input.
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grant No. 61971349.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
