Abstract
In this study, an intuitionistic fuzzy neural network (IFNN) with a Gaussian membership function and a Yager generating function is proposed. Since intuitionistic fuzzy logic (IFL) considers membership, non-membership, and hesitation values simultaneously, incorporating IFL into a fuzzy neural network (FNN) can enhance the FNN's performance. A back-propagation learning algorithm is developed to optimize the IFNN parameters and weights. The proposed IFNN is applied to ten problems, including nonlinear control and prediction problems. The computational results indicate that the proposed IFNN is more efficient than conventional algorithms such as artificial neural networks (ANNs), FNNs, and support vector regression (SVR).
Introduction
Fuzzy neural networks (FNNs), a popular research topic, have been successfully applied in many fields, including control, pattern recognition, classification, forecasting, and bioengineering [1–11]. An artificial neural network (ANN) is a system derived from neurophysiological models. It functions through connections between many processing elements, each analogous to a single neuron in a biological brain. An ANN thus consists of a collection of simple, nonlinear computing elements whose inputs and outputs are connected to form a network [12]. A disadvantage of ANNs, however, is that although the network produces a result, it gives no explanation of how that result was reached [13]. Fuzzy modeling [14], which is used to fuse decisions from different variables, requires an approach that learns from experience. ANN learning algorithms are therefore used to optimize and enhance the performance of fuzzy systems, and these learning methods generate and adjust fuzzy IF-THEN rules from numerical data [15].
FNNs combine the low-level learning and computational power of neural networks with the high-level, human-like thinking and reasoning ability of fuzzy systems, which artificial neural networks lack. The Takagi–Sugeno (TS) method [16] is a fuzzy control method developed in 1985. It is widely used to control nonlinear systems because the fuzzy control model can efficiently represent a nonlinear system as a set of linear subsystems. Lin and Lee [2] proposed the neural-network-based fuzzy logic control system (NNFLCS), which applies the low-level learning power of ANNs to fuzzy logic systems and combines the usual connectionist architecture with comprehensible high-level meaning. Jang [17] proposed a different FNN, the adaptive network-based fuzzy inference system (ANFIS), which has a five-layer architecture and employs a Sugeno fuzzy system. ANFIS uses a back-propagation algorithm to train the membership function parameters, and it determines the coefficients of the linear combinations in the rule consequents by a least mean squares algorithm [18]. Kuo and Cohen [19] employed the TS model for fuzzy inference to propose a feed-forward ANN.
The above FNNs, however, are appropriate only for numerical data. Expert knowledge is qualitative or cannot be quantified, so some studies have attempted to address this problem. Ishigami et al. [20] proposed learning methods for ANNs that use not only numerical data but also expert knowledge represented by fuzzy IF-THEN rules. Buckley and Hayashi [21] surveyed learning algorithms for FNNs and enhanced their training performance, and Buckley proposed several techniques for error back-propagation learning. Unlike artificial neural networks, FNNs have the advantage that their inference can be explained by fuzzy IF-THEN rules, so the relationship between the input and output variables can be stated clearly. However, the fuzzy logic in FNNs considers only the degree of membership, so a degree of uncertainty remains.
Intuitionistic fuzzy logic (IFL), which also accounts for non-membership and hesitation, can reduce this uncertainty. Sotirov and Atanassov [22] proposed feed-forward neural networks (FFNNs) with IFL, and IFL has recently also been applied to data mining [23]. Li et al. [24] proposed the max-min intuitionistic fuzzy Hopfield neural network (IFHNN), which converges to a stable point within a finite number of iterations under suitable additional conditions. Zhou et al. [25] proposed an IFNN model with a triangular membership function and a two-step dynamic optimal training algorithm.
In light of the above, the purposes of this study are summarized as follows. In the proposed IFNN, a Gaussian function serves as the membership function, and the Yager generating function is employed to obtain the non-membership and hesitation values from the membership value. A back-propagation algorithm is developed to optimize the connecting weights and parameters of the IFNN. Ten benchmark problems are used to evaluate the performance of the proposed IFNN, which is compared with an FNN, an ANN, and SVR. Finally, the Wilcoxon signed-rank test is employed to verify the statistical significance of the computational results.
The remainder of this paper is organized as follows. Section 2 introduces intuitionistic fuzzy sets. Section 3 describes the back-propagation learning algorithm used to train the proposed IFNN. Section 4 presents ten computational experiments on benchmark problems that demonstrate the performance of the proposed IFNN. Finally, Section 5 offers conclusions.
Intuitionistic fuzzy sets (IFSs)
The concept of intuitionistic fuzzy sets (IFSs) was introduced by Atanassov [28, 29] as an extension of fuzzy sets that includes an additional attribute parameter called non-membership [30]. Bustince and Burillo [31] then showed that vague sets (VSs) are a kind of IFS. IFSs are a useful means of describing and handling vague and uncertain data, and they have received wide attention in recent years. Many studies have applied IFSs to complex problems such as data mining [23], decision-making [32–39], clustering [40, 41], forecasting [42], pattern recognition [43–45], and medical problems [46, 47]. An IFS A in a fixed set E is an object of the form:
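For reference, the standard form of Atanassov's definition is sketched below. It writes υ_A for the non-membership function and π_A for the degree of hesitation. These symbols follow the common convention and are assumed here rather than taken from the paper's own equation.

```latex
A = \{\langle x, \mu_A(x), \upsilon_A(x)\rangle \mid x \in E\}, \qquad
\mu_A : E \to [0,1], \quad \upsilon_A : E \to [0,1],
0 \le \mu_A(x) + \upsilon_A(x) \le 1 \quad \text{for all } x \in E,
\pi_A(x) = 1 - \mu_A(x) - \upsilon_A(x).
```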
According to [29, 30], a complete description of an IFS should include the membership function, the non-membership function, and the degree of hesitation. A key idea of IFSs is to model the non-membership function explicitly, from which the degree of hesitation follows. To construct the IFS completely, this study employs the Yager generating function [31]. Its advantage is that each value of α ∈ (0, ∞) defines a particular fuzzy complement, which yields the non-membership and, in turn, the degree of hesitation. The intuitionistic fuzzy complement obtained with the Yager generating function is:
Using Atanassov's intuitionistic fuzzy complement with the Yager generating function, the IFS becomes:
Once these functions are defined, the degree of hesitation and the membership degree are calculated as linear combinations of μ_A(x) and υ_A(x).
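The sketch below computes the three quantities for a single membership value. It assumes the commonly used Yager-generated complement υ = (1 − μ^α)^(1/α) together with π = 1 − μ − υ. The function name `yager_ifs` is hypothetical.

```python
def yager_ifs(mu, alpha):
    """Return (membership, non-membership, hesitation) for a membership value mu.

    Assumed form: non-membership is the Yager complement (1 - mu**alpha)**(1/alpha),
    and hesitation is whatever remains, 1 - mu - nu.
    """
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    nu = (1.0 - mu ** alpha) ** (1.0 / alpha)  # Yager-generated non-membership
    pi = 1.0 - mu - nu                         # degree of hesitation
    return mu, nu, pi


print(yager_ifs(0.5, 1.0))  # standard complement: hesitation is zero
print(yager_ifs(0.5, 0.5))  # alpha < 1: positive hesitation
```

With α = 1 the complement reduces to the standard 1 − μ, so the hesitation is zero. For 0 < μ < 1 and α > 1, μ^α + (1 − μ)^α < 1, which makes the hesitation negative. Under this form, only 0 < α ≤ 1 therefore yields a valid IFS.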
With the membership function, non-membership function, and degree of hesitation defined, the intuitionistic fuzzy neural network (IFNN) is developed on the basis of IFSs. The model of the proposed IFNN and its learning algorithm are described in the following section.
This section describes the architecture of the IFNN proposed in this study. The learning algorithm and the parameter determination are also described in this section.
Intuitionistic fuzzy neural network
Fuzzy neural networks combine the advantages of fuzzy control and artificial neural networks, and they can also yield fuzzy IF-THEN rules after training. Like other fuzzy neural networks, the IFNN employs fuzzy IF-THEN rules. The kth rule is expressed as:
An integration function f is associated with the fan-in of each node and combines information, activation, or evidence from other nodes. This function provides the net input for the node:
The second action of each node is to output an activation value C(f) as a function of its net input:
The IFNN architecture is essentially the same as that of ANFIS. The computation in each layer is detailed as follows:
The link weight of layer 1 (
According to:
The link weight of layer 3 (
The link weight of layer 4 (
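The following is a generic sketch of an ANFIS-style forward pass. It rests on these assumptions: Gaussian memberships exp(−(x − m)²/σ²), a product t-norm for rule firing, normalized firing strengths, and constant consequent weights w_k summed at the output. It leaves out the intuitionistic non-membership and hesitation terms, so it is not the paper's exact IFNN. The name `anfis_forward` is hypothetical.

```python
import math


def anfis_forward(x, means, sigmas, weights):
    """Zero-order Sugeno (ANFIS-style) forward pass with one row per rule.

    x       : list of n inputs (layer 1 passes them through unchanged)
    means   : K x n Gaussian centres m_kj (assumed parameterisation)
    sigmas  : K x n Gaussian widths sigma_kj
    weights : K consequent weights w_k
    """
    firing = []
    for m_k, s_k in zip(means, sigmas):
        strength = 1.0
        for xj, m, s in zip(x, m_k, s_k):
            # Layer 2: Gaussian membership; layer 3: product t-norm across inputs.
            strength *= math.exp(-((xj - m) ** 2) / s ** 2)
        firing.append(strength)
    total = sum(firing)
    if total == 0.0:
        raise ValueError("all rule firing strengths underflowed to zero")
    # Layer 4: normalised firing strength times weight; layer 5: summation.
    return sum(f * w for f, w in zip(firing, weights)) / total


# Two rules centred at 0 and 1; an input at 0.5 fires both equally.
print(anfis_forward([0.5], [[0.0], [1.0]], [[1.0], [1.0]], [2.0, 4.0]))
```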
Many studies have employed the back-propagation (BP) algorithm for supervised learning [48–50]. BP computes the gradients of a neural network's loss function so that the loss can be minimized. To train the parameters of the proposed IFNN, a stochastic gradient-descent version of BP is therefore used, with the following error function as the objective to be minimized:
The process begins at the output nodes, and a backward pass computes ∂E/∂Y for all of the hidden nodes. Assuming that φ is an adjustable parameter in a node (e.g., the center of a membership function), the general learning rule is:
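For orientation, a common form of the error function and the learning rule is shown below. It assumes a single-output squared error for one training pattern, with desired output y^d and network output Y, and a momentum term with coefficient ρ, the momentum factor tuned in the parameter design. The paper's own numbered equations may differ in detail.

```latex
E = \tfrac{1}{2}\,\bigl(y^{d} - Y\bigr)^{2}, \qquad
\Delta\varphi(t) = -\eta\,\frac{\partial E}{\partial \varphi} + \rho\,\Delta\varphi(t-1), \qquad
\varphi(t+1) = \varphi(t) + \Delta\varphi(t)
```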
Therefore, the width parameter is updated using:
Therefore, the error is
The adaptive rule is:
The adaptive rule for σ_ij becomes:
Therefore, the adaptive rule for α_i becomes:
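The adaptive rules above need the local derivatives of the membership and non-membership functions with respect to m, σ, and α. The sketch below gives those derivatives under the assumed Gaussian form exp(−(x − m)²/σ²) and the Yager complement (1 − μ^α)^(1/α). It also gives one momentum update step. In the full IFNN these derivatives would be multiplied by ∂E/∂μ through the chain rule, which depends on layer equations not shown here. All function names are hypothetical.

```python
import math


def gaussian_mf(x, m, sigma):
    """Assumed Gaussian membership exp(-(x - m)^2 / sigma^2)."""
    return math.exp(-((x - m) ** 2) / sigma ** 2)


def dmf_dm(x, m, sigma):
    """Derivative of the Gaussian membership with respect to the mean m."""
    return gaussian_mf(x, m, sigma) * 2.0 * (x - m) / sigma ** 2


def dmf_dsigma(x, m, sigma):
    """Derivative of the Gaussian membership with respect to the width sigma."""
    return gaussian_mf(x, m, sigma) * 2.0 * (x - m) ** 2 / sigma ** 3


def yager_nonmembership(mu, alpha):
    """Yager-generated non-membership (1 - mu^alpha)^(1/alpha)."""
    return (1.0 - mu ** alpha) ** (1.0 / alpha)


def dnonmembership_dalpha(mu, alpha):
    """d/dalpha of (1 - mu^alpha)^(1/alpha), valid for 0 < mu < 1."""
    g = 1.0 - mu ** alpha
    nu = g ** (1.0 / alpha)
    return nu * (-math.log(g) / alpha ** 2 - mu ** alpha * math.log(mu) / (alpha * g))


def momentum_step(param, grad, velocity, eta, rho):
    """One update: velocity = -eta * grad + rho * previous velocity."""
    velocity = -eta * grad + rho * velocity
    return param + velocity, velocity
```

A central finite difference, for example (f(φ + h) − f(φ − h))/2h with h = 1e-6, is a quick way to check each analytic derivative before wiring it into training.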
To evaluate the performance of the proposed model, the mean square error (MSE) and the mean absolute difference (MAD) were used to measure forecasting accuracy. The smaller the MSE or the MAD, the more accurately the estimated values represent the actual values. The MAD is a natural and unambiguous measure of average error magnitude. The MSE and the MAD are given in Equations (24) and (25), respectively:
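A minimal sketch of the two accuracy measures follows. It assumes the usual definitions: MSE is the mean of squared errors, and MAD is the mean of absolute differences between actual and predicted values.

```python
def mse(actual, predicted):
    """Mean square error between paired actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mad(actual, predicted):
    """Mean absolute difference between paired actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Errors 0, 1, 2 give MSE = 5/3 and MAD = 1.
print(mse([1, 2, 3], [1, 3, 5]), mad([1, 2, 3], [1, 3, 5]))
```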
To test the proposed IFNN, the code was programmed in Matlab. Ten benchmark problems were used to verify the proposed model, and the IFNN was compared with other algorithms, namely an FNN, an SVR, and an ANN. K-fold cross-validation was employed to evaluate the models. The original sample was randomly partitioned into K subsamples. A single subsample was retained as the validation data, and the remaining K − 1 subsamples were used as training data. This process was repeated K times (the folds), with each of the K subsamples used exactly once as the validation data, and the results of the K folds were then averaged to produce a single estimate. The advantage of this approach over repeated random subsampling is that all observations are used for both training and validation, and each observation is used for validation exactly once. Since ten-fold cross-validation is commonly used, K was set to 10. The goal was to determine whether the IFNN is significantly better than the other algorithms.
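The fold assignment described above can be sketched as follows. A seeded shuffle and round-robin assignment of the shuffled indices are assumptions, and the function name is hypothetical.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train_idx, val_idx) pairs so each index is validated exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)           # random partition of the sample
    folds = [idx[i::k] for i in range(k)]      # K nearly equal subsamples
    for i in range(k):
        val = folds[i]                          # one subsample held out
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val                        # the other K - 1 for training


for train, val in k_fold_indices(25, k=10):
    pass  # fit on `train`, score on `val`, then average the K scores
```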
In the IFNN, the learning rate (η) significantly affects learning efficiency, and the IFNN has five parameters to tune. To reduce the number of simulations needed to evaluate parameter combinations, the Taguchi method [51] was employed. The Taguchi method uses orthogonal arrays, which estimate the main effects of the parameters while ignoring the interactions between them. Thus, even with a large number of parameters, it evaluates parameter values efficiently and identifies the parameters that have the greatest impact on performance.
Five factors with three levels each were used to design the IFNN parameters: the mean learning rate (η_m), the standard deviation learning rate (η_s), the Yager-parameter learning rate (α_a), the weight learning rate (η_w), and the momentum (ρ). An L27(3^5) orthogonal array, which yields 27 combinations, was used for the experiment. Each combination was tested ten times for 500 iterations to determine the optimal training parameters. The goal of the experiment was to find the combination with the lowest MSE, which served as the objective criterion. The MINITAB program was used to perform the Taguchi experiment.
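To illustrate what an L27(3^5) design provides, the sketch below builds one 27-run, 5-factor, 3-level array of strength 2, in which every pair of columns contains each of the nine level pairs exactly three times. It enumerates (a, b, c) over {0, 1, 2}³ and uses the columns a, b, c, a + b, a + c (mod 3). This is a hypothetical construction and need not match MINITAB's column assignment. Levels 0, 1, 2 would map to the three candidate values of η_m, η_s, α_a, η_w, and ρ, which the text does not list.

```python
from collections import Counter
from itertools import combinations, product


def l27_five_factors():
    """27 x 5 three-level orthogonal array built from linear forms over GF(3)."""
    return [[a, b, c, (a + b) % 3, (a + c) % 3]
            for a, b, c in product(range(3), repeat=3)]


def is_orthogonal(array, levels=3):
    """Check strength 2: every column pair shows each level pair equally often."""
    runs = len(array)
    for i, j in combinations(range(len(array[0])), 2):
        counts = Counter((row[i], row[j]) for row in array)
        if len(counts) != levels ** 2 or set(counts.values()) != {runs // levels ** 2}:
            return False
    return True


oa = l27_five_factors()
print(len(oa), is_orthogonal(oa))
```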
Since the MSE was set as the objective and a smaller MSE is better, the experiment used the smaller-the-better characteristic. The signal-to-noise (S/N) ratio was used to evaluate the quality and stability of the Taguchi experimental design. In the Taguchi experiment, the mean value and the S/N ratio were consistent: a lower mean MSE corresponds to a higher S/N ratio, and both indicate a better parameter combination.
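The smaller-the-better S/N ratio in Taguchi analysis is −10 log₁₀ of the mean squared response. The sketch below applies it to the repeated MSE values of one parameter combination. The function name is hypothetical.

```python
import math


def sn_smaller_the_better(values):
    """Taguchi smaller-the-better S/N ratio: -10 * log10(mean(y^2))."""
    if not values:
        raise ValueError("need at least one observation")
    return -10.0 * math.log10(sum(v * v for v in values) / len(values))


# Lower MSE values across the repeated runs give a higher S/N ratio.
print(sn_smaller_the_better([0.01, 0.02]), sn_smaller_the_better([0.1, 0.2]))
```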
This section evaluates the proposed IFNN with the Gaussian membership function. A Matlab program was written to simulate different cases and demonstrate the feasibility of the proposed IFNN. Each test problem has its own characteristics and number of inputs. The training results were used to demonstrate convergence on the test problems and, in turn, the utility of the proposed IFNN.
The simulation cases
In this dataset, the Ackley function is described by:
The dataset has two variables with domain −2 ⩽ x_i ⩽ 2, i = 1, 2. The global minimum is at (x1, x2) = (0, 0), where A(x1, x2) = 0. A total of 1000 patterns were generated in the domain (−2, 2).
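A sketch of the Ackley benchmark and its pattern generation follows. It assumes the standard constants a = 20, b = 0.2, c = 2π, since the paper's equation is not shown. It also assumes uniform random sampling with a fixed seed, because the sampling scheme is not stated. Function names are hypothetical.

```python
import math
import random


def ackley(x, a=20.0, b=0.2, c=2.0 * math.pi):
    """Ackley function in d dimensions; the global minimum is 0 at the origin."""
    d = len(x)
    s1 = sum(xi * xi for xi in x) / d
    s2 = sum(math.cos(c * xi) for xi in x) / d
    return -a * math.exp(-b * math.sqrt(s1)) - math.exp(s2) + a + math.e


def ackley_patterns(n=1000, low=-2.0, high=2.0, seed=0):
    """Generate n (input, target) pairs with inputs drawn uniformly from [low, high]^2."""
    rng = random.Random(seed)
    points = [[rng.uniform(low, high) for _ in range(2)] for _ in range(n)]
    return [(p, ackley(p)) for p in points]
```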
The Lim et al. [52] non-polynomial function is described by:
This function is an example of a non-polynomial model that exhibits a shape similar to that of a multivariate polynomial. Lim et al. [52] compared predictions from this function with predictions from a polynomial function. The function is evaluated on the square x_i ∈ [0, 1], i = 1, 2. A total of 1000 patterns were generated in the domain (0, 1), and the function is illustrated in Fig. 1.

The Lim et al. non-polynomial function.
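The sketch below assumes the form commonly cited for the Lim et al. non-polynomial function, f(x) = [(30 + 5x1 sin(5x1))(4 + e^(−5x2)) − 100]/6. The function name is hypothetical.

```python
import math


def lim_nonpolynomial(x1, x2):
    """Lim et al. non-polynomial function on [0, 1]^2 (commonly cited form, assumed)."""
    return ((30.0 + 5.0 * x1 * math.sin(5.0 * x1)) * (4.0 + math.exp(-5.0 * x2)) - 100.0) / 6.0
```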
In this dataset, the Hartmann function is described by:
The function has three variables on the domain 0 ⩽ x_j ⩽ 1, j = 1, 2, 3. It has four local minima, and the global minimum is x* = (0.114614, 0.555649, 0.852547), with H3,4(x*) = −3.86278. A total of 1000 patterns were generated in the domain (0, 1).
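The sketch below evaluates the 3-D Hartmann function. It uses the coefficient vectors and matrices usually quoted for this function, which is an assumption because the paper's equation is not shown. The test checks the stated minimum.

```python
import math

# Coefficients commonly quoted for the 3-D Hartmann function (assumed).
ALPHA = [1.0, 1.2, 3.0, 3.2]
A = [[3.0, 10.0, 30.0], [0.1, 10.0, 35.0], [3.0, 10.0, 30.0], [0.1, 10.0, 35.0]]
P = [[1e-4 * v for v in row] for row in
     [[3689, 1170, 2673], [4699, 4387, 7470], [1091, 8732, 5547], [381, 5743, 8828]]]


def hartmann3(x):
    """H_{3,4}(x) = -sum_i alpha_i * exp(-sum_j A_ij (x_j - P_ij)^2) on [0, 1]^3."""
    total = 0.0
    for alpha_i, a_row, p_row in zip(ALPHA, A, P):
        inner = sum(a * (xj - p) ** 2 for a, xj, p in zip(a_row, x, p_row))
        total += alpha_i * math.exp(-inner)
    return -total
```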
The Dette and Pepelyshev [53] exponential function is described by:
This function has asymptotes and is used to compare computer-experiment designs. In addition to several hyperball domains, the function has been evaluated on the cube x_i ∈ [0, 1], i = 1, 2, 3. A total of 1000 patterns were generated in the domain (0, 1).
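The sketch below assumes the commonly cited form of the Dette and Pepelyshev exponential function, f(x) = 100(e^(−2/x1^1.75) + e^(−2/x2^1.5) + e^(−2/x3^1.25)). Each term tends to 0 as its input approaches 0, so the code returns the limit there instead of dividing by zero. The function name is hypothetical.

```python
import math


def dette_pepelyshev_exp(x1, x2, x3):
    """Dette-Pepelyshev exponential function on [0, 1]^3 (assumed form)."""
    def term(x, power):
        # exp(-2 / x^power) -> 0 as x -> 0+, so return the limiting value at 0.
        return 0.0 if x <= 0.0 else math.exp(-2.0 / x ** power)
    return 100.0 * (term(x1, 1.75) + term(x2, 1.5) + term(x3, 1.25))
```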
The time-series prediction problem used is the chaotic Mackey–Glass time series, which is generated from the following differential equation:
Following most studies, the series was generated with a = 0.2 and b = 0.1. For τ ⩾ 17, the equation exhibits chaotic behavior. A total of 1000 patterns were generated, from t = 124 to 1123.
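The sketch below integrates the Mackey–Glass delay equation dx/dt = a·x(t − τ)/(1 + x(t − τ)^10) − b·x(t) and builds the 1000 patterns. Several settings are assumptions because the text does not state them: a simple Euler scheme with step 0.1, x(0) = 1.2, x(t) = 0 for t < 0, and the common ANFIS-style inputs x(t − 18), x(t − 12), x(t − 6), x(t) with target x(t + 6). The function names are hypothetical.

```python
def mackey_glass(n_steps, tau=17.0, a=0.2, b=0.1, x0=1.2, dt=0.1):
    """Euler-integrate the Mackey-Glass equation; return x at t = 0, 1, ..., n_steps - 1."""
    lag = int(round(tau / dt))        # delay expressed in integration steps
    per_unit = int(round(1.0 / dt))   # integration steps per unit of time
    hist = [x0]                       # hist[i] holds x(i * dt)
    for i in range((n_steps - 1) * per_unit):
        x = hist[-1]
        x_tau = hist[i - lag] if i >= lag else 0.0  # x(t) = 0 for t < 0 (assumed)
        hist.append(x + dt * (a * x_tau / (1.0 + x_tau ** 10) - b * x))
    return hist[::per_unit]


def mg_patterns(series, t_start=124, t_end=1123):
    """Inputs (x(t-18), x(t-12), x(t-6), x(t)) -> target x(t+6) (assumed set-up)."""
    return [([series[t - 18], series[t - 12], series[t - 6], series[t]], series[t + 6])
            for t in range(t_start, t_end + 1)]


patterns = mg_patterns(mackey_glass(1130))  # 1000 patterns for t = 124..1123
```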
The Gramacy and Lee [55] function is described by:
This function, used by Gramacy and Lee [55], is nonlinear in x2 and x3 and linear in x4. In x1, it oscillates more quickly as it approaches the right bound of the interval [0, 1]. A random term ε ∼ N(0, 0.05²) is added to the response. The function is evaluated on the hypercube x_i ∈ [0, 1], i = 1, …, 6. A total of 1000 patterns were generated in the domain (0, 1).
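The sketch below assumes the form commonly cited for the Gramacy and Lee function, f(x) = exp(sin((0.9(x1 + 0.48))^10)) + x2·x3 + x4 + ε. Under this form, x5 and x6 do not enter the response. Uniform sampling with a fixed seed is also assumed. Function names are hypothetical.

```python
import math
import random


def gramacy_lee(x, rng=None, noise_sd=0.05):
    """Six-input Gramacy-Lee function (assumed form); x[4] and x[5] are inactive."""
    x1, x2, x3, x4 = x[:4]
    y = math.exp(math.sin((0.9 * (x1 + 0.48)) ** 10)) + x2 * x3 + x4
    if rng is not None:
        y += rng.gauss(0.0, noise_sd)  # epsilon ~ N(0, 0.05^2)
    return y


def gramacy_lee_patterns(n=1000, seed=0):
    """Generate n noisy (input, target) pairs with inputs uniform on [0, 1]^6."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x = [rng.random() for _ in range(6)]
        out.append((x, gramacy_lee(x, rng)))
    return out
```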
The Friedman function is described by:
The function is evaluated on the hypercube x_i ∈ [0, 1], i = 1, …, 5. A total of 1000 patterns were generated in the domain (0, 1).
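The sketch below assumes the noise-free form of the Friedman function, f(x) = 10 sin(π x1 x2) + 20(x3 − 0.5)² + 10 x4 + 5 x5, since the paper's equation and any noise term are not shown. The function name is hypothetical.

```python
import math


def friedman(x):
    """Friedman function on [0, 1]^5 (noise-free form, assumed)."""
    x1, x2, x3, x4, x5 = x
    return 10.0 * math.sin(math.pi * x1 * x2) + 20.0 * (x3 - 0.5) ** 2 + 10.0 * x4 + 5.0 * x5
```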
This is a real-world problem of predicting automobile city-cycle fuel consumption. The dataset contains 392 samples and can be downloaded from KEEL [57]. There are five input attributes (displacement, horsepower, weight, acceleration, and model year), and the output attribute is miles per gallon.
This is a real-world problem of predicting airfoil self-noise. The dataset has five input attributes: frequency, in hertz; angle of attack, in degrees; chord length, in meters; free-stream velocity, in meters per second; and suction side displacement thickness, in meters.
The only output is the scaled sound pressure level, in decibels. The dataset contains 1503 instances and can be downloaded from the UCI machine learning repository [58].
Predicting the residuary resistance of sailing yachts at the initial design stage is of great value for evaluating a ship's performance and estimating its required propulsive power. Essential inputs include the basic hull dimensions and the boat velocity. The Delft dataset consists of 308 full-scale experiments performed at the Delft Ship Hydromechanics Laboratory for that purpose. It can be downloaded from the UCI machine learning repository. The input attributes are the longitudinal position of the center of buoyancy, the prismatic coefficient, the length-displacement ratio, the beam-draught ratio, the length-beam ratio, and the Froude number.
The output variable is the residuary resistance per unit weight of displacement. The sources of the benchmark datasets are summarized in Table 1. Datasets 1 to 7 are generated from the functions, and datasets 8 to 10 are real-world problems.
The benchmark datasets
This study used the Taguchi method [51] to determine the IFNN parameters, and the resulting learning rates for the ten cases are shown in Table 2. With these settings, the IFNN converged faster and gave more accurate results than the other three algorithms.
The parameters of IFNN in ten cases.
K-fold cross-validation (K = 10) was also used to account for the randomness of the data partitioning. Each experiment was repeated three times with 500 iterations per run, so each experiment involved thirty runs. Table 3 shows the training and testing MSE for each dataset, and the IFNN has the smallest MSE. Table 4 shows the MAD results for each algorithm, and the IFNN again performs better than the other methods.
Computational results (MSE) of training and testing data.
STD.: standard deviation.
Computational results (MAD) of training data.
STD.: standard deviation.
ANOVA was also employed to compare the efficiency of the IFNN with that of the other models, as shown in Tables 5 to 8. To verify the performance of the proposed IFNN, the non-parametric Wilcoxon signed-rank test [59] was employed, and the results are shown in Tables 9 and 10. The null hypothesis was rejected for p-values < 0.05, which indicates a statistically significant difference between the IFNN and the other algorithms.
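The sketch below computes an exact two-sided p-value for the Wilcoxon signed-rank test on paired results, such as the per-run errors of the IFNN and one competitor. It assumes no zero differences and no tied absolute differences. It enumerates the null distribution by counting subsets of ranks. Statistical packages also offer a normal approximation, and the text does not state which variant produced Tables 9 and 10. The function name is hypothetical.

```python
def wilcoxon_exact_p(x, y):
    """Exact two-sided Wilcoxon signed-rank p-value (no zeros or ties assumed)."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    if n == 0 or any(v == 0 for v in d):
        raise ValueError("zero differences are not handled in this sketch")
    if len({abs(v) for v in d}) != n:
        raise ValueError("tied absolute differences are not handled in this sketch")
    order = sorted(range(n), key=lambda i: abs(d[i]))  # rank 1 = smallest |d|
    w_plus = sum(rank + 1 for rank, i in enumerate(order) if d[i] > 0)
    max_sum = n * (n + 1) // 2
    w = min(w_plus, max_sum - w_plus)
    # counts[s] = number of sign patterns whose positive-rank sum equals s.
    counts = [1] + [0] * max_sum
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    return min(1.0, 2 * sum(counts[: w + 1]) / 2 ** n)
```

For example, if one method has the lower error in all ten paired runs, then W = 0 and p = 2/2^10 ≈ 0.002.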
The ANOVA of training data (MSE)
The ANOVA of testing data (MSE)
The ANOVA of training data (MAD)
The ANOVA of testing data (MAD)
The p-values of Wilcoxon signed-rank test (MSE)
The p-values of Wilcoxon signed-rank test (MAD)
The training MSE results indicate that the IFNN obtains better results than the other algorithms, and the Wilcoxon signed-rank test confirms its superior performance. This can be attributed to several factors. First, the back-propagation learning algorithm trains the network efficiently. Second, because the IFNN incorporates IFL, the degree of hesitation accounts for the membership and non-membership degrees simultaneously, which defines the degree of uncertainty more precisely. Reducing this uncertainty enhances performance.
This study proposed an IFNN with a Gaussian membership function, since previous studies have indicated that a Gaussian membership function yields better performance in FNN models [2, 61]. The IFNN also incorporates IFL, so the degree of hesitation considers the membership and non-membership degrees simultaneously. The Yager generating function was employed to obtain the membership degree together with the degree of hesitation. This study also developed a back-propagation learning algorithm to train the parameters and weights of the IFNN. On ten benchmark problems, the computational results indicate that the proposed IFNN outperforms the three compared methods.
A possible direction for future work is to optimize the IFNN with metaheuristics such as genetic algorithms and particle swarm optimization. In addition, multi-fuzzy sets generalize both fuzzy sets and intuitionistic fuzzy sets [62], so a multi-fuzzy neural network could be considered in future work. Applying the proposed method to real-world forecasting problems will also be pursued.
