Abstract
This paper presents a learning algorithm for fuzzy neural networks based on unineurons, whose models can be interpreted through fuzzy rules. The learning algorithm builds on ideas from the Extreme Learning Machine, to achieve low time complexity, and on a pruning method based on F-scores, which yields accurate and compact models in a single step using only the training data. Experiments on binary pattern classification are detailed. The results and their statistical evaluation suggest that the proposed approach is a promising alternative for pattern recognition, combining good accuracy with some level of interpretability obtained through a simple pruning process.
Introduction
Intelligent models that combine the predictive capacity of artificial neural network training algorithms with the linguistic interpretability of fuzzy systems [1] gave rise to fuzzy neural networks. The main contribution of these models is the interpretability of their results, obtained through a set of fuzzy rules extracted from the network structure during training [2]. Over the last few decades, fuzzy systems [1] and their hybrid derivations have been shown to simulate typical human reasoning in a computationally efficient way. An important area of current research is the development of such systems with a high level of flexibility and autonomy, so that they can evolve their structure and knowledge as the environment changes and can handle modeling, control, prediction and pattern classification in non-stationary environments.
Fuzzy neural networks differ from artificial neural networks in the type of neuron they use: fuzzy neurons in the former and artificial neurons in the latter. Examples of fuzzy neurons are the and and or neurons [10]. Fuzzy neural network training algorithms are organized in well-defined steps across the layers. In most of these models the first layer is responsible for the fuzzification of the input data. The following layers can perform many machine learning tasks using neural network techniques and models. Finally, an artificial neuron presents the results obtained by the fuzzy neural network. These networks have proved highly promising in various application areas, such as fuzzy clustering, modeling of nonlinear dynamic systems, and fault detection and diagnosis, among others. They are therefore strong candidates to serve as the basis for the development of intelligent systems. The most commonly used methods for structure definition are clustering [2, 9], evolutionary optimization [11–13] and equally spaced membership functions [4]. Once the network structure is defined, the free parameters are estimated. A number of distinct methods have been used in this step, including reinforcement learning [2, 6], gradient-based methods [12, 14], genetic algorithms [8], least squares [5, 9] and the Extreme Learning Machine [4, 25].
Regarding network structure optimization, the two most commonly used methods may present significant deficiencies, especially regarding the number of neurons created to solve pattern classification problems. Approaches that create a large number of fuzzy neurons may impair the final response of the network, because these neurons carry unnecessary information.
This paper proposes a novel learning algorithm for fuzzy neural networks that is able to generate compact and accurate models. Learning is performed using ideas from the Extreme Learning Machine [16], as in [4], to speed up parameter tuning, and from regularization theory [17]. The model developed in [4] uses regularization by resampling, which improves the model response but has four parameters that must be defined to fit the model. In this paper, an approach based on F-scores [18] is used instead to define the network topology by selecting a subset of the candidate neurons. Finally, the remaining network parameters are estimated through least squares. The resulting network is a sparse model and can be expressed as a compact set of incomplete fuzzy rules, that is, rules whose antecedents are defined using only a fraction of the available input variables [15]. This reduces the complexity of the model by eliminating two parameters: only the percentage of selected neurons and the number of membership functions for each feature of the dataset must be chosen.
The paper is organized as follows. The next section reviews the necessary basic concepts of fuzzy logic neurons and fuzzy neural networks. Section 3 details the proposed learning algorithm, which prunes the less significant neurons of the model, together with the concept of F-scores. Section 4 presents the experimental results for pattern classification problems and a comparison with alternative classifiers based on statistical tests. Finally, the conclusions and further developments are summarized in Section 5.
Fuzzy neural networks
Fuzzy logic neurons
Logical neurons are functional units that combine logical processing with learning capacity. They can be seen as multivariate nonlinear transformations between unit hypercubes, [0, 1]^N → [0, 1] [1], where N is the number of inputs.
Thus, the and and or neurons aggregate the input signals (membership values) by first combining each of them with its weight and then aggregating the results globally.
The or neuron is interpreted as a logical expression that applies a t-norm to the inputs and weights and then performs a global aggregation of the results using an s-norm. The and neuron acts in the opposite direction, applying an s-norm to the inputs and weights and a t-norm to the results obtained. The activation functions φ_and and φ_or can, in general, be nonlinear mappings. In this paper φ_and(ξ) = φ_or(ξ) = ξ, i.e., they are defined as the identity function [4].
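As a minimal illustration of the two neurons described above, the following Python sketch assumes the product t-norm and the probabilistic sum s-norm with identity activation. The function names are illustrative and are not taken from [4].

```python
from functools import reduce


def t_norm(x, y):
    """Product t-norm (assumed choice)."""
    return x * y


def s_norm(x, y):
    """Probabilistic sum s-norm (assumed choice)."""
    return x + y - x * y


def and_neuron(a, w):
    """s-norm on each (input, weight) pair, then t-norm over the results."""
    return reduce(t_norm, (s_norm(ai, wi) for ai, wi in zip(a, w)), 1.0)


def or_neuron(a, w):
    """t-norm on each (input, weight) pair, then s-norm over the results."""
    return reduce(s_norm, (t_norm(ai, wi) for ai, wi in zip(a, w)), 0.0)
```

With a weight of 1 the and neuron ignores the corresponding input, and with a weight of 0 the or neuron does the same, which is what allows the weights to express the relevance of each input.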
In order to improve the capacity of the fuzzy neurons, the characteristics of the and and or neurons are combined in a single logical operator, known as a uninorm [27].
A uninorm is a binary operator U : [0, 1]^2 → [0, 1] which is commutative, monotonic and associative, and for which there exists an element g ∈ [0, 1], called the identity (neutral) element, that satisfies U(g, x) = x for all x ∈ [0, 1]. The unineuron can be seen as an extension of the and and or logical neurons [1] in which a uninorm, a generalization of t-norms and s-norms [8], is used in the weighting and aggregation operations [1]. The uninorm used in this paper is expressed as follows [27]:
The unineuron adopted in this work follows the same concept proposed in [9] that performs the following operations to calculate its output:
each pair (a_i, w_i) is transformed into a single value b_i;
the unified aggregation of the transformed values is calculated.
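The two operations can be sketched in Python as follows. The sketch rests on assumptions that are not stated in this section: a uninorm with neutral element g built from the product t-norm and the probabilistic sum, with min used when the two arguments lie on opposite sides of g, and the pair transformation b_i = w_i a_i + (1 − w_i) g. The expressions actually used in [9] and [27] may differ.

```python
from functools import reduce


def uninorm(x, y, g=0.5):
    """Assumed uninorm with neutral element g, 0 < g < 1."""
    if x <= g and y <= g:
        # Rescaled product t-norm on [0, g] x [0, g].
        return g * (x / g) * (y / g)
    if x >= g and y >= g:
        # Rescaled probabilistic sum on [g, 1] x [g, 1].
        u, v = (x - g) / (1 - g), (y - g) / (1 - g)
        return g + (1 - g) * (u + v - u * v)
    # Arguments on opposite sides of g: assumed conjunctive choice.
    return min(x, y)


def unineuron(a, w, g=0.5):
    """Transform each (a_i, w_i) pair into b_i, then aggregate with the uninorm."""
    # Assumed transformation: a weight of 0 maps the input to the neutral element.
    b = [wi * ai + (1 - wi) * g for ai, wi in zip(a, w)]
    return reduce(lambda x, y: uninorm(x, y, g), b)
```

Under these assumptions an input with weight 0 is mapped to g and therefore has no effect on the aggregation, while an input with weight 1 enters the aggregation unchanged.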
Fuzzy neural networks are neural networks composed of fuzzy neurons [2]. Their main technical feature is the synergy between fuzzy systems theory and neural network theory, which yields models that integrate the treatment of uncertainty and the interpretability provided by fuzzy systems with the learning ability provided by neural networks [2]. This type of model has many applications in industry [7] and in computer systems, where it allows patterns to be discovered, series to be forecast more easily and accurate responses to be delivered to users.
The fuzzy logic neurons described in the previous section can be used to construct fuzzy neural networks and solve pattern recognition problems. Figure 1 illustrates the feedforward topology of the fuzzy neural networks presented in [4], which is also considered in this paper.

Feedforward fuzzy neural network.
The neurons of the first layer are Gaussian membership functions created from the model input data, unlike the model proposed in [4], which uses triangular functions. They are the activation functions of the corresponding neurons. For each input variable x_ij, K fuzzy sets A_k, k = 1, …, K, are defined. Thus, the outputs of the first layer are the membership degrees associated with the input values, i.e., a_jk = μ_{A_k}(x_ij) for j = 1, …, N and k = 1, …, K, where N is the number of inputs and K is the number of fuzzy sets for each input. The second layer is composed of L unineurons. Each unineuron performs a weighted aggregation of some of the outputs of the first layer, using the weights w_il (for i = 1, …, N and l = 1, …, L) of the logical neuron. For each input variable j, only one first layer output a_jk is defined as input of the l-th neuron. Furthermore, in order to generate sparse topologies, each second layer neuron is associated with only n_l < N input variables, that is, the weight matrix w is sparse [4].
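A possible implementation of the first layer is sketched below in Python. It assumes K ≥ 2 equally spaced centers over the range of one input variable and a shared width sigma supplied by the user; the function names and the handling of sigma are assumptions made for illustration.

```python
import math


def gaussian_centers(lo, hi, k):
    """K equally spaced centers on [lo, hi]; requires k >= 2."""
    step = (hi - lo) / (k - 1)
    return [lo + i * step for i in range(k)]


def fuzzify(x, centers, sigma):
    """Membership degree of the scalar x in each Gaussian fuzzy set."""
    return [math.exp(-0.5 * ((x - c) / sigma) ** 2) for c in centers]


# Example: K = 3 fuzzy sets on [0, 1], evaluated at x = 0.5.
centers = gaussian_centers(0.0, 1.0, 3)
memberships = fuzzify(0.5, centers, sigma=0.25)
```

Each value in memberships plays the role of one first layer output a_jk for the input variable being fuzzified.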
Finally, in the same way as in [4], the output layer uses a classic linear perceptron neuron to compute the network output:
Figure 1 illustrates an example of a fuzzy network composed of unineurons. This network has 2 input variables, 3 fuzzy sets for each variable and 3 unineurons, i.e., N = 2, K = 3 and L = 3. The following if-then rules can be extracted from the network structure [4]:
Fuzzy neural network architecture and training
The architecture of a fuzzy neural network can be defined in a number of ways, as reported in the literature, but the behavior of each approach can vary greatly with the size of the test sample and with the complexity of the dataset. The greater the number of dimensions of a problem, the harder it is for a neural network algorithm to adjust the internal parameters of the network.
In the model proposed in [4], the parameters are adjusted using the concepts of the Extreme Learning Machine [19]. The algorithm assigns random values to the first layer weights and analytically estimates the output layer weights. Compared with traditional methods for learning single hidden layer feedforward networks (SLFNs), this algorithm has good generalization performance and extremely low time complexity, since only the output layer parameters are tuned [19]. These benefits are most apparent when unnecessary information is discarded from the training data, so regularization methods [21] are needed to bring the responses of the model closer to the real values.
The input domain is partitioned using a number of equally spaced membership functions, chosen at the time of network training [4]. This type of approach was first used in [9] and extensively improved in [4]. It assigns random values to the neuron parameters and estimates the parameters of the output layer neuron using least squares. In [9] the learning algorithm has a low computational cost but still generates networks that are not easily interpretable, since the fuzzy sets are defined using clustering. In [4], regularization by resampling improves the decision space of the model and hence the accuracy of pattern classification; however, for large samples or high-dimensional data the resampling can be slow or computationally expensive.
This paper proposes an improvement to the learning algorithm described in [4] that maintains the interpretability and transparency of the resulting networks while allowing the regularization of the fuzzy neural network to happen in a single step that is simple to configure and understand. A variable selection algorithm based on F-scores is used to define the network topology. This algorithm generates a sparse topology which can be interpreted as a compact set of incomplete fuzzy rules, as regularization by resampling does, but in a simpler way. The proposed learning algorithm initially defines the first layer neurons by dividing the domain interval of each input variable into K fuzzy sets, where K is usually a small number. All input variables are partitioned using the same number of fuzzy sets [4].
Next, L_c candidate neurons are randomly generated, where L_c < L. For each candidate neuron l = 1, …, L_c, a random fraction of the input variables is first selected [4].
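The random generation of candidate neurons might look like the sketch below. The dictionary layout, the uniform sampling of the number of selected variables and of the weights in [0, 1], and the function name are assumptions made for illustration; see [4] for the settings actually used.

```python
import random


def make_candidates(n_inputs, k_sets, n_candidates, seed=0):
    """Randomly generate candidate neurons over a subset of the input variables."""
    rng = random.Random(seed)
    neurons = []
    for _ in range(n_candidates):
        # Number of input variables used by this neuron, kept below
        # n_inputs (when possible) so that the topology stays sparse.
        n_l = rng.randint(1, max(1, n_inputs - 1))
        variables = sorted(rng.sample(range(n_inputs), n_l))
        neurons.append({
            "inputs": variables,
            # One fuzzy set (first layer output) per selected variable.
            "sets": [rng.randrange(k_sets) for _ in variables],
            # Random weights, left untuned as in ELM-style training.
            "weights": [rng.random() for _ in variables],
        })
    return neurons


candidates = make_candidates(n_inputs=5, k_sets=3, n_candidates=10, seed=1)
```

Fixing the seed makes a run reproducible, which is convenient when results are averaged over repeated runs with different seeds.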
The value n_l represents the number of input variables associated with the l-th neuron. See [4] for more about the settings of the n_l variable. The learning algorithm assumes that the output of a network composed of all L_c candidate neurons can be written as:
Finally, once the network structure is defined, the learning algorithm only has to estimate the output layer vector v = [v_0, v_1, v_2, …, v_L]^T that best fits the desired output. In this paper these parameters are computed using the Moore-Penrose pseudo-inverse:
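For illustration, the least-squares estimate can be sketched in pure Python. The sketch solves the normal equations by Gaussian elimination, which coincides with the pseudo-inverse solution only when the matrix of neuron outputs (with a bias column) has full column rank; a library pseudo-inverse routine would be the natural choice in practice. The names z, y, solve and output_weights are illustrative.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def output_weights(z, y):
    """Least-squares v = [v_0, ..., v_L] for neuron outputs z and targets y."""
    rows = [[1.0] + list(zi) for zi in z]  # leading 1 is the bias input for v_0
    cols = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(cols)]
            for i in range(cols)]
    rhs = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(cols)]
    return solve(gram, rhs)


# Three samples, one neuron; the targets follow y = 1 + 2 z exactly.
v = output_weights([[0.0], [1.0], [2.0]], [1.0, 3.0, 5.0])
```

When the neuron outputs are strongly correlated the normal equations become ill-conditioned or singular, which is one practical reason to prefer a pseudo-inverse and to prune redundant neurons.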
The data projection in (10) of the fuzzy neural network can be interpreted as a characteristic extraction procedure, since it generates a new set of characteristics, represented by the columns of the matrix
In this context, a large number L of neurons can bring highly correlated information into the model, producing redundant and unnecessary information for the final output of the fuzzy neural network, which may even harm its predictive ability in pattern classification. The approach used in this paper to address correlation in the fuzzy neural network is to select the characteristics most relevant to the model, namely those that contain the most discriminative information about the classes. For this purpose, we use the concept of F-scores [16], which evaluates the discriminating power of each variable in a set of characteristics. F-scores are applied to the second layer of the fuzzy neural network in order to discard neurons whose F-score is smaller than a defined threshold value.
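The selection step can be sketched as follows, assuming the usual two-class F-score (the ratio of between-class to within-class scatter of one feature) and class labels in {−1, 1}. The function names and the threshold argument are illustrative; each class is assumed to contain at least two samples, and the two class variances must not both be zero.

```python
def _mean(values):
    return sum(values) / len(values)


def f_score(pos, neg):
    """F-score of one feature from its values in the positive and negative class."""
    m_all, m_pos, m_neg = _mean(pos + neg), _mean(pos), _mean(neg)
    var_pos = sum((x - m_pos) ** 2 for x in pos) / (len(pos) - 1)
    var_neg = sum((x - m_neg) ** 2 for x in neg) / (len(neg) - 1)
    between = (m_pos - m_all) ** 2 + (m_neg - m_all) ** 2
    return between / (var_pos + var_neg)


def select_neurons(z, labels, threshold):
    """Indices of the columns of z (neuron outputs) whose F-score reaches the threshold."""
    keep = []
    for j in range(len(z[0])):
        pos = [row[j] for row, t in zip(z, labels) if t > 0]
        neg = [row[j] for row, t in zip(z, labels) if t <= 0]
        if f_score(pos, neg) >= threshold:
            keep.append(j)
    return keep
```

In this sketch a neuron whose output has the same mean in both classes receives an F-score of zero and is discarded for any positive threshold.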
Given the output
Therefore Equation (8) is now calculated by:
The new learning procedure is summarized in Algorithm 1. The algorithm has two parameters, half the number used in [4]:
the number of fuzzy sets for the input space partition, K;
the number of candidate neurons, L_c.
The fuzzy neural network learning algorithm described in the previous section is evaluated using pattern classification problems. For all experiments in this section, only networks composed of unineurons are considered. All unineurons use the product t-norm and the probabilistic sum s-norm, and only Gaussian fuzzy sets are used. The network used in the experiments is named Pruning UNINET. First, a simple toy binary classification problem is considered to illustrate the transparency of the resulting network. Next, the accuracy of the Pruning UNINET is evaluated on benchmark binary classification problems [22].
Artificial classification dataset
Initially, the transparency of the Pruning UNINET and its capacity for linguistic interpretation of results were evaluated using a toy problem. The artificial dataset consists of 40 samples drawn from two Gaussians (Fig. 2) with covariance equal to the identity matrix, centered on [0,0] and [6,6, 6,6] respectively. In this experiment, 70% of the samples were used for training and 30% for testing. The algorithm parameters were set to K = 2 and L_c = 4.

Artificial Classification dataset.
The resulting network is able to classify all test set samples correctly and has only 2 neurons, which can be interpreted as the following fuzzy rules:
Considering this artificial problem, the separation of the classes in Cartesian space and the answers obtained, each of the Gaussian membership functions of each input variable can be given a linguistic label, such as small and large. The two dimensions of this problem could, for instance, represent how much a person smokes (x) and how frequently the person consumes alcoholic beverages (y). The model’s answer is then the probability of having cancer (yes in blue and no in red). As the model was used with K = 2, the initial interpretation of the problem can be seen in Fig. 3, where small is the first membership function and large is the second.

Artificial Classification dataset and membership functions.
The most representative fuzzy rules for the problem are those that cover the largest portion of the data. For the artificial problem, the neurons formed by the small and large membership functions are the most representative. The linguistic interpretation of this hypothetical problem can be given by the following rules:
In this section the proposed learning algorithm is evaluated using benchmark binary classification datasets taken from the UCI Machine Learning Repository [26]. In order to verify the performance of the Pruning UNINET, datasets with different sizes and input dimensions were used: the Ionosphere (ion) dataset, which contains measurements of free electrons in the ionosphere; the Pima Indian Diabetes (pid) dataset, which contains information on the presence of signs of diabetes in patients; and the Wisconsin Diagnostic Breast Cancer (wbc) dataset, which contains information on breast cancer diagnoses. The datasets provided by the Statlog project deal respectively with Australian credit (acr), German credit (gcr) and the probability of heart attacks (hea). The main characteristics of these datasets are summarized in Table 1.
Specification of binary classification datasets
All observations with missing values were removed and the outputs were normalized to lie in [−1, 1]. The inputs were normalized to zero mean and unit variance. Two thirds of the data samples were randomly selected for training and the remaining third for performance evaluation. For all experiments, the algorithm parameters were set to K = 3, and the number of candidate neurons, L_c, was set to half of the training set size of each dataset [20].
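The preprocessing described above could be sketched as follows. The use of the population standard deviation, the seed argument and the function names are assumptions, and the sketch simply computes the statistics over whatever column it is given.

```python
import random
import statistics


def standardize(column):
    """Rescale one input column to zero mean and unit variance."""
    mu = statistics.fmean(column)
    sd = statistics.pstdev(column)  # population standard deviation (assumed)
    return [(x - mu) / sd for x in column]


def split_two_thirds(n_samples, seed=0):
    """Random index split: two thirds for training, the rest for testing."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    cut = (2 * n_samples) // 3
    return idx[:cut], idx[cut:]


column = standardize([1.0, 2.0, 3.0])
train_idx, test_idx = split_two_thirds(9, seed=3)
```

A constant column has zero standard deviation and would need to be dropped or handled separately before calling standardize.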
The performance of the proposed approach, the Pruning UNINET, was compared with three alternative classifiers: an algorithm that prunes ELM neurons based on an affinity matrix (AFP-ELM), described in [30]; the R-ANDNET suggested in [4]; and a state-of-the-art pruning algorithm for ELMs, the OP-ELM [23]. The number of candidate neurons for the OP-ELM was also set to half of the training set size, and sigmoidal neurons were considered. Since some parameters of these models are selected randomly and may affect the final training results, each approach was run 30 times and the average values of the performance indexes were used for comparison. ANDNET uses the same values of K and L_c as the Pruning UNINET. The number of bootstrap replications is 8 and the consensus threshold is 60%.
Table 2 details the test set accuracy for each dataset. Table 3 details the average number of neurons after training. The results presented in these tables suggest that the Pruning UNINET performs similarly to AFP-ELM, ANDNET and OP-ELM in most of the experiments. Furthermore, for all experiments, the resulting network has a very simple topology (a small number of neurons) compared with the other models. The average execution times of the algorithms, measured in seconds, are listed in Table 4. Simulations were performed on a Core(TM) 2 Duo CPU at 2.27 GHz with 3 GB of RAM.
Performance comparison for binary classification datasets
Number of neurons for binary classification datasets
Average time in seconds of model training
In general, the accuracies of the methods are very similar across the algorithms tested and the datasets used. Each model was the best classifier on at least one dataset, but the Pruning UNINET stood out on 3 datasets, in which the number of characteristics varies between 8 and 20. Each of the other models stood out on one dataset. Despite the greater prominence of the Pruning UNINET, it is important to note that the results are very close in most of the experiments, which corroborates that the model proposed in this paper maintains the ability to classify binary data.
When evaluating the number of retained neurons, we highlight the efficiency of the proposed model in retaining the neurons that are significant to the problem. Both the pruned and the regularized fuzzy neural network models classified patterns efficiently, keeping accuracy close to that of the other models evaluated while using a smaller number of neurons, which allows us to say that the final model is less complex.
Another relevant factor appears in Table 4, which lists the execution times of each of the algorithms. Because the regularized algorithm uses replication to perform neuron retention, its running time is greater than that of all the others. As the model proposed in this paper classifies binary patterns in times similar to those of the compared models, we can conclude that the proposal has a viable execution time for pattern classification problems.
In order to confirm the performance of the proposed model, a statistical analysis was performed on the data collected in the pattern classification experiments. The algorithms are evaluated considering accuracy as the only factor, with each of the classification datasets treated as a blocking factor. The analysis of variance (ANOVA) [31] is applied to the results of each of the groups (algorithm × block factor). In total the test has 24 groups (4 algorithms and 6 datasets). This test seeks to establish whether the algorithm proposed in this paper has an average performance equal to that of the other algorithms that perform pruning of neurons.
The analysis of variance indicated that the null hypothesis, namely that the accuracy of the four algorithms is equal, cannot be rejected (p-value of 0.4712). The model assumptions were then validated [31]: normality of the collected data (p-value of 0.8621), homoscedasticity (p-value of 0.7687) and independence of the data (p-value of 0.0871). It can therefore be stated, at the 5% significance level, that there is no evidence of a difference in behavior among the four algorithms analyzed when performing binary classification on the six datasets evaluated.
Conclusion
This paper presented a pruning method for fuzzy neural network models. In the model, the better defined the group of fuzzy neurons that perform the classification of binary patterns, the greater the efficiency of the algorithm.
In the tests carried out, we verified that the proposed use of F-scores retained the neurons that are significant for the problem, and fewer of them than state-of-the-art models based on Extreme Learning Machine training. Compared with the fuzzy neural network model regularized by resampling, the accuracies are statistically equal and are obtained with a less complex model, with half the number of parameters and a much shorter run time in pattern classification than models that use the bootstrap.
In future work, other models can be added to the comparison, the number of datasets can be increased and the tests can be parameterized with values chosen through cross-validation.
The model can be extended to multi-class problems, and other methods that correlate the internal layers of fuzzy neural networks can also be explored in order to prune unnecessary neurons. Other hypotheses for improving the accuracy of the model can be proposed with new fuzzification algorithms, which may be based on other fuzzy techniques, such as fuzzy clustering algorithms.
