Interpretation of first-order recurrent neural networks by means of fuzzy rules

Abstract

First-order recurrent neural networks can be trained to recognize strings of a regular language, and finite state automata can be extracted from these networks. Normally, this extraction requires a search process in the output domain of the neurons. On the other hand, studies on fuzzy rule extraction from multilayer feedforward neural networks can be used to define new techniques that transform first-order recurrent neural networks into finite state automata. With these techniques, a fuzzy description of the action of each neuron can be obtained. From these descriptions, the transition function of the automaton can be found directly, so the search process is no longer necessary. A technique following this approach is presented in this paper. Moreover, the method used to extract fuzzy rules from a neuron has the advantage that the inputs of the fuzzy system coincide with the inputs of the neuron, which makes the fuzzy system more intuitive. Once the transition function is obtained, the structure of the automaton can be found by analyzing the transitions for every state and input, starting from the initial state. Finally, several examples are presented to illustrate the method.

Keywords

First-order recurrent neural networks; regular grammars; fuzzy rules; finite state automata

1. Introduction

In 1956, Chomsky classified languages in terms of generative grammars [3]. The simplest type of grammar is the regular grammar. The languages described by such grammars are called regular languages, and they can be recognized by Finite State Automata (FSA) [6]. These formal machines are useful for designing and verifying digital circuits, carrying out the lexical analysis of programming languages, exploring and acting on large text files, …

On the other hand, in the early 1990s Elman [4] introduced his well-known simple recurrent network, once again establishing the connection between finite state machines and neural networks. He compared the internal activations of the network with the states of a finite state machine. Since then, many articles have been written on Recurrent Neural Networks (RNNs) and their connection with state machines; many references and a review can be found in [7]. RNNs have also been applied to many practical problems [17–26].

Several articles describe techniques to transform RNNs into FSA [1, 12–14]. In these works, a search process in the output domain of the neurons is needed to find the states of the FSA, and this procedure has several problems. Normally, the search is based on dividing the output of each state neuron of the RNN into q intervals or on looking for clusters in the state space. Since many values of q are possible and many clusterings can be found, many different FSA can be extracted. How, then, should the best finite state automaton be obtained?

However, the transformation of RNNs into FSA can be done in a different way, thanks to progress in fuzzy rule extraction from Multilayer Feedforward Neural Networks (MFNN) [9, 10]. If the action of each neuron of the RNN is understood by means of a fuzzy rule system, a comprehensible description of the transition function of the automaton learnt by the trained RNN can be obtained. In this way, the search process required by other methods to transform RNNs into FSA is avoided.

Regarding methods that describe the action of a neuron by means of fuzzy rules, in [9] fuzzy rules are extracted from an RNN to obtain the associated automaton. There, the rule extraction is inspired by [8], where fuzzy rules are obtained from MFNNs and the inputs of the fuzzy rules are the inner products between the system inputs and the corresponding weight vectors. An improvement of this method was presented in [10], where the fuzzy rules obtained from an MFNN use the system inputs directly. The action of a hidden neuron can then be understood by analyzing two simple Takagi-Sugeno-Kang (TSK) fuzzy rules [15], which facilitates the analysis of the knowledge acquired by a network.

Therefore, if the extraction method presented in [10] is used to extract FSA from first-order RNNs, the transition function of the automaton can be described by a fuzzy rule system that uses the original inputs. Moreover, no search process in the output domain of the neurons is necessary.

The methodology to transform first-order RNNs into FSA is as follows:

Let us suppose that a first-order RNN has been trained to recognize strings of a regular language L. Then:

The values of its neurons represent the states of the automaton M associated with L.

The action of these neurons implements the transition function of M.

Step 1) Apply the method presented in [10] to each neuron of the RNN. This yields a fuzzy description of the transition function of M.

Step 2) Obtain the structure of the automaton by analyzing, from the initial state, the transitions for every state and input, using the previous fuzzy description.

As a complement to the previous extraction methodology, the techniques presented in [11] are also applied to first-order RNNs. In [11], fuzzy knowledge is inserted into MFNNs; applying that insertion method to first-order RNNs therefore defines a method to insert FSA into first-order RNNs. This will be illustrated with an example.

The structure of the paper is as follows. First, several concepts are introduced: TSK Fuzzy Rule-Based Systems (TSK FRBSs) and the extraction of TSK fuzzy systems from neural networks. Next, the relations between FSA and RNNs are presented, which lead to the two applications of this work: extraction of transition functions from first-order RNNs and insertion of FSA into first-order RNNs. Both are illustrated using Tomita's grammars. Finally, some conclusions are drawn.

2. Zero-order TSK fuzzy rule-based systems

Zero-order TSK FRBSs [15] commonly have the rule structure introduced in the following definition.

Definition 1.

An FRBS with the following rule structure is called a zero-order TSK FRBS: $R_k : \text{If } x_1 \text{ is } B_1 \land x_2 \text{ is } B_2 \land \dots \land x_n \text{ is } B_n \text{ then } y_k = p_k$, where $x_i$ are the input features of the problem, $B_i$ are the linguistic labels of the associated fuzzy sets, $p_k \in \mathbb{R}$ and $y_k$ is the output variable.

This zero-order TSK FRBS computes its output according to the following definition.

Definition 2. The output y of an FRBS with m zero-order TSK rules is computed as the weighted average of the individual rule outputs $y_i$ (i = 1, …, m):

$y = \frac{\sum_{i=1}^{m} y_i \cdot g_i}{\sum_{i=1}^{m} g_i},$ (1)

where $g_i = T(B_1(x_1), \dots, B_n(x_n))$ is the matching degree between the IF-part of rule i and the current system inputs, T is a t-norm (usually the product operator) and $x = (x_1, x_2, \dots, x_n)$ is the input of the system.
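As an illustration of Definition 2, the following minimal Python sketch computes the output of a zero-order TSK FRBS, assuming the product as t-norm and representing each membership function as a plain callable. The names `tsk0_output`, `sigm`, `high` and `low` and the two example rules are hypothetical, chosen only for this sketch.

```python
import math
from typing import Callable, Sequence, Tuple

# A membership function maps a crisp input value to a degree in [0, 1].
Membership = Callable[[float], float]


def tsk0_output(rules: Sequence[Tuple[Sequence[Membership], float]],
                x: Sequence[float]) -> float:
    """Zero-order TSK output, Eq. (1), using the product t-norm.

    Each rule is a pair (antecedent membership functions B_1..B_n, constant p_k).
    """
    num = 0.0
    den = 0.0
    for memberships, p in rules:
        # Matching degree g_i = T(B_1(x_1), ..., B_n(x_n)) with T = product.
        g = math.prod(b(xi) for b, xi in zip(memberships, x))
        num += p * g
        den += g
    return num / den


def sigm(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def high(v: float) -> float:
    # Hypothetical label "v is approximately greater than 0.5".
    return sigm(4.0 * (v - 0.5))


def low(v: float) -> float:
    # Its complement, "v is not approximately greater than 0.5".
    return 1.0 - high(v)


rules = [([low], 0.0), ([high], 1.0)]
print(round(tsk0_output(rules, [0.5]), 3))  # 0.5
```

Because the two example labels are complementary, the denominator is always 1 and the output simply equals `high(v)`; with arbitrary labels the normalization in (1) matters.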

In the following sections, the fuzzy rules are implemented using the product operator as the t-norm T.

3. Extraction of TSK fuzzy rules from neural networks

Next, the definition of a hidden neuron (Definition 3) and its equivalence with a TSK FRBS (Proposition 1) are presented.

Definition 3.

The action of the hidden neuron $h_j$ of an MFNN is given by

$h_j(x) = \mathrm{sigm}\left(\sum_{i=1}^{n} x_i \cdot w_{ij} + b_j\right),$ (2)

where $b_j$ is the bias of the hidden neuron $h_j$, $w_{ij}$ is the weight of the connection from the input $x_i$ to the hidden neuron $h_j$, "sigm" is the sigmoid activation function and $x = (x_1, x_2, \dots, x_n)$ is the input of the system.

Expression (2) can be rewritten as

$h_j(x) = \mathrm{sigm}\left(\sum_{i=1}^{n} (x_i \cdot w_{ij} + b_{ij})\right),$ (3)

where $\sum_{i=1}^{n} b_{ij} = b_j$. This splitting of the bias will be useful in the fuzzy rule extraction process.

Proposition 1. The action of the hidden neuron $h_j$ presented in Definition 3 and Equation (3) is equivalent to the action of the following zero-order TSK FRBS with two rules:

R₁) If not A_j then $h_j^1(x) = 0$
R₂) If A_j then $h_j^2(x) = 1$

where

• “not A_j” ≡ $x_1 \text{ is } \mathrm{sigm}(-(x\,w_{1j} + b_{1j})) \land \dots \land x_n \text{ is } \mathrm{sigm}(-(x\,w_{nj} + b_{nj}))$
≡ $x_1 \text{ is } (1 - \mathrm{sigm}(x\,w_{1j} + b_{1j})) \land \dots \land x_n \text{ is } (1 - \mathrm{sigm}(x\,w_{nj} + b_{nj}))$
≡ $x_1 \text{ is not } \mathrm{sigm}(x\,w_{1j} + b_{1j}) \land \dots \land x_n \text{ is not } \mathrm{sigm}(x\,w_{nj} + b_{nj})$

• “A_j” ≡ $x_1 \text{ is } \mathrm{sigm}(x\,w_{1j} + b_{1j}) \land \dots \land x_n \text{ is } \mathrm{sigm}(x\,w_{nj} + b_{nj})$

• The aggregation ∧ is implemented with the product operator.

(In each membership function, x denotes its generic argument.)

The proof of Proposition 1 can be found in [10].
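The equivalence in Proposition 1 can also be checked numerically. With the product t-norm, the ratio of the two matching degrees is ∏ sigm(−z_i)/sigm(z_i) = exp(−Σ z_i), where z_i = x_i w_ij + b_ij, so the weighted average (1) of the consequents 0 and 1 reduces to sigm(Σ z_i), which is Equation (3). The sketch below is a hedged numerical illustration, not the proof of [10]; it compares both expressions for random weights and an arbitrary split of the bias. All names are hypothetical.

```python
import math
import random


def sigm(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b_split):
    """Hidden neuron written as in Eq. (3): sigm(sum_i (x_i * w_ij + b_ij))."""
    return sigm(sum(xi * wi + bi for xi, wi, bi in zip(x, w, b_split)))


def two_rule_tsk(x, w, b_split):
    """Two-rule zero-order TSK system of Proposition 1 (product t-norm)."""
    z = [xi * wi + bi for xi, wi, bi in zip(x, w, b_split)]
    g_a = math.prod(sigm(zi) for zi in z)            # degree of "A_j"
    g_not_a = math.prod(1.0 - sigm(zi) for zi in z)  # degree of "not A_j"
    # Weighted average, Eq. (1): rule R1 outputs 0, rule R2 outputs 1.
    return (0.0 * g_not_a + 1.0 * g_a) / (g_not_a + g_a)


random.seed(0)
n = 4
w = [random.uniform(-5, 5) for _ in range(n)]
b = [random.uniform(-2, 2) for _ in range(n)]  # any split; its sum plays the role of b_j
x = [random.uniform(0, 1) for _ in range(n)]
print(abs(neuron(x, w, b) - two_rule_tsk(x, w, b)) < 1e-12)  # True
```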

The fuzzy rules of the system in Proposition 1 use membership functions of the form sigm(x × w_ij + b_ij). These functions have an easy linguistic interpretation, which can be used when inserting an automaton into an RNN. In that process, an expert must provide an initial description of the transition function, and can use linguistic terms instead of sigmoid functions to describe the transitions.

The function $\mathrm{sigm}(x \times w_{ij} + b_{ij}) = \mathrm{sigm}(w_{ij} \times (x + b_{ij}/w_{ij}))$ is centered at $-b_{ij}/w_{ij}$, and its slope is proportional to $w_{ij}$. Thus, the propositions of the fuzzy rules can be interpreted linguistically as follows:

When w_ij > 0, the proposition “x_i is sigm(w_ij × (x + b_ij/w_ij))” can be interpreted as

“x is approximately greater than (−b_ij/w_ij)”

≡ “x is ≈> (−b_ij/w_ij)”.

The corresponding fuzzy set is shown in Fig. 1.

Fig. 1

“x is approximately greater than (- b_ij/w_ij)”.

When w_ij < 0, the proposition “x_i is sigm(w_ij × (x + b_ij/w_ij))” can be interpreted as

“x is not approximately greater than (−b_ij/w_ij)”

≡ “x is approximately smaller than (−b_ij/w_ij)”

≡ “x is ≈< (−b_ij/w_ij)”.

The corresponding fuzzy set is depicted in Fig. 2.

Fig. 2

“x is approximately smaller than (- b_ij/w_ij)”.

The degree of uncertainty expressed by the term “approximately” is determined by the slope of the sigmoid function. If |w_ij| is large, the sigmoid has a steep slope and the corresponding fuzzy set is practically crisp. If |w_ij| is small, the fuzzy set has high uncertainty. Propositions of this last kind are avoided by simplifying the fuzzy system, as described in the following section.
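A small helper can turn the parameters of a proposition into the linguistic reading given above. This is a sketch under the assumption that a proposition is stored as the pair (w, b) of the function sigm(w x + b); the function name `interpret` is hypothetical. It reports the center −b/w and the sharpness |w| without applying any threshold for "high uncertainty", since the text does not fix one.

```python
def interpret(w: float, b: float, var: str = "x") -> str:
    """Linguistic reading of the proposition "var is sigm(w * x + b)".

    The sigmoid is centered at -b/w; the sign of w gives the direction and
    |w| the sharpness (a larger |w| means a nearly crisp fuzzy set).
    """
    if w == 0.0:
        raise ValueError("w = 0 gives a constant membership degree")
    center = -b / w
    relation = "approximately greater than" if w > 0 else "approximately smaller than"
    return f"{var} is {relation} {center:.3g} (|w| = {abs(w):.3g})"


# The input proposition of neuron S3 in Eq. (11): sigm(-8 * I + 3.8).
print(interpret(-8.0, 3.8, "I^t"))
# I^t is approximately smaller than 0.475 (|w| = 8)
```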

3.1. Simplification of the TSK fuzzy system

According to Proposition 1, the action of a hidden neuron h_j represented by (3) is equivalent to a system with two fuzzy rules. Each rule uses n propositions “x_i is [not] sigm(x w_ij + b_ij)”, i = 1, …, n. This number of propositions can be reduced by exploiting the freedom to distribute the bias b_j among the terms b_ij in (3). The reduction process consists of the following points:

1) Membership functions with a small slope (small |w_ij|) produce propositions with high uncertainty. Functions of this kind are candidates for simplification by means of the following points.

2) The membership degree of a proposition “x_i is sigm(x × w_ij + b_ij)” can be made almost 0.0 or almost 1.0 over the whole domain of the variable x_i. It is only necessary to adjust the value b_ij as follows:

Let us consider a particular variable x_c, with c ∈ {1, …, n} and x_c ∈ [a₁, a₂], a₁, a₂ ∈ ℝ, a₁ < a₂. Because sigm(+8) ≈ 1.0 and sigm(−8) ≈ 0.0, the proposition “x_c is sigm(x × w_cj + b_cj)” has degrees almost 1.0 when

a) $b_{cj} = 8 - a_1 \times w_{cj}$, if $w_{cj} > 0$;
b) $b_{cj} = 8 - a_2 \times w_{cj}$, if $w_{cj} < 0$;

and it has degrees almost 0.0 when

c) $b_{cj} = -8 - a_2 \times w_{cj}$, if $w_{cj} > 0$;
d) $b_{cj} = -8 - a_1 \times w_{cj}$, if $w_{cj} < 0$.

With these values of b_cj, the argument of the function sigm(x × w_cj + b_cj) ranges over the following intervals:

(x × w_cj + b_cj) ∈ [8, 8 + (a₂ − a₁) × |w_cj|] if the proposition has degrees ≈ 1.0.

(x × w_cj + b_cj) ∈ [−8 − (a₂ − a₁) × |w_cj|, −8] if the proposition has degrees ≈ 0.0.
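The four bias choices a)–d) and the resulting intervals can be checked with a short sketch: for each sign of w_cj and each target degree, it computes b_cj and the interval covered by the sigmoid argument when x runs over [a₁, a₂]. The function names and the parameter `s` (the saturation value 8 used above) are hypothetical.

```python
def saturating_bias(w: float, a1: float, a2: float, high: bool, s: float = 8.0) -> float:
    """Bias b_cj of cases a)-d): degrees ~1 (high=True) or ~0 (high=False) on [a1, a2]."""
    if w == 0.0:
        raise ValueError("w_cj must be non-zero")
    if high:
        return s - a1 * w if w > 0 else s - a2 * w   # cases a) and b)
    return -s - a2 * w if w > 0 else -s - a1 * w     # cases c) and d)


def argument_interval(w: float, b: float, a1: float, a2: float) -> tuple:
    """Interval covered by the sigmoid argument x * w + b when x runs over [a1, a2]."""
    ends = (a1 * w + b, a2 * w + b)
    return min(ends), max(ends)


# Weight w_12 = 0.2 of neuron S2 in Section 5.1, with x in [0, 1]:
b = saturating_bias(0.2, 0.0, 1.0, high=True)
print(b, argument_interval(0.2, b, 0.0, 1.0))  # 8.0 (8.0, 8.2)
```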

3) In the fuzzy propositions whose degrees are almost 0.0 or 1.0, the argument x × w_ij + b_ij of the sigmoid is replaced by a constant equal to the midpoint of the interval over which it ranges.

For example, the previous proposition “x_c is sigm(x × w_cj + b_cj)” is replaced by:

• “x_c is sigm(8 + [(a₂ − a₁)/2] × |w_cj|)” if the proposition has degrees almost 1.0.

• “x_c is sigm(−8 − [(a₂ − a₁)/2] × |w_cj|)” if the proposition has degrees almost 0.0.

4) Propositions with constant values can be removed from the rules. To do so, the original bias b_j of the hidden neuron is modified so that it absorbs the constant.

In the previous example, the new expression for the action of the hidden neuron is

$h_j(x) = \mathrm{sigm}\left(\sum_{i=1, i \neq c}^{n} x_i \cdot w_{ij} + b_j^{new}\right),$ (4)

where the constant replaces the whole term $x_c \cdot w_{cj} + b_{cj}$, so that

$b_j^{new} = (b_j - b_{cj}) + (8 + [(a_2 - a_1)/2] \times |w_{cj}|)$ for a proposition with degrees ≈ 1.0;

$b_j^{new} = (b_j - b_{cj}) + (-8 - [(a_2 - a_1)/2] \times |w_{cj}|)$ for a proposition with degrees ≈ 0.0.

Starting from expression (4), a new TSK FRBS with two rules is obtained by applying Proposition 1. Each rule of this system has n − 1 fuzzy propositions.
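Points 2–4 can be combined into a single bias update. The sketch below assumes, as in the examples of Section 5.1, that every input lies in [0, 1], and it writes the update as b_j − b_cj + K, where K is the constant of point 3 that replaces the whole term x_c w_cj + b_cj (including the share b_cj of the bias assigned to x_c in point 2). Under this reading, in all four cases a)–d) the net change K − b_cj equals w_cj (a₁ + a₂)/2, i.e. x_c is effectively replaced by the midpoint of its domain, so the choice between degrees ≈ 1.0 and ≈ 0.0 does not change the resulting neuron. The default choice (≈ 1.0 when w_cj > 0) only mirrors the worked examples; `simplify` and its helpers are hypothetical names.

```python
def saturating_bias(w, a1, a2, high, s=8.0):
    """Share b_cj of the bias that saturates the proposition (cases a-d)."""
    if high:
        return s - a1 * w if w > 0 else s - a2 * w
    return -s - a2 * w if w > 0 else -s - a1 * w


def replacement_constant(w, a1, a2, high, s=8.0):
    """Constant K of point 3: midpoint of the saturated argument interval."""
    half_width = (a2 - a1) / 2.0 * abs(w)
    return s + half_width if high else -s - half_width


def simplify(weights, bias, drop, a1=0.0, a2=1.0, high=None, s=8.0):
    """Remove the inputs whose indices are in `drop` from sigm(sum_i x_i*w_i + bias).

    Returns ({index: weight} for the kept inputs, new bias).
    """
    new_bias = bias
    for c in drop:
        w = weights[c]
        saturate_high = (w > 0) if high is None else high     # assumption, see text
        b_c = saturating_bias(w, a1, a2, saturate_high, s)     # point 2
        k = replacement_constant(w, a1, a2, saturate_high, s)  # point 3
        new_bias += k - b_c                                    # point 4
    kept = {i: w for i, w in enumerate(weights) if i not in drop}
    return kept, new_bias


# Neuron S1, Eq. (13): inputs (S1, S2, S3, I); drop the low-slope weight w_31 = 0.2.
kept, b_new = simplify([15.2, 8.4, 0.2, -6.0], -4.6, drop={2})
print(kept, round(b_new, 6))  # {0: 15.2, 1: 8.4, 3: -6.0} -4.5
```

Applied to neuron S₂ of Eq. (12) with w₁₂ and w₂₂ dropped, the same function leaves the bias at −3.2, in agreement with Eq. (16).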

4. First-order RNNs and FSA

Next, the definitions of a first-order RNN and of a finite state automaton are presented. Comparing these definitions reveals relations between both concepts, which will be useful to extract FSA from first-order RNNs.

4.1. First-order RNNs

Definition 4. A first-order RNN (Fig. 3) is defined by the following elements:

The network has m state neurons (S₁, . . . , S_m).

It has one binary input I ∈ {0, 1}.

The RNN's dynamics are given by

$S_j^{t+1} = \mathrm{sigm}\left(\sum_{i=1}^{m} S_i^t \cdot w_{ij} + I^t \cdot v_j + b_j\right), \quad j = 1, \dots, m,$ (5)

where b_j is the bias of the neuron S_j, I^t is the symbol at position t of a binary string, w_ij is the weight of the connection from the neuron S_i to the neuron S_j, v_j is the weight of the connection from the input I to the neuron S_j and sigm is the sigmoid activation function.

Fig. 3

First-order recurrent neural network.

The initial conditions of the network are $S_j^1 = 0$ for j = 1, …, m.

The neuron S₁ is the output neuron: if $S_1^{end} \le 0.5$, the input string is accepted; otherwise, it is rejected.

To facilitate the subsequent rule extraction process, Equation (5) can be rewritten as

$S_j^{t+1} = \mathrm{sigm}\left(\sum_{i=1}^{m} (S_i^t \cdot w_{ij} + b_{ij}) + (I^t \cdot v_j + b_{0j})\right), \quad j = 1, \dots, m,$ (6)

where $\sum_{i=0}^{m} b_{ij} = b_j$.

4.2. Finite state automata

Definition 5.

A finite state automaton is a 5-tuple M = (Q, X, d, q₀, F) , where:

Q is a finite set of states.

X is the finite input alphabet.

d : Q × X → Q is the transition function.

q₀ is the initial state.

F ⊆ Q is a set of accepting states.

A string u is accepted by a finite state automaton M if the state reached after M has read u is an accepting state. Regular languages can be recognized by FSA: for each regular language L there is an associated finite state automaton M such that a string u is accepted by M iff u ∈ L.

For a full discussion of what a regular language is and what other classes of languages there are, interested readers are referred to [6].

4.3. Relations between first-order RNNs and FSA

When a first-order RNN is trained to recognize strings of a regular language, several relations appear between the automaton associated with the language and the first-order RNN. These relations are:

The finite set of states Q of the automaton is encoded by the state neurons of the first-order RNN. Each state q ∈ Q of the automaton is represented by the m output values of the state neurons, that is, $q = (S_1^t, \dots, S_m^t) \in (0, 1)^m$.

The finite input alphabet X of the automaton coincides with the domain of the input I of the first-order RNN, that is, I ∈ { 0, 1 } = X.

The transition function of the automaton (d : Q × X → Q) is represented in the first-order RNN as

$\delta : (0,1)^m \times X \to (0,1)^m,$

$\delta((S_1^t, \dots, S_m^t), I^t) = (S_1^{t+1}, \dots, S_m^{t+1}).$ (7)

This representation is equivalent to m functions $\delta_j$ (j = 1, …, m):

$\delta_j : (0,1)^m \times X \to (0,1),$

$\delta_j((S_1^t, \dots, S_m^t), I^t) = S_j^{t+1}.$ (8)

Each function δ_j is implemented in the first-order RNN by the action of the state neuron S_j, which is determined by the weights and bias of that neuron. Note the similarity between expression (8) and Equation (6).

The initial state q₀ of the automaton is encoded by the initial conditions of the first-order RNN, that is,

$q_0 = (S_1^1, \dots, S_m^1) = (0, \dots, 0).$ (9)

The set of accepting states F⊆ Q is encoded with the output neuron S₁, that is,

$q = (S_{1}^{t}, . . ., S_{m}^{t}) \in F \Leftrightarrow S_{1}^{t} ⩽ 0.5$ (10)

5. Extracting FSA from first-order RNNs

Let us suppose that a first-order RNN has been trained to recognize a regular language L. If the action of each state neuron S_j of this RNN is described by a TSK fuzzy system, a description of each transition function δ_j of the automaton M associated with L is obtained. This is possible thanks to the following proposition:

Proposition 2. The action of the state neuron S_j presented in (6) is equivalent to the following TSK FRBS with two rules:

R₁) If not A_j then $S_j^{t+1} = 0$
R₂) If A_j then $S_j^{t+1} = 1$

where

• “not A_j” ≡ [$S_1^t$ is not sigm(x × w_1j + b_1j) ∧ … ∧ $S_m^t$ is not sigm(x × w_mj + b_mj)] ∧ [$I^t$ is not sigm(x × v_j + b_0j)]

• “A_j” ≡ [$S_1^t$ is sigm(x × w_1j + b_1j) ∧ … ∧ $S_m^t$ is sigm(x × w_mj + b_mj)] ∧ [$I^t$ is sigm(x × v_j + b_0j)]

Proof (Proposition 2)

The proof follows directly by applying Proposition 1 to a hidden neuron with m + 1 inputs (x₀, x₁, …, x_m), where x₀ = I^t and x_i = $S_{i}^{t}$ for i = 1, …, m.

Once a fuzzy description has been obtained for every state neuron S_j (j = 1, …, m), the transition function δ of the automaton M is known. The complete structure of M can then be obtained by analyzing, starting from the initial state, all possible transitions for every reachable state and input symbol.

The steps of this process are detailed below:

Input: First-order RNN trained to recognize a regular language L.

Output: Minimal finite state automaton M that accepts the strings of the regular language L.

Step 1) For each state neuron S_j (j = 1, …, m) of the first-order RNN:

Step 1.1) Extract a TSK FRBS with two rules from the neuron S_j using Proposition 2. This FRBS represents the function δ_j.

Step 1.2) Simplify the fuzzy propositions of the TSK system by means of the procedure described in Section 3.1.

Step 1.3) Rewrite the fuzzy rules treating the inputs of the fuzzy system as discrete. These inputs are the output values of the state neurons and the input I at time t.

Step 2) Starting from the initial state q₀ = (S₁¹, …, S_m¹) = (0, …, 0), analyze the structure of the automaton for every input and reachable state, using the discrete descriptions of the transition functions δ_j obtained in Step 1.3.

Step 3) Minimize the automaton obtained in Step 2 with a standard FSA minimization algorithm [6].
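Steps 2 and 3 can be sketched in Python, assuming the discretized transition functions δ_j have been combined into a single function on state tuples. Breadth-first search is one straightforward way to enumerate the reachable states, and Moore's partition-refinement algorithm is used here as one possible "standard minimization algorithm" (the text only cites [6]). `explore` and `minimize` are hypothetical names, and the toy usage at the end is an arbitrary example rather than one from the paper.

```python
from collections import deque


def explore(delta, q0, alphabet):
    """Step 2: breadth-first enumeration of the states reachable from q0.

    `delta(state, symbol)` must return the next (hashable) state.
    Returns (list of states, {(state, symbol): next_state}).
    """
    states, trans = [q0], {}
    seen, queue = {q0}, deque([q0])
    while queue:
        q = queue.popleft()
        for a in alphabet:
            r = delta(q, a)
            trans[(q, a)] = r
            if r not in seen:
                seen.add(r)
                states.append(r)
                queue.append(r)
    return states, trans


def minimize(states, trans, alphabet, is_accepting):
    """Step 3: Moore partition refinement. Returns {state: block id}."""
    block = {q: int(is_accepting(q)) for q in states}
    while True:
        # Split blocks according to the blocks reached on each input symbol.
        sig = {q: (block[q],) + tuple(block[trans[(q, a)]] for a in alphabet)
               for q in states}
        ids = {s: i for i, s in enumerate(sorted(set(sig.values())))}
        refined = {q: ids[sig[q]] for q in states}
        if len(set(refined.values())) == len(set(block.values())):
            return refined  # no block was split: the partition is stable
        block = refined


# Toy usage (hypothetical): a counter modulo 4 that accepts an even number of '1's.
states, trans = explore(lambda q, a: (q + a) % 4, 0, (0, 1))
blocks = minimize(states, trans, (0, 1), lambda q: q % 2 == 0)
print(len(states), len(set(blocks.values())))  # 4 2
```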

5.1. Example of extraction

To show how FSA can be extracted from first-order RNNs, Tomita's 4th grammar [16] has been chosen. This grammar generates the language L of binary strings that do not contain “000” as a substring. Its automaton is shown in Fig. 4.

Fig. 4

Automaton that represents Tomita's 4th grammar.

In [9], a first-order RNN with three state neurons (S₁, S₂, S₃) was trained to recognize the language generated by Tomita's 4th grammar; details on the training conditions can be found in [9]. A first-order RNN trained under the same conditions, which correctly classifies all the strings of this language, has the following dynamics [9]:

$S_3^{t+1} = \mathrm{sigm}(S_1^t \times 0 + S_2^t \times 0 + S_3^t \times 0 + I^t \times (-8) + 3.8),$ (11)

$S_2^{t+1} = \mathrm{sigm}(S_1^t \times 0.2 + S_2^t \times (-0.2) + S_3^t \times 4.5 + I^t \times (-3) - 3.2),$ (12)

$S_1^{t+1} = \mathrm{sigm}(S_1^t \times 15.2 + S_2^t \times 8.4 + S_3^t \times 0.2 + I^t \times (-6) - 4.6).$ (13)
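The dynamics (11)–(13) can be simulated directly to check that this network classifies the strings of Tomita's 4th grammar. The sketch below is a hedged check rather than part of the method: it runs the network from the initial conditions of Definition 4, accepts when S₁ ≤ 0.5 at the end, and compares the decision with the target language on all binary strings of length up to 8. The length bound and the function names are arbitrary choices.

```python
import math
from itertools import product


def sigm(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def run_rnn(bits: str) -> float:
    """Run the first-order RNN of Eqs. (11)-(13) and return S1 after the last symbol."""
    s1 = s2 = s3 = 0.0  # initial conditions S_j^1 = 0
    for ch in bits:
        i = float(ch)
        # All three neurons are updated synchronously from their values at time t.
        s1, s2, s3 = (
            sigm(15.2 * s1 + 8.4 * s2 + 0.2 * s3 - 6.0 * i - 4.6),  # Eq. (13)
            sigm(0.2 * s1 - 0.2 * s2 + 4.5 * s3 - 3.0 * i - 3.2),   # Eq. (12)
            sigm(-8.0 * i + 3.8),                                    # Eq. (11)
        )
    return s1


def accepts(bits: str) -> bool:
    return run_rnn(bits) <= 0.5  # acceptance criterion of Definition 4


# Strings on which the network disagrees with "contains no 000".
mismatches = [
    "".join(p)
    for n in range(9)
    for p in product("01", repeat=n)
    if accepts("".join(p)) != ("000" not in "".join(p))
]
print(len(mismatches))
```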

Once a first-order RNN with three state neurons has been trained to recognize the strings generated by Tomita's 4th grammar, the following steps are carried out to obtain the automaton:

Step 1. In this step, the description of the partial transition function represented by each state neuron (S₁, S₂, S₃) is obtained by carrying out Steps 1.1, 1.2 and 1.3 for each state neuron.

State neuron S₃

Step 1.1) The following expression is obtained from (11),

$S_3^{t+1} = \mathrm{sigm}(S_1^t \times 0 + S_2^t \times 0 + S_3^t \times 0 + [(-8) \times (I^t - 0.475)]).$ (14)

The following TSK fuzzy system can be extracted from (14) which describes the function δ₃: $\begin{matrix} If I^{t} is sigm ((- 8) (x - 0.475)) then S_{3}^{t + 1} = 1 \\ If I^{t} is not sigm ((- 8) (x - 0.475)) then S_{3}^{t + 1} = 0 \end{matrix}$

This system can be linguistically interpreted as: $\begin{matrix} If I^{t} is \approx < 0.475 then S_{3}^{t + 1} = 1 \\ If I^{t} is \approx > 0.475 then S_{3}^{t + 1} = 0 \end{matrix}$

Step 1.2) In this case, no simplification is needed.

Step 1.3) As I^t ∈ { 0, 1 } , the system can be approximated by the following discrete system:

$\begin{matrix} If I^{t} = 0 then S_{3}^{t + 1} = 1 \\ If I^{t} = 1 then S_{3}^{t + 1} = 0 \end{matrix}$

Intuitively, this discrete system indicates that the neuron S₃ is only activated when the input ‘0’ is presented.

State neuron S₂

Step 1.1) According to expression (12), the weights w₁₂ and w₂₂ produce membership functions with small slope. According to point 1 of Section 3.1, these functions are candidates for the subsequent simplification process, and the bias will be used to replace them by constant values (point 2 of Section 3.1). In this way, expression (12) can be rewritten as:

$S_2^{t+1} = \mathrm{sigm}\big((0.2\,S_1^t + 8) + (-0.2\,S_2^t - 8) + (4.5\,S_3^t) + (-3\,I^t - 3.2)\big)$

$= \mathrm{sigm}\big([0.2\,(S_1^t + 40)] + [(-0.2)\,(S_2^t + 40)] + [4.5\,S_3^t] + [(-3)\,(I^t + 1.07)]\big)$ (15)

The following TSK fuzzy system can be extracted from (15) which describes the function δ₂:

$\begin{matrix} If S_{1}^{t} is sigm (0.2 (x + 40)) \land S_{2}^{t} is sigm ((- 0.2) (x + 40)) \\ \land S_{3}^{t} is sigm (4.5 x) \land I^{t} is sigm ((- 3) (x + 1.07)) \\ then S_{2}^{t + 1} = 1 \\ If S_{1}^{t} is not sigm (0.2 (x + 40)) \land S_{2}^{t} is not sigm ((- 0.2) (x + 40)) \\ \land S_{3}^{t} is not sigm (4.5 x) \land I^{t} is not sigm ((- 3) (x + 1.07)) \\ then S_{2}^{t + 1} = 0 \end{matrix}$

Step 1.2) In the previous TSK fuzzy system, the proposition “S₁^t is sigm(0.2 × (x + 40))” is practically equal to 1.0 when S₁^t ∈ (0, 1), and the proposition “S₂^t is sigm((−0.2) × (x + 40))” is practically equal to 0.0 when S₂^t ∈ (0, 1). Hence, the simplification described in point 3 of Section 3.1 can be applied to these propositions.

As 0.2(x + 40) ∈ [8, 8.2] and (−0.2)(x + 40) ∈ [−8.2, −8] for x ∈ [0, 1], the arguments of these sigmoid functions can be replaced by the midpoints of these intervals. Thus, the propositions “S₁^t is sigm(8.1)” and “S₂^t is sigm(−8.1)” are obtained.

With these new propositions the TSK fuzzy system is:

$\begin{matrix} If S_{1}^{t} is sigm (8.1) \land S_{2}^{t} is sigm (- 8.1) \\ \land S_{3}^{t} is sigm (4.5 x) \land I^{t} is sigm ((- 3) (x + 1.07)) \\ then S_{2}^{t + 1} = 1 \\ If S_{1}^{t} is not sigm (8.1) \land S_{2}^{t} is not sigm (- 8.1) \\ \land S_{3}^{t} is not sigm (4.5 \times x) \land I^{t} is not sigm ((- 3) \times (x + 1.07)) \\ then S_{2}^{t + 1} = 0 \end{matrix}$

Transforming this fuzzy system back into a neuron yields a new, simplified neuron S₂:

$S_2^{t+1} = \mathrm{sigm}([8.1] + [-8.1] + [4.5\,S_3^t] + [(-3)(I^t + 1.07)]) = \mathrm{sigm}([4.5\,S_3^t] + [(-3)(I^t + 1.07)])$ (16)

Step 1.3) As I^t ∈ { 0,1}, the action of the neuron S₂ expressed in (16) can be rewritten as: $\begin{matrix} {RB}_{1}) When I^{t} & = & 0, S_{2}^{t + 1} = sigm (4.5 \times S_{3}^{t} + (- 3.21)) \\ = & sigm (4.5 \times (S_{3}^{t} - 0.71)) . \\ {RB}_{2}) When I^{t} & = & 1, S_{2}^{t + 1} = sigm (4.5 \times S_{3}^{t} + (- 6.21)) \\ = & sigm (4.5 \times (S_{3}^{t} - 1.38)) . \end{matrix}$

Hence, a new fuzzy description of the neuron S₂ can be obtained using Rule Bases (RBs) composed of two rules: $\begin{matrix} {RB}_{1}) When I^{t} = 0 \\ If S_{3}^{t} is sigm (4.5 \times (x - 0.71)) then S_{2}^{t + 1} = 1 \\ If S_{3}^{t} is not sigm (4.5 \times (x - 0.71)) then S_{2}^{t + 1} = 0 \\ {RB}_{2}) When I^{t} = 1 \\ If S_{3}^{t} is sigm (4.5 \times (x - 1.38)) then S_{2}^{t + 1} = 1 \\ If S_{3}^{t} is not sigm (4.5 \times (x - 1.38)) then S_{2}^{t + 1} = 0 \end{matrix}$

This system can be linguistically interpreted as:

$\begin{matrix} {RB}_{1}) When I^{t} = 0 \\ If S_{3}^{t} is \approx > 0.71 then S_{2}^{t + 1} = 1 \\ If S_{3}^{t} is \approx < 0.71 then S_{2}^{t + 1} = 0 \\ {RB}_{2}) When I^{t} = 1 \\ If S_{3}^{t} is \approx > 1.38 then S_{2}^{t + 1} = 1 \\ If S_{3}^{t} is \approx < 1.38 then S_{2}^{t + 1} = 0 \end{matrix}$

If the output values of the neuron S₃ are considered discrete in {0, 1}, the proposition “S₃^t is ≈> 1.38” is always false and “S₃^t is ≈< 1.38” is always true. Therefore, rule base RB₂ always returns $S_2^{t+1} = 0$.

The complete discrete system is:

$\begin{matrix} R_{1}) When I^{t} = 0 \\ If S_{3}^{t} = 1 then S_{2}^{t + 1} = 1 \\ If S_{3}^{t} = 0 then S_{2}^{t + 1} = 0 \\ R_{2}) When I^{t} = 1 S_{2}^{t + 1} = 0 \end{matrix}$

Intuitively, this discrete system indicates that the neuron S₂ is activated when the input ‘0’ is presented while S₃ is active (that is, input ‘0’ was also presented at time t − 1). In other words, the neuron S₂ is “on” when the substring “00” is being presented.

State neuron S₁

Step 1.1) According to expression (13), the weight w₃₁ produces a membership function with small slope. Following the simplification process of Section 3.1, the bias will be used to replace this membership function by a constant value (points 1 and 2 of Section 3.1). In this way, expression (13) can be rewritten as:

$S_1^{t+1} = \mathrm{sigm}\big((15.2\,S_1^t) + (8.4\,S_2^t) + (0.2\,S_3^t + 8) + (-6\,I^t - 4.6 - 8)\big)$

$= \mathrm{sigm}\big([15.2\,S_1^t] + [8.4\,S_2^t] + [0.2\,(S_3^t + 40)] + [(-6)\,(I^t + 2.1)]\big)$ (17)

The following TSK fuzzy system can be extracted from (17) which describes the function δ₁:

$\begin{matrix} If S_{1}^{t} is sigm (15.2 x) \land S_{2}^{t} is sigm (8.4 x) \\ \land S_{3}^{t} is sigm (0.2 (x + 40)) \land I^{t} is sigm ((- 6) (x + 2.1)) \\ then S_{1}^{t + 1} = 1 \\ If S_{1}^{t} is not sigm (15.2 x) \land S_{2}^{t} is not sigm (8.4 x) \\ \land S_{3}^{t} is not sigm (0.2 (x + 40)) \land I^{t} is not sigm ((- 6) (x + 2.1)) \\ then S_{1}^{t + 1} = 0 \end{matrix}$

Step 1.2) In the previous TSK fuzzy system, the proposition “S₃^t is sigm(0.2(x + 40))” is practically equal to 1.0 when S₃^t ∈ (0, 1). Hence, the simplification described in point 3 of Section 3.1 can be applied to this proposition.

As 0.2(x + 40) ∈ [8, 8.2] for x ∈ [0, 1], the argument of the sigmoid function is replaced by the midpoint of this interval, and the proposition “S₃^t is sigm(8.1)” is obtained. With this new proposition, the TSK fuzzy system is:

$\begin{matrix} If S_{1}^{t} is sigm (15.2 \times x) \land S_{2}^{t} is sigm (8.4 \times x) \\ \land S_{3}^{t} is sigm (8.1) \land I^{t} is sigm ((- 6) \times (x + 2.1)) \\ then S_{1}^{t + 1} = 1 \\ If S_{1}^{t} is not sigm (15.2 \times x) \land S_{2}^{t} is not sigm (8.4 \times x) \\ \land S_{3}^{t} is not sigm (8.1) \land I^{t} is not sigm ((- 6) \times (x + 2.1)) \\ then S_{1}^{t + 1} = 0 \end{matrix}$

Transforming this fuzzy system back into a neuron yields a new, simplified neuron S₁:

$S_1^{t+1} = \mathrm{sigm}([15.2\,S_1^t] + [8.4\,S_2^t] + [8.1] + [(-6)(I^t + 2.1)])$

$= \mathrm{sigm}([15.2\,S_1^t] + [8.4\,S_2^t] + [(-6)\,I^t] - 12.6 + 8.1)$

$= \mathrm{sigm}(S_1^t \times 15.2 + S_2^t \times 8.4 + I^t \times (-6) - 4.5)$ (18)

Step 1.3) As $I^t \in \{0, 1\}$, the action of the neuron S₁ expressed in (18) can be rewritten as:

RB₁) When $I^t = 0$: $S_1^{t+1} = \mathrm{sigm}(S_1^t \times 15.2 + S_2^t \times 8.4 - 4.5)$.

RB₂) When $I^t = 1$: $S_1^{t+1} = \mathrm{sigm}(S_1^t \times 15.2 + S_2^t \times 8.4 - 10.5)$.

The behavior of this system can now be analyzed by treating either S₁^t or S₂^t as discrete in {0, 1}; the final discrete system is the same regardless of which neuron is chosen. Here, S₂^t is used to analyze the system.

Assuming that $S_2^t \in \{0, 1\}$, the action of the neuron S₁ is: $\begin{matrix} {RB}_{1.1}) When I^{t} = 0 and S_{2}^{t} = 0, S_{1}^{t + 1} = sigm (S_{1}^{t} \times 15.2 - 4.5) = sigm (15.2 \times (S_{1}^{t} - 0.3)) . \\ {RB}_{1.2}) When I^{t} = 0 and S_{2}^{t} = 1, S_{1}^{t + 1} = sigm (S_{1}^{t} \times 15.2 + 3.9) = sigm (15.2 \times (S_{1}^{t} + 0.26)) . \\ {RB}_{2.1}) When I^{t} = 1 and S_{2}^{t} = 0, S_{1}^{t + 1} = sigm (S_{1}^{t} \times 15.2 - 10.5) = sigm (15.2 \times (S_{1}^{t} - 0.69)) . \\ {RB}_{2.2}) When I^{t} = 1 and S_{2}^{t} = 1, S_{1}^{t + 1} = sigm (S_{1}^{t} \times 15.2 - 2.1) = sigm (15.2 \times (S_{1}^{t} - 0.14)) . \end{matrix}$

From this action the following fuzzy rule bases can be extracted: $\begin{matrix} {RB}_{1.1}) When I^{t} = 0 and S_{2}^{t} = 0, \\ If S_{1}^{t} is \approx > 0.3 then S_{1}^{t + 1} = 1 \\ If S_{1}^{t} is \approx < 0.3 then S_{1}^{t + 1} = 0 \\ {RB}_{1.2}) When I^{t} = 0 and S_{2}^{t} = 1, \\ If S_{1}^{t} is \approx > (- 0.26) then S_{1}^{t + 1} = 1 \\ If S_{1}^{t} is \approx < (- 0.26) then S_{1}^{t + 1} = 0 \\ {RB}_{2.1}) When I^{t} = 1 and S_{2}^{t} = 0, \\ If S_{1}^{t} is \approx > 0.69 then S_{1}^{t + 1} = 1 \\ If S_{1}^{t} is \approx < 0.69 then S_{1}^{t + 1} = 0 \\ {RB}_{2.2}) When I^{t} = 1 and S_{2}^{t} = 1, \\ If S_{1}^{t} is \approx > 0.14 then S_{1}^{t + 1} = 1 \\ If S_{1}^{t} is \approx < 0.14 then S_{1}^{t + 1} = 0 \end{matrix}$
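The thresholds in these rule bases (and the values 0.71 and 1.38 obtained above for S₂) can be reproduced mechanically: fixing the discrete inputs turns the neuron into sigm(w (s − c)) in the remaining variable s, with center c = −(bias + Σ fixed contributions)/w. The sketch below assumes that form; the helper name `rule_base_centres` is hypothetical.

```python
from itertools import product


def rule_base_centres(w_free, fixed_weights, bias):
    """Centres c of sigm(w_free * (s - c)) for every 0/1 assignment of the fixed inputs.

    The neuron is sigm(w_free * s + sum_k w_k * u_k + bias), where s is the free
    variable and the u_k are the inputs fixed to 0 or 1.
    """
    centres = {}
    for values in product((0, 1), repeat=len(fixed_weights)):
        offset = bias + sum(w * u for w, u in zip(fixed_weights, values))
        centres[values] = -offset / w_free
    return centres


# Neuron S1 after simplification, Eq. (18): free variable S1^t (weight 15.2),
# fixed inputs (I^t, S2^t) with weights (-6, 8.4), bias -4.5.
# With S1^t in {0, 1} and w_free > 0, a centre below 0 forces S1^{t+1} = 1
# (rule base RB_1.2), while a centre inside (0, 1) gives S1^{t+1} = S1^t.
for (i, s2), c in rule_base_centres(15.2, [-6.0, 8.4], -4.5).items():
    print(f"I^t={i}, S2^t={s2}: S1^t is approx > {c:.2f}")
```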

If this system is analyzed considering that S₁^t ∈ {0, 1}, the rule base $RB_{1.2}$ simplifies, because its first rule is always true and its second rule is always false. Therefore, $RB_{1.2}$ always returns $S_1^{t+1} = 1$.

The following discrete system is obtained: $\begin{matrix} {RB}_{1.1}) When I^{t} = 0 and S_{2}^{t} = 0, \\ If S_{1}^{t} = 1 then S_{1}^{t + 1} = 1 \\ If S_{1}^{t} = 0 then S_{1}^{t + 1} = 0 \\ {RB}_{1.2}) When I^{t} = 0 and S_{2}^{t} = 1, then S_{1}^{t + 1} = 1 \end{matrix}$ $\begin{matrix} {RB}_{2.1}) When I^{t} = 1 and S_{2}^{t} = 0, \\ If S_{1}^{t} = 1 then S_{1}^{t + 1} = 1 \\ If S_{1}^{t} = 0 then S_{1}^{t + 1} = 0 \\ {RB}_{2.2}) When I^{t} = 1 and S_{2}^{t} = 1, \\ If S_{1}^{t} = 1 then S_{1}^{t + 1} = 1 \\ If S_{1}^{t} = 0 then S_{1}^{t + 1} = 0 \end{matrix}$

This discrete system is easily comprehensible. It can be summarized as follows:

$\begin{matrix} When I^{t} = 0 and S_{2}^{t} = 1, S_{1}^{t + 1} = 1 \\ Otherwise, S_{1}^{t + 1} = S_{1}^{t} \end{matrix}$

Intuitively, this discrete system indicates that neuron S₁, while its output is ‘0’, only changes to ‘1’ when the input ‘0’ is presented and S₂ is active (that is, the substring ‘00’ has just been read); in other words, when the substring ‘000’ appears. Conversely, once S₁ outputs ‘1’ (the substring ‘000’ has been read), it keeps this value.
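The discretization of neuron S₁ described above can be checked numerically. The sketch below takes the four affine expressions listed for S₁ (slope 15.2 and biases −4.5, 3.9, −10.5 and −2.1, one per combination of I^t and S₂^t), computes the threshold −b/15.2 of each rule base, and rounds the neuron output for S₁^t ∈ {0, 1}. The names `BIAS`, `threshold` and `discrete_action` are illustrative, not part of the original method.

```python
import math

# Slope of S_1^t in the action of neuron S_1.
SLOPE = 15.2

# Bias for each (I^t, S_2^t) combination: RB_1.1, RB_1.2, RB_2.1, RB_2.2.
BIAS = {(0, 0): -4.5, (0, 1): 3.9, (1, 0): -10.5, (1, 1): -2.1}


def sigm(x: float) -> float:
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def threshold(i: int, s2: int) -> float:
    """Value of S_1^t at which the neuron output crosses 0.5, i.e. -b / 15.2."""
    return -BIAS[(i, s2)] / SLOPE


def discrete_action(i: int, s2: int, s1: int) -> int:
    """Output of neuron S_1 rounded to {0, 1} for binary inputs."""
    return round(sigm(SLOPE * s1 + BIAS[(i, s2)]))


if __name__ == "__main__":
    for (i, s2) in sorted(BIAS):
        outs = [discrete_action(i, s2, s1) for s1 in (0, 1)]
        print(f"I={i} S2={s2} threshold={threshold(i, s2):+.2f} "
              f"S1^t=0 -> {outs[0]}, S1^t=1 -> {outs[1]}")
```

Only the case I^t = 0, S₂^t = 1 maps both values of S₁^t to 1; the other three cases copy S₁^t, which is the summarized rule above.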

Next, the rules extracted from a state neuron of a first-order RNN with the method presented in [9] are described, so that they can be compared with the simple rules obtained from state neuron S₁ with the extraction method presented in this work. The extraction example in [9] includes a state neuron whose action is similar to that of S₁. The rules extracted in [9] for this neuron are:

$\begin{matrix} R 1) If (2.16 I^{t - 3} + 1.5 I^{t - 2}) is smaller than 0.6 and 5 I^{t - 1} is smaller than 2.5, \\ Then {sigm}^{- 1} (S_{1}^{t}) = 3.9 + 15.2 \cdot S_{1}^{t - 1}, \\ R 2) If (2.16 I^{t - 3} + 1.5 I^{t - 2}) is smaller than 0.6 and 5 I^{t - 1} is larger than 2.5, \\ Then {sigm}^{- 1} (S_{1}^{t}) = - 2.1 + 15.2 \cdot S_{1}^{t - 1}, \\ R 3) If (2.16 I^{t - 3} + 1.5 I^{t - 2}) is larger than 0.6 and 5 I^{t - 1} is smaller than 2.5, \\ Then {sigm}^{- 1} (S_{1}^{t}) = - 4.5 + 15.2 \cdot S_{1}^{t - 1}, \\ R 4) If (2.16 I^{t - 3} + 1.5 I^{t - 2}) is larger than 0.6 and 5 I^{t - 1} is larger than 2.5, \\ Then {sigm}^{- 1} (S_{1}^{t}) = - 10.5 + 15.2 \cdot S_{1}^{t - 1} . \end{matrix}$

It can be observed that these rules use weighted sums (inner products) of past inputs I in the antecedents and a linear function of the previous output of the state neuron in the consequents. Hence, they are less comprehensible than the rules obtained from state neuron S₁ with the extraction method presented in this paper, which use the inputs of the neuron directly in the antecedents and only constant values in the consequents.

Step 2: In this step, the structure of the automaton that recognizes the language accepted by the first-order RNN is described.

In the previous steps, discrete systems that describe the functions δ₁, δ₂ and δ₃ have been found. These are:

Function δ₃

$\begin{matrix} If I^{t} = 0 then S_{3}^{t + 1} = 1 \\ If I^{t} = 1 then S_{3}^{t + 1} = 0 \end{matrix}$

Function δ₂

$\begin{matrix} R_{1}) When I^{t} = 0 \\ If S_{3}^{t} = 1 then S_{2}^{t + 1} = 1 \\ If S_{3}^{t} = 0 then S_{2}^{t + 1} = 0 \\ R_{2}) When I^{t} = 1, S_{2}^{t + 1} = 0 \end{matrix}$

Function δ₁

$\begin{matrix} When I^{t} = 0 and S_{2}^{t} = 1, S_{1}^{t + 1} = 1 \\ Otherwise, S_{1}^{t + 1} = S_{1}^{t} \end{matrix}$
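The three discrete descriptions can be exercised directly. The sketch below encodes δ₁, δ₂ and δ₃ as plain functions, updates the three state neurons synchronously, and checks, for every binary string up to length 8, that a string is accepted (S₁ ≤ 0.5) exactly when it does not contain ‘000’, the language of Tomita’s 4th grammar. It assumes that processing starts from (S₁, S₂, S₃) = (0, 0, 0), since the initial state is not restated in this part of the text; the function names are illustrative.

```python
from itertools import product

State = tuple[int, int, int]  # (S1, S2, S3)


def delta3(i: int) -> int:
    # S3^{t+1} = 1 iff the input just read is 0.
    return 1 if i == 0 else 0


def delta2(i: int, s3: int) -> int:
    # R1: with input 0, S2 copies S3; R2: with input 1, S2 becomes 0.
    return s3 if i == 0 else 0


def delta1(i: int, s1: int, s2: int) -> int:
    # S1 switches to 1 when input 0 is read while S2 = 1; otherwise it keeps its value.
    return 1 if (i == 0 and s2 == 1) else s1


def step(state: State, i: int) -> State:
    s1, s2, s3 = state
    # All three functions read the values at time t, so they are evaluated together.
    return (delta1(i, s1, s2), delta2(i, s3), delta3(i))


def accepts(string: str, initial: State = (0, 0, 0)) -> bool:
    state = initial  # assumed starting state
    for ch in string:
        state = step(state, int(ch))
    return state[0] == 0  # accepting iff S1 <= 0.5


if __name__ == "__main__":
    for n in range(9):
        for bits in product("01", repeat=n):
            s = "".join(bits)
            assert accepts(s) == ("000" not in s), s
    print("discrete system agrees with 'no 000 substring' up to length 8")
```

In this reading, S₃ remembers whether the last input was ‘0’, S₂ whether the last two inputs were ‘00’, and S₁ whether ‘000’ has ever occurred.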

From all these functions δ_j, a description of the transition function δ of the automaton is easily obtained. Every reachable state of the automaton can then be visited: it is only necessary to analyze, starting from the initial state, the transition for every input and every visited state.

For example, let us suppose that the state (S₁^t = 0, S₂^t = 0, S₃^t = 0) is being studied. First, this state is accepting because S₁^t ≤ 0.5. Next, the transitions from this state must be analyzed for every input I^t ∈ X:

If I^t = 0, then

according to function δ₃, S₃^{t+1} = 1,

according to function δ₂, S₂^{t+1} = 0 because S₃^t = 0,

according to function δ₁, S₁^{t+1} = S₁^t = 0 because S₂^t = 0.

Therefore, the state (S₁^{t+1} = 0, S₂^{t+1} = 0, S₃^{t+1} = 1) is visited from the state (S₁^t = 0, S₂^t = 0, S₃^t = 0) with the input I^t = 0. This state is also accepting.

If I^t = 1, then

according to function δ₃, S₃^{t+1} = 0,

according to function δ₂, S₂^{t+1} = 0 because I^t = 1,

according to function δ₁, S₁^{t+1} = S₁^t = 0 because the condition (I^t = 0 and S₂^t = 1) does not hold and S₁^t = 0.

Therefore, if the automaton is in the state (S₁^t = 0, S₂^t = 0, S₃^t = 0) and the input I^t = 1 is read, it remains in the state (S₁^{t+1} = 0, S₂^{t+1} = 0, S₃^{t+1} = 0).

This procedure is repeated for every newly visited state and every input I ∈ X. In this way, the automaton illustrated in Fig. 5 is obtained. It is a non-minimal automaton that recognizes the language generated by Tomita’s 4th grammar.

Fig. 5

Automaton extracted from the RNN.
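The exploration described above is a breadth-first traversal of the reachable states. A minimal sketch, assuming (0, 0, 0) as the starting state and using the discrete descriptions of δ₁, δ₂ and δ₃ listed earlier (the names `step` and `explore` are illustrative):

```python
from collections import deque

State = tuple[int, int, int]  # (S1, S2, S3)


def step(state: State, i: int) -> State:
    """Transition built from the discrete descriptions of delta_1, delta_2, delta_3."""
    s1, s2, s3 = state
    new_s1 = 1 if (i == 0 and s2 == 1) else s1  # delta_1
    new_s2 = s3 if i == 0 else 0                # delta_2
    new_s3 = 1 if i == 0 else 0                 # delta_3
    return (new_s1, new_s2, new_s3)


def explore(initial: State, alphabet=(0, 1)):
    """Visit every state reachable from the initial state, breadth first."""
    transitions: dict[tuple[State, int], State] = {}
    visited = {initial}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for i in alphabet:
            target = step(state, i)
            transitions[(state, i)] = target
            if target not in visited:
                visited.add(target)
                queue.append(target)
    accepting = {q for q in visited if q[0] == 0}  # S1 <= 0.5
    return visited, transitions, accepting


if __name__ == "__main__":
    states, delta, accepting = explore((0, 0, 0))
    for (q, i), target in sorted(delta.items()):
        mark = "accepting" if q in accepting else "rejecting"
        print(f"{q} ({mark}) --{i}--> {target}")
```

Under this assumption the traversal reaches six states, three of them accepting, and the two transitions from (0, 0, 0) coincide with those derived by hand above.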

Step 3: In this step, the automaton obtained in the previous step is minimized.

When the automaton of Fig. 5 is minimized with a standard minimization algorithm [6], the minimal automaton associated with Tomita’s 4th grammar (Fig. 4) is obtained.
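Any correct minimization algorithm yields the same minimal automaton up to a renaming of states. The sketch below uses Moore-style partition refinement (chosen for brevity; it is not necessarily the exact procedure of [6]) on the six-state table that results from applying δ₁, δ₂ and δ₃ from the assumed starting state (0, 0, 0). The function `minimize` and the table `DELTA` are illustrative.

```python
State = tuple[int, int, int]  # (S1, S2, S3)

# Six-state automaton obtained by applying the discrete descriptions of
# delta_1, delta_2 and delta_3 from the (assumed) starting state (0, 0, 0).
DELTA: dict[tuple[State, int], State] = {
    ((0, 0, 0), 0): (0, 0, 1), ((0, 0, 0), 1): (0, 0, 0),
    ((0, 0, 1), 0): (0, 1, 1), ((0, 0, 1), 1): (0, 0, 0),
    ((0, 1, 1), 0): (1, 1, 1), ((0, 1, 1), 1): (0, 0, 0),
    ((1, 1, 1), 0): (1, 1, 1), ((1, 1, 1), 1): (1, 0, 0),
    ((1, 0, 0), 0): (1, 0, 1), ((1, 0, 0), 1): (1, 0, 0),
    ((1, 0, 1), 0): (1, 1, 1), ((1, 0, 1), 1): (1, 0, 0),
}
ALPHABET = (0, 1)


def minimize(states, delta, accepting, alphabet=ALPHABET):
    """Moore-style partition refinement; returns a block id for every state."""
    # Initial partition: accepting (1) versus rejecting (0) states.
    block = {q: int(q in accepting) for q in states}
    while True:
        # Signature: current block plus the blocks reached on each symbol.
        signature = {
            q: (block[q],) + tuple(block[delta[(q, a)]] for a in alphabet)
            for q in states
        }
        ids = {sig: n for n, sig in enumerate(sorted(set(signature.values())))}
        new_block = {q: ids[signature[q]] for q in states}
        # Refinement only splits blocks, so an unchanged count means a stable partition.
        if len(set(new_block.values())) == len(set(block.values())):
            return new_block
        block = new_block


if __name__ == "__main__":
    states = {q for (q, _) in DELTA}
    accepting = {q for q in states if q[0] == 0}  # S1 <= 0.5
    blocks = minimize(states, DELTA, accepting)
    print("states after minimization:", len(set(blocks.values())))
```

The three rejecting states collapse into a single dead state, while the three accepting states remain distinct, leaving four states.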

With this last step, it has been shown how the presented method can extract FSA from first-order RNNs trained with strings of a regular language.

6. Inserting FSA into RNNs

The insertion of FSA into first-order RNNs is possible thanks to:

The relation between the transition function of an automaton and the hidden neurons of a first-order RNN.

The transformation of these hidden neurons into equivalent TSK fuzzy rules.

This process can be summarized in the following steps:

Input: Finite state automaton M that accepts strings of a regular language L.

Output: First-order RNN that recognizes the language L.

Step 1) To choose a number m of state neurons for the first-order RNN. This number m must be large enough to carry out Step 2; with binary values, m state neurons can encode at most 2^m different states.

Step 2) To encode each state of M uniquely by using discrete values for the state neurons S_j (j = 1, …, m).

Step 3) To configure the weights and biases of the RNN so that the action of each state neuron S_j (j = 1, …, m) represents the transition function δ_j, that is:

Step 3.1) To describe the action of the neuron S_j with a discrete rule base system. This action must be equivalent to the transition function δ_j.

Step 3.2) To translate the discrete system into a fuzzy system by using propositions with the format “x_i is [not] sigm(x × w_ij + b_ij)” and the structure presented in Proposition 2.

Step 3.3) To transform the FRBS obtained in Step 3.2 into the description of a hidden neuron S_j according to (5) and (6).

Step 4) To build the first-order RNN from the description of weights and biases of the state neurons.

6.1. Example of insertion

To show how FSA can be inserted into first-order RNNs, Tomita’s 2nd grammar [16] has been chosen. The regular expression of the binary language generated by this grammar is (10)^*, and its associated automaton is illustrated in Fig. 6.

Fig. 6

Automaton that represents Tomita’s 2nd grammar.

Next, the steps to find a first-order RNN that recognizes the strings generated by Tomita’s 2nd grammar are detailed.

Step 1: An RNN with two state neurons (S₁, S₂) will be used to implement the automaton associated with Tomita’s 2nd grammar.

Step 2: The states of the automaton will be encoded with two values (S₁, S₂) in the following way (Fig. 7):

Fig. 7

Automaton that represents Tomita’s 2nd grammar with encoded states.

q₀ = (S₁ = 0, S₂ = 0) is the initial state, which corresponds to (S₁¹ = 0, S₂¹ = 0); q₀ ∈ F because S₁ ≤ 0.5.

q₁ = (S₁ = 1, S₂ = 1); q₁ ∉ F because S₁ > 0.5.

q₂ = (S₁ = 1, S₂ = 0); q₂ ∉ F because S₁ > 0.5.

According to this encoding, the transition functions δ_j of the automaton are given in Table 1.

Table 1

Transition functions of the automaton with the encoded states

Inputs			Function δ₁	Function δ₂
S₁^t	S₂^t	I^t	S₁^{t+1}	S₂^{t+1}
0	0	0	1	0
0	0	1	1	1
1	1	0	0	0
1	1	1	1	0
1	0	0	1	0
1	0	1	1	0
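Steps 2 and 3.1 are mechanical once the encoding is fixed: column j of Table 1 is the j-th component of the code of the target state. The sketch below rebuilds Table 1 this way. Since Fig. 6 is not reproduced in the text, it assumes the usual automaton for (10)^*, in which q₀ goes to q₁ on ‘1’, q₁ returns to q₀ on ‘0’, and every other transition leads to the dead state q₂; the helper `neuron_tables` is hypothetical.

```python
# Usual automaton for (10)*: q0 (initial, accepting) -1-> q1 -0-> q0;
# every other transition goes to the dead state q2.
M_DELTA = {
    ("q0", 0): "q2", ("q0", 1): "q1",
    ("q1", 0): "q0", ("q1", 1): "q2",
    ("q2", 0): "q2", ("q2", 1): "q2",
}
# Encoding of Step 2: state -> (S1, S2).
ENCODING = {"q0": (0, 0), "q1": (1, 1), "q2": (1, 0)}


def neuron_tables(delta, encoding):
    """Map (S1^t, S2^t, I^t) to (S1^{t+1}, S2^{t+1}); column j is delta_j."""
    codes = list(encoding.values())
    if len(set(codes)) != len(codes):  # Step 2 requires a unique code per state
        raise ValueError("two states share the same code")
    return {encoding[q] + (i,): encoding[target] for (q, i), target in delta.items()}


if __name__ == "__main__":
    print("S1^t S2^t I^t | S1^t+1 S2^t+1")
    for (s1, s2, i), (n1, n2) in neuron_tables(M_DELTA, ENCODING).items():
        print(f" {s1}    {s2}    {i}   |   {n1}      {n2}")
```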

Step 3: In this step, each function δ_j is represented with the neuron S_j, for j = 1,2. This is carried out by using the steps 3.1, 3.2 and 3.3 of the algorithm described in Section 6.

Case j = 1, neuron S₁ to represent function δ₁

Step 3.1) The action of the neuron S₁ can be described with discrete values as: $\begin{matrix} {RB}_{1}) When I^{t} = 0 \\ If S_{2}^{t} = 0 then S_{1}^{t + 1} = 1 \\ If S_{2}^{t} = 1 then S_{1}^{t + 1} = 0 \\ {RB}_{2}) When I^{t} = 1, S_{1}^{t + 1} = 1 \end{matrix}$

Step 3.2) The previous behavior can be approximated with the following TSK fuzzy rule bases: $\begin{matrix} {RB}_{1}) When I^{t} = 0 \\ If S_{2}^{t} is \approx < 0.5 then S_{1}^{t + 1} = 1 \\ If S_{2}^{t} is \approx > 0.5 then S_{1}^{t + 1} = 0 \\ {RB}_{2}) When I^{t} = 1 \\ If S_{2}^{t} is \approx < B then S_{1}^{t + 1} = 1 \\ If S_{2}^{t} is \approx > B then S_{1}^{t + 1} = 0 \end{matrix}$

As S₂^t ∈ (0, 1), if B ∈ ℜ is much greater than 1 (B ≫ 1), then the first rule in RB₂ is always activated. Thus, the output of RB₂ is always S₁^{t+1} = 1.

This fuzzy system with linguistic propositions is equivalent to the following one with sigmoid functions:

$\begin{matrix} {RB}_{1}) When I^{t} = 0 \\ If S_{2}^{t} is sigm ((- 10) \times (x - 0.5)) then S_{1}^{t + 1} = 1 \\ If S_{2}^{t} is not sigm ((- 10) \times (x - 0.5)) then S_{1}^{t + 1} = 0 \\ {RB}_{2}) When I^{t} = 1 \\ If S_{2}^{t} is sigm ((- 10) \times (x - B)) then S_{1}^{t + 1} = 1 \\ If S_{2}^{t} is not sigm ((- 10) \times (x - B)) then S_{1}^{t + 1} = 0 \end{matrix}$

A weight value w_ij = −10 has been used in these sigmoid functions. Its absolute value (|w_ij| = 10) is large enough to consider that the membership functions “x_i is [not] sigm(x × w_ij + b_ij)” have low uncertainty.

These TSK fuzzy rule bases are equivalent to the following action for neuron S₁:

• When I^t = 0, $S_{1}^{t + 1} = sigm (S_{1}^{t} \times 0 + S_{2}^{t} \times (- 10) + 5)$. (19)

• When I^t = 1, $S_{1}^{t + 1} = sigm (S_{1}^{t} \times 0 + S_{2}^{t} \times (- 10) + 10 \times B)$. (20)
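For a single antecedent, this equivalence can be checked numerically under two assumptions that are not spelled out in this part of the text: the negated proposition “x is not A” has membership 1 − μ_A(x), and the rule base output is the firing-strength-weighted average of the constant consequents (zero-order TSK). The function `tsk_output` is illustrative only, and B = 9 is the value adopted later in this section.

```python
import math


def sigm(x: float) -> float:
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def tsk_output(x: float, w: float, center: float) -> float:
    """Zero-order TSK output of the two rules
         If x is sigm(w * (x - center)) then 1
         If x is not sigm(w * (x - center)) then 0
    assuming 'not' is the complement 1 - mu and weighted-average defuzzification."""
    mu = sigm(w * (x - center))   # firing strength of the first rule
    mu_not = 1.0 - mu             # firing strength of the negated rule
    return (mu * 1.0 + mu_not * 0.0) / (mu + mu_not)


if __name__ == "__main__":
    B = 9.0  # value of B adopted later in this section
    # Membership degrees at the discrete values illustrate the low uncertainty for |w| = 10.
    print("mu(0) =", round(sigm(-10 * (0 - 0.5)), 4), " mu(1) =", round(sigm(-10 * (1 - 0.5)), 4))
    for s2 in (0.0, 0.25, 0.5, 0.75, 1.0):
        rb1, eq19 = tsk_output(s2, -10.0, 0.5), sigm(-10.0 * s2 + 5.0)
        rb2, eq20 = tsk_output(s2, -10.0, B), sigm(-10.0 * s2 + 10.0 * B)
        print(f"S2={s2:.2f}  RB1={rb1:.4f} (19)={eq19:.4f}  RB2={rb2:.4f} (20)={eq20:.4f}")
```

Since (−10) × (x − c) = −10x + 10c, the rule base with center 0.5 reproduces (19) and the one with center B reproduces (20).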

From (19) and (20), the bias b₁ and weight v₁ for neuron S₁ can be calculated according to Equation (5):

$\begin{matrix} When I^{t} = 0, 0 \times v_{1} + b_{1} = 5 \Rightarrow b_{1} = 5 \\ When I^{t} = 1, 1 \times v_{1} + b_{1} = v_{1} + 5 = 10 \times B \Rightarrow v_{1} = 10 \times B - 5 \end{matrix}$

A value B ≫ 1 can be, for example, B = 9. Hence, v₁ = 10 × B − 5 = 90 − 5 = 85.

Step 3.3) From the previous steps it can be concluded that the action of the neuron S₁ is:

$S_{1}^{t + 1} = sigm (S_{1}^{t} \times 0 + S_{2}^{t} \times (- 10) + I^{t} \times 85 + 5)$. (21)

This neuron implements the function δ₁.
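Equation (21) can be checked against the δ₁ column of Table 1. The sketch below recomputes v₁ from B = 9, evaluates the neuron for each row, and confirms that rounding its output reproduces δ₁, with every output within 0.01 of the target value. The constant and function names are illustrative.

```python
import math


def sigm(x: float) -> float:
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


B = 9.0                      # value chosen in the text (B >> 1)
B1 = 5.0                     # bias b1
V1 = 10.0 * B - 5.0          # input weight v1 = 10 * B - 5
W11, W12 = 0.0, -10.0        # recurrent weights from S1 and S2


def neuron_s1(s1: float, s2: float, i: int) -> float:
    """Action of neuron S1 according to (21)."""
    return sigm(W11 * s1 + W12 * s2 + V1 * i + B1)


# delta_1 column of Table 1: (S1^t, S2^t, I^t) -> S1^{t+1}
DELTA1 = {(0, 0, 0): 1, (0, 0, 1): 1, (1, 1, 0): 0,
          (1, 1, 1): 1, (1, 0, 0): 1, (1, 0, 1): 1}

if __name__ == "__main__":
    print("v1 =", V1)
    for (s1, s2, i), expected in DELTA1.items():
        y = neuron_s1(s1, s2, i)
        print(f"S1={s1} S2={s2} I={i}: output {y:.4f}, delta_1 = {expected}")
```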

Case j = 2, neuron S₂ to represent function δ₂

Step 3.1) The action of the neuron S₂ can be described with discrete values as: $\begin{matrix} {RB}_{1}) When I^{t} = 0, S_{2}^{t + 1} = 0 \\ {RB}_{2}) When I^{t} = 1 \\ If S_{1}^{t} = 0 then S_{2}^{t + 1} = 1 \\ If S_{1}^{t} = 1 then S_{2}^{t + 1} = 0 \end{matrix}$

Step 3.2) The previous behavior can be approximated with the following TSK fuzzy rule bases:

$\begin{matrix} {RB}_{1}) When I^{t} = 0 \\ If S_{1}^{t} is \approx < C then S_{2}^{t + 1} = 1 \\ If S_{1}^{t} is \approx > C then S_{2}^{t + 1} = 0 \\ {RB}_{2}) When I^{t} = 1 \\ If S_{1}^{t} is \approx < 0.5 then S_{2}^{t + 1} = 1 \\ If S_{1}^{t} is \approx > 0.5 then S_{2}^{t + 1} = 0 \end{matrix}$

As S₁^t ∈ (0, 1), if C ∈ ℜ is much smaller than 0 (C ≪ 0), then the second rule in RB₁ is always activated. Thus, the output of RB₁ is always S₂^{t+1} = 0.

This fuzzy system with linguistic propositions is equivalent to the following one with sigmoid functions: $\begin{matrix} {RB}_{1}) When I^{t} = 0 \\ If S_{1}^{t} is sigm ((- 10) \times (x - C)) then S_{2}^{t + 1} = 1 \\ If S_{1}^{t} is not sigm ((- 10) \times (x - C)) then S_{2}^{t + 1} = 0 \\ {RB}_{2}) When I^{t} = 1 \\ If S_{1}^{t} is sigm ((- 10) \times (x - 0.5)) then S_{2}^{t + 1} = 1 \\ If S_{1}^{t} is not sigm ((- 10) \times (x - 0.5)) then S_{2}^{t + 1} = 0 \end{matrix}$

In this case too, the absolute value of the weights of the propositions (|w_ij| = 10) is large enough to consider that the membership functions have low uncertainty.

These TSK fuzzy rule bases are equivalent to the following action for neuron S₂:

• When I^t = 0, $S_{2}^{t + 1} = sigm (S_{1}^{t} \times (- 10) + S_{2}^{t} \times 0 + 10 \times C)$. (22)

• When I^t = 1, $S_{2}^{t + 1} = sigm (S_{1}^{t} \times (- 10) + S_{2}^{t} \times 0 + 5)$. (23)

From (22) and (23), the bias b₂ and weight v₂ for neuron S₂ can be calculated according to Equation (5):

$\begin{matrix} When I^{t} = 0, 0 \times v_{2} + b_{2} = 10 \times C \Rightarrow b_{2} = 10 \times C \\ When I^{t} = 1, 1 \times v_{2} + b_{2} = v_{2} + 10 \times C = 5 \Rightarrow v_{2} = 5 - 10 \times C \end{matrix}$

A value C ≪ 0 can be, for example, C = −8. Hence, b₂ = 10 × C = −80 and v₂ = 5 − 10 × C = 85.

Step 3.3) From the previous steps it can be concluded that the action of the neuron S₂ is:

$S_{2}^{t + 1} = sigm (S_{1}^{t} \times (- 10) + S_{2}^{t} \times 0 + I^{t} \times 85 - 80)$. (24)

This neuron implements the function δ₂.
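The same check applies to Equation (24) and the δ₂ column of Table 1. The sketch recomputes b₂ and v₂ from C = −8 and confirms that the rounded neuron output reproduces δ₂. The constant and function names are illustrative.

```python
import math


def sigm(x: float) -> float:
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


C = -8.0                     # value chosen in the text (C << 0)
B2 = 10.0 * C                # bias b2 = 10 * C
V2 = 5.0 - 10.0 * C          # input weight v2 = 5 - 10 * C
W21, W22 = -10.0, 0.0        # recurrent weights from S1 and S2


def neuron_s2(s1: float, s2: float, i: int) -> float:
    """Action of neuron S2 according to (24)."""
    return sigm(W21 * s1 + W22 * s2 + V2 * i + B2)


# delta_2 column of Table 1: (S1^t, S2^t, I^t) -> S2^{t+1}
DELTA2 = {(0, 0, 0): 0, (0, 0, 1): 1, (1, 1, 0): 0,
          (1, 1, 1): 0, (1, 0, 0): 0, (1, 0, 1): 0}

if __name__ == "__main__":
    print("b2 =", B2, " v2 =", V2)
    for (s1, s2, i), expected in DELTA2.items():
        y = neuron_s2(s1, s2, i)
        print(f"S1={s1} S2={s2} I={i}: output {y:.4f}, delta_2 = {expected}")
```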

Step 4: In this step, a first-order RNN is built from the weights and biases of the state neurons calculated in the previous steps.

From (21) and (24), the first-order RNN that implements Tomita’s 2nd grammar can be built. It is illustrated in Fig. 8.

Fig. 8

RNN that implements Tomita’s 2nd grammar.
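As a final check, the network can be simulated directly from (21) and (24). The sketch below starts from the initial state q₀ = (0, 0), updates both neurons with the values at time t, accepts when S₁ ≤ 0.5, and compares the result with the regular expression (10)^* for every binary string up to length 10. The function name `rnn_accepts` is illustrative.

```python
import math
import re
from itertools import product


def sigm(x: float) -> float:
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def rnn_accepts(string: str) -> bool:
    """Run the first-order RNN given by (21) and (24); accept when S1 <= 0.5."""
    s1, s2 = 0.0, 0.0  # initial state q0 = (S1 = 0, S2 = 0)
    for ch in string:
        i = int(ch)
        # Both neurons use the values at time t, so they are updated together.
        s1, s2 = (sigm(-10.0 * s2 + 85.0 * i + 5.0),    # neuron S1, equation (21)
                  sigm(-10.0 * s1 + 85.0 * i - 80.0))   # neuron S2, equation (24)
    return s1 <= 0.5


if __name__ == "__main__":
    mismatches = []
    for n in range(11):
        for bits in product("01", repeat=n):
            s = "".join(bits)
            if rnn_accepts(s) != bool(re.fullmatch(r"(10)*", s)):
                mismatches.append(s)
    print("mismatches with (10)* up to length 10:", mismatches)
```

This prints an empty list: the neuron outputs stay within about 0.01 of 0 or 1 on every string, so rounding never crosses the 0.5 threshold.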

7. Conclusions

Methods to understand feedforward artificial neural networks by means of fuzzy rule-based systems have recently been published. These works have been used here to obtain procedures that extract FSA from first-order RNNs that infer regular grammars, and that insert FSA into first-order RNNs. In this way, it is not necessary to carry out a search process in the output domain of the neurons to find the states of the automaton.

The similarities between first-order RNNs and FSA have been analyzed. The actions of the neurons of a first-order RNN represent the transition function of an automaton. Thus, a description of this function is obtained when the dynamics of each neuron are interpreted by means of fuzzy rules. Once the transition function is obtained, it is easy to find the automaton structure by analyzing the transitions for every state and input from the initial state.

Several examples have been used to illustrate the presented methods. New options to extract FSA from RNNs and to insert FSA into RNNs are now available.

As future research, we propose adapting the rule extraction method for first-order RNNs to other types of RNNs. In particular, we think it would be interesting to apply the presented method to understand the recurrent structures used in the field of deep learning [27].


Acknowledgments

This work has been supported by the Spanish “Ministerio de Economía y Competitividad” and by “Fondo Europeo de Desarrollo Regional” (FEDER) under Project TEC2015-69496-R.

References

1. Alquézar and Sanfeliu, “An algebraic framework to represent finite state machines in single-layer recurrent neural networks”, Neural Computation 7(5) (1995), 931–949.

2. Rafael C. Carrasco and Mikel L. Forcada, “Simple Strategies to Encode Tree Automata in Sigmoid Recursive Neural Networks”, IEEE Transactions on Knowledge and Data Engineering 13(2) (2001), 148–156.

3. Noam Chomsky, “Three Models for the Description of Language”, IRE Transactions on Information Theory 2(2) (1956), 113–123.

4. J.L. Elman, “Finding structure in time”, Cognitive Science 14(2) (1990), 179–211.

5. Lee Giles, C.B. Miller, Chen, H.H. Chen, G.Z. Sun and Y.C. Lee, “Learning and extracting finite state automata with second-order recurrent neural networks”, Neural Computation 4(3) (1992), 393–405.

6. Hopcroft and J.D. Ullman, “Introduction to Automata Theory, Languages, and Computation”, Reading, MA: Addison-Wesley, (1979).

7. Jacobsson, “Rule Extraction from Recurrent Neural Networks: A Taxonomy and Review”, Neural Computation 17(6) (2005), 1223–1263.

8. Kolman and Margaliot, “Are neural networks white boxes?”, IEEE Transactions on Neural Networks 16(4) (2005), 844–852.

9. Kolman and Margaliot, “Extracting Symbolic Knowledge from Recurrent Neural Networks – A Fuzzy Logic Approach”, Fuzzy Sets and Systems 160 (2009), 145–161.

10. C.J. Mantas and J.M. Puche, “Artificial Neural Networks are zero-order TSK Fuzzy Systems”, IEEE Transactions on Fuzzy Systems 16(3) (2008), 630–643.

11. C.J. Mantas, “A generic fuzzy aggregation operator: Rules extraction from and insertion into artificial neural networks”, Soft Computing 12(5) (2008), 493–514.

12. Christian W. Omlin and Lee Giles, “Constructing deterministic finite-state automata in recurrent neural networks”, Journal of the ACM (JACM) 43(6) (1996), 937–972.

13. Christian W. Omlin and Lee Giles, “Rule Revision With Recurrent Neural Networks”, IEEE Transactions on Knowledge and Data Engineering 8(1) (1996), 183–188.

14. C.W. Omlin, K.K. Thornber and C.L. Giles, “Fuzzy finite-state automata can be deterministically encoded in recurrent neural networks”, IEEE Transactions on Fuzzy Systems 5 (1998), 76–89.

15. Takagi and Sugeno, “Fuzzy identification of systems and its application to modeling and control”, IEEE Transactions on Systems, Man and Cybernetics 15(1) (1985), 116–132.

16. Tomita, “Dynamic construction of finite-state automata from examples using hillclimbing”, in ‘Proceedings of Fourth Annual Cognitive Science Conference’, Ann Arbor, MI, (1982), 105–108.

17. Harshit Jain and Nuzhat Fatema, “Layer recurrent neural network based intelligent user activity classification model using smartphone”, Journal of Intelligent & Fuzzy Systems 35 (2018), 1–13.

18. Yi-Jen Mon and Chih-Min Lin, “Image processing based obstacle avoidance control for mobile robot by recurrent fuzzy neural network”, Journal of Intelligent & Fuzzy Systems 26 (2014), 2747–2754.

19. Mohammad Bagher Khodabakhshi, Mohammad Moradi, Zahra Momayez Sanat and Pooria Jafari Moghadam Fard, “Lung sound decomposition using recurrent fuzzy wavelet network”, Journal of Intelligent & Fuzzy Systems 33 (2017), 1–12.

20. Vinayakumar, K.P. Soman, Prabaharan Poornachandran and Sachin Kumar S, “Evaluating deep learning approaches to characterize and classify the DGAs at scale”, Journal of Intelligent & Fuzzy Systems 34 (2018), 1265–1276.

21. Jose de Jesus Rubio, Edwin Lughofer, Jesus Meda Campana, Luis Paramo Carranza, Juan Francisco Novoa and Jaime Pacheco, “Neural network updating via argument Kalman filter for modeling of Takagi-Sugeno fuzzy models”, Journal of Intelligent & Fuzzy Systems 35(2) (2018), 2585–2596.

22. Meng, Shi and Yao, “An inequality approach for evaluating decision making units with a fuzzy output”, Journal of Intelligent & Fuzzy Systems 34(1) (2018), 459–465.

23. Jose de Jesus Rubio, “Stable Kalman filter and neural network for the chaotic systems identification”, Journal of the Franklin Institute 354(16) (2017), 7444–7462.

24. Cheng, Prayogo and Wu, “Prediction of permanent deformation in asphalt pavements using a novel symbiotic organisms search-least squares support vector regression”, Neural Computing and Applications (2018).

25. Jose de Jesus Rubio, “SOFMLS: Online self-organizing fuzzy modified least-squares network”, IEEE Transactions on Fuzzy Systems 17(6) (2009), 1296–1309.

26. Zhang and Han, “State Estimation for Static Neural Networks With Time-Varying Delays Based on an Improved Reciprocally Convex Inequality”, IEEE Transactions on Neural Networks and Learning Systems 29(4) (2018), 1376–1381.

27. Schmidhuber, “Deep Learning in Neural Networks: An Overview”, Neural Networks 61 (2015), 85–117.