Abstract
First-order recurrent neural networks can be trained to recognize strings of a regular language. Finite state automata can be extracted from these neural networks. Normally, a search process in the output domain of the neurons is necessary for carrying out this extraction procedure. On the other hand, studies about fuzzy rules extraction from feedforward multilayered neural networks can be considered to define new techniques that transform first-order recurrent neural networks into finite state automata. With these new techniques, a fuzzy description of the action of each neuron can be obtained. From these descriptions, the transition function of the automaton can be directly found and, in this way, the search process is not necessary. A technique with this approach is presented in this paper. Besides, the used method to extract fuzzy rules from a neuron has the advantage that the inputs of the fuzzy system coincide with the inputs of the neuron. Thus, the fuzzy system is more intuitive. Once the transition function is obtained, the automaton structure can be found with the analysis of the transitions for every state and input from the initial state. Finally, several examples are presented to illustrate the method.
Introduction
In 1956 Chomsky classified the languages in terms of generative grammars [3]. The type of simplest grammar is regular grammar. The languages that are described with such grammars are called regular languages, which can be recognized by Finite State Automata (FSA) [6]. These formal machines are useful for designing and verifying digital circuits, carrying out the lexical analysis of a programming language, exploring and acting on large text files, …
On the other hand, in the early 1990 s, Elman [4] introduced his well-known simple recurrent network. The connection between finite state machines and neural networks was again established. He compared the internal activations of the networks with the states of a finite state machine. Since this moment, an abundance of articles has been written on Recurrent Neural Networks (RNNs) and their connection with the state machines. Many references and revision can be found in [7]. Actually, RNNs have been applied in many practical problems [17–26].
Several articles describe techniques to transform RNNs into FSA [1, 12–14]. In these works, it is necessary to carry out a search process in the output domain of the neurons to find the states of the FSA. This procedure has several problems. Normally, the search process is based on dividing the output of each state neuron of the RNN into q intervals or looking for clusters in the state space. Since many partition values of q are available or many clusters can be found, many FSA can be extracted. How to obtain the best finite state automaton?.
Actually, the transformation of RNNs into FSA can be done in a different way. This is possible thanks to the progresses about fuzzy rules extraction from Multilayer Feedforward Neural Networks (MFNN) [9, 10]. If the action of each neuron of the RNN is understood by means of a fuzzy rule system, a comprehensible description of the transition function of the automaton learnt by the trained RNN can be obtained. In this way, it is avoided the search process that is necessary to transform RNNs into FSA in other methods.
Regarding the methods to understand the action of a neuron by means of fuzzy rules. in [9] fuzzy rules are extracted from a RNN to extract the associated automaton. In this case, the way of obtaining rules is inspired by [8] where fuzzy rules are obtained from Multilayer Feedforward Neural Networks (MFNN). In this method, the inputs of the fuzzy rules are the inner product between the system inputs and the corresponding weight vector. An improvement of this method was presented in [10] where the fuzzy rules obtained from a MFNN directly use the system inputs. The action of a hidden neuron can be understood by analyzing two simple Takagi-Sugeno-Kang Fuzzy (TSK) fuzzy rules [15]. This fact facilitates the analysis of the acquired knowledge by a network.
In this way, if the extraction method presented in [10] is used to extract FSA from first-order RNNs, then the description of the transition function of the automaton can be obtained by means of a fuzzy rule system that uses the original inputs. Besides, it is not necessary a search process in the output domain of the neurons.
The methodology to transform first-order RNNs into FSA is as follow:
Let us suppose that a fisrt-order RNN has been trained to recognize strings of a regular language L, then: The values of its neurons represent the states of the automaton M associated with L. The action of these neurons implements the transition function of M.
Step 1) To apply the method presented in [10] on each neuron of the RNN. Thus, a fuzzy description of the transition function of M is obtained.
Step 2) To get the structure of the automaton by analyzing transitions for every state and input from the initial state, by using the previous fuzzy description.
As complement to the previous extraction methodology, the techniques presented in [11] are also applied on first-order RNNs. In [11], fuzzy knowledge is inserted into MFNNs. Hence, a method to insert FSA into first-order RNNs can be defined if the insertion method presented in [11] is applied on first-order RNNs. This will be illustrated with an example.
The structure of the paper is as follows. Initially, several concepts are introduced: TSK Fuzzy Rule Based Systems (TSK FRBSs) and extraction of TSK fuzzy systems from neural networks. Next, the relations between FSA and RNNs are exposed which allow to present the two applications of this work: extraction of transition functions from first-order RNNs and insertion of FSA into first-order RNNs. They are illustrated by using Tomita’s grammars. Finally, some conclusions are described.
Zero-order TSK fuzzy rule-based systems
Commonly, the zero-order TSK FRBSs [15] have the rule structure that is introduced in the following definition.
A FRBS with the following rule structure is named as zero-order TSK FRBS:
This zero-order TSK FRBS computes its output according to the following definition.
In the following Sections, the fuzzy rules will be implemented using the product operator as t-norm T.
Next, the definition of a hidden neuron (Definition 3) and its equivalence with a TSK FRBS (Proposition 1) are presented.
The action of hidden neuron h
j
of a MFNN follows the given expression:
The expression (2) can be rewritten as:
The proof of Proposition 1 can be found in [10].
The fuzzy rules of the system presented in Proposition 1 use the membership function sigm(x × w ij + b ij ) . This function has an easy linguistic interpretation, which can be used in the insertion process of an automaton into a RNN. In this process, an expert must provide an initial description of the transition function. This expert can use linguistic terms instead of sigmoidal functions to describe the transitions.
The function
When w
ij
>0.0, the proposition “x
i
is sigm (w
ij
× (x + (b
ij
/w
ij
))) ” can be interpreted as “x is approximately greater than (- b
ij
/w
ij
) ” “x is ≈ > (- b
ij
/w
ij
) ”. The corresponding fuzzy set is shown in Fig. 1.
“x is approximately greater than (- b
ij
/w
ij
)”. When w
ij
< 0.0, the proposition “xi is sigm (w
ij
(x + (b
ij
/w
ij
)))” can be interpreted as

“x is not approximately greater than (- b ij /w ij )”
≡ “x is approximately smaller than (- b ij /w ij )”
“x is ≈ < (- b ij /w ij )”.
The corresponding fuzzy set is depicted in Fig. 2.

“x is approximately smaller than (- b ij /w ij )”.
The degree of uncertainty established by the term “approximately” is determined by the slope of the sigmoid function. If the factor |w ij | is high, then the sigmoid function has a sharp slope and the defined fuzzy set is practically crisp. If |w ij | is small, then the fuzzy set has a high uncertainty. This last class of propositions will be avoided by using a simplification process of the fuzzy systems. This is exposed in the following section.
According to Proposition 1, the action of a hidden neuron hj represented by (3) is equivalent to a system with two fuzzy rules. Each rule uses n propositions “x
i
is [not] sigm (xw
ij
+ b
ij
) ″i = 1, . . . , n. This number of propositions can be reduced. The idea is to take advantage of the freedom to distribute the bias b
j
among the variables b
ij
in (3). This reduction process can be explained with the following points: The membership functions with small slope (|w
ij
| is small) produce propositions with high uncertainty. This kind of functions are candidate for the simplification process by means of the next steps. The membership degree of the propositions “x
i
is sigm(x × w
ij
+ b
ij
)” can be almost 0.0 or 1.0 in the domain of the variable x
i
. It is only necessary to adjust the value b
ij
in the following way: Let us suppose a particular variable x
c
with c∈ { 1, …, n } and x
c
∈ [a1, a2] , a1, a2 ∈ ℜ, a1a2. Because sigm (+ 8) >>1.0 and sigm (-8) >>0.0, the proposition “x
c
is sigm (x × w
cj
+ b
cj
)” has degrees almost 1.0 when:
(x × w
cj
+ b
cj
) ∈ [8, 8 + (a2 - a1) × |w
cj
|] if proposition has degrees ≈ 1.0. (x × w
cj
+ b
cj
) ∈ [-8 - (a2 - a1) × |w
cj
|, -8] if proposition has degrees ≈ 0.0. The input of the function sigm(x × w
ij
+ b
ij
) is replaced by a constant value in the fuzzy propositions with degrees almost 0.0 or 1.0. This constant is equal to the mean value of the domain of the sigmoid function. For example, the previous proposition “x
c
is sigm(x × w
cj
+ b
cj
)” is replaced by: “x
c
is sigm (8 + [(a2 - a1)/2] × |w
cj
|)” if proposition has degrees almost 1.0. •“x
c
is sigm (-8 - [(a2 - a1)/2] × |w
cj
|) ” if proposition has degrees almost 0.0. The fuzzy propositions with constant values can be discarded from the rules. In order to make this, the original bias bj of the hidden neuron is modified. In the previous example, the new expression for the action of the hidden neuron is:
Starting from expression (4), a new TSK FRBS with two rules is calculated by using Proposition 1. This system will have (n-1) fuzzy propositions.
Next, the definitions of first-order RNN and finite state automaton are presented. If these definitions are compared, then relations between both concepts can be found. These associations will be useful to extract FSA from first-order RNNs.
First-order RNNs
The network has m state neurons (S1, . . . , S
m
). It has one binary input II{ 0, 1 } The RNN’s dynamics is given by
First-order recurrent neural network. The network initial conditions are The neuron S1 is considered as output neuron, if S1 end≤ 0.5 the input string is accepted, otherwise it is rejected.

In order to facilitate the future rule extraction process, Equation (5) can be rewritten as:
A finite state automaton is a 5-tuple M = (Q, X, d, q0, F) , where: Q is a finite set of states. X is the finite input alphabet. d : Q′X → Q is the transition function. q0 is the initial state. F ⊆ Q is a set of accepting states.
A string u is accepted by a finite state automaton M if the state that is reached after u has been read by M is an accepting state. Regular languages can be recognized by FSA. There is an associated finite state automaton M for each regular language L such that a string u is accepted by M iff u ∈ L.
For a full discussion of what a regular language is and what other classes of languages there are, interested readers are referred to [6].
Relations between first-order RNNs and FSA
When a first-order RNN is trained to recognize strings of a regular language, several relations appear between the automaton associated with the language and the first-order RNN. These relations are: The finite set of states Q of the automaton is encoded by the state neurons of the first-order RNN. Each state q ∈ Q of the automaton is represented by the m output values of the network neurons, that is, The finite input alphabet X of the automaton coincides with the domain of the input I of the first-order RNN, that is, I ∈ { 0, 1 } = X. The transition function of the automaton (d : Q′X → Q) is represented in the first-order RNN as:
This representation is equivalent to m functions δ
j
(j = 1, …, m):
Each function δj is implemented in the first-order RNN by the action of a S
j
state neuron. This action is determined by the weights and biases of the neuron. It is interesting to notice the similarity between the expression (8) and the Equation (6). The initial state q0 of the automaton is encoded with the initial conditions of the first-order RNN, that is,
The set of accepting states F⊆ Q is encoded with the output neuron S1, that is,
Let us suppose that a first-order RNN has been trained to recognize a regular language L. If the action of each state neuron S j of this first-order RNN is understood by means of a TSK fuzzy system, it can be obtained a description of each transition function δ j of the automaton M associated with the language L. This fact is possible by using the following proposition:
“not Aj “ [It is not sigm (x × v
j
+ b0j)] “A
j
“ [It is sigm(x× vj+b0j)]
The proof is direct by applying Proposition 1 on a hidden neuron with n + 1 inputs (x0, x1, …, x
n
), where x0 = I
t
and x
i
=
If a fuzzy description is obtained for all the state neurons S j (j = 1, …, m), the transition function δ of the automaton M is obtained. In this moment, it is easy to get the total structure of M from the initial state. This is done with the analysis of all the possible transitions for every input symbol and state from the initial state.
Next, the steps of this process are algorithmically detailed:
Input: First-order RNN trained to recognize a regular language L. Output: Minimal finite state automaton M that accepts the strings of the regular language L.
Step 1) For each state neuron S
j
(j = 1, …, m) of the first-order RNN:
Step 1.1) To extract a TSK FRBS with two rules from the neuron Sj using the Proposition 2. This FRBS represents the function δ
j
. Step 1.2) To simplify the fuzzy propositions of the TSK system by means of the procedure described in Section 3.1. Step 1.3) To rewrite the fuzzy rules by considering discrete the inputs of the fuzzy system. These inputs are the output values of the state neurons Sj and the input I in the time t. Step 2) Starting from the initial state S
j
(j = 1, …, m), to analyze the structure of the automaton for every input and state by using the discrete descriptions of the transition functions δj obtained in the previous step 1.3. Step 3) To minimize the automaton of the Step 2 with a standard FSA minimization algorithm [6].
Example of extraction
In order to show as FSA can be extracted from first-order RNNs, Tomita’s 4th grammar has been chosen [16]. This grammar generates the language L composed by the binary strings not containing “000” as a substring. Its automaton is illustrated in Fig. 4.

Automaton that represents tomita’s 4th grammar.
In [9], a first-order RNN with three state neurons (S1, S2, S3) was trained to recognize the language generated by Tomita’s 4th grammar. Details on the training conditions of this network can be found in [9]. A first-order RNN, trained with the same conditions, that classifies all the strings of the language generated by Tomita’s 4th grammar has the following dynamics [9]:
Once that a first-order RNN with three state neurons has been trained to recognize the strings generated by Tomita’s 4th grammar, the following steps are carried out to obtain the automaton:
State neuron S3
The following TSK fuzzy system can be extracted from (14) which describes the function δ3:
This system can be linguistically interpreted as:
Step 1.2) In this case, there is not simplification process.
Step 1.3) As I t ∈ { 0, 1 } , the system can be approximated by the following discrete system:
Intuitively, this discrete system indicates that the neuron S3 is only activated when the input ‘0’ is presented.
State neuron S2
The following TSK fuzzy system can be extracted from (15) which describes the function δ2:
As (0.2(x+40))∈ [8, 8.2] and ((-0.2)(x+40))∈ [-8.2, -8], the terms of the sigmoid functions can be replaced by the mean value of these domains. So, the propositions ” and ”S2 t is sigm(−8.1)” are obtained.
With these new propositions the TSK fuzzy system is:
If this fuzzy system is transformed into neuron, a new simplified neuron S2 is got:
Hence, a new fuzzy description of the neuron S2 can be achieved by using Rule Bases (RBs) composed by two rules:
This system can be linguistically interpreted as:
If the output values of the neuron S3 are considered discrete in 0,1, the proposition “S3 t is ≈ > 1.38” is always false and “S3 t is ≈ < 1.38” is always true. Therefore, rule base RB2 always returns S2 t + 1 = 0.
The complete discrete system is:
Intuitively, this discrete system indicates that the neuron S2 is activated when the input ‘0’ is presented and S3 is active (input ‘0’ was presented in the time (t−1)). In other words, the neuron S2 is “on” when the substring “00” is being presented.
State neuron S1
The following TSK fuzzy system can be extracted from (17) which describes the function δ1:
As (0.2(x + 40)) ∈ [8,8.2, 8,8.2], the term of the sigmoid function is replaced by the mean value of this domain. The proposition ”S3 t is sigm(8.1)” is obtained. From this new proposition, the TSK fuzzy system is:
This fuzzy system is transformed into neuron, a new simplified neuron S1 is obtained:
Now, the behavior of this system can be analyzed when the output of the neuron S1 t is discrete in {0,1} or when the output of the neuron S2 t is discrete in {0,1} The final discrete system is the same in an independent way of the chosen neuron. The neuron S2 t will be considered to analyze the system.
If it is supposed that S2
t
∈ {0,1}, the action of the neuron S1 is:
From this action the following fuzzy rule bases can be extracted:
If this system is analyzed by considering that S1t ∈ {0, 1}, the rule base RB1.2 is simplified, because the first rule of RB1.2 is always true and the second one is always false. Therefore, rule base of RB1.2 always returns S1t+1 = 1.
The following discrete system is obtained:
This discrete system is easily comprehensible. It can be summarized so:
Intuitively, this discrete system indicates that the neuron S1 with output ‘0’ only changes to ‘1’ when the input ‘0’ is presented and S2 is active (substring ‘00’ is being presented), that is, when substring ‘000’ appears. On the other hand, the neuron S1 with output ‘1’ (substring ‘000’ was presented) remains with this value.
Next, the rules extracted from the state neuron of a first-order RNN by applying the method presented in [9] will be described. In this way, these rules can be compared with the simple rules obtained from state neuron S1 with the extraction method presented in this work. In the extraction example exposed in [9], there is a state neuron with a similar action to the one of the state neuron S1. The rules extracted in [9] from the state neuron similar to S1 are:
It can be observed that these rules use the inner product of the inputs I in the antecedent and the inner product of the state neurons in the consequent. In this way, these rules are less comprehensible than the rules obtained from the state neuron S1 by using the extraction method presented in this paper. The rules extracted with the method of this work directly use the inputs of the system in the antecedent and the consequent is only composed by constant values.
In the previous steps, discrete systems that describe the functions δ1, δ2and δ3have been found. These are:
Function δ3
Function δ2
Function δ1
With all these functions δj, a description of the δ transition function of the automaton is easily got. Every state of the automaton can be visited. In order to do that, it is only necessary to analyze each transition for every input and state from the initial state.
For example, let us suppose that the state (S1t = 0,S2
t
= 0,S3
t
= 0) is being studied. Firstly, this state is accepting because S1
t
≤ 0.5. Now, the transitions from this state for every input I
t
∈ X must be analyzed: If I
t
= 0 then according function δ3, S3
t
+ 1 = 1, according function δ2, S2
t
+ 1 = 0 because S3
t
= 0, according function δ1, S1t + 1 = S1t = 0 because S2
t
= 0. Therefore, the state (S1t + 1 = 0, S2
t
+ 1 = 0, S3
t
+ 1= 1) is visited from the state (S1t = 0, S2
t
= 0, S3
t
= 0) with the input I
t
= 0. This state is also accepting one. If I
t
= 1then according function δ3, S3
t
+ 1 = 0, according function δ2, S2
t
+ 1 = 0, according function δ1,S1t + 1 = S1t = 0 because S2
t
= 0 and S1t = 0. Therefore, the automaton remains in the state (S1t + 1 = 0, S2
t
+ 1 = 0, S3
t
+ 1 = 0)if it is active the state (S1t = 0, S2
t
= 0, S3
t
= 0)and the input I
t
= 1 is read. This procedure is repeated for every new visited state and input I ∈ X. In this way, the automaton that is illustrated in Fig. 5 is obtained. It corresponds with the not minimal automaton that recognizes the language generated by Tomita’s 4th grammar.
Automaton extracted from the RNN. When the automaton of Fig. 5 is minimized, by using standard minimization algorithm [6], the minimal one associated with the Tomita’s 4th grammar (Fig. 4) is achieved. With this last step, it has been shown as the presented method can extract FSA from first-order RNNs trained with strings of a regular language.

The insertion of FSA into first-order RNNs is possible thanks to: The relation between the transition function of an automaton and the hidden neurons of a first-order RNN. The transformation of these hidden neurons into equivalent TSK fuzzy rules.
This process can be summarized in the following steps:
Input: Finite state automaton M that accepts strings of a regular language L.
Output: First-order RNN that recognizes the language L.
Step 1) To choose a number mof state neurons for the first-order RNN. This number mmust be sufficient to carry out the following step 2. Step 2) To encode univocally the states of M by using discrete values for the state neurons Sj (j = 1, …, m), Step 3) To configure the weights and biases of the RNN for representing the transition functions Sj (j = 1, …, m), with the action of each state neuron Sj (j = 1, …, m), that is:
Step 3.1) To describe the action of the neuron Sj with a discrete rule base system. This action must be equivalent to the transition function δj. Step 3.2) To translate the discrete system into a fuzzy system by using propositions with the format “xi is [not] sigm(x × wij + bij)” and the structure presented in Proposition 2. Step 3.3) To transform the FRBS obtained in the step 3.1 into the description of a hidden neuron Sj according (5) and (6). Step 4) To build the first-order RNN from the description of weights and biases of the state neurons.
Example of insertion
In order to show as FSA can be inserted into first-order RNNs, Tomita’s 2nd grammar has been chosen [16]. The regular expression of the binary language generated by this grammar is (10)* and its associated automaton is illustrated in Fig. 6.

Automaton that represents tomita’s 2nd grammar.
Next, the steps to find a first-order RNN that recognizes the strings generated by Tomita’s 2nd grammar are detailed.

Automaton that represents tomita’s 2nd grammar with encoded states.
q0 = (S1 = 0, S2 = 0), this is the initial state that corresponds with (S11 = 0, S21 = 0). q0∈ F because S1≤ 0.5. q1 = (S1 = 1, S2 = 1), q1 ∉ F because S1 >0.5 q2 = (S1 = 1, S2 = 0), q2 ∉ F because S1 >0.5
According the previous codification, the transition functions δj of the automaton are illustrated in Table 1.
Transition functions of the automaton with the encoded states
Case j = 1, neuron S1 to represent function δ1
As S2 t ∈ (0, 1), if B ∈ ℜ is very greater than 1(B ∈ 1), then the first rule in RB2 is always activated. Thus, the output of RB2 is always S1t + 1 = 1.
This fuzzy system with linguistic propositions is equivalent to the following one with sigmoid functions:
A weight value wij = (- 10) has been used in these sigmoid functions. The absolute value of this weight ( | wij | = 10) is sufficient to consider that the membership functions “xi is [not] sigm(x × wij + bij)” have low uncertainty.
These TSK fuzzy rule bases are equivalent to the following action for neuron S1:
From (19) and (20), the bias b1 and weight v1 for neuron S1 can be calculated according equation (5):
A value B > > 1 can be B = 9. So it is obtained that v1 = 10 × B - 5 = 90 - 5 = 85.
This neuron implements the function δ1.
Case j = 2, neuron S2 to represent function δ2
As S1t ∈ (0, 1), if C ∈ R is very smaller than 0 (C << 0), then the second rule in RB1 is always activated. Thus, the output of RB1 is always S2 t + 1 = 0.
This fuzzy system with linguistic propositions is equivalent to the following one with sigmoid functions:
Also, in this case the absolute value of the weights of the propositions (|wij | = 10) is sufficient to consider that the membership functions have low uncertainty.
These TSK fuzzy rule bases are equivalent to the following action for neuron S2 :
From (22) and (23), the bias b2 and weight v2 for neuron S2 can be calculated according Equation (5):
A value C << 0 can be C = (-8). So it is obtained that b2 = 10 × C = (-80) and v2 = 5 - 10 × C = 85.
This neuron implements the function δ2.
From (21) and (24), the first-order RNN that implements the Tomita’s 2nd grammar can be built. It is illustrated in Fig. 8.
RNN that implements the tomita’s 2nd grammar.
Methods to understand feedforward artificial neural networks by means of fuzzy rule based systems have been recently published. These works have been used to obtain procedures to extract FSA from first-order RNNs that infer regular grammars and to insert FSA into first-order RNNs. In this way, it is not necessary to carry out a search process in the output domain of the neurons to find the states of the automaton.
The similarities between first-order RNNs and FSA have been analyzed. The actions of the neurons of a first-order RNN represent the transition function of an automaton. Thus, the description of this function is obtained when the dynamic of each neuron is interpreted by means of fuzzy rules. Once the transition function is obtained, it is easy to find the automaton structure by analyzing transitions for every state and input from the initial state.
Several examples have been used to illustrate the presented methods. New options to extract FSA from RNNs and to insert FSA into RNNs are now available.
As future research, it is proposed the adaptation of the rule extraction method from first-order RNNs to be applied on other type of RNNs. In particular, we think that it can be interesting the application of the presented method to understand the recurrent structures used in the field of deep learning [27].
Footnotes
Acknowledgments
This work has been supported by the Spanish “Ministerio de Economía y Competitividad” and by “Fondo Europeo de Desarrollo Regional” (FEDER) under Project TEC2015-69496-R.
