Abstract
Lately, a lot of research has been conducted on the automatic design of artificial neural networks (ADANNs) using evolutionary algorithms, in the so-called neuro-evolutive algorithms (NEAs). Many of the presented proposals are not biologically inspired and are not able to generate modular, hierarchical and recurrent neural structures [25, 28, 26], such as those often found in living beings capable of solving intricate survival problems. Bearing in mind the idea that a nervous system’s design and organization is a constructive process carried out by genetic information encoded in DNA, this paper proposes a biologically inspired NEA that evolves ANNs using these ideas as computational design techniques. In order to do this, we propose a Lindenmayer System with memory that implements the principles of organization, modularity, repetition (multiple use of the same sub-structure), hierarchy (recursive composition of sub-structures), minimizing the scalability problem of other methods. In our method, the basic neural codification is integrated to a Genetic Algorithm (GA) that implements the constructive approach found in the evolutionary process, making it closest to biological processes. Thus, the proposed method, is a Decision-Making (DM) process, capable to evolve ANNs architectures with optimal number of neurons and appropriate topology for any given problem. Our method was initially tested on a simple, but non-trivial, XOR problem. We also submit our method to two other problems of increasing complexity: time series prediction that represents consumer price index and prediction of the effect of a new drug on breast cancer. In most cases, our NEA outperformed the other methods, delivering the most accurate classification. These superior results are attributed to the improved effectiveness and efficiency of NEA in the decision-making process. The result is an optimized neural network architecture for solving classification problems.
Introduction
The design of an Artificial Neural Network (ANN) can be considered a complex Decision Making process, that frequently relies on the user experience. In general, this ANN design task is a trial and error process, where a number of different transfer functions and amount of hidden neurons should be adjusted in order to solve a specific problem. As the design of ANN is still being done, manually, by human experts, the automation of this design process will benefit the decision-making process done by human experts [2].
Bio-inspired algorithms have shown efficient in different nonlinear optimization problems [3, 4, 12, 32, 36, 37, 39]. Due to their efficiency and adaptability, the interest in research in the field of neuroevolution (i.e. evolutionary approach to design ANNs) has increased lately. The core issue in neuroevolution is to build an efficient indirect encoding scheme [5, 6, 7, 8, 9]. Which means that the evolutionary algorithm evolves a compressed description of the ANN rather than the ANN itself [7].
These approaches are biologically motivated by the fact, that in case of the human brain, there are more neurons than nucleotides in the genome. Moreover, in biological genetic encoding the mapping between genotype and phenotype is indirect. The phenotype typically contains orders of magnitude more structural components than the genotype contains genes. Nevertheless, neuroevolution has not so far addressed that the development and evolution of neurons is carried out by genetic information encoded in DNA and when followed will generate the final shape of the organs including the brain.
To this end, we propose a new NEA, a biologically inspired artificial hybrid system called Artificial Development and Evolution of ANNs (ADEANN). The ADEANN integrates two components. The first is a generative representation that represents genotypes (a set of production rules of a Lindenmayer system) by a compact Indirect Encoding System (IES). The IES also conducts and controls the process of mapping the genotypes to the phenotypes (complex neural morphologies). To mimic the DNA encoding scheme and enable scalability, our IES leverages the phenotype representation to a smaller genotype. Thus, the search process is carried out in a lower-dimensional solution space. In addition, our IES implements the organizational principles of hierarchy, modularity, and gene reuse (allowing compact representation of complex phenotypes). The second component is a genetic algorithm (GA), a simplified representation of natural evolution. In local search problems based on GAs, a bit string is called a chromosome (the genotype). Each bit on the chromosome is a gene, and a gene set represents the parameters of a function to be optimized. Each string is assigned a fitness that indicates the quality of its encoded solution (the phenotype). To improve the biological realism of GA, the GA in our approach evolves the generative representation. The evolutionary process can be regarded as the temporal genetic changes in the hypothetical DNAs of a population of individuals, regulated by an artificial selection mechanism. The above biological inspiration underlies the originality of our approach. To our knowledge, we report the first attempt to generate recurrent neural networks from combined metaphors.
The main contribution of our method is the genotype representation by our proposed IES. Using a compact DNA encoding, we codify a parametric Lindenmayer system (L-system) with memory, which implements the principles of organization, modularity, repetition, and hierarchy to achieve complex neural architectures (multilayer and recurrent networks). [40, 41] adopted L-systems, although [41], used DNA encoding, their study was restricted to feedforward neural networks, whereas our approach is extended to recurrent networks. In the IES used by [40], the genotypes encode twenty rewrite rules of an L-system. Our DNA encoding system encodes a parametric L-system with memory using 10 production rules. Therefore, our IES is more compact than Hornby’s method[40], and reduces the search space of all feasible solutions. In addition, the memory mechanism in our approach enables the reuse of phenotypic structures (rewrite of nodes and connections) at different stages of development. Such reuse is an important capability of NEAs.
This paper is organized as follows. Section 2 discusses the state of the art. In Section 3 we describe a new approach to formalize the problem of ADANNs (artificial development and evolution of ANNs) as a local search based on rational agents. Section 4 introduces a biologically inspired method for automatic design of ANNs. In Section 5, the experimental setup and results are presented. Lastly, a discussion and conclusions are presented in Sections 6 and 7, respectively.
State of the art
This section discusses existing research on NEAs. Most NEAs use direct encoding systems (DESs) [11], which specify every connection and node in the genotype that will appear in the phenotype (ANN).
These methods are simply implemented, but the size of the connectivity matrix scales as the square of the number of nodes. In NEAT [5], the DES incorporates a few biologically plausible entities, and alters both the weighting parameters and structures of the networks. [27] developed a novel multi-objective optimization for a hierarchical genetic algorithm (MOHGA) based on the micro-GA approach. However, this method was not tested in other applications such as temporal series forecasting.
As discussed by [1], the increasing complexity of evolutionary computation demands more sophisticated methods than direct mapping from genotype to phenotype. At the other extreme are indirect encoding systems (IESs) [6, 26, 25, 7, 8, 9, 15, 28]. In an IES, the network generation is indirectly specified by the genotype. The solution description is compressed, enabling complex topologies of ANNs.
ES-HyperNEAT [9, 16] has shown promising results enabling the development of complex regular plastic ANNs. The encoding scheme adopted by HyperNEAT [7, 17] and ES-HyperNEAT [9] does not shape itself to the biological development process, including the nervous system. The CPPN in HyperNEAT plays the role of DNA in nature, but at a much higher level of abstraction; in effect it encodes a pattern of weights that is painted across the geometry of a network. Furthermore, in these methods both the genotype and phenotype are neural networks, which is not biologically plausible. The ES-HyperNEAT [9] is an improvement over HyperNEAT [7]. While the location of the hidden nodes in the substrate, had to be decided by the designer on the original HyperNEAT. ES-HyperNEAT showed that the decision might in fact be automated. In the HyperNEAT, CPPN encodes an infinite number of weights within the Hypercube, from which a subset should be chosen to be incorporated into the ANN. [9] consider other important information is the density of nodes from which an additional increase of the same does not offer any advantage. For more details about state of the art, query the authors’ research [18].
The following methods [6, 8, 28, 37] belong to the class of GDSs using grammatical evolution (GE). Lee et al. [8] sought inspiration from the DNA, in their research, information is coded using the symbols A, G, T and C. A sequence of three of these symbols is known as a codon. The sequence of codons between these delimiters is translated into a production rule for developing a neural controller. The production rules of the L-System used by [8] is context-free and does not allow to generate recurrent networks. A similar idea [6] using binary strings also exists in which the process of reading and translating bits can be repeated from different bits in the string to produce different production rules. The ANE proposed by [6] has one disadvantage that is the difficulty to set the parameters of the fitness function of Genetic Algorithm to direct the search for minimum architectures.
The data structure of NEAT [5], which represents the genotype, grows linearly with the number of edges between two nodes. (MOHGA) [27] cannot yield the (RNNs). Despite the novel capabilities of HyperNEAT, the user must manually place the hidden nodes at the substrate, which substantially disadvantages the original HyperNEAT. The applicability of the method proposed by [28] to other applications, such as temporal series-forecasting, was not tested, and the model cannot yield RNNs.
A neuro-evolutive algorithm, adapted from [21].
Little effort has been made to formalize the problem of ADANNs as a local search strategy. Simon [19] mentions that the most important part of a decision-making process is that concerning the structuring and formulation of the decision problem. Simon [19] also mentions that decision-making is a problem of choosing a course of action from available alternatives, in other words, it consists in finding a sequence of actions that transforms current situation into another, desired one.
Considering these assumptions, this section describes a formalization of the problem of ADANNs based on rational agents. The approach presented here is based on a prototype proposed by Russell [20] called a problem-solving-agent, see Function ARP. Problem-solving agents decide what to do by finding sequences of actions that lead to desirable states. Given a precise definition of the problem, the state space is defined, each state is an ANN, and the way the algorithm finds the solution is by means of a local search strategy: the agent calls the procedure SEARCH in Line 5 of Function ARP, as is described below.
The procedure SEARCH shown in Fig. 1 contains a high-level description of a neuroevolutionary method (search strategy) that evolves action-selection ANNs. It takes a problem as input and returns a solution in the form of an ANN. Once a solution is found, the actions it recommends can be carried out. This is called the execution phase. Thus, we have a simple (formulate, search, execute) approach to designing the agent, as shown in Fig. 1.
It begins by creating a population of ANNs (Line 4) generated by an artificial development system based on L-System randomly generated (Line 3). This process is described in Subsection 4.1, where St is the set of states, Ac the set of actions, and psize the population size. It repeatedly iterates in each generation over the current population (Line 5). After each episode, the next ANN is trained (Line 7), where st means the current state nearest to the previous state. After training, the current ANN is evaluated on the current state (s), using the square error at the output of the network (Line 9) and the number of hidden neurons Eq. (4.3), Subsection 4.3. In each stage of a given episode, the agent selects the next action (ac’) (Line 10) according to the square error in the output of the current ANN. The following notation is used: (st’) and (ac’) are the current state and action while (st) and (ac) are the previous ones (Line 11). After that, it performs the action TAKE-ACTION (ac’): it updates the weights, modifies the learning rate (Line 12) and makes the transition to a new state; here, the weights are updated by the method of steepest descent. The fitness is updated according to the cost of the action (Line 13). The stop condition is tested: namely, that the mean square error (MSE) be less than a desired value (E) (Line 14). A new ANN is trained (Line 7), until all the ANNs of a determined generation have been trained. After each generation, the function SEARCH creates a new population by repeatedly calling GENERATE-NEW-POP(P) (Line 18), which generates a new population of networks from highly fit parents, according to the selection criteria and genetic operators described in Subsection 4.3. In the next section we describe a method for ADANN.
Biologically inspired NEA
The general structure of ADEANN is shown in Fig. 2. The optimization process of ADEANN proceeds through several stages. The GA starts with a population of individuals randomly initialized with 0 s and 1 s Fig. 2a. Second, the bits of each individual of the population are subjected to transcription Fig. 2b and translation Fig. 2c, following valid production rules. Both processes are performed by the function Rule-Extraction-with-GA, presented in Subsection 4.2. After finding the appropriate production rules (Table 1), the rewriting system generates the genotypes Fig. 2d. All of the genotypes are mapped to phenotypes (ANN architectures) Fig. 2e. The ANNs are then trained Fig. 2f and validated, and tests are carried out. The classification accuracy of each ANN is measured from its fitness Fig. 2g, calculated by Eqs (4) or (4.3). The latter equation implements a penalty approach that rewards economical ANNs with stronger generalization capacity and greater ease of implementation. The genotypes are classified by the performances of their ANNs Fig. 2h. The GA selects the best individuals Fig. 2i for mutation and crossover operations Fig. 2j, which provide the new population Fig. 2k. The previous steps are repeated through
The production rules of the parametric L-system with memory
The production rules of the parametric L-system with memory
To mimic the mechanism of grown structures, including neurons, we adopt a parametric L-system with memory. It comprises a set of rules created from an alphabet. This system can be described as a grammar
ADEANN links three computational paradigms: GA, an L-system, and an ANN. The evolutionary processes are defined as temporal genetic changes in the hypothetical DNAs, regulated by a selection mechanism. The hypothetical DNA of each individual encodes a set of production rules of an L-system, which coordinates the growth of artificial neurons. Each ANN is trained to solve a given problem and is thereafter tested to check its generalization or prediction capability.
The genetic code from the perspective of mRNA, translated as in Fig. 4B. In the same table, the DNA’s metaphor
A simple example of the construction process of a branch of an iterated ANN using the rules of the L-system is illustrated in Table 1.
As a simple example, Fig. 3 illustrates the construction process of one branch of an ANN. Suppose that starting with the axiom
Here, Ninputs and Noutputs are the numbers of neurons in the input and output layers of the ANN, respectively, and NR is the number of neurons at each ramification.
(A) In the analogous artificial process, a binary string is transcribed into an integer string and the string is translated into the production rules of the L-system. (B) DNA transcription into RNA and translation of RNA into protein.
This subsection is dedicated to rule extraction by GAs. The neurons generated in the previous subsection are developed after the following process. To formulate a biologically realistic GA, we let the genes of the chromosomes (sequences of hypothetical DNA) encode a recipe (the production rules of the L-system described in Subsection 4.1 and illustrated in Table 1). The recursive rules in Table 1 drive the developmental stages of the neurons (Fig. 3).
In biological genetic processing Fig. 4B, DNA is transcribed into ribonucleic acid (RNA), and the RNA is translated into proteins. The proteins are derived from linear sequences of amino acids encoded by codons (groups of three nucleotides selected among U, G, A, and G of the genetic code (Table 2). Table 2 specifies the translation products of different codons; for example, the codon UUU is translated into phenylalanine (Phe), and UUA is translated into leucine (Leu). In Fig. 4B, the protein is formed by a sequence of amino acids starting with methionine (Met) and ending with proline (Pro). Such protein synthesis triggers all stages of the neuronal development (phenotypic effects), as shown in Fig. 4B.
The elements of the alphabet
As usual, GAs are slow; however, they are efficient for dealing with large and complex search space. Moreover, in this study, we investigate the possibilities of using biological inspiration in a local search problem.
Steps 1 and 2 of the function Rule-extraction-with-GA mimic the DNA transcription process into RNA, as shown in Fig. 4B. These Steps are repeated for all individuals in the population. All the integer strings obtained after Step 4 of this process see Fig. 4A are stored in the array ‘D’ and evaluated in Step 5. Step 5 translates the integer string to valid production rules of the L-system (Table 1) proposed in Subsection 4.1. Steps 5, 5.1 and 5.2 mimic the translation process of RNA into protein. The above Steps are repeated for all rows in table ‘D’.
Figure 4A illustrates the rule extraction for a single individual of the population. In this figure, the transcription string yields the integer string B.f[Ff.nB]Bf. We seek the shortest string containing all valid rules; in this case,
Collectively, these rules form a kind of executable program that prepares the artificial neurons for development. Once a set of valid production rules has been determined for each line of table ‘D’, the neuronal development presented in Subsection 4.1 is initiated. The development process is illustrated in Fig. 3. The resulting population of ANNs is trained and the fitness of each ANN is calculated by Eq. (4.3).
The ordered sequences of alphabetical characters (valid production rule:
Fitness evaluation and selection mechanism
The fitness of an ANN can be appropriately measured by the mean square error (MSE) given by Eq. (3). The MSE measures the expected squared distance between the predicted and true values. As such, it measures the quality of a predictor. The system adjusts the weights of its neural networks to minimize the MSE in the training and test sets. ADEANN aims not only to minimize the MSE of the ANN in the training set but also to generalize and properly predict in the test set. In Eq. (3),
where
In this paper, the reproducing individuals are selected by tournament selection [11]. The fitness difference between the mating pool and the population reflects the selection intensity (
In tournament selection, larger tournament sizes t is lead to higher expected loss of diversity (LD) [29]. Diversity is important in GAs because crossing over a homogeneous population does not yield new solutions. Accordingly, the tournament size t of ADEANN is set to 3, meaning that three chromosomes compete with each other, and the best chromosome among these three is selected for reproduction.
In this section we evaluate the performance of the ADEANN in evolving the architectures of ANNs.
Experimental setup
ADEANN parameters. The parameter values were empirically determined from the results of several XOR experiments. Experiments were run on a V14T-5470-A60 Core i7 computer with 8 GB memory, operating at 3.2 GHz. The various combinations of the parameter values are listed in Table 4. The first column specifies the numbers of the experiments, generations, and individuals per population. The second column states the neural network solution returned by the search process. The third, fourth, and fifth columns, respectively, specify the chromosome length, the number of crossover and mutation points, and the crossover rate. The last four columns present the simulations results: the average (
Parameters used in the experiments
Parameters used in the experiments
Genetic algorithm parameters and simulation results of 20 experiments of the XOR problem. ANN
Comparison between ADEANN and other methods in the XOR problem
The results of the XOR problem after twenty experiments are summarized in Table 4. In the last fifteen simulations, the fitness given by Eq. (4.3) automatically prevents the algorithm from unnecessarily producing complex ANNs with a large number of hidden neurons, and encourages smaller topologies. This constraint improves the generalization capacity of the ANNs. All networks returned by the search process contained two neurons in the input layer, two in the hidden layer, and one in the output layer, and can be described by ANN
The results of the best (19th) and worst (first) experiments, in terms of small MSE and minimum number of hidden neurons, are highlighted in bold in Table 4. NEAT [5] finds a structure for XOR in 32 generations (4.755 networks evaluated,
Generalization capacity obtained by the ANN with a MSE 
Normalized values for CPI in the period of Jan. 1998 to May 2010. Source, Central Bank of Brazil
The effect of 26 drugs on 15 experimental tumor and on a tumor in a patient with breast cancer
To verify the viability of the method for large problems, we carried out experiments for the prediction of the time series of the consumer price index (CPI), for the period of Jan. 1998 to May 2010 (source, Central Bank of Brazil, as seen in Table 6). The CPI quantifies the costs of products at different moments. In other words, it measures changes through time in the price level of consumer goods and services purchased by households and is useful for the calculation of the inflation rate.
In five runs, the best experiment shows that the ADEANN system finds an ANN for CPI in 50 generations (1500 networks evaluated, std
Testing results for the ANN for predicting the effect of 15 new drugs on malignant breast cancer. Eleven test examples are used. If a threshold of 1.0 is chosen to decide on an active neuron, the effects of all the new drugs tested are correctly predicted
Testing results for the ANN for predicting the effect of 15 new drugs on malignant breast cancer. Eleven test examples are used. If a threshold of 1.0 is chosen to decide on an active neuron, the effects of all the new drugs tested are correctly predicted
The problem of predicting the effect of a new drug on breast cancer [24] is to find an analogous mapping between the set of 15 experimentally induced tumors and a clinical tumor (malignant breast cancer), based on their known reactions to the same drugs, so that we are able to predict the effect of a new drug on the clinical tumor after having studied the experimental tumors. The solution is based on the presumed similarity between this tumor and a set of experimental tumors. Table 7 illustrates the dataset used for training, which contains the known effect of 26 established drugs on 15 experimental tumors and on a clinical tumor.
In five runs, the best experiment shows that the ADEANN system finds an ANN for BC in 100 generations (3000 networks evaluated, std
Discussion
The first challenge that we have addressed was the scalability problem. Our approach includes improvements to our previous proposal [6]. First, the chromosome used by the GA in our research has a variable length, from 180 to 516 bits, in contrast to before, when it had a fixed length of 1024 bits. For this reason, our current approach evolves a compressed description of the ANN that reduces the search space of the GA. Furthermore, it reduces the scalability problem of other methods [25, 26]. In the section about the experiments we present results that confirm these improvements.
The method of [26], as described in [6], improved the scalability problem presented by the method of [25], since an ANN with
Another challenge we have addressed is how to evolve minimal solutions. Our method also enables network optimization. As discussed in Subsection 4.3, the fitness function Eq. (4.3) automatically implements a penalty approach that rewards economical ANNs with better generalization capacities and ease of implementation (see the last fifteen lines and the second column of Table 4). This function selects networks by two criteria: the number of hidden-layer neurons, and the MSE. Therefore, smaller networks score higher fitness values than larger networks with the same performance. RNN-evolving NEAs are rarely reported in the literature. Our ADEANN system partially fills this gap, and provides accurate, automatic direct and recurrent ANNs for pattern recognition problems and dynamical systems simulations. Many methods don’t worry, as stated by [21], about avoiding optimizing overly complex topologies. Thus, at least part of the evolutionary search will be conducted in an unnecessarily high dimensional space, which is not desirable, as in some situations, less complex topologies can simulate the same task with the same precision.
Conclusion
ADEANN constitutes an automatic system to simulate tasks of pattern recognition, without any previous knowledge from the user. Also, two modifications of an earlier method [6] have been evaluated. The first one involves the fitness function, given by Eq. (4.3). It explicitly rewards smaller solutions. Our approach solves the three problems presented previously, in Section 5, without trouble in finding small topologies. The second, our L-System, presented in Subsection 4.2, enables generating more complex, direct, recurrent and multilayer networks, than the previous one [6]. Only the method of [14] formalizes the problem of ADANNs as a local search strategy. In Section 3 we described a new approach to formalizing the problem of ADANNs as a local search based on rational agents. Our approach increases the level of implicit parallelism of GA, which evolves a compressed description of the L-rules and enables generating more complex, direct, recurrent and multilayer networks than the previous one [6].
NEAs that evolve recurrent neural networks are rarely found in the literature [40]. To partially fill this gap, we designed ADEANN as an accurate automatic method for designing direct and recurrent ANNs and thereby solving pattern recognition problems and simulating dynamical systems.
Ultimately, the biological inspiration addressed in our research, makes it original. The merits of our Bio-Inspired algorithm compared to conventional algorithms are: flexibility, performance, scalability, flexibility in decision making and improvement scope. In future work, we will tackle other challenges, such as partially connected networks and ANNs that use Hebbian learning, thus exploiting aspects of neural plasticity.
