A new multiobjective genetic programming approach using compromise distance ranking for automated design of nonlinear system design

Abstract

This paper presents a new multiobjective genetic programming (MOGP) approach to realize an all-in-one automatic nonlinear system design (NSD). Nonlinear system design is modeled here as a multiobjective optimization problem (MOP) so that parameter estimation, structure optimization and feature selection are solved simultaneously. The novel MOGP method ranks individuals according to the ‘compromise distance’ between them, which combines decision making for NSD with the optimization process and yields the final compromise solution in a single process. The effectiveness of the proposed learning approach is verified through experiments on the classical nonlinear autoregressive with extra inputs (NARX) system, in comparison with a classical aggregating method and a Pareto-based method for MOP. The experimental results demonstrate that the proposed approach is able to discover the unknown structure of nonlinear systems, as well as their features and parameters, with high accuracy and efficiency.

Keywords

Multiobjective genetic programming (GP), compromise distance, nonlinear system design, evolutionary multiobjective optimization (EMO)

1 Introduction

Nonlinear system design is a crucial but complex problem in many areas of engineering, e.g., industrial control systems [1], biomedical data modeling [2], and chemical process systems [3]. The problem requires not only accurate estimation of the unknown parameters of the system model, but also identification of the uncertain structure of the nonlinear model, even without any prior knowledge about the system. For these reasons, genetic programming (GP) has been employed by many researchers to discover and optimize the structure of nonlinear models [2, 3]. The GP learning algorithm can effectively detect the underlying relations in large data sets for which little theory exists, performs a powerful global search in the function space, and is able to co-evolve the model structure and parameters [4]. Consequently, GP is better suited to nonlinear system design than empirical black-box models [5], which lack the capability of model reconstruction and provide little insight into the underlying dynamics of the system.

However, an uncertain nonlinear model structure without prior information makes the search space huge, so that convergence of the GP algorithm is time-consuming and the optimal solution is difficult to obtain, especially when the input variables of the system affect each other and the significant features are not easy to recognize. Typically, more than one model structure can reproduce the nonlinear dynamic system with the same small error; thus many candidate models achieve good accuracy but have redundant structure and poor comprehensibility. On the other hand, the rapid growth of tree sizes in the GP algorithm tends to make the algorithm stagnate and leads to the phenomenon of bloat, which results from the evolution of non-functional subtrees and inefficient crossover operations [6]. In other words, the complexity of a GP tree should be treated as an objective to be kept at a low level.

To overcome the above challenges, a multiobjective GP (MOGP) approach is considered to learn the best nonlinear model, i.e., the one with the highest accuracy, the most parsimonious structure and appropriate features, to describe the relationship between the inputs and the response output of the nonlinear system. Mathematically, assume that an unknown system can be expressed in the form of a general regression model as Equation (1). $y = g (c, X) + ɛ;$ (1) where g is a nonlinear function with unknown structure; y is the output vector ($y \in R^{m}$); c is the unknown parameter vector ($c \in R^{n}$); X denotes the unknown model regressors (X = [x₁, x₂, ..., x_n], $x_{i} \in R^{m}$, i = 1, 2, ..., n); and ɛ is the approximation model noise. Three unknowns must be determined automatically and simultaneously: the model structure g, the appropriate regressors X, and the optimal parameters c. Three objectives are optimized to minimize the estimation error and the model structure complexity together. Among these objectives, the approximation performance of the model normally conflicts with the complexity of the model structure and the number of features included. Evolutionary multiobjective optimization (EMO) is therefore considered to approximate the trade-off in one process for nonlinear system design.

In this paper, a novel EMO approach based on GP is proposed with the purpose of combining the preferences of decision making with the optimization process to obtain the final compromise solution in a single run. The classical EMO algorithms all require some predefined information about the specific application (e.g., weight values or goal thresholds), which a decision maker can hardly determine exactly without prior information about the nonlinear system. In addition, Pareto-based approaches, such as the multiobjective genetic algorithm (MOGA) [7], the nondominated sorting genetic algorithm (NSGA-II) [8] and the strength Pareto evolutionary algorithm (SPEA2) [9], can only obtain the Pareto-optimal set, which contains many redundant solutions that tend to obscure the ‘best’ solution. Consequently, designers have to carry out complex multi-criteria decision making to obtain the final ‘best’ model of the nonlinear system, which usually requires goal and preference information as well. Therefore, in order to obtain a satisfactory solution efficiently and effectively, this paper suggests a new multiobjective optimization approach, which combines a new multiobjective ranking approach with GP, to address the nonlinear system design problem.

To verify the validity of the proposed method, a simulation test is implemented to identify the model design of a classical nonlinear autoregressive with extra inputs (NARX) model, which is the most popular foundation for model construction [2]. The paper is organized as follows. Section 2 presents the multiobjective optimization model of the nonlinear design problem. In Section 3, the definition and implementation of the proposed MOGP are presented in detail. This approach is then applied to a nonlinear system with a NARX model in Section 4, where experimental results are presented along with a comparison to a traditional single-objective GP method, as well as a classical MOP method and a Pareto-based MOP approach. Concluding remarks are given in Section 5.

2 Problem formulation

System modeling and system identification are the two main difficulties of nonlinear system design. The objective of system modeling is to determine the structure of a rule expressing the relationship between the inputs and outputs of the system. System identification then aims to accurately estimate the unknown parameters of the system model. Traditional methods for nonlinear system modeling derive the physical relations from prior knowledge about the system, and the regressor vector is simply composed of the values of all the n related process inputs, that is, X = [x₁, x₂, ..., x_n]. Nowadays, however, most systems are so complex that little prior knowledge about the system mechanism is available, and the inputs that are significant for expressing the system behavior generally do not cover all the n input variables and are difficult to select. Traditional methods cannot handle such a problem. In other words, nonlinear system design without any a priori knowledge should be formulated using several criteria to obtain a satisfactory model structure with appropriate regressors that can explain the particular behavior of the system.

Actually, it can be seen that these criteria for nonlinear system design have the following characteristics with regard to model performance and model structure.

The criteria of model structure complexity and model performance are incompatible to some extent. Usually a model with a more complex structure performs better than others;

Due to redundant terms appearing in a model, the same model performance can correspond to more than one model structure. For example, the result of (x₁ * x₂ + x₂) is the same as that of $(x_{1} * x_{2}^{2} / x_{2} + x_{2})$, but the latter structure includes redundant terms;

Due to dependence relationships among variables, different feature combinations may generate the same model performance. For example, the result of (x₁ * x₂ + x₂) is the same as that of (x₁ * (x₃ + x₄) + x₂) when x₂ = x₃ + x₄ is satisfied. Besides, it is worth noting that some feature combinations may cause over-fitting.
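
The second and third characteristics can be checked numerically. The following is a minimal sketch, assuming NumPy and arbitrary sample data drawn from a uniform distribution (the distribution and sample size are illustrative choices, not taken from the paper); it evaluates the three expressions quoted above and confirms that they produce the same output.

```python
import numpy as np

# Illustrative data only: the range [1, 2) keeps x2 away from zero so that
# the division in the redundant expression is well defined.
rng = np.random.default_rng(0)
x1 = rng.uniform(1.0, 2.0, size=100)
x3 = rng.uniform(1.0, 2.0, size=100)
x4 = rng.uniform(1.0, 2.0, size=100)
x2 = x3 + x4  # dependent feature, as in the third characteristic

compact = x1 * x2 + x2
redundant_terms = x1 * x2**2 / x2 + x2   # same values, redundant structure
other_features = x1 * (x3 + x4) + x2     # same values, different feature set

same_structure_output = np.allclose(compact, redundant_terms)
same_feature_output = np.allclose(compact, other_features)
print(same_structure_output, same_feature_output)
```

Because all three expressions fit the data equally well, an error criterion alone cannot distinguish them, which is why complexity and feature count are treated as additional criteria.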

Therefore, the nonlinear system design problem can be formulated as a multiobjective optimization (MO) problem, in order to meet the demand for a solution model with high accuracy and parsimonious structure. This idea has been introduced in the literature [2], aiming to identify the uncertain structure of nonlinear systems, which is the most difficult part of nonlinear system design. Assume that an autoregressive nonlinear system model for the unknown output vector y, based on n input vectors x_i with associated parameters c, can be described as: $y = g (c, x_{1}, x_{2}, ..., x_{n});$ (2) where y = [y(1) y(2) y(3) ... y(m)]^T is the output vector and m is the total number of data records at different times; x_i = [x_i(1) x_i(2) x_i(3) ... x_i(m)]^T (i = 1, 2, ..., n) is the record vector of the ith of the n known variables. It should be noted that some features among the n variables are redundant and have no relation to the output. Therefore, the MO problem of designing the nonlinear system (2) involves finding the mapping function g, the parameter vector c, and the set of significant features {x₁, x₂, x₃, …, x_l} (1 ≤ l ≤ n) selected from the n input variables. It can be modeled as a hybrid learning of model structure, regression parameters and selected features, in the sense that all of them have to be estimated in optimal form simultaneously. Mathematically, the MO problem of nonlinear system design can be described as the simultaneous minimization of three objectives, namely the approximation error, the structure complexity and the number of selected features of the model, as in (3). $\begin{matrix} \min \{f_{1}, f_{2}, f_{3}\}; \\ f_{1} = E [(y - g (c, x_{1}, x_{2}, ..., x_{n}))^{2}]; \\ f_{2} = \text{model structure complexity}; \\ f_{3} = \text{the number of selected features}. \end{matrix}$ (3)

These three objectives affect and restrict each other. Compared with using the approximation error as a single objective, this formulation helps to speed up the search toward the optimal region and avoids redundant structure and features in the solution model. However, a satisfactory solution is much more difficult to obtain than in the single-objective case with conventional stochastic optimization techniques, because the search space of multiple objectives is enormously larger than that of a single objective. Thus, since evolutionary algorithms are characterized by the parallel computing ability to find the optimal set, evolutionary multiobjective optimization (EMO) techniques are applied to address problem (3).

3 New Multiobjective genetic programming

3.1 A new rank approach in evolutionary algorithm for NSD

Ranking individuals with multiple conflicting objective functions is a difficulty of MOP. Traditionally, classical EMO approaches convert the MOP into a single-objective optimization by aggregating the objectives, and obtain only one final solution in one process. On the other hand, Pareto-based EMO approaches generally rank individuals by their degree of nondominance, and then deliver the solution as a set of ‘nondominated’ solutions for all the objectives. The final solution must then be determined from goal information provided by designers. The definition of ‘dominance’ is given below.

Definition 1. (Dominate) [10]: For a minimization problem, a given vector u is said to dominate a vector v if and only if u is partially less than v, denoted by u ≤_D v, i.e., v - u ∈ D and u ≠ v. Here, D is a convex cone defined in $R^{k}$.
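
For illustration, Definition 1 can be written as a short check. This is a minimal sketch that assumes the cone D is the nonnegative orthant, which gives the usual component-wise Pareto dominance for minimization; the function name `dominates` is a hypothetical helper, not part of the paper.

```python
from typing import Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """Return True if u Pareto-dominates v for a minimization problem.

    Assumption: D is the nonnegative orthant, so u dominates v when u is
    no worse in every objective and strictly better in at least one.
    """
    if len(u) != len(v):
        raise ValueError("objective vectors must have the same length")
    no_worse = all(a <= b for a, b in zip(u, v))
    strictly_better = any(a < b for a, b in zip(u, v))
    return no_worse and strictly_better


print(dominates((1.0, 2.0, 3.0), (1.0, 3.0, 3.0)))  # u is better in objective 2
print(dominates((1.0, 2.0), (2.0, 1.0)))            # incomparable vectors
```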

Note that both of these methods rank individuals by comparing absolute distances. Generally, the classical aggregating method defines fixed weights for the multiple objectives, while the dominance rank method assumes that all objectives have equal weight for optimization; neither is suitable for solving the NSD problem. For the NSD problem (3), the three objectives have different priority levels in different cases. In particular, the preferred final solution is a model with the smallest approximation error and an appropriate structure, rather than the simplest model with a larger approximation error. That means f₁ has a higher priority level than f₂ and f₃ in model (3), while f₂ and f₃ are at the same preference level, especially near the optimum solution on the Pareto front. However, the weighting coefficients are usually difficult to define exactly so as to balance the relative roles of the objectives, and their values often change from case to case. In addition, the desired goal values of each objective cannot be determined in our problem either. Therefore, a new rank approach is proposed here for the EMO solution algorithm of the NSD problem, aiming to enhance the convergence efficiency and to directly evolve the unique accurate solution that satisfies the users’ preference.

Our new rank method can produce the final Pareto-optimal solution in one process by comparing two individuals according to their compromise distance, defined as follows. Let S_i and S_j be two individuals in the evolutionary algorithm. For our NSD problem, they are two candidate solutions, each of which includes a model structure, a parameter set, and the selected features.

Definition 2. (Compromise distance): The compromise distance from solution S_i to solution S_j is defined as the sum of the relative distances of every objective from solution S_i to S_j in the objective space. For a minimization problem, a positive distance denotes a proportional decrease while a negative distance denotes a proportional increase. Mathematically, it is expressed as Equation (4).

$d_{ij} = \sum_{k = 1}^{N} {\frac{f_{k} (S_{i}) - f_{k} (S_{j})}{f_{k} (S_{i})}};$ (4) where N is the number of objectives and f_k(•) denotes the kth objective value. The compromise distance d_ij is defined on the relative distance between individuals, which can be regarded as the absolute distance with an adaptive weighting factor. This weighting factor depends on the value of f_k(S_i), so different solutions have different weighting factors and no weighting values need to be predefined. The compromise distance therefore differs from the sum of the absolute distances of the objectives with predefined weights. In addition, we found that the weighting factors in the compromise distance automatically reflect the priority levels of the multiple objectives in the NSD problem (3). When the value of f₁(S_i) differs greatly from the value of f₁(S_j) while the values of the other objectives are close, f₁ should have a higher priority level than f₂ and f₃ for optimization. According to Definition 2, the corresponding weighting value of f₁ is larger than those of f₂ and f₃, which is consistent with the priority levels of these objective functions. Ranking with the compromise distance thus ensures that an objective with a higher priority level receives more weight. In the overfitting case, f₁ declines slightly while f₂ and f₃ rise considerably. According to Definition 2, even though the weighting factors put much weight on f₁, the absolute-distance terms contribute more to the rank. Thus, individuals with overfitted models are eliminated by the optimal individual, and ranking with the compromise distance is effective against the overfitting problem. Besides, for individuals far from the optimum, the values of f_k(S_i) have little relation to the priority levels of the objectives; all the adaptive weights are then close, which helps the individuals evolve toward better solutions.

Therefore, the compromise distance between solutions is helpful for solving our problem. Note that the compromise distance is not equal to the sum of multiple objectives with adaptive weights. The rank method based on compromise distance can beneficially replace the absolute distance of the classical aggregating method, which requires predefined weights for different cases.
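
Equation (4) can be illustrated with a few lines of code. The sketch below is only an illustration under stated assumptions: the function name `compromise_distance` and the two objective vectors are hypothetical, and all objective values are assumed positive as in the NSD problem.

```python
def compromise_distance(f_i, f_j):
    """Compromise distance d_ij of Equation (4) between two objective vectors.

    f_i and f_j hold the objective values (f_1, ..., f_N) of solutions
    S_i and S_j; every value of f_i is assumed to be positive.
    """
    if len(f_i) != len(f_j):
        raise ValueError("objective vectors must have the same length")
    if any(a <= 0 for a in f_i):
        raise ValueError("objective values of S_i must be positive")
    return sum((a - b) / a for a, b in zip(f_i, f_j))


# Hypothetical example: S_j halves the error f1 at the cost of one extra
# unit of structure complexity f2; f3 is unchanged.
f_i = (0.5, 4.0, 2.0)
f_j = (0.25, 5.0, 2.0)
d_ij = compromise_distance(f_i, f_j)
d_ji = compromise_distance(f_j, f_i)
print(d_ij, d_ji)
```

In this example d_ij is positive and d_ji is negative: the relative gain in f₁ outweighs the relative loss in f₂, so S_j is preferred even though neither solution dominates the other.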

Because the compromise distance adaptively reflects the characteristics of these three objectives as described above, we propose the following rank rules for the new rank method (assuming that all the objective values are positive, which is consistent with the NSD problem). Here, the term ‘rank’ is used to measure the performance of every individual. For our minimization problem, the smaller the rank, the better the solution.

1) when d_ij = 0, then rank(S_i) = rank(S_j);

2) when d_ij > 0, then rank(S_i) > rank(S_j);

3) when d_ij < 0 and d_ji > 0, then rank(S_i) < rank(S_j);

4) when d_ij < 0 and d_ji < 0, if d_ij < d_ji, then rank(S_i) < rank(S_j); otherwise, rank(S_i) > rank(S_j).
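
The four rules can be sketched as a pairwise comparison function. This is one possible reading rather than the authors' implementation: the names `compromise_distance` and `compare` are hypothetical, an exact tie d_ij = d_ji < 0 is treated as equal rank (an assumption, since the rules do not single out this case), and the case d_ij < 0 with d_ji = 0 is resolved by the comparison d_ij < d_ji (also an assumption).

```python
def compromise_distance(f_i, f_j):
    """Compromise distance of Equation (4); objective values assumed positive."""
    return sum((a - b) / a for a, b in zip(f_i, f_j))


def compare(f_i, f_j):
    """Return -1 if S_i should get the smaller (better) rank, 0 for equal
    rank, and +1 if S_i should get the larger (worse) rank."""
    d_ij = compromise_distance(f_i, f_j)
    d_ji = compromise_distance(f_j, f_i)
    if d_ij == 0:        # first rule: equal rank
        return 0
    if d_ij > 0:         # second rule: S_i is worse
        return 1
    if d_ji > 0:         # third rule: d_ij < 0 and d_ji > 0, S_i is better
        return -1
    if d_ij < d_ji:      # fourth rule: the more negative distance wins
        return -1
    if d_ij == d_ji:     # assumption: an exact tie is treated as equal rank
        return 0
    return 1


print(compare((0.5, 4.0, 2.0), (0.25, 5.0, 2.0)))  # S_i is worse
print(compare((1.0, 10.0), (20.0, 1.0)))           # both distances negative
```

With positive objective values, d_ij and d_ji cannot both be positive, because each pair of terms sums to 2 - r - 1/r ≤ 0 for the ratio r = f_k(S_j)/f_k(S_i); this is why the rules need no separate case for d_ij > 0 and d_ji > 0.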

Theorem 1: In the objective space, the rank order of vectors u and v produced by the proposed rank method obeys the rank order generated by the nondominance rule (Definition 1), i.e.,

If u dominates v, then rank(u) < rank(v);

If rank(u) ≤ rank(v), then v cannot dominate u. In other words, either u dominates v, or u and v are in the same nondominated rank.

Proof:

In situation I, assume that u is partially less than v: for every objective f_k, f_k(u) ≤ f_k(v), and there is at least one objective f_o with f_o(u) < f_o(v). In other words, for every objective f_k, $f_{k} (u) - f_{k} (v) \leq 0$

and, over all objectives, $\sum_{k = 1}^{N} (f_{k} (u) - f_{k} (v)) < 0 .$

Then, according to Equation (4) and because all objective values are positive, it follows that d_uv < 0 and d_vu > 0.

According to the third rank rule (equivalently, the second rule applied to d_vu > 0), it can be concluded that rank(u) < rank(v), which corresponds to the statement that u dominates v. Obviously, this result obeys Definition 1.

In situation II, u is less than v in some objectives and larger than v in the others. Assume that for objectives k_a (a = 1, ..., h), $f_{k_{a}} (u) \leq f_{k_{a}} (v)$, while for objectives k_b (b = h + 1, ..., N), $f_{k_{b}} (u) > f_{k_{b}} (v)$.

That means $\sum_{a = 1}^{h} (f_{k_{a}} (u) - f_{k_{a}} (v)) < 0$

and $\sum_{b = h + 1}^{N} (f_{k_{b}} (u) - f_{k_{b}} (v)) > 0 .$

Then, d_uv can be positive or negative, depending entirely on the balance between the negative terms and the positive terms. In other words, u and v are in the same nondominated level, as in non-dominated sorting.

So, it can be concluded that the proposed rank approach assigns a lower rank to a less dominated front than to a more dominated front, which is consistent with the dominance rule.
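
The first statement of Theorem 1 can also be checked numerically. The following sketch is an illustrative experiment, not part of the paper: it assumes component-wise Pareto dominance, draws random positive objective vectors (the ranges are arbitrary), and counts how often a dominating pair fails to give d_uv < 0 and d_vu > 0.

```python
import random


def compromise_distance(f_i, f_j):
    """Compromise distance of Equation (4); objective values assumed positive."""
    return sum((a - b) / a for a, b in zip(f_i, f_j))


def dominates(u, v):
    """Component-wise Pareto dominance for minimization (assumed form of Definition 1)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))


random.seed(0)
checked = 0
violations = 0
for _ in range(10000):
    v = [random.uniform(0.1, 10.0) for _ in range(3)]
    # Shrink each objective of v by a random factor so that u dominates v.
    u = [x * random.uniform(0.5, 1.0) for x in v]
    if not dominates(u, v):
        continue
    checked += 1
    if not (compromise_distance(u, v) < 0 and compromise_distance(v, u) > 0):
        violations += 1

print(checked, violations)
```

No violation is expected: when u dominates v, every term of d_uv is nonpositive with at least one strictly negative, and every term of d_vu is nonnegative with at least one strictly positive.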

3.2 Main loop of the proposed MOGP approach

The flowchart of the proposed MOGP approach is shown in Fig. 1.

3.2.1 Initialization

Because GP uses a tree to represent a chromosome, the structure of a tree expresses the nonlinear system structure, and its leaves are the selected features. To keep model structures consistent with semantic constraints, appropriate function and terminal sets of GP should be determined to restrict the search space. Here, we test on the classical polynomial NARX model, since it has been demonstrated to be one of the most common models of deterministic nonlinear systems in practical applications [11]. The function set of our GP algorithm is therefore defined as {+, *}, and the terminal set contains all the n decision variables of the nonlinear system in Equation (2).

In the initialization stage, as in the original GP, the ramped half-and-half method is adopted to generate the initial trees. The operators of the system model are randomly chosen from the function set, and features are selected at random from all the n input variables as the leaves of every tree to form the initial generation. Nevertheless, this method can produce trees in which an internal node is chosen from the feature set, so that its leaf nodes are nonsensical as part of a solution. Every tree individual should therefore be checked as shown in Fig. 2 before its fitness values are calculated. This procedure is called model certification. In this procedure, every internal node is checked to determine whether it is a mathematical operator or a variable symbol. If an internal node is a variable symbol, the subtree from this internal node down to its leaf nodes is deleted. Model certification is of particular importance for calculating the complexity of the nonlinear model in the following evaluation step.
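
Model certification can be sketched as a recursive tree check. The tree encoding below (nested tuples of the form `(symbol, child, child)` with string leaves) and the function name `certify` are assumptions made for illustration, and the sketch reads "the subtree is deleted" as keeping the variable node itself as a leaf while discarding everything below it.

```python
FUNCTIONS = {"+", "*"}  # function set used in the paper


def certify(tree):
    """Return a copy of the tree in which every internal node labeled with
    a variable symbol is turned into a leaf by dropping its children."""
    if isinstance(tree, str):       # already a leaf
        return tree
    symbol, *children = tree
    if symbol not in FUNCTIONS:     # internal node is a variable symbol
        return symbol
    return (symbol, *[certify(child) for child in children])


# Hypothetical malformed tree: the left child of '+' is the variable x1,
# yet it still carries two children of its own.
raw = ("+", ("x1", "x2", "x3"), ("*", "x2", "x4"))
print(certify(raw))
```

After certification the example tree represents x₁ + x₂ * x₄, so its complexity and feature count can be computed without the nonsensical nodes.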

3.2.2 Evaluation

After the structures of the candidate trees have been generated in the first step, the least squares (LS) method is employed to estimate the parameters of the model that each candidate tree represents. Then the three objective functions in Equation (5) are calculated tree by tree to compose a vector of multi-objective values. $\begin{matrix} \min \{f_{1}, f_{2}, f_{3}\}; \\ f_{1} = \frac{1}{y^{T} y} \sum_{i = 1}^{m} (y_{e} (i) - y (i))^{2}; \\ f_{2} = \text{num} (\text{‘MULT’}) + \text{num} (\text{term}); \\ f_{3} = l . \end{matrix}$ (5) where f₁, f₂ and f₃ denote the approximation error of the model, the model structure complexity and the number of selected features, respectively; y_e and y denote the estimated and real output vectors, respectively; and the sum runs over the m data records of Equation (2). Because the scales of the decision variables differ among nonlinear systems, the model performance f₁ is measured by the normalized square error (NSE) between the estimated output values and the real output data. The model structure complexity f₂ is measured by the degree of nonlinearity. Since the function set of our GP involves two operators, add and multiply, nonlinearity can be expressed by the number of occurrences of the multiply operator plus the number of terms in the equation. Finally, feature selection f₃ is assessed by the number of features appearing in the equation.
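
The evaluation step can be sketched as follows. This is a minimal illustration under several assumptions that are not specified in the paper: the candidate model is already expanded into a list of monomial terms, each given as a tuple of 0-based feature indices (an empty tuple is the constant term); only multiplications between features are counted as 'MULT'; and the constant counts as a term. The function name `evaluate` and the noise-free test data are hypothetical.

```python
import numpy as np


def evaluate(terms, X, y):
    """Fit the coefficients by least squares and return (c, (f1, f2, f3))."""
    columns = [np.prod(X[:, list(t)], axis=1) if t else np.ones(len(y))
               for t in terms]
    A = np.column_stack(columns)                  # one column per term
    c, *_ = np.linalg.lstsq(A, y, rcond=None)     # LS parameter estimate
    y_e = A @ c
    f1 = float(np.sum((y_e - y) ** 2) / (y @ y))  # normalized square error
    f2 = sum(max(len(t) - 1, 0) for t in terms) + len(terms)
    f3 = len({i for t in terms for i in t})       # distinct features used
    return c, (f1, f2, f3)


# Hypothetical noise-free data with the structure 10*x4*x3 + 5*x3 + 5.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(200, 4))
X[:, 3] = X[:, 0] + X[:, 1]                       # x4 = x1 + x2
y = 10 * X[:, 3] * X[:, 2] + 5 * X[:, 2] + 5

c, (f1, f2, f3) = evaluate([(3, 2), (2,), ()], X, y)
print(c, f1, f2, f3)
```

For this structure the sketch counts one multiplication and three terms, so f₂ = 4, and two distinct features, so f₃ = 2.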

Next, the candidate trees in one generation are each assigned a fitness rank value according to their multi-objective value vectors, based on the compromise distance proposed in the previous section. The proposed rank scheme for our model (5) is implemented as follows:

(1) Select the first tree T₁ and calculate the compromise distance d_1j from T₁ to every other tree T_j;

(2) Decide the rank value of T₁ using d_1j according to the four rules of the compromise rank method. If the relationship between the rank of T₁ and the value of d_1j already meets these rules, no rank value needs to change; otherwise, the tree on the higher rank side is re-ranked, and its rank value is set to one more than the rank of the tree on the lower rank side;

(3) Select the next tree and repeat steps (1)-(2) until the ranks of all trees are set.
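
The rank scheme above leaves some details open, so the following is only one possible reading, not the authors' implementation. It assumes that all trees start with rank 1, that a single pass over the trees is made, and that pairs judged equal are left unchanged; the helper names `compromise_distance`, `compare` and `assign_ranks` are hypothetical.

```python
def compromise_distance(f_i, f_j):
    """Compromise distance of Equation (4); objective values assumed positive."""
    return sum((a - b) / a for a, b in zip(f_i, f_j))


def compare(f_i, f_j):
    """-1 if S_i should rank better, +1 if worse, 0 if equal (four rank rules)."""
    d_ij = compromise_distance(f_i, f_j)
    d_ji = compromise_distance(f_j, f_i)
    if d_ij == 0:
        return 0
    if d_ij > 0:
        return 1
    if d_ji > 0 or d_ij < d_ji:
        return -1
    return 0 if d_ij == d_ji else 1   # assumption: exact ties are equal


def assign_ranks(objectives):
    """Single-pass rank assignment; a smaller rank means a better tree."""
    ranks = [1] * len(objectives)     # assumption: every tree starts at rank 1
    for i in range(len(objectives)):
        for j in range(len(objectives)):
            if i == j:
                continue
            order = compare(objectives[i], objectives[j])
            if order < 0 and ranks[j] <= ranks[i]:
                ranks[j] = ranks[i] + 1   # re-rank the worse tree T_j
            elif order > 0 and ranks[i] <= ranks[j]:
                ranks[i] = ranks[j] + 1   # re-rank the worse tree T_i
    return ranks


ranks = assign_ranks([(3.0, 3.0, 3.0), (1.0, 1.0, 1.0), (2.0, 2.0, 2.0)])
print(ranks)
```

Because ranks change during the pass, a relation checked early may be disturbed by a later re-ranking; a practical implementation might repeat the pass until no rank changes, which is a design choice this sketch does not make.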

3.2.3 Evolution

In the evolution stage, selection, reproduction, crossover and mutation operations are adopted to generate new individual trees. First, tournament selection is carried out to select individuals from the population. This method compares two individuals at a time and selects the one with the better fitness value; thus, the number of selected individuals is half of the population size. Then, the selected individuals are divided into three groups, according to the operation probabilities, to undergo reproduction, crossover and mutation respectively. In reproduction, new individuals are generated by copying the individuals with better fitness values. In crossover, individuals are taken in pairs, a subtree of each individual is randomly chosen, and two new individuals are generated by exchanging the selected subtrees. In mutation, a new individual is generated from each individual by replacing a random subtree with a new subtree, which is generated by the same method as the initial trees. After reproduction, crossover and mutation, an elitism mechanism is employed to let the better individuals survive and to avoid the loss of excellent genes due to random effects. This mechanism uses a competition strategy that obtains better offspring by comparing parents with their children: the fitness rank values of the parent and child trees are evaluated, and the two trees with the lower ranks are selected for the new generation. The convergence property of this elitism mechanism has been demonstrated [12], and it has been successfully applied in genetic algorithms to solve real applications. It can improve the convergence speed and the accuracy of solutions.
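
The tournament selection described above can be sketched as a pairwise competition. The sketch assumes that the population is shuffled and split into disjoint pairs, which is one way to obtain exactly half of the population; the pairing scheme, the tie-breaking choice and the name `tournament_select` are assumptions.

```python
import random


def tournament_select(ranks, rng):
    """Pair individuals at random and keep, from each pair, the index of the
    individual with the smaller (better) rank; ties keep the first one."""
    indices = list(range(len(ranks)))
    rng.shuffle(indices)
    winners = []
    for a, b in zip(indices[0::2], indices[1::2]):
        winners.append(a if ranks[a] <= ranks[b] else b)
    return winners


winners = tournament_select([1, 2, 3, 4], random.Random(0))
print(winners)
```

With this pairing the best-ranked individual always survives selection, because it wins whichever pair it falls into.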

Therefore, the main loop of the proposed algorithm is implemented as follows:

Step 1: Initialize the function set and terminal set, and set the parameters of the algorithm, such as the population size, number of generations, crossover probability, mutation probability, etc.

Step 2: Evaluate the rank value of every tree in a generation using the proposed rank rule based on compromise distance.

Step 3: Carry out the evolution process to generate a new population. First, select individuals using the tournament selection method, in which the candidate tree with the smaller rank wins each comparison. Then, assign to every selected tree one of three evolution operations, reproduction, crossover or mutation, according to the operation probabilities. Apply the corresponding operations to the parent trees to generate their child trees, and calculate the fitness rank value of each child tree. After that, combine all the parent trees and child trees into an intermediate population and rank them by the proposed rank rule. Put the trees with the smaller ranks into the next generation until the population size is reached.

Step 4: Go to Step 2 and evaluate the new population. When the fitness value with the smallest rank meets the goal value or the termination conditions are satisfied, the loop ends. Otherwise, the loop continues with Step 3.

4 Simulation

In order to test and illustrate the performance of the proposed algorithm, the approach is applied to the task of designing a typical nonlinear system with the NARX model, and two comparison experiments are made. First, the results of the proposed MOGP approach are compared with those of single-objective GP in order to show that the NSD problem is better addressed as a multi-objective problem than as a single-objective problem. Without loss of generality, this single-objective GP chooses the approximation error of the model as the objective function. Second, one Pareto-based EMO algorithm, NSGA-II, is chosen to handle the evaluation and evolution process of multi-objective GP; this combination is called NSGAII-GP. By comparing the proposed MOGP approach with NSGAII-GP, the validity and superiority of the proposed rank method based on compromise distance are demonstrated.

Assume the input regression matrix X = [x₁, x₂, …, x_n] has four different features, and the output vector y follows the underlying relationship (6) with X in the presence of additive white noise ɛ. The noise sequence ɛ has zero mean and variance 0.01. Note that the model structure of y is unknown, so the initial individuals of the algorithms always have complex structures even though the real model structure in (6) is simple. In order to increase the diversity of structures with the same approximation performance, we assume that x₁, x₂ and x₃ are independent variables generated at random, while x₄ is the sum of x₁ and x₂. That means countless models with complex structures could generate the same NSE between the estimated output and the real output. $y = 10 * x_{4} * x_{3} + 5 * x_{3} + 5 + ɛ;$ (6) There are 2000 records in the training data. All of the genetic programming algorithms use the same simulation parameters: population size = 100, generations = 20, maximum depth of trees = 5, crossover probability = 0.7, mutation probability = 0.3.
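
A data set of the kind described above can be generated as follows. This sketch makes assumptions the paper does not state: x₁, x₂ and x₃ are drawn uniformly from [0, 1), the noise is Gaussian, and the random seed is arbitrary; only the model (6), the noise variance of 0.01 and the 2000 records come from the text.

```python
import numpy as np

rng = np.random.default_rng(42)   # arbitrary seed, for reproducibility only
m = 2000                          # number of training records

# Assumed input distribution: independent uniform samples in [0, 1).
x1, x2, x3 = rng.uniform(0.0, 1.0, size=(3, m))
x4 = x1 + x2                      # dependent feature

noise = rng.normal(0.0, np.sqrt(0.01), size=m)   # zero mean, variance 0.01
y = 10 * x4 * x3 + 5 * x3 + 5 + noise            # Equation (6)

X = np.column_stack([x1, x2, x3, x4])
print(X.shape, y.shape)
```

Since x₄ = x₁ + x₂, structures such as 10 * (x₁ + x₂) * x₃ + 5 * x₃ + 5 fit these data equally well, which is the source of the structural ambiguity exploited in the test.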

In the simulation, the three objective functions in (5) are defined to optimize the model performance and model structure. f₁ calculates the NSE between the estimated and true output, f₂ calculates the number of multiply operators plus the number of terms in the model structure, and f₃ calculates the number of selected features in the model structure. The optimal solution should have the smallest NSE and a parsimonious model structure with few selected features, so it is obtained by minimizing f₁, f₂ and f₃ in (5) simultaneously. Thus, the proposed MOGP method and the NSGAII-GP method are applied to solve the multi-objective optimization model (5), and the single-objective GP method uses the sum of f₁, f₂ and f₃ with weights. Tables 1–3 present the results of the proposed MOGP approach, NSGAII-GP and single-objective GP in 10 trials. They show that the proposed MOGP approach can obtain the optimal solution with the minimum NSE and a parsimonious model structure that is the same as the real one.

In the comparison of the proposed MOGP with the single-objective GP, Table 3 reveals that the single-objective GP is unstable, while Table 1 shows that the convergence of the proposed MOGP approach is fast and stable. For instance, trials 2, 6 and 8 of the GP simulations do not even converge, owing to excessive dependence on the initial population; this convergence problem does not occur in the proposed MOGP simulations. Even when trials of GP converge to the minimum model error, the model solution is not in parsimonious form (e.g., trials 1, 5 and 7) or has redundant terms (e.g., trials 3 and 10). In addition, the accuracy of the solution model structure is 100% for the proposed MOGP over several trials but less than 50% for single-objective GP. The reason is that the original GP tends to get stuck at a local optimum when more than one structure corresponds to a small error. Therefore, it is more reasonable to treat the NSD problem as a multi-objective optimization problem than as a single-objective optimization problem.

Comparing Tables 1 and 2 shows that NSGAII-GP cannot obtain the globally optimal model structure, while the proposed MOGP converges to the best solution. NSGA-II can only find the Pareto-optimal set, from which designers need to use multi-criteria decision making (MCDM) techniques to obtain the best solution. However, MCDM always needs weight information among the multiple objectives or goal information about the specific application, which can hardly be obtained for most NSD problems. Hence, NSGAII-GP chooses the solution with the smallest approximation error in the Pareto-optimal set as the final result. It is observed that the proposed MOGP provides a better way to solve the NSD problem without any prior information and is able to obtain a satisfactory solution.

The convergence properties of the three algorithms are shown in Fig. 3. In Fig. 3(a), the average learning curves of the three algorithms over 10 trials are presented by converting the multiple objectives into a single objective, f₁ + λ₁f₂ + λ₂f₃. Here, the empirical weights λ₁ and λ₂ are both chosen as 0.001 on the basis of many training experiments. It can be seen that the proposed MOGP converges quickly to the global optimum of the multiple objectives, while the other two algorithms do not. To understand the reasons, the average learning curves for the model error objective (f₁) and the model complexity objectives (f₂ + f₃) are plotted in Fig. 3(b) and 3(c) respectively. It is observed that all three algorithms can converge to the best model error, but only the proposed MOGP converges to the best model structure with low complexity at the same time.
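
The single-objective measure used for the learning curves in Fig. 3(a) is a plain weighted sum. The sketch below only restates that formula; the function name `aggregate` and the example objective vector are hypothetical.

```python
def aggregate(objectives, lambda_1=0.001, lambda_2=0.001):
    """Weighted sum f1 + lambda_1 * f2 + lambda_2 * f3 used for plotting."""
    f1, f2, f3 = objectives
    return f1 + lambda_1 * f2 + lambda_2 * f3


# Hypothetical objective vector: NSE of 1e-4, complexity 4, two features.
value = aggregate((1e-4, 4, 2))
print(value)
```

This weighted sum serves only as a common yardstick for comparing the algorithms after the fact; the proposed ranking itself does not use predefined weights.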

Owing to the excellent convergence of the proposed MOGP in terms of model complexity, its tree chromosomes become simpler than those of the other two algorithms after some generations; consequently, the proposed MOGP requires less time per run than the other algorithms. This is confirmed by Table 4, which reports the CPU computing times for comparison.

The above simulation results show that the proposed approach outperforms single-objective GP and Pareto-based EMO in designing nonlinear polynomial systems; in particular, it obtains the correct model structure for nonlinear systems whose input variables are dependent. Compared with single-objective GP and NSGAII-GP, it converges to simpler structures with fewer features while attaining the smallest approximation error. Moreover, the optimum found by the proposed MOGP is almost identical to the true solution, Equation (6), and the proposed MOGP recovers the true model structure in this test.

The good performance of the proposed MOGP on the NSD problem is attributable to the special relationship among the three objectives. For this test, some objective vectors (log₁₀f₁, f₂, f₃) around the global optimum are shown in Fig. 4, where the model error is calculated by NSE. The model error values of the global and local optima are separated from those of the other points by more than 10³, whereas the differences in model complexity values are less than 10. Therefore, methods that aggregate all the a priori information into a single fitness function tend to get stuck in a local optimum, whereas the proposed ranking method based on compromise distance reaches the global optimum more easily.
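The sensitivity of an aggregated fitness to this scale gap can be illustrated with a small numerical sketch. The two candidates below are hypothetical (errors a factor of 10³ apart, total complexity differing by less than 10), a single weight `lam` is assumed for both complexity terms, and the function names are not from the paper; the sketch does not implement the compromise distance ranking itself.

```python
from typing import Dict, Tuple

Objectives = Tuple[float, float, float]  # (f1, f2, f3)


def weighted_sum(f: Objectives, lam: float) -> float:
    """Aggregated fitness f1 + lam*(f2 + f3), with one weight for both complexity terms."""
    f1, f2, f3 = f
    return f1 + lam * (f2 + f3)


# Hypothetical candidates, for illustration only.
CANDIDATES: Dict[str, Objectives] = {
    "accurate": (1e-6, 7.0, 3.0),  # tiny error, total complexity 10
    "coarse": (1e-3, 3.0, 1.0),    # error 10^3 times larger, total complexity 4
}


def preferred(lam: float) -> str:
    """Return the name of the candidate with the lower aggregated fitness."""
    return min(CANDIDATES, key=lambda name: weighted_sum(CANDIDATES[name], lam))


for lam in (1e-3, 1e-5):
    print(lam, preferred(lam))
```

In this toy case the weight 10⁻³ favors the coarse model and the weight 10⁻⁵ favors the accurate one, so the outcome of an aggregating method depends on weight information that is rarely available in advance.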

In sum, the results show that the proposed MOGP is capable of designing nonlinear systems, especially systems whose input variables are not independent. Moreover, compared with single-objective GP and NSGAII-GP, it converges to a simpler structure with fewer features while attaining the smallest approximation error.

5 Conclusion

An all-in-one automated nonlinear system design scheme based on the proposed MOGP approach and the NARX representation was presented. The nonlinear system design is modeled as the simultaneous estimation of model parameters, model structure and feature selection. The proposed approach combines the benefits of multiobjective optimization with the structure-learning capability of genetic programming to identify the nonlinear system. Within this approach, a new multiobjective ranking evolutionary algorithm for Pareto-based EMO is proposed, which arrives at the final Pareto-optimal solution in a single process by comparing the compromise distances of individuals. The validity of the approach for nonlinear system design is demonstrated on a dataset from a classical NARX model. Simulation results show that the proposed approach achieves higher solution accuracy and faster convergence than the single-objective GP and another multiobjective GP algorithm (NSGAII-GP). It can be concluded that multiobjective optimization techniques are more suitable than single-objective optimization methods for addressing the NSD problem. In addition, the proposed multiobjective evolutionary algorithm can obtain the final best solution with little information about the specific application, which is an advantage over other Pareto-based EMO algorithms. In the future, this algorithm can be applied to more real-world problems to uncover underlying models whose structure and significant features are not known a priori.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant no. 61401145), the Natural Science Foundation of Jiangsu Province (Grant no. BK20140858, Grant no. BK20151501), and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD). The authors would like to thank all the people who helped us overcome many difficulties during this research work.

References

1. Rodriguez-Vazquez, Fonseca C.M. and Fleming P.J., Identifying the structure of nonlinear dynamic systems using multiobjective genetic programming, IEEE Transactions on Systems, Man and Cybernetics, Part A 34(4) (2004), 531–545.

2. Beligiannis G.N., Skarlas L.V., Likothanassis S.D. and Perdikouri K.G., Nonlinear model structure identification of complex biomedical data using a genetic-programming-based technique, IEEE Trans on Instrumentation and Measurement 54(6) (2005).

3. Willis, Hiden, Hinchliffe, Mckay and Barton G.W., Systems modeling using genetic programming, Computers Chem Engng 21 (1997), 1161–1166.

4. Koza J.R., Genetic Programming, Cambridge, MA: M.I.T. Press, 1992.

5. Sjoberg, Zhang, Ljung, Benveniste, Delyon, Glorennec P.Y., Hjalmarsson and Juditsky, Nonlinear Blackbox Modelling in System Identification: A Unified Overview, Automatica 31(12) (1995), 1691–1724.

6. Bleuler, Brack, Thiele and Zitzler, Multiobjective Genetic Programming: Reducing Bloat Using SPEA2, Proc Evolutionary Computation Conference, 2001, pp. 536–543.

7. Fonseca C.M. and Fleming P.J., Multiobjective optimization and multiple constraint handling with evolutionary algorithms-part I: A unified formulation, IEEE Transactions on Systems, Man and Cybernetics, Part A: Systems and Humans 28(1) (1998), 26–37.

8. Deb, Agrawal, Pratab, Agarwal and Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Transactions on Evolutionary Computation 6(2) (2002), 182–197.

9. Zitzler, Laumanns and Thiele, SPEA2: Improving the strength Pareto evolutionary algorithm, in Proc EUROGEN 2001 Evolutionary Methods for Design, Optimization and Control With Applications to Industrial Problems, Giannakoglou, Athens, Greece, 2001.

10. Miettinen K.M., Nonlinear Multiobjective Optimization, Section 2.7, Springer Science & Business Media, 1999, p. 23.

11. Billings S.A. and Fadzil M.B., The practical identification of systems with nonlinearities, in Proc 7th IFAC Symp Identification Syst Parameter Estimation (1985), 155–160.

12. Rudolph and Agapie, Convergence properties of some multiobjective evolutionary algorithms, in Proceedings of the 2000 Conference on Evolutionary Computation, vol. 2, Piscataway, NJ, IEEE Press, 2006, pp. 1010–1016.