A knowledge-based technique for initializing a genetic algorithm

Abstract

Genetic Algorithms are efficient for the travelling salesman problem, but they have a premature convergence problem resulting in suboptimal solutions. As the initialization step has a profound impact on the algorithm’s performance, this study proposes a knowledge-based initialization technique to learn the patterns of evolved populations, and integrates four heuristic strategies to generate the initial population. Advanced initial solutions and high quality gene blocks can be quickly created with this method. Instances in the TSPLIB library are used to set the parameters and test different initialization methods. The results show that this proposed technique can improve the initial population and optimization performance of genetic algorithms.

Keywords

Genetic algorithm travelling salesman problem initial population heuristic technique

1 Introduction

The travelling salesman problem (TSP) is a widely studied combinatorial problem in operations research and computer science. It has various applications in engineering, such as controlling unmanned aerial vehicles [1], dairy transportation [2] and the design of electric vehicle information service systems [3]. The TSP is an NP-hard problem [4]. Exact algorithms, including enumeration methods and branch and bound algorithms [5] are only suitable for small scale problems due to limitations on time and memory. In contrast, heuristic techniques [6] and intelligent optimization algorithms such as genetic algorithms [7, 8], ant colony algorithms [9], neural networks [10, 11], simulated annealing [12] and differential evolution algorithms [13, 14] are able to reliably find acceptable solutions within a reasonable time.

Among these, genetic algorithms (GA) [15], which are based on an imitation of natural selection and evolution, provide a powerful means of solving the TSP. The first step is initialization, which aims to generate the initial population. The initial population, representing a set of initial candidate solutions, has been shown to play an important role in the performance of GAs [16]. The traditional method of initialization is to randomly generate a population of solutions [17]. This method is common, simple and fast, but is likely to lead to a time-consuming optimization process and unsatisfactory solutions because of the uncertain quality of the initial population.

To improve the performance of GAs, initialization techniques have been extensively studied [18]. Yugay et al. [19] sorted the initial population by fitness value. The Nearest Neighbor heuristic algorithm (NN) is an alternative to the random method. To generate an individual candidate solution, it chooses a start node randomly, and then the nearest node is chosen iteratively [20]. The Neighbor Field method is similar to NN, but the node is chosen from a set of unused points [21]. Wei et al. [22] used the gene bank(a matrix of permutations of nodes according to their distance) to generate the initial population. Paul et al. [23, 24] proposed a Vari-begin and Vari-diversity (VV) population seeding technique. It generates a permutation-coded population by selecting different start nodes and using a variation of the gene bank which is named the Ordered Distance Vector (ODV).

The above initialization heuristics use the greedy principle in terms of distance to some extent. However, the key to optimizing the initialization is to identify important patterns of better solutions (evolved populations) and features of high quality gene blocks (combinations of genes). Based on the knowledge learned from evolution, this paper proposes a practical technique with several strategies to generate advanced initial populations with high quality gene blocks.

In Section 2, we build the mathematical model of the TSP. In Section 3, the principles of GAs are introduced. In Section 4, the mechanism of knowledge-based initialization techniques is explained. In Section 5, the parameters are determined and the effectiveness of the proposed method is proven experimentally. Finally, we provide a summary and discussion in Section 6.

2 Travelling salesman problem

The travelling salesman problem is easy to describe but hard to solve. It aims to find the shortest path to visit each city once and only once. A TSP can be represented by a complete graph G = (N, A), where N is a set of n cities, and A is a set of arcs. The cost of distance of each arc (i, j) ∈ A is represented by d_ij in the distance matrix D. The problem is defined as follows:

The variables: $x_{ij} = {\begin{matrix} 1 & if if the arc(i,j) is selected in the path \\ 0 & otherwise \end{matrix}$

The objective function: $min C = \sum_{i = 1}^{n} \sum_{j = 1}^{n} d_{ij} x_{ij}$ (1)

The constraints: $\sum_{i = 1}^{n} x_{ij} = 1, j \in N$ (2) $\sum_{j = 1}^{n} x_{ij} = 1, i \in N$ (3) $x_{ij} \in {0, 1}, i, j \in N$ (4) $\sum_{i, j \in S} x_{ij} = | S | - 1, S \subseteq N$ (5) where |S| is the number of nodes in set S.

Equation (1) is the objective function minimizing the total cost (length) of the tour. Constraints (2-5) assure that each city is visited once and only once.

3 Genetic algorithm

Genetic algorithms are based on the principles and mechanisms of natural selection. With the advantages of a population of candidate solutions and the help of different genetic operators, GAs can solve complex problems effectively. The population is a collection of individual solutions that are represented by chromosomes. A chromosome is a permutation of genes. The pseudocode of a GA is shown in Algorithm 1.

The parameters include the maximum number of iterations, population size, and probabilities of crossover and mutation. The path representation method is widely used [25] to encode and decode the chromosomes. In other words, a chromosome C is a permutation of cities [c₁, c₂, ⋯ , c_n]. For instance, C = [1, 2, 4, 3] represents a path 1 → 2 →4 → 3 →1. The process of evolution ends when requirements on the number of iterations or the fitness value are satisfied. The fitness is the criterion of individual solutions’ quality and is associated with the objective function. The fitness function can be defined as Equation (6) for the TSP, in which $\sum_{i = 1}^{n - 1} d_{c_{i} c_{i + 1}} + d_{c_{n} c_{1}}$ represents the tour length. $fitness = (\sum_{i = 1}^{n - 1} d_{c_{i} c_{i + 1}} + d_{c_{n} c_{1}})^{- 1}$ (6)

A larger fitness value means that the solution is better and has a greater chance of being selected into the next generation. Both crossover and mutation are probabilistic processes. The crossover operator uses two existing chromosomes to generate new chromosomes, while the mutation operator changes chromosomes individually. When the termination conditions are satisfied, the GA decodes chromosomes into solutions. Figure 1 illustrates the typical development of a solution.

4 Knowledge-based initialization

4.1 General patterns of evolution

The optimization process of a GA is based on the evolution of populations. As shown in Fig. 1, the general patterns of evolution for the TSP are:

Initially, the solution has several intersections in the path. The solution is improved by iteration with a decrease in the number of intersections. For example, the number of intersections is decreased from two to zero in Fig. 1.

There are high quality gene blocks in the evolved solution, which represent combinations of several neighboring cities. For instance, gene blocks [6, 5] and [2, 7] in Fig. 1.

4.2 Framework of the initialization technique

In practice, uncertainty in the quality of the population will influence the robustness of the GA. Furthermore, low gene diversity can result in the algorithm converging to a local optimum (premature convergence). Therefore, the initialization step should generate advanced candidate solutions and others with high quality gene blocks that will be used as components of the final solution.

The patterns discovered inspire the proposal of a knowledge-based initialization (KI) technique to ensure gene diversity and quality. As knowledge about the evolved population’s features and patterns is gained, we can generate an advanced initial population with high quality gene blocks and enable the GA to achieve a better solution in a short run time, thanks to improvement of the initial population.

Figure 2 displays the mechanism of the KI. First, a constructed solution strategy can generate an initial solution without path intersections quickly. Second, a random strategy is used to keep gene diversity. Last, some high quality gene blocks can be created by using a local optimization strategy and a cross detection and deletion strategy.

4.2.1 Constructed solution strategy

The constructed solution strategy uses coordinate transformation and the polar angle to generate an initial route without path crossing. The key is to divide the plane into non-intersecting sectors. Using a rule based on increasing the polar angle to choose cities to visit can avoid intersections of paths.

Figure 3 illustrates the process to generate initial solutions (individuals) using this strategy. The steps of the constructed solution strategy are shown below:

Step 1: Calculate the mean of the cities’ coordinates as the centroid;

Step 2: Select the start city randomly as the first gene c₁ of chromosome C;

Step 3: Apply the polar coordinate system in which each city on the plane is determined by a distance from a reference point (the centroid) and the city’s polar angle (the angle from a reference direction), as shown in Fig. 3;

Step 4: Calculate the i_th city’s polar angle θ_i, i ∈ [1, n] , i ∈ N. The city’s polar angle is the angle (If θ_i < 0, θ_i = θ_i + 2π);

Step 5: Generate the permutation of cities as the remaining genes c₂c₃ ⋯ c_n in chromosome C according to their polar angle θ_i in ascending order;

Step 6: Terminate (an individual is generated).

4.2.2 Random strategy

The random strategy is the classic way to generate individuals, and it can increase the diversity of the initial population. The process is shown below.

Step 1: Initialize the set of unused nodes U = {1, 2,. . . , n}, chromosome C [j] _1×n and counter i = 1;

Step 2: Choose an element in U randomly and assign its value to the i_th gene c_i in chromosome C;

Step 3: Remove the selected element from U;

Step 4: If U =∅, terminate; otherwise, back to Step 2.

4.2.3 Local optimization strategy

A local optimization strategy is used to improve the quality of gene blocks of individuals based on the principles of greedy algorithms. Figure 4 illustrates the main process.

1) Gene block determination: Choose the length (L) of a gene block of chromosome C randomly; L ∈ [4, min (10, n)]; n is the number of cities.

2) Gene block optimization:

Step 1: Set counter i = 1; input Chromosome C;

Step 2: Choose the gene block C [i, i + 1,. . . , i + L - 1]; set j = 1;

Step 3: Choose the nearest city to C [i + j - 1] in the gene block as C [i + j];

Step 4:j = j + 1;

Step 5: If j = L, i = i + 1; otherwise, return to Step 3;

Step 6: If the new gene block’s fitness value is higher (cross detection), replace the old gene block with the new one; otherwise, keep the former one;

Step 7: If i > n + 1 - L, terminate; otherwise, return to Step 2.

4.2.4 Cross detection and deletion strategy

A cross detection and deletion strategy is another way to optimize the gene blocks in terms of crossing of paths. Figure 5 represents a strategy to detect and delete crossovers in the path [26, 27]. But when the scale of the problem is large, it is time-consuming to examine every combination of two cities. Therefore, a random number I is used to limit the number of pairs to be examined. The main steps are listedbelow.

Step 1: Set counter i = 1; generate a random integer I, I ∈ [0, m/10]; m is the total number of combinations of two cities;

Step 2: If i > I, terminate; otherwise, continue to Step 3;

Step 3: Randomly choose a combination of two cities, reverse the gene block between them in the chromosome;

Step 4: If the chromosome’s fitness value is higher (cross detection), update the chromosome (cross deletion); otherwise, keep the former one without reversals;

Step 5:i = i + 1, turn to Step 2.

5 Experiments and results

The program is coded in the Matlab environment on a computer with a 2.94 GHz CPU and 2 GB physical RAM, using the parameters from [23], as shown in Table 1. According to Equation (6), the tour length is inversely proportional to the fitness of the individual. Therefore, the tour length is used as the optimality criterion for solutions.

5.1 Parameter setting

To determine optimal parameters for the knowledge-based initialization technique, i.e., the percentages of individuals generated by different strategies, parameter setting experiments are done.

Table 2 shows parameter sets of different strategies. SN, CSS, RS, LOS and CDDS represent the sequence number, constructed solution strategy, random strategy, local optimization strategy and cross detection and deletion strategy respectively. For example, the first parameter of CSS (25%) means that 25% of the initial population is generated by applying the constructed solution strategy in the KI. The instance KroA100 in TSPLIB is used to run the tests and the results of 50 runs are averaged.

The results in Table 2 indicate that the 6th parameter set is the best, and helps the KI achieve the best average initial length of tour and solutions. Furthermore, its initialization time is the second-shortest. The KI with the 7th parameter set is the fastest, in which over half of the initial solutions are generated with the random strategy. It is noted that the local optimization strategy and the cross detection and deletion strategy are relatively time-consuming.

5.2 Comparative studies

The performance of the GAs with different initialization techniques are compared to verify the effectiveness of the KI technique.

Comparative instances are from TSPLIB (Eil51, Pr76, KroA100, Pr144 and Gil262). The competitors are Random initialization technique (RD), Nearest Neighbor technique (NN), Gene Bank technique (GB) and Vari-begin with Vari-diversity technique (VV). Their performance in [23] is compared with that of the KI, using the parameters given in Table 1. For each case, the average result of 50 runs is collected for analysis.

The results in Table 3 show that the KI helps the GA obtain the best solution and the greatest improvement in all instances. Because the platforms are different, computation times cannot be compared. However, the initialization time of the KI is shortest, indicating that the computational complexity is acceptable.

5.3 Application

An important real-life application of this novel initialization method is for path planning for Unmanned Aerial Vehicles (UAV) in surveillance or aerial filming. For example, a UAV should visit several places in one flight to monitor an area. Considering the time effectiveness and cost, an optimized solution for the flight path can be generated more quickly using a genetic algorithm with the KI technique. The case lin105 in TSPLIB is used for testing on the Matlab platform.

Figure 6 shows the optimization process of the GA initialized by the RD and the KI in a run for lin105. The initial tour length of the GA with the KI is much shorter than that of the GA with the RD in the beginning, which illustrates that the initial candidate solutions generated by the KI are advanced. The other advantage of the proposed initialization technique is the improvement in convergence rate, that is, we can get a better solution in shorter time.

Figures 7 and 8 illustrate path planning results using these two initialization methods respectively. It is noted that the path in Fig. 8 is an improvement over that in Fig. 7 because there is no path crossover.

6 Conclusion

This paper proposes a knowledge-based initialization technique to enhance the ability of a genetic algorithm to solve the travelling salesman problem. It integrates four heuristic strategies to use the patterns and features of evolved populations to generate a high quality initial population. Experimental results indicate that the proposed initialization technique outperforms other published methods for the improvement of GAs.

The performance of the GA is also determined by its operators and parameters. For this reason, in further research, we intend to design and improve operators of the GA to cooperate with the initialization technique and optimize the parameters.

Footnotes

Acknowledgments

This research is supported by the National Natural Science Foundation of China (Nos. 71331008 and 71101150), the Youth Training Program for Innovation and Entrepreneurship Platform of Science and Technology in Hunan Province, Foundation for the Author of National Excellent Doctoral Dissertation of PR China (201492), the Outstanding Youth Fund Project of Hunan Provincial Natural Science Foundation, the Top-notch Innovative Talents Training Plan of National University of Defense Technology, the Outstanding Youth Fund Project of National University of Defense Technology, and the Program for New Century Excellent Talents in University.

References

Evers

, Dollevoet

, Barros

A.I.

and Monsuur

, Robust UAV mission planning, Annals of Operations Research222(1) (2014), 293–315.

Lahrichi

, Crainic

T.G.

, Gendreau

, Rei

and Rousseau

L.M.

, Strategic analysis of the dairy transportation problem, Journal of the Operational Research Society66(1) (2013), 44–56.

Lee

and Park

G.L.

, Computer Science and its Applications, Design of Interaction Coordination in Electric Vehicle Information Service Frameworks, Springer, Berlin, Heidelberg, 2015, 747–750.

Papadimitriou

C.H.

and Steiglitz

, Combinatorial optimization: Algorithms and complexity, Courier Corporation, 1982.

Dasgupta

P.S.

, Empirical study of some branch and bound algorithms for the traveling salesman problem, ADV Modell Anal A Gen Math Comput Tools18(1) (1994), 21–34.

Kahng

A.B.

and Reda

, Match twice and stitch: A new TSP tour construction heuristic, Operations Research Letters32(6) (2004), 499–509.

Muehlenbein

, Gorges-Schleuter

and Kraemer

, Evolution algorithms in combinatorial optimization, Parallel Computing7(1) (1988), 65–85.

Y.M.

, Li

K.L.

, Hu

J.T.

and Li

K.Q.

, A genetic algorithm for task scheduling on heterogeneous computing systems using multiple priority queues, Information Sciences270 (2014), 255–287.

Gong

B.C.

, Jiang

T.Y.

, Xu

S.Z.

and Chen

, An improved ant colony algorithm based on path optimization strategy for TSP, Applied Mechanics and Materials543 (2014), 1681–1684.

10.

Wolfe

W.J.

, Parry

M.H.

and MacMillan

J.M.

, Hopfield-style neural networks and the TSP, Proceedings of the IEEE International Conference on Neural Networks, 1994, pp. 4577–4582.

11.

Wang

, Li

, Tian

and Fu

, A noisy chaotic neural network for solving combinatorial optimization problems: Stochastic chaotic simulated annealing, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics34(5) (2004), 2119–2125.

12.

Hasan

and Alkhamis

, Simulated annealing procedure for scheduling competing tasks in flexible manufacturing, Production Planning & Control8(4) (1997), 356–362.

13.

Wang

, Wang

B.C.

, Li

H.X.

and Yen

G.G.

, Incorporating objective function information into the feasibility rule for constrained evolutionary optimization,in press, IEEE Transactions on Cybernetics, 2015, doi: 10.1109/TCYB.2015.2493239

14.

Wang

, Li

H.X.

, Yen

G.G.

and Song

, Mommop: Multiobjective optimization for locating multiple optimal solutions of multimodal optimization problems, IEEE Transactions on Cybernetics45(4) (2015), 830–843.

15.

Holland

J.H.

, Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control, and artificial intelligence, MIT press, 1992.

16.

Tğan

and Daloğlu

A.T

, An improved genetic algorithm with initial population strategy and self-adaptive member grouping, Computers & Structures86(11) (2008), 1204–1218.

17.

Pullan

, Adapting the genetic algorithm to the travelling salesman problem, Proceedings of the Congress on Evolutionary Computation, 2003, pp. 1029–1035.

18.

Schmitt

L.J.

and Amini

M.M.

, Performance characteristics of alternative genetic algorithmic approaches to the traveling salesman problem using path representation: An empirical study, European Journal of Operational Research108(3) (1998), 551–570.

19.

Yugay

, Kim

and Ko

F.I.

, Hybrid genetic algorithm for solving traveling salesman problem with sorted population, Proceedings of the Third IEEE International Conference on Convergence and Hybrid Information Technology, 2008, pp. 1024–1028.

20.

Rego

, Gamboa

, Glover

and Osterman

, Traveling salesman problem heuristics: Leading methods, implementations and latest advances, European Journal of Operational Research211(3) (2011), 427–441.

21.

Luo

C.Y.

, Lu

and Liu

, Neighbour field method for population initialization of TSP, Journal of Chongqing University32(11) (2009), 1311–1315.

22.

Wei

, Hu

and Gu

, Parallel search strategies for TSPs using a greedy genetic algorithm, Proceedings of the Third IEEE International Conference on Natural Computation, 2007, pp. 786–790.

23.

Paul

P.V.

, Ramalingam

, Baskaran

, Dhavachelvan

, Vivekanandan

and Subramanian

, A new population seeding technique for permutation-coded Genetic Algorithm: Service transfer approach, Journal of Computational Science5(2) (2014), 277–297.

24.

Paul

P.V.

, Dhavachelvan

and Baskaran

, A novel population initialization technique for Genetic Algorithm, Proceedings of the IEEE International Conference on Circuits, Power and Computing Technologies (ICCPCT), pp. 2013, 1235–1238.

25.

Potvin

J.Y.

, Genetic algorithms for the traveling salesman problem, Annals of Operations Research63(3) (1996), 337–370.

26.

Pan

, Li

, Ouyang

and Li

, Hybrid immune algorithm based on greedy algorithm and delete-cross operator for solving TSP, Soft Computing20(2) (2016), 555–566.

27.

Wang

, Duan

and Zhang

, An improved greedy genetic algorithm for solving travelling salesman problem, Proceedings of the Fifth IEEE International Conference on Natural Computation, 2009, pp. 374–378.