Abstract
The traditional ant colony algorithm has some problems, such as low search efficiency, slow convergence speed and local optimum. To solve those problems, an adaptive heuristic function is proposed, heuristic information is updated by using the shortest actual distance, which ant passed. The reward and punishment rules are introduced to optimize the local pheromone updating strategy. The state transfer function is optimized by using pseudo-random state transition rules. By comparing with other algorithms’ simulation results in different simulation environments, the results show that it has effectiveness and superiority on path planning.
Introduction
Path optimization of mobile robots has become one of the hot research topics. Good path planning algorithm not only saves a lot of time, but also reduces capital investment and machine wear [1]. At present, the widely used planning methods include artificial potential field method, neural network algorithm, fuzzy algorithm, the grid method, genetic algorithm, nonlinear model prediction algorithm, ant colony algorithm and the improvement and fusion of the above algorithms. These algorithms have their characteristics and are suitable for different applications. Ant colony algorithm is a bionic intelligent algorithm, which is applied in path planning, but it has some problems such as slow convergence speed, low efficiency and local optimal solution.
In this paper, we proposed an adaptive ant colony algorithm, the heuristic information of each node is studied and improved, and the new heuristic function is used to dynamically adjust the heuristic information. The reward and punishment mechanism is introduced to optimize the update of local pheromones, which can not only increase the global searchability, but also accelerate the convergence of the algorithm.
Related work
Since the 1960 s, path planning has aroused the interest of many scholars. Dijkstra algorithm, as a heuristic algorithm, was proposed in 1959 by E.W. Dijkstra[2]. The main feature is a typical shortest path algorithm that starts from the center and extends to the end to solve the directed shortest path problem in the figure. As a classical local path planning method, an artificial potential field method cannot guarantee global optimization and is prone to deadlock [3, 4]. With the development of artificial intelligence algorithms, the application of bionic algorithms such as ant colony algorithms in the field of path planning is attracting more and more attention. Ant colony algorithm not only has the global search ability of the population, but also has the coordination among individuals, and even if it does not know all the information of the environment, it can find better foraging paths [5]. However, in the early stage of the algorithm, the convergence rate is slow, a large amount of computing time is required, and there are precocious problems [6–8]. Wen et al. [9] improved the pheromone’s information function to optimize the global path, but the ant colony algorithm is easy to converge. You [10] et al. proposed a chaotic ant colony algorithm to improve the performance of path planning of mobile robots. Luo Q [11] et al. proposed an improved ant colony optimization algorithm. To avoid blind search in early planning, an initial pheromone with uneven distribution was constructed. The reward and punishment rules are used to improve the global pheromone. A dynamic penalty method is used to solve the deadlock problem. At present, most scholars have done a lot of research and improvement on pheromone diffusion mode and fusion algorithm of traditional ant colony algorithm[12–15], but few studies on the updating of heuristic pheromone functions [16].
Description of basic ant colony algorithm
Environmental model
Grid method is one of the common environment modeling methods for mobile robot path planning and simulation. To maintain generality, the search environment is modeled by the grid method.
The position and volume direction of obstacles in the two-dimensional plane do not change and the robot works in the planar space. As shown in Fig. 1, G is the moving area of the robot in the two-dimensional plane, where the obstacle grid is represented by the black grid, and the free grid without obstacles is represented by the white grid [17, 18]. The sequence i is used to identify the grid.

Map G (environment 1).
The transformation formulas of the relation between robot position and grid number are shown in Equations (1) and (2).
Where a is the side length of the grid, i is the serial number of grid node. MM is the number of rows and columns, and each grid coordinate is (x i y i ).
Ant colony algorithm is an intelligent heuristic search algorithm developed according to the rules of ant foraging. During the process of searching for food, the ant leaves a certain amount of pheromone on the path that passed. When the individual finds the food, more ants are attracted to move along the path, increasing the concentration of the path. At the same time, the pheromone also undergoes slow evaporation at all times. Obviously, the faster the pheromone volatilizes on the longer path, the shorter path is balanced by the ant’s replenishment and evaporation rate, making the pheromone intensity on the shortest path grow faster. This is the process of finding the optimal path by using the positive feedback effect of ant colony algorithm [19].
In the process of one iteration, ant m moves from the current grid point i to the next grid point j, which is determined by the heuristic information and pheromone concentration of the current node. When faced with multiple nodes to be selected, the next path point is randomly selected according to the transition probability of Equation (3).
Where τ
ij
(t) is the pheromone value; η
ij
(t) is the heuristic information value, which denotes the degree of expectation of the ants move from node j to target node E, the expression is shown in Equation (4); αis the pheromone incentive factor; β is the expected heuristic factor; {N - tbu
k
} is the set of nodes that allow ants move next step.
Where d jE is represented the Euclidean distance of node j and node E.
After the completion of an iteration, part of the pheromones in the path will evaporate, preventing the pheromones from growing too fast. Then the dominant path pheromones are supplemented to gradually complete the accumulation process of the dominant path pheromones. The pheromone updating formula is shown in (6) and (7)
Where τ ij (t) is the increment of the pheromone in the current iteration. ρ is the pheromone volatilization coefficient, Q is increased strength coefficient of pheromone, l k (t) is the current path length of the current iteration.
Adaptive update heuristic function
In the basic ant colony algorithm, the heuristic information on paths < i, j > is the reciprocal of the Euclidean distance from node i to destination E on the map, as shown in Equation (4). It is easier for ants to choose the closer path points, but it reduces the searchability, it is easy to produce a local optimal solution. Meanwhile, due to the limitations of heuristic search, the convergence speed of the algorithm is slow. To improve the performance, an adaptive update heuristic function is constructed as follows:
Firstly, an N×M distance matrix
Where N is the grid’s number, M is an ant’s number, each element L
nm
represents the distance of between node n to the target node E at one iteration. The value of L
nm
is calculated as follows: if the ant m reaches the target rode through node n, L
nm
is the actual distance from node n to E. If the target node is not found after node n, L
nm
is set as ω (ω is a constant and plays a punishment role, ω > max(L
nm
)). Other nodes are not processed as unknown nodes, L
nm
is sets 0. The data of matrix
Then, adaptive update the heuristic information of each node after all ants complete the search in one iteration, shown in Equation (9).
The heuristic information is dynamically updated on the path node, and the following three processes are respectively performed: First, when all the ants never pass through node j, the heuristic information remains initial value according to Equation (4), it can increase the search range. Second, when the ant colony passes through j and finally finds the target point, it thinks that the node is likely to be on the optimal path, updates the heuristic information with the shortest actual distance from the node j to the endpoint. As a result, η ij effectively increase, the current node is more likely to be selected. Finally, node j is always on the incomplete path, it is considered that the node j is not in the optimal path, or even a trap node. The constant ω is introduced to punish the node, it reduces the probability of selecting this node.
This paper introduces the reward and punishment mechanism to optimize the local update of pheromones. In iteration, the ants that find the target point are rewarded as Equation (10) to increase the local pheromone concentration of this advantageous path. The ant that fails to find the target point is punished as shown in Equation (11) to reduce the local pheromone concentration in the inferior path.
Where l k t is the current path length, φ is the pheromone reward factor.
After all ants complete the search in this iteration, a pheromone global update is performed. Combined with the local updating rules of pheromones, the improved global updating rules of pheromones is shown in Equation (12).
When the ant searches for a path, the improved pheromone update strategy increases the dominant path pheromone concentration, reduces the inferior path pheromone concentration, and improves the global search efficiency of the ant colony. With the combination of global pheromone updating and local pheromone updating, the robot can find an optimal path with the minimum cost.
Different path selection modes can make more effective use of the accumulated search information of each generation [20]. Select the next node of the path by ant m largely determines the operation speed and efficiency of the algorithm. To increase the search scope and efficiency, a pseudo-random state transition rule is adopted, as shown in Equation (13).
Where q0 is the probability selection factor, which is the critical condition for probability selection; q is a random number whose value is (0,1).
When q ⩽ q0, ant moving depends directly on the maximum value of the heuristic information and the pheromone concentration product. The others, are determined by the roulette mode. In the roulette mode, the probability share sorting selection strategy is added to avoid the deadlock problem. That is calculate the proportion of the transition probability of each node in the total probability, and sort the transfer probability shares from large to small.
The flow diagram of the adaptive ant colony algorithm is shown in Fig. 2.

Flowchart of adaptive ant colony algorithm.
To verify the effectiveness and superiority of the adaptive ant colony algorithm described in this paper, Matlab2016a was adopted as the simulation software to conduct a large number of simulation experiments.
Select the main parameters
In an ant colony algorithm, many parameters affect its comprehensive performance. Due to the close coupling between these parameters, it is difficult to determine the optimal parameter combination through theoretical methods [21, 22]. At present, it is the most commonly used method through repeated experimental comparison.
In this paper, the heuristic information reduction parameter ω, the pheromone reward factor φ, and the pseudo-random probability factor q0 are combined and optimized by multiple experiments. Other parameter combinations have been widely studied. The most commonly used combinations are selected as follows: α = 1.1, β = 7, Q = 1, ρ= 0.3, the ant’s number is 50, and the maximum iteration’s number is 100.
Selection of parameters ω and φ
The two parameters of the heuristic information reduction factor ω and the pheromone reward factor φ affect the timeliness of signal transmission during the ant search path, and have similar characteristics. The combination optimization rules of the two parameters are as follows: The value range of the heuristic information reduction factor ω is theoretically larger than the actual path distance of a complete path, so that the heuristic information of this node is reduced to achieve punishment. But it cannot be too large to cause the value to fail. The pheromone reward factor φ, rewards each node of the complete path in the local update of pheromone. Therefore, the simulation analysis is carried out when ω∈ { 60, 70, 80, 90, 100 }, φ∈ { 10, 20, 30, 40, 50 }.
To achieve the optimal effect of the combination of these two parameters, the specific operation during the experiment is as follows: one parameter remains unchanged, and the other changes within the value range. The optimal combination is determined by the convergence times, missing ant number and shortest path length. To avoid large errors caused by accidental circumstances, each group of parameters was selected for 10 times of simulation, and the obtained results were averaged for comparison.
The simulation is carried out in the map environment shown in Fig. 1. The experimental results are shown in Figs. 3, 4 and 5. Fig. 3 is a comparison of the lost ant number, Fig. 4 is the trend of iteration number; Fig. 5 is the average optical path length. Figs. 3, 4 and 5 show that when ω = 90, φ = 40 achieves the optimal combination. The lost ant number is the smallest, the convergence speed is faster, and the optimal path can be stably planned.

Number of lost ants.

Comparison of average convergence times.

Comparison of average path lengths.
According to the pseudo-random principle, the value of q0 affects the convergence speed and the stability. If q0 is too small, the convergence speed will be reduced; if q0 is too large, it is not conducive to convergence to the global optimal path, and local optimal may occur easily. The shortest path length and iteration number are tested, when q0 = 0.1, q0 = 0.2, q0 = 0.2, …, q0 = 1. The experimental results are shown in Figs. 6 and 7. When q0 = 0.1, has the shorter path length and lower iteration number.

Relationship between probability selection factors and optimal path length.

Relationship between probability selection factors and iteration times.
Comparison of 20 experimental results
To verify the improvement effect of the adaptive ant colony algorithm in this paper, simulation experiments are completed in environment 1 and environment 2 respectively, and performance optimization is compared.
Comparison of environment 1
In the same environment, comparative experiments were conducted, it uses the parameters (parameters: α = 1.1, β = 7, Q = 1, ρ= 0.3, the ant’s number is 50, the maximum iteration’s number is 100, ω = 90, φ = 40, q0 = 0.1). The simulation results are shown in Figs. 8 and 9. To avoid accidental influences on the experimental results, 20 simulation experiments were conducted to calculate the average optimization level. The results are shown in Table 1.

Comparison of convergence curves.

Comparison of path planning in environment 1.
As shown in Figs. 8, 9 and Table 1, the convergence times of the improved algorithm are greatly reduced. After 5 iterations, ants find the globally optimal path and converge stably, the convergence rate is 10 times that of the traditional ant colony algorithm. And also the lost ants are effectively reduced, which can reduce the energy consumption of algorithm operation and improve the efficiency of path optimization. Compared with the traditional ant colony algorithm, the adaptive ant colony algorithm has obvious advantages, which proves that the improved algorithm in this paper is effective.
This paper conducted 20 simulation experiments based on the environment of Ref. [12]. The simulation results of the algorithm in this paper were compared with Ref. [11] and Ref. [12], respectively. The results are shown in Table 2, Figs. 10 and 11.
Comparison of experimental results
Comparison of experimental results

Path planning in three algorithms.

The algorithm in Ref. [12] needs 2 iterations on average to achieve stable convergence, and the planned path length is 35.0771. So, it has the fastest convergence speed, but not found the globally optimal path. Compared with Ref. [11] the optimal path length calculated by both is 30.968. However, the algorithm in this paper achieves stable convergence with an average of 4 iterations, which is faster than that in Ref. [11]. The algorithm presented in this paper shows good global search ability, fast convergence speed and effective improvement of optimization performance.
To verify the effectiveness and adaptability of the improved algorithm, expand the scope of the map, and increase the complexity of the map, in the reference environment (Environment 3) of reference [15], carry out simulation experiments to carry out path planning. Compared with the simulation results of the proposed algorithm, the path planning diagram in [15] is shown in Fig. 12a. The path planning results in this paper are shown in Fig. 12b. The convergence curves of the two algorithms are shown in Fig. 13.

Path diagram in a complex environment.

Convergence comparison chart.
As shown in (12) and (13), in complex environments, the number of iterations of the two algorithms is increased compared to the simple environment. However, both the algorithm and the literature [15] can plan the optimal path. In [15], the corner inspiration information is introduced into the transition probability, which makes the initial direction of the path planning have a certain direction, increases the convergence speed, and converges to the optimal path around 16 times. In this paper, the algorithm adaptive adjusts the deterministic selection and the randomly selected probability selection mode according to the increase of the number of iterations, and increases the direction of the ant pathfinding by dynamically adjusting the heuristic information, and plans the shortest path around 10 times to improve the convergence speed.
Perform 20 simulation experiments on the two algorithms to calculate the average path length and average iteration. The results are shown in Table (3). In this experiment, the average path length and average convergence speed of the algorithm are relatively small, indicating that the improved algorithm has certain stability. In a complex environment, the improved algorithm has good adaptability and superiority.
Experimental comparison
For improving the convergence speed, searchability and optimal solution ability, this paper improves the algorithm from three aspects.
First, construct a new heuristic function. The heuristic information of nodes is adjusted with the shortest actual distance from the current node to the target node.
Second, introduce reward and punishment rules to improve local update of pheromones. Reward the ants that find the complete path, increase the pheromone secretion on the path, punish the path that fails to find the target point, reduce the pheromone secretion on the path.
Thirdly, using the pseudo-random state transition rule to optimize the state transition function. In the roulette method, the probability share sorting selection strategy is added to avoid the deadlock problem.
Finally, taking a lot of comparative experiments with other improved algorithms. It is verified the effectiveness and superiority of this algorithm. However, there are still some shortcomings. Based on a large number of previous studies, this paper does not analyze the basic parameters of the algorithm. The next step will be to study the coupling between the parameters and the basic parameters of the algorithm, and analyze the impact on the overall performance of the algorithm.
Compliance with ethical standards
Conflict of interest
The authors declare that they have no competing interests.
Footnotes
Acknowledgments
This study was supported by Chongqing Municipal Education Commission Science and Technology Fund Project (Grant No. KJ1601032, KJQN201901238, KJZD-K201901202); Intelligent Manufacturing Pilot Technology Chongqing University Engineering Research Center (Grant No. 2019yjzx0101); Chongqing Three Gorges University Science and Technology Fund Project (Grant No. 19QN06); Chongqing Three Gorges University Graduate Innovation Project (Grant No. YJSKY1804).
