Abstract
Through innovatively introducing the receding horizon into probabilistic model checking, an online strategy synthesis method for multi-robot systems from local automatons is proposed to complete complex tasks that are assigned to each robot. Firstly, each robot is modeled as a Markov decision process which models both probabilistic and nondeterministic behavior. Secondly, the task specification of each robot is expressed as a linear temporal logic formula. For some tasks that robots cannot complete by themselves, the collaboration requirements take the form of atomic proposition into the LTL specifications. And the LTL specifications are transformed to deterministic rabin automatons over which a task progression metric is defined to determine the local goal states in the finite-horizon product systems. Thirdly, two horizons are set to determine the running steps in automatons and MDPs. By dynamically building local finite-horizon product systems, the collaboration strategies are synthesized iteratively for each robot to satisfy the task specifications with maximum probability. Finally, through simulation experiments in an indoor environment, the results show that the method can synthesize correct strategies online for multi-robot systems which has no restriction on the LTL operators and reduce the computational burden brought by the automaton-based approach.
Keywords
Introduction
In recent years, more and more attention has been paid to the research of strategy synthesis for robots to complete the complex tasks (such as coverage, search, rescue, etc.). Model checking [1], as a formal mathematical tool and technique that successfully applied in a variety of engineering fields, provides a brand-new solution for the problem of robot strategy synthesis [2]. The strategy synthesis method that takes linear temporal logic as the task description language is one of the main research in this direction, because the linear temporal logic has ability to formalize the complex task requirements of robots in a rich time-related language and its close relevance to natural language. There have been many advances in using model checking to synthesize strategies for robots to satisfy task specifications. Ding et al. [3] proposed a method to synthesize strategy for a robot to complete a task with maximum probability, which not only considered the uncertainty of the robot motion, but also the uncertainty of the robot’s observation of properties in the environment. Guo et al. [4] proposed a novel framework combining motion planning based on model checking with action planning using action description languages. And an optimal controller was designed to generate discrete motion to satisfy LTL specifications. Ulusoy et al. [5] proposed an efficient computational incremental algorithm to automatically synthesize the optimal strategy for a robot interacting with a set of independent uncontrollable agents in a graph-like environment. The control goal is to maximize the probability that the robot satisfies the syntactically co-safe LTL specifications. Lacerda et al. [6] proposed a method to synthesize the cost-optimal strategy for a stochastic system to satisfy the co-safe LTL specifications. In order to address scenarios in which the task specification may not be satisfied with probability 1, the multi-objective probabilistic model checking is used to synthesize strategy with formal guarantee, that is, when the task becomes no longer fully satisfied during the execution, the strategy can still satisfy the task specification as much as possible.
At present, the research of strategies synthesis for robots from temporal logic is gradually expanding from single-robot to multi-robot systems. In this paper, the focus is on the latter one. The first step of collaboration strategy synthesis for multi-robot systems under model checking is to accurately describe the tasks by temporal logic language such as signal temporal logic etc. Then realizing the mapping of system control from continuous domain to discrete domain [7–9]. At last the task specifications of the multi-robot systems are analyzed and the strategies whose correctness can be verified are synthesized. In recent work, strategy synthesis for multi-robot systems from temporal logic specifications has been explored and developed. Ulusoy et al. [10] presented a method to generate the optimal paths for robot team to satisfy LTL specifications, which depended on describing the movement of the robot as a timed automata to satisfy task specifications while minimizing transfer time between optimization proposition. But this method has not considered the collaboration among robots. Yushan et al. [11] proposed a top-down LTL planning method for multi-robot systems. The multi-robot systems are given a global task specification and attempt to decompose the LTL specification into local specifications that each robot could deal with independently. However, this method also has not considered the collaboration among robots. Guo et al. [12] proposed a bottom-up partially distributed planning method, which divided individual robots with task dependency into a group and only considered the groups rather than the whole team during strategy synthesis. As a result of it, the calculation burden was reduced. Although model checking has been successfully applied, the problem of state explosion and computational complexity limit its further development, especially in the multi-robot systems with large state space and complex task specifications. Therefore, some receding horizon strategy synthesis methods under model checking appeared [13–16].
The receding horizon method optimizes an objective function of the current state over a finite horizon at each time instant, but only applies the first element of the optimal finite sequence of controls. The whole process is repeated for the new evaluated state at the next time instant. At present, the research of receding horizon strategy synthesis under model checking mainly focuses on single-robot. Ding et al. [13] proposed a receding horizon method to generate trajectories for a single agent, maximizing the collected time-varying rewards related to the state while ensuring that the task specifications are met. An energy equation similar to the Lyapunov equation was defined to ensure that the trajectory satisfied the acceptance conditions of the automaton and optimized a predetermined finite trajectory. Ulusoy et al. [14] proposed a receding horizon strategy synthesis method for single agent to ensure that the LTL specifications including static tasks with known locations and dynamic tasks sensed locally were satisfied. But the complexity of this method scales with the size of the local sensing range and the number of static tasks, so the state space can be potentially very large, and some low-priority dynamic tasks may be not serviced if it gets out of the sensing range when the agent moves towards a higher-priority dynamic task. Mingyu et al. [15] proposed a receding horizon strategy synthesis method from the LTL specification for a single agent according to the priority of tasks. The task specification consists of soft constraints and hard constraints, in which the hard constraints need to be satisfied and the soft constraints can be partially satisfied, while maximizing the collected rewards. Until now, there are only a few studies on receding horizon strategy synthesis for multi-robot systems under model checking [16]. In [16], Tumova et al. proposed a method to synthesize strategies for multi-agent system that each agent satisfies local task specifications, which considering the collaboration among agents. Through dynamic building of the local interaction automatons and the local product systems, the problem of state explosion brought by model checking is greatly reduced and the calculation efficiency is improved. As can be seen from above, the receding horizon strategy synthesis for multi-robot systems under model checking needs further research.
According to our knowledge, in the current research field of receding horizon strategy synthesis for robots under the model checking, either for single-robot [13–15] or multi-robot systems [6] are aimed at the deterministic system model. Up to now, there has been no relevant research under the probabilistic model checking framework for the probabilistic system model. The main contribution of this paper is the first one to introduce the receding horizon into strategy synthesis under probabilistic model checking such that dynamically modeling the multi-robot systems and iteratively synthesizing collaboration strategies that satisfy the task specifications with maximum probability. Firstly, considering that in a multi-robot system, each robot is modeled as a global MDP, different from the deterministic transition system (TS) model used in [13–16], which is a widely used formalism for modeling systems that exhibit both probabilistic and nondeterministic behaviour. Probability is a valuable tool for the modeling and analysis of multi-robot systems which contain uncertain factors, for example there is delay in sending messages among robots or deviation in executing the strategy due to dynamic changes in the environment and failures of robots. Secondly, each robot is assigned a LTL specification. For some tasks that a robot cannot complete by itself, the collaboration of other robots takes the form of atomic proposition into the robots’ LTL specifications. And the LTL specification of each robot is transformed into a deterministic rabin automaton (DRA). Compared with the buchi automaton used in [13–16], the rabin automaton has stronger power of representation [17]. Unlike [16], where the task progression metric is defined on the local product automaton, a task progression metric is defined over rabin automatons transformed by the LTL specifications of each robot. Thus, the task progression rewards for each state of each rabin automaton are obtained to determine the corresponding local goal states in the finite-horizon product systems. As the experimental results show, the proposed method can correctly determine the local goal states in the finite-horizon product systems and loose the assumptions in [16] that the product automaton must contain at least an accepting state within the horizons. Also, the proposed method has no restrictions on the LTL operators. Thirdly, by setting two horizons to determine the steps of running in automatons and MDPs, the local finite-horizon product systems are dynamically built at each iteration according to the task dependency among robots, and the strategies of multi-robot system within the horizon are synthesized online based on the value iteration algorithm. And then, the strategies are synthesized iteratively until the global task specifications of the multi-robot system are satisfied. Finally, to verify the correctness of the proposed method, experiments are made in an indoor environment to synthesize collaboration strategies for a multi-robot system. The results show that the proposed method can be used for the online strategies synthesis for multi-robot systems of which the state space is large with the maximum probability to satisfy the task specifications, while it is difficult for global computation.
The rest of this paper is structured as follows. In Section 2, the preliminaries are introduced. Section 3 introduces the formulation of the problem and framework. In Section 4 the details of the strategy synthesis method are provided. In Section 5 the experimental results are presented, and a conclusion is made in Section 6.
Preliminaries
Given a set X, 2
X
is defined as the set of all subsets of X. A finite or infinite sequence of elements of X is called a finite or infinite word over X, respectively.
Markov decision process
Each robot is modeled as a Markov decision process of which states are labeled with atomic proposition.
The set of enabled actions in s ∈ S is defined as
Linear temporal logic
Linear temporal logic (LTL) is an extension of propositional logic [17], first proposed by Pnueli in the late 1970s. LTL is a logical language used to describe the linear temporal relation of propositional changes. It provides a convenient and powerful method to formalize various qualitative attributes. As a task description language, temporal logic can describe complex tasks accurately. In this paper, the task specifications are expressed as LTL formulas. Same as ordinary predicate logic, the basic unit of linear temporal logic is the atomic proposition, which is the assertion of the state variables in the process being described. The LTL formula φ based on the set of atomic propositions AP is defined by the following syntax:
Given an infinite path σ of an MDP
Deterministic rabin automaton
For any LTL formula φ over AP, a deterministic rabin automaton (DRA) can be built [17].
A run of a DRA
For any LTL formula φ over AP, a DRA with input alphabet Σ = 2 AP can be built and only accepts words over φ that satisfy φ [18, 19].
Problem formulation and framework
Problem description
A system containing robots
When completing the tasks that need collaboration of multiple robots, synchronization among robots is required. For example, a robot will send a message to the other robots that responsible for the collaboration to remind them that it has reached the synchronization point as soon as it arrives the location of the task. Then, it must wait to receive a message from each of the collaborating robots before performing any action, that is, until all robots reach the synchronization point. In this paper, robots synchronize by sending messages to each other, but introducing the details of the synchronization mechanism is beyond the scope of this paper. Each robot is required to send synchronization messages only when it performs a task that needs collaboration, and there is no idle time in the system model.

The example of workspace.
The overall framework of online receding horizon strategy synthesis for multi-robot system is shown in Fig. 2. The collaboration requirements between robots

The overall framework.
In this paper, a receding horizon method with two horizons
Task progression metric
In order to reduce the huge computational burden brought by model checking, an online iterative strategy synthesis method based on local automatons is proposed in this paper. To indicate a progress towards the satisfaction of the task specifications and be subsequently used to determine the goal states within the horizon, different from [16] defining a task progression metric over the product automaton, refer to [6], a task progression metric is defined over the DRA
If a state in K
i
can be reached from state q
i
, then Dis (q
i
) ≤ |Q
i
|-1, where |Q
i
| represents the number of states in DRA
The method of [16] to build local product automaton and local product system is referenced in this paper.
Product automaton
For robots i with task collaboration relationships, they are divided into the same group
Suppose that
In the product automaton
Product system
Building the product automaton and task progression metric allow to evaluate which tasks should be completed to maximize the progress towards the satisfaction of the task specifications, and then planning the transitions of each robot
A strategy π
p
={ μ0,p, μ1,p, … } in the product system
Strategy synthesis
The goal is to synthesize a strategy for each robot
In this paper, the proposed method combines the probabilistic reachability theory with the value iteration algorithm. The probabilistic reachability theory is to calculate the maximum probability of satisfying the task specifications in the product system
Target states determination
The method of determining the set of goal states in the product system

A example of goal states determination within the horizon. s0,p is the initial state within the current horizon, and all states in the regions with the same color have the same task progression reward. The states in the red region have the smallest task progression reward, and the states in the yellow region have the largest task progression reward. So the goal states of the product system within the current horizon need to be determined in the yellow region. The Dijkstra algorithm is used to calculate the state nearest to the initial state as s13,p, then the current goal state is s13,p.
Experiment settings
To verify the correctness of the method and show the high efficiency, the strategy synthesis for single robot and collaborative robots within different horizons are tested and analyzed. Before that, the local goal states of different LTL specifications within different horizons are tested and analyzed to verify the correctness of local goal states choosen by the proposed method. The proposed solution is implemented in Ubuntu18.04 based on Python2.7 using the working environment and settings as shown in the Example 1. The MDP of each robot has 34 states. By default, there are 5 actions e, s, w, n and g. The execution of each action has different probabilities to reach different states, but the robots can only move between two adjacent grids without blocking, and the execution of the action g can stay in the current grid. The ltl2dstar tool [21] is used to transform the LTL specification of each robot into a DRA. The LTL specification of the robot1 is φ1 = ¬ door U lG ∧ (X F door ∧ (X F elevator)). The LTL specification of the robot2 is φ2 = F hG ∧ (X F can1 ∧ (X F can2)). The LTL specification of the robot3 is φ3 = F tearoom ∧ (X F desk1 ∧ (X F desk2)).
Testing and analysis of local goal states
To verify the correctness of local goal states choosen by the proposed method, the determinations of goal states for LTL specifications containing different types of LTL operators within different horizons are tested. For testing the different situations that the target state is within the horizon or not, the strategies are synthesized under different H and h. For robot2, firstly its LTL specification is set as φ = ¬ desk1 U hG. The strategies are synthesized under H = 5, h = 1 and H = 7, h = 1 respectively. The results of the local goal state choosen at the first iteration are shown in Figs. 5. Then its LTL specification is set as φ = G F can1 ∧ X can2. The strategies are synthesized under H = 3, h = 3 and H = 6, h = 3 respectively. The results of the local goal state choosen at the first iteration are shown in Figs. 7. Finally, its LTL specification is set as φ = G (desk2 ⟶ X ¬ desk2 U can2). The strategies are synthesized under H = 5, h = 2 and H = 9, h = 2 respectively. The results of the local goal state choosen at the first iteration are shown in Figs. 9. As can be seen from the results, the goal states within the horizons are correct for the LTL specifications containing different types of operators and for the conditions that the states of tasks exist or donot exist within the horizon, because the strategies make the robots moving towards satisfying their task specifications.

H = 5, h = 1 .

H = 7, h = 1 .

H = 3, h = 3 .

H = 6, h = 3 .

H = 3, h = 3 .

H = 6, h = 3 .
Computing task progression rewards of DRAs
According to the task collaboration relationships among the robots, the multi-robot system is divided into two groups of

Task progression rewards.
The strategy synthesis is tested for single robot and collaborative robots within different H and h. At the same time, the results are compared with global strategy synthesis, PRISM model checker [24] and the incremental synthesis method proposed in [25]. The strategy synthesized for each robot with the maximum probability of satisfying the task specification is shown in Fig. 11. As can be seen from the figures, the strategies synthesized are correct for they are making the robots moving towards satisfying their task specifications.

Strategies synthesized for each robot.
The performance of strategies synthesise under different horizons H and h are analyzed from three aspects of iteration number, computing time and size of state space respectively, and compared with some other methods. The results are shown in Table 1-4, and the strategy synthesized for each robot is shown in Fig. 11. As can be seen from the Table 1-2, for group
The time consumption of
The time consumption of
The state space and iteration number of
The time consumption of
The state space and iteration number of
A receding horizon strategy synthesis method for collaborative multi-robot systems from local automatons is proposed. The receding horizon is first introduced into the strategy synthesis for multi-robot systems under probabilistic model checking. By setting two horizons, the running steps of automatons and Markov decision processes are limited. Based on the collaborative task relationship between robots, the local product systems within the horizons are dynamically built when the robots are running. In the product systems, the strategies that satisfy the task specifications with the maximum probability are synthesized online for each robot based on the value iteration algorithm, which greatly alleviates the computational complexity brought by the model checking.
Future research includes meeting various optimality requirements by replacing the value iteration algorithm with the linear programming algorithm or applying multi-objective probabilistic model checking techniques. Furthermore, in this paper we assume that the system models are fully known. In the case that they are not, we would like to address the formulation of the same problem. And the synchronization mechanism among robots and the theoretical guarantees need further research and verification.
