Vectorial Pattern Databases

Abstract

Pattern Databases are among the most capable means for solving hard combinatorial problems. Since their conception, they have been enhanced along different directions. Recently, it has been shown that Pattern Databases can induce non-consistent heuristic functions, and it has been conjectured that this sort of heuristic function can be better informed than others. As a matter of fact, non-consistent heuristic functions allow specific rules to be applied in order to propagate these inconsistencies with the hope of improving the heuristic estimates at some nodes. Also, it has been studied how to recognize infeasible values in Pattern Databases with the hope of being able to introduce corrections that would allow for more pruning.

In this work, a new approach is suggested that fulfills both ideas simultaneously: inducing naturally non-consistent heuristic functions just by recognizing feasible, yet admissible, heuristic values which serve to improve even further the bidirectional pathmax propagation rules. Appealing as it might seem, this idea has various pros and cons which are examined. Experiments on different benchmarks show a noticeable improvement in the number of generated nodes over classical Pattern Databases when applicable, though the difference does not necessarily pay off in running time.

Keywords

Heuristic search; Pattern Databases; Inconsistent heuristics

1. Introduction

Since they were introduced in the literature for the first time [1,2], Pattern Databases (or PDBs for short) have received a lot of attention. After being widely used mainly for solving hard combinatorial problems, they were generalized by Stefan Edelkamp to be applied in domain-independent problem solvers [4,5]. Later on, Malte Helmert et al. showed that PDBs are just a special case of a more general mechanism for deriving automatically admissible heuristics known as merge-and-shrink [7,8]. In this paper, however, attention is restricted to the first case, using Pattern Databases for solving hard combinatorial problems with domain-dependent solvers.

In this particular case, the advantage of their use is twofold: first, they are an effective means to compute admissible heuristic functions automatically, in contrast to other approaches which automatically derive inadmissible estimates such as the delete relaxation [9]; secondly, in practice the resulting heuristic functions save several orders of magnitude of generated nodes when compared to other approaches, which usually consist of computing heuristics by hand using the constraint relaxation procedure [16].

Pattern Databases are defined as abstractions of the original state space where each constant appearing in the original state space gets replaced by either a dedicated symbol or a special “don’t care” symbol. Thus, Pattern Databases are simply hash tables which store, for every pattern (or arrangement of symbols in the abstracted state), the minimum number of moves required to place the symbols considered in the abstraction for the very first time in their goal location while ignoring the others. This value can be easily computed with a backwards brute-force breadth-first search from the goal pattern, applying at each step the inverse of the available operators. So far, Pattern Databases are admissible heuristic functions.

Originally, all moves were counted in so that when comparing the values retrieved from different Pattern Databases (for a collection of different patterns), the only way to get an admissible heuristic is just to take the maximum of all of them – and thus these PDBs are usually denoted as max. However, when the constants appearing in the original state space can be split into disjoint sets (as in the N-puzzle or the Towers of Hanoi, but not in the Rubik’s Cube or the TopSpin puzzle), it is possible to sum their values [12]. This idea is known as disjoint or add Pattern Databases.

In practice, disjoint Pattern Databases are usually more efficient than max Pattern Databases and can decrease the number of expanded nodes by several orders of magnitude. However, it has been shown that max Pattern Databases can return better heuristic values in some cases [20]. Furthermore, it has been shown that it is possible to compute a disjoint Pattern Database just by dividing the cost of the operator generating a given pattern by the number of PDBs that share it. Recently, it has also been suggested to model the combination of various PDBs as a linear programming task, which can further enhance the heuristic values [17].

After setting up an abstraction for a given state space, the resulting PDB is usually a consistent heuristic function. Moreover, even if an ensemble of heuristic functions is simultaneously considered, and the maximum of them is taken, the resulting value preserves consistency provided that the original heuristic functions are consistent as well [10]. In turn, it has been suggested that inconsistent heuristic functions can behave better in practice than consistent heuristic functions, because it is possible to propagate these inconsistencies throughout the search tree with the pathmax propagation rules [6]. For example, when a collection of heuristic functions is available and none strictly dominates the others, randomly selecting a heuristic leads very easily to inconsistent heuristic values which usually improve the overall performance. Also, when traversing a permutation state space, it is possible to return the maximum of a regular and a dual lookup [21]. The resulting heuristic function, known as the dual heuristic, is admissible, yet inconsistent, and usually far better informed. A third interesting approach to generate inconsistent heuristic functions consists of compressing PDBs [22]. Clearly, when reducing their size, consistency cannot be enforced and thus, inconsistencies are created. However, as opposed to the previous methods, inconsistent estimates computed this way do not improve over the original heuristic values by definition.

In this work, a new idea is considered for enhancing max PDBs: instead of storing only the minimum distance of the first occurrence of every pattern to the goal pattern, it is suggested to store the minimum distance of a predefined number of occurrences of every pattern. Therefore, the PDB computed this way would contain a vector instead of a scalar. Thus, these PDBs are termed as vectorial Pattern Databases.

The paper is arranged as follows: first, previous work related to the idea of identifying feasible heuristic values is examined; immediately after, the idea of max Pattern Databases is reviewed; next, vectorial Pattern Databases are introduced, and a number of experiments are conducted in Section 5. The paper ends with some concluding remarks.

2. Related work

The idea of checking for infeasible values was introduced for the first time by Fan Yang et al. [19,20]. This was achieved by splitting the cost of every operator into two different costs, the primary cost, C, and the residual cost, R, which obviously demands more memory to store the resulting PDB. However, their approach applied only to additive Pattern Databases. An attempt was made later to generalize the idea and to detect infeasibility in max PDBs [18]. However, first, when an infeasible value was spotted it was incremented by just one unit to guarantee admissibility, while in this work it will be shown that arbitrary increments are possible; secondly, no experimental evaluation was performed on combinatorial domains to document the reduction in the number of expanded nodes and/or overall running time.

Recently, it has been suggested to store a predefined number of feasible values [14]. As a result, infeasible values are easily recognized as those not recorded in the same interval examined by every PDB. The idea, as appealing as it is, has some drawbacks to be addressed but, as will be shown, it can lead to a significant reduction in the number of generated nodes by pruning large subtrees under some circumstances.

Fig. 1.

A partial view of the 6-Pancake and one particular abstraction of it. (a) A partial view of the original state space of the 6-Pancake. (b) A particular abstraction of the state space shown on the left.

Compared to the conference publication, the present paper includes the following novel content:

An analysis of the time complexity of vectorial Pattern Databases in Section 4.

A deeper analysis of the inconsistencies created by vectorial Pattern Databases in Section 4.1.

A novel review of the Bidirectional Pathmax propagation rules that enhance further the heuristic values in Section 4.2.

3. max Pattern Databases

This section adheres to the nomenclature and definitions suggested by Fan Yang et al. [20] to introduce max Pattern Databases.

Definition 1.
A state space is a weighted directed graph $S = ⟨ T, Π, C ⟩$ , where T is a finite set of states, $Π \subseteq T \times T$ is a set of directed edges (ordered pairs of states) representing state transitions, and $C : Π \to \mathbb{N}$ is the edge cost function.

Consider, for example, the Pancake puzzle. The problem was originally posed by Jacob E. Goodman under the pseudonym of “Harry Dweighter” in 1975 [3]. In this domain, a particular instance is characterized by a permutation over K integers $\{0, \dots, (K - 1)\}$ and there are $(K - 1)$ operators available: $O_{2}, O_{3}, \dots, O_{K}$ . The operator $O_{i}$ flips the first i items of the permutation. The goal consists of arranging the symbols in increasing order $0, 1, \dots, (K - 1)$ . Figure 1(a) shows a partial view of the state space of the 6-Pancake. The typical goal chosen for this particular problem is the identity permutation shown in boldface. All operators have the same cost and thus function C maps each edge to one.
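
The operators described above can be sketched in a few lines of Python. The names `flip` and `successors` are illustrative choices, not taken from the paper, and states are assumed to be represented as tuples.

```python
def flip(state, i):
    """Apply operator O_i: reverse the first i items of the permutation."""
    return state[:i][::-1] + state[i:]


def successors(state):
    """Yield (i, child) for every operator O_2, ..., O_K."""
    K = len(state)
    for i in range(2, K + 1):
        yield i, flip(state, i)


goal = (0, 1, 2, 3, 4, 5)          # identity permutation of the 6-Pancake
children = dict(successors(goal))  # K - 1 = 5 children, one per operator
```

Each operator is its own inverse (flipping the same prefix twice restores the state), which is why a backwards search in this domain can reuse the forward operators.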

From a particular state space it is possible to compute a Pattern Database by mapping some constants appearing in the original state space into distinguished constants while mapping the rest to “don’t care” symbols. The same mapping, when being applied to the goal state results in the goal pattern. Thus, the resulting state space is different. In the following, let $A$ denote the abstract state space.

Figure 1(b) shows a particular abstraction of the states considered in the previous example. In this case, constants 3, 4 and 5 are preserved in the abstract state space whereas the constants 0, 1 and 2 are mapped to the “don’t care” symbol, represented as □. The goal pattern has been distinguished in boldface.

Now, it is possible to compute the minimum distance of every pattern in the abstract state space to the goal pattern with a backwards breadth-first search, which is initialized with an OPEN list that contains only the goal pattern with a cost equal to 0. As shown in Fig. 1(b), all the descendants of the goal pattern are generated by applying the inverse of all the available operators. Doing so, new patterns are discovered and the cost to reach them is stored in the Pattern Database, which is usually implemented as a hash table whose entries contain pairs of the form (pattern, cost), where the cost consists of a single scalar value. For example, patterns $3 □ □ □ 45$ , $43 □ □ □ 5$ and $543 □ □ □$ lead to the goal pattern in just one step. However, applying either operator $O_{2}$ or $O_{3}$ (shown with dotted lines) yields the goal pattern again, whose entry in the hash table is already filled with cost 0. In other words, duplicates are not re-inserted into the OPEN list used by the backwards breadth-first search. As a consequence, the resulting state space $A$ is clearly smaller than the original state space $S$ . This makes sense since traversing the original state space would be infeasible for large state spaces.
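
The construction just described can be sketched as follows for the 6-Pancake abstraction that preserves constants 3, 4 and 5. This is a minimal illustration under stated assumptions: the helper names (`flip`, `abstract`, `build_pdb`) and the use of `-1` for the “don’t care” symbol are hypothetical choices, and a Python dictionary plays the role of the hash table.

```python
from collections import deque

DONT_CARE = -1  # assumed encoding of the "don't care" symbol


def flip(pattern, i):
    """Operator O_i: reverse the first i items."""
    return pattern[:i][::-1] + pattern[i:]


def abstract(state, kept):
    """Map every constant not in `kept` to the "don't care" symbol."""
    return tuple(x if x in kept else DONT_CARE for x in state)


def build_pdb(goal, kept):
    """Backwards breadth-first search from the goal pattern."""
    start = abstract(goal, kept)
    pdb = {start: 0}            # hash table of (pattern, cost) pairs
    open_list = deque([start])  # OPEN initially holds only the goal pattern
    while open_list:
        pattern = open_list.popleft()
        for i in range(2, len(pattern) + 1):
            child = flip(pattern, i)  # every flip is its own inverse
            if child not in pdb:      # duplicates are not re-inserted
                pdb[child] = pdb[pattern] + 1
                open_list.append(child)
    return pdb


pdb = build_pdb((0, 1, 2, 3, 4, 5), kept={3, 4, 5})
```

With this abstraction the table holds 6!/3! = 120 patterns, far fewer than the 720 states of the original space.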

Since the exploration of the abstract state space is conducted with a breadth-first search, the cost for reaching each pattern is known to be optimal. So far, Pattern Databases are admissible heuristic functions.

Definition 2.
An abstraction mapping $ψ : S \to A$ between state space $S$ and abstract state space $A$ , is defined by a mapping between the states of $S$ and the states of $A$ .

The way these abstractions are defined guarantees that the mapping is a homomorphism or, in other words, that all edges in the original state space are preserved in the abstract state space: $\forall (u, v) \in Π$ , $(ψ (u), ψ (v)) \in ψ (Π)$ . Thus, the abstract state space is necessarily smaller (otherwise it would make no sense to explore it instead of the original state space) so that the homomorphism implies that many edges in the original state space collapse into the same edge in the abstract state space. For example, it is easy to see that the concatenation of the operators $⟨ O_{2}, O_{3} ⟩$ to the goal pattern is still applicable in the abstract space shown in Fig. 1(b), but it leads to the same node with a cost equal to zero.

Fig. 2.
A partial view of the 6-Pancake and one abstraction of it as traversed by a vectorial Pattern Database. (a) A partial view of the original state space of the 6-Pancake. (b) A particular abstraction of the state space shown on the left as traversed by a vectorial Pattern Database with pattern generation depth $d > 1$ .

Of course, different mappings can be considered, each resulting in a different Pattern Database. For example, an abstraction which preserves constants 0, 1 and 2 while mapping 3, 4 and 5 to “don’t care” in the running example, would yield a heuristic value equal to 1 to the pattern $102 □ □ □$ and 2 to the result of the concatenation of the operators $⟨ O_{2}, O_{3} ⟩$ .

Now, given a collection of N abstractions $\{ψ_{i}\}_{i = 1}^{N}$ , the heuristic value of a node $n \in S$ in the original state space can be computed from the different available abstractions. In particular, if each abstraction summed the cost $C_{i}$ of each operator, the only way to get admissible estimates is to pick up the maximum of all of them:

$h (n) = \max_{i = 1, \dots, N} \{h_{i} (n)\},$ (1)

where $h_{i} (n)$ is the minimum cost to reach the goal pattern in the ith abstraction, $ψ_{i}$ , from $ψ_{i} (n)$ . To distinguish these Pattern Databases from those suggested here, these are called classical Pattern Databases in this work.

An interesting property of the heuristic function shown in Eq. (1) is that it preserves consistency. In other words, it is a consistent heuristic function if all $h_{i}$ are consistent as well. The proof is straightforward: given two states in the original state space, n and one of its descendants $n^{'}$ , the same arc that joins them in $S$ also joins their abstractions, $ψ_{i} (n)$ and $ψ_{i} (n^{'})$ , in the ith abstract state space, since the mapping is known to be a homomorphism. Although there might be other arcs joining the same abstract states, recall that PDBs record the minimum cost to reach each pattern for the first time, so that the cost of $n^{'}$ cannot exceed the cost of n by more than the cost of the cheapest operator joining them in the original state space.

This technique is able to solve hard combinatorial problems saving orders of magnitude on the number of generated nodes. Among other important achievements, one of their main contributions was to solve optimally the first instances of the $3 \times 3 \times 3$ Rubik’s Cube [11].

4. Vectorial max Pattern Databases

Instead of storing just the minimum distance to the first occurrence of each pattern in a given abstract state space $ψ_{i}$ , vectorial Pattern Databases store the distance to a successive number of occurrences of the same pattern in an array, $H_{i}$ . The jth component of this array, $H_{i} [j]$ , stores the distance to the jth occurrence of the pattern referenced in each case. The procedure for computing these vectors is still the same as the one described for computing the classical Pattern Databases in the preceding section: a backwards breadth-first search suffices to compute a vectorial Pattern Database. The only difference is that every time a node is generated in the abstract state space, it is still inserted in OPEN if and only if the number of occurrences of this pattern is strictly less than a predefined value denoted as the pattern generation depth, d. Clearly, the same patterns are found at depths either equal or increasingly larger, so that the resulting vectors are always automatically sorted in increasing order.
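
A minimal sketch of this modified search is given below. It follows a literal reading of the rule above: every generation of a pattern counts as one occurrence, so a pattern generated twice at the same depth stores that depth twice. This reading is an assumption (a variant keeping only distinct depths would produce different vectors), and the helper names and the `-1` encoding of the “don’t care” symbol are hypothetical.

```python
from collections import deque

DONT_CARE = -1  # assumed encoding of the "don't care" symbol


def flip(pattern, i):
    """Operator O_i: reverse the first i items."""
    return pattern[:i][::-1] + pattern[i:]


def abstract(state, kept):
    """Map every constant not in `kept` to the "don't care" symbol."""
    return tuple(x if x in kept else DONT_CARE for x in state)


def build_vectorial_pdb(goal, kept, d):
    """Backwards BFS that accepts up to d occurrences of each pattern."""
    start = abstract(goal, kept)
    vpdb = {start: [0]}               # pattern -> vector H of distances
    open_list = deque([(start, 0)])
    while open_list:
        pattern, g = open_list.popleft()
        for i in range(2, len(pattern) + 1):
            child = flip(pattern, i)  # every flip is its own inverse
            H = vpdb.setdefault(child, [])
            if len(H) < d:            # fewer than d occurrences so far
                H.append(g + 1)
                open_list.append((child, g + 1))
    return vpdb


goal = (0, 1, 2, 3, 4, 5)
vpdb = build_vectorial_pdb(goal, kept={3, 4, 5}, d=3)
```

Because OPEN is processed in first-in, first-out order, distances are appended in non-decreasing order and no explicit sorting is needed. With d = 1 the sketch reduces to the classical construction.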

Fig. 3.

Two vectorial Pattern Databases generated with pattern generation depth $d = 2$ .

Figure 2 illustrates the main idea. The figure shows the same partial view of the 6-Pancake discussed in the previous section. As shown in Fig. 2(b), the abstracted state space differs from the state space shown in Fig. 1(b) in how duplicated nodes are treated. Instead of discarding them, they are accepted up to a given threshold (i.e., the pattern generation depth), d. Thus, the resulting Pattern Database consists of a hash table whose entries contain pairs of the form (pattern, $H$ ) where $H$ is a vector that stores the minimum distance to reach successive occurrences of the same pattern. In particular, the vectorial Pattern Database shown in Fig. 2(b) would assign $H = [0, 1, 2, \dots]$ to the pattern $□ □ □ 345$ instead of just 0 as in Fig. 1(b).

Recall from the previous section that abstractions are homomorphisms. Hence, a path $⟨ n_{1}, n_{2}, \dots, t ⟩$ from any node $n_{1}$ to the goal state, t, is mapped to a path in every abstraction $ψ_{i}$ as well, issued from $ψ_{i} (n_{1})$ and getting to $ψ_{i} (t)$ . However, since several nodes in the original state space can match the same pattern in the abstracted state space, the length of the path in the abstract space is less than or equal to the length of the path in the original state space. This observation can be used to exploit the information provided by an arbitrary number of vectorial Pattern Databases as follows:

If two (or more) vectorial Pattern Databases return vectors $H_{i}$ such that their first component, $H_{i} [0]$ , is always the same, there is no reason to believe that the path traversed in every abstraction is not the same and equal to the path to be traversed in the original state space towards the goal. Thus, the resulting heuristic estimate should be $H_{i} [0]$ if admissibility has to be guaranteed.

On the other hand, if two (or more) vectorial Pattern Databases return different values for the first component, it is clear that both Pattern Databases have traversed different paths from the target goal – in other words, different paths to different nodes with the same pattern have been found. In the classical case, the best one can do is to pick up the maximum of all of them, as in Eq. (1). However, in the vectorial case, if there are still more components in each vector to examine, one can scale up through each vector looking for an agreement among all of them. If any is found, this would be the heuristic estimate to return; otherwise, the maximum of all components is returned.

Figure 3 shows different cases that can arise in the comparison of two vectors, $H_{1}$ and $H_{2}$ , from two different abstractions $ψ_{1}$ and $ψ_{2}$ explored with a pattern generation depth $d = 2$ . The vectors returned from each Pattern Database are symbolically depicted as vertical lines. For instance, in case (d) (Fig. 3) there is no reason to doubt that the paths traversed in $ψ_{1}$ and $ψ_{2}$ are the same (since $H_{1} [0] = H_{2} [0] = 10$ ) and thus that this path might be the one to be traversed in the original state space. Therefore, the heuristic value of a node with such vectors is 10 if admissibility is to be guaranteed. However, all the other cases reveal quite a different scenario. For example, in Fig. 3(a) the same node n in the original state space is mapped to different abstract nodes such that the first one returns $[12, 13]$ and the second one $[10, 11]$ : clearly, the paths traversed in each case are not the same, since they have different lengths $H_{1} [0] = 12 \neq 10 = H_{2} [0]$ . Thus, one can check the next value in $H_{2}$ to see whether it is 12 or not. The key observation is that a heuristic value is returned only if an agreement is found between the vectorial Pattern Databases considered, since only in this case can it be concluded that the same path might have been found by all abstractions. In the example, the next value is 11. Since there are no more values to examine in $H_{2}$ , no path of length 12 has been found to the mapping of this node in $ψ_{2}$ . However, it is possible to provide an admissible heuristic estimate just by returning $\max \{H_{1} [0], H_{2} [1]\} = 12$ . In this case, the heuristic value computed so far is no different from the value that would have been computed by a classical Pattern Database, since the maximum of the first component of both vectors is 12 as well.

Much the same happens in Fig. 3(b). In this case, after noticing that the first components of $H_{1}$ and $H_{2}$ differ, the next value in $H_{2}$ agrees with the first value in the first vector. As a matter of fact, this coincidence can be understood as evidence that there is a path of length 12 in the original state space. Therefore, this is the value to return.

However, in Fig. 3(c) there is never an agreement between both components. After scaling up through both vectors, the last comparison ends with 12 and 13. Since there are no more values to examine, the best one can do is to return 13 and the resulting value is still admissible. Much the same happens in Fig. 3(e), the only difference being that in this case an agreement has been found at the last position. In both cases, the values returned, 13, are larger than the values that would have been returned by two classical Pattern Databases, 11.

Since every $H_{i}$ provides a number of observed distances to the goal pattern, it is possible to compute a vector of feasible values. Let H denote such a vector. The vector H is computed from an arbitrary number of vectorial Pattern Databases. After looking up each vectorial Pattern Database, the procedure examines the vectors $H_{i}$ and records those values that meet any of the following conditions:

(1) The same value has been repeatedly observed in all the other vectors, $H_{j}$ , or

(2) it exceeds the maximum value observed in all the other vectors.

The first case serves to identify all the agreements among the vectors $H_{i}$ . For example, in Fig. 3(d), 10 would be recorded in H since there is a mutual agreement between both heuristics. The second condition is a little bit more subtle and can be seen, in fact, as a generalization of Eq. (1). When computing the vector of feasible heuristic values H, if an abstraction $ψ_{i}$ has found a path of length $H_{i} [k]$ ( $k ⩽ d$ ) to a given pattern, and all the other abstractions report distances which are shorter, there is no reason to believe that there are no paths whose length lies in the interval $\{H_{i} [k], H_{i} [d - 1]\}$ , so that they are included in H as well. Considering again Fig. 3(d), the value 12 can be safely discarded but nothing can be said of 13 (since the second abstraction was not given an opportunity to record a third match) and it shall be included in H as well for the sake of completeness. When putting both conditions together, it becomes clear that H stores all values which appear in all abstractions (condition (1)) and the values $\{H_{i} [k], H_{i} [d - 1]\}$ in the ith abstraction which are larger than the maximum values in all the other abstractions, $H_{j}$ , $j \neq i$ (condition (2)). In the end, the value of H in Fig. 3(d) is $[10, 13]$ .

Algorithm 1.

Computation of H

Algorithm 1 shows how to compute H for a particular node n. It receives N vectors $H_{i}$ with the cost of reaching the goal pattern from the abstracted node $ψ_{i} (n)$ in the N different abstractions. In the first line, vector H is initialized to the empty vector. Lines 2–4 look for all the agreements among the vectors $H_{i}$ . Since these vectors are automatically sorted in increasing order, line 3 is executed in $O (N \log d)$ and the whole loop is performed in $O (d N \log d)$ . Lines 5–8 copy the values exceeding the maximum in all the others in the vector H. To make this computation more efficient the vectors $H_{i}$ are sorted in increasing order of their last value in line 5. Clearly, this operation imposes an overhead which is $O (N \log N)$ which results in an overall complexity equal to $O (N (\log N + d \log d))$ . However, both parameters, N and d are not related to the size of the problem at hand (e.g., the number of tiles of a particular sliding-tile puzzle or the number of locations of a Pancake) and they are usually bounded by small constants. For typical cases of two, three or four vectorial Pattern Databases with a pattern generation depth $d ⩽ 4$ , a small number of comparisons suffice.
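
The two conditions that define H can be sketched directly in Python. The following is not a transcription of Algorithm 1 and makes no attempt to match the complexity discussed above; it is a brute-force illustration. The function names are hypothetical. Taking the smallest feasible value as the heuristic estimate agrees with the worked cases of Fig. 3 but is an assumption here, as is the fallback to Eq. (1) when no value qualifies. The vectors used for case (d) are inferred from the discussion of that case.

```python
def feasible_values(vectors):
    """Collect the values satisfying condition (1) or condition (2)."""
    H = set()
    for i, Hi in enumerate(vectors):
        others = [Hj for j, Hj in enumerate(vectors) if j != i]
        for v in Hi:
            # Condition (1): v has been observed in all the other vectors.
            agreed = all(v in Hj for Hj in others)
            # Condition (2): v exceeds the maximum of all the other vectors.
            above = all(v > max(Hj) for Hj in others)
            if agreed or above:
                H.add(v)
    return sorted(H)


def vectorial_h(vectors):
    """Smallest feasible value (assumed); Eq. (1) if H is empty (assumed)."""
    H = feasible_values(vectors)
    return H[0] if H else max(Hi[0] for Hi in vectors)


case_a = [[12, 13], [10, 11]]  # Fig. 3(a)
case_d = [[10, 13], [10, 12]]  # Fig. 3(d), as inferred from the text

h_a = vectorial_h(case_a)      # 12, as in the discussion of Fig. 3(a)
H_d = feasible_values(case_d)  # [10, 13], the value of H given for Fig. 3(d)
```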

It can be easily proven that in order to improve over the heuristic values returned by classical Pattern Databases, there should be at least one vectorial Pattern Database with a difference between two successive observations equal to 2 or larger. In other words,

Lemma 1.

If in the comparison of different $H_{i}$ none has a difference between successive observations larger than 1, the heuristic value computed so far is exactly $\max_{i = 1, \dots, N} \{H_{i} [0]\}$ , leading to no improvement over classical Pattern Databases.

Proof.

The proof is trivial and proceeds by enumeration of a few cases.

Let us first consider two different vectors $H_{1} = [h_{1}, h_{1} + 1, \dots, h_{1} + c_{1} - 1]$ and $H_{2} = [h_{2}, h_{2} + 1, \dots, h_{2} + c_{2} - 1]$ whose values (after removing duplicates) are arranged according to an arithmetic progression with a difference equal to 1. Besides, $0 < c_{1}, c_{2} ⩽ d$ , so that if either $c_{1}$ or $c_{2}$ (or both) is less than d (the pattern generation depth), at least one value appears multiple times. In particular, no assumption is made about $c_{1}$ or $c_{2}$ other than that they are strictly positive.

If $h_{1} + c_{1} - 1 < h_{2}$ then after scaling up through vector $H_{1}$ , $h_{2}$ is selected. Indeed, $h_{2} = \max \{H_{1} [0], H_{2} [0]\} = \max \{h_{1}, h_{2}\} = h_{2}$ since $h_{2} > h_{1} + c_{1} - 1$ by hypothesis and $c_{1}$ is necessarily positive.

If $h_{1} < h_{2}$ and $h_{1} + c_{1} - 1 ⩾ h_{2}$ then an agreement would be eventually found at $h_{2}$ since the values in $H_{1}$ follow an arithmetic progression whose difference equals 1 and $h_{2} \in [h_{1}, h_{1} + c_{1})$ by hypothesis. On the other hand, $\max \{H_{1} [0], H_{2} [0]\} = \max \{h_{1}, h_{2}\} = h_{2}$ .

If $H_{1} = H_{2}$ then $h_{1} = h_{2}$ so that $\max \{H_{1} [0], H_{2} [0]\} = \max \{h_{1}, h_{2}\} = h_{1}$ or $h_{2}$ .

Other arrangements of $H_{1}$ and $H_{2}$ are symmetrical so that their discussion is skipped.

If more vectors are considered, recall that H stores all values which appear in all abstractions and the values in the ith abstraction which are larger than the maximum values in all the other abstractions. Thus, after comparing two vectors $H_{1}$ and $H_{2}$ the result consists again of an interval of values arranged as in an arithmetic progression with a difference equal to 1: If $h_{1} + c_{1} - 1 < h_{2}$ , then only $h_{2}$ is stored; if $h_{1} + c_{1} - 1 ⩾ h_{2}$ , then all values in the interval $[h_{2}, h_{1} + c_{1})$ are considered; finally, if $H_{1} = H_{2}$ then one copy of these vectors is used for the next comparison.

In any case, in the comparison of two vectors (one that results from $H_{1}$ and $H_{2}$ and, on the other hand, $H_{3}$ ) whose elements are distributed according to an arithmetic progression with a difference equal to 1, the lemma applies again. □

As a matter of fact, it might happen in practice that many nodes in the original state space are mapped to the same abstract state in one abstraction $ψ_{i}$ such that the length of each path is the same or very similar. Consider again the Pancake puzzle. Typical mappings for this problem split the permutation in different chunks and compute a Pattern Database for each one. For example, a Pancake with 12 positions can be easily solved with two Pattern Databases of six elements each: one mapping the first six constants and getting rid of the second half; and a second one ignoring the first half but preserving the constants of the second part, instead. For any of these abstractions, if the first two locations are not occupied by constants preserved by it, the operator $O_{2}$ would lead to exactly the same pattern with an increment of the cost equal to 1. In the end, all patterns reachable from any node whose first two locations are not mapped, will get to a vector whose costs are distributed according to an arithmetic progression with a difference equal to 1. Since the minimum difference must be 2, the performance of this technique relies solely on the accuracy of the other Pattern Database since this one is clearly useless.

On the opposite side, there are at least two different ways for exploiting the heuristic values computed with vectorial Pattern Databases: on one hand vectorial Pattern Databases naturally induce inconsistent heuristic functions; on the other hand, since the components of $H_{i}$ can be used to compute a vector H of feasible values, a feasibility analysis can be conducted to enhance the heuristic estimates. Both directions are explored in the next subsections.

4.1. Inconsistent heuristics

As it was already noted in the Introduction, it has been suggested that inconsistency (while preserving admissibility) might be seen as a desired property. An example is provided to prove that vectorial Pattern Databases lead to inconsistent heuristic functions very easily.

Consider a node n and two abstractions $ψ_{1}$ and $ψ_{2}$ . The values retrieved from two classical Pattern Databases are found to be the same $h_{1} = h_{2} = 13$ . According to Eq. (1), the heuristic value of node n would be $h (n) = \max_{i = 1, 2} \{h_{i} (n)\} = 13$ . Due to the consistency of the underlying heuristics, the net changes in the heuristic estimates between node n and any of its descendants, $n^{'}$ cannot exceed the cost of the edge joining them in the original state space. Thus, assume that the heuristic values found now by two classical Pattern Databases are $h_{1} = 12$ and $h_{2} = 14$ , yielding a final heuristic value $h (n^{'}) = \max_{i = 1, 2} \{h_{i} (n^{'})\} = 14$ , which is consistent with the heuristic value of its ancestor.

On the other hand, assume that the values returned by two vectorial Pattern Databases generated with a pattern generation depth equal to 2 (which actually explore the same abstract space as their classical counterparts and would extend it a little bit further until all patterns are found again) for the same node n are $H_{1} = [13, 14]$ and $H_{2} = [13, 15]$ . According to the procedure described above, the heuristic value is 13 as well, since this is the first value agreed on by both vectors. Again, due to the consistency of the underlying abstractions, the net changes in the values stored at each component cannot exceed the cost of the arc joining the states in the original state space. Therefore, let us assume that the vectors computed for the same descendant $n^{'}$ are $H_{1} = [12, 15]$ and $H_{2} = [14, 16]$ . Now, there is no coincidence between both vectors so that the resulting heuristic value is the maximum of the last component of both vectors, yielding the value 16, which is inconsistent with the heuristic value of its parent, 13, but still admissible.

When using inconsistent, yet admissible, heuristic functions it is possible to save the expansion of a (usually large) number of nodes by using the pathmax propagation rules [15]. Furthermore, if the original state space is an undirected graph, then these propagation rules can be extended with an additional rule [6].

In a nutshell, the pathmax propagation rules propagate the heuristic values between a node n and its descendant $n^{'}$ as follows:

$h (n^{'}) = \max \{h (n^{'}), h (n) - c (n, n^{'})\};$

$h (n) = \max \{h (n), \min_{n_{i} \in SCS (n)} \{h (n_{i}) + c (n, n_{i})\}\},$

where $c (n, n_{i})$ is the cost of the edge connecting nodes n and $n_{i}$, and $SCS (n)$ is the set of all children of node n. While the second rule applies in directed state spaces, it was regarded as not being very useful [6]. Instead, if the state space is undirected, the pathmax propagation rules were extended with a third rule:

$h (n) = \max \{h (n), h (n^{'}) - c (n, n^{'})\},$

which enables the effective propagation of heuristic values from children to parent. Bidirectional pathmax (or BPMX for short) uses rules 1 and 3 to propagate (inconsistent) heuristic values in any direction. As will be shown in the following subsection, these updates can be further improved.
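As a minimal sketch, the three propagation rules can be written as plain functions. The function names and the numeric values in the usage lines are illustrative assumptions; only the formulas come from the rules above.

```python
def pathmax_rule1(h_child, h_parent, cost):
    """Rule 1: push the parent's estimate down to a child."""
    return max(h_child, h_parent - cost)


def pathmax_rule2(h_parent, children):
    """Rule 2: lift the parent using the best child.

    `children` is a list of (h_child, edge_cost) pairs.
    """
    return max(h_parent, min(h + c for h, c in children))


def pathmax_rule3(h_parent, h_child, cost):
    """Rule 3 (undirected spaces only): push a child's estimate up."""
    return max(h_parent, h_child - cost)


# Illustrative unit-cost values.
print(pathmax_rule1(13, 16, 1))
print(pathmax_rule3(16, 15, 1))
print(pathmax_rule2(10, [(12, 1), (11, 1)]))
```

BPMX corresponds to applying `pathmax_rule1` and `pathmax_rule3` in alternation while the search moves down and back up the tree.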

4.2. Feasibility analysis

A closer inspection of the preceding rules reveals that subtracting the cost of the edge is just a conservative assumption that is necessary to guarantee admissibility. Clearly, these updates can be improved if the feasible values of each node are known. The key observation is that the values not recorded by a given abstraction $ψ_{i}$ in the interval $(H_{i} [0], H_{i} [d - 1])$ should never be considered for computing the heuristic value of the corresponding node.¹

¹
Credit shall be given to Fan Yang who also mentioned the possibility of storing more than one cost in each Pattern Database in the concluding remarks of one of her works [18] for this particular purpose.

Since the vectorial Pattern Database is built with a backwards breadth-first search, it can be assured that no path of such length exists. Take, for instance, the vector $H_{1}$ shown in Fig. 3(d). Clearly, there are no paths of lengths 11 and 12 leading to the pattern represented by the first abstraction.

Incidentally, the vector H contains feasible values as explained above: it comprises all values which appear in all abstractions (so that they are likely to be feasible in the original state space) and the values in the ith abstraction which are larger than the maximum values in all the other abstractions (for which nothing can be said about their infeasibility). Its purpose is to support the following feasibility analysis.
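The composition of H can be illustrated with a short Python sketch. It follows the description in the preceding paragraph literally and assumes at least two abstractions; the name `feasible_vector` is hypothetical.

```python
def feasible_vector(vectors):
    """Sorted vector H of feasible values built from the looked-up vectors.

    Assumption: `vectors` holds at least two sorted lists of path lengths,
    one per abstraction.
    """
    # Values that appear in every abstraction.
    common = set(vectors[0]).intersection(*map(set, vectors[1:]))
    # Values of one abstraction beyond the reach of all the others.
    beyond = set()
    for i, vec in enumerate(vectors):
        others_max = max(max(v) for j, v in enumerate(vectors) if j != i)
        beyond.update(x for x in vec if x > others_max)
    return sorted(common | beyond)


print(feasible_vector([[13, 14], [13, 15]]))
print(feasible_vector([[12, 15], [14, 16]]))
```

For the vectors used in Section 4.1, the first component of the result coincides with the heuristic values discussed there (13 and 16, respectively).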

Since the heuristic function induced by vectorial Pattern Databases is known to be inconsistent (see Section 4.1), the first pathmax propagation rule can be applied after expanding node n and generating one of its children, $n^{'}$, so that the heuristic value of the child is likely to be updated. After returning to its parent, node n can use the third pathmax propagation rule (in case the state space is undirected) to update its own heuristic value which can be used, in turn, to update the values of other children. Consider the case shown in Fig. 4(a), where edges are undirected so that they can be traversed in both directions. After expanding the root node, the left child gets an inconsistent heuristic value equal to 13, whose difference with the heuristic value of its parent, 16, exceeds the cost of the edge joining them. According to the first pathmax propagation rule, it can update its value to $\max \{13, 16 - 1\} = 15$. This value is returned to its parent, which cannot use it to improve its own heuristic value further since, according to the third pathmax propagation rule, $\max \{16, 15 - 1\} = 16$, which is already equal to its value. Finally, since the value of the rightmost child is consistent with that of its parent, no more updates take place. Figure 4(b) shows the final heuristic values of all nodes.

Fig. 4.

Example of BPMX without and with a feasibility analysis. (a) Original heuristic values. (b) Result of BPMX without a feasibility analysis. (c) Result of BPMX with a feasibility analysis.

However, using the feasible values reported in the vector H it is possible to improve the heuristic values of all nodes. Instead of correcting the heuristic values by subtracting one unit from the heuristic value of either the parent or the child (as suggested by BPMX and done also in preliminary work, see Section 2), it is now possible to pick the next feasible value, which is greater than or equal to the value returned by BPMX. The reason is that all values in between are known to be unfeasible.

Consider again the example shown in Fig. 4. Let us assume that two vectorial Pattern Databases with pattern generation depth $d = 3$ are used. In Fig. 4(c), every node shows the looked-up vectors $H_{1}$ and $H_{2}$ and the vector of feasible values H; the values included in H from either $H_{1}$ or $H_{2}$ are shown in boldface. The heuristic values shown in Fig. 4(a) are exactly equal to the first component of each H vector. Now, it will be shown that a feasibility analysis can improve the values that result from applying BPMX, shown in Fig. 4(b). According to the first pathmax rule, the heuristic value of the leftmost child is updated to 15. However, this value is known to be unfeasible since no path of such length was ever found in the second abstraction, as witnessed by the vector $H_{2}$, which does not contain it, so it is not included in H. Fortunately, H contains a larger value, 19, which means that all values from 15 up to (but excluding) it are unfeasible. Therefore, the heuristic value of this node can be updated to 19. Next, after the search returns to its parent, the third pathmax rule can be used to update the parent's value to $\max \{16, 19 - 1\} = 18$. Since this value is recorded in its H vector, it is readily accepted as an admissible estimate. While it was not possible to update the heuristic value of the rightmost child when using BPMX without a feasibility analysis, the new value can be propagated using the first pathmax rule again, yielding a new estimate equal to $\max \{15, 18 - 1\} = 17$. As happened with its sibling, this value is not found in its H vector, so the smallest value larger than it is selected, resulting in a new estimate of 19. As shown in Fig. 4(c), the final values of all nodes are significantly larger than those that resulted from the application of BPMX without this feasibility analysis, shown in Fig. 4(b).

To summarize, while adhering to the conservative scheme implemented in BPMX is a must to guarantee admissibility, the values returned by it are accepted if and only if they appear in the vector of feasible values H. Otherwise, it is possible to pick the smallest value in H that exceeds the value returned by BPMX without violating admissibility. This holds because H records only the values that appear in all the looked-up vectors $H_{i}$ (finding a path of the same length in all abstractions makes it likely that the same path exists in the original state space) and the values in one abstraction which are larger than the maximum values in all the other abstractions, whose feasibility is simply unknown because the other abstractions could not enumerate paths of that length.
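The combined procedure can be sketched as follows. The H vectors in the usage line are hypothetical: the text does not list the vectors of Fig. 4, so they were chosen only to be consistent with the values narrated above (first components 16, 13 and 15; 18 feasible for the root; 19 the next feasible value for both children). The fallback used when H holds no value at or above the BPMX estimate is also an assumption.

```python
from bisect import bisect_left


def next_feasible(value, feasible):
    """Smallest entry of the sorted list `feasible` that is >= value.

    Assumption: if no such entry exists, the (still admissible) BPMX
    value is returned unchanged.
    """
    i = bisect_left(feasible, value)
    return feasible[i] if i < len(feasible) else value


def bpmx_with_feasibility(H_root, H_left, H_right, cost=1):
    """Replay the three updates of the example on a root with two children."""
    h_root, h_left, h_right = H_root[0], H_left[0], H_right[0]
    # Rule 1 on the left child, then snap to the next feasible value.
    h_left = next_feasible(max(h_left, h_root - cost), H_left)
    # Rule 3 back at the root.
    h_root = next_feasible(max(h_root, h_left - cost), H_root)
    # Rule 1 on the right child with the improved root value.
    h_right = next_feasible(max(h_right, h_root - cost), H_right)
    return h_root, h_left, h_right


# Hypothetical H vectors (root, left child, right child).
print(bpmx_with_feasibility([16, 18], [13, 19], [15, 19]))
```

Under these assumed vectors the sketch reproduces the final estimates of the example: 18 for the root and 19 for both children.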

5. Results

In order to show that vectorial Pattern Databases can achieve a significant reduction in the number of generated nodes, the puzzles M12 and M24 have been selected [13]. Other puzzles typically used in this sort of experiments, such as the sliding-tile puzzle, TopSpin or the Rubik’s Cube, suffer from the same difficulty as the Pancake puzzle already discussed in Section 4: namely, there are operators in the abstract space that leave many patterns unaffected. In contrast, M12 and M24 have operators that affect all locations at the same time. Besides, they add some interesting properties, such as the fact that the underlying state space is directed, unlike the aforementioned puzzles, which always induce undirected graphs. On the other hand, no optimal results have ever been reported for them.

These puzzles are of interest because they induce group structures different from those created by other problems. In fact, their underlying structure follows two of the five Mathieu groups, which were the first sporadic simple groups found. Among these groups, published at the end of the nineteenth century by Émile Léonard Mathieu, M12 is the second smallest and M24 is the largest. Nevertheless, these groups can be regarded as permutation groups (and hence as permutation puzzles) on sets of 12 and 24 objects, respectively. First, the results on every benchmark are reported and then a discussion of the results is offered.

All experiments have been run on an iMac with a 2.8 GHz Intel Core 2 Duo and 4 GB of RAM.

5.1. M12

M12 consists of a permutation of 12 constants and two operators: invert and merge. While the former just inverts the whole permutation (e.g., from 0..11 to 11..0), the latter rearranges the whole permutation as follows: the first six constants are distributed over the even locations whereas the second half is copied in inverted order into the odd positions, e.g., from 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 to 0, 11, 1, 10, 2, 9, 3, 8, 4, 7, 5, 6. M12 is known to have exactly 95,040 states.
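The two operators can be sketched in Python as follows, representing a state as a tuple of 12 constants indexed by location. The representation and the function names are assumptions made for illustration; the behaviour follows the description above.

```python
def invert(state):
    """Reverse the whole permutation."""
    return state[::-1]


def merge(state):
    """First half goes to the even locations; second half, reversed,
    goes to the odd locations."""
    half = len(state) // 2
    out = [None] * len(state)
    out[0::2] = state[:half]          # 0, 1, ..., 5 -> locations 0, 2, ..., 10
    out[1::2] = state[:half - 1:-1]   # 11, 10, ..., 6 -> locations 1, 3, ..., 11
    return tuple(out)


goal = tuple(range(12))
print(invert(goal))
print(merge(goal))
```

Applied to the sorted permutation, `merge` returns the arrangement 0, 11, 1, 10, 2, 9, 3, 8, 4, 7, 5, 6 given in the text.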

Since the classical Pattern Databases are consistent in this case, IDA^∗ with no modification has been used. On the other hand, since vectorial Pattern Databases result in an inconsistent heuristic, the first pathmax propagation rule has been used along with the feasibility analysis suggested in Section 4.2 with two different pattern generation depths, $d = \{2, 3\}$.

Table 1
Results in the M12 puzzle (number of generated nodes)

d	4-4	4-4-4
1	96,829	45,315
2	71,767 (74.1%)	39,932 (88.1%)
3	64,875 (66.9%)	36,804 (81.2%)

Table 1 shows the number of nodes generated when solving 800 randomly generated instances of M12 with either classical Pattern Databases ($d = 1$) or vectorial Pattern Databases ($d = \{2, 3\}$); the percentages in parentheses are relative to the number of nodes generated with $d = 1$. As can be seen, two different arrangements of patterns have been tried: 4-4 and 4-4-4. The first one maps the leading and trailing four positions into two distinct Pattern Databases, leaving the four middle positions unmapped. The second one maps every group of four successive positions into a different Pattern Database.

From the preceding table it is clear that using vectorial PDBs leads to a reduction of 25.9% in the number of generated nodes in the 4-4 setting, from 96,829 to 71,767. When the heuristic function is better informed, as in the case of the 4-4-4 arrangement, the reduction is smaller, only slightly above 10%. The values for $d = 3$ are reported to show that the reduction increases with the pattern generation depth, as expected. In all cases, the average running time was below 0.01 s.

5.2. M24

M24 consists of a permutation of 24 positions, out of which 1, …, 23 are arranged over a circumference, and 0 is placed out of the circle. At every step, it is possible to either rotate the 23 constants clockwise (therefore leaving the content of position 0 unaltered), or to switch the 24 locations in pairs according to a particular pattern. Thus, the switch operator actually consists of 12 transpositions denoted in cycle notation as (0 1) (2 23) (3 4) (5 22) (6 11) (7 8) (9 10) (12 21) (13 14) (15 20) (16 17) (18 19). Its state space is known to contain exactly 244,823,040 nodes.
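A possible encoding of the two M24 operators is sketched below, again with states as tuples indexed by location. The pairs of the switch operator are copied from the cycle notation above; the direction in which `rotate` shifts the contents of locations 1 to 23 is an assumption, since "clockwise" depends on how the circle is laid out.

```python
# Transpositions of the switch operator, as listed in the text.
SWITCH_PAIRS = [(0, 1), (2, 23), (3, 4), (5, 22), (6, 11), (7, 8),
                (9, 10), (12, 21), (13, 14), (15, 20), (16, 17), (18, 19)]


def switch(state):
    """Exchange the contents of the 12 pairs of locations."""
    out = list(state)
    for a, b in SWITCH_PAIRS:
        out[a], out[b] = out[b], out[a]
    return tuple(out)


def rotate(state):
    """Shift the contents of locations 1..23 by one step; location 0 is fixed.

    Assumed direction: the content of location k moves to location k + 1,
    and the content of location 23 wraps around to location 1.
    """
    return (state[0], state[23]) + state[1:23]


goal = tuple(range(24))
print(switch(goal))
print(rotate(goal))
```

In this encoding `switch` is its own inverse, whereas undoing a single `rotate` takes 22 further rotations, which is why the state space is treated as directed.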

As in M12, this is also a directed state space (because the rotation cannot be immediately undone), so plain IDA^∗ was used with classical Pattern Databases, whereas the first pathmax propagation rule along with the feasibility analysis proposed in Section 4.2 has been implemented for various values of the pattern generation depth.

Table 2
Results in the 12-M24 puzzle (number of generated nodes)

d	5-5	6-6	4-4-4
1	33,112,637	4,842,120	166,040,667
2	21,142,126 (63.8%)	3,024,885 (62.4%)	105,878,854 (63.7%)
3	18,790,477 (56.7%)	2,682,071 (55.3%)	90,663,551 (54.6%)
4	18,106,592 (54.6%)	2,588,118 (53.4%)	85,226,273 (51.3%)

The first set of experiments involved the same domain but with only 12 constants, denoted as 12-M24. In this case, the switch operator in cycle notation is (0 1) (2 5) (3 4) (6 11) (7 8) (9 10), very similar to the switch operator defined for the same puzzle with 24 constants. Table 2 shows the number of generated nodes when solving 1000 randomly generated instances for different arrangements of Pattern Databases and pattern generation depths, $d = \{1, 2, 3, 4\}$. As in the previous case, an arrangement denoted as n-n refers to two different mappings: the first one maps the first n contents, while the second maps the last n. An arrangement denoted as n-n-n takes three successive groups of length n each and maps them into different abstractions. As can be seen, when going from classical Pattern Databases ($d = 1$) to vectorial Pattern Databases ($d = 2$) it is possible to save between 36% and 38% of the nodes generated with $d = 1$. Larger pattern generation depths improve only marginally, approaching roughly half the number of nodes. When using the 4-4-4 arrangement, vectorial PDBs take on average 0.11 s with $d = 2$ and 0.10 s with $d = \{3, 4\}$, while classical PDBs solve all these cases in 0.07 s on average. If the 5-5 arrangement is used instead, the classical PDBs solve every case in 0.01 s on average, which is the same time taken by vectorial PDBs when the pattern generation depth equals 3 or 4. If $d = 2$, then the average running time is 0.02 s. Finally, if the 6-6 arrangement is selected, all PDBs (either classical or vectorial with different values of d) solve every case in less than 0.01 s on average.

Table 3

Results in the 24-M24 puzzle (number of generated nodes)

d	6-6-6
1	87,856
2	81,086 (92.2%)

The second set of experiments consisted of running the original M24 puzzle, denoted as 24-M24, with a 6-6-6 arrangement. Table 3 shows the number of nodes generated when solving 1000 random instances. The savings in this case are very moderate, mainly because the classical Pattern Database already results in a well-informed heuristic function (as witnessed by the low number of generated nodes), so there is not much room for improvement. Indeed, all instances are solved in less than 0.01 s on average in all configurations.

5.3. Discussion

Although the numbers might seem modest in some cases, the truth is that vectorial Pattern Databases prune large subtrees both in M12 and M24. To see why, consider that in both cases there is one operator $O_{1}$ which is its own inverse (invert in M12 and switch in M24), so that $O_{1} = O_{1}^{- 1}$, and a second operator $O_{2}$ whose inverse is not available (merge in M12 and rotate in M24). Now, let $T (z)$ denote the number of nodes generated at an arbitrary depth z by a search algorithm in any of these domains. Clearly,

$T (z) = T_{1} (z) + T_{2} (z),$

where $T_{i} (z)$ denotes the number of nodes generated by operator $O_{i}$ at depth z. Since the inverse of $O_{2}$ is not available, $O_{2}$ can be applied to every node of the preceding depth and thus $T_{2} (z) ⩽ T (z - 1)$. On the other hand, $O_{1}$ can be applied only to the nodes generated with $O_{2}$ in the preceding level, since applying $O_{1}$ twice in a row would revert to the previously generated node. Therefore, $T_{1} (z) ⩽ T_{2} (z - 1)$ which, in turn, is bounded by $T (z - 2)$. Hence:

$T (z) ⩽ T (z - 2) + T (z - 1),$

which is the recurrence of the Fibonacci series.

Therefore, even if a new heuristic prunes a tree of height 10 (which would be quite remarkable), the number of nodes saved at its deepest level is bounded by the corresponding term of the Fibonacci series, $T (10) ⩽ 144$ (taking $T (0) = 1$ and $T (1) = 2$). To save thousands of nodes (see Table 3), tens of thousands (Table 1) or even millions (as in Table 2) it is necessary to prune many very deep subtrees.
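The bound can be checked numerically with a short sketch. The initial values are an assumption not spelled out in the derivation: one node at depth 0 (the root of the pruned subtree) and two at depth 1 (both operators applied to it).

```python
def level_bound(z):
    """Upper bound on the nodes at depth z of a subtree, from
    T(z) <= T(z - 1) + T(z - 2) with assumed T(0) = 1 and T(1) = 2."""
    prev, curr = 1, 2
    if z == 0:
        return prev
    for _ in range(z - 1):
        prev, curr = curr, prev + curr
    return curr


def subtree_bound(height):
    """Upper bound on all nodes of a subtree of the given height."""
    return sum(level_bound(z) for z in range(height + 1))


print(level_bound(10))
print(subtree_bound(10))
```

Under these assumptions a subtree of height 10 has at most 144 nodes at its deepest level and at most 375 nodes overall, which supports the observation that savings of thousands or millions of nodes require pruning many deep subtrees.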

6. Conclusions

This work has discussed extending the exploration of the abstract state space so that all patterns are visited a predefined number of times, instead of only once. The technique results in an inconsistent heuristic function that can also be used for jumping between feasible heuristic values. When considering the pros and cons, it became clear that this technique is useless in those cases where there are operators that leave several patterns unaffected, while it can contribute to a significant reduction in the number of generated nodes in other cases, such as the puzzles M12 and M24. Nevertheless, experiments in these domains show that this reduction does not necessarily pay off in running time due to the extra computational effort analyzed in Section 4.

On the other hand, the technique is clearly compatible with other ideas such as using symmetries, duality or maximizing over multiple pattern databases, and it only increases the size of the original Pattern Database by a constant factor.

Acknowledgements

This work has been partially supported by the Spanish MINECO project PlanInteraction: TIN2011-27652-C03-02.

References

[1] J.C. Culberson and Schaeffer, Searching with pattern databases, in: Advances in Artificial Intelligence, Springer, 1996, pp. 402–416.

[2] J.C. Culberson and Schaeffer, Pattern databases, Computational Intelligence 14(3) (1998), 318–334.

[3] Dweighter, Problem E2569, American Mathematical Monthly 82(10) (1975), 1010.

[4] Edelkamp, Planning with pattern databases, in: Proceedings of the Sixth European Conference on Planning (ECP-01), Toledo, Spain, 2001, pp. 13–34.

[5] Edelkamp, Symbolic pattern databases in heuristic search planning, in: Proceedings of the Sixth International Conference on Artificial Intelligence Planning Systems (AIPS-02), Toulouse, France, 2002, pp. 274–283.

[6] Felner, Zahavi, Holte, Schaeffer, Sturtevant and Zhang, Inconsistent heuristics in theory and practice, Artificial Intelligence 175 (2011), 1570–1603.

[7] Helmert, Haslum and Hoffmann, Flexible abstraction heuristics for optimal sequential planning, in: Proceedings of the Seventeenth International Conference on Automated Planning and Scheduling (ICAPS-07), Providence, RI, USA, 2007, pp. 176–183.

[8] Helmert, Haslum, Hoffmann and Nissim, Merge-and-shrink abstraction: A method for generating lower bounds in factored state spaces, Journal of the ACM 61(3) (2015), 1–63.

[9] Hoffmann and Nebel, The FF planning system: fast plan generation through heuristic search, Journal of Artificial Intelligence Research 14 (2001), 253–302.

[10] R.C. Holte, Felner, Newton, Meshulam and Furcy, Maximizing over multiple pattern databases speeds up heuristic search, Artificial Intelligence 170(16,17) (2006), 1123–1136.

[11] R.E. Korf, Finding optimal solutions to Rubik’s cube using pattern databases, in: Proceedings of the Fourteenth National Conference on Artificial Intelligence (AAAI-97), 1997, pp. 700–705.

[12] R.E. Korf and Felner, Disjoint pattern database heuristics, Artificial Intelligence 134(1,2) (2002), 9–22.

[13] Kriz and Siegel, Rubik’s Cube inspired puzzles demonstrate math’s “simple groups”, Scientific American (2008).

[14] Linares Lopez, Vectorial Pattern Databases, in: Proceedings of the Nineteenth European Conference on Artificial Intelligence (ECAI’10), Lisbon, Portugal, 2010, pp. 1059–1060.

[15] Mero, A heuristic search algorithm with modifiable estimate, Artificial Intelligence 23(1) (1984), 13–27.

[16] Pearl, Heuristics, Addison-Wesley, Reading, MA, 1984.

[17] Pommerening, Röger and Helmert, Getting the most out of pattern databases for classical planning, in: Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence (IJCAI-13), Beijing, China, 2013, pp. 2357–2364.

[18] Yang, Exploring infeasibility for abstraction-based heuristics, in: Search in Artificial Intelligence and Robotics: 2008 AAAI Workshop, Chicago, USA, 2008, pp. 134–139.

[19] Yang, Culberson and Holte, Using infeasibility to improve abstraction-based heuristics, in: Abstraction, Reformulation, and Approximation, Lecture Notes in Computer Science, Vol. 4612, Springer, Heidelberg, 2007, pp. 413–414.

[20] Yang, J.C. Culberson, Holte, Zahavi and Felner, A general theory of additive state space abstractions, Journal of Artificial Intelligence Research 32 (2008), 631–662.

[21] Zahavi, Felner, R.C. Holte and Schaeffer, Duality in permutation state spaces and the dual search algorithm, Artificial Intelligence 172 (2007), 514–540.

[22] Zahavi, Felner, Schaeffer and Sturtevant, Inconsistent heuristics, in: Proceedings of the Twenty-Second Conference on Artificial Intelligence (AAAI-07), Vancouver, BC, Canada, 2007, pp. 1211–1216.
