A fast nonogram solver that won the TAAI 2017 and ICGA 2018 tournaments

Abstract

A nonogram is a pen-and-paper single-player logic game in which players paint each cell of a two-dimensional grid according to clues for specific rows and columns. This paper improves on the method proposed by Wu et al. by adding a freedom parameter that can significantly reduce the total computation cost of maximal painting. By combining a well-designed bitboard with the BMI instructions of the CPU, we reduce memory loading while accelerating operations. Our Nonogram program, named Requiem, solved all 1000 puzzles in every Nonogram Tournament from 2011 to 2018 faster than all previous medal programs. Furthermore, we beat four other teams to win the TAAI 2017 Tournament, six other teams to win the ICGA 2018 Tournament, and three other teams to win the TAAI 2018 Tournament. To our knowledge, Requiem currently offers the best puzzle-solving performance.

Keywords

Nonogram, bitboard, puzzle, dynamic programming, BMI

1. Introduction

Nonograms, also called Paint by Numbers puzzles, were invented by a Japanese graphics editor named Non Ishida in 1987. Players are presented with a blank two-dimensional grid and color in certain squares based on a series of clues to reveal an image. Figure 1 shows a $5 \times 5$ Nonogram puzzle and its solution. The clue “1 2” at the fourth column means that the fourth column in the solution must contain 2 black segments: the first with length 1 and the second with length 2. The two black segments must be separated by at least one white cell. A Nonogram puzzle may have multiple solutions, but we only need to derive one of them. Whether a Nonogram puzzle has more than one solution is a separate issue.

Players typically go through the grid and address the most obvious clues first. For example, in Fig. 2, the clue is “1 2”. Initially, the solution for all five cells is unknown, and we thus label the line “uuuuu”. All the 3 possible patterns for this line are shown in Fig. 2. We can see that the fourth cell must be black, while the four remaining cells are uncertain, and thus the line is denoted as “uuu1u”. Most Nonogram puzzles can be solved through repeated iterations of this process for each row and column, a process called maximal painting (Wu et al., 2013).
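The intersection step of Fig. 2 can be sketched in a few lines of Python. This is a brute-force illustration only, not the dynamic programming method developed in Section 2; the function names `line_patterns` and `maximal_paint` are hypothetical and chosen for this example.

```python
def line_patterns(clue, length):
    """Return every 0/1 string of `length` cells that satisfies `clue`."""
    if not clue:
        return ["0" * length]
    first, rest = clue[0], clue[1:]
    # Cells needed by the remaining segments, each preceded by one white gap.
    tail = sum(rest) + len(rest)
    patterns = []
    for start in range(length - first - tail + 1):
        head = "0" * start + "1" * first
        if rest:
            head += "0"  # mandatory separator before the next segment
        for suffix in line_patterns(rest, length - len(head)):
            patterns.append(head + suffix)
    return patterns


def maximal_paint(clue, line):
    """Intersect all patterns that agree with the known cells of `line`.

    `line` uses '0' (white), '1' (black) and 'u' (unknown).
    Returns None when no pattern fits.
    """
    fits = [
        p for p in line_patterns(clue, len(line))
        if all(s in ("u", c) for s, c in zip(line, p))
    ]
    if not fits:
        return None
    # A cell is determined only if every fitting pattern agrees on it.
    return "".join(
        col[0] if len(set(col)) == 1 else "u" for col in zip(*fits)
    )


print(line_patterns((1, 2), 5))
print(maximal_paint((1, 2), "uuuuu"))
```

For the clue “1 2” on a blank five-cell line, the sketch enumerates the three patterns 10110, 10011, and 01011 and returns `uuu1u`, the line shown in Fig. 2.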

Solving Nonograms has been shown to be NP-complete (Ueda and Nagao, 1996). At present, only puzzles with moderate size can be solved in a reasonable amount of time. The ratio of black cells in the solution will affect the difficulty of solving it. Having more black cells in the solution means that more cells can be determined in advance using the maximal painting operations. For example, the puzzle in Fig. 1 can easily be completely solved by repeatedly using the maximal painting operations on each row and column. If the number of black cells in the solution is smaller, we can randomly guess some cells as being black, and then apply the maximal painting operations on each row and column to determine many unknown cells. Hence a “difficult” Nonogram puzzle needs an appropriate ratio of black cells in the solution (Batenburg et al., 2009). In each Nonogram tournament from $2011 \sim 2018$ , 1000 $25 \times 25$ puzzles were generated using a random number generator where the ratio of black cells in the solutions was set from 35% to 50%. Due to this relatively low ratio, most participating programs were unable to determine all the cells using maximal painting operations alone and followed up with backtracking to guess some unknown cells. This process was repeated until all the cells were determined to obtain the final solution. Therefore, the main issue is how to accelerate maximal painting operations.

Fig. 1.

A nonogram example, (a) blank puzzle, (b) corresponding solution.

Fig. 2.

Maximal painting.

In the rest of this paper, Section 2 presents our ideas for improving the efficiency of line solving. The proposed technique involves the “freedom” number of each line which is the difference between the upper bound of the number of white pixels and the number of current white pixels of the line. The new algorithm can quickly stop the max-painting when the freedom number equals zero. Section 3 compares the advantages and disadvantages of three so-called Fully Probing (FP) methods to solve more pixels before backtracking. Among FP1, FP2, and FP3, FP1 is the most efficient way since it can solve almost the same number of pixels while only using 80% execution time on average. From this concrete comparison, we use only FP1 to let our program Requiem solve more puzzles in much less time. Section 4 describes our policy for selecting unknown cells and its speedup in backtracking. Employing “⩾” in the condition for selecting unknown cells can improve the execution time for 30% of the Nonogram problems. In Section 5, a high-performance bitboard data structure for implementing parallel instructions is proposed to store lines and clues. Experiments are presented in Section 6. We conduct complete experiments and show that the new techniques can dramatically reduce the computation cost of previous programs. Concluding remarks are given in Section 7.

2. Solving a single line

Many researchers have proposed algorithms to solve Nonograms, with various methods for treating the maximal painting of a line (Wolter). Yen et al. (2010) generated all available solutions for a line based on the available clues, as in Fig. 2. Wu et al. (2013) applied dynamic programming techniques (Cormen et al., 2009) to derive a method for maximal painting with time complexity $O (k l)$ when the line length is l and the number of integers in the clue is k. Recently, Chen and Hung (2017) simplified the formula developed in Wu et al. (2013) and reduced the difficulty of implementation.

As with previous researchers, regular expressions are utilized to represent the various patterns that the line needs to match. We use 1 (or 0) to indicate that a cell is painted black (or white), and use u to indicate that a cell’s state is uncertain. Each clue $D = (d_{1}, d_{2}, \dots, d_{k})$ corresponds to a line $S = s_{1} s_{2} \dots s_{l} \to 0^{*} 1^{d_{1}} 0^{+} 1^{d_{2}} \dots 0^{+} 1^{d_{k}} 0^{*}$ , where $1^{d_{i}}$ means 1 repeats $d_{i}$ times, $0^{+}$ means at least one occurrence of 0, and $0^{*}$ means zero or more occurrences of 0. The → symbol denotes the assignment relation between two strings as defined in Wu et al. (2013). Following Wu et al. (2013) and Chen and Hung (2017), for simplicity and consistency, we simply add one 0 at the beginning of S and S becomes $0^{+} 1^{d_{1}} 0^{+} 1^{d_{2}} \dots 0^{+} 1^{d_{k}} 0^{*}$ . Now the left-hand sides of every $1^{d}$ have a 0, and there are $0^{*}$ as separations between each $01^{d}$ . Let $σ (d) = 01^{d}$ , we have $\begin{matrix} S \to 0^{*} σ (d_{1}) 0^{*} σ (d_{2}) 0^{*} \dots 0^{*} σ (d_{k}) 0^{*} \end{matrix}$
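As an illustration of the two regular expressions above, the following Python sketch builds both forms with the standard `re` module and checks that they accept the same fully painted lines. The helper names `clue_regex`, `sigma_regex`, `satisfies`, and `satisfies_prefixed` are hypothetical and are not taken from the cited papers.

```python
import re


def clue_regex(clue):
    """Pattern 0* 1^{d1} 0+ 1^{d2} ... 0+ 1^{dk} 0* for the original line."""
    return re.compile("0*" + "0+".join("1{%d}" % d for d in clue) + "0*")


def sigma_regex(clue):
    """Pattern 0* s(d1) 0* s(d2) ... 0* s(dk) 0*, with s(d) = 0 1^d.

    This form applies to the line after one 0 has been added in front.
    """
    return re.compile("".join("0*01{%d}" % d for d in clue) + "0*")


def satisfies(clue, line):
    """True if the fully painted 0/1 string `line` satisfies `clue`."""
    return clue_regex(clue).fullmatch(line) is not None


def satisfies_prefixed(clue, line):
    """Same test, carried out on the 0-prefixed line."""
    return sigma_regex(clue).fullmatch("0" + line) is not None


# All five-cell lines accepted for the clue (1, 2).
accepted = [format(n, "05b") for n in range(32) if satisfies((1, 2), format(n, "05b"))]
print(accepted)
```

Adding the leading 0 lets every segment be written uniformly as $σ (d)$, which is what allows the recurrences below to treat all segments the same way.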

Let $S 3 = merge (S 1, S 2)$ , then for all i, $S 3 [i] = S 1 [i] \oplus S 2 [i]$ , where ⊕ is defined as follows. For all $x \in Γ = {ϵ, 0, 1, u}$ , $\begin{array}{l} x \oplus x = x \\ ϵ \oplus x = x \\ u \oplus x = u \\ 0 \oplus 1 = u \end{array}$

Note that ⊕ is a symmetric operation (i.e., $a \oplus b = b \oplus a$ ).
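A minimal Python sketch of $merge$ follows. It assumes lines are stored as strings over the symbols 0, 1, and u, and it uses the letter `e` to stand for ϵ; both the string encoding and the names `oplus` and `merge` are choices made for this example only.

```python
def oplus(a, b):
    """Cell-wise operator: equal symbols survive, 'e' (epsilon) is neutral,
    and every other combination (u with x, or 0 with 1) yields 'u'."""
    if a == b:
        return a
    if a == "e":
        return b
    if b == "e":
        return a
    return "u"


def merge(s1, s2):
    """Apply oplus position by position to two lines of equal length."""
    if len(s1) != len(s2):
        raise ValueError("merge expects lines of equal length")
    return "".join(oplus(a, b) for a, b in zip(s1, s2))


# Merging the three patterns of Fig. 2 one after another.
step = merge("10110", "10011")
print(step)
print(merge(step, "01011"))
```

Merging the three patterns of Fig. 2 in sequence gives `10u1u` and then `uuu1u`, the same line obtained by inspecting the patterns directly.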

The following formulas (Wu et al., 2013; Chen and Hung, 2017) are developed and a dynamic programming approach proposed in Batenburg et al. (2009) can be applied to derive the maximal painting with a time complexity of $O (k l)$ . $\begin{array}{l} P (i, j) = \begin{cases} merge (P 0 (i, j), P 1 (i, j)), & \text{if } i > 0 \text{ and } j ⩾ 0 \\ ϵ^{i}, & \text{otherwise} \end{cases} \\ P 0 (i, j) = \begin{cases} P (i - 1, j) \cdot 0, & \text{if } s [i] \text{ matches } (0 | u) \\ ϵ^{i}, & \text{otherwise} \end{cases} \\ P 1 (i, j) = \begin{cases} P (i - d_{j} - 1, j - 1) \cdot 0 \cdot 1^{d_{j}}, & \text{if } s [i - d_{j}, i] \text{ matches } ((0 | u) \cdot {(1 | u)}^{d_{j}}) \\ ϵ^{i}, & \text{otherwise} \end{cases} \end{array}$

Here ϵ means null or conflicted, and $P (i, j)$ is the result of maximal painting in terms of the substring $s_{1} s_{2} \dots s_{i}$ and the clue $(d_{1}, d_{2}, \dots, d_{j})$ . A string $T = t_{1} t_{2} \dots t_{l}$ matches $S = s_{1} s_{2} \dots s_{l}$ if and only if $t_{1} t_{2} \dots t_{l} = s_{1} s_{2} \dots s_{l}$ except that $t_{i} = (0 | 1)$ and $s_{i} = u$ for some i.

We found that the maximal painting in Wu et al. (2013) and Chen and Hung (2017) involves many impossible situations. For the example of Fig. 2, $D = (1, 2)$ , the original length of S is $l = 5$ . We add one 0 at the beginning of S and now $S = 0 u u u u u \to 0^{+} 1^{1} 0^{+} 1^{2} 0^{*}$ . We use the above formulas to compute $P (6, 2)$ and then $P (4, 2)$ will be called. Finally, $P (0, 2)$ will be called and its value $ϵ^{0}$ (meaning null or conflicted) will be returned. Part of the computation process is shown below. $\begin{array}{l} P (6, 2) = merge (P 0 (6, 2), P 1 (6, 2)) = merge (P (5, 2) 0, P 1 (6, 2)) \\ P (5, 2) = merge (P 0 (5, 2), P 1 (5, 2)) = merge (P (4, 2) 0, P 1 (5, 2)) \\ P (4, 2) = merge (P 0 (4, 2), P 1 (4, 2)) = merge (P (3, 2) 0, P 1 (4, 2)) \\ P (3, 2) = merge (P 0 (3, 2), P 1 (3, 2)) = merge (P (2, 2) 0, P 1 (3, 2)) \\ P (2, 2) = merge (P 0 (2, 2), P 1 (2, 2)) = merge (P (1, 2) 0, P 1 (2, 2)) \\ \begin{matrix} P (1, 2) & = merge (P 0 (1, 2), P 1 (1, 2)) = merge (P (0, 2) 0, P 1 (1, 2)) \\ = merge (ϵ^{0} 0, P 1 (1, 2)) \end{matrix} \end{array}$

From this example, we see that the clue $D = (1, 2)$ implies that the length should be at least 5 to paint it successfully. But $P (4, 2)$ has a length of 4 and hence cannot make it in terms of $D = (1, 2)$ . This allows us to stop the expansion of $P (4, 2)$ earlier and save run time for maximal painting. In addition, we also find that $ϵ^{i}$ plays two roles – null and conflict. Only when $i = 0$ and $j = 0$ does $ϵ^{i}$ stand for null. In the other cases, $ϵ^{i}$ represents conflict. This time, $P (0, 2)$ is in conflict because it cannot fit two clues.

If we let $P 0 (5, 2)$ stop calling $P (4, 2)$ at once, then we can reduce the run time for maximal painting. For this purpose, we need to know the necessary lengths for each subset of clues. However, this requires extra calculations and memory space. In this paper, we investigate a new parameter $free$ in the function to tackle the problem efficiently as follows: $\begin{matrix} f = free (D) = l + 1 - \sum_{d \in D} (d + 1) . \end{matrix}$

Note that d denotes the length of each black segment in D. The summation of $(d + 1)$ for all $d \in D$ calculates the length required for the clue and $l + 1$ is the original length of S plus the extra added prefix ‘0’ for each painting string, so $free$ indicates how many 0s can be placed at will. The paper of Bacchus and van Run (1995) may also contain formulas similar to the definition of “free”.

Bacchus and van Run (1995) defined a description d as “l-consistent” if $\begin{matrix} \sum_{d \in D} (d + 1) ⩽ l + 1 . \end{matrix}$

This definition is similar to our freedom formula, but they used it only to show the difficulty of solving a Nonogram. In Lemma 4.1 and Lemma 4.2 of Bacchus and van Run (1995), they categorized two kinds of Nonograms: the simple type if $\sum_{d \in D} (d + 1) = l + 1$ and the not-simple type if $\sum_{d \in D} (d + 1) < l + 1$ .
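The freedom formula and the two categories can be written down directly. The sketch below is an illustration under the definitions given in this section; the function names `free` and `classify` and the returned labels are hypothetical.

```python
def free(clue, length):
    """f = l + 1 - sum(d + 1 for d in D), where l is the original line length."""
    return length + 1 - sum(d + 1 for d in clue)


def classify(clue, length):
    """Label a clue by comparing sum(d + 1) with l + 1."""
    f = free(clue, length)
    if f < 0:
        return "not l-consistent"  # the clue cannot fit in the line
    return "simple" if f == 0 else "not-simple"


print(free((1, 2), 5))        # clue of Fig. 2 on a 5-cell line
print(free((1, 2, 5), 25))    # a clue on a 25-cell tournament line
print(classify((2, 2), 5))
```

A freedom of 0 means that the position of every segment is forced, which is why the simple type needs no search at all.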

The previous maximal painting function P and the $merge$ function are replaced by the H and $join$ functions below. $\begin{array}{l} H (f, j) = \begin{cases} 0^{f}, & \text{if } j = 0 \text{ and } 0^{f} \text{ prefix matches } S \\ c, & \text{if } j = 0 \text{ and } 0^{f} \text{ does not prefix match } S \\ join (H 0 (f, j), H 1 (f, j)), & \text{otherwise} \end{cases} \\ H 0 (f, j) = \begin{cases} H (f - 1, j) 0, & \text{if } f > 0 \text{ and } H (f - 1, j) 0 \text{ prefix matches } S \\ c, & \text{otherwise} \end{cases} \\ H 1 (f, j) = \begin{cases} H (f, j - 1) σ (d_{j}), & \text{if } j > 0 \text{ and } H (f, j - 1) σ (d_{j}) \text{ prefix matches } S \\ c, & \text{otherwise} \end{cases} \end{array}$

Here c is a single character that means conflicted, and $H (f, j)$ is the result of maximal painting in terms of the substring $s_{1} s_{2} \dots s_{i}$ with a $free$ value f and the clue $(d_{1}, d_{2}, \dots, d_{j})$ . A string $T = t_{1} t_{2} \dots t_{m}$ prefix matches $S = s_{1} s_{2} \dots s_{l}$ if and only if $m ⩽ l$ and $t_{1} t_{2} \dots t_{m} = s_{1} s_{2} \dots s_{m}$ except that $t_{i} = (0 | 1)$ and $s_{i} = u$ for some i. For example, $0 u 101$ prefix matches $0 u u u 1 u$ , because $0 u 101$ matches $0 u u u 1$ , that is the first 5 letters of $0 u u u 1 u$ . But $0 u c 01$ does not prefix match $0 u u u 1 u$ because $0 u c 01$ does not match $0 u u u 1$ , that is the first 5 letters of $0 u u u 1 u$ .

Let $S 3 = join (S 1, S 2)$ , then $S 3 = S 2$ if $S 1$ contains a character c; $S 3 = S 1$ if $S 2$ contains a character c; otherwise for all i, $S 3 [i] = S 1 [i] \oplus S 2 [i]$ , where ⊕ is defined as follows. $\forall x \in Γ = {0, 1, u}$ , $\begin{array}{l} x \oplus x = x \\ u \oplus x = u \\ 0 \oplus 1 = u \end{array}$

The previous function $P (6, 2)$ is expanded all the way down to $P (0, 2)$ because it does not track how much freedom remains for placing 0s in $S = s_{1} s_{2} \dots s_{6}$ . Only when it finally calls $P (0, 2)$ does it discover that S contains too many 0s, and it returns the failure output $ϵ^{0}$ . The modified function carries a new parameter $free$ , which is the number of extra slots for inserting 0s in $S = s_{1} s_{2} \dots s_{l}$ . If $free$ equals 0, we cannot add extra 0s and immediately stop the recursive call.

For the example in Fig. 2, $D = (1, 2)$ , the original length of S is $l = 5$ . We add one 0 at the beginning of S and $S = 0 u u u u u \to 0^{+} 1^{1} 0^{+} 1^{2} 0^{*}$ . Now $\begin{aligned} f & = free (D) = free ((1, 2)) = l + 1 - \sum_{d \in D} (d + 1) \\ & = 5 + 1 - ((1 + 1) + (2 + 1)) \\ & = 1 \end{aligned}$

For this example, we have a linear space with 5 slots and we need to put 2 black segments with 1 and 2 slots respectively, and we need at least one slot to separate them. So there are already 4 slots needed and we have an extra free slot ( $f = 1$ ) to allocate. To derive the maximal painting, we first call $H (f, j) = H (1, 2)$ . Its expansion is illustrated in Fig. 3, and part of the process is shown below. $\begin{array}{l} H (1, 2) = join (H 0 (1, 2), H 1 (1, 2)) = join (H (0, 2) 0, H (1, 1) 011) \\ H (0, 2) = join (H 0 (0, 2), H 1 (0, 2)) = join (c, H (0, 1) 011) \end{array}$

We see that $H 0 (0, 2)$ quickly stops the recursion and returns the conflicted character c because the parameter $free$ equals 0. For completeness, we list the detailed calculations as follows. Note that some function values like $H (0, 1)$ and $H (0, 0)$ can be computed just once and stored in the dynamic programming table for later retrieval. $\begin{array}{l} H (1, 2) = join (H 0 (1, 2), H 1 (1, 2)) = join (H (0, 2) 0, H (1, 1) 011) \\ H (0, 2) = join (H 0 (0, 2), H 1 (0, 2)) = join (c, H (0, 1) 011) \\ H (0, 1) = join (H 0 (0, 1), H 1 (0, 1)) = join (c, H (0, 0) 01) = join (c, 0^{0} 01) = 01 \\ \text{So, } H (0, 2) = join (c, H (0, 1) 011) = join (c, 01011) = 01011. \\ H (1, 1) = join (H 0 (1, 1), H 1 (1, 1)) = join (H (0, 1) 0, H (1, 0) 01) = join (010, 0^{1} 01) = join (010, 001) = 0 u u \\ \text{So, } H (1, 2) = join (H (0, 2) 0, H (1, 1) 011) = join (010110, 0 u u 011) = 0 u u u 1 u . \end{array}$
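The recurrences for H, H0, and H1 can be transcribed almost literally. The following Python sketch is an illustration on plain strings, not the bitboard implementation used in Requiem; the names `prefix_matches`, `join`, and `maximal_paint` are hypothetical, and `functools.lru_cache` stands in for the dynamic programming table.

```python
from functools import lru_cache

CONFLICT = "c"


def prefix_matches(t, s):
    """True if t fits the first len(t) cells of s ('u' in s accepts anything)."""
    if CONFLICT in t or len(t) > len(s):
        return False
    return all(si == "u" or ti == si for ti, si in zip(t, s))


def join(s1, s2):
    """Return the other argument if one is conflicted, else combine cell-wise."""
    if CONFLICT in s1:
        return s2
    if CONFLICT in s2:
        return s1
    return "".join(a if a == b else "u" for a, b in zip(s1, s2))


def maximal_paint(clue, line):
    """Maximal painting of `line` (symbols 0, 1, u); None on conflict."""
    s = "0" + line  # add one 0 in front, as in the text
    f_total = len(s) - sum(d + 1 for d in clue)
    if f_total < 0:
        return None  # the clue cannot fit in the line

    @lru_cache(maxsize=None)
    def h(f, j):
        if j == 0:
            t = "0" * f
            return t if prefix_matches(t, s) else CONFLICT
        return join(h0(f, j), h1(f, j))

    def h0(f, j):
        # No freedom left: stop at once instead of recursing further.
        if f > 0:
            t = h(f - 1, j) + "0"
            if prefix_matches(t, s):
                return t
        return CONFLICT

    def h1(f, j):
        # Only called with j > 0; append sigma(d_j) = 0 followed by d_j ones.
        t = h(f, j - 1) + "0" + "1" * clue[j - 1]
        return t if prefix_matches(t, s) else CONFLICT

    result = h(f_total, len(clue))
    return None if CONFLICT in result else result[1:]  # drop the added 0


print(maximal_paint((1, 2), "uuuuu"))
```

For $D = (1, 2)$ and a blank five-cell line the sketch evaluates $H (1, 2) = 0 u u u 1 u$ internally and prints `uuu1u` after removing the added 0, in agreement with the hand calculation above.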

For Table 1, we randomly generated 10000 puzzles and recorded the running time of maximal painting. The equipment we used is an Intel® Core™ i9-7900X 3.30 GHz with 64 GB RAM. Most of the experiments in this paper were performed on the same device, except those for Table 5. With the freedom parameter, we obtained a speedup of more than 4 times. The solver without the freedom parameter in Table 1 is based on the more efficient method in the Appendix of Wu et al. (2013), which has a time complexity of $O (k l)$ , so Requiem is compared against the most efficient method in Wu et al. (2013). Both are implemented by dynamic programming. The time complexity changes from $O (k l)$ to $O (k f)$ , and f is clearly less than l.

Table 1
Speed comparison with and without freedom

	Total run time
With freedom	63 ms
Without freedom	264 ms

Fig. 3.

$H (1, 2)$ for a string $S = 0 u u u u u$ with respect to $D = (1, 2)$ .

3. Fully Probing

After maximal painting, many pixels on the board may still be unknown. The Davis-Putnam-Logemann-Loveland (DPLL) algorithm is a backtracking algorithm that examines all possibilities for the unknown pixels. However, the time complexity of the DPLL algorithm is $O (2^{n})$ , where n is the number of unknown pixels. To determine more cell states, Batenburg and Kosters (2009) combined the constraints obtained from relaxations of the Nonogram problem into a 2-Satisfiability (2-SAT) problem (Cohen et al., 2008) to deduce more pixel values in the Nonogram solution before backtracking. In their method, if we can infer $a \to b$ and $a \to \neg b$ , the pixel a should be 0, where a and b are both unknown pixels. The time complexity of their method is $O (l^{7})$ . By iterating this procedure, starting from an empty grid, it is often possible to solve the puzzle completely. In Wu et al. (2013), using implications from contrapositives, three Fully Probing (FP) methods, FP1, FP2, and FP3, were proposed to assign uncertain cells as either black or white after the maximal painting operations in order to determine more cell states. However, practical implementations have high time complexity: $O (k l^{5})$ for both FP1 and FP2, and $O (l^{6})$ for FP3. Although FP1 and FP2 have the same time complexity, FP2 does more work. The major factor in the time complexity of FP2 is maximal painting, and since the freedom parameter now gives us a faster way to do maximal painting, this overhead can be reduced. Therefore, we wrote a program to randomly generate 1000 $25 \times 25$ Nonogram puzzles and compared the average number of determined cells and the total running time of FP1, FP2, and FP3. The results are shown in Table 2. We found that FP2 and FP3 determine slightly more pixels in the puzzles, but incur a large computational overhead. In other words, among FP1, FP2, and FP3, FP1 is the most efficient since it determines almost the same number of pixels while using only 80% of the execution time on average. Hence, Requiem uses only FP1, as depicted in Fig. 4. This arrangement lets our program solve more puzzles in much less time.

Recently, Huang et al. (2018) sought to reduce some instructions in FP1, but this replicated the work of Wu et al. (Chen et al., 2015) in their program LalaFrogKK. Chen and Huang (2018) investigated the Group-Base Fully Probing (GP) to reveal the relations between unknown pixels. Instead of guessing the colors of unknown pixels one by one, GP can determine more pixels simultaneously during backtracking. Unfortunately, this causes their (GP) algorithm to incur excessive overhead, and thus it was not used in tournaments.

Table 2
Comparison of FP1, FP2, and FP3

Fully Probing methods	Averaged No. of painted pixels	Total runtime (ms)
FP1	198.744	1179
FP2	210.595	1485
FP3	210.597	1578

Fig. 4.

FP1 scheme.

4. Policy for selecting unknown cells

In many puzzles, the FP1 method cannot correctly determine all pixels in the grid, and good performance still requires backtracking to paint the undetermined cells by carefully choosing the next pixel to paint. Seven choose-pixel heuristics were tried and compared in Wu et al. (2013). Among these, the Min-logd heuristic performed the fastest in all testing cases (Wu et al., 2013). $\begin{matrix} Min - logd = min (m_{p, 0}, m_{p, 1}) + | log (m_{p, 0} + 1) - log (m_{p, 1} + 1) | \end{matrix}$

Here we replace it with a new Min-logd, which requires one fewer log calculation but has a similar effect. $\begin{matrix} NewMin - logd = min (m_{p, 0}, m_{p, 1}) + log (| m_{p, 0} - m_{p, 1} | + 1) \end{matrix}$

A performance comparison between the original Min-logd and the new Min-logd heuristics is done by executing 100,000,000 times for each computation. The original Min-logd takes 4517 milliseconds while our new one takes only 2168 milliseconds.
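The two heuristics can be compared side by side in a short Python sketch. The function names are hypothetical, the arguments `m0` and `m1` stand for $m_{p, 0}$ and $m_{p, 1}$ , and the natural logarithm is an assumption, since the base of the logarithm is not stated here.

```python
import math


def min_logd(m0, m1):
    """Original heuristic: two logarithm evaluations per call."""
    return min(m0, m1) + abs(math.log(m0 + 1) - math.log(m1 + 1))


def new_min_logd(m0, m1):
    """Modified heuristic: a single logarithm evaluation per call."""
    return min(m0, m1) + math.log(abs(m0 - m1) + 1)


for m0, m1 in [(3, 3), (2, 5), (0, 10)]:
    print(m0, m1, min_logd(m0, m1), new_min_logd(m0, m1))
```

The two functions coincide when $m_{p, 0} = m_{p, 1}$ and both grow with the gap between the two counts, but they are not numerically identical in general, which is consistent with the description of the new form as having a similar effect rather than the same value.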

When we apply the modified Min-logd to choose the next pixel to paint in the backtracking method, we use a loop to check each undetermined cell and choose the one with the maximal Min-logd value. The loop usually compares values with the “>” operation, but “⩾” can also be used. The difference is that, when multiple pixels share the largest Min-logd value, “>” chooses the first such pixel and “⩾” chooses the last. As shown in Tables 3 and 4, “⩾” is quicker on more puzzles than it is slower and reduces the total computation time for the 1000 puzzles of TAAI 2016. All measurements used the same machine as Table 1. In other words, employing “⩾” in the condition for selecting unknown cells can improve the execution time for 30% of the Nonogram problems. Hence, we use “⩾” in our program, although we do not know the exact reason for its advantage. A similar situation has occurred in other studies: Huang et al. (2018) tried four different orders in Fully Probing and obtained different effects. We can regard “⩾” as “>” applied in a different order.
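The effect of the comparison operator on ties can be seen in a small Python sketch. The function names and the sample values are hypothetical; only the choice between the two comparison operators reflects the text.

```python
def pick_with_gt(values):
    """Scan with '>': keeps the first index that attains the maximum."""
    best, best_value = None, float("-inf")
    for index, value in enumerate(values):
        if value > best_value:
            best, best_value = index, value
    return best


def pick_with_ge(values):
    """Scan with '>=': keeps the last index that attains the maximum."""
    best, best_value = None, float("-inf")
    for index, value in enumerate(values):
        if value >= best_value:
            best, best_value = index, value
    return best


scores = [0.5, 2.0, 1.0, 2.0]  # two cells tie for the largest value
print(pick_with_gt(scores), pick_with_ge(scores))
```

With a tie between the second and fourth cells, the first scan selects index 1 and the second selects index 3, so the two variants explore the search tree in different orders.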

Table 3
The speed comparison of using “⩾” and “>” for the 1000 puzzles in TAAI 2016

Comparison item	Quantity
The number of puzzles that “⩾” is quicker than “>”	374
The number of puzzles that “⩾” is about the same with “>”	321
The number of puzzles that “⩾” is slower than “>”	305

Table 4

The run time comparison of using “⩾” and “>” for the 1000 puzzles in TAAI 2016

	The total run time (s)
Using “⩾”	205
Using “>”	358

5. High performance data structures

In this section, we present some high performance data structures for dealing with the $25 \times 25$ Nonogram puzzles used in the competitions.

First, each cell in each line of the board has three possible states: black (01)₂, white (10)₂, and unknown (11)₂, so a state needs at least 2 bits. An intuitive approach is to use two consecutive bits to represent a cell, so that a 64-bit long integer can represent a line of 25 cells, as shown in Fig. 5(a). To retrieve the state of the $n$th cell, we need a calculation like “ $line ≫ (n ≪ 1) \& 0x3$ ”, which carries an inherent overhead, where “≪” and “≫” are the bitwise shift operators of the C/C++ language. To speed up this operation, we separate the two bits into two 32-bit integers whose combination is a 64-bit long integer. Bits $0 \sim 31$ represent the black state, and bits $32 \sim 63$ represent the white state, as shown in Fig. 5(b). If both bits corresponding to a cell are 1, the cell is unknown. With this layout we instead use the operation “ $(line ≫ n) \& 0x100000001$ ”, which is quicker. We compared the calculation “ $line ≫ (n ≪ 1) \& 0x3$ ” with the new one “ $(line ≫ n) \& 0x100000001$ ” by executing each 1,000,000,000 times in the same environment as in the competitions. The former takes 566 milliseconds while the latter takes only 481 milliseconds.
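The two layouts of Fig. 5 can be mimicked with Python integers, which here play the role of the 64-bit words. This is a sketch for illustration; the helper names (`pack_interleaved`, `pack_split`, and so on) are hypothetical, and no timing conclusions should be drawn from Python code.

```python
BLACK, WHITE, UNKNOWN = 0b01, 0b10, 0b11


def pack_interleaved(cells):
    """Layout (a): cell n occupies the two consecutive bits 2n and 2n + 1."""
    line = 0
    for n, state in enumerate(cells):
        line |= state << (2 * n)
    return line


def get_interleaved(line, n):
    """Retrieve cell n with a variable shift by 2n and a 2-bit mask."""
    return (line >> (n << 1)) & 0x3


def pack_split(cells):
    """Layout (b): black flag in bit n, white flag in bit 32 + n."""
    line = 0
    for n, state in enumerate(cells):
        line |= (state & 1) << n
        line |= (state >> 1) << (32 + n)
    return line


def get_split(line, n):
    """Retrieve both flags of cell n with one shift by n and one mask."""
    return (line >> n) & 0x100000001


def split_to_state(raw):
    """Convert the masked value of layout (b) back to the 2-bit state."""
    return (raw & 1) | ((raw >> 32) << 1)


cells = [BLACK, WHITE, UNKNOWN] * 8 + [BLACK]  # a 25-cell line
a, b = pack_interleaved(cells), pack_split(cells)
print(hex(get_interleaved(a, 2)), hex(get_split(b, 2)))
```

In layout (b) the masked result is `0x1` for black, `0x100000000` for white, and `0x100000001` for unknown, so a cell can be tested without first doubling the index.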

Furthermore, we use the remaining pattern (00)₂ to express c, the conflicted result for the H and $join$ functions. Since our design uses the integer zero directly to represent c, the $join$ function can be done efficiently by the bitwise-OR operation.
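To see why a bitwise OR suffices, the sketch below encodes lines in the split layout (black flag in bit n, white flag in bit 32 + n) and joins them with a single `|`. The helpers `encode` and `decode` are hypothetical and exist only to make the result readable.

```python
CONFLICT = 0  # the integer zero represents the conflicted result c


def encode(cells):
    """Encode a string over 0, 1, u into the split bitboard layout."""
    word = 0
    for n, ch in enumerate(cells):
        if ch in "1u":
            word |= 1 << n          # the cell may be black
        if ch in "0u":
            word |= 1 << (32 + n)   # the cell may be white
    return word


def decode(word, length):
    """Decode `length` cells; the pattern (00) is shown as 'c'."""
    symbols = {(1, 0): "1", (0, 1): "0", (1, 1): "u", (0, 0): "c"}
    return "".join(
        symbols[((word >> n) & 1, (word >> (32 + n)) & 1)]
        for n in range(length)
    )


def join(a, b):
    """OR keeps equal cells, turns black-versus-white into unknown,
    and returns the other operand unchanged when one operand is CONFLICT."""
    return a | b


merged = join(encode("010110"), encode("0uu011"))
print(decode(merged, 6))
```

The example reproduces $join (010110, 0 u u 011) = 0 u u u 1 u$ from Section 2, and joining with `CONFLICT` leaves the other line untouched because OR with zero is the identity.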

Fig. 5.

Two representations of a line.

To represent the clue of a line, we use a 32-bit unsigned integer. Each number d in the clue is denoted by d consecutive 0s followed by a 1. Figure 6 illustrates an example of the clue $D = (1, 2, 5)$ , which is stored as $0x48200000$ . To retrieve the first number of the clue, we can execute the lzcnt instruction of the CPU, which counts the number of leading zero bits in an integer. The free parameter f of $H (f, j)$ in Section 2 can be computed by using the LS1B operation to extract the rightmost 1 and then performing the lzcnt operation. On a modern CPU, the blsi instruction in BMI performs the same function as LS1B. Therefore, the free parameter f can be calculated as follows. $\begin{matrix} free (clue) = 25 - lzcnt (blsi (clue)) \end{matrix}$

For the example in Fig. 6, $clue = 0x48200000$ , its free parameter f is calculated as follows and has a value of 15. $\begin{aligned} free (0x48200000) & = 25 - lzcnt (blsi (0x48200000)) \\ & = 25 - lzcnt (0x200000) \\ & = 25 - 10 \\ & = 15 \end{aligned}$

Fig. 6.

Data structure for the clue $D = (1, 2, 5)$ .

This means the number of free slots for putting extra 0s is 15 in a line with 26 cells. Note that we have added an extra 0 at the left of the original line with 25 cells. In the calculation, $blsi (0x48200000)$ extracts the rightmost 1 and gets $0x200000$ . Then $lzcnt (0x200000)$ counts the number of leading zero bits in $0x200000$ and gets 10. This efficient approach significantly reduces the total processing time.
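The clue encoding and the free computation can be emulated in Python, with `lzcnt32` and `blsi` as software stand-ins for the CPU instructions of the same names. The helper `encode_clue` is hypothetical, and the sketch assumes a non-empty clue on a 25-cell line.

```python
def lzcnt32(x):
    """Number of leading zero bits of a 32-bit value (32 for x == 0)."""
    return 32 - x.bit_length()


def blsi(x):
    """Isolate the lowest set bit, as the BMI blsi instruction does."""
    return x & -x


def encode_clue(clue):
    """Each number d becomes d zeros followed by a 1, from the top bit down."""
    word, pos = 0, 32
    for d in clue:
        pos -= d + 1
        word |= 1 << pos
    return word


def first_number(clue_word):
    """The first number of the clue is the count of leading zeros."""
    return lzcnt32(clue_word)


def free(clue_word):
    """free(clue) = 25 - lzcnt(blsi(clue)) for a 25-cell line."""
    return 25 - lzcnt32(blsi(clue_word))


word = encode_clue((1, 2, 5))
print(hex(word), hex(blsi(word)), first_number(word), free(word))
```

The formula works because the rightmost 1 sits at position $\sum_{d \in D} (d + 1)$ counted from the most significant bit, so its leading-zero count is that sum minus 1, and $25 - lzcnt$ equals $l + 1 - \sum_{d \in D} (d + 1)$ for $l = 25$ .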

6. Experiments

We collected all the Nonogram competition records of ICGA, TAAI, and TCGA from 2011 to 2018 and ran our program to solve all 1000 puzzles of each competition. The computers used in the competitions were not identical, but most of them had $3 \sim 3.6$ GHz processors, so for fairness we ran our program on a slower machine (Intel® Core™ i7-4720HQ 2.6 GHz). The 1000 puzzles were different in each competition and were randomly generated from seeds supplied by the participants. Within a competition, all contestants run on the same computer, each program may use only a single CPU core, and the time limit is 2 hours. The rank is determined by the number of correctly solved puzzles. If two participants solve the same number of puzzles correctly, the rank is determined by the solving time. Under these rules, a program that finishes quickly but gives a wrong answer loses to one that solves all puzzles. As shown in Tables 5 to 8, the total solving time of our program was shorter than that of all previous programs. Our program successfully solved all 1000 puzzles in every tournament. In the tournaments it entered (Tables 6 to 8), it took less than half the time of every other program. Unfortunately, the game records only tell us how many puzzles each program solved; we do not know exactly which puzzles were not solved correctly. There are three reasons why a contestant might not solve all 1000 puzzles. First, it may be due to the time limit: the program cannot solve all puzzles in two hours. Second, the program may have bugs that cause incorrect answers. Finally, some contestants may skip hard puzzles to save time. For example, if the 500th puzzle has already taken an hour and is still unsolved, a program that keeps working on it may end up with only 499 solved puzzles, whereas a program that strategically gives up on this puzzle still has a chance to get the other 999 correct.

Furthermore, our program Requiem won the TAAI 2017 Tournament, the ICGA 2018 Tournament and TAAI 2018 Tournament (see Tables 6, 7 and 8). ICGA 2018 featured 7 participants, most of which successfully applied the method proposed by Wu et al. to solve all the 1000 puzzles. However, Requiem was the fastest.

Table 5
Comparison of Requiem and previous gold medal programs (Wu, 2011)

Competitions	Total solving time of Requiem	Total No. of puzzles solved correctly with Requiem	Total solving time of gold medal program	Total No. of puzzles solved correctly with gold medal program
ICGA 2017	304 s	1000	2021 s	1000
TCGA 2017	563 s	1000	1752 s	1000
TAAI 2016	205 s	1000	407 s	998
ICGA 2016	300 s	1000	7200 s	257
TCGA 2016	276 s	1000	1881 s	1000
TAAI 2015	459 s	1000	7200 s	184
TCGA 2015	286 s	1000	908 s	1000
TAAI 2014	288 s	1000	664 s	1000
TCGA 2014	322 s	1000	1638 s	1000
TAAI 2013	243 s	1000	3440 s	1000
TCGA 2013	363 s	1000	672 s	1000
TCGA 2012	1135 s	1000	1433 s	1000
TAAI 2011	317 s	1000	645 s	1000

Table 6

Competition results of TAAI 2017

Program name	Solved problems	Total solving time	Rank
Requiem	1000	868 s	1
LBC3	1000	3736 s	2
YCJ	894	7200 s	3
ThuNoNo	142	7200 s	4
CH	6	7200 s	5

Table 7

Competition results of ICGA 2018

Program name	Solved problems	Total solving time	Rank
Requiem	1000	772 s	1
LBC4	1000	3212 s	2
Sequence	1000	4147 s	3
Jjyeh	1000	4232 s	4
CH	1000	4331 s	5
Paradox Nonogram	275	7200 s	6
ThuNoNo	131	7200 s	7

Table 8

Competition results of TAAI 2018

Program name	Solved problems	Total solving time	Rank
Requiem	1000	212 s	1
YCH	1000	906 s	2
LBCV	999	1940 s	3
ThuNoNo	48	7200 s	4

7. Concluding remarks

This paper investigates a new free parameter f in the dynamic programming approach to finish maximal painting earlier without significant overhead. Compared with previous approaches, maximal painting with the free parameter achieves a speedup of more than 4 times. Combined with high performance data structures, this approach substantially reduces memory usage for puzzle clues.

However, FP1 still maintains a lot of data, and FP1 combined with the free parameter is only $2 \sim 3$ times faster than previous approaches. Future speed enhancement may require exploring more efficient methods for FP2 and FP3 in practical implementations. As mentioned in Wu et al. (2013), the FP method is actually a generic method for many puzzle problems such as Nurikabe, Slitherlink, and Sudoku. We believe that our techniques may also be generalized for solving other kinds of puzzles. Also, there are many methods for solving satisfiability problems (Biere et al., 2009). These methods may help us to solve Nonograms more efficiently in the future.

Acknowledgement

This research was supported in part by a grant MOST 106-2221-E-003-027-MY2 from the Ministry of Science and Technology, R.O.C.

References

Bacchus, F. & van Run, P. (1995). Dynamic variable ordering in CSPs. In Principles and Practice of Constraints Programming (CP-95). Lecture Notes in Computer Science (Vol. 976, pp. 258–277). Berlin, Germany: Springer. doi:10.1007/3-540-60299-2_16.

Batenburg, K.J., Henstra, S., Kosters, W.A. & Palenstijn, W.J. (2009). Constructing simple nonograms of varying difficulty. Pure Math. Appl., 20, 1–15.

Batenburg, K.J. & Kosters, W.A. (2009). Solving nonograms by combining relaxations. Pattern Recognition, 42(8), 1672–1683. doi:10.1016/j.patcog.2008.12.003.

Biere, A., Heule, M., Van Maaren, H. & Walsh, T. (2009). Handbook of Satisfiability. Frontiers in Artificial Intelligence and Applications (Vol. 185).

Bitboard Serialization, Chess programming, [Online]. Available at: https://chessprogramming.wikispaces.com/Bitboard+Serialization.

BMI instruction sets – Wikipedia, [Online]. Available at: https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets.

Chen, K.Y., Kuo, C.H., Kang, H.H., Sun, D.J. & Wu, I.C. (2015). LalaFrogKK source code, [Online]. Available at: https://github.com/CGI-LAB/Nonogram.

Chen, L.-P. & Hung, C.-Y. (2017). A new simplified line solver for nonogram puzzle games. In 2017 TCGA Computer Game, National Penhu University, Taiwan, May 5–8.

Chen, L.P. & Huang, K.C. (2018). Solving nonogram puzzles by using group-based fully probing. In The 10th International Conference on Computers and Games, New Taipei City, Taiwan.

Cohen, D., Jeavons, P. & Gyssens, M. (2008). A unified theory of structural tractability for constraint satisfaction problems. J. Comput. Syst. Sci., 74(5), 721–743. doi:10.1016/j.jcss.2007.08.001.

Contraposition – Wikipedia, [Online]. Available at: https://en.wikipedia.org/wiki/Contraposition.

Cormen, T.H., Leiserson, C.E., Rivest, R.L. & Stein, C. (2009). Introduction to Algorithms (3rd ed.). Cambridge, MA, USA: MIT Press.

Huang, K.C., Yeh, J.J., Huang, W.C., Guo, Y.R. & Chen, L.P. (2018). Exploring effects of fully probing sequence on solving nonogram puzzles. In The 10th International Conference on Computers and Games, New Taipei City, Taiwan.

ICGA 2018 computer game tournaments, [Online]. Available at: http://www.tcga.tw/icga-computer-olympiad-2018.

TAAI 2017 computer game tournaments, [Online]. Available at: http://www.tcga.tw/taai2017.

TAAI 2018 computer game tournaments, [Online]. Available at: https://www.tcga.tw/taai2018.

Ueda, N. & Nagao, T. (1996). NP-completeness results for NONOGRAM via parsimonious reductions. Technical Report TR96-0008, Department of Computer Science, Tokyo Institute of Technology.

Wolter, J. The ‘pbnsolve’ paint-by-number puzzle solver, [Online]. Available at: http://webpbn.com/pbnsolve.html.

Wu, I.C., Sun, D.J., Chen, L.P., Chen, K.Y., Kuo, C.H., Kang, H.H. & Lin, H.H. (2013). An efficient approach to solving nonograms. IEEE Transactions on Computational Intelligence and AI in Games, 5(3), 251–264. doi:10.1109/TCIAIG.2013.2251884.

Wu, K.-C. (2011). TAAI 2011 nonogram tournament result, [Online]. Available at: http://kcwu.csie.org/~kcwu/nonogram/taai11/.

Yen, S.-J., Su, T.-C., Chiu, S.-Y. & Chen, J.-C. (2010). Optimization of nonogram’s solver by using an efficient algorithm. In 2010 International Conference on Technologies and Applications of Artificial Intelligence (pp. 444–449). doi:10.1109/TAAI.2010.95.