A probabilistic anytime algorithm for the halting problem

Abstract

The Halting Problem, the most (in)famous undecidable problem, has important applications in theoretical and applied computer science and beyond, hence the interest in its approximate solutions.

Experimental results reported on various models of computation suggest that halting programs are not uniformly distributed – running times play an important role. A reason is that a program which eventually stops but does not halt “quickly”, stops at a time which is algorithmically compressible.

In this paper we work with running times to define a class of computable probability distributions on the set of halting programs in order to construct an anytime algorithm for the Halting problem with a probabilistic evaluation of the error of the decision.

Keywords

Halting Problem, anytime algorithm, running time distribution

1. Introduction

The Halting Problem asks to decide, from a description of an arbitrary program and an input, whether the computation of the program on that input will eventually stop or continue forever. In 1936 Alonzo Church, and independently Alan Turing, proved that (in Turing’s formulation) an algorithm to solve the Halting Problem for all possible program-input pairs does not exist; two equivalent models have been used to describe computation by algorithms (an informal notion), the lambda calculus by Church and Turing machines by Turing. The Halting Problem is historically the first problem proved undecidable; it has many applications in mathematics, logic, theoretical and applied computer science, physics, biology, etc. Due to its practical importance, approximate solutions for this problem have been proposed for quite a long time, see [2,5–7,9,14,16,19,21,26].

Anytime algorithms trade execution time for quality of results [13]. These algorithms can be executed in two modes: either by a given contract time to execute or an interruptible method. Instead of correctness, an anytime algorithm returns a result together with a “quality measure” which evaluates how close the obtained result is to the result that would be returned if the algorithm ran until completion (which may be prohibitively long). To improve the solution, anytime algorithms can be continued after they have halted. That is similar to the use of iterative processes in numerical computing in which, after the process has been halted, if the output is not considered to be acceptable, then it can be refined by resuming the iterative process.

Following Manin [21] we use a more general form of anytime algorithm as an approximation for a computation which may never stop. An anytime algorithm for the Halting Problem works in the following way: to test whether a program eventually stops on a given input we first effectively compute a threshold time – the interruptible (stopping) condition – and then run the program for that specific time. If the computation stops, then the program was proved to halt; if the computation does not stop, then we (a) declare that the program will never stop and (b) evaluate the probability of error, i.e. the probability that the program may eventually stop. The goal is to prove that the probability of error can be made as small as we wish. By running the program for a longer time we can improve the performance of the algorithm, either by reaching the halting time or by decreasing the probability of error. Another important goal is to develop feasible anytime algorithms for the Halting Problem.

In [6,7] anytime algorithms for the Halting Problem have been developed using the fact – proved in [7] – that programs which take a long time to halt stop at “algorithmically compressible times”, i.e. times which a computer can generate from “smaller” inputs. Although the set of algorithmically compressible times has constructive density zero, the stopping times obtained using this method are very large and the theoretical bounds cannot be improved in general [6]. However, experimental results in [30] for small Turing machines indicate much smaller stopping times. Furthermore, the experimental results in [27,30,31] and theoretical results in [14,15] show that halting programs are not uniformly distributed and their distributions depend on the running times of the specific model of computation.

In this paper we construct a class of computable probability distributions on the set of halting programs based on their running times. Each computable probability distribution induces a probability space on the set of halting programs. Using this probabilistic framework we construct an anytime algorithm for the Halting problem with a probabilistic evaluation of the error of the decision.

The paper is organised as follows. We start with a section on basic notation. Section 3 presents the computability and complexity part while Section 4 is dedicated to probability and statistics. Section 5 is dedicated to the presentation of the probabilistic framework for the anytime algorithms. We start by examining two plausible a priori halting probabilities. The analysis of their inadequacy leads to the introduction of running time probability spaces on the set of halting programs which, as their names indicate, are constructed using computable probability distributions on the set of running times. In Section 6 we describe the anytime algorithm and prove its correctness. In Section 8 we propose methods to improve the performance of the algorithm and evaluate its power and limits. Finally, we discuss some open problems.

2. Notation

In the following we will denote by $Z^{+}$ the set of positive integers $\{1, 2, \dots\}$ and let $\overline{Z^{+}} = Z^{+} \cup \{\infty\}$ ; $R$ is the set of reals. The domain of a partial function $F : Z^{+} ⟶ \overline{Z^{+}}$ is denoted by $dom (F)$ : $dom (F) = \{x \in Z^{+} ∣ F (x) < \infty\}$ . We denote by $\# S$ the cardinality of the set S and by $P (X)$ the power set of X.

We assume familiarity with elementary computability theory and algorithmic information theory [4,11,20].

For a partially computable function $F : Z^{+} ⟶ \overline{Z^{+}}$ we denote by $F (x) [t] < \infty$ the statement “the algorithm computing F has stopped exactly in time t”. For $t \in Z^{+}$ we consider the computable set $Stop (F, t) = \{x \in Z^{+} ∣ F (x) [t] < \infty\}$ , and note that $\begin{matrix} (1) & dom (F) = ⋃_{t \in Z^{+}} Stop (F, t) . \end{matrix}$
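The sets $Stop (F, t)$ are computable because membership can be settled by simulating at most t steps. The following Python sketch illustrates this under loudly stated assumptions: `toy_program` is a hypothetical, non-universal stand-in for “the algorithm computing F” (it is not taken from the paper), and the convention that a computation making k steps stops at time k + 1 is chosen only so that running times lie in $Z^{+}$ .

```python
from typing import Iterator, Optional

def toy_program(x: int) -> Iterator[None]:
    """Hypothetical stand-in for the algorithm computing F on input x.

    Each `yield` is one computation step. Multiples of 3 loop forever;
    every other input iterates the Collatz map until it reaches 1.
    """
    if x % 3 == 0:
        while True:
            yield
    n = x
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        yield

def running_time_within(x: int, budget: int) -> Optional[int]:
    """Exact halting time of x if it is at most `budget`, else None."""
    computation = toy_program(x)
    for t in range(1, budget + 1):
        try:
            next(computation)
        except StopIteration:
            return t          # the computation stopped exactly at time t
    return None               # no verdict: x may halt later, or never

def in_stop(x: int, t: int) -> bool:
    """Decide whether x is in Stop(F, t) using at most t simulated steps."""
    return running_time_within(x, t) == t
```

A `None` result carries no information about $dom (F)$ : this is exactly the gap that the probabilistic evaluation of the error is meant to quantify.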

3. Complexity and universality

The algorithmic complexity relative to a partially computable function $F : Z^{+} ⟶ \overline{Z^{+}}$ is the partial function $\nabla_{F} : Z^{+} ⟶ \overline{Z^{+}}$ defined by $\nabla_{F} (x) = inf {y \in Z^{+} ∣ F (y) = x}$ . If $F (y) \neq x$ for every $y ⩾ 1$ , then $\nabla_{F} (x) = \infty$ . That is, the algorithmic complexity of x is the smallest description/encoding of x with respect to the interpreter/decoder F, or infinity if F cannot produce x.

A partially computable function U is called universal if for every partially computable function $F : Z^{+} ⟶ \overline{Z^{+}}$ there exists a constant $k_{U, F}$ such that for every $x \in dom (\nabla_{F})$ we have $\begin{matrix} (2) & \nabla_{U} (x) ⩽ k_{U, F} \cdot \nabla_{F} (x) . \end{matrix}$

Theorem 3.1 ([6]).

A partially computable function U is universal iff for every partially computable function $F : Z^{+} ⟶ \overline{Z^{+}}$ there exists a constant $c_{U, F}$ such that for every $x \in dom (F)$ we have $\begin{matrix} (3) & \nabla_{U} (F (x)) ⩽ c_{U, F} \cdot x . \end{matrix}$

The difference between (2) and (3) is in the role played by F: in the traditional condition (2), F appears through $\nabla_{F}$ (which for some F can be incomputable), while in (3) F appears as argument of $\nabla_{U}$ , making the second member of the inequality always computable.

A universal partially computable function U “simulates” any other partially computable function F in the following sense: if $x \in dom (F)$ , then from (3) we deduce that $\nabla_{U} (F (x)) ⩽ c_{U, F} \cdot x$ , hence there exists a $y ⩽ c_{U, F} \cdot x$ in $dom (U)$ such that $U (y) = F (x)$ . In particular, $\nabla_{U} (x) < \infty$ , for all $x \in Z^{+}$ .

The set $dom (U)$ (see (1) for $U = F$ ) is computably enumerable, but not computable (the undecidability of the Halting Problem), its complement $\overline{dom (U)}$ is not computably enumerable, but the sets ${(Stop (U, t))}_{t ⩾ 1}$ are all computable.

To solve the Halting Problem means to determine for an arbitrary pair $(F, x)$ , where F is a partially computable function and $x \in Z^{+}$ , whether $F (x)$ stops or not, or equivalently, whether $x \in dom (F)$ , that is, $x \in Stop (F, t)$ , for some $t \in Z^{+}$ . In view of (2) or (3), solving the Halting Problem for a fixed universal U is enough to solve the Halting Problem. From now on we fix a universal U and study the Halting Problem $U (x) < \infty$ , for $x \in Z^{+}$ .

4. A glimpse of probability theory

In this section we define the main notions from probability theory used in this paper. For more details see [10,25].

A measurable space $(Ω, B (Ω))$ consists of a non-empty set Ω and a Borel field of subsets of Ω, $B (Ω) \subseteq P (Ω)$ . A probability space is a triple $(Ω, B (Ω), Pr)$ , where $(Ω, B (Ω))$ is a measurable space and $Pr : B (Ω) ⟶ [0, 1]$ is a probability measure, that is, Pr satisfies the following two conditions: (a) the probability of a countable union of mutually-exclusive sets in $B (Ω)$ is equal to the countable sum of the probabilities of each of these sets, and (b) $Pr (Ω) = 1$ . We interpret $B (Ω)$ as “the family of events” and Ω as “the certain event”.

Consider a probability space $(Ω, B (Ω), Pr)$ and a measurable space $(A, B (A))$ . A random variable is a measurable function $X : Ω ⟶ A$ , that is, for every $B \in B (A)$ we have $X^{- 1} (B) \in B (Ω)$ . In this case X induces a probability (called probability distribution of X) $P_{X} : B (A) ⟶ [0, 1]$ defined by $\begin{matrix} P_{X} (B) = Pr (X^{- 1} (B)) = Pr ({ω ∣ X (ω) \in B}), B \in B (A), \end{matrix}$ which identifies the probability space $(A, B (A), P_{X})$ .

The random variable X has a discrete probability distribution if A is at most countable. If we denote by $P_{X} (\{x\})$ the probability of the event $\{X = x\} = \{ω \in Ω ∣ X (ω) = x\}$ , then the discrete probability distribution of X is completely defined by the numbers $P_{X} (\{x\}) \in [0, 1]$ , $x \in A$ , with $\sum_{x \in A} P_{X} (\{x\}) = 1$ . A computable probability distribution $P_{X}$ is a discrete probability distribution such that the function $x \in A ↦ P_{X} (\{x\})$ is computable (in particular, $P_{X} (\{x\})$ is a computable real for each $x \in A$ [23, p. 159]; see also [24,29]).

In what follows we assume that $A \subseteq R$ .

The Cumulative Distribution Function of a random variable X is the function ${CDF}_{X} : R ⟶ [0, 1]$ defined by ${CDF}_{X} (y) = Pr (X ⩽ y)$ , $y \in R$ . In case X is a random variable with a discrete distribution, ${CDF}_{X}$ is the stair-function (with piecewise-constant sections) given by $\begin{matrix} {CDF}_{X} (y) = \sum_{{x \in A ∣ x ⩽ y}} P_{X} (x), y \in R . \end{matrix}$

For example, the generally accepted model for “the time-to-the-first-success” is the geometric distribution – the discrete probability distribution that expresses the probability that the first occurrence of an event (“success”) requires k independent trials, each with the same success probability θ. More precisely, the random variable X that takes values in $A = Z^{+}$ has a geometric distribution with the rate of success $θ \in (0, 1)$ if $P_{X} (k) = {(1 - θ)}^{k - 1} \cdot θ$ , $k ⩾ 1$ . In this case $\begin{matrix} {CDF}_{X} (y) = \sum_{k ⩾ 1, k ⩽ y} {(1 - θ)}^{k - 1} \cdot θ, y \in R, \end{matrix}$ and ${lim}_{y \to \infty} {CDF}_{X} (y) = 1$ .

The Quantile Function of the random variable X with a discrete distribution is the function $q_{X} : [0, 1] ⟶ A$ defined by $q_{X} (p) = inf {y \in A ∣ p ⩽ {CDF}_{X} (y)}$ . By definition, ${CDF}_{X} (q_{X} (p)) ⩾ p$ , for all $p \in [0, 1]$ .
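As an illustration of these definitions, the following Python sketch evaluates the cumulative distribution function and the quantile function of the geometric distribution. The function names are ours; the closed form $1 - {(1 - θ)}^{⌊ y ⌋}$ for $y ⩾ 1$ is the standard sum of the geometric series, and the quantile is found by direct search, so p is restricted to $[0, 1)$ to guarantee termination.

```python
import math

def geometric_pmf(k: int, theta: float) -> float:
    """P_X(k) = (1 - theta)^(k - 1) * theta for k >= 1."""
    return (1.0 - theta) ** (k - 1) * theta

def geometric_cdf(y: float, theta: float) -> float:
    """CDF_X(y) = sum of P_X(k) over 1 <= k <= y (zero when y < 1)."""
    if y < 1:
        return 0.0
    return 1.0 - (1.0 - theta) ** math.floor(y)

def geometric_quantile(p: float, theta: float) -> int:
    """q_X(p): the smallest k in Z+ with CDF_X(k) >= p, for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    k = 1
    while geometric_cdf(k, theta) < p:
        k += 1
    return k
```

For $θ = 1/2$ the 0.9-quantile is 4, since ${CDF}_{X} (3) = 0.875$ and ${CDF}_{X} (4) = 0.9375$ ; the mass above it is 0.0625, which is below $ε = 0.1$ .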

For fixed $r \in [0, 1]$ , the value (number) $q_{X} (r)$ is called the rth quantile of the random variable X. Quantiles are important indicators that give information about the location and clustering of the probability values $\{P_{X} (x), x \in A\}$ . For example, if the data being studied are not actually distributed according to an assumed underlying probability distribution or if there are outliers far removed from the mean, then quantiles may provide useful information. Besides the classical quartiles – first, second (median), third – the lower and upper εth quantiles, $q_{X} (ε)$ and $q_{X} (1 - ε)$ , give important information about the “tails” of the probability distribution (for small $ε > 0$ ). For more details see [1].

Proposition 4.1.
For every $ε \in (0, 1)$ we have $P_{X} ({x \in A ∣ x > q_{X} (1 - ε)}) ⩽ ε$ , hence ${lim}_{ε \to 0} P_{X} ({x \in A ∣ x > q_{X} (1 - ε)}) = 0$ .
Proof.
Indeed, we have: $\begin{array}{rcl} P_{X} (\{x \in A ∣ x > q_{X} (1 - ε)\}) & = & 1 - P_{X} (\{x \in A ∣ x ⩽ q_{X} (1 - ε)\}) \\ & = & 1 - {CDF}_{X} (q_{X} (1 - ε)) ⩽ ε . \end{array}$ □

5. A probabilistic framework

In this section we describe a probabilistic framework for developing two anytime algorithms based on an analysis of the finite running times of all halting computations $U (x)$ , $x \in dom (U)$ .

5.1. Halting probability

Both approaches discussed in this paper depend – directly or indirectly – on a computable probability which has to model the informal notion of “halting probability” for programs for a universal U, that is, the probability that U stops on input x. Which probability measure should we choose?

First, various studies of concrete probability distributions for halting programs in different models of computations (see [15,27,30,31]) show that halting programs are not uniformly distributed and their probability distributions depend on the running times on the specific model of computation. This experimental evidence is reflected also in the theoretical result stating that programs which take a long time to halt stop at “algorithmically compressible times”, a set of constructive density zero [7].

Second, if we interpret the event “U stops on input x” as a “success” in a sequence of trials, then the running time of the computation $U (x)$ becomes “the-time-to-the-first-success”. In this setting, the event “ $U (x)$ stops exactly in time t”, that is “ $x \in Stop (U, t)$ ”, is interpreted as “t is the achieved time-to-the-first success”. Then, the truncated geometric distribution is a candidate for the halting probability. Is the geometric distribution probability a “natural” model for the halting probability? Can it be used for developing anytime algorithms for the Halting Problem? The answer to the first question is negative. Indeed, the phenomenon modelled by the geometric distribution requires a sequence of independent trials in which each trial has the same probability of success – a strong condition hardly satisfied by any U. For the second question we note that the computability of the probability – an essential property for evaluating the stopping condition of the anytime algorithm – depends on the computability of θ and $\sum_{t \in T_{U}} {(1 - θ)}^{t - 1}$ , which may be problematic even if $T_{U}$ is computable, see [28].

This brief analysis suggests that instead of “assuming” an a priori halting probability we should instead construct a model for the halting probability based on the running times $T_{U}$ .

5.2. A running time probability space

Recall that the finite running times¹ of the computations $U (x)$ are the set of exact stopping times for the halting programs of U: $\begin{array}{rcl} T_{U} & = & \{t \in Z^{+} ∣ \text{there exists } x \in Z^{+} \text{ such that } x \in Stop (U, t)\} \\ & = & \{t \in Z^{+} ∣ \text{there exists } x \in Z^{+} \text{ such that } U (x) [t] < \infty\} . \end{array}$

¹
See [12] for modelling running times.

Lemma 5.1.

The set $T_{U}$ is infinite.

Proof.

The statement in the lemma is true because for every $M \in Z^{+}$ there is a program $x \in dom (U)$ which stops in time larger than M: indeed, otherwise all programs would stop in time at most M, hence $dom (U)$ would be decidable, a contradiction. □

The undecidability of the Halting Problem casts a “computational uncertainty” on the membership problem of the set $dom (U)$ . In what follows we model this phenomenon by introducing a probabilistic structure on $dom (U)$ .

From (1) we have $\begin{matrix} (4) & dom (U) = ⋃_{t \in Z^{+}} Stop (U, t) = ⋃_{t \in T_{U}} Stop (U, t) . \end{matrix}$

Next we note that $\begin{matrix} Stop (U, t) \cap Stop (U, t^{'}) = \begin{cases} \emptyset, & \text{if } t \neq t^{'}, \\ Stop (U, t), & \text{otherwise.} \end{cases} \end{matrix}$

Let us consider the family of (finite and countable) unions of sets $Stop (U, t)$ , $t \in Z^{+}$ . This family includes $dom (U)$ and is closed under complement, $\begin{matrix} \overline{Stop (U, t)} = ⋃_{t^{'} \in T_{U} ∖ \{t\}} Stop (U, t^{'}), \end{matrix}$ and countable unions; accordingly, it is a Borel field of subsets of $dom (U)$ , which we denote by $B (dom (U))$ . To define the discrete probability measure on the measurable space $(dom (U), B (dom (U)))$ we fix a computable probability distribution ρ on $T_{U}$ (see Section 5.3). Then, we put $\begin{matrix} (5) & Pr = {Pr}_{ρ} : B (dom (U)) ⟶ [0, 1], Pr (Stop (U, t)) = ρ (t), t \in T_{U} . \end{matrix}$

Now we can introduce a probability structure on the set $T_{U}$ via a random variable. Let $B (T_{U})$ be the family of all subsets of $T_{U}$ . The function $\begin{matrix} (6) & RT = {RT}_{U} : dom (U) ⟶ T_{U}, RT (x) = min \{t > 0 ∣ x \in Stop (U, t)\} \end{matrix}$ has the property that for every $t \in T_{U}$ , ${RT}^{- 1} (\{t\}) = Stop (U, t) \in B (dom (U))$ . Consequently, $RT$ is a random variable – in fact a stopping time – which will be called the running time associated with U. As described in Section 4, the random variable $RT$ induces the probability space $(T_{U}, B (T_{U}), P_{RT})$ on $T_{U}$ in which the probability is defined as follows: $\begin{matrix} P_{RT} (\{t\}) = Pr ({RT}^{- 1} (\{t\})), t \in T_{U} . \end{matrix}$ It is seen that for every $t \in T_{U}$ : $\begin{matrix} P_{RT} (\{t\}) = Pr (Stop (U, t)) = ρ (t) . \end{matrix}$

In what follows a computable probability space $(dom (U), B (dom (U)), {Pr}_{ρ_{U}})$ of the form defined in (5) will be called a running time probability space.

The random variable $RT$ is completely specified by a computable probability distribution on the set of finite running times of programs of U, $\begin{matrix} (7) & \{ρ (t) ∣ t \in T_{U}\} . \end{matrix}$ Of course, we need to prove that a computable probability distribution (7) exists: this will be done in Section 5.3.

The cumulative distribution function ${CDF}_{RT} : T_{U} ⟶ [0, 1]$ of the discrete random variable $RT : dom (U) ⟶ T_{U}$ is then defined by the formula: $\begin{matrix} {CDF}_{RT} (k) = P_{RT} ({t \in T_{U} ∣ 1 ⩽ t ⩽ k}) = \sum_{t = 1, t \in T_{U}}^{k} P_{RT} (t), k \in T_{U}, \end{matrix}$ and the quantile function of $RT$ is $\begin{matrix} q_{RT} (r) = inf {k \in T_{U} ∣ {CDF}_{RT} (k) ⩾ r}, r \in (0, 1) . \end{matrix}$

For $ε \in (0, 1)$ we now use the $(1 - ε)$ -quantile $q_{RT} (1 - ε)$ as a probabilistic threshold separating “the upper ε-tail” of the distribution, i.e. those very large running times t making the event “ $U (x) [t] < \infty$ ” negligible according to $P_{RT}$ .

5.3. Running time computable probability distributions

We now discuss examples of finite running time computable probability distributions for (7).

Every computable probability on $dom (U)$ is defined on a computably enumerable, but not computable set and has to satisfy the equality $ρ_{U} (t) = Pr (Stop (U, t))$ , for all $t \in T_{U}$ , so it has to depend on U. Furthermore, to construct $ρ_{U}$ we need:

(a) a computable function ${pr}_{U} : Z^{+} \times Z^{+} \to [0, 1]$ such that ${pr}_{U} (x, t) > 0$ iff $x \in Stop (U, t)$ (the probability that $U (x)$ stops at time t),

(b) a computable sequence of computable reals $υ_{U} (t) = \sum_{x = 1}^{\infty} {pr}_{U} (x, t)$ .

If $\sum_{t = 1}^{\infty} υ_{U} (t) = 1$ , then we can set $ρ_{U} (t) = υ_{U} (t) = Pr (Stop (U, t))$ . If $\sum_{t = 1}^{\infty} υ_{U} (t) < 1$ , we need to normalise with $Υ_{U} = \sum_{t = 1}^{\infty} υ_{U} (t)$ , hence $Υ_{U}$ has to be computable. In this case $\begin{matrix} ρ_{U} (t) = \frac{1}{Υ_{U}} \cdot υ_{U} (t) = Pr (Stop (U, t)) . \end{matrix}$

As an example we construct the computable probability distribution $ρ_{U}$ by using the function $pr = {pr}_{U} : Z^{+} \times Z^{+} \to [0, 1]$ defined by $\begin{matrix} (8) & pr (x, t) = \begin{cases} \frac{2^{- x}}{t}, & \text{if } x \in Stop (U, t), \\ 0, & \text{otherwise.} \end{cases} \end{matrix}$

The function pr is computable because the sets $Stop (U, t)$ are computable for every $t \in Z^{+}$ .
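To make the computability of (8) concrete, here is a minimal Python sketch with exact rational arithmetic. It rests on an assumption that is ours, not the paper's: `halts_exactly_at` replaces the test $x \in Stop (U, t)$ by a hypothetical toy machine (multiples of 3 never halt, other inputs follow the Collatz map), which is of course not universal.

```python
from fractions import Fraction

def halts_exactly_at(x: int, t: int) -> bool:
    """Toy stand-in for 'x in Stop(U, t)'.

    Hypothetical machine: multiples of 3 never halt; any other x halts
    one time unit after its Collatz trajectory reaches 1. At most t - 1
    Collatz steps are simulated, so the test always terminates.
    """
    if x % 3 == 0:
        return False
    n, time = x, 1
    while n != 1 and time < t:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        time += 1
    return n == 1 and time == t

def pr(x: int, t: int) -> Fraction:
    """The function (8): 2^(-x) / t if x stops exactly at time t, else 0."""
    return Fraction(1, 2**x * t) if halts_exactly_at(x, t) else Fraction(0)

def upsilon_partial(t: int, x_max: int) -> Fraction:
    """Partial sum of the series (9), over programs 1..x_max."""
    return sum((pr(x, t) for x in range(1, x_max + 1)), Fraction(0))
```

In this toy model program 5 stops at time 6, so `pr(5, 6)` equals $2^{- 5} / 6 = 1/192$ , while `pr(3, t)` is 0 for every t.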

Theorem 5.2.
The real numbers $\begin{array}{l} (9) & υ (t) = υ_{U} (t) = \sum_{x = 1}^{\infty} pr (x, t), t \in Z^{+}, \\ (10) & Υ = Υ_{U} = \sum_{t = 1}^{\infty} υ (t) \end{array}$ are computable and $0 ⩽ υ (t) < Υ < 1$ .
Proof.
To prove that $υ (t)$ is computable we use Theorem 4.2.3 in [29] which says that the limit of a computable sequence of rationals having a computable modulus of convergence is a computable real. To this aim we need to prove that the modulus of convergence of the series (9) is computable. Indeed, for all positive integers $i < j$ and n we have $\begin{array}{rcl} \sum_{x = 1, x \in Stop (U, t)}^{j} \frac{2^{- x}}{t} - \sum_{x = 1, x \in Stop (U, t)}^{i} \frac{2^{- x}}{t} & = & \sum_{x = i + 1, x \in Stop (U, t)}^{j} \frac{2^{- x}}{t} \\ & ⩽ & \sum_{x = i + 1}^{\infty} \frac{2^{- x}}{t} = \frac{2^{- i}}{t} ⩽ 2^{- n}, \end{array}$ for $i ⩾ ⌈ n - {log}_{2} t ⌉$ , a computable convergence modulus.

A similar argument works for the series (10) because the set ${(t, x) \in Z^{+} \times Z^{+} ∣ x \in Stop (U, t)}$ is computable. □
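The tail estimate in the proof translates directly into an approximation procedure: since the tail of (9) beyond $x = i$ is at most $2^{- i} / t$ , truncating at $x = n$ always leaves an error of at most $2^{- n}$ . The Python sketch below assumes a caller-supplied decision procedure `in_stop(x, t)` for $x \in Stop (U, t)$ ; the example predicate is a hypothetical toy machine in which every odd program halts at time 3 and no even program halts.

```python
from fractions import Fraction
from typing import Callable

def approx_upsilon(in_stop: Callable[[int, int], bool], t: int, n: int) -> Fraction:
    """Rational approximation of upsilon(t) from below, with error <= 2^(-n).

    Truncating the series (9) at x = n leaves a tail bounded by
    2^(-n) / t, which is at most 2^(-n) because t >= 1.
    """
    return sum(
        (Fraction(1, 2**x * t) for x in range(1, n + 1) if in_stop(x, t)),
        Fraction(0),
    )

# Hypothetical toy machine: odd programs halt at time 3, even ones never.
def toy_in_stop(x: int, t: int) -> bool:
    return x % 2 == 1 and t == 3

approximation = approx_upsilon(toy_in_stop, t=3, n=10)
```

For this toy machine the exact value is $υ (3) = \frac{1}{3} \sum_{x \text{ odd}} 2^{- x} = \frac{2}{9}$ , and the approximation with $n = 10$ is $\frac{341}{1536}$ , well within $2^{- 10}$ of it.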

Note that if $x \notin dom (U)$ , then for every $t \in Z^{+}$ we have $pr (x, t) = 0$ ; similarly, if $t \notin T_{U}$ , then $pr (x, t) = 0$ for every $x \in Z^{+}$ , so $υ (t) = 0$ .

Using Theorem 5.2 we construct the following computable probability distribution ρ on the set of finite running times $T_{U}$ : $\begin{array}{l} (11) & ρ (t) = ρ_{U} (t) = \frac{υ (t)}{Υ} = \frac{1}{Υ} \sum_{x = 1}^{\infty} pr (x, t), t \in T_{U} . \end{array}$ Indeed, from (11) and (10) we have: $\begin{matrix} \sum_{t \in T_{U}} ρ (t) = \sum_{t = 1}^{\infty} ρ (t) = \sum_{t = 1}^{\infty} \frac{1}{Υ} \sum_{x = 1}^{\infty} pr (x, t) = 1 . \end{matrix}$
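The normalisation in (11) can be illustrated on a case where everything is known in closed form. The sketch below assumes a hypothetical machine in which program x halts exactly at time x, so that $Stop (U, t) = \{t\}$ , $υ (t) = 2^{- t} / t$ and $Υ = \sum_{t ⩾ 1} 2^{- t} / t = ln 2 < 1$ . It truncates the series at `t_max` and normalises by the partial sum, which approximates (but is not equal to) the distribution ρ.

```python
import math
from fractions import Fraction
from typing import Dict, Tuple

def upsilon_toy(t: int) -> Fraction:
    """upsilon(t) = 2^(-t) / t for the hypothetical machine described above."""
    return Fraction(1, 2**t * t)

def rho_truncated(t_max: int) -> Tuple[Dict[int, Fraction], Fraction]:
    """Return (rho restricted to 1..t_max, partial sum approximating Upsilon)."""
    ups = {t: upsilon_toy(t) for t in range(1, t_max + 1)}
    partial_upsilon = sum(ups.values(), Fraction(0))   # lower bound for Upsilon
    rho = {t: u / partial_upsilon for t, u in ups.items()}
    return rho, partial_upsilon

rho, partial_upsilon = rho_truncated(8)
```

With exact rationals the truncated values sum to 1, and already for `t_max = 8` the partial sum is within $10^{- 3}$ of $ln 2$ .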

As described in Section 4, using ρ we define the computable discrete probability space $(T_{U}, B (T_{U}), P_{ρ})$ , where $B (T_{U})$ is the set of all subsets of $T_{U}$ and $P_{ρ} (t) = ρ (t)$ .

The series (9) is a semi-measure [18, Section 4] with a computable sum² (by Theorem 5.2, (10)), an essential property for the computability of the probability $P_{ρ}$ .

²
The sum of a computable semi-measure may not be computable, as Specker's theorem [28] indicates.

The above probability space – inspired by [7] – is “natural” because the discrete probability distribution combines the uniform distribution assumed in the halting probability Ω number, see [4], with the time complexity of halting programs (normalised by the computable number Υ, see (10)). In detail, the function (8) biases the programs x – assumed to be uniformly distributed – by dividing $2^{- x}$ by the program’s stopping time t.

A program which eventually stops but does not halt “quickly” stops at an algorithmically compressible time, hence the probability of a program, that doesn’t stop for a long time, to halt tends to zero, see [6,7]. More precisely, if $x \in Stop (U, t)$ , then the longer t is, the smaller the halting probability of x is; if the program never halts, that is $x \notin Stop (U, t)$ , for all t, then the halting probability of x tends to zero when $t \to \infty$ . Any computable probability distribution not reflecting this phenomenon is “un-natural”.

To further justify the “naturalness” of the probability $P_{ρ}$ we now show that it reflects the behaviour of both halting and non-halting programs. To this aim we use the series (9) to define a variation of ρ, namely a semi-computable probability distribution r on the set of all running times, finite or infinite, $T_{U} \cup \{\infty\}$ as follows: $\begin{matrix} r (t) = \begin{cases} υ (t), & \text{if } t \in T_{U}, \\ 1 - Υ, & \text{if } t = \infty . \end{cases} \end{matrix}$ As by (10) and (11) $\begin{matrix} \sum_{t = 1}^{\infty} r (t) + r (\infty) = \sum_{t = 1}^{\infty} \sum_{x = 1}^{\infty} pr (x, t) + r (\infty) = 1, \end{matrix}$ we can define $P_{r} (\{t\}) = r (t)$ for $t \in T_{U} \cup \{\infty\}$ to obtain the semi-computable probability space $(T_{U} \cup \{\infty\}, B (T_{U} \cup \{\infty\}), P_{r})$ , where $B (T_{U} \cup \{\infty\})$ is the set of all subsets of $T_{U} \cup \{\infty\}$ .

In contrast with $P_{ρ}$ – which deals only with finite running times – $P_{r}$ handles also the infinite running time, the running time of non-halting programs. The normalisation factor Υ makes $P_{ρ}$ “reflect” the behaviour of non-halting programs too as the restriction of $P_{r}$ to $T_{U}$ is $\begin{matrix} P_{ρ} (t) = \frac{P_{r} ({t})}{Υ}, t \in T_{U} . \end{matrix}$

In (8) t can be replaced with $log (t + 1)$ or, more generally, with $g (t)$ , where g is a non-decreasing, unbounded, computable function: in this way we obtain a class of computable probabilistic distributions on $T_{U}$ .

In what follows, a computable probability distribution ρ on $T_{U}$ will be extended to $Z^{+}$ by setting $ρ (t) = 0$ for every $t \in Z^{+} ∖ T_{U}$ .

Finally, we note that the problem whether a concrete computable probability distribution is “natural” depends on various factors, some objective, others subjective. There are a few ways to mitigate subjectivity, in particular, to make (11) more “practical”. One possibility is to use instead of t a very slowly increasing computable function $g (t)$ . A more substantial improvement can be obtained using Proposition 1.5.2 in [22]. Yet another way will be discussed in Section 7.

6. The probabilistic anytime algorithm

As we mentioned in Section 3, to solve the Halting Problem it is enough to fix a universal U and to decide, for an arbitrary program x, whether $U (x) < \infty$ or $U (x) = \infty$ .

Our aim is to construct an anytime algorithm for testing the incomputable predicate “ $U (x) < \infty$ ”. The decision to accept/reject the hypothesis “ $U (x) < \infty$ ” will be based on the running time of the computation $U (x)$ . A decision made by the anytime algorithm is erroneous when it returns the output “ $U (x) = \infty$ ” when, in fact, $U (x) < \infty$ (that is, $U (x)$ eventually stops after a very long time).

The Halting Problem will be re-formulated within the probabilistic framework presented in Section 5 as follows: $\begin{matrix} For arbitrary x \in Z^{+}, test the hypothesis H_{x} : {U (x) < \infty} against the alternative H_{x}^{'} : {U (x) = \infty} . \end{matrix}$

The decision of rejecting $H_{x}$ will be taken on the basis of a critical time region $B_{x}$ . In both proposed anytime algorithms, the critical regions will not depend on x, that is, $B = B_{x}$ , for every $x \in Z^{+}$ .

An erroneous decision occurs when we reject $H_{x}$ on the basis of B, but $H_{x}$ is true. The quality of this decision is expressed by the probability of an erroneous decision, i.e. the probability that a halting program x stops in a time $t \in B$ .

In what follows we will work with an a priori running time probability space $(T_{U}, B (T_{U}), P_{RT})$ defined in (7) and the running time random variable $RT$ defined in (6). Our main example is $P_{RT} = P_{ρ}$ , where ρ comes from (8).

In what follows we fix a computable probability distribution $P_{RT}$ .

First we apply Proposition 4.1 to the random variable $RT$ :

Corollary 6.1.
For every $ε \in (0, 1)$ we have $\begin{matrix} (12) & P_{RT} ({t \in T_{U} ∣ t > q_{RT} (1 - ε)}) ⩽ ε, \end{matrix}$ hence ${lim}_{ε \to 0} P_{RT} ({t \in T_{U} ∣ t > q_{RT} (1 - ε)}) = 0$ .

We now use the inequality (12) in Corollary 6.1 to propose the following probabilistic anytime algorithm for the Halting Problem:³
³
A probabilistic anytime algorithm is different from a Monte Carlo algorithm. For such an algorithm there is no need of amplification as the quality measure is fixed a priori by the bound on the probability of error.

Fix $ε = \frac{1}{M}$ with $M \in Z^{+}$ . Let x be an arbitrary program for U. If the computation $U (x)$ does not stop in time less than or equal to $q_{RT} (1 - ε)$ , then declare that $U (x) = \infty$ .
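The procedure just stated can be sketched in Python as follows. Everything concrete here is an assumption made for illustration: programs are modelled as generator factories in which each `yield` is one step, `toy_rho` ( $ρ (t) = 2^{- t}$ ) is a hypothetical stand-in for the computable distribution fixed above, and floating-point sums are used where exact arithmetic would be needed in general.

```python
from typing import Callable, Iterator, Optional, Tuple

Program = Callable[[], Iterator[None]]   # each `yield` is one computation step

def quantile_threshold(rho: Callable[[int], float], eps: float) -> int:
    """Smallest t with rho(1) + ... + rho(t) >= 1 - eps.

    Assumes rho is a probability distribution on the positive integers
    (zero outside T_U) and 0 < eps < 1.
    """
    total, t = 0.0, 0
    while total < 1.0 - eps:
        t += 1
        total += rho(t)
    return t

def anytime_halting_test(program: Program, threshold: int) -> Tuple[bool, Optional[int]]:
    """Run `program` for at most `threshold` time units.

    Returns (True, t) if it stopped at time t <= threshold; otherwise
    (False, None), which stands for the declaration "U(x) = infinity".
    """
    computation = program()
    for t in range(1, threshold + 1):
        try:
            next(computation)
        except StopIteration:
            return True, t
    return False, None

def halts_at(n: int) -> Program:
    """Hypothetical program that stops exactly at time n."""
    def program() -> Iterator[None]:
        for _ in range(n - 1):
            yield
    return program

def loop_forever() -> Iterator[None]:
    """Hypothetical program that never stops."""
    while True:
        yield

toy_rho = lambda t: 2.0 ** (-t)                        # assumed distribution
threshold = quantile_threshold(toy_rho, eps=1 / 100)   # M = 100
```

With these choices the threshold is 7. A program halting at time 5 is correctly recognised, whereas one halting at time 50 is (erroneously) declared non-halting; under the assumed distribution the set of such running times has probability at most 1/100.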

If the computation $U (x)$ stops in time less than or equal to $q_{RT} (1 - ε)$ , then obviously $U (x) < \infty$ . Otherwise, the answer to the question whether $U (x) < \infty$ is unknown and algorithmically unknowable. The above anytime algorithm gives an approximate answer. To analyse the quality of the answer produced by this anytime algorithm we choose the computable critical time region⁴ $\begin{matrix} B (P_{RT}, ε) = \{t \in T_{U} ∣ t > q_{RT} (1 - ε)\}, \end{matrix}$ and the critical program region $\begin{array}{rcl} C (P_{RT}, ε) & = & \{x \in Z^{+} ∣ U (x) [t] = \infty, \text{ for some } t \in B (P_{RT}, ε)\} \\ & = & \{x \in dom (U) ∣ RT (x) \in B (P_{RT}, ε)\} . \end{array}$

⁴
$B (P_{RT}, ε)$ is independent of x; recall that $ε = \frac{1}{M}$ , $M \in Z^{+}$ .

Note that $\begin{matrix} Z^{+} ∖ dom (U) \subseteq C (P_{RT}, ε) \subset Z^{+} . \end{matrix}$ The first inclusion above is not necessarily an equality as there may exist $t_{1} > t_{0} > q_{RT} (1 - ε)$ such that $U (x) [t_{0}] = \infty$ and $U (x) [t_{1}] < \infty$ .

The anytime algorithm may output the answer “ $U (x) = \infty$ ” when in fact $U (x) < \infty$ . To evaluate the quality of the anytime algorithm we need to “compare” the set $C (P_{RT}, ε)$ – which gives the “anytime” answers “ $U (x) = \infty$ ” – with the exact set $Z^{+} ∖ dom (U)$ – giving the correct answers “ $U (x) = \infty$ ”. To this aim we evaluate the “size” of the set $C (P_{RT}, ε)$ with Pr.
Corollary 6.2.
For every $M \in Z^{+}$ $\begin{matrix} Pr (C (P_{RT}, \frac{1}{M})) = P_{RT} (B (P_{RT}, \frac{1}{M})) ⩽ \frac{1}{M} . \end{matrix}$
Proof.
As $C (P_{RT}, ε) = {RT}^{- 1} (B (P_{RT}, ε))$ , we have $\begin{matrix} Pr (C (P_{RT}, ε)) = P_{RT} (B (P_{RT}, ε)), \end{matrix}$ so from Corollary 6.1 we deduce that for every $M \in Z^{+}$ we have: $\begin{array}{rcl} Pr (C (P_{RT}, \frac{1}{M})) & = & P_{RT} (B (P_{RT}, \frac{1}{M})) \\ & = & 1 - {CDF}_{RT} (q_{RT} (1 - \frac{1}{M})) ⩽ \frac{1}{M} . \end{array}$ □
Comment.
Corollary 6.2 is stronger than the ones obtained in [6,7] where the probability and the stopping decision depend on the program x.

To implement the anytime algorithm above we need an algorithm to compute $\begin{matrix} q_{RT} (1 - \frac{1}{M}) = min \{t \in T_{U} ∣ {CDF}_{RT} (t) ⩾ 1 - \frac{1}{M}\} . \end{matrix}$ As the set $T_{U}$ is only computably enumerable, we will not be able to compute $q_{RT} (1 - \frac{1}{M})$ exactly, but only an upper bound for it. To this aim we consider a computable enumeration of $T_{U} = \{t_{1}, t_{2}, \dots, t_{i}, \dots\}$ and compute the following new bound: $\begin{matrix} \tilde{q}_{RT} (1 - \frac{1}{M}) = max \{t_{1}, \dots, t_{k}\} ⩾ q_{RT} (1 - \frac{1}{M}), \end{matrix}$ where $\begin{matrix} k = min \{s \in Z^{+} ∣ \sum_{i = 1}^{s} ρ (t_{i}) ⩾ 1 - \frac{1}{M}\} . \end{matrix}$ The inequality holds because ${CDF}_{RT} (max \{t_{1}, \dots, t_{k}\}) ⩾ \sum_{i = 1}^{k} ρ (t_{i}) ⩾ 1 - \frac{1}{M}$ .

Obviously, the anytime algorithm will work correctly with the larger bound, but this will increase its time complexity.
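One way to obtain such an upper bound from an enumeration of $T_{U}$ , sketched here under our own assumptions, is to accumulate $ρ (t_{i})$ along the enumeration until the mass reaches $1 - \frac{1}{M}$ and to return the largest running time enumerated so far; that time t satisfies ${CDF}_{RT} (t) ⩾ 1 - \frac{1}{M}$ , hence it is at least the quantile. The enumeration order and the distribution $ρ (t) = 2^{- t}$ in the example are hypothetical.

```python
from fractions import Fraction
from typing import Callable, Iterable

def quantile_upper_bound(enumeration: Iterable[int],
                         rho: Callable[[int], Fraction],
                         M: int) -> int:
    """Upper bound for the (1 - 1/M)-quantile from an enumeration of T_U.

    Accumulates rho along the enumeration (which need not be increasing)
    and returns the largest time seen once the mass reaches 1 - 1/M.
    """
    target = 1 - Fraction(1, M)
    mass, largest = Fraction(0), 0
    for t in enumeration:
        mass += rho(t)
        largest = max(largest, t)
        if mass >= target:
            return largest
    raise ValueError("enumeration exhausted before reaching the target mass")

# Hypothetical data: an out-of-order enumeration and rho(t) = 2^(-t).
toy_enumeration = [3, 1, 2, 5, 4, 7, 6]
toy_rho = lambda t: Fraction(1, 2**t)
bound = quantile_upper_bound(toy_enumeration, toy_rho, M=10)
```

Here the bound is 5, while the exact 0.9-quantile of the assumed distribution is 4 (the cumulative probabilities at 3 and 4 are 7/8 and 15/16); the gap comes only from the order of the enumeration.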

7. Testing the quality of the running time distribution

Inference-based-decisions are made using statistical procedures based on sets of observations. An inference-based-decision of a hypothesis results in one of two outcomes: the hypothesis is accepted or rejected. The outcome can be correct or erroneous. The set of observations leading to the decision “reject the hypothesis” is called the critical region.

Fix the probability space $(A, B (A), P_{X})$ induced by a random variable X. Consider a critical region $B \subset A$ , $B \in B (A)$ and an observed value $x \in A$ . For every $x \in A$ , a hypothesis $H_{x}$ is a statement such that “ $H_{x}$ is true” and “ $H_{x}$ is false” are measurable sets from $B (A)$ .

An inference-based-decision has the following form: $\begin{matrix} \text{If the observed value } x \in A \text{ belongs to } B \text{, then decide to reject the hypothesis } H_{x} . \end{matrix}$

An error occurs if we reject $H_{x}$ on the basis of B, when $H_{x}$ is true. The probability of error, that is, the probability of an erroneous decision, is $P_{X} (\{x \in B \mid \text{“} H_{x} \text{ is true”}\})$ . Of course, only decisions with (very) low probability of error are of genuine interest.
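
For a finite sample space the probability of error can be computed directly. The following Python sketch is only an illustration under invented data: the probability mass function `pmf`, the critical region `B` and the predicate `hypothesis_is_true` are all hypothetical and do not come from the paper.

```python
from fractions import Fraction
from typing import Callable, Dict, Set


def error_probability(
    pmf: Dict[int, Fraction],
    critical_region: Set[int],
    hypothesis_is_true: Callable[[int], bool],
) -> Fraction:
    """P_X({x in B | H_x is true}) on a finite probability space."""
    return sum(
        (p for x, p in pmf.items()
         if x in critical_region and hypothesis_is_true(x)),
        Fraction(0),
    )


# Toy space A = {1, ..., 5}; the masses add up to 1.
pmf = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 8),
       4: Fraction(1, 16), 5: Fraction(1, 16)}
B = {4, 5}                                   # critical region (assumption)
risk = error_probability(pmf, B, lambda x: x % 2 == 0)  # H_x true for even x
# risk == 1/16: only x = 4 lies in B while H_x is true
```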

Making the “right” choices for the running time computable probability distributions is essential for successful applications. There are a few ways to guide and improve the quality of these choices. One possibility is to use the sampling algorithm to test how “natural” a particular computable probability distribution, say $P_{ρ}$ , is for a given universal U. For example, we can use the two-sample Kolmogorov–Smirnov goodness-of-fit test (see [8, pp. 309–314]) for this purpose.

We randomly sample $T_{U}$ to obtain a long sequence of independent, identically distributed running times $(t_{1}, \dots, t_{N})$ according to the discrete random variable $RT$ . This can be achieved by a dovetailing method that generates a sufficiently large number L of halting programs ${POS}_{L} = (x_{1}, \dots, x_{L})$ together with their running times $T_{{POS}_{L}} = (τ_{1}, \dots, τ_{L})$ . Then we implement a random sampling (see, for example, [3,17]) to extract from $T_{{POS}_{L}}$ a sequence of N running times $(t_{1}, \dots, t_{N})$ , which represent N independent, identically distributed replicates of the random variable $RT$ .
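
The two steps, dovetailing and random sampling, can be sketched in Python as follows. The sketch rests on several assumptions that are not in the paper: `toy_program` is an invented stand-in for the universal machine U (a generator that yields once per computation step), programs are identified with positive integers, and the sample is drawn uniformly with replacement from the pool of collected running times. How the pool should be weighted so that the draws follow $RT$ is not modelled here.

```python
import random
from itertools import count
from typing import Callable, Dict, Iterator, List, Tuple


def toy_program(x: int) -> Iterator[None]:
    """Hypothetical stand-in for U(x): one `yield` per computation step.

    Programs divisible by 3 never halt; any other x halts after x * x steps.
    """
    if x % 3 == 0:
        while True:
            yield
    for _ in range(x * x):
        yield


def dovetail(make_program: Callable[[int], Iterator[None]],
             L: int) -> List[Tuple[int, int]]:
    """Return the first L halting programs with their running times.

    At stage n, program n is started and every program still running is
    advanced by one step, so a diverging program never blocks the others.
    """
    if L < 1:
        raise ValueError("L must be positive")
    running: Dict[int, list] = {}   # x -> [generator, steps executed so far]
    halted: List[Tuple[int, int]] = []
    for stage in count(1):
        running[stage] = [make_program(stage), 0]
        for x in list(running):
            generator, steps = running[x]
            try:
                next(generator)
                running[x][1] = steps + 1
            except StopIteration:
                halted.append((x, steps))
                del running[x]
                if len(halted) == L:
                    return halted
    return halted  # not reached: count(1) is infinite


def sample_running_times(pool: List[int], N: int, seed: int = 0) -> List[int]:
    """Draw N running times from the pool, uniformly and with replacement."""
    return random.Random(seed).choices(pool, k=N)


pairs = dovetail(toy_program, L=6)        # (x_i, tau_i) for i = 1..L
pool = [tau for _, tau in pairs]          # T_{POS_L}
sample = sample_running_times(pool, N=10)  # (t_1, ..., t_N)
```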

We can now use the associated Empirical Cumulative Distribution Function defined by $\begin{matrix} (13) & {ECDF}_{RT, N} (t) = \frac{\# \{1 ⩽ i ⩽ N \mid t_{i} ⩽ t\}}{N}, \quad t \in T_{U} . \end{matrix}$
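
Formula (13) translates almost literally into code. The following Python sketch, with the hypothetical helper name `make_ecdf`, sorts the sample once and then counts the values not exceeding t by binary search; exact rationals are used so that the values can be compared without rounding.

```python
from bisect import bisect_right
from fractions import Fraction
from typing import Callable, Sequence


def make_ecdf(sample: Sequence[int]) -> Callable[[int], Fraction]:
    """Return t -> #{i : t_i <= t} / N for the given sample (t_1, ..., t_N)."""
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("the sample must not be empty")

    def ecdf(t: int) -> Fraction:
        # bisect_right counts the sample values that are <= t
        return Fraction(bisect_right(ordered, t), n)

    return ecdf


ecdf = make_ecdf([1, 4, 4, 16, 25])   # invented running times
# ecdf(0) == 0, ecdf(4) == 3/5, ecdf(25) == 1
```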

The two-sample Kolmogorov–Smirnov test compares two empirical distribution functions, here ${ECDF}_{RT, N}$ and the corresponding function ${ECDF}_{ρ, N}$ obtained from N values drawn according to $P_{ρ}$ , in order to accept/reject the null hypothesis that the two datasets were drawn from “the same stochastic source”. The null hypothesis, denoted by $H_{0} : \{P_{RT} = P_{ρ}\}$ , states that the data produced by the sampling algorithm for the particular universal U fits the computable probability distribution $P_{ρ}$ . The decision of accepting/rejecting $H_{0}$ is taken on the basis of a numerical comparison of ${ECDF}_{RT, N}$ and ${ECDF}_{ρ, N}$ .
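
A minimal Python sketch of this comparison is given below; it is an illustration, not the procedure used by the authors. It computes the statistic $D = \sup_{t} | {ECDF}_{RT, N} (t) - {ECDF}_{ρ, N} (t) |$ and rejects $H_{0}$ when D exceeds the usual large-sample critical value $\sqrt{- \frac{1}{2} \ln (α / 2)} \sqrt{(n + m) / (n m)}$ . The significance level `alpha` and the function names are assumptions of the sketch. The critical value is derived for continuous distributions; for discrete data such as running times the test is known to be conservative, so the sketch should be read as a rough guide only.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[int], b: Sequence[int]) -> float:
    """Largest gap between the two empirical distribution functions.

    Both functions are step functions that jump only at sample values,
    so it suffices to compare them at the pooled sample points.
    """
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    return max(
        abs(bisect_right(xs, t) / n - bisect_right(ys, t) / m)
        for t in set(xs) | set(ys)
    )


def reject_h0(a: Sequence[int], b: Sequence[int], alpha: float = 0.05) -> bool:
    """Decide H_0 with the asymptotic two-sample critical value."""
    n, m = len(a), len(b)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(a, b) > critical


# Invented samples: identical sources versus clearly separated sources.
same = reject_h0(list(range(100)), list(range(100)))          # False
different = reject_h0(list(range(100)), list(range(100, 200)))  # True
```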

8. Conclusions

In this paper we have proposed a probabilistic anytime algorithm for the Halting Problem. The anytime algorithm depends on the model of computation U; its quality depends on the computable probability distribution on the set of stopping times.

The main motivation comes from experimental results reported in [27,30,31] and the theoretical results in [14,15]: they all suggest that halting programs are not uniformly distributed and depend on the running times of the specific model of computation.

We have used the fact that programs which take a long time to halt stop at “algorithmically compressible times” [7] to construct a class of computable probability distributions on the set of stopping times of halting programs. Each computable probability distribution induces a running time probability space on the set of halting programs, the probabilistic framework for our anytime algorithms.

Next we discuss some features of the proposed anytime algorithm. We start with positive features. (P1) The cut-off temporal bound does not depend on programs. (P2) The a priori class of computable probability distributions presented in Section 5.3 is not arbitrarily pre-imposed: it reflects the halting behaviour of the chosen universal machine through its running times. (P3) We can test empirically the choice of the computable probability distribution and sampling, hence adopt parameters suiting different universal machines.

However, the approach has limits. (L1) Because of (P1) and the use of a universal machine, the cut-off temporal bound could be large. This can be mitigated to some extent by (P3). (L2) Working with a fixed universal machine and programs x instead of pairs (program, y) increases the computational time as the simulation of the computation $program (y)$ on U takes longer than running $program (y)$ .

It is a natural follow-up to study the computational complexity of the proposed anytime algorithms and, complementarily, to test experimentally their performance for classes of interesting programs.

Footnotes

Acknowledgements

We thank N. Allen, M. Holmes, Yu. Manin, L. Staiger and G. Tee for useful comments and suggestions. Special thanks to the anonymous referee for excellent critical remarks and suggestions that improved the paper. This work has been supported in part by the Quantum Computing Research Initiatives at Lockheed Martin.

References

1. B.C. Arnold, Balakrishnan and N.N. Nagaraja, A First Course in Order Statistics, John Wiley, New York, 2008.

2. Bienvenu, Desfontaines and Shen, What percentage of programs halt?, in: Automata, Languages, and Programming I, M.M. Halldórsson, Iwama, Kobayashi and Speckmann, eds, LNCS, Vol. 9134, Springer, 2015, pp. 219–230.

3. Bringmann and Panagiotou, Efficient sampling methods for discrete distributions, Algorithmica (2016), 1–25. doi:10.1007/s00453-016-0205-0.

4. C.S. Calude, Information and Randomness: An Algorithmic Perspective, 2nd edn, Springer, Berlin, 2002.

5. C.S. Calude and Desfontaines, Universality and almost decidability, Fundamenta Informaticae 138(1–2) (2015), 77–84.

6. C.S. Calude and Desfontaines, Anytime algorithms for non-ending computations, International Journal of Foundations of Computer Science 26(4) (2015), 465–475. doi:10.1142/S0129054115500252.

7. C.S. Calude and M.A. Stay, Most programs stop quickly or never halt, Advances in Applied Mathematics 40 (2008), 295–308. doi:10.1016/j.aam.2007.01.001.

8. W.J. Conover, Practical Nonparametric Statistics, John Wiley, New York, 1971.

9. Cook, Podelski and Rybalchenko, Proving program termination, Communications ACM 54(5) (2011), 88–98. doi:10.1145/1941487.1941509.

10. DasGupta, Probability for Statistics and Machine Learning, Springer, New York, 2011.

11. Downey and Hirschfeldt, Algorithmic Randomness and Complexity, Springer, Heidelberg, 2010.

12. C.A. Furia, Mandrioli, Morzenti and Rossi, Modeling Time in Computing, Springer, Berlin, 2012.

13. Grass, Reasoning about computational resource allocation. An introduction to anytime algorithms, Magazine Crossroads 3(1) (1996), 16–20. doi:10.1145/332148.332154.

14. J.D. Hamkins and Miasnikov, The halting problem is decidable on a set of asymptotic probability one, Notre Dame Journal of Formal Logic 47(4) (2006), 515–524. doi:10.1305/ndjfl/1168352664.

15. Köhler, Schindelhauer and Ziegler, On approximating real-world halting problems, in: Fundamentals of Computation Theory 2005, Liskiewicz and Reischuk, eds, LNCS, Vol. 3623, Springer, 2005, pp. 454–466. doi:10.1007/11537311_40.

16. R.H. Lathrop, On the learnability of the uncomputable, in: Proceedings International Conference on Machine Learning, Saitta, ed., Morgan Kaufmann, 1996, pp. 302–309.

17. P.S. Levy and Lemeshow, Sampling of Populations. Methods and Applications, 3rd edn, John Wiley, NJ, 1999.

18. Li and P.M.B. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, 3rd edn, Springer Verlag, New York, 2008.

19. Lynch, Approximations to the halting problem, Journal of Computer and System Sciences 9 (1974), 143–150. doi:10.1016/S0022-0000(74)80003-6.

20. Y.I. Manin, A Course in Mathematical Logic for Mathematicians, 2nd edn, Springer, Berlin, 2010.

21. Y.I. Manin, Renormalisation and computation II: Time cut-off and the halting problem, Mathematical Structures in Computer Science 22 (2012), 729–751. doi:10.1017/S0960129511000508.

22. Y.I. Manin, Zipf’s law and L. Levin probability distributions, Functional Analysis and Its Applications 48(2) (2014), 116–127. doi:10.1007/s10688-014-0052-1.

23. Minsky, Computation: Finite and Infinite Machines, Prentice-Hall, Inc., Englewood Cliffs, NJ, 1967.

24. Mori, Tsujii and Yasugi, Computability of probability distributions and distribution functions, in: 6th International Conference on Computability and Complexity in Analysis, Schloss Dagstuhl–Leibniz-Zentrum für Informatik, Bauer, Hertling and K.-I. Ko, eds, Dagstuhl, 2009, pp. 185–196.

25. Olofsson, Probability, Statistics, and Stochastic Processes, Wiley-Interscience, New York, 2005.

26. Rybalov, On the generic undecidability of the halting problem for normalized Turing machines, Theory of Computing Systems 60 (2017), 671–676.

27. Soler-Toscano, Zenil, J.-P. Delahaye and Gauvrit, Calculating Kolmogorov complexity from the output frequency distributions of small Turing machines, PLoS ONE 9(5) (2014), e96223.

28. Specker, Nicht konstruktiv beweisbare Sätze der Analysis, The Journal of Symbolic Logic 14 (1949), 145–158. doi:10.2307/2267043.

29. Weihrauch, Computable Analysis. An Introduction, Springer, Berlin, 2000.

30. Zenil, Computer runtimes and the length of proofs, in: Computation, Physics and Beyond, M.J. Dinneen, Khoussainov and Nies, eds, LNCS, Vol. 7160, Springer, 2012, pp. 224–240. doi:10.1007/978-3-642-27654-5_17.

31. Zenil and J.-P. Delahaye, On the algorithmic nature of the world, in: Information and Computation. Essays on Scientific and Philosophical Understanding of Foundations of Information and Computation, Dodig-Crnkovic and Burgin, eds, World Scientific, Singapore, 2010, pp. 477–499.

¹ See [12] for modelling running times.