In a previous paper we used computer running times to define a class of computable probability distributions on the set of halting programs and developed a probabilistic anytime algorithm for the Halting Problem. The choice of a computable probability distribution – essential for the algorithm – can be rather subjective and hard to substantiate.
In this paper we propose and study an efficient statistical anytime algorithm for the Halting Problem. The main advantage of the statistical algorithm is that it can be implemented without any prior information about the running times on the specific model of computation and the cut-off temporal bound is reasonably small. The algorithm has two parts: the pre-processing which is done only once (when the parameters of the quality of solutions are fixed) and the main part which is run for any input program. With a confidence level as large as required, the algorithm produces correct decisions with a probability as large as required. Three implementations of the algorithm are presented and numerically illustrated.
The Halting Problem asks to decide, from a description of an arbitrary program and an input, whether the computation of the program on that input will eventually stop or continue forever. In 1936 A. Church, and independently A. Turing, proved that there is no algorithm solving the Halting Problem for all possible program-input pairs. The Halting Problem has many applications in logic and theoretical as well as applied computer science, mathematics, physics, biology, etc. Due to its practical importance approximate solutions for this problem have been proposed for quite a long time, see [2,6–10,14,17,20,22,25].
Anytime algorithms trade execution time for quality of results [13]. An anytime algorithm returns a result together with a “quality measure” which evaluates how close the obtained result is to the result that would be returned if the algorithm ran until completion (which may be prohibitively long). To improve the quality of the solution, anytime algorithms can be continued after they have halted if the output is not considered acceptable.
Here we use a more general form of anytime algorithm as an approximation for a computation which may never stop (see Manin [22]). Running times play an important role in this problem because halting programs are not uniformly distributed, see [16,27,29,30] for experimental work and [7,9,14,15] for theoretical results. Furthermore, every program either stops “quickly” or never stops [9]. This result was used in [8] to design an anytime probabilistic algorithm which simulates the program to be tested up to a threshold stopping time (this cut-off temporal bound is computed from the a priori accepted decision error and the probability distribution of stoping times of halting programs): if the computation still has not terminated by then, the algorithm reports (possibly wrongly) ‘The program does not halt!’. This anytime algorithm uses essentially a computable probability distribution on the set of stopping times of halting programs “reflecting” the halting behaviour of the chosen universal machine. The quantile of this probability distribution is utilised to compute the stoping threshold time, hence, the name “anytime probabilistic algorithm”. The probability of a wrong decision is no larger than the accepted error.
In this paper we propose a statistical anytime algorithm for the Halting Problem which improves the anytime probabilistic algorithm developed in [8] using a different strategy. In a pre-processing stage we sample sufficiently many terminating programs in an independent way (see [3,18]), determine their running times and consider the induced empirical cumulative distribution as approximation to the true, but unknown, cumulative distribution. Using an appropriate statistical framework we construct an anytime statistical algorithm which uses three parameters: the probability of an erroneous decision, the precision of and the confidence level in the estimation. The input program – tested for termination – is run for up to the largest number of steps made by any of the sampled programs. If the computation does not terminate by this time threshold (cut-off), then the (possible wrong) decision is that the program will never stop. With a confidence level as large as required, the anytime statistical algorithm produces correct decisions with a probability as large as required. Three implementations of the anytime algorithm are presented; numerical illustrations show that their time complexities are reasonably small.
The paper is organised as follows. We start with Section 2 on computability and complexity part for the Halting problem; Section 3 presents the probability framework and the probabilistic anytime algorithm for the Halting Problem, Section 4 presents the statistical framework, then Section 5 presents the statistical anytime algorithm for the Halting Problem and the proof of its main properties. Finally, in Section 6 we discuss three possible implementations of the statistical algorithm and present numerical illustrations; the last section is devoted to conclusions and possible extensions.
The Halting Problem
We denote by the set of positive integers ; and is the set of reals. For , is the ceiling function that maps α to the least integer greater than or equal to α. The domain of a partial function is denoted by : . We denote by the cardinality of the set S and by the power set of X. The indicator (or characteristic) function of a set M is denoted by .
We assume familiarity with elementary computability theory and algorithmic information theory [5,12,21]. For a partially computable function we denote by the statement “the algorithm computing F has stopped on x exactly in time t”. For we consider the computable set , and note that
The algorithmic complexity relative to a partially computable function is the partial function defined by . If for every , then . A partially computable function U is universal if for every partially computable function there exists a constant such that for every we have , see [7].
The set (see (1) for ) is computably enumerable, but not computable (the undecidability of the Halting Problem); its complement is not computably enumerable, but the sets are computable. To solve the Halting Problem means to determine for an arbitrarily pair , where F is a partially computable function and , whether stops or not, or equivalently, whether , that is, , for some . Solving the Halting Problem for a fixed universal U is enough to solve the Halting Problem. From now on we fix a universal function U and study the Halting Problem “For every , does ?”.
The probabilistic anytime algorithm for the Halting Problem
A probability space is a triple , where is a measurable space and is a probability measure, see [11,24]. A random variable is a measurable function defined on Ω with values in a set of real numbers; its probability distribution is denoted . The random variable X has a discrete probability distribution if A is at most countable. A computable probability distribution is a discrete probability distribution such that the function is computable (in particular, is a computable real for each , see [23,28]). The mean (or expected value) of the discrete random variable is defined by , if the series converges.
The Cumulative Distribution Function of a random variable X is the function defined by , . The Inverse Cumulative Distribution Function or Quantile function of the random variable X with a discrete distribution is the function defined by . For more details see [1].
Next we present the probability framework introduced in [8]. The finite running times of the computations are the set of exact stopping times for the halting programs of U: . The family of (finite and countable) unions of sets , generates the Borel field . A computable running time probability space is defined from a computable probability distribution ρ on by setting , , .
We introduce a probability structure on the set via a random variable. Let be the family of all subsets of . The function , has the property that for every , . The random variable RT – called the running time associated withU – induces the probability space on in which the probability is defined by , . For every we have: . For more details see [8].
Inference-based decisions are made using statistical procedures based on sets of observations. An inference-based decision of a hypothesis results in one of two outcomes: the hypothesis is accepted or rejected. The outcome can be correct or erroneous. The set of observations leading to the decision “reject the hypothesis” is called the critical region and its complement is called the acceptance region.
Consider a probability space induced by a random variable X. Consider an acceptance region with . For every observed value , a hypothesis is a predicate such that the sets and are in .
An inference-based decision has the following form:
An error occurs if we reject on the basis of when is true. The probability of error, that is, the probability of an erroneous decision, is .
We can reformulate the Halting Problem as an inference-based decision in which we test, for an arbitrary , the hypothesis (remember is a predicate) against the alternative . An “erroneous decision” means rejecting when .
The construction of the anytime algorithm for the Halting Problem is based on a computable acceptance region . Accordingly, the algorithm rejects the hypothesis if . This decision is correct if and it is wrong if . In detail, for an arbitrary program there are three possibilities: a) , b) , c) . Condition a) is decidable, but conditions b) and c) are undecidable. If the anytime algorithm gives the correct decision ; otherwise , so the anytime algorithm decides – rightly or wrongly – that . Furthermore, for every ,
As the right-hand side of (2) is a subset of we don’t need to work in but in , that is, in the probability space or, equivalently, in . The goal to minimise the probability of an erroneous decision can be achieved in this space as long as D and are measurable sets. This condition is satisfied by taking the decidable set of the form , for some appropriate T.
The anytime probabilistic algorithm proposed in [8] is:
The acceptance program region of the algorithm is
where is the decision error and the threshold is the -quantile of the probability distribution . The correctness of the algorithm comes from the inequality on the critical program region: .
Condition (3) can be equivalently stated in terms of running times as:
The critical time region is and the correctness comes from .
Statistical framework
The notions and results discussed in Section 3 are based on the assumption that the computable probability distribution on the set of finite running times of programs of U and the random variable RT are known. In case this assumption is not satisfied, can an inferential approach be used to extract information about the true but unknown probability distribution of the random variable RT from observations of the phenomenon described by RT? More precisely, instead of working with the theoretical , can we can we approximate the true, unknown probability distribution of the random variable RT by means of a (long-enough) sequence of independent, identically distributed random variables with the same distribution as RT? The answer is affirmative.
The following form of Hoeffding’s inequality (see [26]) is essential in what follows:
Letbe an integer and,. For everyindependent random variables defined onwith values inwe have:
Consider the probability space , the random variable and N replicates of X. In what follows will denote the observed values of a sample of size N corresponding to the random variables : . The vector will be called an N-dimensional sample and its values data points. The Empirical Cumulative Distribution Function is defined by
Suppose that we order increasingly the observed data points and denote the sequence by
The order statistics of rank k is the kth smallest value in (6): . See more in [11, Ch. 6]).
The statistical anytime algorithm assumes that the probability distribution of RT is unknown. Therefore, the cumulative distribution function of RT, , is also unknown and has to be estimated. For evaluating the quality of the approximation we fix a positive integer N and consider the true, unknown N-dimensional program sampling space.
The elements of will be denoted by . Projections , , , , are independent random variables. If we denote by , , , then are independent, identical distributed random variables. Furthermore, for every , we have:
For every we put , and denote the N-dimensional time sampling space by .
In the following Lemma 4.2, is estimated by the Empirical Cumulative Distribution Function (5)
On the other hand, using the linearity of the operator E we have:
As , , are independent random variables, for every , : , are also independent random variables. Consequently, the inequality (8) follows from Theorem 4.1 applied to the random variables . □
If we define the set of “good program samples” by
then by Lemma 4.2 we have
were λ is the precision parameter and can be interpreted as the confidence level that a program is in , i.e. it is a good program sample.
With this interpretation, Lemma 4.2 says that the probability of the set of programs on which estimates with precision at least λ can be made as “large” as one wishes. To measure the size of this set (according to ) we introduce the confidence level by the condition
which is equivalent to
The following result shows that for every the set of good program samples can be made as “large” as required in probability :
For every,,andwe have
A statistical anytime algorithm for the Halting Problem
For a fixed confidence level , precision parameter λ and good program sample x (which produces a “reliable” estimate of ) we use the critical time region (see (4)) to reject the hypothesis and to measure the probability of an erroneous decision of the anytime algorithm. Accordingly, for , the critical time region should satisfy the following two conditions:
For a sample of programs x we use the notation , where , . We increasingly order the observed running times and get the values of the corresponding order statistics . As one of these order statistics will be the choice for the threshold, , we must find the smallest number such that . In terms of order statistics, generates a statistical which must satisfy (12). Explicitly these two requirements are:
The statistical anytime algorithm for the Halting Problem will operate with three parameters: a) a bound for the decision error, b) a precision parameter which is a bound on the approximation of with , and c) a confidence parameter which is a probabilistic bound on the confidence in the precision parameter. In detail, the approximation parameter and confidence level control the quality of the approximation of by : a) the precision parameter λ is the numerical difference between the values of the two functions in a given point, b) the confidence level is the probability that the N sampled programs produce an approximation of with a requested precision, c) the decision error is the probability that the decision is returned when, in reality, .
We sample N independent halting programs (see [3,18]) and, by running them till they stop, calculate their respective running times . Let and . Randomisation is done according to the probability distribution induced by an injective computable enumeration of the halting programs.
The statistical anytime algorithm is:
We now evaluate the error the statistical anytime algorithm can make by deciding that does not stop when in fact it stops. To this aim we use the threshold and the critical regions
For every,with, we have:
We have:
□
For every integer,with, we have:
We only need to prove the implication:
If and , then , and .
Consequently, if , then
so . □
For everywithandwe have:
From the definition (7) and the choice of the statistical threshold (16) we have
In view of (17), (18) and (19) and (11) we have
□
According to (20), the probability of the event that the statistical anytime algorithm gives a wrong decision, that is, it declares when there exists such that stops in time t, is smaller or equal than ε, is larger than , i.e. the probability of error is smaller than or equal than ε with confidence larger than .
Implementations of the statistical anytime algorithm
In this section we present three implementations of the anytime algorithm; numerical illustrations show that their time complexities are reasonably small.
The standard implementation of the statistical anytime algorithm is as follows. Given three rational numbers with , first compute the sample size from (10), ; this positive integer is fixed as long as λ, δ are fixed. Then use an algorithm to generate a random injective computable enumeration of till N programs and their running times are obtained; again, these programs are fixed with ε, λ, δ. Then, for every program , if the computation does not stop in time , then declare that . In the latter case the probability of error is smaller than or equal to ε with confidence larger than .
In Table 1 we illustrate numerically and for the first implementation with fixed parameters ε, λ, δ having a few statistically standard values.
Numerical illustration of the first implementation of the anytime algorithm
δ
ε
λ ()
In a second implementation we start with two rational numbers with and an “affordable” size of samples (programs and running times), then compute the rational satisfying the inequality (9). We continue with the standard implementation of the statistical anytime algorithm with parameters ε, λ () and
to calculate the size sample . However, the value in (21) is not rational, so to preserve the inequality (9) we need to calculate a rational approximation
which implies the inequality .
The “price” paid working with an affordable, smaller sample size is a (possibly sharp) decrease in the confidence level; below is a numerical illustration of the second implementation with fixed parameters ε, λ, . Table 2 illustrates the second implementation.
Numerical illustration of the second implementation of the anytime algorithm
ε
λ ()
(very good)
(good)
(excellent)
(very good)
(hardly acceptable)
(unacceptable)
In a third implementation we start with two rational numbers with and an “affordable” upper bound T on the running time of the computation in the statistical anytime algorithm. We then use an injective dovetailing algorithm to generate as many elements of as possible in time T. In this way we will obtain a sample of programs and their respective running times such that each program stops in time at least (in fact, much smaller than) T: , for all . We then continue with the second implementation with parameters .
Both second and third implementations can be improved by increasing the sample size or the time bound, respectively.
Approximations in algorithmic information theory [4], for example, the approximations of Solomonoff universal distribution1
Solomonoff universal distribution [19] is an priori incomputable probability distribution over the set of finite binary strings which is different from the computable probability distribution on the set of stopping times discussed in Section 3.
[19], involve constants; in contrast, all implementations of the statistical anytime algorithm are free from such uncertainty.
Final comments
The anytime probabilistic algorithm for the Halting Problem proposed in [8] uses essentially a computable probability distribution on the set of stopping times of halting programs which reflects the halting behaviour of the chosen universal machine. The quantile of this probability distribution is used to compute the stopping threshold time. The probability of a wrong decision is no larger than the accepted error.
The statistical anytime algorithm for the Halting Problem – which is inspired by the probabilistic one – does not make any assumption on the probability distribution on the halting programs and uses an order statistics to compute the stopping threshold time (the cut-off temporal bound). In a nutshell, this anytime algorithm works on an arbitrary program as follows:
“Sample” sufficiently many halting programs independently at random.
Determine their running times and consider the induced empirical distribution as approximation to the true but unknown distribution.
Simulate the given program for the largest number of steps made by any of the sampled programs: if it still has not terminated by then, report (possibly wrongly) ‘The program does not halt!’.
In detail, the statistical algorithm uses three parameters for evaluating the quality of solutions, namely the probability of an erroneous decision ε, the precision λ and the confidence level δ of the statistical approximation. The sample size and critical regions are constructed based on these parameters. The main advantage of the statistical algorithm is that it can be implemented without any prior information about the distribution of running times. Another advantage is that the threshold (cut-off) temporal bound is calculated only once (when with are fixed) and then used for running the algorithm on any input. We proved that with a confidence level as large as required, the algorithm produces correct decisions with a probability as large as required.
The main advantage of the anytime statistical algorithm is that it can be implemented without any prior information about the running times on the specific model of computation – the choice of a computable probability distribution for the anytime probabilistic algorithm can be rather subjective and hard to substantiate; also, the cut-off temporal bound is reasonably small.
Finally, three implementations of the algorithm have been presented and numerically illustrated. Recent experimental work with Turing machines in [16] shows that “the halting probability of a TM decreases with time and will have a smaller chance of halting at every step it progresses”, reflecting the behaviour discovered in [9] which is at the core of both anytime probabilistic and statistic algorithms for the Halting Problem. It will be interesting to experiment these algorithms, particularly the statistical one, with different models of computations in order to understand their practical utility.
Footnotes
Acknowledgements
We thank Dr. Ned Allen for discussions and comments and for motivating one author (CC) to study practical approximate solutions to the Halting Problem. We thank the anonymous referees for useful comments that improved the presentation; we distinctly thank the referee who made many excellent proposals including the description of the statistical anytime algorithm presented in Section . This work was supported in part by the Quantum Computing Research Initiatives at Lockheed Martin.
References
1.
B.C.Arnold, N.Balakrishnan and H.N.Nagaraja, A First Course in Order Statistics, John Wiley, New York, 2008.
2.
L.Bienvenu, D.Desfontaines and A.Shen, What percentage of programs halt?, in: Automata, Languages, and Programming I, M.M.Halldórsson, K.Iwama, N.Kobayashi and B.Speckmann, eds, LNCS, Vol. 9134, Springer, 2015, pp. 219–230. doi:10.1007/978-3-662-47672-7_18.
3.
K.Bringmann and K.Panagiotou, Efficient sampling methods for discrete distributions, Algorithmica (2016), 1–25.
4.
C.Calude, Theories of Computational Complexity, North Holland, Amsterdam, 1988.
5.
C.S.Calude, Information and Randomness: An Algorithmic Perspective, 2nd edn, Springer, Berlin, 2002.
6.
C.S.Calude and D.Desfontaines, Universality and almost decidability, Fundamenta Informaticae138(1–2) (2015), 77–84. doi:10.3233/FI-2015-1199.
7.
C.S.Calude and D.Desfontaines, Anytime algorithms for non-ending computations, International Journal of Foundations of Computer Science26(4) (2015), 465–475. doi:10.1142/S0129054115500252.
8.
C.S.Calude and M.Dumitrescu, A probabilistic anytime algorithm for the Halting Problem, Computability7 (2018), 259–271. doi:10.3233/COM-170073.
9.
C.S.Calude and M.A.Stay, Most programs stop quickly or never halt, Advances in Applied Mathematics40 (2008), 295–308. doi:10.1016/j.aam.2007.01.001.
10.
B.Cook, A.Podelski and A.Rybalchenko, Proving program termination, Communications ACM54(5) (2011), 88–98. doi:10.1145/1941487.1941509.
11.
A.DasGupta, Probability for Statistics and Machine Learning, Springer, New York, 2011.
12.
R.Downey and D.Hirschfeldt, Algorithmic Randomness and Complexity, Springer, Heidelberg, 2010.
13.
J.Grass, Reasoning about computational resource allocation. An introduction to anytime algorithms, Magazine Crossroads3(1) (1996), 16–20. doi:10.1145/332148.332154.
14.
J.D.Hamkins and A.Miasnikov, The halting problem is decidable on a set of asymptotic probability one, Notre Dame Journal of Formal Logic47(4) (2006), 515–524. doi:10.1305/ndjfl/1168352664.
15.
S.Köhler, C.Schindelhauer and M.Ziegler, On approximating real-world halting problems, in: Fundamentals of Computation Theory 2005, M.Liskiewicz and R.Reischuk, eds, LNCS, Vol. 3623, Springer, 2005, pp. 454–466. doi:10.1007/11537311_40.
16.
K.Krzyzanska, Exploring halting times for unconventional halting schemes, Complex Systems27(1) (2018), 85–99. doi:10.25088/ComplexSystems.27.1.85.
17.
R.H.Lathrop, On the learnability of the uncomputable, in: Proceedings International Conference on Machine Learning, L.Saitta, ed., Morgan Kaufmann, 1996, pp. 302–309.
18.
P.S.Levy and S.Lemeshow, Sampling of Populations. Methods and Applications, 3rd edn, John Wiley, 1999.
19.
M.Li and P.M.B.Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, 3rd edn, Springer Verlag, New York, 2008.
20.
N.Lynch, Approximations to the Halting Problem, Journal of Computer and System Sciences9 (1974), 143–150. doi:10.1016/S0022-0000(74)80003-6.
21.
Yu.I.Manin, A Course in Mathematical Logic for Mathematicians, 2nd edn, Springer, Berlin, 2010.
22.
Yu.I.Manin, Renormalisation and computation II: Time cut-off and the Halting Problem, Mathematical Structures in Computer Science22 (2012), 729–751. doi:10.1017/S0960129511000508.
23.
T.Mori, Y.Tsujii and M.Yasugi, Computability of probability distributions and distribution functions, in: 6th International Conference on Computability and Complexity in Analysis, Schloss Dagstuhl–Leibniz-Zentrum Für Informatik, A.Bauer, P.Hertling and K.-IKo, eds, Dagstuhl, 2009, pp. 185–196.
24.
P.Olofsson, Probability, Statistics, and Stochastic Processes, Wiley-Interscience, New York, 2005.
25.
A.Rybalov, On the generic undecidability of the halting problem for normalized Turing machines, Theory of Computing Systems (2016), 1–6.
26.
C.Scott, Statistical learning theory, topic 3: Hoeffding’s inequality, University of Toronto, 2014, https://www.coursehero.com/file/18068309/03-hoeffding, retrieved 4 June 2019.
27.
F.Soler-Toscano, H.Zenil, J.-P.Delahaye and N.Gauvrit, Calculating Kolmogorov complexity from the output frequency distributions of small Turing machines, PLoS ONE9(5) (2014), e96223.
28.
K.Weihrauch, Computable Analysis. An Introduction, Springer, Berlin, 2000.
29.
H.Zenil, Computer runtimes and the length of proofs, in: Computation, Physics and Beyond, M.J.Dinneen, B.Khoussainov and A.Nies, eds, LNCS, Vol. 7160, Springer, 2012, pp. 224–240. doi:10.1007/978-3-642-27654-5_17.
30.
H.Zenil and J.-P.Delahaye, On the algorithmic nature of the world, in: Information and Computation. Essays on Scientific and Philosophical Understanding of Foundations of Information and Computation, G.Dodig-Crnkovic and M.Burgin, eds, World Scientific, Singapore, 2010, pp. 477–499.