Approximating Kolmogorov complexity

Abstract

It is well known that the Kolmogorov complexity function (the minimal length of a program producing a given string, when an optimal programming language is used) is not computable and, moreover, does not have computable lower bounds. In this paper we investigate a more general question: can this function be approximated? By approximation we mean two things: firstly, some (small) difference between the values of the complexity function and its approximation is allowed; secondly, at some (rare) points the values of the approximating function may be arbitrary.

For some values of the parameters such an approximation is trivial (e.g., the length function is an approximation with error d except for an $O(2^{-d})$-fraction of inputs). However, if we require a significantly better approximation, the approximation problem becomes hard, and we prove this in several settings. Firstly, we show that a finite table that provides good approximations for the Kolmogorov complexities of n-bit strings necessarily has high complexity. Secondly, we show that there is no good computable approximation for the Kolmogorov complexity of all strings. In particular, neither the Kolmogorov complexity function nor its approximations are generically or coarsely computable, and the time-bounded Kolmogorov complexity (for any computable time bound) deviates significantly from the unbounded complexity function. We also prove hardness of Kolmogorov complexity approximation in another setting: the mass problem whose solutions are good approximations for the Kolmogorov complexity function is above the halting problem in the Medvedev lattice. Finally, we mention some proof-theoretic counterparts of these results.

A preliminary version of this paper was presented at CiE 2019 conference (In Computing with Foresight and Industry – 15th Conference on Computability in Europe, CiE 2019, Durham, UK, July 15–19, 2019, Proceedings (2019) 230–239 Springer).

Keywords

Kolmogorov complexity; generic computability; coarse computability; approximation; mass problems

1. Introduction

Kolmogorov complexity $C(x)$ of a string x is the minimal length of a program that generates x, if we use some optimal programming language. This notion was introduced in the 1960s and has been studied extensively since then. Kolmogorov complexity has many applications in computability theory, computational complexity, foundations of probability theory and statistics, learning theory, combinatorics, etc. (see, e.g., [9,11] for the details).

A simple observation similar to the Berry paradox shows that Kolmogorov complexity is not computable; moreover, all computable lower bounds for the Kolmogorov complexity function, even partial ones, are bounded. In this paper we study how far $C(x)$ is from being computable. There are two well-known notions of approximate computability. The first one is generic computability: a total function f is generically computable if there is a partial computable function that is defined “almost everywhere” and equals f on its domain. Since Kolmogorov complexity has no non-trivial computable lower bounds, it cannot be generically computable. The second one is coarse computability: a function f is coarsely computable if there exists a total computable function that coincides with f almost everywhere. We show that the Kolmogorov complexity function is not coarsely computable either, even if we relax this notion by allowing a computable function to be close to the complexity function (and not necessarily equal).

One may consider the length function as an approximation for Kolmogorov complexity. It is well known that $C(x) \le |x| + O(1)$, where $|x|$ stands for the length of x, and that this inequality is close to an equality for most strings: the fraction of strings x of length n such that $C(x) < n - d$ is at most $2^{-d}$. Can we improve this estimate substantially by using another computable function (e.g., a resource-bounded version of Kolmogorov complexity) instead of the length function? Surprisingly, the answer is negative: a similar tradeoff between the required precision and the fraction of points where the approximation fails applies to every computable function. We prove this result (in different versions) in Section 3. The proof is based on a finite version of this statement, which says that a finite table that approximates the values $C(x)$ for strings x of length n with good precision and few errors is necessarily complex. This finite version is given in Section 2.

In Section 4 we consider the approximation problem as a mass problem in the sense of Medvedev (an element of Medvedev lattice, see, e.g., [12] for definitions). We prove, under some natural assumptions about the approximation parameters, that this problem is hard (is above the halting problem in the Medvedev lattice).

Finally, in Section 5 we discuss some results about provability in formal arithmetic that are counterparts of the non-approximability results for Kolmogorov complexity (and use non-approximability in their proofs).

2. Approximations have high complexity

2.1. Plain and prefix complexities

When we say that “Kolmogorov complexity of a string is the length of a shortest program that generates this string”, we need to specify a programming language. Formally, we introduce the notion of a decompressor. Let us give this definition for the more general case of conditional complexity.

Definition 1.
Let D be a (partial) computable function of two arguments. The complexity $C_D(x \mid y)$ of a string $x \in \{0,1\}^*$ conditional to a string $y \in \{0,1\}^*$ with respect to the decompressor D is the minimal length of a string p such that $D(p, y) = x$. (The strings p such that $D(p, y) = x$ are called descriptions of x given y with respect to the decompressor D.) If there are no descriptions, let $C_D(x \mid y) = \infty$.

For every string x one may consider a decompressor that contains x as a constant and outputs it on all inputs. Then the empty string is a description and the complexity is zero. Thus, by changing the decompressor, we may drastically change the complexity. The following result (discovered by Solomonoff and Kolmogorov) shows that this change is limited. It was the starting point for algorithmic information theory.
Proposition 1.
There exists a decompressor U such that for every decompressor D there exists a constant c such that $C_{U} (x | y) ⩽ C_{D} (x | y) + c$ for all x and y.

Such a machine U is called an optimal decompressor. The proof is simple: U treats a part of its first input as a self-delimited description of some decompressor D and launches D on the rest of the input. Switching to a different optimal decompressor changes the complexity function by at most a constant. So the Kolmogorov complexity function is defined up to an $O(1)$ additive term, and most inequalities in algorithmic information theory are valid up to $O(1)$ additive terms.
Definition 2.
Fix some optimal decompressor U. The value $C_{U} (x | y)$ is called the conditional complexity of x given y and denoted by $C (x | y)$ . By unconditional complexity $C (x)$ of a binary string x we mean $C (x | ϵ)$ , where ϵ is the empty string.

Here is the statement relating complexity and length (mentioned in the Introduction).
Proposition 2.
There is some c such that $C (x) ⩽ | x | + c$ for all x. For every m the number of strings x that have complexity less than m is less than $2^{m}$ .

The first statement is obtained by comparing the optimal decompressor U used in the definition of complexity to the trivial one where each x is a description of itself. The second statement is true because different strings must have different descriptions and there are at most $1 + 2 + \dots + 2^{m-1} = 2^m - 1 < 2^m$ descriptions of length less than m. It implies that the number of strings x of length n such that $C(x) < n - c$ is less than $2^{n-c}$, i.e., their fraction among all n-bit strings is less than $2^{-c}$.
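The counting behind Proposition 2 can be checked mechanically. The following Python sketch (the function names are ours, chosen for illustration) counts the binary strings shorter than m and turns this count into the bound on the fraction of n-bit strings x with $C(x) < n - c$. It does not compute complexity itself, which is impossible; it only verifies the arithmetic.

```python
from fractions import Fraction


def num_strings_shorter_than(m: int) -> int:
    """Number of binary strings of length < m, i.e. 1 + 2 + ... + 2^(m-1)."""
    return sum(2 ** k for k in range(m))


def compressible_fraction_bound(n: int, c: int) -> Fraction:
    """Upper bound on the fraction of n-bit strings x with C(x) < n - c.

    Distinct strings need distinct descriptions, and there are only
    2^(n-c) - 1 candidate descriptions of length < n - c.
    """
    return Fraction(num_strings_shorter_than(n - c), 2 ** n)


# The geometric sum 1 + 2 + ... + 2^(m-1) equals 2^m - 1 < 2^m.
for m in range(12):
    assert num_strings_shorter_than(m) == 2 ** m - 1

print(compressible_fraction_bound(20, 5))  # 32767/1048576, just below 2^-5
```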

We will need also a version of complexity called prefix complexity introduced (independently) by Levin and Chaitin.
Definition 3.
A decompressor D is prefix-free if, for every y, the values $D(x, y)$ and $D(x', y)$ are never both defined when $x'$ is a proper prefix of x.

There is a version of the Solomonoff–Kolmogorov result for prefix-free decompressors saying that there exists an optimal one (for this class):
Proposition 3.
There exists a prefix-free decompressor U such that for every prefix-free decompressor D there exists a constant c such that $C_U(x \mid y) \le C_D(x \mid y) + c$ for all x and y.
Definition 4.
Fix some optimal prefix-free decompressor U. Then the complexity $C_{U} (x | y)$ is called conditional prefix complexity of x given y and is denoted by $K (x | y)$ . Fixing $y = ε$ , we get the (unconditional) prefix complexity $K (x)$ .

Since the class of allowed decompressors is smaller for prefix complexity, the plain complexity (Definition 2) does not exceed the prefix complexity (up to a constant). The difference between them is not very big, as the following result says:
Proposition 4.
$$C(x) - O(1) \le K(x) \le C(x) + 2\log C(x) + O(1).$$

The left inequality is obvious; to prove the right one, we have to convert an arbitrary programming language (decompressor) into a prefix-free one. Informally speaking, one could prepend each program with a self-delimited code of its length. Here is a sketch. For each natural number k, consider the binary string $\hat{k}$ obtained by doubling each bit of the binary representation of k and then appending the pair 01. Then, reading a string that has prefix $\hat{k}$, we can determine where the binary representation ends (the pair 01 is distinguishable from 00 and 11 and therefore acts as a separator). Now, given a description p for an original decompressor, we consider a prefix-free decompressor that reads $\widehat{|p|}\,p$ from left to right. As we have seen, the decompressor sees where the first part ends, and then knows how many bits remain. This guarantees the prefix-free property. Since $\widehat{|p|}$ has length $2\log|p| + O(1)$, applying this to a shortest description p of x (so that $|p| = C(x)$) yields the right inequality.
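The code $\hat{k}$ and the transformation $p \mapsto \widehat{|p|}\,p$ are easy to implement. The following Python sketch is an illustration only (the function names are ours); it encodes and decodes $\hat{k}$ and shows that the decoder always knows where a codeword ends.

```python
def hat(k: int) -> str:
    """Self-delimiting code of k: double every bit of bin(k), then append '01'."""
    return "".join(bit * 2 for bit in format(k, "b")) + "01"


def read_hat(s: str) -> tuple[int, str]:
    """Decode a leading hat(k) from s; return k and the unread remainder."""
    bits = []
    i = 0
    while True:
        pair = s[i:i + 2]
        if pair == "01" and bits:          # separator reached
            return int("".join(bits), 2), s[i + 2:]
        if pair not in ("00", "11"):       # malformed or truncated input
            raise ValueError("s does not start with a valid hat code")
        bits.append(pair[0])
        i += 2


def to_prefix_free(p: str) -> str:
    """Turn a plain description p into the codeword hat(|p|) + p."""
    return hat(len(p)) + p


def from_prefix_free(s: str) -> str:
    """Invert to_prefix_free; the length field says exactly where p ends."""
    length, rest = read_hat(s)
    if len(rest) != length:
        raise ValueError("not a complete codeword")
    return rest


print(hat(5))                                     # 11001101
print(from_prefix_free(to_prefix_free("10110")))  # 10110
```

For $k \ge 1$ the codeword $\hat{k}$ has $2 \cdot (\text{number of binary digits of } k) + 2$ bits, which is where the $2\log C(x)$ term in Proposition 4 comes from.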

We refer the reader to [11] for the details of this proof and for other properties of plain and prefix complexities that are used in the sequel.
2.2. Approximations and their complexity

In this section we prove a basic technical result that deals with finite strings. This result says that every table that approximates (with small error) the complexity values for all strings of length n, except for a small fraction, has high complexity. Here is the exact statement.

Theorem 1.
Consider some n and a finite function $\tilde{C} : \{0,1\}^n \to \mathbb{N}$ (i.e., an array of $2^n$ natural numbers) that differs from the complexity function $C(x)$ by less than d for all strings x of length n, except for a $2^{-e}$-fraction of them, where $e \le n$. Then $$C(\tilde{C} \mid n) \ge e - 2d - O(\log n).$$
Proof.
Reorder all strings x of length n in the order of non-increasing values $\tilde{C}(x)$. Let us show that after the reordering there is a string z of complexity at least $n - 2d - O(1)$ among the first $2^{n-e+1}$ strings. We may assume without loss of generality that $e > 2$ (otherwise the inequality is trivial). By Proposition 2, at least half of the n-bit strings have complexity $n - 1$ or greater. Since $e > 2$, fewer than 25% of the strings are exceptional (i.e., C deviates from $\tilde{C}$ by d or more there), so for at least 25% of the n-bit strings the value of C is at least $n - 1$ and at the same time the value of $\tilde{C}$ is at least $n - d - 1$. Therefore the $\tilde{C}$-values of the first $2^{n-e+1}$ strings (after reordering) are at least $n - d - 1$: if at least 25% of the entries of an array are large, then after sorting they form a prefix of length at least 25% of the array, and $2^{n-e+1} \le 2^{n-2}$ since e is an integer greater than 2. The exceptional set has at most $2^{n-e}$ elements, so one of these $2^{n-e+1}$ strings, call it z, is not exceptional. Then $C(z) > \tilde{C}(z) - d \ge n - 2d - 1$, so the complexity of z is at least $n - 2d - O(1)$.

Knowing n, the array of $\tilde{C}$-values, and an $(n-e+1)$-bit string that indicates the position of z among the first $2^{n-e+1}$ positions in the reordered array, we can reconstruct z. To describe z, we concatenate the self-delimited descriptions $\hat{n}$, $\hat{e}$ of n and e (they take $O(\log n)$ bits, since both numbers are at most n), the $(n-e+1)$-bit string indicating the position of z, and a description of $\tilde{C}$ given n. This is enough to reconstruct z: first we read n and e from their self-delimited descriptions, then we read the next $n - e + 1$ bits to get the index, and the rest is the description of the table $\tilde{C}$ given n (and n is already known). Since z has complexity at least $n - 2d - O(1)$, the total length of this description is at least $n - 2d - O(1)$. Therefore, $$O(\log n) + (n - e + 1) + C(\tilde{C} \mid n) \ge n - 2d - O(1),$$ and we get the desired inequality. □
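The decoding step of this proof (sort the table, then read an index of $n - e + 1$ bits) can be sketched in Python. This is only an illustration under stated assumptions: the table below is a hypothetical toy array, not an actual approximation of $C$ (which could not be computed), and we assume ties in the sort are broken lexicographically, since the proof only needs some fixed rule shared by the encoder and the decoder.

```python
def reorder(table: dict[str, int]) -> list[str]:
    """n-bit strings sorted by non-increasing table value, ties lexicographic."""
    return sorted(table, key=lambda x: (-table[x], x))


def encode_index(table: dict[str, int], z: str, n: int, e: int) -> str:
    """The (n - e + 1)-bit position of z among the first 2^(n-e+1) strings."""
    i = reorder(table).index(z)
    if i >= 2 ** (n - e + 1):
        raise ValueError("z is not among the first 2^(n-e+1) strings")
    return format(i, f"0{n - e + 1}b")


def decode(table: dict[str, int], index_bits: str) -> str:
    """Recover z from the table and its index in the reordered array."""
    return reorder(table)[int(index_bits, 2)]


# Hypothetical toy table for n = 4 (values chosen arbitrarily).
n, e = 4, 3
toy_table = {format(i, "04b"): (7 * i) % 5 for i in range(2 ** n)}
for z in reorder(toy_table)[: 2 ** (n - e + 1)]:
    assert decode(toy_table, encode_index(toy_table, z, n, e)) == z
```

In the proof, the full description of z is $\hat{n}\,\hat{e}$, followed by the index bits returned by `encode_index`, followed by a description of the table given n.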
Remark 1.
In this theorem the length of the interval where the values of complexity should lie is $2d$ (approximation plus/minus d). If we are interested in upper bounds for $C(x)$ that approximate it (i.e., allow only a one-sided error not exceeding some d), then the interval is half as long (subtract $d/2$ to get a two-sided $d/2$-approximation), so $2d$ in Theorem 1 could be replaced by d.
Remark 2.
As we have mentioned, the bound provided by Theorem 1 is tight: the complexity of an n-bit string is in the interval $[n - d, n + O(1)]$ except for an $O(2^{-d})$-fraction of n-bit strings.

2.3. Refinements

For a tighter bound (that will be used in the next section) we need to use the prefix version of complexity.

Theorem 2.
Under the conditions of Theorem 1, we have $$K(\tilde{C} \mid n) \ge e - 2d - K(d) - O(1).$$
Proof.
Now a more careful analysis is needed. Consider the concatenation of the following strings:
prefix-free description of d, of length $K (d)$ ;

prefix-free description of $\tilde{C}$ conditional on n, of length $K (\tilde{C} | n)$ ;

the bit string of length $n - e + 1$ that represents the ordinal number of z in the reordered array $\tilde{C}$ .
We claim that the length of this concatenation, i.e., $K (d) + K (\tilde{C} | n) + n - e + 1$ , is at least $n - 2 d - O (1)$ . To prove this, assume that its length is $n - 2 d - m$ for some positive m. We will show that $m ⩽ O (1)$ . Consider a prefix-free description of m of length $K (m) = O (log m)$ , and add it as a prefix to the concatenation we constructed. In this way we get a (plain, not necessarily prefix-free) description of z.

Indeed, having this combined string as an input, we first find m by reading its prefix-free description. Then we find d in the same way. We also know $n - 2d - m$, since this is the length of the string without the prefix-free description of m (and we know where this prefix-free description ends). Since d and m are also known, we reconstruct n. Knowing n, we find the prefix-free description of $\tilde{C}$ given n, due to the prefix-free property.¹

¹ It is important that at this moment we already know n. Indeed, the prefix-free property is valid only for a fixed condition (cf. the discussion in [1]).

Moreover, we can now find the array $\tilde{C}$, and we know the index of z in this array (it follows the prefix-free description of $\tilde{C}$ given n), so we finally determine z. Since $C(z) \ge n - 2d - O(1)$, we get $$K(m) + n - 2d - m \ge n - 2d - O(1),$$ i.e., $m - K(m) \le O(1)$, and this implies $m = O(1)$, since $K(m) \le 2\log m + O(1) \le m/2 + O(1)$. Knowing that $m = O(1)$, we get $$K(d) + K(\tilde{C} \mid n) + (n - e + 1) \ge n - 2d - O(1),$$ i.e., the required inequality. □

Looking at the proof, we can extract a bit more. We used d only to find n when $n - 2 d$ is known (note that $K (m)$ and the corresponding prefix are determined at that moment). Therefore, one could replace $K (d)$ by $K (n | n - 2 d)$ and get a bit stronger result that will be used in the next section:
Corollary 1.
Under the conditions of Theorem 1, we have $$K(\tilde{C} \mid n) \ge e - 2d - K(n \mid n - 2d) - O(1).$$

Note also that $K (n | n - 2 d)$ can be replaced by $K (d | n - 2 d)$ : when $n - 2 d$ is given, n can be transformed to d and vice versa. We get simpler (though weaker) statements if we remove the condition, i.e., replace $K (n | n - 2 d)$ by $K (n)$ or $K (d)$ .

The following variation of our result extends it to the case when the approximation is partial, i.e., when we allow $\tilde{C}(x)$ to be undefined for some x. To avoid defining the complexity of partial objects, we consider the prefix conditional complexity of a program that computes the partial function $\tilde{C}(x)$.
Theorem 3.
Consider some n and some program $c$ that, given x, terminates and computes some number that differs from $C(x)$ by less than d, for all strings x of length n except for a $2^{-e}$-fraction of them, where $e \le n$. (For those exceptional inputs the program $c$ may never terminate or may give an arbitrary output.) Then $$K(c \mid n) \ge e - 2d - K(n \mid n - 2d) - O(1).$$
Proof.
The proof goes as before up to the point where we reconstruct $c$ (instead of the array $\tilde{C}$). Now we cannot reorder the array since it has gaps (points where $c$ does not terminate), and we do not know exactly where these gaps are. However, at that point we can reconstruct e, since we have isolated the last piece of length $n - e + 1$ and know n at the same time. Therefore, we may run $c$ in parallel on all inputs of length n until the fraction of computations that have not yet terminated becomes $2^{-e}$ or less. This will happen at some point, since the fraction of strings where the computation $c(x)$ never terminates or gives a bad value is at most $2^{-e}$. Then we fill the gaps in the array arbitrarily. In this way we get at most a $2 \cdot 2^{-e}$-fraction of exceptional points where the filled array deviates from the complexity function by d or more (at most $2^{-e}$ from the original bad values and at most $2^{-e}$ from the filled gaps). After that we may apply the same argument; the difference between $2^{-e}$ and $2 \cdot 2^{-e}$ is absorbed by the $O(1)$ term in the statement. (Technically, we should also replace $n - e + 1$ by $n - e + 2$ and change the procedure for reconstructing e accordingly.) □
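The waiting step in this proof can be illustrated by a Python sketch. We model the program $c$ by a hypothetical function `run(x, T)` that returns the output of $c$ on x if the computation halts within T steps and `None` otherwise; rerunning every input with budgets $T = 1, 2, \dots$ is a simple (if wasteful) way to run all computations in parallel. The toy program `toy_run` is made up for the example.

```python
from typing import Callable, Optional

Run = Callable[[str, int], Optional[int]]


def wait_and_fill(run: Run, n: int, e: int, max_steps: int = 10**6):
    """Run c on all n-bit inputs with growing step budgets T until at most a
    2^-e fraction is still running; fill those gaps with 0 (any value works).

    Returns the budget T at which we stopped and the completed table.
    The max_steps cap only protects the toy example from looping forever.
    """
    inputs = [format(i, f"0{n}b") for i in range(2 ** n)]
    for T in range(1, max_steps + 1):
        values = {x: run(x, T) for x in inputs}
        running = sum(v is None for v in values.values())
        if running * 2 ** e <= 2 ** n:  # fraction still running <= 2^-e
            return T, {x: 0 if v is None else v for x, v in values.items()}
    raise RuntimeError("fraction of running inputs never dropped to 2^-e")


def toy_run(x: str, T: int) -> Optional[int]:
    """Hypothetical program: diverges on inputs starting with 111,
    otherwise outputs len(x) after int(x, 2) + 1 steps."""
    if x.startswith("111"):
        return None
    return len(x) if T >= int(x, 2) + 1 else None


T, table = wait_and_fill(toy_run, n=4, e=3)
```

Here the two diverging inputs 1110 and 1111 form exactly a $2^{-3}$-fraction of the 16 inputs, so the search stops at $T = 14$, as soon as 1101 (the slowest halting input) has finished.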
Remark 3.
There is an alternative argument that may look a bit easier in the case of a partial table computed by some program. Instead of reordering the array and specifying an index in the reordered array, we may let m be the number of strings x of length n such that $c(x)$ is defined and $C(x) < c(x) - d$. By assumption, this number does not exceed $2^{n-e}$, since all those x are exceptional points. Then we consider an $(n-e)$-bit string that represents m in binary, and use it in our description. Knowing this string (and n), we can find the complex string z as follows: we compute the values of $c(x)$ and the approximations from above to $C(x)$ in parallel for all strings x of length n, until we discover all m strings x such that $C(x) < c(x) - d$. For all other strings x where $c(x)$ is defined, the complexity of x is at least $c(x) - d$, so it is enough to select, among these remaining strings, the first encountered string z such that $c(z)$ is defined and is at least $n - d - 1$. Such a string z exists since $C(x) \ge n - 1$ for more than half of all strings x of length n, and both the set $\{x : c(x) \text{ is undefined}\}$ and the set $\{x : c(x) < C(x) - d\}$ are small minorities among the strings of length n (at most 25%, since $e \ge 2$).

A better and simpler bound can be obtained for the length-conditional complexity, i.e., for the function $C(x \mid |x|)$.

Theorem 4.
Consider some n and a finite function $\tilde{C} : \{0,1\}^n \to \mathbb{N}$ (i.e., an array of $2^n$ natural numbers) that differs from the complexity function $C(x \mid n)$ by less than d for all strings x of length n, except for a $2^{-e}$-fraction of them, where $e \le n$. Then $$K(\tilde{C} \mid n) \ge e - 2d - O(1).$$
Proof.
Using the same argument as in the proof of Theorem 1, we can find a string z among the first $2^{n-e+1}$ strings in the reordered array that has conditional complexity $C(z \mid n) \ge n - 2d - O(1)$. On the other hand, when n is given, this string can be described by the prefix-free encoding of $\tilde{C}$ given n, concatenated with the $(n-e+1)$-bit string that represents the index of z in the reordered array. Therefore, $$K(\tilde{C} \mid n) + n - e + 1 \ge n - 2d - O(1),$$ and we get the desired inequality. □

3. Noncomputability of approximations

Now we consider a uniform setting where some computable function is given and we study how well it approximates the complexity function for strings of all lengths. There are several ways to ask this question.

3.1. Computable bounds

Let us start with a version where parameters e and d are computable functions of n (the length of the string whose complexity is approximated).

Theorem 5.
Let $c (x)$ be a total computable function on bit strings with natural values, and let $d (n)$ and $e (n)$ be total computable functions on natural numbers with natural values, and $e (n) ⩽ n$ for all n. Assume that for every n the fraction of n-bit strings x such that $| c (x) - C (x) | > d (n)$ is at most $2^{- e (n)}$ . Then $e (n) - 2 d (n)$ is bounded from above by some constant.
Proof.
Assume this is not the case, and $e (n) - 2 d (n)$ has no constant upper bound. Then for every m one can computably find some n such that $e (n) - 2 d (n) > m$ (recall that e and d are computable functions). Let us denote such n (say, the first n found) by $n_{m}$ ; then $C (n_{m}) ⩽ C (m) + O (1) = O (log m)$ .

On the other hand, for each n we may consider the array $\tilde{C}_n$ formed by the values of c on all strings of length n, and apply Corollary 1 to it. Since $\tilde{C}_n$ can be computed given n, we have $K(\tilde{C}_n \mid n) = O(1)$, and $$O(1) \ge e(n) - 2d(n) - K(n \mid n - 2d(n)) - O(1)$$ for all n. The inequality remains true if we delete the condition $n - 2d(n)$, so $$e(n) - 2d(n) \le K(n) + O(1).$$ Now we let $n = n_m$ and get a contradiction: the left-hand side exceeds m by construction, while the right-hand side is $O(\log m)$. □
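The only computational ingredient of this proof is the map $m \mapsto n_m$. A Python sketch, assuming e and d are given as ordinary total functions; the concrete choices in the example are hypothetical and were picked only so that $e(n) - 2d(n)$ is unbounded and $e(n) \le n$.

```python
from typing import Callable


def first_n_exceeding(m: int, e: Callable[[int], int], d: Callable[[int], int]) -> int:
    """Return the first n with e(n) - 2 d(n) > m.

    This search halts for every m exactly when e(n) - 2 d(n) is unbounded,
    which is the assumption made (for contradiction) in the proof. Since the
    search is a fixed program applied to m, C(n_m) <= C(m) + O(1).
    """
    n = 0
    while e(n) - 2 * d(n) <= m:
        n += 1
    return n


# Hypothetical parameters: e(n) = n and d(n) = floor(n / 4),
# so e(n) - 2 d(n) grows roughly like n / 2.
n_10 = first_n_exceeding(10, e=lambda n: n, d=lambda n: n // 4)  # 19
```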
Remark 4.
The bound provided by this theorem is rather tight. Indeed, for every computable function $d (n)$ that is less than $n / 2$ , we may consider the function $c (x) = | x | - d (| x |)$ . Then the approximation fails for an n-bit string x only if $C (x) ⩽ n - 2 d (n)$ , and the fraction of those strings (among all n-bit strings) is at most $2^{- 2 d (n)}$ .

This result can be applied to resource-bounded versions of complexity. Let $t(n)$ be some total computable function. Then for every string x we may consider its time-bounded complexity $C^{t(|x|)}(x)$, defined as the minimal length of a program that computes x in time at most $t(|x|)$. This time-bounded complexity is a total computable function of x² and is an upper bound for $C(x)$. Therefore, recalling Remark 1, we can apply Theorem 5. It shows that in terms of approximation this upper bound is almost as bad as the string length (where the one-sided approximation error is at most $d(n)$ for all strings x of length n, except for a $2^{-d(n)}$-fraction).

² We assume that $t(n)$ is large enough to allow at least some program to give output x using at most $t(|x|)$ steps.

3.2. Generic and coarse non-computability

There are two well-known notions of approximate computability defined in [7].

Definition 5.
Let $S \subseteq \{0,1\}^*$. The density of S in length n is defined as the fraction of n-bit strings that belong to S, i.e., $\rho_n(S) = |S \cap \{0,1\}^n| / 2^n$. If the sequence $\rho_n(S)$ has a limit $\rho(S)$, then it is called the asymptotic density³ of S. If $\rho(S) = 1$, we call S a generic set. If $\rho(S) = 0$, we call S a negligible set.

³ There is a subtle point in this definition. When defining the density for subsets of $\mathbb{N}$, we usually do not group natural numbers according to their lengths. Instead, we consider all initial segments and require that the fraction of S-elements in these segments has a limit. It is easy to see that this is a stronger condition in general, but for generic or negligible sets there is no difference.

A total function $h : \{0,1\}^* \to \mathbb{N}$ is called generically computable if there exists a partial computable function $f : \{0,1\}^* \to \mathbb{N}$ such that h extends f (if $f(x)$ is defined, then $f(x) = h(x)$), and the domain of f is a generic set.

A total function $h : \{0,1\}^* \to \mathbb{N}$ is called coarsely computable if there exists a total computable function $f : \{0,1\}^* \to \mathbb{N}$ such that the set $\{x \mid f(x) = h(x)\}$ is a generic set.
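The quantity $\rho_n(S)$ can be computed by brute force for any decidable S. The Python sketch below does this for three hypothetical example sets (they are ours, not from the text): one generic, one negligible, and one without an asymptotic density. Of course, evaluating finitely many $\rho_n$ only illustrates the behaviour; generic and negligible are statements about the limit.

```python
from fractions import Fraction
from typing import Callable


def density(S: Callable[[str], bool], n: int) -> Fraction:
    """rho_n(S) = |S ∩ {0,1}^n| / 2^n for a decidable set S given as a predicate."""
    count = sum(1 for i in range(2 ** n) if S(format(i, f"0{n}b")))
    return Fraction(count, 2 ** n)


def has_a_one(x: str) -> bool:
    """Density 1 - 2^-n, which tends to 1: a generic set."""
    return "1" in x


def starts_with_zero_half(x: str) -> bool:
    """First ceil(n/2) bits are 0; density 2^-ceil(n/2) tends to 0: negligible."""
    return x.startswith("0" * ((len(x) + 1) // 2))


def even_length(x: str) -> bool:
    """Density alternates between 0 and 1, so there is no limit."""
    return len(x) % 2 == 0


print([density(has_a_one, n) for n in (1, 2, 3)])  # [Fraction(1, 2), Fraction(3, 4), Fraction(7, 8)]
```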

It is natural to ask whether the complexity function is generically or coarsely computable. For generic computability the negative answer is obvious: any partial computable function that is a restriction of the complexity function and has an infinite domain (a generic set is infinite) would allow us to find effectively, for every m, a string of complexity at least m, and this is impossible since this string is determined by m, that is, by $O(\log m)$ bits. For coarse computability the answer is less obvious, but it is also negative and can be derived from the bounds proven in Section 2.
Theorem 6.
The complexity function $C(x)$ is not coarsely computable. Moreover, for every constant d there is no coarsely computable function c such that $|C(x) - c(x)| < d$ for all x.
Proof.
Assume that such a function c exists. By definition of coarse computability there exists a total computable function $c^{'}$ that coincides with c on a generic set. Fix some n and consider an array $\tilde{C}$ formed by the values of $c^{'}$ on all n-bit strings. Then this array approximates complexity on n-bit strings with precision d, except for a negligible fraction of n-bit strings. We may assume that this fraction is at most $2^{- e (n)}$ where $e (n) \to \infty$ as $n \to \infty$ . (Here we do not assume that $e (n)$ is a computable function of n.)

Now we apply Corollary 1 and note that $K (\tilde{C} | n) = O (1)$ (since $c^{'}$ is computable), and $K (n | n - 2 d) = O (1)$ (since d is a constant). It gives $e (n) - 2 d ⩽ O (1)$ , a contradiction. □
Remark 5.
If d is no longer a constant but a computable function of n such that $d(n) \to \infty$, then the function $|x| - d(|x|)$ is an approximation for the complexity function that has error at most $d(n) + O(1)$ for all n-bit strings except for a $2^{-2d(n)}$-fraction. Replacing the values of this function on all exceptional strings by the true complexity values, we get a total function that $d(n)$-approximates the complexity function everywhere and is coarsely computable. If we want a coarsely computable approximation that is an upper bound for complexity, we can start with the length function plus $O(1)$ and replace its values by true complexities for inputs where the approximation error exceeds $d(n)$. This function will also be coarsely computable: it coincides with the length function (plus a constant) except for an $O(2^{-d(n)})$-fraction of n-bit strings, and this fraction tends to 0.

A more general statement gives a lower bound for the number of n-bit strings where some partial computable function c fails to produce a d-approximation for complexity, for any value of d.
Theorem 7.
Let c be a partial computable function. For every two integers n and d consider the fraction of n-bit strings x where $c (x)$ is undefined or deviates from $C (x)$ by more than d. This fraction is at least $Ω (2^{- 2 d - K (d)})$ , where the hidden constant in this notation does not depend on n and d. Here we assume that $2 d + K (d) < n - O (1)$ .
Proof.
We apply Theorem 3 to the program $c$ that computes the partial function c. As we have discussed, we may replace $K(n \mid n - 2d)$ by $K(d)$. The left-hand side of the inequality $$K(c \mid n) \ge e - 2d - K(n \mid n - 2d) - O(1)$$ is now $O(1)$, since we consider a fixed $c$; therefore $$e - 2d - K(d) \le O(1)$$ if the error is at most d except for a $2^{-e}$-fraction of n-bit strings and $e \le n$. By contraposition, if $e > 2d + K(d) + O(1)$ and still $e \le n$, then the fraction of bad strings (among all n-bit strings) is at least $2^{-e}$. This is what we need (and the condition $2d + K(d) < n - O(1)$ is needed to ensure that $e \le n$). □

The term $K(d)$ in the exponent can be removed in some cases. Let us say that a computable integer-valued function $d(n)$ of an integer argument is regular if the function $u(n) = n - 2d(n)$ is non-negative and has the following property: there exists some constant $c > 0$ such that the value of $u(n)$ increases whenever n increases by c or more. For example, this is guaranteed if d is a computable function that satisfies the Lipschitz condition with a constant less than $1/2$ for large enough n, e.g., $d(n) = n/3$, or $d(n) = \log n$, or $d(n) = \log\log n$, etc.

In this case $K(n \mid n - 2d(n))$ is bounded by a constant not depending on n, since the value of $n - 2d(n)$ determines the value of n up to an additive constant. Applying the same reasoning, we get the following result:

Theorem 8.
Let $d (n)$ be some computable regular function, and let $c (x)$ be some partial computable function on strings with integer values. Then the fraction of strings x of length n such that $c (x)$ is defined and $| C (x) - c (x) | < d (n)$ , among all n-bit strings, is at most $1 - Ω (2^{- 2 d (n)})$ .
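Why regularity makes $K(n \mid n - 2d(n))$ bounded can be seen numerically: if $u(n + k) > u(n)$ for all $k \ge c$, then any two n with the same value of $u(n)$ differ by less than c, so a given value of u has at most c preimages and an index of $O(1)$ bits selects n. The Python sketch below is a finite check only (a `True` answer is evidence, not a proof); the integer versions $\lfloor n/3 \rfloor$ and $\lfloor \log_2 n \rfloor$ of the example functions, and the constants 5 and 8, are our own choices.

```python
from typing import Callable

IntFn = Callable[[int], int]


def u(d: IntFn, n: int) -> int:
    """u(n) = n - 2 d(n)."""
    return n - 2 * d(n)


def looks_regular(d: IntFn, c: int, n_max: int, k_max: int) -> bool:
    """Finite check of regularity with constant c: u(n) >= 0 and
    u(n + k) > u(n) for all 1 <= n <= n_max and c <= k <= k_max."""
    for n in range(1, n_max + 1):
        if u(d, n) < 0:
            return False
        if any(u(d, n + k) <= u(d, n) for k in range(c, k_max + 1)):
            return False
    return True


def preimages(d: IntFn, value: int, n_max: int) -> list[int]:
    """All n in [1, n_max] with n - 2 d(n) == value."""
    return [n for n in range(1, n_max + 1) if u(d, n) == value]


def d_third(n: int) -> int:
    """d(n) = floor(n / 3)."""
    return n // 3


def d_log(n: int) -> int:
    """d(n) = floor(log2 n), defined for n >= 1."""
    return n.bit_length() - 1


print(preimages(d_third, 10, 300))  # [26, 28, 30]
```

For $d(n) = \lfloor n/3 \rfloor$ the constant $c = 5$ works while $c = 4$ does not (for instance, $u(2) = u(6) = 2$), and every value of u has at most three preimages.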

4. Approximation as a mass problem

In this section we consider approximation as a mass problem in the sense of Medvedev (see, e.g., [12] for details).

Definition 6.
A mass problem $M$ is an arbitrary set of total functions. Elements of $M$ are called solutions. A mass problem $A$ is reducible to another mass problem $B$ if there is an oracle machine M such that for every total function $b \in B$ the machine M with oracle b computes some solution $a \in A$.
Remark 6.
Usually total functions in this definition are understood as functions of type $N \to N$ . But this does not matter due to computable encodings, and we will use functions with string arguments and natural values. One should also note that usually the reducibility in the sense of this definition is called strong reducibility, or Medvedev reducibility (as opposed to weak, or Muchnik reducibility where the machine M may depend on the choice of $b \in B$ ). However, we do not need other notions of reducibility for mass problems, so we omit the word “strong”.

For every set A we may consider a decision problem for A, i.e., the mass problem that is a singleton formed by the characteristic function of A. In this way the semilattice of Turing degrees is embedded in the semilattice of mass problems (ordered by reducibility). In particular, we may consider the halting problem as a mass problem (the decision problem for the set of halting programs).

Let $d(n)$ and $e(n)$ be two computable functions with natural arguments and values. Consider the mass problem $C_{d,e}$ whose solutions are total functions c (with string arguments and natural values) such that for every n $$|c(x) - C(x)| < d(n) \text{ for all $n$-bit strings } x, \text{ except for a } 2^{-e(n)}\text{-fraction}.$$ The following result is a strong form of Theorem 5.

Theorem 9.
Let $d (n)$ and $e (n)$ be computable functions such that $e (n) ⩽ n$ for all n and $e (n) - 2 d (n)$ is unbounded. Then the halting problem is reducible to the mass problem $C_{d, e}$ .
Proof.
Consider a function $c$ that is a solution of $C_{d,e}$. How can we solve the halting problem if $c$ is given as an oracle? Here is a sketch.

Fix some $n$. For every number $T$ consider the values $C^{T}(x)$ for all strings of length $n$, where $C^{T}(x)$ is the time-bounded complexity (we consider only programs that terminate within $T$ steps). As $T$ increases, the value $C^{T}(x)$ decreases (or stays the same) and converges to $C(x)$ as $T \to \infty$. Therefore, for large $T$ we have
$$C^{T}(x) ⩽ c(x) + d(n) \text{ for all } n\text{-bit strings } x, \text{ except for a } 2^{-e(n)}\text{-fraction}. \qquad (*)$$
If $c$ is given as an oracle, we can (for the given value of $n$) find some $T$ satisfying condition (*) just by trying $T = 1, 2, 3, \ldots$ until this condition becomes true. This will eventually happen, since (*) is true in the limit, i.e., for $C(x)$ in place of $C^{T}(x)$. Note that if (*) is true for some $T$, it remains true for all larger values of $T$. We claim that if (*) is true, then $T$ is so large that all terminating programs of length at most $e(n) - 2d(n)$ (approximately) terminate within time $T$. Since $e(n) - 2d(n)$ is unbounded, we can use this to find out whether an arbitrary program stops or not. To prove this claim, we use the characterization of busy beaver numbers in terms of Kolmogorov complexity: if $C(t) ⩾ u$ for all $t ⩾ T$, then all programs of size $u - O(1)$ either terminate within time $T$ or do not terminate at all. So we have to show that all $t ⩾ T$ have large complexity if (*) is true for $T$. To show this, we notice that for $t ⩾ T$ the table of values $C^{t}(x)$ approximates the complexities of $n$-bit strings, and therefore Theorem 1 guarantees the high complexity of this table (and therefore the high complexity of $t$).

Let us provide more details. Recall that we used some fixed optimal decompressor U in the definition of Kolmogorov complexity. Fix some computational model (say, Turing machines) and some machine that computes U. Now, consider the function $C_{U}^{T} (x | y)$ defined as the minimal length of a string p such that $U (p, y)$ produces x in at most T steps. This function has three arguments (strings x, y and integer T). It may decrease (or stay the same) as T increases, and converges to $C_{U} (x | y)$ . We will consider only the case of unconditional complexity ( $y = ε$ ) and use the notation $C^{T} (x) = C_{U}^{T} (x | ε)$ .
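To make the definition of $C^{T}$ concrete, here is a minimal Python sketch that computes time-bounded complexity by brute force for a toy decompressor. The toy machine `toy_run_for` and the function names are hypothetical illustrations, not the optimal decompressor $U$ of the text: a program `0s` prints `s` in `len(p)` steps, a program `1b` prints $k$ = `int(b, 2)` zeros in $2k + 1$ steps, and the empty program never halts. The sketch only shows that $C^{T}(x)$ is computable from $x$ and $T$ and does not increase as $T$ grows.

```python
from itertools import product
from typing import Callable, Optional


def toy_run_for(p: str, T: int) -> Optional[str]:
    """Hypothetical toy decompressor: output of p if it halts within T steps, else None."""
    if p == "":
        return None                      # the empty program never halts
    if p[0] == "0":                      # literal mode: print the rest of p
        out, steps = p[1:], len(p)
    else:                                # repetition mode: print k zeros, k given in binary
        k = int(p[1:] or "0", 2)
        out, steps = "0" * k, 2 * k + 1
    return out if steps <= T else None


def bounded_complexity(x: str, T: int,
                       run_for: Callable[[str, int], Optional[str]] = toy_run_for,
                       max_len: Optional[int] = None) -> Optional[int]:
    """C^T(x): length of a shortest program printing x within T steps.

    Programs are tried in order of increasing length, so the first hit is minimal.
    Returns None if no program of length <= max_len works.
    """
    if max_len is None:
        max_len = len(x) + 1             # long enough for the literal program "0" + x
    for length in range(max_len + 1):
        for bits in product("01", repeat=length):
            if run_for("".join(bits), T) == x:
                return length
    return None


# C^T(0^8) as T grows: no program yet, then the literal one, then the short one.
for T in (8, 9, 17, 100):
    print(T, bounded_complexity("0" * 8, T))
```

For $x = 0^8$ the loop prints `None`, `9`, `5`, `5` for $T = 8, 9, 17, 100$: the short repetition program `11000` needs 17 steps, so it only counts once the time bound allows it.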

We also use the relation between Kolmogorov complexity and busy beaver numbers that goes back to Chaitin (see [11, Section 1.2.2] for details). Denote by $B(n)$ the maximal number that has complexity at most $n$, and let $BB(n)$ be the maximal time used by terminating computations of $U(p, ε)$ for all strings $p$ of length at most $n$. These two quantities are closely related (see [11, Theorem 14, p. 24] for the proof):

Proposition 5.
There exists some $c$ such that
$$BB(n) ⩽ B(n + c) \quad\text{and}\quad B(n) ⩽ BB(n + c)$$
for all $n$.

The standard construction of the optimal decompressor uses the universal Turing machine, so the halting problem is reducible to the domain of U. Proposition 5 shows that this domain is reducible to the mass problem
given m, compute some upper bound for $B (m)$ ,
since this upper bound can be used to find out how long we should wait for the termination of the universal machine on a given input. Note also that the latter mass problem does not depend on the choice of the optimal decompressor, since the complexity functions for different optimal decompressors differ by at most $O(1)$.
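The reduction rests on one observation: once a number $T ⩾ BB(n)$ is known, halting of programs of length at most $n$ is decided by running them for $T$ steps. The Python sketch below illustrates only this last step. As an assumption, programs are modelled by Python generators that yield once per computation step, and the bound `bb_bound` is supplied by the caller and assumed to be at least $BB(n)$ (by Proposition 5, an upper bound for $B(n + c)$ suffices). The helper names are hypothetical.

```python
from typing import Callable, Iterator


def halts_within(program: Callable[[], Iterator[None]], T: int) -> bool:
    """True iff the program (a generator yielding once per step) stops after at most T steps."""
    run = program()
    for _ in range(T + 1):               # T steps, plus one more call to observe termination
        try:
            next(run)
        except StopIteration:
            return True
    return False


def decide_halting(program: Callable[[], Iterator[None]], bb_bound: int) -> bool:
    """Correct whenever bb_bound >= BB(n) for a program of length <= n:
    a program that has not stopped within BB(n) steps never stops."""
    return halts_within(program, bb_bound)


def stops_after_five() -> Iterator[None]:
    for _ in range(5):
        yield


def loops_forever() -> Iterator[None]:
    while True:
        yield
```

All the difficulty lies in obtaining `bb_bound`; in the text it is extracted from the oracle.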

Therefore, it remains to show how this mass problem can be reduced to the mass problem $C_{d, e}$ . We will now describe the corresponding oracle machine (following the scheme explained above).

Assume that some function $c \in C_{d, e}$ is given as an oracle, and for some m we want to find an upper bound for $B (m)$ , i.e., we want to find some number T such that $C (t) > m$ for all $t > T$ . First, we find some n such that $e (n) - 2 d (n)$ significantly exceeds m; it is enough to have $e (n) - 2 d (n) ⩾ m + k log m$ for some constant k (to be chosen later). Functions e and d are computable, so we may assume that this n is a computable function of m. We denote it by $m \mapsto n_{m}$ to stress that n is a computable function of m. Note that $C (n_{m}) ⩽ C (m) + O (1) ⩽ log m + O (1)$ .
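The choice of $n_m$ is a plain unbounded search. In the minimal Python sketch below, `e`, `d` and `k` are supplied by the caller; the example functions $e(n) = n$ and $d(n) = \lfloor \log_2 n \rfloor$ are hypothetical choices satisfying the hypotheses of Theorem 9 ($e(n) ⩽ n$, and $e(n) - 2d(n)$ is unbounded). The search terminates precisely because $e(n) - 2d(n)$ is unbounded; we assume $m ⩾ 1$ so that $\log m$ is defined.

```python
import math
from itertools import count
from typing import Callable


def choose_n(m: int, e: Callable[[int], int], d: Callable[[int], int], k: int) -> int:
    """Least n >= 1 with e(n) - 2*d(n) >= m + k*log2(m); requires m >= 1."""
    target = m + k * math.log2(m)
    for n in count(1):                   # unbounded search; halts since e - 2d is unbounded
        if e(n) - 2 * d(n) >= target:
            return n


def example_e(n: int) -> int:
    return n                             # hypothetical e(n) = n


def example_d(n: int) -> int:
    return n.bit_length() - 1            # hypothetical d(n) = floor(log2 n) for n >= 1
```

With these choices, `choose_n(8, example_e, example_d, k=2)` returns 22: the target is $8 + 2 \cdot 3 = 14$, and $n = 22$ is the first $n$ with $n - 2\lfloor \log_2 n \rfloor ⩾ 14$.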

Now we get from the oracle $c$ the array of length $2^{n}$ that consists of the values $c(x)$ for all strings $x$ of length $n$. Then we look for a number $T$ such that
$$C^{T}(x) ⩽ c(x) + d(n) \text{ with probability at least } 1 - 2^{-e(n)}, \qquad (*)$$
where the probability is taken over the uniform distribution on $n$-bit strings (we reformulate the condition in the language of probability and keep the notation). There is a number $T$ with this property, since we assume that $c$ belongs to $C_{d,e}$, and for sufficiently large values of $T$ we have $C^{T}(x) = C(x)$ for all $n$-bit strings $x$. Knowing the $c$-values on $n$-bit strings, we can find such a number $T$, since $C^{T}(x)$ is a computable function of $x$ and $T$. It remains to show that $T$ is an upper bound for $B(m)$, i.e., to prove the following lemma.

Lemma 1.
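The search for $T$ can be sketched as follows. In this minimal Python sketch the oracle values $c$, the parameters $d = d(n)$ and $e = e(n)$, and a function `bounded(x, T)` standing in for $C^{T}(x)$ are all passed in as arguments (hypothetical names); `max_T` is only a safety cap for experiments, whereas the actual oracle machine searches without a cap. Condition (*) is checked with exact integer arithmetic: at most a $2^{-e}$-fraction of failures means `failures * 2**e <= len(strings)`.

```python
from itertools import product
from typing import Callable, List, Optional


def find_time_bound(strings: List[str],
                    c: Callable[[str], int],
                    bounded: Callable[[str, int], int],
                    d: int, e: int, max_T: int) -> Optional[int]:
    """Least T <= max_T such that bounded(x, T) <= c(x) + d
    for all but a 2**-e fraction of the given strings."""
    for T in range(max_T + 1):
        failures = sum(1 for x in strings if bounded(x, T) > c(x) + d)
        if failures * 2 ** e <= len(strings):
            return T
    return None


# Hypothetical model: every 3-bit string has complexity 2, but its short
# program is only found after 10 * int(x, 2) steps; before that the bound is 4.
def model_bounded(x: str, T: int) -> int:
    return 2 if T >= 10 * int(x, 2) else 4


strings_3 = ["".join(b) for b in product("01", repeat=3)]
```

In this model, `find_time_bound(strings_3, lambda x: 2, model_bounded, d=1, e=2, max_T=1000)` returns 50: at $T = 50$ only `110` and `111` still violate the bound, which is the allowed quarter of the eight strings.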
If $t > T$, then $C(t) > m$.
Proof of Lemma 1.
The condition (*) remains true when $T$ increases, so
$$C^{t}(x) ⩽ c(x) + d(n) \text{ with probability at least } 1 - 2^{-e(n)}.$$
On the other hand, the oracle function $c$ belongs to $C_{d,e}$, therefore
$$c(x) - d(n) ⩽ C(x) ⩽ c(x) + d(n) \text{ with probability at least } 1 - 2^{-e(n)}$$
(in both cases the probability is taken over the uniform distribution of $x$ among the $n$-bit strings). These inequalities imply that
$$C^{t}(x) ⩽ C(x) + 2d(n) \text{ with probability at least } 1 - 2 \cdot 2^{-e(n)}$$
(if the first two inequalities hold for some $x$, the third is also true, and the two exceptional sets together form at most a $2 \cdot 2^{-e(n)}$-fraction). Recalling that $C(x) ⩽ C^{t}(x)$ for all $x$ and $t$, we see that $C^{t}(x) - d(n)$ approximates $C(x)$ with two-sided error at most $d(n)$ for all $n$-bit strings $x$, except for a $2 \cdot 2^{-e(n)}$-fraction of them. Corollary 1 then guarantees that the prefix complexity of the table of values $C^{t}(x) - d(n)$ (for $n$-bit strings $x$) given $n$ is at least $e(n) - 1 - 2d(n) - K(n) - O(1)$. Recalling that $e(n) - 2d(n)$ is at least $m + k \log m$, and $K(n_m) ⩽ K(m) + O(1) ⩽ O(\log m)$, we see that the prefix complexity of the table (conditional to $n$) is at least $m + k \log m - O(\log m)$. On the other hand, the table can be reconstructed if $n$ and $t$ are known, so $K(t) ⩾ m + k \log m - O(\log m)$. It remains to note that prefix and plain complexity differ only by a logarithmic term ($C(u) ⩾ K(u) - O(\log K(u))$), and that the constant $k$ can be chosen large enough to outweigh both $O(\log m)$ and this additional logarithmic term. □

Lemma 1 is proven, and this finishes the proof of Theorem 9. □

5. Computability and proof theory

There is a connection between Kolmogorov complexity theory and logic. Probably the simplest and most impressive example of this type is Chaitin’s proof [5] of Gödel’s incompleteness theorem: if all true statements were provable, we would be able to effectively find strings of high complexity by searching for proofs of statements “$C(x) > n$” for all strings $x$ and numbers $n$. (There is also a nice proof of the second incompleteness theorem using Kolmogorov complexity, based on the surprise examination paradox [8].) In this section we show some other examples of interplay between Kolmogorov complexity theory and logic (extending the results of [2]).

5.1. Universal complexity statements

Universal statements are statements of the form $\forall x R(x)$ where $R$ is a decidable predicate. We consider arithmetical universal statements, and one should specify how the decidable predicate $R$ is represented as an arithmetical formula. To avoid these details, it is convenient to agree that a universal statement is a statement saying that some program (without input) never terminates. Many mathematical theorems (e.g., Fermat’s Last Theorem) are universal statements; many conjectures (e.g., the Riemann hypothesis) are provably equivalent to universal statements (though this is not immediately obvious, since the Riemann hypothesis speaks about complex numbers and the ζ-function). Universal statements played an important role in Hilbert’s foundational program: if formal arithmetic is consistent, then every provable universal statement is true (a counterexample can easily be transformed into a proof of the negation of the statement).
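For example, Fermat’s Last Theorem is the statement that the following search never terminates. In this Python sketch (an illustration of the convention above; the function name and the `max_total` cap are hypothetical), all tuples $(a, b, c, n)$ with $a, b, c ⩾ 1$ and $n ⩾$ `min_exp` are enumerated in order of increasing sum $a + b + c + n$, so every tuple is eventually reached. With `max_total=None` the loop halts if and only if a counterexample exists; the parameter `min_exp` lets us check the procedure on the false statement obtained for exponent 2.

```python
from itertools import count
from typing import Iterable, Optional, Tuple


def search_counterexample(min_exp: int = 3,
                          max_total: Optional[int] = None) -> Optional[Tuple[int, int, int, int]]:
    """Look for a, b, c >= 1 and n >= min_exp with a**n + b**n == c**n.

    With max_total=None this never returns unless a counterexample exists;
    with a cap it gives up (returns None) after all tuples of sum <= max_total.
    """
    totals: Iterable[int] = (count(min_exp + 3) if max_total is None
                             else range(min_exp + 3, max_total + 1))
    for s in totals:                         # s = a + b + c + n
        for n in range(min_exp, s - 2):      # leaves room for a, b, c >= 1
            for a in range(1, s - n - 1):
                for b in range(1, s - n - a):
                    c = s - n - a - b        # c >= 1 by the loop bounds
                    if a ** n + b ** n == c ** n:
                        return (a, b, c, n)
    return None
```

`search_counterexample(min_exp=2, max_total=20)` returns `(3, 4, 5, 2)`, while `search_counterexample(3, max_total=30)` returns `None`, as Fermat’s Last Theorem predicts.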

Following the idea suggested in [3,4], one may classify universal statements according to their complexity: the complexity of a universal statement U is the minimal length of a program u such that U is provably equivalent (in formal arithmetic) to non-termination of u. As usual, we should consider some optimal programming language that makes the complexity (defined as above) minimal, see [2, Section 3.2, p. 1388] for details. One can also consider the minimal Kolmogorov complexity of a program whose non-termination is provably equivalent to U.

There is a special class of universal statements considered by Chaitin [5]: the statements of the form “ $C (x) ⩾ n$ ”, where x and n are some constants (x is a string and n is an integer). Let us call the statements of this form universal complexity statements. Chaitin noted that there exists some constant c such that all provable universal complexity statements $C (x) ⩾ n$ have $n ⩽ c$ . (The value of c depends on the choice of the optimal decompressor.)
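To see why “$C(x) ⩾ n$” is a universal statement, note that it says that the following dovetailed search never terminates: for $T = 1, 2, \ldots$, run every program shorter than $n$ for $T$ steps and stop as soon as one of them prints $x$. The Python sketch below uses a hypothetical toy decompressor `toy_run_for` instead of the standard one (literal programs `0s` print `s` in `len(p)` steps, programs `1b` print $k$ = `int(b, 2)` zeros in $2k + 1$ steps, and the empty program never halts); `max_T` is a testing cap that the actual search does not have.

```python
from itertools import product
from typing import Callable, Optional


def toy_run_for(p: str, T: int) -> Optional[str]:
    """Hypothetical toy decompressor: output of p if it halts within T steps, else None."""
    if p == "":
        return None                          # the empty program never halts
    if p[0] == "0":
        out, steps = p[1:], len(p)           # literal mode
    else:
        k = int(p[1:] or "0", 2)
        out, steps = "0" * k, 2 * k + 1      # repetition mode
    return out if steps <= T else None


def find_short_program(x: str, n: int,
                       run_for: Callable[[str, int], Optional[str]] = toy_run_for,
                       max_T: Optional[int] = None) -> Optional[str]:
    """Halts (returning a program) iff some program shorter than n prints x,
    i.e., iff the statement C(x) >= n is false. With max_T=None it may run forever."""
    T = 1
    while max_T is None or T <= max_T:
        for length in range(n):              # all programs of length < n
            for bits in product("01", repeat=length):
                p = "".join(bits)
                if run_for(p, T) == x:
                    return p
        T += 1
    return None
```

In the toy model, `find_short_program("0" * 8, 6, max_T=50)` finds the program `11000` at $T = 17$, refuting “$C(x) ⩾ 6$”, whereas `find_short_program("0" * 8, 5, max_T=50)` returns `None`: no program shorter than 5 prints $0^8$, and the uncapped search would run forever.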

The notion of a universal complexity statement depends on the choice of the optimal decompressor (though Chaitin’s result is true for all optimal decompressors). We assume in the sequel that a standard decompressor based on the universal machine is used.

It is shown in [2, Theorem 5, p. 1387] that if we add to formal arithmetic all true universal complexity statements as axioms, then all true universal statements are provable in the resulting theory. This motivates the following question: is it true that every universal statement is provably equivalent to some universal complexity statement? (Speaking about provable equivalence, we always have in mind provability in formal arithmetic.) If it were the case, then the mentioned result from [2] would be a corollary of this conjecture. However, some results from computability theory imply that this conjecture is false.

Theorem 10.
There is a universal statement that is not provably equivalent to any universal complexity statement.
Proof.
This result is a proof-theoretic corollary of the following result in computability theory:

Proposition 6.
The set $U = \{⟨x, n⟩ : C(x) < n\}$ is not m-complete.

This proposition is a special case of [10, Theorem 2.3]; we will provide its proof below for the reader’s convenience, but first let us show that it implies Theorem 10.

Assume that every universal statement is provably equivalent to some universal complexity statement. Let us construct a total computable function that m-reduces the halting problem to $U$. Given a program $p$, we search for a universal complexity statement that is provably equivalent to non-termination of $p$. By assumption, such a statement $C(x_p) ⩾ n_p$ will eventually be found (we enumerate all universal complexity statements and all proofs in parallel). Note that the program $p$ may terminate; in this case we will find some false (and therefore refutable) universal complexity statement; still, the mapping $p \mapsto ⟨x_p, n_p⟩$ is a total computable mapping. It m-reduces the set of terminating programs to $U$: since every provable equivalence is true, $p$ terminates if and only if $C(x_p) < n_p$. So $U$ would be m-complete, contradicting Proposition 6.

Proof of Proposition 6.
Consider two disjoint (computably) enumerable sets $U_{0}$ and $U_{1}$ that are inseparable (cannot be separated by a decidable set). For example, $U_{0}$ / $U_{1}$ can be the sets of programs without input that output 0/1. Assume that $U_{0}$ is reducible to U by some total computable function $p \mapsto ⟨ x_{p}, n_{p} ⟩$ . This means that
if $p \in U_{0}$ , then $C (x_{p}) < n_{p}$ ;

if $p \notin U_{0}$ , then $C (x_{p}) ⩾ n_{p}$ .
In particular, the second case happens for all $p \in U_1$. The values of $n_p$ for $p \in U_1$ are bounded by some constant $d$. Indeed, if they were unbounded, we could effectively obtain strings of arbitrarily high complexity: given $N$, enumerate $U_1$ until some $p$ with $n_p ⩾ N$ appears and output $x_p$; then $C(x_p) ⩾ n_p ⩾ N$, although $x_p$ is computed from $N$, which is impossible for large $N$.

To separate $U_{0}$ and $U_{1}$ , only the information about complexities at most d is needed, and the list of all strings of complexity at most d (with information about complexity) is a finite object, which leads to a contradiction. More precisely, consider the set $U^{'}$ of all pairs $⟨ x, n ⟩$ such that $C (x) < n$ or $n > d$ .
The set $U^{'}$ is decidable (recursive, computable) since it is determined by the (finite) list of all strings of complexity less than d and their complexities: if $n > d$ , then $⟨ x, n ⟩$ is guaranteed to be in $U^{'}$ , and if $n ⩽ d$ , we can check whether $C (x) < n$ using this list.
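The decidability of $U'$ can be made concrete: a decision procedure needs only the finite table of strings of complexity less than $d$ together with their complexities. In the Python sketch below that table is a hypothetical argument `small` (a dictionary from strings to complexities); the proof only uses that such a table exists, not that it can be computed from $d$.

```python
from typing import Callable, Dict


def make_u_prime_decider(small: Dict[str, int], d: int) -> Callable[[str, int], bool]:
    """Decision procedure for U' = {<x, n> : C(x) < n or n > d}, given the finite
    table `small` of all strings with C(x) < d and their complexities."""
    def in_u_prime(x: str, n: int) -> bool:
        if n > d:
            return True
        # n <= d: C(x) < n forces C(x) < d, so x must appear in the table
        return x in small and small[x] < n
    return in_u_prime
```

The separating set in the proof is then the set of $p$ for which `in_u_prime(x_p, n_p)` holds, computed via the total computable map $p \mapsto ⟨x_p, n_p⟩$.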

If $p \in U_{0}$ , then $C (x_{p}) < n_{p}$ , so $⟨ x_{p}, n_{p} ⟩ \in U^{'}$ (note that $U^{'} \supset U$ ).

If $p \in U_{1}$ , then $n_{p} ⩽ d$ , so $⟨ x_{p}, n_{p} ⟩ \in U^{'}$ is equivalent to $C (x_{p}) < n_{p}$ , and this cannot happen for $p \notin U_{0}$ (in particular, for $p \in U_{1}$ ).
Therefore, the set of p such that $⟨ x_{p}, n_{p} ⟩ \in U^{'}$ is a decidable set that separates $U_{0}$ from $U_{1}$ , a contradiction. Proposition 6 is proven. □

This finishes the proof of Theorem 10. □
Question.
We may consider universal conditional complexity statements, i.e., statements of the form $C (x | y) > n$ for some strings x, y and for some integer n. Is it true that for every universal statement there exists a provably equivalent universal conditional complexity statement?

It is easy to see that for some optimal decompressors this is true for trivial reasons. For example, we may take an optimal decompressor $D(p, y)$ that is not defined for empty $p$, and then modify it by letting $D(ε, q) = ε$ if program $q$ halts (the modified decompressor is still optimal). Then the statement $C_D(ε \mid q) > 0$ means that $q$ does not halt. So the real question is whether the statement is true for all optimal decompressors.

It may seem at first glance that the answer is provided by [10, Theorem 2.1]: it is shown there that the set $\{⟨x, y, n⟩ : C(x \mid y) < n\}$ is m-complete. The proof can be expressed in formal arithmetic, so one may think that the reducing function can be used to convert every universal statement (about non-termination of some program) into a statement about conditional complexity. However, looking closely at the proof, we see that it proves (in formal arithmetic) that a computable reduction exists without providing a specific provable reduction.

5.2. Quasi-conservative extensions

In this section we present some other results that connect Kolmogorov complexity and proof theory. They are related to approximations for the Kolmogorov complexity function and extend the remarks made in [2, Section 3.1].

We have considered approximations up to an additive term; one may also consider approximations up to a constant factor. It is easy to see, for example, that there is no computable approximation up to factor 2 (everywhere; now we do not allow exceptions). Indeed, assume that some computable function $c$ is such an approximation, i.e., $C(x)/2 ⩽ c(x) ⩽ 2C(x)$ for all strings $x$. We know that for each $n$ there exists some $n$-bit string $x$ such that $c(x) ⩾ n/2$ (any incompressible string will work). For a given $n$ take the first $n$-bit string with $c(x) ⩾ n/2$; it has true complexity at least $n/4$. But this string is computed from $n$, so its complexity is at most $\log n + O(1)$, which is less than $n/4$ for large $n$, a contradiction. (Of course, factor 2 in this argument can be replaced by an arbitrary constant.)
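The procedure used in this argument is easy to write down; what fails is the assumption about $c$. In the Python sketch below, `c` is any total computable function supplied by the caller (a hypothetical parameter), and we compare $2c(x)$ with $n$ to stay in integer arithmetic. Whatever `c` is, the output is computed from $n$ alone and therefore has complexity at most $\log n + O(1)$, which is why no computable `c` can be a factor-2 approximation.

```python
from itertools import product
from typing import Callable, Optional


def first_string_with_large_estimate(n: int, c: Callable[[str], int]) -> Optional[str]:
    """First n-bit string (in lexicographic order) with c(x) >= n/2, if any."""
    for bits in product("01", repeat=n):
        x = "".join(bits)
        if 2 * c(x) >= n:                    # integer form of c(x) >= n/2
            return x
    return None
```

For instance, with the length function (`c = len`, computable but not a factor-2 approximation) the procedure returns $0^n$, a string of complexity only $O(\log n)$.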

As before, this result can be extended: the mass problem whose solutions are total approximations for $C$ up to factor 2 is above the halting problem in the Medvedev lattice. In other words, we can construct a machine that solves the halting problem using (any) approximation $c$ up to factor 2 as an oracle. Indeed, given some $n$ and having access to $c$, we increase $T$ until the function $C^{T}(x)$ becomes compatible with $c$, i.e., we find $T$ such that $C^{T}(x) ⩽ 2c(x)$ for all $n$-bit strings $x$. (Such $T$ exists since $C^{T}(x)$ decreases as $T$ increases and converges to $C(x) ⩽ 2c(x)$ for every $x$.) We claim that $T$ exceeds $B(n/4 - O(\log n))$. Indeed, knowing some $t ⩾ T$ and $n$, we can find an $n$-bit string $x$ such that $C^{t}(x) ⩾ n$ (such strings exist, since every incompressible string will work and $C^{t}(x) ⩾ C(x)$). For this $x$ we have $n ⩽ C^{t}(x) ⩽ C^{T}(x) ⩽ 2c(x)$, so $c(x) ⩾ n/2$ and $C(x) ⩾ c(x)/2 ⩾ n/4$; hence the complexity of $t$ is at least $n/4 - O(\log n)$ (the last term appears because we need to know not only $t$, but also $n$). Therefore, this $T$ can be used to determine which programs of size at most $n/4 - O(\log n)$ terminate, and $n$ is an arbitrary number.

One can also rephrase this argument in a different way. Consider the following mass problem: to separate incompressible strings x (such that $C (x) ⩾ | x |$ ) and highly compressible strings x (such that $C (x) < | x | / 4$ ). This mass problem can be reduced to the factor-2 approximation problem. Indeed, every approximation up to factor 2 can be used to distinguish between the two cases (we compare the approximation with $| x | / 2$ ). Still this separation mass problem is above the halting problem in the Medvedev lattice. To show this, let us first prove the following lemma.

Lemma 2.

If for some length n and for some T the bounded complexity $C^{T} (x)$ is less than n for all strings x of length n with $C (x) < n / 4$ , then $T ⩾ B (n / 4 - O (log n))$ .

(Recall that $B (m)$ is the maximal number of complexity at most m, and $B (m + O (1))$ steps are enough for every terminating program of size at most m.)

Proof.

We need to show that for every $t ⩾ T$ the complexity $C(t)$ exceeds $n/4 - O(\log n)$. For that we will show that, knowing $n$ and $t$, we may effectively construct a string of complexity at least $n/4$, so the complexity of the pair $⟨n, t⟩$ is at least $n/4 - O(1)$ and, on the other hand, $C(⟨n, t⟩) ⩽ C(t) + O(\log n)$. Assume that $n$ and $t$ are known. Then we look at all strings of length $n$ one by one until we find the first string $y$ such that $C^{t}(y) ⩾ n$. Such a string $y$ exists, since there are incompressible strings $y$ with $C(y) ⩾ n$, and $C^{t}(y) ⩾ C(y)$. Now we use our assumption ($t ⩾ T$ and the property of $T$): since $C^{T}(y) ⩾ C^{t}(y) ⩾ n$, the string $y$ is not among the strings with $C(y) < n/4$, so $C(y) ⩾ n/4$. □

Now, given $n$ and an oracle for the separation problem (a set that contains all strings of complexity less than $n/4$ and no incompressible strings), we can wait until the time-bounded complexity becomes less than $n$ for all $n$-bit strings in the oracle set. Since the oracle set contains no incompressible strings, all its $n$-bit strings have complexity less than $n$, so such a time $T$ can be found. We also know that for all $n$-bit strings in the oracle set (and, therefore, for all $n$-bit strings of complexity less than $n/4$) the bounded complexity $C^{T}(x)$ is less than $n$. The lemma guarantees that $T$ exceeds $B(n/4 - O(\log n))$. So $T$ can be used to solve the halting problem for programs of size at most $n/4 - O(\log n)$. The length $n$ can be arbitrarily large, so we have indeed reduced the halting problem to the separation problem.

As before, this argument can be converted into a logic statement (briefly mentioned in [2, Section 3.1, remark on p. 1388]; here we provide a more detailed proof).

Theorem 11.

For every string x consider its complexity $m = C (x)$ and add the universal complexity statement $C (x) > m / 4$ as an axiom. The resulting theory proves all true universal statements.

Proof.

In this proof we use the fact that Lemma 2 can be stated and proven in formal arithmetic.

Consider some length $n$ and all compressible strings of length $n$ (i.e., strings $x$ with $C(x) < n$). Consider some $T$ that is enough to discover their compressibility, i.e., $C^{T}(x) < n$ for every compressible string $x$ of length $n$. These statements (about time-bounded complexity for all $n$-bit strings $x$) are provable in formal arithmetic, since they are proven by computation. For all incompressible strings of length $n$ we have added axioms saying that their complexity exceeds $n/4$. So by case analysis we may prove in formal arithmetic that for every string $x$ of length $n$ such that $C(x) < n/4$ we have $C^{T}(x) < n$. Combining this with the statement of Lemma 2 (also provable in formal arithmetic), we can prove that $T ⩾ B(n/4 - O(\log n))$. Proposition 5 is also provable in formal arithmetic, so we can prove an upper bound on the maximal computation time for programs of length at most $n/4 - O(\log n)$. Then, since we can check whether a computation terminates within a given number of steps, and prove the answer in formal arithmetic, we can derive (using the added axioms) the non-termination of every non-terminating program of length at most $n/4 - O(\log n)$. The length $n$ can be arbitrary, so the added axioms are enough to prove non-termination of every non-terminating program. □

On the other hand, if we add only one axiom of this type (for some n-bit string x of complexity $m = C (x)$ we consider a theory with one additional axiom $C (x) > m^{'}$ for some $m^{'} ≪ m$ ), we get a “quasi-conservative” extension of formal arithmetic: no new simple provable statements (up to complexity $m - m^{'} - O (log m^{'})$ ) appear.

Theorem 12.

Let x be a string of complexity m. Assume that some formula φ is provable in formal arithmetic with additional axiom $C (x) > m^{'}$ , where $m^{'} < m$ , and $C (φ) < (m - m^{'}) - O (log m^{'})$ . Then φ can be proven in formal arithmetic without additional axioms.

One can speculate about the philosophical meaning of this result as follows. Usually we want to prove formulas that have rather small complexity (and even small length), so an extension of a theory that does not change the class of provable formulas of small complexity can be considered as a “quasi-conservative” extension. So our result shows that a universal complexity statement that has a large gap between the actual complexity and the claimed lower bound generates a quasi-conservative extension.

Proof.

Assume that φ is not provable without this additional axiom. Consider all strings $x$ such that the implication
$$(C(x) > m') \Rightarrow φ$$
is provable (including all strings $x$ such that $C(x) ⩽ m'$: for them the assumption is provably false).

Note that there are fewer than $2^{m'+1}$ strings $x$ with this property. Indeed, if we have strings $x_1, \dots, x_M$ with this property and $M ⩾ 2^{m'+1}$, we can prove that
$$(C(x_1) > m') \lor \dots \lor (C(x_M) > m')$$
(since there are at most $1 + 2 + \dots + 2^{m'} < M$ programs of length at most $m'$). Then case analysis allows us to prove φ without the additional axiom.
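The counting step can be written out explicitly: the number of programs of length at most $m'$ is a geometric sum, so by the pigeonhole principle some $x_j$ has no such program.

$$\#\{p : |p| ⩽ m'\} = \sum_{i=0}^{m'} 2^{i} = 2^{m'+1} - 1 < 2^{m'+1} ⩽ M,$$

so at least one of the strings $x_1, \dots, x_M$ is not the output of any program of length at most $m'$, i.e., $C(x_j) > m'$ for some $j$. Since checking this is a finite computation over the programs of length at most $m'$, formal arithmetic proves the disjunction.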

We can enumerate the strings that have this property (the implication is provable); for that we need to know only φ and $m'$. Therefore, every string $x$ with this property can be specified by φ, $m'$ and its ordinal number in the enumeration, which can be represented as an $(m'+1)$-bit string. So we may concatenate the self-delimiting description of $m'$, the ordinal number (an $(m'+1)$-bit string) and a shortest description of φ to describe $x$, and this gives us
$$C(x) ⩽ O(\log m') + m' + C(φ).$$
Our string $x$ of complexity $m$ is among the strings with this property (since φ is provable with the additional axiom $C(x) > m'$, the implication is provable in formal arithmetic), so
$$m ⩽ O(\log m') + m' + C(φ),$$
which contradicts the assumption $C(φ) < (m - m') - O(\log m')$ (for a suitable choice of the constant in the $O$-notation). □

Question.

Assume that we add several (say, two) universal complexity statements with large gaps between the actual complexity and the claim. Can we then prove some simple statement that is not provable without them?


Acknowledgements

The authors are grateful to the participants of Kolmogorov seminar and to their colleagues at LIRMM for useful discussions. We also thank the anonymous reviewers for their comments and the editors of Computability for their kind support.

The first author was supported by RaCAF ANR-15-CE40-0016-01 grant. The third author was supported by ANR-15-CE40-0016-01 RACAF and ANR-21-CE48-0023 FLITTLA grants.

References

1. Bauwens, Information distance revisited, in: STACS 2020, pp. 46:1–46:14, https://drops.dagstuhl.de/opus/volltexte/2020/11907/pdf/LIPIcs-STACS-2020-46.pdf, see also https://arxiv.org/abs/1807.11087.

2. Bienvenu, Romashchenko, Shen, Taveneaux and Vermeeren, The axiomatic power of Kolmogorov complexity, Annals of Pure and Applied Logic 165(9) (2014), 1380–1402. doi:10.1016/j.apal.2014.04.009.

3. C.S. Calude and Calude, Evaluating the complexity of mathematical problems: Part 1, Complex Systems 18(3) (2009), 267–285. doi:10.25088/ComplexSystems.18.3.267.

4. C.S. Calude and Calude, Evaluating the complexity of mathematical problems: Part 2, Complex Systems 18(4) (2010), 387–401, https://wpmedia.wolfram.com/uploads/sites/13/2019/03/18-4-1.pdf. doi:10.25088/ComplexSystems.18.4.387.

5. G.J. Chaitin, Computational complexity and Gödel’s incompleteness theorem, SIGACT News 9 (1971), 11–12. doi:10.1145/1247066.1247068.

6. Ishkuvatov and Musatov, On approximate uncomputability of the Kolmogorov complexity function, in: Computing with Foresight and Industry – 15th Conference on Computability in Europe, CiE 2019, Durham, UK, July 15–19, 2019, Proceedings, Lecture Notes in Computer Science, Vol. 11558, Springer, 2019, pp. 230–239.

7. C.G. Jockusch and P.E. Schupp, Generic computability, Turing degrees, and asymptotic density, Journal of the London Mathematical Society 85(2) (2012), 472–490. doi:10.1112/jlms/jdr051.

8. Kritchman and Raz, The surprise examination paradox and the second incompleteness theorem, Notices of the AMS 57(11) (2010), 1454–1458, https://www.ams.org/notices/201011/rtx101101454p.pdf.

9. M. Li and P.M.B. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, 4th edn, Springer, 2019. ISBN 978-3-030-11297-4.

10. A.A. Muchnik and S.Y. Positselsky, Kolmogorov entropy in the context of computability theory, Theoretical Computer Science 271(1–2) (2002), 15–35. doi:10.1016/S0304-3975(01)00028-7.

11. Shen, V.A. Uspensky and Vereshchagin, Kolmogorov Complexity and Algorithmic Randomness, Mathematical Surveys and Monographs, Vol. 220, American Mathematical Society, 2017, https://www.lirmm.fr/~ashen/kolmbook-eng-scan.pdf. ISBN 978-1-4704-3182-2. doi:10.1090/surv/220.

12. S.G. Simpson, Mass problems and randomness, The Bulletin of Symbolic Logic 11(1) (2005), 1–27. doi:10.2178/bsl/1107959497.