In this article, we study a notion of the extraction rate of Turing functionals that translate between notions of randomness with respect to different underlying probability measures. We analyze several classes of extraction procedures: (1) a class that generalizes von Neumann’s trick for extracting unbiased randomness from the tosses of a biased coin, (2) a class based on work of by Knuth and Yao (which more properly can be characterized as extracting biased randomness from unbiased randomness), and (3) a class independently developed by Levin and Kautz that generalizes the data compression technique of arithmetic coding. For the first two classes of extraction procedures, we identify a level of algorithmic randomness for an input that guarantees that we attain the extraction rate along that input, while for the third class, we calculate the rate attained along sufficiently random input sequences.
The aim of this study is to analyze the rate of the extraction of randomness via various effective procedures using the tools of computability theory and algorithmic randomness. Our starting point is a classic problem posed by von Neumann in [32], namely that of extracting unbiased randomness from the tosses of a biased coin. Von Neumann provides an elegant solution to the problem: Toss the biased coin twice. If the outcome is or , then discard these tosses. Otherwise, if the outcome is , then output H, and if the outcome is , then output T. Notice in the case that the coin comes up heads with probability p,
the probability of is ,
the probability of is , and
the probability of (and that of ) is .
It follows from the independence of the events H and T that with probability one the derived sequence will be an infinite sequence in which the events H and T each occur with probability 1/2.
It is well known that von Neumann’s procedure is rather inefficient, since on average biased bits are required to produce one unbiased bit when the biased coin comes up heads with probability . In particular, in the case that , where we are given a fair coin to begin with, four tosses on average yield one bit of output (a rate that is four times the rate attained simply by reading off the tosses of the coin). However, a number of improvements have been found. For instance, in [23], Peres studies a sequence of procedures obtained by iterating von Neumann’s procedure and calculates the associated extraction rate of each such procedure. As defined by Peres, given a monotone function , the extraction rate of ϕ with respect to the bias p is defined to be
where the bits are independently -distributed and E stands for expected value (with respect to the p-Bernoulli measure on ). Setting to be the sequence of procedures defined by Peres, he proves that, when tossing a coin that comes up heads with probability ,
where is the entropy associated with the underlying source.
The topic of randomness extraction is well-studied in the context of complexity theory, with an emphasis on efficiently simulating a randomized algorithm using a weak source of randomness. Work in this area has led to significant developments, such was the work of Impagliazzo and Wigderson on the question [15]. For a survey of work in this area, see [1, Chapter 16] or [27].
By contrast, the notion of an extraction rate of an effective procedure has been much less thoroughly studied from the point of view of computability theory (however, see Doty [7] and Toska [30], each of which study a more local notion of rate of certain procedures applied to specific inputs). In this article, we study a definition of the extraction rate for Turing functionals that accept their input with probability one (referred to as almost total functionals). In particular, we can formalize certain randomness extraction procedures as Turing functionals and study the behavior of these functionals when applied to algorithmically random sequences. For a number of such functionals, it is known that almost every sequence attains the extraction rate; here we provide a sufficient level of algorithmic randomness that guarantees this result.
We consider three main examples here:
functionals defined in terms of maps on that we call block maps, which generalize von Neumann’s procedure,
functionals derived from certain trees called discrete distribution generating trees (or DDG trees, for short), introduced by Knuth and Yao [18] in the study of non-uniform random number generation, and
a procedure independently developed by Levin [19] and Kautz [16] for converting biased random sequences into unbiased random sequences.
Notably, our analysis of the extraction rates of these three classes of examples draws upon the machinery of effective ergodic theory, using certain effective versions of Birkhoff’s ergodic theorem (and, in the case of the Levin-Kautz procedure, an effective version of the Shannon-McMillan-Breiman theorem from classical information theory due to Hoyrup [14]).
The remainder of the paper is as follows. In Section 2, we lay out the requisite background for this study. Next, in Section 3, we formally define the extraction rate of a Turing functional, derive several preliminary results, and introduce several basic examples. We then turn to more general examples: Turing functionals derived from block maps in Section 4, Turing functionals derived from computable DDG trees in Section 5, and the Levin-Kautz procedure in Section 6. We conclude with several open questions in Section 7.
Background
Notation
The set of finite binary strings will be written as ; members of will be written as lowercase Greek letters, σ, τ, ρ, and so on. The set of infinite binary sequences will be written as ; members of will be written as uppercase Roman letters X, Y, Z. For a finite string , let denote the length of n. For two strings σ, τ, say that τ extends σ and write if and for . For , means that for . Let denote the concatenation of ; we similarly define the concatenation of and . Let denote the string with . For , denotes the string . The empty string will be written as ϵ.
Two sequences may be coded together into , where and for all n. For a finite string σ, let denote . We shall refer to as the cylinder determined by σ. Each such interval is a clopen set and the clopen sets are just finite unions of intervals.
Trees
A nonempty closed set may be identified with a tree where . Note that has no dead ends. That is, if , then either or (or both). For an arbitrary tree , let denote the set of infinite paths through T; that is, . It is well-known that is a closed set if and only if for some tree T. P is a class, or an effectively closed set, if for some computable tree T.
Turing functionals
Recall that a continuous function may be defined from a function , which we refer to as a representation of Φ, satisfying the conditions
For , if , then .
For all , .
Note by the compactness of , a representation ϕ of a continuous function Φ satisfies the condition:
For all and , there exists such that for every , .
We then have . The total Turing functionals are those which may be defined in this manner from a computable representation . We will sometimes refer to total Turing functionals as -functionals. The partial Turing functionals are given by those which only satisfy condition (i) (we will still refer to such functions as representations). In this case may be only a finite string.
We set . For we also define
In particular, by our above convention, we have . Similarly, for we define . For , we denote by the set . Note in particular that .
Computable measures on
Recall that a measure μ on is computable if there is a computable function such that . For a prefix-free (i.e., for , if , then ), we set . Hereafter, we will write as for strings σ and as for . We also denote the Lebesgue measure by λ, where for .
Notions of algorithmic randomness
We assume that the reader is familiar with the basics of algorithmic randomness; see, for instance [8,21,28], or the more recent [10]. Let μ be a computable measure on . Recall that a μ-Martin-Löf test is a sequence of uniformly effectively open subsets of such that for each i,
Moreover, passes the μ-Martin-Löf test if . Lastly, is μ-Martin-Löf random, denoted , if X passes every μ-Martin-Löf test. When μ is the Lebesgue measure λ, we often abbreviate by .
We can obtain alternative notions of randomness by modifying the definition of a Martin-Löf test. We will work with two such alternatives in this paper. Let μ be a computable measure on and .
X is μ-Schnorr random (written ) if and only if X is not contained in any μ-Martin-Löf test with the additional condition that is computable uniformly in i.
X is μ-Kurtz random (written ) if and only if X is not contained in any class of μ-measure 0 (equivalently, if and only if it is not contained in any class of μ-measure 0).
Note that for every computable measure μ.
We are particularly interested in the interaction between Turing functionals and computable measures on . For a computable measure μ on , a Turing functional is μ-almost total if .
A Turing functional Φ is μ-almost total if and only if.
If , then clearly Φ is μ-almost total. For the other direction, observe that is a subset of . Thus, if Φ is μ-almost total, it follows that is a μ-nullset. Thus if , X cannot be μ-Kurtz random. □
Extraction rates
The definition of extraction rate via a representation
We are interested in a version of the use function of a Turing functional Φ which arises from a given representation ϕ. Let be the least m such that . Then the extraction rate of the computation of from X is given by the ratio
that is, the relative amount of input from X needed to compute the first n bits of Y.
There is an alternative definition which is more straightforward. The ϕ-output/input ratio of σ, , is defined to be
For any Turing functional Φ with representation ϕ and anysuch that,provided that both limits exists.
Fix an input X. Let and let be the least such that . Let . Then for each , and hence
so that the two sequences have identical infinite subsequences, and hence the limits must be equal (since they are assumed to exist). □
Let us write for ; we refer to this as the ϕ-extraction rate along X. For the specific examples of extraction rates that we calculate in the remaining sections, we will work with specific representations defined from the randomness extraction literature.
Canonical representations
Suppose that we are given an almost total Turing functional and would like to determine its extraction rate. Which representation should we use? For instance, we would like to say that the extraction rate for a constant function should be very low and should approach 0 in the limit. However, consider the following example.
Let for all and let for all . Then for all n and thus for all X.
To avoid this problem, we can work with a canonical representation of a Turing functional, which may be defined as follows.
For any partial continuous function Φ, the canonical representation ϕ for Φ is defined by letting be the longest common initial segment of all members of .
The identity function ϕ on strings is the canonical representation of the identity function on and thus for , the use for all , so that .
If , then its canonical representation is given by for (where is the finite string defined as in the infinite case). Thus for , .
Note that if ϕ is the canonical representation for a constant function , then we have , an infinite sequence, for every σ. To avoid this unpleasantness, we can further restrict our functions to the non-constant functions.
A partial continuous function Φ is nowhere constant if for every , either there is some or there exist in such that .
It is easy to see that if Φ is nowhere constant, then the canonical representation is a well-defined map taking strings to strings and satisfies condition (i) in the definition of a representative of a functional. Moreover, the canonical representation of a functional has the following nice property, which is immediate from the definition.
Let Φ be a partial continuous functional onwith canonical representation. Then for all σ such that, ifand, then.
Next we consider the computability of the canonical represenation.
If Φ is a total, nowhere constant Turing functional, then the canonical representation ϕ of Φ is computable.
Let be a total, nowhere constant Turing functional and let be some computable representation of Φ. Then we can compute, for each m, a value such that for all strings σ of length . Now let a string σ be given. Since Φ is nowhere constant, we can compute a least value m such that there exist such that for and . Then the value of the canonical representation can be computed by letting be the common initial segment for all of length extending σ. □
On the other hand, if Φ is only a partial computable, nowhere constant function, then the canonical representation of Φ need not be computable.
Let E be some noncomputable c.e. set and define if and undefined otherwise. Then for the canonical representation ϕ of Φ, we have if and , otherwise. We can modify this to get an almost total functional by letting if either or if for some i. In this case, for each , we have if and equals ϵ otherwise.
For any partial computable, nowhere constant Turing functional Φ, the canonical representation ϕ is computable in.
Let ψ be some computable representation of Φ. Then for the canonical representation ϕ derived from ψ as in the proof of Proposition 8, we have if and only if
, and
for , .
Thus the graph of ψ is a set and in fact is a difference of c.e. sets. □
Lastly, we can define the output/input ratio of a Turing functional given in terms of its canonical representation.
Let Φ be a partial Turing functional with canonical representation ϕ. The Φ-output/input ratio given by σ, , is defined to be
Similarly, for we define to be
We refer to as the Φ-extraction rate along X.
Average output/input ratios
For a given representation ϕ of a Turing functional Φ, we would like to define the average ϕ-output/input ratio. However, such an average depends on an underlying probability measure on . Since we are interested, at least in part, in Turing functionals that extract unbiased randomness from biased random inputs, we need to consider average ϕ-output/input ratios parametrized by an underlying measure.
Given , the average ϕ-output/input ratio for strings of length n with respect to μ, denoted , is defined to be
Equivalently, we have
Note that this is the μ-average value of over the space , since this function is constant on each interval . That is, if we fix n and let , then is a computable map from to and the average value of on is given by
We consider the behavior of this average in the limit, which leads to the following definition (which is adapted from one provided by Peres in [23]).
For a function , the μ-extraction rate of ϕ, denoted , is defined to be
In the case that ϕ is the canonical representation of a functional Φ, we further define
Let , with a representation given by (which, as noted above, is the canonical representation of Φ). Then and hence for all strings σ. Thus the average ϕ-output/input ratio is 2. Certainly but at the same time , so that and hence . Although these rates do not agree at each finite level, they do agree in the limit. Indeed, since for all n, we have the limit for all X and hence the average output input ratio over all is
Moreover, as well.
An interesting problem is to determine for which Turing functionals the limsup in the definition of extraction rate can be replaced with a limit. The following is an example where the limit does not exist.
Given a fixed function , define the total functional for any input X to be the infinite concatenation of the strings . Thus if for all n, then . If , then
Now let , which is a strictly increasing function. It is clear that for any strictly increasing function , there is a function α such that and that, in general, α is computable if and only if is computable. Fix α and and let ϕ be the canonical representation of . Then we have for each , so that for any string σ of length n. Now the behavior of this limit is completely arbitrary. For example, let and let for all n and for all . Then whenever is a power of 2, but for and in particular, if , then . Thus whereas .
For to exist, the function ϕ must be regular in the relative amount of input needed for a given amount of output. The authors have studied some families of functions for which this is the case. First, there are the so-called online continuous (or computable) functions, which compute exactly one bit of output for each bit of input (see [5]). On the other hand, there are the random continuous functions which produce regularity in a probabilistic sense. For example, the random continuous functions as defined by Barmpalias et al. [2] produce outputs which are roughly as long, on the average, as the inputs. See also [4].
Another example for which the limsup in the definition of rate is actually a limit is given by the following result.
Suppose there exists somesuch thatfor alland that there is somesuch thatfor μ-almost every. Then.
Since there is some c such that for all , by the dominated convergence theorem,
□
In the next two sections, we consider examples of Turing functionals Φ given by representations ϕ for which the following two conditions hold:
exists (for an appropriate choice of the measure μ), and
for all sufficiently μ-random sequences X.
That is, the extraction rate of ϕ is attained along sufficiently random inputs of Φ. In yet another section, we then consider an example in which condition (ii) holds, but for which the satisfaction of condition (i) is left as an open question.
The rate of block functionals
For fixed , an n-block map is a function satisfying the following property: Given , we first write , where for and . Then we have
That is, the behavior of ϕ is completely determined by its values of strings of length n (and is undefined on all strings of length ). An n-block functional is a Turing functional Φ that has an n-block map ϕ as its canonical representation. In this case we refer to ϕ as the n-block map associated to Φ. (Note that every n-block map can be extended to an -block map for that induces the same functional. Thus, the requirement that an n-block functional has an n-block map as a canonical representation ensures that an n-block functional isn’t also an -block functional for every .) We say that an n-block map is non-trivial if for some .
Block maps show up in the literature on randomness extraction, where typically one attempts to extract a sequence of unbiased random bits from a biased source. For example, the 2-block map defined by setting
,
, and
is precisely von Neumann’s procedure. Other examples of block maps in the randomness extraction literature are the randomizing functions studied by Elias in [9], the iterations of von Neumann’s procedure studied by Peres in [23], and extracting procedures studied by Pae in [22].
We will determine the extraction rate of a n-block function with respect to a certain class of measures. An n-step Bernoulli measure is a Bernoulli measure on . That is, an n-step Bernoulli measure is obtained by taking an infinite product of copies of some fixed measure on the set . Clearly, an n-step Bernoulli measure extends naturally to a measure on . Hereafter, we will use the term n-step Bernoulli measure to refer to this extension. Recall that a measure μ on is positive if for all .
Let. Suppose that μ is a positive n-step Bernoulli measure onandis a non-trivial n-block map with associated n-block functional Φ. Then Φ is μ-almost total.
Let , which is not equal to since ϕ is non-trivial. Let , the set of all infinite sequences built up by concatenating members of S. Since μ is positive and , , from which it follows that . Next, for each such that for some , let . Clearly , since μ is an n-step Bernoulli measure. Then , from which it follows that . □
Let μ be a positive n-step Bernoulli measure onanda non-trivial n-block map with associated n-block functional Φ. Then
We first note that if we consider the bits of as a sequence of random variables, then the expected value of is
from which it follows that
For , given a string of length of length (where for ), since μ is an n-step Bernoulli measure, the blocks are independent. Thus, the μ-expected number of output bits for a string of length is
Thus
For and , we have , since the expected number of output bits of strings for inputs of length is equal to the expected number of output bits for inputs of length . It thus follows that
□
Given, let μ be a computable, positive n-step Bernoulli measure on, and letbe μ-Schnorr random. Then for every non-trivial n-block mapwith associated n-block functional Φ,
To prove Theorem 19, we first need to develop some background. Let be the n-shift operator; that is, for and , . For an n-step Bernoulli measure μ on , T is μ-invariant, i.e., for any , . Indeed, for any cylinder ,
Thus,
Recall that a μ-invariant transformation is ergodic if for any such that , we have or . The following lemma is a useful characterization of ergodic transformations on (see [29, Theorem 5.1.5, Theorem 6.3.4(1)]).
Let μ be a measure onand letbe μ-invariant. Then T is ergodic if and only iffor all.
The following result is straightforward, but we include it here for the sake of completeness.
The n-shift onis ergodic with respect to an n-step Bernoulli measure.
We apply Lemma 20. Let be given. Then there is some and m with such that . Then for , there are strings of length such that . Then . Note that for ,
Then we have
and hence
A similar argument shows that for all . It follows that
and hence by Lemma 20, T is ergodic. □
The last ingredient we will use in the proof of Theorem 19 is the following effective version of Birkhoff’s Ergodic Theorem due to Franklin and Towsner:
(Franklin-Towsner [11]).
Let μ be a computable measure onand letbe a computable, μ-invariant, ergodic transformation. Then for any bounded computable function F and any μ-Schnorr random,
Let . Given , let μ be an n-step Bernoulli measure on and let T be the n-shift. We define . Then
where the last equality holds by Theorem 18. Next, for any μ-Schnorr random sequence ,
where the last equality follows from the fact that ϕ is an n-block map. Then
where the third equality follows from Theorem 22, as the function F is bounded. □
The rate of functionals induced by DDG-trees
The next example we consider is given in terms of DDG-trees (discrete distribution generating trees), first introduced by Knuth and Yao in [18]. In their study, Knuth and Yao were concerned with generating an arbitrary nonuniform distribution using a random source of bits. Properly speaking, this task of nonuniform random number generation can be seen as a kind orf inverse of classical randomness extraction, which is concerned with transforming biased randomness into unbiased randomness. However, in our context, we can view nonuniform random number generation as biased randomness extraction, although the Knuth/Yao framework allows us to efficiently generate unbiased randomness over an arbitrary alphabet (possibly infinite) from unbiased randomness over .
The main idea behind a DDG-tree is that it is a binary tree in which all nodes are either branching nodes or terminal nodes, the latter of which are labelled with symbols from some fixed alphabet A. Each branching node in the tree corresponds to the toss of an unbiased coin. Starting from the root of the tree, we follow the outcome of consecutive coin tosses until we arrive at a terminal node, which is then output. This in turn induces a probability distribution on A.
More precisely, we define a DDG-tree as follows. Given a tree consisting of branching nodes and terminal nodes, we label the terminal nodes of S, the set of which is denoted by , with values from a fixed set (which we will assume to be finite). In particular, we define a labeling function such that for all , is the label assigned to τ. To ensure we have a probability distribution on A, the labels on S must satisfy the following condition: For , if we set
then
The distribution on A is induced by the following process:
For each branching node in the tree, we use the toss of an unbiased coin to determine which direction we will take.
If we arrive at a terminal node τ, the process outputs .
A DDG-tree S defines a function from to A as follows: For , the output determined by X is the unique element such that for some , is a terminal node in S labelled with a, if it exists; otherwise, the output is the empty string ϵ. That is, we look for the first n such that , and if such an n exists, we output the value .
Knuth and Yao define the average running time of randomness extraction by a DDG-tree S to be
That is, is the average number of input bits needed to produce a single output bit.
Hereafter we will restrict our attention to computable DDG-trees, where a DDG-tree S is computable if the set is a computable set and the function is computable (which together imply that the values assigned to members of A are computable).
We can use a computable DDG-tree S to define a Turing functional as follows. First, for every , we set . Then for any , if σ does not extend any , then we set . However, if σ extends some , then we can write , where and (and is possibly empty). Note that this decomposition is unique, as is prefix-free. Then we set
We next extend to a Turing functional . For , we define a possibly finite sequence inductively as follows:
is the unique n such that , if it exists; otherwise is undefined.
Suppose have been defined. Then is the unique n such that ; otherwise is undefined.
Hereafter we will refer to the sequence of strings as the S-blocks of X.
If, for a given , the corresponding infinite sequence is defined, then we set
In the case that the corresponding sequence of block lengths is finite, then is undefined.
The issue of determining the canonical representation of a Turing functional defined in terms of a DDG-tree is a delicate one. Knuth and Yao spend a considerable portion of their study [18] on the identification of the DDG-tree that most efficiently induces a distribution on a set A (as well as more general distributions), where this efficiency is given in terms of extraction rate. Hereafter, we will restrict our attention to DDG-trees S that are minimal with respect to extraction rate, which amounts to assuming that the corresponding map on is the canonical representation of the associated Turing functional . Let us refer to such DDG-trees as minimal DDG-trees.
If S is a computable DDG-tree, then the Turing functionalis almost total.
First, observe that collection of cylinders determined by the elements of yields a set of Lebesgue measure one. Indeed,
It then follows that the set is a class of Lebesgue measure zero. As in the proof of Proposition 17, if we set , then we have . Then , and so we have . □
We would like to calculate the extraction rate for a Turing functional induced by a minimal DDG-tree S. To do so, we will first prove the following:
Letbe Schnorr random. Then for every computable, minimal DDG-tree S, we have
To prove Theorem 24, we would like to mimic the proof of Theorem 19. In particular, we need to find an appropriate effective version of Birkhoff’s ergodic theorem to derive the result. However, to do so, we need to define the appropriate measure-preserving transformation.
Let be a tree with . The tree-shift is defined by setting , where and . Moreover, in the case that for all , is undefined.
Note that if S is a computable DDG-tree, then the associated tree-shift is computable by an almost total Turing functional, as is defined on .
If S is a tree with, then the tree-shiftis λ-invariant and ergodic.
First, we show λ-invariance. For , we have
Then
Next, we prove that is ergodic. Towards this end, we claim that for every and , . We show this by induction on n. For the case in which , given , there is a prefix-free set such that and
that is, up to a set of λ-measure zero. Then
and hence
Next, suppose that . Then
Then
where the second equality follows from the inductive hypothesis. It follows that
and thus, by Lemma 20, is ergodic. □
The effective version of the ergodic theorem that we will use in the proof of Theorem 24 requires us to introduce some additional notions. First, a function is a.e. computable if it is computable on a set of Lebesgue measure 1. As noted above, is computable on , which is a class of measure 1, and so it is a.e. computable.
Next, a function is effectively integrable (also -computable) if there is a computable sequence of rational step functions such that (whenever ) and for all , ; see, e.g. [24] or [20].
We now can formulate the relevant effective version of Birkhoff’s ergodic theorem, due to Gács, Hoyrup, and Rojas [12] (as observed by Rute [24], Gács, Hoyrup, and Rojas prove a slightly different result, but the proof of their result establishes the following).
(Effective Birkhoff’s Ergodic Theorem, version 2 [12] ).
Let μ be a computable measure onand letbe an a.e. computable, μ-invariant, ergodic transformation. Then for any a.e. computable function F that is effectively integrable and any Schnorr random,
Let be Schnorr random. We define so that is the unique n such that ; that is, F counts the number of input bits of a given sequence X needed to generate one bit of output using the DDG-tree S. Clearly F is also computable on and is thus a.e. computable.
To see that f is effectively integrable, we define a sequence of rational step functions on as follows:
Observe that , and if and only if there is some such that . For , let us set .
Since and is a computable set, the sequence is uniformly computable and converges to 0. Thus by choosing an appropriate subsequence of the functions , it follows that F is effectively integrable.
Next, observe that
Given a Schnorr random , if we repeatedly apply the tree-shift to X followed by the function F, we have
where is the sequence determined by the S-blocks of X. Then it follows from (3) that
Thus,
where the first equality is (4), the second equality follows from Theorem 27, and the third equality comes from (2).
Lastly, we consider the values
for . Fix , if is the sequence determined by the S-blocks of X, then for the maximum value k such that ,
Then
Since
we have
It thus follows that
□
.
We apply Lemma 16. First, we have for every . By Theorem 24,
for every Schnorr random sequence. The conclusion immediately follows from Lemma 16. □
The extraction rate of the Levin–Kautz conversion procedure
We now calculate the extraction rate of a general procedure due independently to Levin [19], Kautz [16], Schnorr and Fuchs in [26], and Knuth and Yao [18]. In addition, this procedure has been studied in the randomness extraction literature under the label of the interval algorithm (see, for instance, [13]) and is the main idea behind the data compression technique known as arithmetic coding (see [25, Chapter 4]).
Following [3], we will refer to this procedure as the Levin-Kautz conversion procedure. Here we prioritize Levin and Kautz, as they both used this procedure to study the conversion of Martin-Löf random sequences with respect to one measure into Martin-Löf random sequences with respect to another measure. In particular, Levin and Kautz use the procedure to prove the following:
For every computable, ifand X is not computable (and in particular,), then there is somesuch that.
We will give the basic idea of Levin-Kautz conversion procedure using the succinct approach due to Schnorr and Fuchs [26] in the context of converting biased randomness into unbiased randomness (i.e., randomness with respect to the Lebesgue measure). For computable and , we define two subintervals of :
and
(here defines the lexicographic ordering on strings of a fixed length). We define a Turing functional as follows: For , enumerate into if . Thus, given a μ-random sequence X as input, treats X as the representation of some the bit values of which are determined by the values of the measure μ. For instance, the first bit of r is a 0 if and a 1 if (and the procedure is undefined if ). then outputs the standard binary representation of the real number r. One can verify that the resulting Turing functional is μ-almost total and induces the Lebesgue measure on .
More generally, for computable measures μ, ν, one can similarly define an almost total functional that transforms μ-randomness into ν-randomness. Moreover, one can verify that, for non-computable sequences and such that ,
, and
.
Thus, given such a pair X and Y, we clearly have .
We will consider this result in the context of strongly positive measures. A measure μ on is strongly positive if there is some such that for every , ; that is, all of the conditional probabilities associated with μ are bounded away from 0 and 1 by a fixed distance. The main theorem we will prove in this section is an effective, pointwise version of a result due to Uyematsu and Kanaya [31], who studied the extraction rate of the interval algorithm with respect to a general class of measures, namely the shift-invariant ergodic measures. Recall that for a shift-invariant ergodic measure μ on , the entropy of μ is defined to be
Let μ and ν be computable shift-invariant ergodic measures that are strongly positive. Then for every non-computable,In particular, in the case that, we have.
Several remarks are in order. First, by the Shannon source coding theorem [6, Section 5.10], is the optimal rate for converting between μ-randomness and ν-randomness. Second, Han and Hoshi [13] showed that in the case that μ and ν are Bernoulli measures, , but in the case that μ and ν are shift-invariant and ergodic, this is appears to be open (see [33, Remark 14]).
As a first step towards proving Theorem 30, we define an auxiliary function. Given and , let be the canonical representation of and set
Equivalently, is the maximum value k such that . It follows that the -extraction rate of the computation is .
We now calculate for each . We will make use of two additional results. First, we use the following lemma due to Kautz:
Let μ be a computable shift-invariant ergodic measure on. Then for every μ-Martin-Löf random sequence,
With these pieces, we now turn to the proof of our theorem.
Let for . By Theorem 29, we have . Since μ and ν are strongly positive, choose such that for every . By part (i) of Lemma 31, we have
for all . Applying the negative logarithm to both sides and dividing through by n yields
for all . It thus follows that
for all . Next, by part (ii) of Lemma 31, we have
for infinitely many . Again, applying the negative logarithm to both sides and dividing through by n yields, for some ,
for infinitely . It thus follows that
Combining this inequality with (5), we thus have
By Theorem 32, since , it follows from our assumptions on μ that
Combining (6) and (7), we conclude
Next, we use the fact that for positive sequences and such that exists,
along with Equation (8) and the fact that
which follows from Theorem 32, to derive the following:
From this we can conclude that
□
As noted above after the statement of Theorem 30, determining appears to be an open question in the randomness extraction literature. Essentially, this problem boils down to finding a uniform bound of the sequence of functions given by to apply the dominated convergence theorem to calculate .
Open questions
We conclude with several open questions. First, there is a general question about generalizing the results from Sections 4, 5, and 6 to apply to a broader class of Turing functionals:
What features of an almost total Turing functional Φ guarantee that
for the appropriate choice of measure μ and all sufficiently μ-random sequences X?
Next, the results we have established in showing the level of randomness that is sufficient for a sequence to witness the extraction rate of a Turing functional do not tell us what level of randomness is necessary for this to hold. Thus we can ask:
For each of the classes of Turing functionals that we have discussed, what is the level of randomness necessary for a sequence to witness the associated extraction rate?
References
1.
S.Arora and B.Barak, Computational Complexity: A Modern Approach, Cambridge University Press, 2009.
2.
G.Barmpalias, K.Brodhead, D.Cenzer, J.B.Remmel and R.Weber, Algorithmic randomness of continuous functions, Arch. Math. Logic46(7–8) (2008), 533–546. doi:10.1007/s00153-007-0060-4.
3.
L.Bienvenu and B.Monin, Von Neumann’s biased coin revisited, in: 2012 27th Annual IEEE Symposium on Logic in Computer Science, IEEE, 2012, pp. 145–154. doi:10.1109/LICS.2012.26.
4.
D.Cenzer and C.P.Porter, Algorithmically random functions and effective capacities, in: Theory and Methods of Computation (TAMC 2015), Lecture Notes in Computer Science, Vol. 9076, Springer Verlag, 2015, pp. 22–37.
5.
D.Cenzer and D.A.Rojas, Online computability and differentiation in the Cantor space, in: Sailing Routes in the World of Computation, Lecture Notes in Comput. Sci., Vol. 10936, Springer, Cham, 2018, pp. 136–145. doi:10.1007/978-3-319-94418-0_14.
6.
T.M.Cover and J.A.Thomas, Elements of Information Theory, John Wiley & Sons, 2012.
7.
D.Doty, Dimension extractors and optimal decompression, Theory Comput. Syst.43 (2008), 425–463. doi:10.1007/s00224-007-9024-7.
8.
R.G.Downey and D.R.Hirschfeldt, Algorithmic Randomness and Complexity, Springer, 2010.
9.
P.Elias, The efficient construction of an unbiased random sequence, Ann. Math. Statist.43 (1972), 865–870. doi:10.1214/aoms/1177692552.
10.
J.N.Y.Franklin and C.P.Porter, Key developments in algorithmic randomness, in: Algorithmic Randomness: Progress and Prospects, J.N.Y.Franklin and C.P.Porter, eds, Lecture Notes in Logic, Vol. 50, Cambridge University Press, 2020. doi:10.1017/9781108781718.
11.
J.N.Y.Franklin and H.Towsner, Randomness and non-ergodic systems, Moscow Mathematical Journal14(4) (2014), 711–744. doi:10.17323/1609-4514-2014-14-4-711-744.
12.
P.Gács, M.Hoyrup and C.Rojas, Randomness on computable probability spaces – a dynamical point of view, Theory of Computing Systems48(3) (2011), 465–485. doi:10.1007/s00224-010-9263-x.
13.
T.S.Han and M.Hoshi, Interval algorithm for random number generation, IEEE Transactions on Information Theory43(2) (1997), 599–611. doi:10.1109/18.556116.
14.
M.Hoyrup, The dimension of ergodic random sequences, in: 29th International Symposium on Theoretical Aspects of Computer Science, LIPIcs, Leibniz Int. Proc. Inform., Vol. 14, Schloss Dagstuhl. Leibniz-Zent. Inform, Wadern, 2012, pp. 567–576.
15.
R.Impagliazzo and A.Wigderson, P = BPP if E requires exponential circuits: Derandomizing the XOR lemma, in: Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, 1997, pp. 220–229.
16.
S.M.Kautz, Degrees of Random Sets, ProQuest LLC, Thesis (Ph.D.)–Cornell University, Ann Arbor, MI, 1991, p. 129.
17.
S.M.Kautz, Resource-bounded randomness and compressibility with respect to nonuniform measures, in: Proceedings of the International Workshop on Randomization and Approximation Techniques in Computer Science, Springer-Verlag, 1997, pp. 197–211. doi:10.1007/3-540-63248-4_17.
18.
D.E.Knuth and A.C.Yao, The complexity of nonuniform random number generation, in: Algorithms and Complexity, Proc. Sympos., Carnegie-Mellon Univ., Pittsburgh, Pa., 1976, 1976, pp. 357–428.
19.
L.Levin and A.K.Zvonkin, The complexity of finite objects and the development of the concepts of information and randomness of means of the theory of algorithms, Uspekhi Mat. Nauk25 (1970), 85–127.
A.Nies, Computability and Randomness, Oxford Logic Guides, Vol. 51, Oxford University Press, 2009, p. xvi+433.
22.
S-i.Pae, Binarizations in random number generation, in: 2016 IEEE International Symposium on Information Theory (ISIT), IEEE, 2016, pp. 2923–2927. doi:10.1109/ISIT.2016.7541834.
23.
Y.Peres, Iterating von Neumann’s procedure for extracting random bits, Ann. Statist.20 (1992), 590–597.
24.
J.Rute, Algorithmic randomness and constructive/computable measure theory, in: Algorithmic Randomness: Progress and Prospects, J.N.Y.Franklin and C.P.Porter, eds, Lecture Notes in Logic, Vol. 50, Cambridge University Press, 2020.
25.
K.Sayood, Introduction to Data Compression, Morgan Kaufmann, 2017.
26.
C.-P.Schnorr and H.-P.Fuchs, General random sequences and learnable sequences, The Journal of Symbolic Logic42(3) (1977), 329–340. doi:10.2307/2272862.
27.
R.Shaltiel, An introduction to randomness extractors, in: International Colloquium on Automata, Languages, and Programming, Springer, 2011, pp. 21–41. doi:10.1007/978-3-642-22012-8_2.
28.
A.Shen, V.A.Uspensky and N.Vereshchagin, Kolmogorov Complexity and Algorithmic Randomness, Vol. 220, American Mathematical Soc., 2017.
29.
C.E.Silva, Invitation to Ergodic Theory, Vol. 42, American Mathematical Soc., 2008.
T.Uyematsu and F.Kanaya, Almost sure convergence theorems of rate of coin tosses for random number generation by interval algorithm, in: 2000 IEEE International Symposium on Information Theory, IEEE, 2000, p. 457.
32.
J.von Neumann, Various techniques used in connection with random digits, Applied Math Series (1951), 36–38.
33.
S.Watanabe and T.S.Han, Interval algorithm for random number generation: Information spectrum approach, IEEE Transactions on Information Theory (2019).