Randomness extraction in computability theory

Abstract

In this article, we study a notion of the extraction rate of Turing functionals that translate between notions of randomness with respect to different underlying probability measures. We analyze several classes of extraction procedures: (1) a class that generalizes von Neumann’s trick for extracting unbiased randomness from the tosses of a biased coin, (2) a class based on work by Knuth and Yao (which more properly can be characterized as extracting biased randomness from unbiased randomness), and (3) a class independently developed by Levin and Kautz that generalizes the data compression technique of arithmetic coding. For the first two classes of extraction procedures, we identify a level of algorithmic randomness for an input that guarantees that we attain the extraction rate along that input, while for the third class, we calculate the rate attained along sufficiently random input sequences.

Keywords

Computability theory, algorithmic randomness, randomness extraction

1. Introduction

The aim of this study is to analyze the rate of the extraction of randomness via various effective procedures using the tools of computability theory and algorithmic randomness. Our starting point is a classic problem posed by von Neumann in [32], namely that of extracting unbiased randomness from the tosses of a biased coin. Von Neumann provides an elegant solution to the problem: Toss the biased coin twice. If the outcome is $HH$ or $TT$ , then discard these tosses. Otherwise, if the outcome is $HT$ , then output H, and if the outcome is $TH$ , then output T. Notice in the case that the coin comes up heads with probability p,

the probability of $HH$ is $p^{2}$ ,

the probability of $TT$ is ${(1 - p)}^{2}$ , and

the probability of $HT$ (and that of $TH$ ) is $p (1 - p)$ .

It follows from the independence of the events H and T that with probability one the derived sequence will be an infinite sequence in which the events H and T each occur with probability 1/2.
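The procedure just described can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `von_neumann` and `biased_tosses`, the encoding of tosses as the characters `H` and `T`, and the use of a seeded pseudorandom generator as a stand-in for a biased coin are all assumptions made here, not part of the original presentation.

```python
import random

def von_neumann(tosses):
    """Apply von Neumann's procedure to a string of 'H'/'T' tosses."""
    out = []
    # Read the tosses two at a time; a trailing unpaired toss is ignored.
    for i in range(0, len(tosses) - 1, 2):
        a, b = tosses[i], tosses[i + 1]
        if a != b:
            out.append(a)  # HT -> H, TH -> T; HH and TT are discarded
    return "".join(out)

def biased_tosses(p, n, seed=0):
    """Hypothetical helper: n pseudorandom tosses with P(H) = p."""
    rng = random.Random(seed)
    return "".join("H" if rng.random() < p else "T" for _ in range(n))

sample = biased_tosses(0.8, 100_000)
extracted = von_neumann(sample)
print(len(extracted), extracted.count("H") / len(extracted))
```

For a heavily biased coin the output is much shorter than the input, while the frequency of H in the output should be close to 1/2.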

It is well known that von Neumann’s procedure is rather inefficient, since on average $\frac{1}{p (1 - p)}$ biased bits are required to produce one unbiased bit when the biased coin comes up heads with probability $p \in (0, 1)$ . In particular, in the case that $p = \frac{1}{2}$ , where we are given a fair coin to begin with, four tosses on average yield one bit of output (a rate that is one-fourth of the rate attained simply by reading off the tosses of the coin). However, a number of improvements have been found. For instance, in [23], Peres studies a sequence of procedures obtained by iterating von Neumann’s procedure and calculates the associated extraction rate of each such procedure. As defined by Peres, given a monotone function $ϕ : 2^{< ω} \to 2^{< ω}$ , the extraction rate of ϕ with respect to the bias p is defined to be $\begin{matrix} \underset{n \to \infty}{lim sup} \frac{E (| ϕ (x_{1}, x_{2}, \dots, x_{n}) |)}{n}, \end{matrix}$ where the bits $x_{i}$ are independently $(p, 1 - p)$ -distributed and E stands for expected value (with respect to the p-Bernoulli measure on $2^{ω}$ ). Setting ${(ϕ_{k})}_{k \in ω}$ to be the sequence of procedures defined by Peres, he proves that, when tossing a coin that comes up heads with probability $p \in (0, 1)$ , $\begin{matrix} lim_{k \to \infty} \underset{n \to \infty}{lim sup} \frac{E (| ϕ_{k} (x_{1}, x_{2}, \dots, x_{n}) |)}{n} = H (p), \end{matrix}$ where $H (p) = - p log (p) - (1 - p) log (1 - p)$ is the entropy associated with the underlying source.
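To make the gap between von Neumann's procedure and the entropy bound concrete, the following sketch compares the two quantities numerically. The function names are hypothetical, and taking the logarithm in $H (p)$ to be base 2 is an assumption made here so that entropy is measured in bits.

```python
import math

def von_neumann_rate(p):
    # Each pair of tosses yields one output bit with probability 2p(1-p),
    # so the expected output per input toss is p(1-p).
    return p * (1 - p)

def entropy(p):
    # Binary entropy -p log p - (1-p) log(1-p), with log taken base 2 (assumed).
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.1, 0.3, 0.5):
    print(p, von_neumann_rate(p), entropy(p))
```

At $p = \frac{1}{2}$ the first quantity is 1/4 while the entropy is 1, which is the inefficiency that the iterated procedures of Peres remove in the limit.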

The topic of randomness extraction is well-studied in the context of complexity theory, with an emphasis on efficiently simulating a randomized algorithm using a weak source of randomness. Work in this area has led to significant developments, such as the work of Impagliazzo and Wigderson on the $P \neq BPP$ question [15]. For a survey of work in this area, see [1, Chapter 16] or [27].

By contrast, the notion of an extraction rate of an effective procedure has been much less thoroughly studied from the point of view of computability theory (however, see Doty [7] and Toska [30], each of which studies a more local notion of rate of certain procedures applied to specific inputs). In this article, we study a definition of the extraction rate for Turing functionals that accept their input with probability one (referred to as almost total functionals). In particular, we can formalize certain randomness extraction procedures as Turing functionals and study the behavior of these functionals when applied to algorithmically random sequences. For a number of such functionals, it is known that almost every sequence attains the extraction rate; here we provide a sufficient level of algorithmic randomness that guarantees this result.

We consider three main examples here:

functionals defined in terms of maps on $2^{< ω}$ that we call block maps, which generalize von Neumann’s procedure,

functionals derived from certain trees called discrete distribution generating trees (or DDG trees, for short), introduced by Knuth and Yao [18] in the study of non-uniform random number generation, and

a procedure independently developed by Levin [19] and Kautz [16] for converting biased random sequences into unbiased random sequences.

Notably, our analysis of the extraction rates of these three classes of examples draws upon the machinery of effective ergodic theory, using certain effective versions of Birkhoff’s ergodic theorem (and, in the case of the Levin-Kautz procedure, an effective version of the Shannon-McMillan-Breiman theorem from classical information theory due to Hoyrup [14]).

The remainder of the paper is as follows. In Section 2, we lay out the requisite background for this study. Next, in Section 3, we formally define the extraction rate of a Turing functional, derive several preliminary results, and introduce several basic examples. We then turn to more general examples: Turing functionals derived from block maps in Section 4, Turing functionals derived from computable DDG trees in Section 5, and the Levin-Kautz procedure in Section 6. We conclude with several open questions in Section 7.

2. Background

2.1. Notation

The set of finite binary strings will be written as $2^{< ω}$ ; members of $2^{< ω}$ will be written as lowercase Greek letters, σ, τ, ρ, and so on. The set of infinite binary sequences will be written as $2^{ω}$ ; members of $2^{ω}$ will be written as uppercase Roman letters X, Y, Z. For a finite string $σ \in 2^{< ω}$ , let $| σ |$ denote the length of σ. For two strings σ, τ, say that τ extends σ and write $σ ⪯ τ$ if $| σ | ⩽ | τ |$ and $σ (i) = τ (i)$ for $i < | σ |$ . For $X \in 2^{ω}$ , $σ ≺ X$ means that $σ (i) = X (i)$ for $i < | σ |$ . Let $σ^{⌢} τ$ denote the concatenation of $σ, τ \in 2^{< ω}$ ; we similarly define the concatenation $σ^{⌢} X$ of $σ \in 2^{< ω}$ and $X \in 2^{ω}$ . Let $X ↾ n$ denote the string $σ ≺ X$ with $| σ | = n$ . For $n < m$ , $X ↾ [n, m)$ denotes the string $X (n) \dots X (m - 1)$ . The empty string will be written as ϵ.

Two sequences $X, Y \in 2^{ω}$ may be coded together into $Z = X \oplus Y$ , where $Z (2 n) = X (n)$ and $Z (2 n + 1) = Y (n)$ for all n. For a finite string σ, let $[[σ]]$ denote ${X \in 2^{ω} : σ ≺ X}$ . We shall refer to $[[σ]]$ as the cylinder determined by σ. Each such interval is a clopen set and the clopen sets are just finite unions of intervals.

2.2. Trees

A nonempty closed set $P \subseteq 2^{ω}$ may be identified with a tree $T_{P} \subseteq 2^{< ω}$ where $T_{P} = {σ : P \cap [[σ]] \neq \emptyset}$ . Note that $T_{P}$ has no dead ends. That is, if $σ \in T_{P}$ , then either $σ^{⌢} 0 \in T_{P}$ or $σ^{⌢} 1 \in T_{P}$ (or both). For an arbitrary tree $T \subseteq 2^{< ω}$ , let $[T]$ denote the set of infinite paths through T; that is, $[T] = {X \in 2^{ω} : (\forall n) X ↾ n \in T}$ . It is well-known that $P \subseteq 2^{ω}$ is a closed set if and only if $P = [T]$ for some tree T. P is a $Π_{1}^{0}$ class, or an effectively closed set, if $P = [T]$ for some computable tree T.

2.3. Turing functionals

Recall that a continuous function $Φ : 2^{ω} \to 2^{ω}$ may be defined from a function $ϕ : 2^{< ω} \to 2^{< ω}$ , which we refer to as a representation of Φ, satisfying the conditions

(i) For $σ, τ \in 2^{< ω}$ , if $σ ⪯ τ$ , then $ϕ (σ) ⪯ ϕ (τ)$ .

(ii) For all $X \in 2^{ω}$ , ${lim}_{n \to \infty} | ϕ (X ↾ n) | = \infty$ .

Note that, by the compactness of $2^{ω}$ , a representation ϕ of a continuous function Φ satisfies the condition:

For all $m \in ω$ , there exists $n \in ω$ such that for every $σ \in {0, 1}^{n}$ , $| ϕ (σ) | ⩾ m$ .

We then have $Φ (X) = ⋃_{n} ϕ (X ↾ n)$ . The total Turing functionals $Φ : 2^{ω} \to 2^{ω}$ are those which may be defined in this manner from a computable representation $ϕ : 2^{< ω} \to 2^{< ω}$ . We will sometimes refer to total Turing functionals as $tt$ -functionals. The partial Turing functionals $Φ : \subseteq 2^{ω} \to 2^{ω}$ are given by those $ϕ : 2^{< ω} \to 2^{< ω}$ which only satisfy condition (i) (we will still refer to such functions as representations). In this case $Φ (X) = ⋃_{n} ϕ (X ↾ n)$ may be only a finite string.

We set $dom (Φ) = {X : Φ (X) \in 2^{ω}}$ . For $τ \in 2^{< ω}$ we also define $\begin{matrix} Φ^{- 1} (τ) = {σ \in 2^{< ω} : τ ⪯ ϕ (σ) & (\forall σ^{'} ≺ σ) τ ⋠ ϕ (σ^{'})} . \end{matrix}$ In particular, by our above convention, we have $Φ^{- 1} (ϵ) = {ϵ}$ . Similarly, for $S \subseteq 2^{< ω}$ we define $Φ^{- 1} (S) = ⋃_{τ \in S} Φ^{- 1} (τ)$ . For $A \subseteq 2^{ω}$ , we denote by $Φ^{- 1} (A)$ the set ${X \in dom (Φ) : Φ (X) \in A}$ . Note in particular that $Φ^{- 1} ([[τ]]) = [[Φ^{- 1} (τ)]] \cap dom (Φ)$ .

2.4. Computable measures on $2^{ω}$

Recall that a measure μ on $2^{ω}$ is computable if there is a computable function $f : 2^{< ω} \times ω \to Q_{2}$ such that $| μ ([[σ]]) - f (σ, i) | ⩽ 2^{- i}$ . For a prefix-free $V \subseteq 2^{< ω}$ (i.e., for $σ \in V$ , if $σ ≺ τ$ , then $τ \notin V$ ), we set $μ ([[V]]) = \sum_{σ \in V} μ (σ)$ . Hereafter, we will write $μ ([[σ]])$ as $μ (σ)$ for strings σ and $μ ([[V]])$ as $μ (V)$ for $V \subseteq 2^{< ω}$ . We also denote the Lebesgue measure by λ, where $λ (σ) = 2^{- | σ |}$ for $σ \in 2^{< ω}$ .

2.5. Notions of algorithmic randomness

We assume that the reader is familiar with the basics of algorithmic randomness; see, for instance [8,21,28], or the more recent [10]. Let μ be a computable measure on $2^{ω}$ . Recall that a μ-Martin-Löf test is a sequence ${(U_{i})}_{i \in ω}$ of uniformly effectively open subsets of $2^{ω}$ such that for each i, $\begin{matrix} μ (U_{i}) ⩽ 2^{- i} . \end{matrix}$ Moreover, $X \in 2^{ω}$ passes the μ-Martin-Löf test ${(U_{i})}_{i \in ω}$ if $X \notin ⋂_{i \in ω} U_{i}$ . Lastly, $X \in 2^{ω}$ is μ-Martin-Löf random, denoted $X \in {MLR}_{μ}$ , if X passes every μ-Martin-Löf test. When μ is the Lebesgue measure λ, we often abbreviate ${MLR}_{μ}$ by $MLR$ .

We can obtain alternative notions of randomness by modifying the definition of a Martin-Löf test. We will work with two such alternatives in this paper. Let μ be a computable measure on $2^{ω}$ and $X \in 2^{ω}$ .

X is μ-Schnorr random (written $X \in {SR}_{μ}$ ) if and only if X is not contained in any μ-Martin-Löf test ${(U_{i})}_{i \in ω}$ with the additional condition that $μ (U_{i})$ is computable uniformly in i.

X is μ-Kurtz random (written $X \in {KR}_{μ}$ ) if and only if X is not contained in any $Π_{1}^{0}$ class of μ-measure 0 (equivalently, if and only if it is not contained in any $Σ_{2}^{0}$ class of μ-measure 0).

Note that ${MLR}_{μ} \subseteq {SR}_{μ} \subseteq {KR}_{μ}$ for every computable measure μ.

We are particularly interested in the interaction between Turing functionals and computable measures on $2^{ω}$ . For a computable measure μ on $2^{ω}$ , a Turing functional $Φ : 2^{ω} \to 2^{ω}$ is μ-almost total if $μ (dom (Φ)) = 1$ .

Lemma 1.
A Turing functional Φ is μ-almost total if and only if ${KR}_{μ} \subseteq dom (Φ)$ .
Proof.
If ${KR}_{μ} \subseteq dom (Φ)$ , then clearly Φ is μ-almost total. For the other direction, observe that $dom (Φ)$ is a $Π_{2}^{0}$ subset of $2^{ω}$ . Thus, if Φ is μ-almost total, it follows that $2^{ω} ∖ dom (Φ)$ is a $Σ_{2}^{0}$ μ-nullset. Thus if $X \notin dom (Φ)$ , X cannot be μ-Kurtz random. □

3. Extraction rates

3.1. The definition of extraction rate via a representation

We are interested in a version of the use function of a Turing functional Φ which arises from a given representation ϕ. Let $u_{ϕ} (X, n)$ be the least m such that $| ϕ (X ↾ m) | ⩾ n$ . Then the extraction rate of the computation of $Y = Φ (X)$ from X is given by the ratio $\begin{matrix} \frac{n}{u_{ϕ} (X, n)}, \end{matrix}$ that is, the relative amount of input from X needed to compute the first n bits of Y.

There is an alternative definition which is more straightforward. The ϕ-output/input ratio of σ, ${OI}_{ϕ} (σ)$ , is defined to be $\begin{matrix} {OI}_{ϕ} (σ) = \frac{| ϕ (σ) |}{| σ |} . \end{matrix}$
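Both quantities are easy to compute on a finite prefix. The sketch below is an illustration only: representations are modelled as Python functions on strings of `0`s and `1`s, a finite string `X` stands in for an initial segment of an infinite sequence, and the names `use`, `oi`, and `double` are hypothetical.

```python
def use(phi, X, n):
    """Least m with |phi(X[:m])| >= n, or None if the finite prefix X is too short."""
    for m in range(len(X) + 1):
        if len(phi(X[:m])) >= n:
            return m
    return None

def oi(phi, sigma):
    """Output/input ratio |phi(sigma)| / |sigma| for a nonempty string sigma."""
    return len(phi(sigma)) / len(sigma)

def double(sigma):
    # Example representation (assumed for illustration): repeat each input bit twice.
    return "".join(b + b for b in sigma)

X = "0110"
print(use(double, X, 3), oi(double, X))
```

Here three output bits require two input bits, while the output/input ratio of every nonempty string is 2.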

Lemma 2.
For any Turing functional Φ with representation ϕ and any $X \in 2^{ω}$ such that $Φ (X) \in 2^{ω}$ , $\begin{matrix} lim_{n \to \infty} \frac{| ϕ (X ↾ n) |}{n} = lim_{m \to \infty} \frac{m}{u_{ϕ} (X, m)}, \end{matrix}$ provided that both limits exist.
Proof.
Fix an input X. Let $m_{0} = 0$ and let $m_{k + 1}$ be the least $m > m_{k}$ such that $| ϕ (X ↾ m) | > | ϕ (X ↾ m_{k}) |$ . Let $n_{k} = | ϕ (X ↾ m_{k}) |$ . Then for each $k > 0$ , $u_{ϕ} (X, n_{k}) = m_{k}$ and hence $\begin{matrix} {OI}_{ϕ} (X ↾ m_{k}) = \frac{n_{k}}{m_{k}} = \frac{n_{k}}{u_{ϕ} (X, n_{k})}, \end{matrix}$ so that the two sequences have identical infinite subsequences, and hence the limits must be equal (since they are assumed to exist). □

Let us write ${OI}_{ϕ} (X)$ for ${lim sup}_{n \to \infty} {OI}_{ϕ} (X ↾ n)$ ; we refer to this as the ϕ-extraction rate along X. For the specific examples of extraction rates that we calculate in the remaining sections, we will work with specific representations defined from the randomness extraction literature.
3.2. Canonical representations

Suppose that we are given an almost total Turing functional and would like to determine its extraction rate. Which representation should we use? For instance, we would like to say that the extraction rate for a constant function should be very low and should approach 0 in the limit. However, consider the following example.

Example 3.
Let $Φ (X) = 0^{ω}$ for all $X \in 2^{ω}$ and let $ϕ (σ) = 0^{| σ |}$ for all $σ \in 2^{< ω}$ . Then $u_{ϕ} (X, n) = n$ for all n and thus ${lim}_{n \to \infty} \frac{n}{u_{ϕ} (X, n)} = 1$ for all X.

To avoid this problem, we can work with a canonical representation of a Turing functional, which may be defined as follows.
Definition 4.
For any partial continuous function Φ, the canonical representation ϕ for Φ is defined by letting $ϕ (σ)$ be the longest common initial segment of all members of ${Φ (X) : σ ≺ X}$ .
Example 5.

The identity function ϕ on strings is the canonical representation of the identity function on $2^{ω}$ and thus for $X \in 2^{ω}$ , the use $u_{ϕ} (X, n) = n$ for all $n \in ω$ , so that ${lim}_{n \to \infty} \frac{n}{u_{ϕ} (X, n)} = 1$ .

If $Φ (X) = X \oplus X$ , then its canonical representation is given by $ϕ (σ) = σ \oplus σ$ for $σ \in 2^{< ω}$ (where $σ \oplus σ$ is the finite string defined as in the infinite case). Thus for $X \in 2^{ω}$ , ${lim}_{n \to \infty} \frac{n}{u_{ϕ} (X, n)} = 2$ .

Note that if ϕ is the canonical representation for a constant function $Φ (X) = C$ , then we have $ϕ (σ) = C$ , an infinite sequence, for every σ. To avoid this unpleasantness, we can further restrict our functions to the non-constant functions.
Definition 6.
A partial continuous function Φ is nowhere constant if for every $σ \in 2^{< ω}$ , either there is some $X \in [[σ]] ∖ dom (Φ)$ or there exist $X_{1} \neq X_{2}$ in $[[σ]]$ such that $Φ (X_{1}) \neq Φ (X_{2})$ .

It is easy to see that if Φ is nowhere constant, then the canonical representation is a well-defined map taking strings to strings and satisfies condition (i) in the definition of a representation of a functional. Moreover, the canonical representation of a functional has the following nice property, which is immediate from the definition.
Lemma 7.
Let Φ be a partial continuous functional on $2^{ω}$ with canonical representation $ϕ : 2^{< ω} \to 2^{< ω}$ . Then for all σ such that $σ 0, σ 1 \in dom (ϕ)$ , if $ϕ (σ 0) ⪰ τ$ and $ϕ (σ 1) ⪰ τ$ , then $ϕ (σ) ⪰ τ$ .

Next we consider the computability of the canonical representation.
Proposition 8.
If Φ is a total, nowhere constant Turing functional, then the canonical representation ϕ of Φ is computable.
Proof.
Let $Φ : 2^{ω} \to 2^{ω}$ be a total, nowhere constant Turing functional and let $ψ : 2^{< ω} \to 2^{< ω}$ be some computable representation of Φ. Then we can compute, for each m, a value $n_{m}$ such that $| ψ (σ) | ⩾ m$ for all strings σ of length $⩾ n_{m}$ . Now let a string σ be given. Since Φ is nowhere constant, we can compute a least value m such that there exist $τ_{0}, τ_{1} ⪰ σ$ such that $| ψ (τ_{i}) | ⩾ m$ for $i \in {0, 1}$ and $ψ (τ_{0}) \neq ψ (τ_{1})$ . Then the value $ϕ (σ)$ of the canonical representation can be computed by letting $ϕ (σ)$ be the common initial segment $ψ (σ^{'}) ↾ m$ for all $σ^{'}$ of length $n_{m}$ extending σ. □

On the other hand, if Φ is only a partial computable, nowhere constant function, then the canonical representation of Φ need not be computable.
Example 9.
Let E be some noncomputable c.e. set and define $Φ (0^{n} 1 X) = X$ if $n \in E$ and undefined otherwise. Then for the canonical representation ϕ of Φ, we have $ϕ (0^{n} 1 σ) = σ$ if $n \in E$ and $ϕ (0^{n} 1 σ) = ϵ$ , otherwise. We can modify this to get an almost total functional by letting $Φ (0^{n} 1 X) = X$ if either $n \in E$ or if $X (i) = 1$ for some i. In this case, for each $k \in ω$ , we have $ϕ (0^{n} 10^{k}) = 0^{k}$ if $n \in E$ and equals ϵ otherwise.
Proposition 10.
For any partial computable, nowhere constant Turing functional Φ, the canonical representation ϕ is computable in $\emptyset^{'}$ .
Proof.
Let ψ be some computable representation of Φ. Then for the canonical representation ϕ derived from ψ as in the proof of Proposition 8, we have $ϕ (σ) = τ$ if and only if
$(\exists n) (\forall σ^{'} \in {0, 1}^{n}) [σ ≺ σ^{'} ⟹ τ ⪯ ψ (σ^{'})]$ , and

for $i = 0, 1$ , $\neg (\exists n) (\forall σ^{'} \in {0, 1}^{n}) [σ ≺ σ^{'} ⟹ τ^{⌢} i ⪯ ψ (σ^{'})]$ .
Thus the graph of ϕ is a $Σ_{2}^{0}$ set and in fact is a difference of c.e. sets. □

Lastly, we can define the output/input ratio of a Turing functional given in terms of its canonical representation.
Definition 11.
Let Φ be a partial Turing functional with canonical representation ϕ. The Φ-output/input ratio given by σ, ${OI}_{Φ} (σ)$ , is defined to be $\begin{matrix} {OI}_{Φ} (σ) = \frac{| ϕ (σ) |}{| σ |} . \end{matrix}$ Similarly, for $X \in 2^{ω}$ we define ${OI}_{Φ} (X)$ to be $\begin{matrix} \underset{n \to \infty}{lim sup} \frac{| ϕ (X ↾ n) |}{n} . \end{matrix}$ We refer to ${OI}_{Φ} (X)$ as the Φ-extraction rate along X.

3.3. Average output/input ratios

For a given representation ϕ of a Turing functional Φ, we would like to define the average ϕ-output/input ratio. However, such an average depends on an underlying probability measure on $2^{ω}$ . Since we are interested, at least in part, in Turing functionals that extract unbiased randomness from biased random inputs, we need to consider average ϕ-output/input ratios parametrized by an underlying measure.

Definition 12.
Given $ϕ : 2^{< ω} \to 2^{< ω}$ , the average ϕ-output/input ratio for strings of length n with respect to μ, denoted $Avg (ϕ, μ, n)$ , is defined to be $\begin{matrix} Avg (ϕ, μ, n) = \sum_{σ \in 2^{n}} μ (σ) {OI}_{ϕ} (σ) . \end{matrix}$

Equivalently, we have $\begin{matrix} Avg (ϕ, μ, n) = \frac{1}{n} \sum_{σ \in 2^{n}} μ (σ) | ϕ (σ) | . \end{matrix}$ Note that this is the μ-average value of ${OI}_{ϕ} (X ↾ n)$ over the space $2^{ω}$ , since this function is constant on each interval $[[σ]]$ . That is, if we fix n and let $F_{n} (X) = {OI}_{ϕ} (X ↾ n)$ , then $F_{n}$ is a computable map from $2^{ω}$ to $R$ and the average value of $F_{n}$ on $2^{ω}$ is given by $\begin{matrix} \int_{2^{ω}} F_{n} (X) d μ (X) . \end{matrix}$
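For small n the average can be computed by brute-force enumeration of $2^{n}$ . The following sketch does this for a Bernoulli measure; the names `bernoulli`, `avg`, and `von_neumann` are hypothetical, the convention that the bit 1 has probability p is an assumption, and the sample map is von Neumann's procedure applied to pairs of bits.

```python
from itertools import product

def bernoulli(p):
    """Measure of the cylinder [[sigma]] under the p-Bernoulli measure, with P(1) = p."""
    def mu(sigma):
        ones = sigma.count("1")
        return p ** ones * (1 - p) ** (len(sigma) - ones)
    return mu

def avg(phi, mu, n):
    """Avg(phi, mu, n): (1/n) times the sum over sigma in 2^n of mu(sigma) * |phi(sigma)|."""
    total = 0.0
    for bits in product("01", repeat=n):
        sigma = "".join(bits)
        total += mu(sigma) * len(phi(sigma))
    return total / n

def von_neumann(sigma):
    # Pairs 10 -> 0, 01 -> 1; pairs 00 and 11 give no output; a trailing bit is ignored.
    table = {"10": "0", "01": "1", "00": "", "11": ""}
    return "".join(table[sigma[i:i + 2]] for i in range(0, len(sigma) - 1, 2))

for n in (2, 3, 4, 6):
    print(n, avg(von_neumann, bernoulli(0.3), n))
```

The enumeration is exponential in n, so this is only practical for short lengths; it is meant as a check on the definition rather than as a way to compute rates.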

We consider the behavior of this average in the limit, which leads to the following definition (which is adapted from one provided by Peres in [23]).
Definition 13.
For a function $ϕ : 2^{< ω} \to 2^{< ω}$ , the μ-extraction rate of ϕ, denoted $Rate (ϕ, μ)$ , is defined to be $\begin{matrix} Rate (ϕ, μ) = \underset{n \to \infty}{lim sup} Avg (ϕ, μ, n) . \end{matrix}$ In the case that ϕ is the canonical representation of a functional Φ, we further define $\begin{matrix} Rate (Φ, μ) = Rate (ϕ, μ) . \end{matrix}$
Example 14.
Let $Φ (X) = X \oplus X$ , with a representation given by $ϕ (σ) = σ \oplus σ$ (which, as noted above, is the canonical representation of Φ). Then $| ϕ (σ) | = 2 | σ |$ and hence ${OI}_{ϕ} (σ) = 2$ for all strings σ. Thus the average ϕ-output/input ratio is 2. Certainly $u_{ϕ} (X, 2 n) = n$ but at the same time $u_{ϕ} ({(σ \oplus σ)}^{⌢} i) = i + 1$ , so that $u_{ϕ} (X, 2 n - 1) = n$ and hence $\frac{2 n - 1}{u_{ϕ} (X, 2 n - 1)} = 2 - \frac{1}{n}$ . Although these rates do not agree at each finite level, they do agree in the limit. Indeed, since ${OI}_{ϕ} (X ↾ n) = 2$ for all n, we have the limit ${OI}_{ϕ} (X) = 2$ for all X and hence the average output input ratio over all $X \in 2^{ω}$ is $\begin{matrix} Rate (Φ, μ) = Rate (ϕ, μ) = \int_{2^{ω}} {OI}_{ϕ} (X) d μ = 2 . \end{matrix}$ Moreover, ${lim}_{n \to \infty} \frac{n}{u_{ϕ} (X, n)} = 2$ as well.

An interesting problem is to determine for which Turing functionals the limsup in the definition of extraction rate can be replaced with a limit. The following is an example where the limit does not exist.
Example 15.
Given a fixed function $α : ω \to ω ∖ {0}$ , define the total functional $Φ_{α} (X)$ for any input X to be the infinite concatenation of the strings $X {(i)}^{α (i)}$ . Thus if $α (n) = 2$ for all n, then $Φ_{α} (X) = X \oplus X$ . If $α (n) = n + 1$ , then $\begin{matrix} Φ_{α} (X) = X (0) X (1) X (1) X (2) X (2) X (2) X (3) \dots . \end{matrix}$ Now let $α^{*} (n) = \sum_{i < n} α (i)$ , which is a strictly increasing function. It is clear that for any strictly increasing function $β : ω \to ω$ , there is a function α such that $β = α^{*}$ and that, in general, α is computable if and only if $α^{*}$ is computable. Fix α and $β = α^{*}$ and let ϕ be the canonical representation of $Φ_{α}$ . Then we have $| ϕ (X ↾ n) | = β (n)$ for each $n \in ω$ , so that ${OI}_{ϕ} (σ) = \frac{β (n)}{n}$ for any string σ of length n. Now the behavior of this limit is completely arbitrary. For example, let $β (0) = 1$ and let $β (2^{n} + i) = 2^{n + 1} + i$ for all n and for all $i < 2^{n}$ . Then ${OI}_{ϕ} (σ) = 2$ whenever $| σ |$ is a power of 2, but ${OI}_{ϕ} (σ) = \frac{2^{n + 1} + i}{2^{n} + i}$ for $| σ | = 2^{n} + i$ and in particular, if $| σ | = 2^{n + 1} - 1$ , then ${OI}_{ϕ} (σ) = \frac{3 \cdot 2^{n} - 1}{2 \cdot 2^{n} - 1}$ . Thus ${lim sup}_{n \to \infty} Avg (ϕ, μ, n) = 2$ whereas ${lim inf}_{n \to \infty} Avg (ϕ, μ, n) = 1.5$ .
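The oscillation in this example can be checked numerically. The sketch below tabulates the ratio $\frac{β (m)}{m}$ for the specific β defined in the example; the function name `beta` and the cutoff $2^{12}$ are choices made here for illustration.

```python
def beta(m):
    """beta(0) = 1 and beta(2^n + i) = 2^(n+1) + i for 0 <= i < 2^n, as in the example."""
    if m == 0:
        return 1
    n = m.bit_length() - 1   # the unique n with 2^n <= m < 2^(n+1)
    i = m - (1 << n)
    return (1 << (n + 1)) + i

# Output/input ratio for every input length below 2^12.
ratios = {m: beta(m) / m for m in range(1, 2 ** 12)}
print(ratios[2 ** 10], ratios[2 ** 11 - 1])
```

The ratio equals 2 at each power of 2 and then decays toward 1.5 just before the next power of 2, so the sequence of ratios has no limit.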

For ${lim}_{n \to \infty} Avg (ϕ, μ, n)$ to exist, the function ϕ must be regular in the relative amount of input needed for a given amount of output. The authors have studied some families of functions for which this is the case. First, there are the so-called online continuous (or computable) functions, which compute exactly one bit of output for each bit of input (see [5]). On the other hand, there are the random continuous functions which produce regularity in a probabilistic sense. For example, the random continuous functions as defined by Barmpalias et al. [2] produce outputs which are roughly $\frac{2}{3}$ as long, on the average, as the inputs. See also [4].

Another example for which the limsup in the definition of rate is actually a limit is given by the following result.
Lemma 16.
Suppose there exists some $c \in ω$ such that $| ϕ (σ) | ⩽ c | σ |$ for all $σ \in 2^{< ω}$ and that there is some $r \in R$ such that $\begin{matrix} lim_{n \to \infty} \frac{| ϕ (X ↾ n) |}{n} = r \end{matrix}$ for μ-almost every $X \in 2^{ω}$ . Then $Rate (ϕ, μ) = r$ .
Proof.
Since there is some c such that $\frac{| ϕ (X ↾ n) |}{n} ⩽ c$ for all $X \in 2^{ω}$ , by the dominated convergence theorem, $\begin{aligned} r = \int_{2^{ω}} lim_{n \to \infty} \frac{| ϕ (X ↾ n) |}{n} d μ (X) & = lim_{n \to \infty} \int_{2^{ω}} \frac{| ϕ (X ↾ n) |}{n} d μ (X) \\ & = lim_{n \to \infty} Avg (ϕ, μ, n) = Rate (ϕ, μ) . \end{aligned}$ □

In the next two sections, we consider examples of Turing functionals Φ given by representations ϕ for which the following two conditions hold:
(i) ${lim}_{n \to \infty} Avg (ϕ, μ, n)$ exists (for an appropriate choice of the measure μ), and

(ii) ${OI}_{ϕ} (X) = {lim}_{n \to \infty} {OI}_{ϕ} (X ↾ n) = Rate (ϕ, μ)$ for all sufficiently μ-random sequences X.
That is, the extraction rate of ϕ is attained along sufficiently random inputs of Φ. In yet another section, we then consider an example in which condition (ii) holds, but for which the satisfaction of condition (i) is left as an open question.
4. The rate of block functionals

For fixed $n \in ω$ , an n-block map is a function $ϕ : 2^{< ω} \to 2^{< ω}$ satisfying the following property: Given $σ \in 2^{< ω}$ , we first write $σ = {σ_{1}}^{⌢} \dots^{⌢} {σ_{k}}^{⌢} τ$ , where $| σ_{i} | = n$ for $i = 1, \dots, k$ and $| τ | < n$ . Then we have $\begin{matrix} ϕ (σ) = ϕ {(σ_{1})}^{⌢} \dots^{⌢} ϕ (σ_{k}) . \end{matrix}$ That is, the behavior of ϕ is completely determined by its values on strings of length n (and is undefined on all strings of length $k < n$ ). An n-block functional is a Turing functional Φ that has an n-block map ϕ as its canonical representation. In this case we refer to ϕ as the n-block map associated to Φ. (Note that every n-block map can be extended to an $n k$ -block map for $k \in ω$ that induces the same functional. Thus, the requirement that an n-block functional has an n-block map as a canonical representation ensures that an n-block functional isn’t also an $n k$ -block functional for every $k \in ω$ .) We say that an n-block map $ϕ : 2^{< ω} \to 2^{< ω}$ is non-trivial if $| ϕ (σ) | > 0$ for some $σ \in 2^{n}$ .

Block maps show up in the literature on randomness extraction, where typically one attempts to extract a sequence of unbiased random bits from a biased source. For example, the 2-block map $ϕ : 2^{< ω} \to 2^{< ω}$ defined by setting

$ϕ (10) = 0$ ,

$ϕ (01) = 1$ , and

$ϕ (00) = ϕ (11) = ϵ$

is precisely von Neumann’s procedure. Other examples of block maps in the randomness extraction literature are the randomizing functions studied by Elias in [9], the iterations of von Neumann’s procedure studied by Peres in [23], and extracting procedures studied by Pae in [22].
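The passage from a table of values on $2^{n}$ to a map on all finite strings can be sketched as follows. The helper name `block_map` and the representation of the table as a Python dictionary are assumptions made for this illustration; in this sketch a trailing piece of length less than n simply contributes no output.

```python
def block_map(table, n):
    """Extend a table on strings of length n to a map on all binary strings, block by block."""
    def phi(sigma):
        k = len(sigma) // n  # number of complete blocks of length n
        # Concatenate the table values of the complete blocks, in order.
        return "".join(table[sigma[j * n:(j + 1) * n]] for j in range(k))
    return phi

# The 2-block map corresponding to von Neumann's procedure.
von_neumann = block_map({"10": "0", "01": "1", "00": "", "11": ""}, 2)

print(von_neumann("1001110"))
```

Because the output on an extension of σ is obtained by appending further block values, a map built this way is automatically monotone in the sense of condition (i).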

We will determine the extraction rate of an n-block functional with respect to a certain class of measures. An n-step Bernoulli measure is a Bernoulli measure on ${(2^{n})}^{ω}$ . That is, an n-step Bernoulli measure is obtained by taking an infinite product of copies of some fixed measure on the set $2^{n}$ . Clearly, an n-step Bernoulli measure extends naturally to a measure on $2^{ω}$ . Hereafter, we will use the term n-step Bernoulli measure to refer to this extension. Recall that a measure μ on $2^{ω}$ is positive if $μ (σ) > 0$ for all $σ \in 2^{< ω}$ .

Proposition 17.
Let $n \in ω$ . Suppose that μ is a positive n-step Bernoulli measure on $2^{ω}$ and $ϕ : 2^{< ω} \to 2^{< ω}$ is a non-trivial n-block map with associated n-block functional Φ. Then Φ is μ-almost total.
Proof.
Let $S = {τ \in 2^{n} : ϕ (τ) = ϵ}$ , which is not equal to $2^{n}$ since ϕ is non-trivial. Let $U = S^{ω}$ , the set of all infinite sequences built up by concatenating members of S. Since μ is positive and $S \neq 2^{n}$ , $\sum_{τ \in S} μ (τ) < 1$ , from which it follows that $μ (U) = 0$ . Next, for each $σ \in {(2^{n})}^{< ω}$ such that $| σ | = n k$ for some $k \in ω$ , let $U_{σ} = {σ^{⌢} X : X \in U}$ . Clearly $μ (U_{σ}) = μ (σ) \cdot μ (U) = 0$ , since μ is an n-step Bernoulli measure. Then $dom (Φ) = 2^{ω} ∖ ⋃_{σ \in {(2^{n})}^{< ω}} U_{σ}$ , from which it follows that $μ (dom (Φ)) = 1$ . □
Theorem 18.
Let μ be a positive n-step Bernoulli measure on $2^{ω}$ and $ϕ : 2^{n} \to 2^{< ω}$ a non-trivial n-block map with associated n-block functional Φ. Then $\begin{matrix} Rate (Φ, μ) = Rate (ϕ, μ) = Avg (ϕ, μ, n) . \end{matrix}$
Proof.
We first note that if we consider the bits $τ (0), \dots, τ (n - 1)$ of $τ \in 2^{n}$ as a sequence of random variables, then the expected value of $| ϕ (τ) |$ is $\begin{matrix} E (| ϕ (τ (0), \dots, τ (n - 1)) |) = \sum_{σ \in 2^{n}} μ (σ) | ϕ (σ) |, \end{matrix}$ from which it follows that $\begin{matrix} Avg (ϕ, μ, n) = \frac{1}{n} E (| ϕ (τ (0), \dots, τ (n - 1)) |) . \end{matrix}$

For $k \in ω$ , given a string $τ = {τ_{1}}^{⌢} \dots^{⌢} τ_{k}$ of length $n k$ (where $| τ_{i} | = n$ for $i = 1, \dots, k$ ), since μ is an n-step Bernoulli measure, the blocks $τ_{1}, \dots, τ_{k}$ are independent. Thus, the μ-expected number of output bits for a string of length $n k$ is $\begin{aligned} E (| ϕ (τ (0), \dots, τ (n k - 1)) |) & = \sum_{i = 1}^{k} E (| ϕ (τ_{i} (0), \dots, τ_{i} (n - 1)) |) \\ & = \sum_{i = 1}^{k} n \cdot Avg (ϕ, μ, n) = n k \cdot Avg (ϕ, μ, n) . \end{aligned}$ Thus $\begin{matrix} Avg (ϕ, μ, n k) = \frac{1}{n k} E (| ϕ (τ (0), \dots, τ (n k - 1)) |) = Avg (ϕ, μ, n) . \end{matrix}$ For $k \in ω$ and $i < n$ , we have $Avg (ϕ, μ, n k + i) ⩽ Avg (ϕ, μ, n k)$ , since the expected number of output bits for inputs of length $n k + i$ is equal to the expected number of output bits for inputs of length $n k$ . It thus follows that $\begin{matrix} Rate (Φ, μ) = Rate (ϕ, μ) = \underset{k \to \infty}{lim sup} Avg (ϕ, μ, k) = lim_{k \to \infty} Avg (ϕ, μ, n k) = Avg (ϕ, μ, n) . \end{matrix}$ □
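As a worked instance of Theorem 18 (an illustration added here, assuming that μ is the p-Bernoulli measure with $p \in (0, 1)$ , viewed as a positive 2-step Bernoulli measure, and that ϕ is the 2-block map for von Neumann's procedure), the rate can be computed directly from the four values of ϕ on $2^{2}$ :

```latex
\begin{aligned}
\mathrm{Avg}(\phi, \mu, 2)
  &= \tfrac{1}{2} \sum_{\sigma \in 2^{2}} \mu(\sigma)\, |\phi(\sigma)| \\
  &= \tfrac{1}{2} \bigl( \mu(00) \cdot 0 + \mu(01) \cdot 1 + \mu(10) \cdot 1 + \mu(11) \cdot 0 \bigr) \\
  &= \tfrac{1}{2} \cdot 2 p (1 - p) = p (1 - p).
\end{aligned}
```

Under these assumptions $Rate (ϕ, μ) = p (1 - p)$ , which agrees with the count of $\frac{1}{p (1 - p)}$ biased bits per unbiased output bit recalled in the introduction.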
Theorem 19.
Given $n \in ω$ , let μ be a computable, positive n-step Bernoulli measure on $2^{ω}$ , and let $X \in 2^{ω}$ be μ-Schnorr random. Then for every non-trivial n-block map $ϕ : 2^{< ω} \to 2^{< ω}$ with associated n-block functional Φ, $\begin{matrix} {OI}_{Φ} (X) = Rate (Φ, μ) . \end{matrix}$

To prove Theorem 19, we first need to develop some background. Let $T : 2^{ω} \to 2^{ω}$ be the n-shift operator; that is, for $X \in 2^{ω}$ and $σ \in 2^{n}$ , $T (σ^{⌢} X) = X$ . For an n-step Bernoulli measure μ on $2^{ω}$ , T is μ-invariant, i.e., for any $τ \in 2^{< ω}$ , $μ (τ) = μ (T^{- 1} ([[τ]]))$ . Indeed, for any cylinder $[[τ]]$ , $\begin{matrix} T^{- 1} ([[τ]]) = \cup [[{σ τ : σ \in 2^{n}}]] . \end{matrix}$ Thus, $\begin{matrix} μ (T^{- 1} ([[τ]])) = \sum_{σ \in 2^{n}} μ (σ τ) = μ (τ) \sum_{σ \in 2^{n}} μ (σ) = μ (τ) . \end{matrix}$ Recall that a μ-invariant transformation $T : 2^{ω} \to 2^{ω}$ is ergodic if for any $A \subseteq 2^{ω}$ such that $T^{- 1} (A) = A$ , we have $μ (A) = 0$ or $μ (A) = 1$ . The following lemma is a useful characterization of ergodic transformations on $2^{ω}$ (see [29, Theorem 5.1.5, Theorem 6.3.4(1)]).
Lemma 20.
Let μ be a measure on $2^{ω}$ and let $T : 2^{ω} \to 2^{ω}$ be μ-invariant. Then T is ergodic if and only if $\begin{matrix} lim_{n \to \infty} \frac{1}{n} \sum_{i = 0}^{n - 1} μ (T^{- i} [[σ]] \cap [[τ]]) = μ (σ) μ (τ) \end{matrix}$ for all $σ, τ \in 2^{< ω}$ .

The following result is straightforward, but we include it here for the sake of completeness.
Lemma 21.
The n-shift on $2^{ω}$ is ergodic with respect to an n-step Bernoulli measure.
Proof.
We apply Lemma 20. Let $σ, τ \in 2^{< ω}$ be given. Then there is some $k \in ω$ and m with $0 ⩽ m < n$ such that $| τ | = n k + m$ . Then for $j = 2^{n - m}$ , there are strings $τ_{1}, \dots τ_{j}$ of length $n (k + 1)$ such that $[[τ]] = ⋃_{i = 1}^{j} [[τ_{i}]]$ . Then $T^{- (k + 1)} ([[σ]]) = \cup [[{ρ σ : ρ \in 2^{n (k + 1)}}]]$ . Note that for $ρ \in 2^{n (k + 1)}$ , $\begin{matrix} (1) & μ (T^{- (k + 1)} ([[σ]]) \cap [[ρ]]) = μ (σ) μ (ρ) \end{matrix}$ Then we have $\begin{matrix} T^{- (k + 1)} ([[σ]]) \cap [[τ]] = ⋃_{i = 1}^{j} (T^{- (k + 1)} ([[σ]]) \cap [[τ_{i}]]) \end{matrix}$ and hence $\begin{array}{l} μ (T^{- (k + 1)} ([[σ]]) \cap [[τ]]) & = \sum_{i = 1}^{j} μ (T^{- (k + 1)} ([[σ]]) \cap [[τ_{i}]]) \\ = \sum_{i = 1}^{j} μ (σ) μ (τ_{i}) by (1) \\ = μ (σ) μ (τ) \end{array}$ A similar argument shows that $μ (T^{- k^{'}} ([[σ]]) \cap [[τ]]) = μ (σ) μ (τ)$ for all $k^{'} ⩾ k + 1$ . It follows that $\begin{matrix} lim_{n \to \infty} \frac{1}{n} \sum_{i = 0}^{n - 1} μ (T^{- i} [[σ]] \cap [[τ]]) = μ (σ) μ (τ), \end{matrix}$ and hence by Lemma 20, T is ergodic. □

The last ingredient we will use in the proof of Theorem 19 is the following effective version of Birkhoff’s Ergodic Theorem due to Franklin and Towsner:

Theorem 22 (Franklin-Towsner [11]).

Let μ be a computable measure on $2^{ω}$ and let $T : 2^{ω} \to 2^{ω}$ be a computable, μ-invariant, ergodic transformation. Then for any bounded computable function F and any μ-Schnorr random $X \in 2^{ω}$ , $\begin{matrix} lim_{k \to \infty} \frac{1}{k} \sum_{i = 0}^{k - 1} F (T^{i} (X)) = \int F d μ . \end{matrix}$

Proof of Theorem 19.
Let $X \in {SR}_{μ}$ . Given $n \in ω$ , let μ be an n-step Bernoulli measure on $2^{ω}$ and let T be the n-shift. We define $F (X) = \frac{| ϕ (X ↾ n) |}{n}$ . Then $\begin{matrix} \int F d μ = \sum_{σ \in 2^{n}} μ (σ) \frac{| ϕ (σ) |}{n} = Avg (ϕ, μ, n) = Rate (ϕ, μ), \end{matrix}$ where the last equality holds by Theorem 18. Next, for any μ-Schnorr random sequence $X \in 2^{ω}$ , $\begin{matrix} \frac{1}{k} \sum_{i = 0}^{k - 1} F (T^{i} (X)) = \frac{1}{k} \sum_{i = 0}^{k - 1} \frac{| ϕ (T^{i} (X) ↾ n) |}{n} = \frac{1}{n k} \sum_{i = 0}^{k - 1} | ϕ (X ↾ [n i, n (i + 1)) | = \frac{| ϕ (X ↾ n k) |}{n k}, \end{matrix}$ where the last equality follows from the fact that ϕ is an n-block map. Then $\begin{aligned} {OI}_{Φ} (X) & = lim_{n \to \infty} \frac{| ϕ (X ↾ n) |}{n} = lim_{k \to \infty} \frac{| ϕ (X ↾ n k) |}{n k} = lim_{k \to \infty} \frac{1}{k} \sum_{i = 0}^{k - 1} F (T^{i} (X)) \\ = \int F d μ = Rate (ϕ, μ) = Rate (Φ, μ), \end{aligned}$ where the third equality follows from Theorem 22, as the function F is bounded. □
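The two computations in this proof can be illustrated with von Neumann's procedure viewed as a 2-block map. The sketch below is only an illustration under stated assumptions: the input measure is taken to be a Bernoulli measure with $μ (1) = p$ , the block map is applied blockwise, and the names `phi`, `avg`, and `birkhoff_average` are hypothetical.

```python
from fractions import Fraction
from itertools import product

N = 2
# von Neumann's trick as a 2-block map: 01 -> 0, 10 -> 1, 00 and 11 -> empty.
VON_NEUMANN = {"00": "", "01": "0", "10": "1", "11": ""}


def phi(s):
    """Apply the block map to the complete N-blocks of the finite string s."""
    return "".join(VON_NEUMANN[s[i:i + N]] for i in range(0, len(s) - N + 1, N))


def avg(p):
    """Avg(phi, mu, N) = sum over |sigma| = N of mu(sigma) * |phi(sigma)| / N,
    for the Bernoulli measure with mu(1) = p (an assumed input measure)."""
    total = Fraction(0)
    for bits in product("01", repeat=N):
        sigma = "".join(bits)
        weight = Fraction(1)
        for b in sigma:
            weight *= p if b == "1" else 1 - p
        total += weight * len(phi(sigma))
    return total / N


def birkhoff_average(x, k):
    """(1/k) * sum_{i<k} F(T^i X), with F(X) = |phi(X|N)| / N and T the N-shift,
    evaluated on a finite prefix x of length at least N*k."""
    return sum(Fraction(len(phi(x[N * i:N * (i + 1)])), N) for i in range(k)) / k
```

For $p = \frac{1}{3}$ the average is $p (1 - p) = \frac{2}{9}$ , and on any finite prefix of length $2 k$ the Birkhoff average coincides with $\frac{| ϕ (X ↾ 2 k) |}{2 k}$ , as in the second display of the proof.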

5. The rate of functionals induced by DDG-trees

The next example we consider is given in terms of DDG-trees (discrete distribution generating trees), first introduced by Knuth and Yao in [18]. In their study, Knuth and Yao were concerned with generating an arbitrary nonuniform distribution using a random source of bits. Properly speaking, this task of nonuniform random number generation can be seen as a kind of inverse of classical randomness extraction, which is concerned with transforming biased randomness into unbiased randomness. However, in our context, we can view nonuniform random number generation as biased randomness extraction, although the Knuth/Yao framework allows us to efficiently generate unbiased randomness over an arbitrary alphabet (possibly infinite) from unbiased randomness over ${0, 1}$ .

The main idea behind a DDG-tree is that it is a binary tree in which all nodes are either branching nodes or terminal nodes, the latter of which are labelled with symbols from some fixed alphabet A. Each branching node in the tree corresponds to the toss of an unbiased coin. Starting from the root of the tree, we follow the outcome of consecutive coin tosses until we arrive at a terminal node, which is then output. This in turn induces a probability distribution on A.

More precisely, we define a DDG-tree as follows. Given a tree $S \subseteq 2^{< ω}$ consisting of branching nodes and terminal nodes, we label the terminal nodes of S, the set of which is denoted by $D (S)$ , with values from a fixed set $A = {a_{1}, \dots, a_{k}}$ (which we will assume to be finite). In particular, we define a labeling function $ℓ_{S} : D (S) \to A$ such that for all $τ \in D (S)$ , $ℓ_{S} (τ) \in A$ is the label assigned to τ. To ensure we have a probability distribution on A, the labels on S must satisfy the following condition: For $i = 1, \dots, k$ , if we set $\begin{matrix} p_{i} = \sum_{ℓ_{S} (τ) = a_{i}} 2^{- | τ |}, \end{matrix}$ then $\begin{matrix} \sum_{i = 1}^{k} p_{i} = 1 . \end{matrix}$ The distribution ${p_{1}, p_{2}, \dots, p_{k}}$ on A is induced by the following process:

For each branching node in the tree, we use the toss of an unbiased coin to determine which direction we will take.

If we arrive at a terminal node τ, the process outputs $ℓ_{S} (τ)$ .

A DDG-tree S defines a function from $2^{ω}$ to A as follows: For $X \in 2^{ω}$ , the output determined by X is the unique element $a \in A$ such that for some $n \in ω$ , $X ↾ n$ is a terminal node in S labelled with a, if it exists; otherwise, the output is the empty string ϵ. That is, we look for the first n such that $X ↾ n \in D (S)$ , and if such an n exists, we output the value $ℓ_{S} (X ↾ n)$ .

Knuth and Yao define the average running time of randomness extraction by a DDG-tree S to be $\begin{matrix} AvgRT (S) = \sum_{i \in ω} i \cdot λ ([[D (S) \cap 2^{i}]]) . \end{matrix}$ That is, $AvgRT (S)$ is the average number of input bits needed to produce a single output bit.
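To make these definitions concrete, here is a minimal Python sketch for a small finite DDG-tree. The tree `TREE` (terminal nodes 0, 10, 11 with labels a, b, c) and the function names are hypothetical choices made for illustration; finite strings stand in for initial segments of sequences in $2^{ω}$ .

```python
from fractions import Fraction

# Hypothetical DDG-tree, given by its terminal nodes D(S) and their labels.
TREE = {"0": "a", "10": "b", "11": "c"}


def induced_distribution(tree):
    """p_i = sum of 2^{-|tau|} over the terminal nodes tau labelled a_i."""
    dist = {}
    for tau, label in tree.items():
        dist[label] = dist.get(label, Fraction(0)) + Fraction(1, 2 ** len(tau))
    return dist


def avg_running_time(tree):
    """AvgRT(S) = sum_i i * lambda([[D(S) ∩ 2^i]]); since D(S) is prefix-free,
    this is the sum of |tau| * 2^{-|tau|} over the terminal nodes tau."""
    return sum(len(tau) * Fraction(1, 2 ** len(tau)) for tau in tree)


def output(tree, x):
    """Label of the first prefix of x lying in D(S), or '' if there is none."""
    for n in range(len(x) + 1):
        if x[:n] in tree:
            return tree[x[:n]]
    return ""
```

For this tree the induced distribution is $(\frac{1}{2}, \frac{1}{4}, \frac{1}{4})$ and $AvgRT (S) = \frac{3}{2}$ .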

Hereafter we will restrict our attention to computable DDG-trees, where a DDG-tree S is computable if the set $D (S)$ is a computable set and the function $ℓ_{S} : D (S) \to A$ is computable (which together imply that the values $p_{1}, \dots, p_{k}$ assigned to members of A are computable).

We can use a computable DDG-tree S to define a Turing functional as follows. First, for every $σ \in D (S)$ , we set $ϕ_{S} (σ) = ℓ_{S} (σ)$ . Then for any $σ \in 2^{< ω}$ , if σ does not extend any $τ \in D (S)$ , then we set $ϕ_{S} (σ) = ϵ$ . However, if σ extends some $τ \in D (S)$ , then we can write $σ = {σ_{1}}^{⌢} \dots^{⌢} σ_{k}$ , where $σ_{1}, \dots, σ_{k - 1} \in D (S)$ and $σ_{k} \notin D (S)$ (and is possibly empty). Note that this decomposition is unique, as $D (S)$ is prefix-free. Then we set $\begin{matrix} ϕ_{S} (σ) = ϕ_{S} {(σ_{1})}^{⌢} \dots^{⌢} ϕ_{S} {(σ_{k - 1})}^{⌢} ϕ_{S} (σ_{k}) = ℓ_{S} {(σ_{1})}^{⌢} \dots^{⌢} ℓ_{S} {(σ_{k - 1})}^{⌢} ϵ . \end{matrix}$
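The decomposition of σ into elements of $D (S)$ followed by a remainder can be computed greedily, since $D (S)$ is prefix-free. The following Python sketch does this for a hypothetical finite tree; the name `phi_S` mirrors the notation above, while the tree itself and the choice to return the remainder alongside the output are assumptions made for illustration.

```python
# Hypothetical DDG-tree, given by its terminal nodes D(S) and their labels.
TREE = {"0": "a", "10": "b", "11": "c"}


def phi_S(tree, sigma):
    """Return (phi_S(sigma), remainder), where sigma is split as
    sigma_1 ... sigma_{k-1} sigma_k with sigma_1, ..., sigma_{k-1} in D(S)
    and the remainder sigma_k not in D(S) (possibly empty)."""
    out, start = [], 0
    for end in range(1, len(sigma) + 1):
        block = sigma[start:end]
        if block in tree:            # a complete block has been read
            out.append(tree[block])
            start = end
    return "".join(out), sigma[start:]
```

For instance, the string 0101101 splits as 0, 10, 11, 0 with remainder 1, so the output is abca.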

We next extend $ϕ_{S}$ to a Turing functional $Φ_{S} : 2^{ω} \to A^{ω}$ . For $X \in 2^{ω}$ , we define a possibly finite sequence $n_{0}, n_{1}, \dots$ inductively as follows:

$n_{0}$ is the unique n such that $X ↾ n \in D (S)$ , if it exists; otherwise $n_{0}$ is undefined.

Suppose $n_{0}, \dots, n_{k}$ have been defined. Then $n_{k + 1}$ is the unique n such that $X ↾ [n_{k}, n) \in D (S)$ , if it exists; otherwise $n_{k + 1}$ is undefined.

Hereafter we will refer to the sequence of strings ${(X ↾ [n_{k}, n_{k + 1}))}_{k \in ω}$ as the S-blocks of X.

If, for a given $X \in 2^{ω}$ , the corresponding infinite sequence ${(n_{i})}_{i \in ω}$ is defined, then we set $\begin{matrix} Φ_{S} (X) = ℓ_{S} {(X ↾ n_{0})}^{⌢} ℓ_{S} {(X ↾ [n_{0}, n_{1}))}^{⌢} \dots^{⌢} ℓ_{S} {(X ↾ [n_{k}, n_{k + 1}))}^{⌢} \dots \end{matrix}$ In the case that the corresponding sequence of block lengths is finite, then $Φ_{S} (X)$ is undefined.

The issue of determining the canonical representation of a Turing functional defined in terms of a DDG-tree is a delicate one. Knuth and Yao spend a considerable portion of their study [18] on the identification of the DDG-tree that most efficiently induces a distribution on a set A (as well as more general distributions), where this efficiency is given in terms of extraction rate. Hereafter, we will restrict our attention to DDG-trees S that are minimal with respect to extraction rate, which amounts to assuming that the corresponding map $ϕ_{S}$ on $2^{< ω}$ is the canonical representation of the associated Turing functional $Φ_{S}$ . Let us refer to such DDG-trees as minimal DDG-trees.

Proposition 23.
If S is a computable DDG-tree, then the Turing functional $Φ_{S}$ is almost total.
Proof.
First, observe that the collection of cylinders determined by the elements of $D (S)$ yields a set of Lebesgue measure one. Indeed, $\begin{matrix} λ ([[D (S)]]) = \sum_{i = 1}^{k} \sum_{ℓ_{S} (τ) = a_{i}} 2^{- | τ |} = \sum_{i = 1}^{k} p_{i} = 1 . \end{matrix}$ It then follows that the set $P = {X \in 2^{ω} : (\forall n) ϕ_{S} (X ↾ n) = ϵ}$ is a $Π_{1}^{0}$ class of Lebesgue measure zero. As in the proof of Proposition 17, if we set $P_{σ} = {σ^{⌢} X : X \in P}$ , then we have $λ (P_{σ}) = λ (σ) \cdot λ (P) = 0$ . Then $dom (Φ_{S}) = 2^{ω} ∖ ⋃_{σ \in {(D (S))}^{< ω}} P_{σ}$ , and so we have $λ (dom (Φ_{S})) = 1$ . □

We would like to calculate the extraction rate for a Turing functional $Φ_{S}$ induced by a minimal DDG-tree S. To do so, we will first prove the following:
Theorem 24.
Let $X \in 2^{ω}$ be Schnorr random. Then for every computable, minimal DDG-tree S, we have $\begin{matrix} {OI}_{Φ_{S}} (X) = \frac{1}{AvgRT (S)} . \end{matrix}$

To prove Theorem 24, we would like to mimic the proof of Theorem 19. In particular, we need to find an appropriate effective version of Birkhoff’s ergodic theorem to derive the result. However, to do so, we need to define the appropriate measure-preserving transformation.
Definition 25.
Let $S \subseteq 2^{< ω}$ be a tree with $λ ([[D (S)]]) = 1$ . The tree-shift $T_{S} : 2^{ω} \to 2^{ω}$ is defined by setting $T_{S} (X) = Y$ , where $X = σ^{⌢} Y$ and $σ \in D (S)$ . Moreover, in the case that $X ↾ n \notin D (S)$ for all $n \in ω$ , $T_{S} (X)$ is undefined.

Note that if S is a computable DDG-tree, then the associated tree-shift $T_{S}$ is computable by an almost total Turing functional, as $T_{S}$ is defined on $[[D (S)]]$ .
Lemma 26.
If S is a tree with $λ ([[D (S)]]) = 1$ , then the tree-shift $T_{S}$ is λ-invariant and ergodic.
Proof.
First, we show λ-invariance. For $τ \in 2^{< ω}$ , we have $\begin{matrix} T_{S}^{- 1} ([[τ]]) = \cup [[{ρ τ : ρ \in D (S)}]] . \end{matrix}$ Then $\begin{matrix} λ (T_{S}^{- 1} ([[τ]])) = \sum_{σ \in D (S)} λ (σ τ) = \sum_{σ \in D (S)} λ (σ) λ (τ) = λ (τ) \sum_{σ \in D (S)} λ (σ) = λ (τ) . \end{matrix}$

Next, we prove that $T_{S}$ is ergodic. Towards this end, we claim that for every $σ, τ \in 2^{< ω}$ and $n \in ω$ , $λ (T_{S}^{- n} ([[σ]]) \cap [[τ]]) = λ (σ) λ (τ)$ . We show this by induction on n. For the case in which $n = 1$ , given $σ, τ \in 2^{< ω}$ , there is a prefix-free set ${τ_{i}}_{i \in ω} \subseteq D (S)$ such that $⋃_{i \in ω} [[τ_{i}]] \subseteq [[τ]]$ and $\begin{matrix} λ (τ) = \sum_{i \in ω} λ (τ_{i}); \end{matrix}$ that is, $[[τ]] = ⋃_{i \in ω} [[τ_{i}]]$ up to a set of λ-measure zero. Then $\begin{matrix} T_{S}^{- 1} ([[σ]]) \cap [[τ]] = ⋃_{i \in ω} [[τ_{i} σ]] \end{matrix}$ and hence $\begin{matrix} λ (T_{S}^{- 1} ([[σ]]) \cap [[τ]]) = \sum_{i \in ω} λ (τ_{i} σ) = λ (σ) \sum_{i \in ω} λ (τ_{i}) = λ (σ) λ (τ) . \end{matrix}$ Next, suppose that $λ (T_{S}^{- n} ([[σ]]) \cap [[τ]]) = λ (σ) λ (τ)$ . Then $\begin{matrix} T_{S}^{- (n + 1)} ([[σ]]) = T_{S}^{- n} (T_{S}^{- 1} ([[σ]])) = T_{S}^{- n} (\cup [[{ρ σ : ρ \in D (S)}]]) = ⋃_{ρ \in D (S)} T_{S}^{- n} ([[ρ σ]]) . \end{matrix}$ Then $\begin{array}{l} λ (T_{S}^{- (n + 1)} ([[σ]]) \cap [[τ]]) & = \sum_{ρ \in D (S)} λ (T_{S}^{- n} ([[ρ σ]]) \cap [[τ]]) \\ = \sum_{ρ \in D (S)} λ (ρ σ) λ (τ) = λ (σ) λ (τ) \sum_{ρ \in D (S)} λ (ρ) = λ (σ) λ (τ), \end{array}$ where the second equality follows from the inductive hypothesis. It follows that $\begin{matrix} lim_{n \to \infty} \frac{1}{n} \sum_{i = 0}^{n - 1} λ (T_{S}^{- i} [[σ]] \cap [[τ]]) = λ (σ) λ (τ), \end{matrix}$ and thus, by Lemma 20, $T_{S}$ is ergodic. □

The effective version of the ergodic theorem that we will use in the proof of Theorem 24 requires us to introduce some additional notions. First, a function is a.e. computable if it is computable on a $Π_{2}^{0}$ set of Lebesgue measure 1. As noted above, $T_{S}$ is computable on $[[D (S)]]$ , which is a $Σ_{1}^{0}$ class of measure 1, and so it is a.e. computable.

Next, a function $F : 2^{ω} \to R$ is effectively integrable (also $L^{1}$ -computable) if there is a computable sequence of rational step functions ${(s_{n})}_{n \in ω}$ such that $F (X) = {lim}_{n \to \infty} s_{n} (X)$ (whenever $F (X) ↓$ ) and for all $n \in ω$ , $\int | s_{n} - s_{n - 1} | d λ ⩽ 2^{- n}$ ; see, e.g. [24] or [20].

We now can formulate the relevant effective version of Birkhoff’s ergodic theorem, due to Gács, Hoyrup, and Rojas [12] (as observed by Rute [24], Gács, Hoyrup, and Rojas prove a slightly different result, but the proof of their result establishes the following).

Theorem 27 (Effective Birkhoff’s Ergodic Theorem, version 2 [12]).

Let μ be a computable measure on $2^{ω}$ and let $T : 2^{ω} \to 2^{ω}$ be an a.e. computable, μ-invariant, ergodic transformation. Then for any a.e. computable function F that is effectively integrable and any Schnorr random $X \in 2^{ω}$ , $\begin{matrix} lim_{k \to \infty} \frac{1}{k} \sum_{i = 0}^{k - 1} F (T^{i} (X)) = \int F d μ . \end{matrix}$

Proof of Theorem 24.
Let $X \in 2^{ω}$ be Schnorr random. We define $F : 2^{ω} \to ω$ so that $F (X)$ is the unique n such that $X ↾ n \in D (S)$ ; that is, F counts the number of input bits of a given sequence X needed to generate one bit of output using the DDG-tree S. Clearly F is also computable on $[[D (S)]]$ and is thus a.e. computable.

To see that F is effectively integrable, we define a sequence ${(s_{n})}_{n \in ω}$ of rational step functions on $2^{ω}$ as follows: $\begin{matrix} s_{n} (X) = \begin{cases} F (X) & \text{if } (\exists k ⩽ n) \, F (X) ↓ = k \\ n & \text{otherwise.} \end{cases} \end{matrix}$ Observe that $s_{n + 1} (X) ⩽ s_{n} (X) + 1$ , and $s_{n} (X) = s_{n + 1} (X)$ if and only if there is some $k ⩽ n$ such that $X ↾ k \in D (S)$ . For $n \in ω$ , let us set $D (S) ↾ n = {σ \in D (S) : | σ | ⩽ n}$ . Then $\begin{aligned} \int | s_{n} - s_{n - 1} | d λ & = \int (s_{n} - s_{n - 1}) d λ \\ & = 1 \cdot λ (2^{ω} ∖ [[D (S) ↾ (n - 1)]]) + 0 \cdot λ ([[D (S) ↾ (n - 1)]]) \\ & = λ (2^{ω} ∖ [[D (S) ↾ (n - 1)]]) . \end{aligned}$

Since $λ ([[D (S)]]) = 1$ and $D (S)$ is a computable set, the sequence ${(λ (2^{ω} ∖ [[D (S) ↾ n]]))}_{n \in ω}$ is uniformly computable and converges to 0. Thus by choosing an appropriate subsequence of the functions ${(s_{n})}_{n \in ω}$ , it follows that F is effectively integrable.
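The role of the subsequence can be seen in a small example. The Python sketch below uses a hypothetical infinite computable tree with $D (S) = {0, 10, 110, \dots}$ (one terminal node on each level) and computes $\int | s_{n} - s_{n - 1} | d λ$ by enumerating all strings of length n; the function names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product


def is_terminal(s):
    """Membership in D(S) for the hypothetical tree D(S) = {0, 10, 110, ...}."""
    return len(s) >= 1 and s.endswith("0") and "0" not in s[:-1]


def F(x):
    """Least n with x|n in D(S), or None if no prefix of the finite string x
    is a terminal node."""
    for n in range(1, len(x) + 1):
        if is_terminal(x[:n]):
            return n
    return None


def s(n, x):
    """The step function s_n, which depends only on the first n bits of x."""
    value = F(x[:n])
    return value if value is not None else n


def integral_of_difference(n):
    """Integral of |s_n - s_{n-1}| with respect to Lebesgue measure, computed
    by summing over all strings of length n, each of measure 2^{-n}."""
    total = sum(abs(s(n, w) - s(n - 1, w))
                for w in map("".join, product("01", repeat=n)))
    return Fraction(total, 2 ** n)
```

For this tree the integral equals $2^{- (n - 1)}$ , which exceeds the bound $2^{- n}$ required in the definition of effective integrability, whereas the shifted sequence ${(s_{n + 1})}_{n \in ω}$ satisfies it.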

Next, observe that $\begin{array}{l} (2) & \begin{matrix} \int F d λ & = \sum_{σ \in D (S)} \int_{[[σ]]} F d λ = \sum_{i \in ω} i \cdot 2^{- i} \cdot | {σ : σ \in D (S) \cap 2^{i}} | \\ = \sum_{i \in ω} i \cdot λ ([[D (S) \cap 2^{i}]]) \\ = AvgRT (S) . \end{matrix} \end{array}$

Given a Schnorr random $X \in 2^{ω}$ , if we repeatedly apply the tree-shift $T_{S}$ to X followed by the function F, then for $k ⩾ 1$ we have $\begin{matrix} (3) & \sum_{i = 0}^{k - 1} F (T_{S}^{i} (X)) = n_{0} + \sum_{i = 0}^{k - 2} | [n_{i}, n_{i + 1}) | = n_{k - 1}, \end{matrix}$ where ${(n_{i})}_{i \in ω}$ is the sequence determined by the S-blocks of X (the summand $n_{0}$ is the length of the initial block $X ↾ n_{0}$ ). Since $\frac{n_{k - 1}}{k} = \frac{n_{k - 1}}{k - 1} \cdot \frac{k - 1}{k}$ and $\frac{k - 1}{k} \to 1$ , it follows from (3) that $\begin{matrix} (4) & lim_{k \to \infty} \frac{1}{k} \sum_{i = 0}^{k - 1} F (T_{S}^{i} (X)) = lim_{k \to \infty} \frac{n_{k - 1}}{k} = lim_{k \to \infty} \frac{n_{k}}{k} . \end{matrix}$ Thus, $\begin{matrix} lim_{k \to \infty} \frac{n_{k}}{k} = lim_{k \to \infty} \frac{1}{k} \sum_{i = 0}^{k - 1} F (T_{S}^{i} (X)) = \int F d λ = AvgRT (S), \end{matrix}$ where the first equality is (4), the second equality follows from Theorem 27, and the third equality comes from (2).

Lastly, we consider the values $\begin{matrix} \frac{| ϕ_{S} (X ↾ n) |}{n} \end{matrix}$ for $n \in ω$ . Let ${(n_{i})}_{i \in ω}$ be the sequence determined by the S-blocks of X, fix $n ⩾ n_{0}$ , and let k be the maximum value such that $n_{k} ⩽ n$ . Then $X ↾ n$ contains exactly the $k + 1$ complete blocks ending at $n_{0}, \dots, n_{k}$ , so that $\begin{matrix} \frac{| ϕ_{S} (X ↾ n) |}{n} = \frac{k + 1}{n} . \end{matrix}$ Since $n_{k} ⩽ n < n_{k + 1}$ , we have $\begin{matrix} \frac{k + 1}{n_{k + 1}} < \frac{| ϕ_{S} (X ↾ n) |}{n} ⩽ \frac{k + 1}{n_{k}} . \end{matrix}$ As $n \to \infty$ we also have $k \to \infty$ , and both bounds converge to ${lim}_{k \to \infty} \frac{k}{n_{k}}$ : the lower bound is a tail of the sequence ${(\frac{k}{n_{k}})}_{k \in ω}$ , and for the upper bound $\begin{matrix} \frac{k + 1}{n_{k}} = \frac{k}{n_{k}} \cdot \frac{k + 1}{k} . \end{matrix}$ It thus follows that $\begin{matrix} {OI}_{Φ_{S}} (X) = lim_{n \to \infty} \frac{| ϕ_{S} (X ↾ n) |}{n} = lim_{k \to \infty} \frac{k}{n_{k}} = \frac{1}{AvgRT (S)} . \end{matrix}$ □
Corollary 28.
$Rate (Φ_{S}, λ) = \frac{1}{AvgRT (S)}$ .
Proof.
We apply Lemma 16. First, we have $\frac{| ϕ_{S} (σ) |}{| σ |} ⩽ 1$ for every $σ \in 2^{< ω}$ . By Theorem 24, $\begin{matrix} lim_{n \to \infty} \frac{| ϕ_{S} (X ↾ n) |}{n} = {OI}_{Φ_{S}} (X) = \frac{1}{AvgRT (S)} \end{matrix}$ for every Schnorr random sequence. The conclusion immediately follows from Lemma 16. □
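The value $\frac{1}{AvgRT (S)}$ can be compared with an empirical output/input ratio. The Python sketch below is only a heuristic illustration: pseudo-random bits from Python's `random` module are not Schnorr random, and the tree, the seed, and the sample size are arbitrary assumptions.

```python
import random

# Hypothetical DDG-tree, given by its terminal nodes D(S) and their labels.
TREE = {"0": "a", "10": "b", "11": "c"}


def avg_running_time(tree):
    """AvgRT(S) as the sum of |tau| * 2^{-|tau|} over the terminal nodes tau."""
    return sum(len(tau) / 2 ** len(tau) for tau in tree)


def output_length(tree, bits):
    """|phi_S(bits)|: the number of complete S-blocks in the finite string."""
    count, start = 0, 0
    for end in range(1, len(bits) + 1):
        if bits[start:end] in tree:
            count += 1
            start = end
    return count


rng = random.Random(0)  # fixed seed so that the experiment is repeatable
bits = "".join(rng.choice("01") for _ in range(200_000))
empirical_rate = output_length(TREE, bits) / len(bits)
predicted_rate = 1 / avg_running_time(TREE)   # equals 2/3 for this tree
```

On a sample of this size the two quantities are expected to agree to about two decimal places.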

6. The extraction rate of the Levin–Kautz conversion procedure

We now calculate the extraction rate of a general procedure due independently to Levin [19], Kautz [16], Schnorr and Fuchs [26], and Knuth and Yao [18]. In addition, this procedure has been studied in the randomness extraction literature under the label of the interval algorithm (see, for instance, [13]) and is the main idea behind the data compression technique known as arithmetic coding (see [25, Chapter 4]).

Following [3], we will refer to this procedure as the Levin-Kautz conversion procedure. Here we prioritize Levin and Kautz, as they both used this procedure to study the conversion of Martin-Löf random sequences with respect to one measure into Martin-Löf random sequences with respect to another measure. In particular, Levin and Kautz use the procedure to prove the following:

Theorem 29 (Levin [19], Kautz [16]).

For every computable $μ, ν \in P (2^{ω})$ , if $X \in {MLR}_{μ}$ and X is not computable (and in particular, $μ ({X}) = 0$ ), then there is some $Y \in {MLR}_{ν}$ such that $X \equiv_{T} Y$ .

We will give the basic idea of the Levin-Kautz conversion procedure using the succinct approach due to Schnorr and Fuchs [26] in the context of converting biased randomness into unbiased randomness (i.e., randomness with respect to the Lebesgue measure). For computable $μ \in P (2^{ω})$ and $σ \in 2^{< ω}$ , we define two subintervals of $[0, 1]$ : $\begin{matrix} {(σ)}_{λ} = [\sum_{i = 1}^{| σ |} 2^{- i} σ (i), \sum_{i = 1}^{| σ |} 2^{- i} σ (i) + 2^{- | σ |}] \end{matrix}$ and $\begin{matrix} {(σ)}_{μ} = [\sum_{τ <_{lex} σ; | τ | = | σ |} μ (τ), \sum_{τ ⩽_{lex} σ; | τ | = | σ |} μ (τ)] \end{matrix}$ (here $⩽_{lex}$ denotes the lexicographic ordering on strings of a fixed length). We define a Turing functional $Φ_{μ \to λ}$ as follows: For $σ, τ \in 2^{< ω}$ , enumerate $(σ, τ)$ into $S_{Φ_{μ \to λ}}$ if ${(σ)}_{μ} \subseteq {(τ)}_{λ}$ . Thus, given a μ-random sequence X as input, $Φ_{μ \to λ}$ treats X as the representation of some $r \in [0, 1]$ the bit values of which are determined by the values of the measure μ. For instance, the first bit of r is a 0 if $r < μ (0)$ and a 1 if $r > μ (0)$ (and the procedure is undefined if $r = μ (0)$ ). $Φ_{μ \to λ}$ then outputs the standard binary representation of the real number r. One can verify that the resulting Turing functional $Φ_{μ \to λ}$ is μ-almost total and induces the Lebesgue measure on $2^{ω}$ .
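The interval containment condition can be evaluated exactly on finite strings. The following Python sketch computes ${(σ)}_{μ}$ from the conditional probabilities of μ and returns the longest τ with ${(σ)}_{μ} \subseteq {(τ)}_{λ}$ . It assumes that μ is presented by a function `cond0` giving $μ (ρ 0 ∣ ρ) \in (0, 1)$ ; this interface, the helper `bernoulli`, and all function names are hypothetical.

```python
from fractions import Fraction


def mu_interval(sigma, cond0):
    """Endpoints of (sigma)_mu, where cond0(rho) = mu(rho 0 | rho)."""
    left, width = Fraction(0), Fraction(1)
    for i, bit in enumerate(sigma):
        p0 = cond0(sigma[:i])
        if bit == "0":
            width *= p0               # keep the left part of the interval
        else:
            left += width * p0        # skip the mass of the strings below sigma
            width *= 1 - p0
    return left, left + width


def convert_to_lebesgue(sigma, cond0):
    """Longest tau such that (sigma)_mu is contained in (tau)_lambda."""
    left, right = mu_interval(sigma, cond0)
    if left == right:
        raise ValueError("sigma has measure zero; the search would not terminate")
    lo, hi, out = Fraction(0), Fraction(1), []
    while True:
        mid = (lo + hi) / 2
        if right <= mid:              # contained in the left dyadic half
            out.append("0")
            hi = mid
        elif left >= mid:             # contained in the right dyadic half
            out.append("1")
            lo = mid
        else:                         # straddles the midpoint: stop
            return "".join(out)


def bernoulli(p0):
    """Conditional probabilities of the Bernoulli measure with mu(0) = p0."""
    return lambda rho: p0
```

For example, with $μ (0) = \frac{1}{4}$ the input 0 yields the output 00, since ${(0)}_{μ} = [0, \frac{1}{4}] = {(00)}_{λ}$ , whereas the input 1 yields no output, since $[\frac{1}{4}, 1]$ is contained in neither dyadic half of $[0, 1]$ . When $μ = λ$ the procedure returns its input unchanged.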

More generally, for computable measures μ, ν, one can similarly define an almost total functional $Φ_{μ \to ν}$ that transforms μ-randomness into ν-randomness. Moreover, one can verify that, for non-computable sequences $X \in {MLR}_{μ}$ and $Y \in {MLR}_{ν}$ such that $Φ_{μ \to ν} (X) = Y$ ,

$(Φ_{ν \to μ} \circ Φ_{μ \to ν}) (X) = X$ , and

$(Φ_{μ \to ν} \circ Φ_{ν \to μ}) (Y) = Y$ .

Thus, given such a pair X and Y, we clearly have $X \equiv_{T} Y$ .

We will consider this result in the context of strongly positive measures. A measure μ on $2^{ω}$ is strongly positive if there is some $δ \in (0, \frac{1}{2})$ such that for every $σ \in 2^{< ω}$ , $μ (σ 0 ∣ σ) \in [δ, 1 - δ]$ ; that is, all of the conditional probabilities associated with μ are bounded away from 0 and 1 by a fixed distance. The main theorem we will prove in this section is an effective, pointwise version of a result due to Uyematsu and Kanaya [31], who studied the extraction rate of the interval algorithm with respect to a general class of measures, namely the shift-invariant ergodic measures. Recall that for a shift-invariant ergodic measure μ on $2^{ω}$ , the entropy of μ is defined to be $\begin{matrix} h (μ) = lim_{n \to \infty} - \frac{1}{n} \sum_{| σ | = n} μ (σ) log μ (σ) . \end{matrix}$

Theorem 30.
Let μ and ν be computable shift-invariant ergodic measures that are strongly positive. Then for every non-computable $A \in {MLR}_{μ}$ , $\begin{matrix} {OI}_{Φ_{μ \to ν}} (A) = \frac{h (μ)}{h (ν)} . \end{matrix}$ In particular, in the case that $ν = λ$ , we have ${OI}_{Φ_{μ \to λ}} (A) = h (μ)$ .

Several remarks are in order. First, by the Shannon source coding theorem [6, Section 5.10], $\frac{h (μ)}{h (ν)}$ is the optimal rate for converting between μ-randomness and ν-randomness. Second, Han and Hoshi [13] showed that in the case that μ and ν are Bernoulli measures, $Rate (Φ_{μ \to ν}, μ) = \frac{h (μ)}{h (ν)}$ , but in the case that μ and ν are shift-invariant and ergodic, this appears to be open (see [33, Remark 14]).
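For Bernoulli measures, the entropy and the ratio $\frac{h (μ)}{h (ν)}$ are easy to compute numerically. The Python sketch below assumes that logarithms are taken to base 2 and that a Bernoulli measure is specified by $p = μ (1) \in (0, 1)$ ; the function names are hypothetical.

```python
from itertools import product
from math import log2


def bernoulli_entropy(p):
    """Closed form h(mu) = -p log p - (1 - p) log(1 - p), logs to base 2."""
    return -p * log2(p) - (1 - p) * log2(1 - p)


def block_entropy(p, n):
    """-(1/n) * sum over |sigma| = n of mu(sigma) log mu(sigma), the n-th term
    in the definition of h(mu), for the Bernoulli measure with mu(1) = p."""
    total = 0.0
    for bits in product((0, 1), repeat=n):
        weight = 1.0
        for b in bits:
            weight *= p if b else 1 - p
        total -= weight * log2(weight)
    return total / n


def predicted_rate(p, q):
    """h(mu) / h(nu) for Bernoulli measures with mu(1) = p and nu(1) = q."""
    return bernoulli_entropy(p) / bernoulli_entropy(q)
```

For a Bernoulli measure the n-th term does not depend on n, so `block_entropy(p, n)` agrees with the closed form up to rounding; for $p = \frac{1}{4}$ and $ν = λ$ the predicted rate is approximately 0.8113.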

As a first step towards proving Theorem 30, we define an auxiliary function. Given $A \in {MLR}_{μ}$ and $B = Φ_{μ \to ν} (A)$ , let $ϕ_{μ \to ν}$ be the canonical representation of $Φ_{μ \to ν}$ and set $\begin{matrix} g (n) = max {k : ϕ_{μ \to ν} (A ↾ n) ⪰ B ↾ k} . \end{matrix}$ Equivalently, $g (n)$ is the maximum value k such that ${(A ↾ n)}_{μ} \subseteq {(B ↾ k)}_{ν}$ . It follows that the $Φ_{μ \to ν}$ -extraction rate of the computation $Φ_{μ \to ν} (A) = B$ is ${OI}_{Φ_{μ \to ν}} (A) = {lim sup}_{n \to \infty} \frac{g (n)}{n}$ .

We now calculate ${OI}_{Φ_{μ \to ν}} (A)$ for each $A \in {MLR}_{μ}$ . We will make use of two additional results. First, we use the following lemma due to Kautz:
Lemma 31 (Kautz [17]).

Suppose that μ and ν are computable and strongly positive, and let $δ \in (0, \frac{1}{2})$ satisfy the condition that $μ (σ 0 ∣ σ), ν (σ 0 ∣ σ) \in [δ, 1 - δ]$ for every $σ \in 2^{< ω}$ . Suppose further that for $A, B \in 2^{ω}$ we have $Φ_{μ \to ν} (A) = B$ .

(i) For every $n \in ω$ , $\begin{matrix} μ (A ↾ n) ⩽ ν (B ↾ g (n)) . \end{matrix}$

(ii) There exist infinitely many $n \in ω$ such that $\begin{matrix} δ^{2} \cdot ν (B ↾ g (n)) ⩽ μ (A ↾ n) . \end{matrix}$

Next, we use an effective version of the Shannon-McMillan-Breiman theorem due to Hoyrup [14].

Theorem 32 (Hoyrup [14]).

Let μ be a computable shift-invariant ergodic measure on $2^{ω}$ . Then for every μ-Martin-Löf random sequence $X \in 2^{ω}$ , $\begin{matrix} lim_{n \to \infty} \frac{K (X ↾ n)}{n} = lim_{n \to \infty} \frac{- log μ (X ↾ n)}{n} = h (μ) . \end{matrix}$

With these pieces, we now turn to the proof of our theorem.

Proof of Theorem 30.
Let $Φ_{μ \to ν} (A) = B$ for $A \in {MLR}_{μ}$ . By Theorem 29, we have $B \in {MLR}_{ν}$ . Since μ and ν are strongly positive, choose $δ \in (0, \frac{1}{2})$ such that $μ (σ 0 ∣ σ), ν (σ 0 ∣ σ) \in [δ, 1 - δ]$ for every $σ \in 2^{< ω}$ . By part (i) of Lemma 31, we have $\begin{matrix} μ (A ↾ n) ⩽ ν (B ↾ g (n)) \end{matrix}$ for all $n \in ω$ . Applying the negative logarithm to both sides and dividing through by n yields $\begin{matrix} \frac{- log ν (B ↾ g (n))}{n} ⩽ \frac{- log μ (A ↾ n)}{n} \end{matrix}$ for all $n ⩾ 1$ . It thus follows that $\begin{matrix} (5) & \underset{n \to \infty}{lim sup} \frac{- log ν (B ↾ g (n))}{n} ⩽ \underset{n \to \infty}{lim sup} \frac{- log μ (A ↾ n)}{n} . \end{matrix}$ Next, by part (ii) of Lemma 31, we have $\begin{matrix} δ^{2} \cdot ν (B ↾ g (n)) ⩽ μ (A ↾ n) \end{matrix}$ for infinitely many $n \in ω$ . Again, applying the negative logarithm to both sides and dividing through by n yields, for some $c \in ω$ (any $c ⩾ - 2 log δ$ suffices), $\begin{matrix} \frac{- log μ (A ↾ n)}{n} ⩽ \frac{- log ν (B ↾ g (n)) + c}{n} \end{matrix}$ for infinitely many $n \in ω$ . It thus follows that $\begin{aligned} \underset{n \to \infty}{lim inf} \frac{- log μ (A ↾ n)}{n} & ⩽ \underset{n \to \infty}{lim sup} \frac{- log ν (B ↾ g (n)) + c}{n} \\ & ⩽ \underset{n \to \infty}{lim sup} \frac{- log ν (B ↾ g (n))}{n} + \underset{n \to \infty}{lim sup} \frac{c}{n} \\ & = \underset{n \to \infty}{lim sup} \frac{- log ν (B ↾ g (n))}{n} . \end{aligned}$ Combining this inequality with (5), we thus have $\begin{matrix} (6) & \underset{n \to \infty}{lim inf} \frac{- log μ (A ↾ n)}{n} ⩽ \underset{n \to \infty}{lim sup} \frac{- log ν (B ↾ g (n))}{n} ⩽ \underset{n \to \infty}{lim sup} \frac{- log μ (A ↾ n)}{n} . \end{matrix}$ By Theorem 32, since $A \in {MLR}_{μ}$ , it follows from our assumptions on μ that $\begin{matrix} (7) & \underset{n \to \infty}{lim inf} \frac{- log μ (A ↾ n)}{n} = \underset{n \to \infty}{lim sup} \frac{- log μ (A ↾ n)}{n} = lim_{n \to \infty} \frac{- log μ (A ↾ n)}{n} = h (μ) . \end{matrix}$ Combining (6) and (7), we conclude $\begin{matrix} (8) & \underset{n \to \infty}{lim sup} \frac{- log ν (B ↾ g (n))}{n} = h (μ) . \end{matrix}$

Next, we use the fact that for positive sequences ${(a_{n})}_{n \in ω}$ and ${(b_{n})}_{n \in ω}$ such that ${lim}_{n \to \infty} a_{n}$ exists, $\begin{matrix} \underset{n \to \infty}{lim sup} (a_{n} \cdot b_{n}) = (\underset{n \to \infty}{lim sup} a_{n}) (\underset{n \to \infty}{lim sup} b_{n}) \end{matrix}$ along with Equation (8) and the fact that $\begin{matrix} lim_{n \to \infty} \frac{- log ν (B ↾ g (n))}{g (n)} = h (ν), \end{matrix}$ which follows from Theorem 32, to derive the following: $\begin{aligned} h (μ) & = \underset{n \to \infty}{lim sup} \frac{- log ν (B ↾ g (n))}{n} = \underset{n \to \infty}{lim sup} (\frac{- log ν (B ↾ g (n))}{g (n)} \frac{g (n)}{n}) \\ = \underset{n \to \infty}{lim sup} (\frac{- log ν (B ↾ g (n))}{g (n)}) \underset{n \to \infty}{lim sup} (\frac{g (n)}{n}) \\ = lim_{n \to \infty} (\frac{- log ν (B ↾ g (n))}{g (n)}) \underset{n \to \infty}{lim sup} (\frac{g (n)}{n}) \\ = h (ν) \cdot {OI}_{Φ_{μ \to ν}} (A) . \end{aligned}$ From this we can conclude that $\begin{matrix} {OI}_{Φ_{μ \to ν}} (A) = \frac{h (μ)}{h (ν)} . \end{matrix}$ □
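The conclusion of Theorem 30 in the case $ν = λ$ can be illustrated numerically by computing $g (n)$ on a long finite input. The Python sketch below is a heuristic illustration only: μ is taken to be the Bernoulli measure with $μ (0) = \frac{1}{4}$ , the input is pseudo-random rather than Martin-Löf random, logarithms are assumed to be base 2, and the seed, the sample length, and the function name `g` applied to a finite prefix are arbitrary choices.

```python
import random
from fractions import Fraction
from math import log2

P0 = Fraction(1, 4)   # hypothetical choice: mu(sigma 0 | sigma) = 1/4 for all sigma


def g(prefix):
    """Largest k such that (prefix)_mu is contained in (tau)_lambda for some
    tau of length k, i.e. the number of output bits determined by prefix."""
    left, width = Fraction(0), Fraction(1)
    for bit in prefix:
        if bit == "0":
            width *= P0
        else:
            left += width * P0
            width *= 1 - P0
    right = left + width
    lo, hi, k = Fraction(0), Fraction(1), 0
    while True:
        mid = (lo + hi) / 2
        if right <= mid:
            hi = mid
        elif left >= mid:
            lo = mid
        else:
            return k
        k += 1


rng = random.Random(1)  # fixed seed so that the experiment is repeatable
sample = "".join("0" if rng.random() < P0 else "1" for _ in range(2000))
p0 = float(P0)
entropy = -p0 * log2(p0) - (1 - p0) * log2(1 - p0)   # h(mu), about 0.8113
ratio = g(sample) / len(sample)                      # g(n) / n for n = 2000
```

On a sample of this length the ratio $\frac{g (n)}{n}$ is expected to lie close to $h (μ)$ , in line with the theorem, although a single finite experiment of course proves nothing about the limit.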

As noted above after the statement of Theorem 30, determining $Rate (Φ_{μ \to ν}, μ)$ appears to be an open question in the randomness extraction literature. Essentially, this problem boils down to finding a uniform bound on the sequence of functions given by $\frac{| ϕ_{μ \to ν} (X ↾ n) |}{n}$ to apply the dominated convergence theorem to calculate $Rate (Φ_{μ \to ν}, μ)$ .

7. Open questions

We conclude with several open questions. First, there is a general question about generalizing the results from Sections 4, 5, and 6 to apply to a broader class of Turing functionals:

Question 33.

What features of an almost total Turing functional Φ guarantee that $\begin{matrix} {OI}_{Φ} (X) = Rate (Φ, μ) \end{matrix}$ for the appropriate choice of measure μ and all sufficiently μ-random sequences X?

Next, the results we have established in showing the level of randomness that is sufficient for a sequence to witness the extraction rate of a Turing functional do not tell us what level of randomness is necessary for this to hold. Thus we can ask:

Question 34.

For each of the classes of Turing functionals that we have discussed, what is the level of randomness necessary for a sequence to witness the associated extraction rate?

References

1. Arora and Barak, Computational Complexity: A Modern Approach, Cambridge University Press, 2009.

2. Barmpalias, Brodhead, Cenzer, J.B. Remmel and Weber, Algorithmic randomness of continuous functions, Arch. Math. Logic 46(7–8) (2008), 533–546. doi:10.1007/s00153-007-0060-4.

3. Bienvenu and Monin, Von Neumann’s biased coin revisited, in: 2012 27th Annual IEEE Symposium on Logic in Computer Science, IEEE, 2012, pp. 145–154. doi:10.1109/LICS.2012.26.

4. Cenzer and C.P. Porter, Algorithmically random functions and effective capacities, in: Theory and Methods of Computation (TAMC 2015), Lecture Notes in Computer Science, Vol. 9076, Springer Verlag, 2015, pp. 22–37.

5. Cenzer and D.A. Rojas, Online computability and differentiation in the Cantor space, in: Sailing Routes in the World of Computation, Lecture Notes in Comput. Sci., Vol. 10936, Springer, Cham, 2018, pp. 136–145. doi:10.1007/978-3-319-94418-0_14.

6. T.M. Cover and J.A. Thomas, Elements of Information Theory, John Wiley & Sons, 2012.

7. Doty, Dimension extractors and optimal decompression, Theory Comput. Syst. 43 (2008), 425–463. doi:10.1007/s00224-007-9024-7.

8. R.G. Downey and D.R. Hirschfeldt, Algorithmic Randomness and Complexity, Springer, 2010.

9. Elias, The efficient construction of an unbiased random sequence, Ann. Math. Statist. 43 (1972), 865–870. doi:10.1214/aoms/1177692552.

10. J.N.Y. Franklin and C.P. Porter, Key developments in algorithmic randomness, in: Algorithmic Randomness: Progress and Prospects, J.N.Y. Franklin and C.P. Porter, eds, Lecture Notes in Logic, Vol. 50, Cambridge University Press, 2020. doi:10.1017/9781108781718.

11. J.N.Y. Franklin and Towsner, Randomness and non-ergodic systems, Moscow Mathematical Journal 14(4) (2014), 711–744. doi:10.17323/1609-4514-2014-14-4-711-744.

12. Gács, Hoyrup and Rojas, Randomness on computable probability spaces – a dynamical point of view, Theory of Computing Systems 48(3) (2011), 465–485. doi:10.1007/s00224-010-9263-x.

13. T.S. Han and Hoshi, Interval algorithm for random number generation, IEEE Transactions on Information Theory 43(2) (1997), 599–611. doi:10.1109/18.556116.

14. Hoyrup, The dimension of ergodic random sequences, in: 29th International Symposium on Theoretical Aspects of Computer Science, LIPIcs, Leibniz Int. Proc. Inform., Vol. 14, Schloss Dagstuhl. Leibniz-Zent. Inform, Wadern, 2012, pp. 567–576.

15. Impagliazzo and Wigderson, P = BPP if E requires exponential circuits: Derandomizing the XOR lemma, in: Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, 1997, pp. 220–229.

16. S.M. Kautz, Degrees of Random Sets, ProQuest LLC, Thesis (Ph.D.)–Cornell University, Ann Arbor, MI, 1991, p. 129.

17. S.M. Kautz, Resource-bounded randomness and compressibility with respect to nonuniform measures, in: Proceedings of the International Workshop on Randomization and Approximation Techniques in Computer Science, Springer-Verlag, 1997, pp. 197–211. doi:10.1007/3-540-63248-4_17.

18. D.E. Knuth and A.C. Yao, The complexity of nonuniform random number generation, in: Algorithms and Complexity, Proc. Sympos., Carnegie-Mellon Univ., Pittsburgh, Pa., 1976, 1976, pp. 357–428.

19. Levin and A.K. Zvonkin, The complexity of finite objects and the development of the concepts of information and randomness of means of the theory of algorithms, Uspekhi Mat. Nauk 25 (1970), 85–127.

20. Miyabe, $L^{1}$-computability, layerwise computability and Solovay reducibility, Computability 2(1) (2013), 15–29. doi:10.3233/com-13015.

21. Nies, Computability and Randomness, Oxford Logic Guides, Vol. 51, Oxford University Press, 2009, p. xvi+433.

22. S-i. Pae, Binarizations in random number generation, in: 2016 IEEE International Symposium on Information Theory (ISIT), IEEE, 2016, pp. 2923–2927. doi:10.1109/ISIT.2016.7541834.

23. Peres, Iterating von Neumann’s procedure for extracting random bits, Ann. Statist. 20 (1992), 590–597.

24. Rute, Algorithmic randomness and constructive/computable measure theory, in: Algorithmic Randomness: Progress and Prospects, J.N.Y. Franklin and C.P. Porter, eds, Lecture Notes in Logic, Vol. 50, Cambridge University Press, 2020.

25. Sayood, Introduction to Data Compression, Morgan Kaufmann, 2017.

26. C.-P. Schnorr and H.-P. Fuchs, General random sequences and learnable sequences, The Journal of Symbolic Logic 42(3) (1977), 329–340. doi:10.2307/2272862.

27. Shaltiel, An introduction to randomness extractors, in: International Colloquium on Automata, Languages, and Programming, Springer, 2011, pp. 21–41. doi:10.1007/978-3-642-22012-8_2.

28. Shen, V.A. Uspensky and Vereshchagin, Kolmogorov Complexity and Algorithmic Randomness, Vol. 220, American Mathematical Soc., 2017.

29. C.E. Silva, Invitation to Ergodic Theory, Vol. 42, American Mathematical Soc., 2008.

30. Toska, Strict process machine complexity, Arch. Math. Logic 53 (2014), 525–538. doi:10.1007/s00153-014-0378-7.

31. Uyematsu and Kanaya, Almost sure convergence theorems of rate of coin tosses for random number generation by interval algorithm, in: 2000 IEEE International Symposium on Information Theory, IEEE, 2000, p. 457.

32. von Neumann, Various techniques used in connection with random digits, Applied Math Series (1951), 36–38.

33. Watanabe and T.S. Han, Interval algorithm for random number generation: Information spectrum approach, IEEE Transactions on Information Theory (2019).
