For sequences of non-lattice weakly dependent random variables, we obtain asymptotic expansions for Large Deviation Principles. These expansions, commonly referred to as strong large deviation results, are in the spirit of Edgeworth Expansions for the Central Limit Theorem. We show that the results are applicable to Diophantine iid sequences, finite state Markov chains, strongly ergodic Markov chains and Birkhoff sums of smooth expanding maps & subshifts of finite type.
If is a sequence of independent identically distributed (iid) centred random variables with exponential moments, then Cramér’s Large Deviation Principle (LDP) states that if , then for all ,
where . This implies that tail probabilities of sums of iid random variables decay exponentially fast, i.e., for large N.
The following LDP (see [25, Chapter V.6]) provides the log large deviation asymptotics for more general sequences of random variables.
(Gärtner–Ellis).
Letbe a sequence of random variables. Suppose there existssuch that for,where Ω is strictly convex continuously differentiable function with. Then, for all, there existssuch thatwhereand.
It is natural to ask if the asymptotics of the tail probabilities, , could be made more precise. The standard approach to address this would be to look at the pre-exponential factor as well as the asymptotic expansions of the distribution function in the domain of large deviations.
(Strong Asymptotic Expansions for LDP).
Suppose satisfies the LDP with rate function I. Then, admits strong asymptotic expansion of order r for LDP in the range if there are functions for such that for each ,
This idea of expressing the errors in limit theorems as asymptotic expansions goes back to Chebyshev in [9]. In the setting of the Central Limit Theorem (CLT) these expansions, called the Edgeworth expansions, were first discussed rigorously in [10], later in [3,15,16,18,19,27,33,34], and more recently in [14,17,25,26,31]. Such expansions in the Local Limit Theorem (LLT) for iid lattice valued random variables are discussed in [15,27]. In [17,36], the same expansions are considered for weakly dependent lattice random variables.
In the absence of strong asymptotic expansions, weak expansions can be used to describe the asymptotics of large deviations. They are in the spirit of weak Edgeworth expansions in [6,17].
(Weak Asymptotic Expansions for LDP).
Suppose satisfies an LDP with rate function I. Let be a normed space of functions defined on . Then admits weak asymptotic expansion of order r for large deviations in the range for if there are functions (depending on f) for such that for each ,
The asymptotic expansions for LDP have wide range of applications. One such example is the problem of obtaining uniform asymptotics for the solutions of second order parabolic equations with periodic coefficients. In the 1-dimensional case these asymptotics have been obtained in [41]. In the higher dimensions, the main term of the asymptotic expansion valid up to the domain of large deviations was established in [24]. Such results are the key to studying the phenomena of intermittency in branching diffusion processes.
In statisitics, inference models can be improved using these precise large deviation results. See [1,8,28] and references therein. Moreover, one can use these expansions to describe tails of invariant measures of stochastic regression models. [23,30] discuss similar examples. In dynamical systems, the LDP for Birkohff sums is closely related to the problem of finding rates of escape from neighbourhoods of invariant sets. This is described in [42]. Finding exact large deviations gives a better idea of the capacity of invariant sets as a barrier to transport.
Our focus here is to establish natural conditions (in the context of dynamical systems & Markov processes) that guarantee the existence of strong and weak asymptotic expansions for LDP, and to verify them for a wide range of examples.
The first rigorous treatment of exact large deviation asymptotics for sums of iid random variables was done by Cramér in [10] assuming the existence of an absolutely continuous component in the distribution of . In [1], the strong asymptotic expansions of all orders are obtained when satisfies the 0-Diophantine condition, , or when is lattice valued. [8,28] describe the pre-exponential factor in large deviation asymptotics in the non-iid settng under a decay condition on the Fourier–Laplace transform of but do not discuss the higher order corrections. For geometrically ergodic Markov chains, these exact conndtioned are verified in [31].
There is a substantial body of work related to the large deviation asymptotics of densities (whose existence we do not assume). See for example, [12,27,37,39]. In addition, limit theorems and their higher order asymptotics have been studied for random matrix products. For example, LLN, CLT and LDP for random matrix products can be found in [32] and [5, Chapter 5]. In [7], the pre-exponential factor in the LDP is obtained. It should be noted that the general criterion that we have developed here does apply in the setting studied in [7]. In fact, their results follow from Theorem 2.3. For more recent work on random matrix products see [2,21,38].
Even though there is an extensive literature devoted to asymptotic expansions for the LDP, the conditions provided in them for the existence of expansions are far from being optimal. In contrast, the results presented here are sharp. In the classical setting of iid random variables, the conditions required in our paper are much less restrictive than in any of the previous work. Also, we discuss several examples in which the asymptotic expansions up to a finite order r exist while expansions of order do not.
We obtain weak expansions in several cases where strong expansions do not exist. In fact, we believe that our work is the first where the weak expansions are used in the context of large deviations. This is significant because the availability of weak expansions will be crucial in several applications mentioned above. Moreover, we obtain asymptotic expansions in several examples that were inaccessible by previous methods.
The abstract conditions guaranteeing the asymptotic expansions and the main results of the paper are presented in Section 2. Continuous time analogues of these results will be discussed in a sequel to this paper. The conditions we state are an extension of the Nagaev–Guivarc’h criterion, which is often used to establish the CLT for Markov processes and dynamical systems (see [5,20] for details). The idea behind the Nagaev–Guivarch´ approach is to first code the characteristic function of using iterations of an operator – a Markov operator or a Perron–Forbenius transfer operator – and then use the spectral properties of this operator to obtain results about .
We present the proofs of our results in Section 3. There are two key ideas behind these proofs: the Cramér’s transform, which exponentially tilts the distribution function of and the weak Edgeworth expansions for weakly dependent random variables found in [17].
In Section 4.1, we consider the iid setting and recover the results in [1] for non-lattice random variables. In Section 4.2, we provide an affirmative answer to a question raised in [1] about the existence of strong asymptotic expansions for LDPs for iid sequences that are neither 0-Diophantine nor lattice-valued. In Section 4.4, we show that for finite state Markov chains, weak expansions of all orders exist even when strong expansions of sufficiently high order (depending on the number of states) fail to exist. We discuss Markov chains with -densities in Section 4.3. We also discuss strong large deviation results for ergodic averages of smooth expanding maps and subshifts of finite type. These are obtained in Section 4.5 and Section 4.6 as a Corollary of Theorem 2.3.
The coefficients of these asymtptotic expansions are related to the asymptotic moments of the exponentially tilted , and hence to exponential moments of . This relationship is explicit because the coefficients are written as integrals of polynomials with coefficients depending on the exponential moments of . The derivation of polynomials follows a standard argument due to Cramér, and in the non-iid setting these polynomials are described in [17, Section 4] in detail. In fact, a precise description of the coefficients in both weak and strong asymptotic expansions along with an inductive algorithm to compute them are provided there.
Throughout the paper, we assume that N is large enough without explicitly mentioning that we do so, and make no attempt to find optimal constants in the error terms. However, we keep track of how the errors depend on the function in the weak expansion. The letter C is often used to denote constants and may refer to different constants, even in the same sentence. The subscripts present in these constants, like r and a in , describe how the constants depend on parameters.
Main results
Suppose that there exist a Banach space , a family of bounded linear operators , and vectors (the space of bounded linear functionals on ) such that
for for which the following conditions [B] and [C] are satisfied:
Condition [B]: There exists such that
is continuous on the strip and holomorphic on the disc .
For each , the operator has an isolated and simple eigenvalue and the rest of its spectrum is contained inside the disk of radius smaller than (spectral gap). In addition, .
For each , for all real numbers , the spectrum of the operator , denoted by , satisfies: .
For each , there exist positive numbers , and such that
for all , for all .
In the case of ergodic sums of dynamical systems, is the Ruelle–Perron–Forbenius transfer operator. Also, the relation (2.1) takes the form where is a twisted transfer operator, μ is the initial distribution and is the constant function 1. In the case of Markov chains, is the corresponding Markov operator and where is a Fourier kernel associated to and μ, are as before.
Suppose (B4) holds. Let be such that . Then, writing , for all , we have that and
Therefore,
where . Note that by fixing large enough, we can make as large as we want. Hence, given (B4), by reducing by an arbitrarily small quantity and choosing sufficiently large, we may assume is sufficiently large.
As a consequence of (B2), the operator , , takes the form
where is the eigenprojection corresponding to the top eigenvalue and . Due to (B1), we can use perturbation theory of bounded linear operators (see [29, Chapter 7]) to conclude that , and are analytic.
Condition [C]: For all , and .
Without loss of generality, we assume that i.e. is centred to simplify the notation. One can easily reformulate the results for non-centered using the corresponding results for .
Fix . Due to (2.1) and (2.2) we have that
Due to (B2) and (2.2), the spectral radius of is less than . So, . From the condition [C], . Thus, for large enough N,
for some . Therefore
Also, note that is analytic and strictly convex because , is analytic and . Also, (see [17, Section 4]). Now, applying Theorem 1.1, we conclude that satisfies the LDP in (1.2) with .
From the above calculations it is clear that for , and hence, for .
If , then exists and the LDP holds for all . This is because the function defined as is strictly increasing on . In fact, the function f is differentiable on and
Now, for all since is strictly convex. Thus, for all .
In order to state our main results, we introduce the function space given by
where . We call a function f (left) exponential of order α, if . Define the function space by
It is clear that if . Finally, define, .
The following two theorems give higher order asymptotics for the LDP in Theorem 1.1 in the weak and the strong sense, respectively.
Let. Suppose that conditions [B] and [C] hold. Then, for all, there existand polynomialsof degree at most, such that forand, for allwhereand.
We note that for a given a, the polynomials ’s are unique. To see this, fix a. Note that the order r asymptotic expansions for large deviations (strong and weak), if they exist, are unique. Thus, the constants defined as are unique for all k. Assume there exist two polynomials, and with . Since and is dense in , we have for all , we have that for . So, .
Let,. Suppose that conditions [B] and [C] hold with. Then, for all,where.
Moreover, we can evaluate the pre-exponential factor in LDPs under significantly weaker conditions. Namely, we obtain an improved version of Theorem E of [25] with precise asymptotics.
Suppose that (B1), (B2), (B3) and [C] hold. Then, for every,
Analogous results hold for . In fact, one can deduce the corresponding results for by considering and functions that are right exponential of order α. However, for simplicity we focus only on .
Proofs of the main results
Recall from Remark 2.3 that the LDP given by (1.2) holds under the conditions [B] and [C]. That is, given , there exists such that
where . So we fix , and take to denote value of for which is achieved. Since is the unique maximizer of analytic function on , . That is,
Observe that
where . Define, . Then,
From this, we have
The following lemma (whose proof we postpone till the end of the proof of the theorem) allows us to obtain the asymptotics of (3.2).
Suppose conditions [B] and [C] hold. Let. Then, for all, there are polynomialsof degree at most, such that for,,
We refer to this expansion as the weak expansion of for .
Since with , we have that . We show this when and . The argument for general q and r is similar. Suppose, , is continuous and exponential order . It is clear that is continuous. We need to show that , and are absolutely integrable. Since is exponential of order α, given , there exists an such that for all , , and therefore,
In addition, our assumptions imply that . Thus, , which shows that f is also exponential of order α.
Now, it remains to show that . This is true because there is such that for , , and .
Finally, to complete the proof of Theorem 2.1 we apply Lemma 3.1 to . □
For a fixed , from (2.2) and perturbation theory of bounded linear operators (see [29, Chapter 7]), there exists such that for all , can be expressed as
where is the eigenprojection to the top eigenspace of , the spectral radius of is less than , and . In addition, the spectral data are analytic with respect to the perturbation parameter because the perturbations are analytic. That is, , and are analytic in a neighbourhood of (see [29, Chapter 7]).
Iterating (3.3), we obtain
Define and . Then, for all ,
where and .
From (3.1) and the condition [C],
for some . Thus, there exists such that
First, we estimate the contribution to from the region away from . Fix as in (3.7). Due to (B3), the spectral radius of is strictly less than 1. Since is continuous, there exists such that for all (K as in (B4)). Thus,
Due to Remark 2.2, without loss of generality we assume that . From (B4),
Since , we have that and is bounded. Therefore,
Note that the integral is finite since . Combining these estimates, we obtain
From (3.7), we know that for all , . Thus, for , . Therefore,
Choosing , we have
Using (B3) and compactness, there exist and (which do not depend on n and s) such that for all . By (3.5),
Let us focus on the first term of (3.11). Put . Note that is analytic on because is analytic.
Now we are in a position to compute . To this end we make use of ideas in [17]. From (3.6), function can be written as
where ψ denotes the error term, and is analytic. That is
Denote by the order Taylor approximation of ψ. Then, is the unique polynomial such that . Also, and is a polynomial of degree r. In fact, we can write , where is analytic and . Thus,
Denote by the order r Taylor expansion of . Then, and with analytic such that . Now, substituting the Taylor expansions for and , and taking to be the remainder of when approximated by powers of up to order r:
Take . It is clear that is analytic and . Now, collecting terms in the RHS according to ascending powers of we obtain,
Notice that, (as a function) and k (as an integer) have the same parity. To see this, note that for each , ’s are formed by collecting terms with the common factor of . Observe that and are a polynomial in with no constant term, and therefore when we take powers of and , the resulting will contain terms of the form .
Note that . The highest power of s in , , is a result from the term in being raised to its kth power, i.e., above. Thus, are polynomials of degree . The lowest power of s in corresponds to and is equal to k. Next, define by
We write the Taylor approximation of :
where and
Therefore,
where
for large n. Hence,
From (3.12),
for . Substituting this in (3.14),
Since and k have the same parity, if is odd then
So only the positive integer powers of will remain in the expansion. Also, there is C that depends only on r and a such that
Choosing D such that ,
Therefore, fixing D large, we can assume the integrals to be over the whole real line.
Now, define and substitute in (3.16) to obtain
where
Combining, the above with (3.8) and (3.10) we obtain the required result. □
Take to be the distribution function of . Let be a function defined on some finite measure space such that it induces the finite measure on . Note that is not a random variable since the measure it induces on is not a probability measure. Take to be the distribution function of . That is,
Then, from the definition of the operator in (3.5), we obtain for all because
Also, recall that for , where is as in proof of Lemma 3.1,
From (3.12) and the estimate (which is explained below the equation (3.10)) for , we conclude that RHS converges to as . Hence, converges to the function , where and .
We denote the function inducing the measure on the real line by . Thus, converges weakly to .
Observe that
From (3.6), we have , , and therefore
We claim for some . Since has a spectral gap, by perturbation theory, there exists (uniform for ) such that
Therefore,
where Γ is the positively oriented circle centered at with radius ε. Hence,
Now the claim follows from the observation that . So, both the terms on RHS of (3.19) converge to zero as . Therefore, has asymptotic mean a.
We say that admits a strong asymptotic expansion of order r if admits the Edgeworth expansion of order r, i.e., there exist polynomials (whose parity as a function is the opposite of the parity of k) such that
uniformly for , where and . Note that these expansions, if they exist, are unique (the argument in Remark 2.4 applies).
The proof of the existence of the strong expansions is based on two intermediate lemmas (Lemma 3.2 and Lemma 3.3 below). The first lemma establishes that whenever the order r strong asymptotic expansion for exists, lower order weak expansions (as in Lemma 3.1) exist for . It is the Proposition A.1 in [17] adapted to our setting. The second lemma shows that whenever has weak expansions for the corresponding has strong expansions (of the corresponding order) for large deviations. Finally, to prove Theorem 2.2, we have to show that the conditions [B] and [C] imply the existence of strong expansions for .
Suppose thatadmits the order r strong asymptotic expansion. Then there are polynomialssuch thatfor.
Suppose . Define, . Observe that uniformly in x and
where are polynomials given by and Q is such that , i.e.,
Note that and are of opposite parity, because is of degree 1.
Next, we observe that
Now, we integrate by parts and use that (because ) and the fact that , are finite to obtain
From the Plancherel formula,
where and are given by the following relation,
This follows from the basic Fourier identity , and we refer the reader to [15, Chapter III, IV] for a detailed discussion. We also note that, by the uniqueness of expansions, these agree with the ones in (3.15). Also, by construction, and have the same parity. This means has the same parity as k.
Next, replace by in (3.22) to obtain
Then, substituting with its order Taylor expansion,
Put
to obtain
Since k and are of the same parity, when is odd. So we collect terms such that where and write
Then rearranging, simplifying and absorbing higher order terms into the error, we obtain
This is the order weak expansion for . □
Supposeis a sequence insatisfying the following:
There existssuch thatfor all k,
are uniformly bounded in,
pointwise,
For all m,
There existssuch that for all,
Then, for,
From (e) and (a) for ,
Now, (b) and (c) give us that
This along with assumption (d) allow us to take the limit in the RHS of (3.24) and to conclude that
This implies the result. □
We proceed as in Lemma 3.1 (see (3.3)–(3.13)) and obtain the polynomials and . Also, define polynomials and using the relations (3.23) and (3.21), respectively. Then define
Then, is the Fourier transform of . This follows from the definitions of these quantities.
From the Berry–Esséen inequality, [3, Lemma 12.2], for each there exists such that
Note that because . Also, both and are uniformly bounded in s and n. Therefore, choosing ( as in (3.3)), we have
We claim that
for sufficiently small γ. From the definition of ,
where as . As a result, for all the integrand of (3.27) can be made smaller than by choosing γ small enough. This establishes (3.27).
Combining (3.26) and (3.27), we obtain that for small γ,
where .
Take
where K is as in (B4).
Now we estimate the these integrals using (B3) and (B4). Since is a polynomial of as , is bounded uniformly in s and n (say by M). Therefore,
for some . By (B4), with (WLOG) for . Also, by assumption, . Thus,
By (B3), the spectral radius of is strictly less than 1. Since is continuous, there exist and such that for all for large n. Then, for sufficiently large n, we have
Combining the asymptoics for and ,
From (3.28) and (3.29), we deduce that RHS of (3.25) is . Therefore, uniformly in x. Since is uniformly bounded in and we have that decays exponentially fast. Thus, . By the derivation of , it is immediate that this expansion takes the form described in (3.20).
From Lemma 3.2, has the order r weak expansion on . Since where , we have that . Therefore,
for all , .
In particular, this holds for . Let be a sequence such that is a point-wise limit of and ’s satisfy the hypothesis of Lemma 3.3. (We construct such a sequence in Appendix x.) Then, by Lemma 3.3,
That is
□
Note that the coefficients of the strong expansion are obtained by replacing f with in coefficients of the weak expansions. Since ’s are bounded in , we can do this without altering the order of the error. However, for any , is not a pointwise limit of a sequence of functions in with bounded. To observe this, assume that are uniformly bounded and point-wise. Then, for all ,
This implies that for all . Clearly, this is a contradiction. Therefore, Theorem 2.1 does not automatically give us strong expansions. Indeed, in Section 4 we exhibit an example (see Section 4.2.2) where weak expansions exist when strong expansions fail to exist.
The proof of Theorem 2.3 is similar to that of Theorem 2.2. We include it for completeness.
Let . Since (B1) and (B2) hold, as before we have (3.12), where φ is analytic, and . As in the previous proof, Berry–Esséen inequality, [3, Lemma 12.2], given , there exists such that
Since as , we have
Also, we conclude that
as before. Because of (B3), there is such that
Combining these estimates, we conclude that admits the strong expansion of order 1. Therefore, admits the weak expansion order 0 for . As before, approximating by a sequence in , we conclude that
From (3.17), . Then,
From the duality of the Legendre transform, . Hence, we have the required form of the first order expansion. □
(B1) through (B4) with imply that satisfies the conditions (A1) through (A4) in [17] with . We observed above that this is enough to guarantee the existence of the order Edgeworth expansion for . However, we cannot directly apply the results in [17] because does not induce a probability measure.
Examples
iid random variables with Cramér’s condition
Let X be a non-lattice centred random variable whose logarithmic moment generating function is finite in a neighborhood of 0, denoted by J. Let be a sequence of iid copies of X. Then, from [13, Chapter 1], we have the LDP:
where the rate function I is given by
For each , there exists a unique such that .
We further assume that X satisfies the Cramér’s condition. That is,
This is equivalent to X being 0-Diophantine, a notion we define later in (4.4). These conditions are enough to guarantee the existence of weak and strong expansions for large deviations:
Let X be a non-lattice centered random variable whose logarithmic moment generating function is finite in a neighborhood of 0, and which satisfies the Cramér’s condition. Letbe a sequence of iid copies of X. Then, for all r,
admits the weak asymptotic expansion of order r for large deviations forin the range.
admits the strong asymptotic expansion of order r for large deviations in the range.
Take , and . Define acting on by . Then, by the independence of , . Since the moment generating function is finite on J, is analytic on the strip . So we have (2.1) and (B1). The validity of (B2) is immediate because is one-dimensional, for , and .
Take F to be the distribution function of X. For , we define to be a random variable with distribution function given by
Since X is non-lattice, and distribution of has a positive density with respect that of X, we have is also non-lattice. Therefore, for each ,
This is equivalent to (B3).
Since has a positive density with respect that of X, also satisfies the Cramér’s condition (see [1, Lemma 4]). Therefore, (4.3) holds uniformly in . That is, there exist such that for . Therefore, , for . This gives (B4) for arbitrary .
To see that [C] holds, observe that
From the Hölder’s inequality, , and the equality does not occur because X is not constant. Hence, . □
This provides an alternative proof for existence of strong asymptotic expansions for large deviations in [1, Theorem 2 (Case 1)] for iid sequences satisfying Cramér’s condition. We also recover, [1, Theorem 1 (Case 1, 3)], which gives us the first term of the expansions for non-lattice iid sequences.
Let X be a non-lattice centred random variable whose logarithmic moment generating function is finite in a neighborhood of 0. Letbe a sequence of iid copies of X. Then,admits the order 0 strong expansion for large deviations in the range.
To see this we only have to observe that (B1), (B2), (B3) and [C] hold as long as X is non-lattice (we used Cramér’s condition only when we established (B4) in the previous proof). So the result follows from Theorem 2.3. Also, we note that for all θ. Thus, we recover the results in [1] mentioned above. □
Compactly supported l-Diophantine iid random variables
A random variable X is called l-Diophantine if there exist positive constants and C such that
Equivalently, a random variable X with distribution function F is l-Diophantine if and only if there exists such that for all ,
where (see [6]).
Now, we describe two interesting classes of l-Diophantine random variables. In Case I, we discuss an iid sequence of compactly supported and l-Diophantine with () random variables, while in Case II we assume, in addition, that those random variables take finitely many values.
Case I
Let X be compactly supported and l-Diophantine with (). Then, assuming supp ,
where is as in (4.2). Thus, from (4.5), for all ,
So the random variable with distribution function is also l-Diophantine.
Let X be compactly supported and l-Diophantine with. Then,
For all r,admits the weak asymptotic expansion of order r for large deviations for allfor, where, for a suitable α depending on a.
For all,admits the strong asymptotic expansion for large deviations of order r in the range.
Taking as in Section 4.1, we can establish the condition [C], (B1), (B2) and (B3) as in the 0-Diophantine case. (B4) follows from the l-Diophantineness of . In fact,
and hence, it follows that whenever ,
where can be made arbitrarily small. So . □
Case II
Let X be a centred random variable taking values with probabilities , respectively. Then the logarithmic moment generating function of X is finite for all . Take to be a sequence of iid copies of X.
Take , , for and . Then a is called β-Diophantine if there is a constant such that for ,
In the rest of this section we assume that is β-Diophantine. In fact, almost all a are β-Diophantine provided (see [40]). Since a is β-Diophantine, the characteristic function of X satisfies
for some c. This follows from the following Lemma whose proof can be found in [14]).
Let X be a discrete random variable taking valueswith probabilities, respectively, andbe as defined above. Then there exists a positive constant c such that
Now we prove the existence of asymptotic expansions for large deviations in this setting.
Letbe β-Diophantine. Taketo be a sequence of iid copies of X. For all r,admits the weak expansion of order r forfor, where, for suitable α depending on a.
We define as in Section 4.1. Then the conditions (B1), (B2), (B3) and [C] are immediate from Section 4.1.
Due to Lemma 4.4, as a random variable, X is -Diophantine. Since has a positive density with respect that of X, is -Diophantine for all as in Section 4.2.1. That is for all θ, there exists such that
Therefore when , where can be made arbitrarily small. So (B4) holds with . □
However, one can show that strong expansions of order or higher do not exist. To see this, let be sum of n iid copies of (defined in Section 4.1). Note that takes different values. Therefore, has jumps of order . As a result, as , and may differ only by at most . This forces the order of the strong asymptotic expansion to satisfy , which gives us , as required. Thus, this is an example where weak expansions exist even when strong expansions fail to exist.
Time homogeneous Markov chains with smooth density
Take to be a time homogeneous Markov process on a compact connected manifold M with transition density , which is bounded away from 0 (non-degenerate). Let for a function . We assume that cannot be written in the form
where and is lattice valued. The following lemma characterizes such h (see [17]).
(
4.6
) holds iff there existssuch that the functionis lattice valued.
Note that the CLT holds for and the limiting normal distribution is degenerate if and only if (4.6) holds with constant (see [22]). Therefore, in our setting, the CLT is non-degenerate.
We need the following lemma to obtain the condition [B].
Letbe a positivefunction on. Let P be an operator ongiven byThen P has a simple leading eigenvalue, and the corresponding eigenfunction g is positive and.
From the Weierstrass theorem, is a uniform limit of functions formed by finite sums of functions of the form . Therefore, P can be approximated by finite rank operators. So P is compact on . Since P is an operator that leaves the cone of positive functions invariant, by a direct application of Birkhoff Theory (see [4]), P has a leading eigenvalue λ that is positive and simple along with a unique positive eigenfunction g with .
Since, and is k times continuously differentiable in x and is compact, we can differentiate under the integral sign k times. This means g is . □
The next theorem establishes the existence of strong and weak expansions for large deviations in this setting.
Taketo be a time homogeneous Markov chain on a compact connected manifold M withnon-degenerate transition density. Letfor afunctionthat does not satisfy (
4.6
). Takewith. Then, for all r,
admits the weak asymptotic expansion of order r for large deviations in the range, forwithand suitable α depending on a.
admits the strong asymptotic expansion of order r for large deviations in the range.
Take and consider the family of integral operators,
Let μ be the initial distribution of the Markov chain. Then, using the Markov property, we have . Now we check the condition [B].
It is straightforward that is entire and therefore (B1) holds. Note that, for all θ, is of the form P in Lemma 4.7. Therefore, (B2) holds for all θ. Take to be the top eigenvalue and to be the corresponding eigenfunction. Then is .
To show (B3) and (B4), we define a new operator as follows.
It is easy see to that
define a new Markov chain with the associated Markov operator . Observe that is a positive operator and (since is the eigenfunction corresponding to eigenvalue of ).
Now we can repeat the arguments in [17] to establish the properties of the perturbed operator given by
Since (4.6) does not hold, we conclude that (see [17, Section 3.6.3]).
Take to be the operator on that corresponds to multiplication by . Then . Therefore, is the scaled by . This implies as required.
Also, since is , we can integrate by parts, as in [17, Section 3.6.3], to conclude that there exist and such that for all . Therefore,
Now we establish [C]. Since (4.6) does not hold, the asymptotic variance of is positive. Taking to be the top eignevalue of , . Thus,
Put . Since , from (3.6) we have that . Thus, .
Note that , where is the projection onto the top eigenspace. From [25, Chapter III], , where is the top eigenfunction of , the adjoint of . Since itself is a positive compact operator acting on (the space of finitely additive finite signed measures), is a finite positive measure. Hence, for all θ.
The rate function is finite for , where . We observe that because h is bounded, i.e., . In fact,
To see this, note that is subadditive. So exists and is equal to . Given , there exists such that for all . Thus for all , and hence . Next, given , for all n, . Fix n. Then there exists a realization such that . Since h is uniformly continuous on , there exists such that by choosing from a ball of radius δ centred at , we have . We estimate the probability of choosing such a realization and obtain a lower bound for :
Therefore, , as required. □
Finite state Markov chains
Consider a time homogeneous Markov chain with state space whose transition probability matrix is positive. Suppose that is such that there are no constants and a d-vector H such that
for all . Define .
Next, define , and
We further assume that h is β-Diophantine, that is, there exists such that for all ,
If , then almost all h are β-Diophantine (see [40]). These assumptions yield the following result.
Taketo be a time homogeneous Markov chain on {1,…,d} with a positive transition probability matrix. Letforhthat does not satisfy (
4.7
) and is β-Diophantine. Take,. Then, for all r,admits the weak expansion of order r in the range, forwhere, and for suitable α depending on a.
We use ideas from the previous section and [17] to establish conditions [B] and [C].
Consider the family of operators ,
Take and , the initial distribution. Then, from the Markov property, we obtain . Obviously, is entire. So (2.1) and (B1) hold.
The matrix is positive for each θ, and hence, by the Perron–Frobenius theorem, has a positive leading eigenvalue that is simple, and the corresponding eigenvector is positive. In addition, P (corresponding to ) is stochastic. So its top eigenvalue satisfies . Since we deal with finite-dimensional spaces, the remaining part of (B2) follows immediately.
Next, define a new Markov chain corresponding to the stochastic matrix
Then the corresponding operator is
Also, (B3) follows because (4.7) does not hold. For a proof of this fact refer to [17, Section 3.6.2], where (B3) is proven for . From the Diophantine condition (4.8), there exists such that . For a proof of this, refer to [17, Section 3.6.2]. So
Thus, when . Note that the Diophantine nature of the matrix h is independent of the change of measure done in (4.10) and hence, the underlying Markov process. Therefore, the same proof applies to .
Note that , where corresponds to multiplication by . Therefore, for , which gives us (B4) with , where can be made arbitrarily small.
The same argument as in Theorem 4.8 adapted to finite state chains gives us condition [C] and the fact that the range of large deviations is , where
□
However, as in the case of discrete iid random variables, strong expansions of order or higher do not exist because the number of distinct values takes is at most .
Note that in the proof of the previous theorem, the Diophantine nature of h was not used in proving (B1), (B2), (B3) and [C]. Therefore, we also have the following first order asymptotics for large deviations for a general finite state Markov chain.
Taketo be a time homogeneous Markov chain onwith a positive transition probability matrix. Letforhthat does not satisfy (
4.7
). Thenadmits the order 0 strong expansion for large deviations in the rangewhere B is as in (
4.11
).
Sub-shifts of finite type
In this section, we prove an exact Large Deviation Principle for subshifts of finite type (SFT’s). Many concrete dynamical systems like Axiom A diffeomorphisms and Markov maps of the interval can be studied by converting them to SFT’s via a symbolic coding. See, for instance, [35] for a multitude of such examples. Hence, the exact Large Deviation Principle we establish here, applies beyond the setting in which it is introduced.
We recall some facts about SFT’s without proof. [35, Chapters 1–4] contain a detailed account of the theory as well as proofs of the following.
Let A be a matrix with only 0 and 1 as entries. Define
Let act on a sequence by truncating the first symbol and moving remaining elements to the left by one position, i.e., . Then, is called a subshift of finite type (also known as a topological Markov chain). Define the period d of A by and if , A is called aperiodic. Also, A is called irreducible if for all there exists N such that .
The Tychonoff product topology on is metrizable. Let . Then, a metric on Σ can be defined by where is the maximal N such that for all . This induces the product topology on and we consider its restriction to .
Let be continuous and . Take to be the α-Hölder functions with Hölder constant C. Then, if and only if for all . In particular, this characterizes the space of Lipschitz functions (corresponds to ) on which is denoted by . Define, , and . Then, is a Banach space such that -bounded sets are -compact.
From now on we focus only on -valued functions in . A function is called a coboundary if there exist such that , and it is said to be generic if the only solution to in is a constant F and . Note that if f is a coboundary then it is not generic. Given f and g, we say f and g are cohomologous if is a coboundary.
Define the pressure of f by
where is the entropy of σ with respect to μ and is the space of σ-invariant probability measures. Then, there is a unique σ-invariant probability measure m such that , and this m is called the stationary equilibrium state of f, and f, a potential of m. It follows that , and if f and g are cohomologous then . Given a stationary equilibrium state m and any two potentials of m, there is a constant c such that is cohomologous to c. We call f a normalized potential of m if f is a potential of m and . In fact, this potential f is unique upto a coboundary.
Now, we state and prove a strong large deviation result for irreducible, aperiodic SFT’s.
Supposeis a subshift of finite type with an irreducible, aperiodic A. Letbe-valued. Suppose m is the stationary equilibrium state of g and it is normalized. Letbe-valued. Supposeis generic, not cohomologous to a constant and. Define,with initial distribution m,andThen, for all, there exists a constantsuch that
We introduce the family of Ruelle operators , ,
We establish the conditions (B1), (B2), (B3) and [C] for this family of operators. Then, the result follows from Theorem 2.3.
It is straightforward that these are bounded linear operators, and that is analytic. Also,
From Ruelle–Perron–Forbenius Theorem ([35, Theorem 2.2]), for all , has a simple maximal positive eigenvalue given by with a positive eigenfunction and the rest of its spectrum is contained strictly inside . Also, by the choice of normalized potential g. This is (B2).
Since f is generic, for all , does not have eigenvalues on . This follows from the remarks appearing before the Theorem 4.13 in [35]. From [35, Theorem 4.5] if does not have an eigenvalue of modulus then its spectral radius is strictly smaller than . This establishes (B3).
Again, from the Ruelle–Perron–Forbenius Theorem, the projection of to the top eigenspace takes the form where is the equilibrium state of . Hence, . Also, from [35, Proposition 4.12], if and only if f is not cohomologous to a constant. Therefore, we have [C]. □
Smooth expanding maps
Uniformly expanding maps are the most basic type of uniformly hyperbolic systems, and as a result they have been studied extensively. Most of their statistical properties are well-known. See, for example, [20] and references therein. Here we establish an exact Large Deviation Principle for -observables in the setting of -expanding maps of the torus.
Suppose f is smooth and uniformly expanding on , i.e., , and there is such that . Let be such that there is constant c and such that
That is, g is not a continuous coboundary. Take , . If we choose an initial point x according to a probability density then becomes a sequence of random variables. WLOG assume .
Then, the following theorem establishes a strong large deviation result for .
Suppose,and uniformly expanding on. Letbe such that (
4.12
) does not hold. Takewith initial distribution μ. Define,, andwhereandis the Kolmogorov–Sinai entropy. Then, for allthere exists a constantsuch that
Take to be the transfer operator associated with f,
For , define by . That is,
Then, it follows from properties of the transfer operator that
Also, is analytic due to the power series expansion, . Note here that, because and .
From [11, Lemma A.1], we have that for , is of Perron–Forbenius type for all θ and the projection operator to the top-eigenspace takes the form where is positive and is a positive measure. That is for all θ, with where .
We need to verify that and for .
To see the former we note that
and if equality holds then g is a continuous coboundary (see [11, A.12b and Lemma A.16]). Therefore, in our setting, for all θ.
For the latter, we first show that , essential spectral radius of is at most , and there are no eigenvalues on . Observe that
where . From this it follows that
We note that from [11, Remark A.3] the spectral radii of and coincide. Now, from (4.17),
and from (4.16),
Thus, we obtain,
where depends only on s and θ. Since the unit ball in is relatively compact in , we can use [20, Lemma 2.2] to conclude that the essential spectral radius of is at most and the spectral radius of is at most .
Next, we normalize the family of operators ,
Then, where is multiplication by the function . Note that is invertible because . Now, and have the same spectrum. However, the eigenfunction corresponding to the eigenvalue 1 of changes to the constant function .
Assume is an eigenvalue of for . Then, there exists with . Observe that,
Also note that, is a positive operator. Hence, for all n. However,
because is the eigenfunction corresponding to the top eigenvalue. So for all x,
This implies that is constant. WLOG . So we can write for . Then,
Therefore,
for all x. Since,
and are unit vectors, it follows that
for all y. Because LHS is continuous,
Because g is not a continuous coboundary we have a contradiction. Therefore, does not have an eigenvalue on the unit circle when . So does not have eigenvalues on when .
Now, due to Theorem 2.3 the strong large deviation result (4.14) holds with
and
The entropy formulation of , (4.13), can be found in [11, Lemma 6.6]. □
Footnotes
Construction of { f k }
For each k, let for . Extend to in such a way that , is continuously differentiable and satisfying the following conditions.
is increasing on with derivative on is bounded above by 1.
is decreasing on with derivative bounded below by .
on .
on and elsewhere.
Then, is supported on . Here our choice of bounds 1 and in some sense arbitrary. As long as they are large enough and independent of k, we obtain an appropriate sequence of functions. As an example, the graph of looks like:
Since , for all ,
Since on , and is increasing on
Also, note that for all . Hence,
Put and . Then, is finite and depends only on γ and r.
Now, we have the following:
Acknowledgement
The authors would like to thank Dmitry Dolgopyat and Leonid Koralov for useful discussions and suggestions during the project and carefully reading the manuscript. While working on this article, Kasun Fernando was partially supported by NSERC Discovery grant 502617-2017 and Pratima Hebbar was partially supported by the ARO grant W911NF1710419 and the University of Maryland’s Research and Scholarship Award (RASA).
References
1.
R.R.Bahadur and R.Ranga Rao, On deviations of the sample mean, Ann. Math. Statist.31(4) (1960), 1015–1027. doi:10.1214/aoms/1177705674.
2.
Y.Benoist and J.F.Quint, Central limit theorem for linear groups, Ann. Probab.44(2) (2016), 1308–1340. doi:10.1214/15-AOP1002.
3.
R.N.Bhattacharya and R.Ranga Rao, Normal Approximation and Asymptotic Expansions, 1st edn, John Wiley and Sons, 1976.
P.Bougerol and J.Lacroix, Products of Random Matrices with Applications to Schrödinger Operators, 1st edn, Progress in Probability and Statistics, Birkhäuser Basel, Boston, 1985.
6.
E.Breuillard, Distributions diophantiennes et théorèmes limite local sur , Probab. Theory Related Fields132(1) (2005), 39–73.
7.
D.Buraczewsk and S.Mentemeier, Precise large deviation results for products of random matrices, Ann. Inst. H. Poincaré Probab. Statist.52(3) (2016), 1474–1513. doi:10.1214/15-AIHP684.
8.
N.R.Chaganty and J.Sethuraman, Strong large deviation and local limit theorems, Ann. Probab.21(3) (1993), 1671–1690. doi:10.1214/aop/1176989136.
9.
P.L.Chebyshev, Sur deux théorèmes relatifs aux probabilités, Acta mathematica14 (1890), 305–315.
10.
H.Cramér, Sur un nouveau theorémè-limite de la thèorie des probabilités, in: Actualités Scientifiques et Industrielles, Vol. 736, Hermann & Cie, Paris, 1938, pp. 2–23.
11.
J.De Simoi and C.Liverani, Limit theorems for fast-slow partially hyperbolic systems, Invent. Math.213 (2018), 811–1016.
12.
D.Deltuviene and L.Saulis, Asymptotic expansion of the distribution density function for the sum of random variables in the series scheme in large deviation zones, Acta Applicandae Mathematicae78 (2003), 87–97. doi:10.1023/A:1025783905023.
13.
F.den Hollander, Large Deviations, Fields Institute Monographs, Vol. 14. American Mathematical Society, Providence, RI, 2000.
14.
D.Dolgopyat and K.Fernando, An error term in the Central Limit Theorem for sums of discrete random variables, preprint.
15.
C.-G.Esséen, Fourier analysis of distribution functions. A mathematical study of the Laplace–Gaussian law, Acta Math.77 (1945), 1–125. doi:10.1007/BF02392223.
16.
W.Feller, An Introduction to Probability Theory and Its Applications Vol. II, 2nd edn, John Wiley & Sons, Inc., New York–London–Sydney, 1971.
17.
K.Fernando and C.Liverani, Edgeworth expansions for weakly dependent random variables, to appear in Ann I. H. Poincare-PR.
18.
B.V.Gnedenko and A.N.Kolmogorov, Limit Distributions for Sums of Independent Random Variables, Revised edn, Addison-Wisely, Reading MA, 1967. Translated from the Russian, annonated, and revised by K. L. Chung. With Appendices by J. L. Doob and P. L. Hsu.
19.
F.Götze and C.Hipp, Asymptotic expansions for sums of weakly dependent random vectors, Z. Wahrscheinlickeitstheorie Verw.64 (1983), 211–239. doi:10.1007/BF01844607.
20.
S.Gouezel, Limit theorems in dynamical systems using the spectral method. Hyperbolic dynamics, fluctuations and large deviations, in: Proc. Sympos. Pure Math., Vol. 89, AMS, Providence, RI, 2015, pp. 161–193.
21.
I.Grama, E.Le Page and M.Peigné, Conditioned limit theorems for products of random matrices, Prob. Theory and Relat. Fields168 (2017), 601–639. doi:10.1007/s00440-016-0719-z.
22.
Y.Guivarc’h and J.Hardy, Théorèmes limites pour une classe de chaînes de Markov et applications aux difféomorphismes d’Anosov, Annales de l’I. H. P. Probabilités et statistiques24(1) (1988), 73–98.
23.
Y.Guivarc’h and E.Le Page, Spectral gap properties for linear random walks and Pareto’s asymptotics for affine stochastic recursions, Ann. Inst. H. Poincaré Probab. Statist.52(2) (2016), 503–574. doi:10.1214/15-AIHP668.
24.
P.Hebbar, L.Koralov and J.Nolen, Branching diffusion processes in periodic media, 2019, preprint.
25.
H.Hennion and L.Hervé, Limit Theorems for Markov Chains and Stochastic Properties of Dynamical Systems by Quasi-Compactness, 1st edn, Lecture Notes in Mathematics, Springer-Verlag, Berlin Heidelberg, 2001.
26.
L.Hervé and F.Pène, The Nagaev–Guivarc’h method via the Keller–Liverani theorem, Bull. Soc. Math. France138(3) (2010), 415–489. doi:10.24033/bsmf.2594.
27.
I.A.Ibragimov and Y.V.Linnik, Independent and Stationary Sequences of Random Variables, Wolters-Noordhoff Publishing, Groningen, 1971, With a supplementary chapter by I. A. Ibragimov and V. V. Petrov. Translation from the Russian edited by J. F. C. Kingman.
28.
C.Joutard, Strong large deviations for arbitrary sequences of random variables, Ann. Inst. Stat. Math.65(1) (2013), 49–67. doi:10.1007/s10463-012-0361-1.
29.
T.Kato, Perturbation Theory for Linear Operators, Classics in Mathematics, Springer-Verlag, Berlin, 1995, reprint of the 1980 edition.
30.
H.Kesten, Random difference equations and renewal theory for products of random matrices, Acta Math.131 (1973), 207–248. doi:10.1007/BF02392040.
31.
I.Kontoyiannis and S.P.Meyn, Spectral theory and limit theorems for geometrically ergodic Markov chains, Ann. App. Prob.13(1) (2003), 304–362. doi:10.1214/aoap/1042765670.
32.
E.Le Page, Théorèmes limites pour les produits de matrices alèatoires, in: Probability Measures on Groups, Springer, Berlin Heidelberg, 1982, pp. 258–303.
33.
S.V.Nagaev, Some limit theorems for stationary Markov chains, Theory Probab. Appl.2(4) (1959), 378–406. doi:10.1137/1102029.
34.
S.V.Nagaev, More exact statement of limit theorems for homogeneous Markov chain, Theory Probab. Appl.6(1) (1961), 62–81. doi:10.1137/1106005.
35.
W.Parry and M.Pollicott, Zeta functions and the periodic orbit structure of hyperbolic dynamics, in: Astérisque, Société Mathématique de France, 1990, pp. 187–188.
36.
F.Pène, Mixing and decorrelation in infinite measure: The case of periodic Sinai billiard, Ann. Inst. H. Poincaré Probab. Statist.5(1) (2019), 378–411. doi:10.1214/18-AIHP885.
37.
V.V.Petrov, Sums of Independent Random Variables, Springer-Verlag, 1975.
38.
C.Pham, Conditioned limit theorems for products of positive random matrices, ALEA, Lat. Am. J. Probab. Math. Stat.15 (2018), 67–100. doi:10.30757/ALEA.v15-04.
39.
L.Saulis and V.A.Statulevicius, Limit Theorems for Large Deviations, Kluwer Academic Publishers, 1991.
40.
V.G.Sprindzuk, Metric Theory of Diophantine Approximations, Scripta Series in Math., V. H. Winston & Sons, Washington, DC, 1979.
41.
T.Tsuchida, Long-time asymptotics of heat kernels for one-dimensional elliptic operators with periodic coefficients, Proc. London Math. Soc.97(2) (2008), 450–476. doi:10.1112/plms/pdn014.
42.
L.-S.Young, Some large deviation results for dynamical systems, Trans. of the AMS318(2) (1990), 525–543. doi:10.1090/S0002-9947-1990-0975689-7.