Abstract
Trans-Gaussian Kriging is a popular method for predicting unobserved values of a nonstationary and non-Gaussian spatial process. However, the predictions generated by Trans-Gaussian Kriging are often biased, primarily because these are produced by applying nonlinear transformations to unbiased optimal predictors of the underlying Gaussian field. The existing approach to bias correction is based on analytical approximations that only alleviate the bias problem partially. In this paper, we formulate a bootstrap method for bias correction and show that under some conditions, it yields asymptotically unbiased predictions. Results from a moderately large simulation show that the proposed method works well in reducing the bias in finite samples. A real data example is also presented illustrating the methodology.
1 Introduction
Nonstationary and non-Gaussian spatial data appear frequently in practice. Although there is well established theory of unbiased prediction for intrinsically stationary spatial processes (see, for example, Cressie, 1991; Stein, 1999), the literature on unbiased prediction is parse for non-Gaussian nonstationary spatial processes. The most common approach to predicting a non-Gaussian spatial process at an unobserved location is Trans-Gaussian Kriging, where the observations are first transformed to a Gaussian process through a (known) link function and then standard theory of unbiased prediction or Kriging is applied to the Gaussian process. The resulting predicted value is then transformed back to the original scale by using the inverse of the link function. This yields a reasonable prediction of the original spatial process. However, the resulting naive predictor is biased, simply due to the fact that in general, E(φ(X)) ≠ φ(E(X)) for a nonlinear function φ.
The reasons for opting for unbiased estimators and predictors are well documented in the literature (cf. Lehmann and Casella, 1998). A statistical procedure with a nontrivial bias produces a distorted picture of the population and may lead to very unreasonable conclusions. To eliminate the bias, Cressie (1991) suggested an additive correction factor based on Taylor’s series expansion and some analytical considerations. Here we formulate a new methodology for bias correction based on the bootstrap that does not require any analytical consideration on the part of the user. The proposed methodology generates bias-corrected predictors essentially using an algorithm that can be implemented quite generally and that is applicable to a large class of nonlinear transformations. However, the method involves computation and provides a way to bypass the required analytical derivations, which are often intractable.
The main results of the paper address the bias properties of the existing as well as new bias-corrected predictors. In Section 4, we show that the bias-corrected predictor based on the Taylor’s series expansion eliminates the bias only partially, except for some very special types of transformations, namely, for low order polynomials. In comparison, we show that under some regularity conditions, the proposed bootstrap based method removes the bias of the naive predictor entirely in the ideal case where some of the unknown process parameters are known, and asymptotically when the unknown parameter are replaced by their consistent estimators. This is an important advancement over the existing solutions in the literature as these do not produce an asymptotically unbiased predictor of the non-Gaussian process in general. Simulation results indicate that the proposed bias corrections do not inflate the mean square prediction errors (MSPEs) by a significant amount. Indeed, the bootstrap based bias-corrected predictors have comparable (and sometimes smaller) MSPEs than the existing predictors.
The layout of the paper is as follows. In Section 2, we briefly describe the Trans-Gaussian Kriging method and present the bias-corrected predictor based on Taylor’s expansions. In Section 3, we describe the bootstrap based bias-corrected predictors—both when the process parameters are known and when these are unknown. Section 4 states the theoretical properties of the bias-corrected estimators. Sections 5 and 6 respectively, present results from a moderately large simulation study and an illustrative example involving real data from Oklahoma on the prediction of Heating Degree Days (HDDs). Proofs of the technical results are given in Section 7.
2 Background
2.1 Basic framework
Let Z(⋅) be a spatial process on a domain D ⊂
d
and let Z ≡ {Z(s
i
): i 1, …, n} denote the observations, where the sites s1, …, s
n
either lie on a regular grid (in which case, we take D ≡
d
, the d-dimensional integer grid) or are irregularly spaced (in which case D ≡
d
). Furthermore, let s0 ∈ D be a site such that Z(s0) is unobserved. We shall suppose that the observable process Z(⋅) is such that for a known invertible link function φ(⋅),
with
where µ(s; β) is a non-random function and where {(s): s ∈ D} is a zero-mean, second order stationary Gaussian process with variogram 2γ (⋅; θ), ≡ E((⋅) – (0))2 and covariogram C(⋅; θ) ≡ E(⋅) (0). Here we suppose that µ(⋅; β) and 2γ(⋅; θ) (and hence C(⋅; θ) are known, except for the finite dimensional mean parameters β ∈ Band covariance parametersθ∈ Θ. We now give two common examples of such Trans-Gaussian spatial processes.
Suppose that the Z(⋅) variables are positive with probability one and let
be a Gaussian process as in (2.1). Then Z(⋅) is a Log-Normal process.
Let Z(s) denote a (0, 1) valued random variable, e.g., a proportion, such as the mortality rate due to a certain disease. Let
where logit z log (z/[1–z]), z ∈ (0, 1) and where Y(s) is a Gaussian process as in (2.1). Then Z(⋅) is a Logit-Normal process.
2.2 Trans-Gaussian Kriging
Next we briefly describe the Trans-Gaussian Kriging method (cf. Cressie, 1991). The basic idea is to use the inverse link function to transform the observed spatial process to a Gaussian process and then detrend the data and perform Kriging on the residuals. To fix ideas, first consider the ‘ideal’ case where the parameters β andθare known. In this case, the exact mean of Y(⋅)’s are known and hence, we may work with the Gaussian error variables (s) Y(s) – µ(s; β) directly. Let
where λ′ (λ1, …, λn) are the Ordinary Kriging weights given by (cf. Cressie, 1991 p. 123)
where γ ≡ (C(s0 – s1; θ), …, C(s0 – s
n
; θ))′ and Σ be the n × n matrix with (i, j)th entry C(s
i
– s
j
; θ) and 1 is the n ×1 vector of 1 s. The ‘natural’ but biased Trans-Gaussian Kriging predictor of Z(s0) in the ‘ideal’ case is then given by
In practice, the parameters β andθare typically unknown and hence, the error variables (s
i
)’s are not observable. As a result, the OKP in (2.2) cannot be directly used in practice. Although β andθare unknown, we shall suppose that estimators
Also, let
where
2.3 Bias-corrected predictors
As pointed out by a referee, the predictor in (2.4) is the (conditional) ‘maximum likelihood predictor’ of Z(s0) in the Gaussian setting. However, both the predictors in (2.4) and (2.6) are biased for a nonlinear φ(⋅) such as in Examples 1 and 2. Often, this bias is significant when Y(s0) is in the tails of its marginal univariate normal distribution. Then, depending on the extremity of the transformation performed by φ(⋅), the bias of
where
and where
where φ(i)(⋅) denotes the ith derivative of φ(⋅). Notice that the approximate predictor in (2.7) ignores the terms in this summation for all i ≥ 3. While this yields a bias-corrected predictor for low order polynomials, such as φ(y) y3 (with the higher order derivatives zero), this fails to correct the bias of most nonlinear functions with nonzero higher order derivatives. In the next section, we construct a bias-corrected predictor based on a spatial bootstrap method that produces asymptotically unbiased predictors for a large class of nonlinear functions, thereby improving upon the bias correction method described above.
3 Bootstrap based bias-corrected prediction
There are many ways of adjusting for the bias of the estimated Kriging predictor
is approximately unbiased. Here, the subscript M corresponds to a ‘multiplicative’ bias correction. The multiplicative bias correction factor c described in (3.1) works the best for nonnegative Z-variables, such as in the case of log-normal Kriging. For Z-variables with an unrestricted range, such as in power transformation models, an alternative is to use an additive correction factor a based on the bootstrap such that
is approximately unbiased. To motivate the main steps, we first describe the bootstrap methodology for the simpler ‘ideal’ case where the parameter values β andθare known, in Section 3.1, and then extend it to the unknown parameter case in Section 3.2.
3.1 Known parameters
For now, we will work in the setting where we assume that we know the values of the covariance parameterθand the mean parameter β. Note that in the multiplicative case that the factor that makes
while in the additive case,
The explicit forms of c(β, θ) and a(β, θ) are difficult to derive. In this section, we present a bootstrap method for estimating the factors c(β, θ) and a(β, θ) that does not require the user to do such analytical work; it replaces the tedious and intractable analytical derivation by a simple, albeit computer intensive bootstrap algorithm.
The bootstrap algorithm for the known parameter case for estimating a and c is as follows:
Generate *(s0), *(s1), …, *(s
n
), from the Gaussian process with covariogram C(⋅; θ). Note that this involves the generation of n + 1 observations. Compute Compute Let
In practice,
3.2 Unknown parameters
In real life applications, the true values of the parameters β andθare unknown and hence, the aforementioned predictors must be calculated using their estimates. We suppose that estimators Generate *(s0), *(s1), …, *(s
n
) from the Gaussian process with estimated covariogram For i 0, 1, …, n, define the bootstrap variables Use the ‘bootstrap observations’ {Z*(s
i
): i 1, …, n} in place of {Z(s
i
): i 1, …, n} and the same estimation procedure to find the bootstrap version Compute Next, define the bootstrap version of the estimated OKP where Set
Then, the bootstrap-based bias-corrected predictors of
respectively. Again, a Monte Carlo simulation is usually used to approximate
A similar construction applies to
4 Theoretical results
In this section, as well as in the accompanying proofs, we highlight the dependence of various quantities depending on n. Thus, we shall write
4.1 Bias of the existing predictors
The main results of this section give explicit expressions for the bias of the naive predictor
(C.1)
The function φ(⋅) admits a power series representation given by
for some For some K1 ∈(0, ∞),
(C.2) As n → ∞,
Next we comment about the regularity conditions. Condition (4.1) on φ(⋅) implies that φ(⋅) is infinitely diffentiable, although a more complicated expression can be derived for functions φ(⋅) that are only twice differentiable. The power series representation is specifically used for deriving an explicit expression for the bias of the naive predictor in terms of
Condition (4.2) is a sufficient condition for deriving the series representation of Bias
Finally, Condition (4.3) is a mild requirement on the estimators
Similarly, if
The following result gives the leading terms in the expansions for the biases of the naive predictor
Note that under Conditions (C.1) and (C.2), the bias of the naive predictor is the difference between the values of an analytic function of
4.2 Bias of the bootstrap based predictors
We now consider the bias properties of the bootstrap based bias-corrected predictors
We shall make use of the following conditions for studying the bias properties of the bootstrap based predictors.
(C.3) There exist δ0 0 and {αi}i≥1 ⊂ [0, ∞) with |λin(t)| ≤ αi for all i 1, …, n and t ∈ Θ, and {τn (⋅)}n≥1 is equicontinuous at the true value θ. φ(⋅) is differentiable and there exists a monotone function Φ(1): [0, ∞) → [0, ∞) such that |φ(1) (x)|≤ Φ(1) (|x|) for all x ∈ , and for n sufficiently large,
for all (b, t) ∈ N (δ0) where K4 (1 + α∞) K1. There exists a sequence {bn}n≥1 with bn → 0 as n → ∞ such that for all (b, t) ∈ N(δ0) and n ≥ 1,
for any δ 0.
C(0, t) K3 for all t and C(0, ) is continuous atθwith C(0,θ) 0.
Now we briefly comment on the conditions. Condition (C.3)(i) is a smoothness condition on the Kriging weights. These can be guaranteed by assuming that the covariance matrix Σ
n
(θ) and its inverse are bounded in the spectral norm and that the covariogram decays quickly and is Hölder continuous at the true value θ. See Bandyopadhyay and Lahiri (2010), where a similar condition has been used in the time series context. In the same spirit, Conditions (C.3)(ii) and (iii) are smoothness conditions on the prediction variance τ(⋅) and the mean function µ(⋅, ⋅). Condition (C.3)(iv) is used for proving asymptotic negligibility of second order terms in the Taylor’s expansion in the L1-metric, uniformly in a neighbourhood of the true parameter value. Condition (C.3)(v) says that the estimators
The main result of this section is the following:
Note that by construction, the ‘ideal’ bootstrap based bias-corrected predictors are exactly unbiased (for any given sample size n). The error term o(1) in (4.6) results from estimating the unknown parameters β andθby
In the next section, we compare the finite sample performance of the different bias-corrected predictors, both in terms of their biases as well as in terms of their MSPEs.
5 Simulations
5.1 Framework
For the purposes of these simulations, we set D to be a suitable subset of
These choices for D allow for the observance of the asymptotic behaviour of the predictor for both expanding and filling in the observed data sites. Notice, for example, that as we move from D1 to D3 to D5, the window expands from a 7 × 7 grid to a 20 × 20 grid. Also, D2 is a filled in version of D1, and D4 is a filled in version of D2, allowing for closer neighbours to be used for prediction.
We then generated (s0), …,(s
n
) from a zero mean, second order stationary Gaussian process with one of the following covariograms:
Matérn: Exponential:
The first covariogram is a special case of the Matérn (1960) class of covariograms with variance equal to 1 and a smoothness parameter of 0.5, while the second is an exponential covariogram. These two covariograms give examples of both isotropic and anisotropic models.
Next, we let µ(⋅; β) ≡ 1. Then,
Because log-Normal Kriging and the Box-Cox Transformation are frequently used in practice for Trans-Gaussian data, here we consider the following functions to transform to Y(s0), …,Y(s
n
) to Z(s0), …, Z(s
n
):
φ(y): y3 φ(y): ey φ(y): e2y.
Since the mean is constant for all s, we simply used the sample mean
We compared the four predictors: (i) Ratio of expected values of an observation and its prediction:
Bias of the predictor:
Mean square prediction error (MSPE):
An accurate predictor will have a ratio of expect values close to 1, and a bias close to 0. A smaller MSPE represents less variability in the prediction.
5.2 Simulation results
Overall, the bootstrap bias correction procedure does tend to decrease bias, as does Cressie’s predictor, particularly as the observation grids are filled in (moving from D1 to D2 and D3 to D4). Which of the two bias correction methods works better depends on the model.
Tables 1 and 2 show the results for φ(y) y3, with both the isotropic (Table 1) and anisotropic (Table 2) covariograms. Looking at the ratios of expected values and biases, it is easily seen that all three predictors with a correction factor perform better than the naive predictor. Here, the Taylor’s expansion based predictor in (2.7) performs as well as the bootstrap predictors. This is due to the ‘nice’ behaviour of the function φ(y) y3. The Taylor expansion yields
Results for φ(y) = y3, C(
Results for φ(y) y3, C(
The same also holds for
Finally, we note that the bias and MSPE tend to decrease as the observation grid is filled in, and that the MSPEs are comparable for all four predictors, which was seen in all simulations.
Figure 1 shows the biases for 1000 realizations of all four predictors for the isotropic covariogram. It is easy to see that the bias correction factors reduce the skewness in the distribution of the biases, bringing the outliers closer to the boxes. Recall that the most bias arises when Y(s0) is far from its mean, and this effect increases as the derivative of φ(⋅) becomes large in absolute value. Figure 2 demonstrates the improvement in bias as a result of the corrected predictors, without a significant increase in the MSPE. With a transformation such as φ(y) y3, the Taylor’s expansion based bias-corrected predictor is as effective as the bootstrap predictors, as we see in Tables 1 and 2. However, the predictor


This is further illustrated by Table 3 for the Matern covariogram. A similar performance is observed also for the anisotropic case (not reported here to save space; see Rister, 2010, for details). Note that the Taylor’s expansion based predictor is still an improvement over the naive predictor, but it is not comparable to the bootstrap based predictors.
Results for φ(y) e
y
, C(
Table 4 shows the results for φ(y) e2y with the anisotropic covariogram. Here, we see that as the transformation function becomes more extreme, the predictors become less reliable. The naive predictor performs poorly, and Taylor’s expansion based predictor performs significantly worse than the bootstrap predictors. With this transformation function, we can no longer count on the later terms in Taylor expansion to be small, particularly when φ(i) (y) 2 i e2y i!. Compared to the earlier models, the accuracy of the bootstrap predictors has begun to deteriorate, although they both still outperform the existing predictors. Interestingly, the MSPE of the multiplicative predictor appears to be more volatile than the MSPE of the additive predictor. For the isotropic covariogram, similar results are seen (but not reported here). See Rister (2010) for the omitted simulation results.
Results for φ(y) e
2y
, C(
6 Oklahoma climatology data
For a real world data example, we look at a problem relating weather and energy consumption. Define Z(s) to be the ‘heating degree days’ (HDDs) at a location s, depending on the average daily outside air temperature T(s). HDD depends on a base temperature, often taken to be 65 Fahrenheit, such that
An interpretation of HDD is as follows: if one location s1 has an HDD value of 20 and another location s2 has an HDD value of 40, a building at s2 will require twice as much energy to heat as a similar building at s1. As implied by the (6.1), HDDs are calculated on a daily basis. For more on HDDs, see Ristinen and Kraushaar (2006). Cooling degree days (CDDs) is a similar measurement regarding the amount of energy required to cool a building. Generally, HDDs are higher in the winter and CDDs in the summer.
Often, HDDs are accumulated over weeks, months or seasons, to study energy usage over an extended period of time, but we will focus on a single day of observations. Figure 3 shows the HDDs for 1 January 2009 at 116 observation sites in Oklahoma, provided by the OCS (2010), a joint venture between the University of Oklahoma and Oklahoma State University that tracks weather throughout the state. Suppose we wish to predict the daily HDD measurement at the points (–98,36), (–96,36.2), (–96,35.2) and (–97.4). These sites are marked by a ‘×’ symbol in Figure 3.
HDD observations in Oklahoma, 1 January 2009, with multiplication symbols (×) representing prediction points
There appears to be a definite relationship between the spatial location and the HDD measurement, with a higher number of HDDs corresponding to the sites in the northeastern part of the state. Further, Figure 4 shows that the distribution of HDDs is highly skewed. The next step is to identify a suitable transformation function φ(⋅).

We begin by using the transformation suggested by Box and Cox (1964), which has proven to be suitable for positive variables. If negative observations were present, the Box-Cox transformation would fail, and some modified methods, such as the one suggested by Yeo and Johnson (2000), should be tried. The Box-Cox transformation function is given by
where x {x1, …, xn} and GM(x) is the geometric mean of x. The geometric mean is a constant that scales the data, and is often omitted. For the HDD data, the optimal value of λ was found to be –3.005236. The omission of the geometric mean for this data set leads to transformed observations with a small variance, causing computational problems later in the analysis. As a result, we included the geometric mean in g λ (⋅).
Next define x ≡ Z {Z(s1), …, Z(s
n
)}, Y(s) gλ Z(s)) ≡ φ–1(Z(s)) and k ≡ (GM(Z))λ–1, where λ –3.005236. Then, it is easy to check that
The distribution of transformed HDD observations is shown in Figure 5. It is clear that normality is plausible for Y(s).

Next, we calculate
where X1(s) is the longitude of s, X2(s) is the latitude and µ(s; β) β0 β1 X1 (s) + β2 X2 (s) + β3 X1(s) X2 (s). The estimated coefficients and p-values from a least squares regression model are given in Table 5, with all coefficients significantly different from 0.
Based on some exploratory work on the covariogram, we chose
where ||
Linear model summary for HDD data
WLS procedures give
Predictors for HDD data
7 Proofs
Here, we once again shall write
7.1 Proof of Proposition 4.1
First, consider (4.4). Let N ∼ N (0, 1). Then, it is easy to check that for any k ≥ 1,
where Γ(⋅) is the Gamma function. Note that by Jensen’s inequality, for any a, b ∈ (0, ∞),
Thus,
for a ≥ 0 and b ≥ 0. Further, for any integer r ≥ 2, it is easy to check that
Hence, by (4.2), it follows that for x ∈ {σn, σ0},
Note that
for
Next, fix δ ∈ (0, 1/4). Let
for n ≥ 1. Then,
Note that by the Dominated Convergence Theorem,
Similarly,
Further, by (4.1), uniformly in u ∈ (0, 1),
Hence, it follows that
Also, by the Cauchy-Schwarz inequality and Condition 4.3,
From (7.2) to (7.4), (4.4) follows. The proof of (4.5) is similar and hence is omitted.
7.2 Proof of Proposition 4.2
First, consider the multiplicative bias-corrected predictor
so that the ideal version
Note that
where
for all n ≥ n0 for some n0 n0 (K0, …, K3, α∞, δ) and K K (K0, …, K4). By similar arguments,
where r1n ≤ Kδ for all n ≥ n1 for some n1 n1 (K0, …, K3, α∞, δ) ≥ 1.
Now, consider I1n. Note that by (4.2),
Next, consider I2n. By (7.5), (7.6) and (4.2), we have
by letting n → ∞ first and then δ ↓ 0. Hence, the asymptotic unbiasedness of
The proof for
Footnotes
Acknowledgements
Research partially supported by NSF grant no. DMS–1007703 and NSA grant no. H98230–11–1–0130.
