Abstract
There has been an increasing need to reconstruct past climate from proxy records quantitatively and mechanistically. The inverse proxy modeling method stands out as a novel approach to quantitative palaeoclimate reconstructions through integrating process-based models and proxy records, representing a major progress in quantitative palaeoclimatology. It has been proposed to incorporate multiple proxy records to produce a more robust constraint on the climate parameters sought for estimation, and most of the work has been conducted using pollen records in conjunction with vegetation models. Here, I show a worked example of using paired stable oxygen and carbon isotope records of peat cellulose from one single core to infer the climate history in NE China for the last 6000 years through solving a well-posed inverse problem using Bayesian statistics. The quantitative palaeoclimate data obtained in this study may deepen our insight into the dynamics of the East Asian summer monsoon. Mean growing season temperature and relative humidity show millennial-scale fluctuations prior to c. 4000 cal. yr BP; thereafter, centennial-scale fluctuations prevailed, revealing the relative importance of solar activity over tropical ocean–atmosphere interactions in regulating the variability of regional climate during the late Holocene. It appears that there was a prominent out-of-phase relationship between temperature and relative humidity, due probably to the different response of these climate elements to orbital forcing and land cover. This worked example demonstrates the potential of using model–data fusion techniques to produce physically meaningful, mathematically optimal, and geologically sound results.
Keywords
Introduction
The overarching objective of palaeoclimatological studies is to better understand the driving mechanisms behind past climate changes so as to improve the prediction of future climate. Quantitative reconstructions of major climate variables such as temperature and precipitation from proxy records turn out to be an important step toward this end. The transfer function method stands out as a commonly followed approach in this regard since the 1970s (Birks, 1995; Bryson and Kutzbach, 1974; Huntley and Prentice, 1988; Imbrie and Kipp, 1971; Jackson, 2012; Juggins, 2013). However, this method is unable to render proxy records in climatic terms mechanistically because of the exclusion of physical models that govern the formation of the proxy records (Guiot et al., 2000).
Inverse proxy modeling (IPM) is emerging as a promising method to reconstruct past climate quantitatively and mechanistically. Essentially different from the traditional transfer function method, the IPM method generates quantitative palaeoclimate information through integrating proxy records with mechanistic models that capture the underlying physical, chemical, and biological processes forming the proxy records, thereby providing physically meaningful and geologically sound results. Assimilations of proxy data in process-based deterministic models have shown great potential in palaeoclimatological studies (Gebbie and Huybers, 2006; Gebka et al., 1999; Hargreaves and Annan, 2002; Heinze and Hasselmann, 1993; Hopcroft et al., 2009; Huybers et al., 2007; LeGrand and Wunsch, 1995; McKeague et al., 2005; Rajver et al., 1998; Widmann et al., 2010; Wunsch, 2003). This model–data fusion technique represents a major advance in the area of quantitative palaeoclimatology, and it has been increasingly used to infer past climate from pollen data in conjunction with vegetation models (Garreta et al., 2010; Guiot et al., 1999, 2000; Hatté and Guiot, 2005; Hatté et al., 2009; Peng et al., 2011; Rousseau et al., 2006; Wu et al., 2007a, 2007b).
Recent work using abiotic proxy data and mechanistic models has demonstrated the promise of using the IPM method to produce quantitative palaeoclimate information in the paradigm of Bayesian statistical inference (Yu et al., 2012). Nevertheless, using a single proxy record for palaeoclimate inference may lead to the solution of an ill-posed inverse problem. As proposed by Guiot et al. (2009), the degrees of freedom of the inverse problem could be reduced substantially if multiple proxy records were used in the solution. Here, I conduct a Bayesian inference of climate history in NE China (Figure 1) during the second half of the Holocene from peat cellulose δ18O and δ13C records within one core and mechanistic models for the fractionation of these isotope species. The results suggest that reliable palaeoclimate information may be obtained if multiple proxy records and more complex models were involved in solving a well-posed inverse problem within the framework of Bayesian statistics.

Map showing the location of Jinchuan peat bog, NE China. Filled square indicates the coring site.
Concepts of the IPM method
The principle of the IPM method is to find a set of optimal estimates of climate and environmental variables for a model by minimizing the difference between model predictions and proxy records (Guiot et al., 2000, 2009; Yu et al., 2012). Generally, this method is composed of four components (Peng et al., 2011): (1) observational data sets, (2) forward models that can reproduce the observational data, (3) a cost function with an error structure that describes the relationship between observations and model predictions with prior beliefs, and (4) an optimization algorithm to match model estimates with observations. Specifically, given a set of proxy records, if the underlying physical, chemical, biological, and geological processes are known, then we can formulate a model that can best reproduce these proxy records:
where
For the mechanistic model, the climate and environmental variables, θ, are usually poorly defined or totally unknown. These variables can be inferred in conjunction with proxy records by minimizing the model–data residuals, which in turn yield a quantitative estimate of past climate conditions known as the inverse problem (Yu et al., 2012). The inverse problem may be solved effectively using Bayesian statistics (Tarantola, 2005). In Bayesian inference, the model parameters sought for estimation are usually treated as continuous random variables that follow a certain kind of probability distribution, say p(
By virtue of the definition, each error sequence (i.e. the model–data residuals) follows a multivariate Gaussian distribution with zero mean and unknown covariance. Therefore, the likelihood function can be expressed as follows:
where │ means ‘conditional on’. The joint posterior probability of model parameter, θ, and the covariance matrix, ∑, can be expressed in accord with the rule of Bayesian inference:
where ∝ denotes ‘proportional to’, and
An important caveat in inverse problems is the uniqueness of the solution. The solution to an inverse problem may be non-unique, depending on the number of proxy records and the degrees of freedom of the model parameters (Figure 2). If only one proxy record is used in the inversion, the problem is usually severely ill-posed because the number of model parameters is much higher than that of the proxy record. As more proxy records are used, the problem tends to be less ill-posed. And a well-posed or even overdetermined problem can be attained if the degrees of freedom of the model are exactly equal or less than the number of proxy records.

Schematic diagram illustrating the performance of palaeoclimate inversion as a function of model complexity and the number of proxy records.
Data, model, and optimization technique
The Jinchuan peat cellulose δ18O and δ13C records
The δ18O and δ13C records used in this study are derived from a peat bog (42°20′N, 126°22′E) located to the west of the Jinchuan Township of Huinan County, Jilin Province, NE China (Figure 1). The area of this bog is about 100 ha, and the average elevation is about 600 m a.s.l. The peat bog originated in a barrier maar lake formed by volcanic activity during the Middle Pleistocene, and it represents the terminal stage of a lacustrine environment. Lying in the northern temperate zone and modulated by the East Asian monsoon, climate in this area exhibits a remarkable seasonality with a warm-humid summer and a cool-dry autumn. Annual mean temperature is c. 4°C, and it varies between 3°C and 5°C interannually. Annual precipitation varies from 436 to 1020 mm with an average of c. 704 mm, most of which (61%) occurs in summer months. Influenced by air masses from the western Pacific Ocean, annual mean relative humidity remains fairly constant at 70 ± 3%.
A c. 6-m-long core was taken from the center of the bog (Figure 1). The chronology of the core is supported by five conventional radiocarbon ages dated on detrital peat cellulose (Hong et al., 2000). These ages are well ordered along the core, revealing a constant accumulation rate of c. 1.0 mm/yr. The δ18O and δ13C values were measured on alpha cellulose extracted from the same core. These records are presented in Figure 3. Readers are referred to Hong et al. (2000, 2001) for the full account of these records. The δ18O record has been interpreted as a proxy of regional surface air temperature (Hong et al., 2000), while the δ13C record has been presumed as reflecting variations in humidity or precipitation (Hong et al., 2001) during the last 6000 years. Both records reveal two major periods of climate shift as well as several minor ones superimposed on this trend. Spectral analyses of both records indicate that centennial-scale fluctuations in regional climate may have been modulated by solar activity. However, the chronological control of these records may not be firm enough to tackle the climatic variability at this timescale, and the validity of temperature dependence of δ18O enrichment in a sedge leaf is lacking (Oldfield, 2001). In addition, there might be a species-specific effect on the variability of stable isotope records derived from bulk peat cellulose (Stebich et al., 2011). However, this appears not to be the case for the records used in this study because analyses of plant remains reveal that the marsh community along the core was dominated by Carex species (Hong et al., 2000), an annual vascular C3 sedge having shallow rooting systems that can make it sensitive to the variation of soil moisture or precipitation.
List of variables used in the models of stable oxygen and carbon isotope fractionation in a sedge leaf.
RuBP: ribulose-1,5-bisphosphate.

(a) Sedge peat cellulose stable oxygen and (b) carbon isotope records from Jinchuan peat bog, NE China. Triangles denote radiocarbon age control points. Note that these records are based on the same chronology.
Model of stable oxygen and carbon isotope fractionation in sedge leaf cellulose
The model used in this study is composed of two modules: one deals with the progressive enrichment of the heavy oxygen isotope in leaf cellulose with respect to that in source water associated with the kinetic, equilibrium, and biochemical fractionations during evapotranspiration and photosynthesis (Dongmann et al., 1974; Roden et al., 2000), and the other tackles the continuous depletion of the heavy carbon isotope relative to that in the atmospheric CO2 during photosynthesis and respiration (Farquhar et al., 1989; White et al., 1994). Specifically, the stable oxygen isotopic composition of a sedge leaf, δ18Oc, can be expressed as follows:
And the stable carbon isotopic composition of a sedge leaf, δ13Cc, can be written as follows:
Variables defined in the above equations are listed and described in Table 1.
The Metropolis–Hastings-within-Gibbs algorithm
To reduce the degrees of freedom of the model parameters, all of the environmental variables were parameterized in terms of mean growing season temperature, t, and relative humidity, h, except for the δ13C and the concentration of atmospheric CO2, which can be readily obtained from ice-core data for the Holocene (Elsig et al., 2009; Indermühle et al., 1999) and for the last 2000 years in a higher resolution (Francey et al., 2002; Trudinger et al., 2002). Therefore, the inversion turns out to be a well-posed problem that can be solved in the Bayesian paradigm. For the prior distribution of the climate parameters, the weak informative uniform distribution was used, while for the variance, the inverse gamma (IG) distribution was employed (Appendix 1). Acknowledging the complexity of the joint posterior probability distribution of the unknown parameters, the MCMC method was used to simulate their marginal posterior probability distribution (Gilks et al., 1996). Specifically, for the climate parameters, t and h, a random-walk white noise process was used to generate Markov chains, and the acceptance of the move of the chains was evaluated using the Metropolis–Hastings algorithm (Hastings, 1970). If the move of the chains for these two parameters is acceptable, then the variance was updated sequentially using a Gibbs sampler, which was given in an analytical form (Appendix 2). The inversion was implemented in the MATLAB® environment using some built-in functions. Codes are available upon request.
Results
The marginal posterior probability distributions of climate parameters, t and h, as well as the variance were simulated using Markov chains of 100,000 iterations with a burn-in period set to 5000. As the iteration continues, the chains gradually deviate from their initial state. After the burn-in period, the chains eventually converge and reach a stationary state. Samples were then drawn from the chains to generate their conditional posterior distributions. To reduce the storage requirement and to remove the autocorrelation of the samples, the chains were thinned out by keeping only every 10th samples during the move. Figure 4a and b shows the gradual convergence of the Markov chain of variance,

The Markov chain and histogram built up from the chain that mimics the conditional posterior probability distribution of the covariance of model–data residues.
Given the high-dimensional nature, the conditional posterior probability distribution of climate parameters, t and h, cannot be visualized using histograms as done for the variance. Instead, the statistical moments such as the mean and standard deviation of these variables were calculated, and the results were plotted against a calibrated radiocarbon chronological scale in Figure 5. The inferred mean growing season temperature (Figure 5a) and relative humidity (Figure 5b) exhibit disparate patterns of variability, and they are generally in anti-phase at both centennial and millennial time scales. Specifically, mean growing season temperature exhibits remarkable fluctuations at a millennial timescale prior to c. 4000 cal. yr BP. After c. 4000 cal. yr BP, centennial-scale oscillations superimposed on a cooling trend prevailed. In contrast to this pattern, relative humidity continued to decrease until about 3000 cal. yr BP, and then turned to rise toward the present. Superimposed on this general trend, relative humidity also exhibited remarkable variability in a frequency band shifting from millennial to centennial scale across c. 4000 cal. yr BP.

(a) Inferred MGST and (b) relative humidity from the stable oxygen and carbon isotope records of the Jinchuan peat bog, NE China. Note the shift in oscillatory mode across c. 4000 cal. yr BP and the out-of-phase relationship of these two climate variables.
Discussion
Quantitative palaeoclimate information about mean growing season temperature and relative humidity in NE China during the last 6000 years was obtained by integrating peat cellulose δ18O and δ13C records with mechanistic models for the fractionation of these isotope species. Both temperature and humidity show millennial-scale fluctuations prior to c. 4000 cal. yr BP, primarily modulated by large-scale ocean–atmosphere interactions in the tropical Pacific (Moy et al., 2002; Stott et al., 2002; Yancheva et al., 2007). For example, the prominent climate event at c. 5000 cal. yr BP might be an imprint of the onset of modern El Niño-Southern Oscillation (Rodbell et al., 1999; Sandweiss et al., 2001). The climate event at c. 4000 cal. yr BP may have exerted a traumatic impact on the society and economy of Neolithic people in North China as marked by the demise of rice-dominated agriculture and the emergence of dynamic civilization (Wu and Liu, 2004). From c. 4000 cal. yr BP, the variability of regional climate is characterized by centennial-scale fluctuations, presumably driven by solar activity in this frequency band (Hong et al., 2000, 2001).
The inferred palaeoclimate data provide a fresh look at the dynamics of the East Asian summer monsoon (EASM), particularly the phase relationship between temperature and humidity, because they are based on the same chronology. Driven by the thermal gradient between tropical oceans and the Eurasian landmass, the EASM is an important component of global climate system (An, 2000). Seasonal shifts in the wind pattern impose a great influence on the hydrological cycling over a large area ranging from maritime to inland China and its neighboring countries. Synthesizing published proxy records revealed a time-transgressive nature of the EASM (An et al., 2000; He et al., 2004). For example, maximum monsoonal precipitation occurred during c. 10,000–8000 14C yr BP in NE China, 7000–5000 14C yr BP in central China, and c. 3000 14C yr BP in southern China. However, Wei and Gasse (1999) examined lacustrine δ18O records from the Tibetan Plateau and western China, and they found that maximum monsoonal precipitation have occurred synchronously in these areas. Precisely dated proxy records and reliable quantitative climate reconstruction are the two key prerequisites to address this debate. Although the inferred temperature and humidity records are not long enough to cover the entire Holocene, they do provide a new perspective on this debate. Humidity decreased progressively until 3000 cal. yr BP, and then, it turned to rise, primarily modulated by orbital insolation (Berger and Loutre, 1991). This trend was also revealed by speleothem δ18O records (Dykoski et al., 2005; Fleitmann et al., 2003), which are assumed to indicate monsoonal precipitation. However, temperature tends to be out of phase with humidity, revealing a different response to orbital forcing and probably land cover. Therefore, humidity or precipitation appears to be a more appropriate proxy of the EASM than temperature.
The quantitative reconstruction of regional climate history in this study was implemented within the framework of Bayesian statistical inference using the MCMC method. It is a non-gradient algorithm that performs a global search for the optimization through minimizing the difference between model estimates and all of the observations simultaneously, and thus, it can identify the global minimum for the cost function that may possess multiple minima. Therefore, the results appear to be mathematically optimal. This can be measured by the variance of the model–data residuals. From the conditional posterior distributions of the variance (Figure 4), it can be seen that the modal value of both
Fundamentally different from the transfer function method, the IPM method offers a novel approach to climate reconstructions from proxy records and mechanistic models. The most significant advantage of this method is that it allows the underlying chemical and physiological processes governing the formation of the proxy records to be included, thereby yielding physically meaningful and geologically sound results. The IPM method also offers a flexibility of making use of multiple proxy records and prior information. This would relax the limitation of palaeoclimatic reconstructions using single proxy record, particularly when the proxy record has equivocal indicative meanings in climatic terms. For example, peat cellulose δ18O has been widely interpreted as an indicator of temperature (Hong et al., 2000, 2009; Xu et al., 2006), while δ13C has been thought to reveal humidity or precipitation (Hong et al., 2001, 2003, 2005, 2010). This appears to be true to the most extent. Figure 6 shows expected results of the ‘models’ used here to link δ18O and δ13C records to climate parameters. The reconstructed temperature signal mimics both the general trend and specific details of the δ18O record very well (Figure 6a), and the humidity signal strongly resembles the δ13C record (Figure 6b). Nevertheless, correlation analyses show that the δ18O record is closely correlated not only to temperature (Figure 7a) but also to relative humidity (Figure 7b). Actually, in monsoonal area, the δ18O of source water is more correlated to precipitation than temperature known as the ‘amount effect’ (Araguás-Araguás et al., 1998; Johnson and Ingram, 2004; Vuille et al., 2005). Therefore, attention should be paid when interpreting δ18O records in this area. On the contrary, the δ13C record shows less relevance to temperature (Figure 7c), but it is highly correlated to relative humidity (Figure 7d), confirming previous interpretation that peat cellulose δ13C record is a reliable proxy of summer monsoon intensity (Hong et al., 2001, 2003, 2005, 2010).

Comparison between reconstructed climate variables with proxy records. (a) MGST versus stable oxygen isotope record and (b) relative humidity versus stable carbon isotope record.

Diagrams showing the dependence of stable oxygen and carbon isotopes on climate variables. (a) Stable oxygen isotopes versus mean growing season temperature, (b) stable oxygen isotopes versus relative humidity, (c) stable carbon isotopes versus mean growing season temperature, and (d) stable carbon isotopes versus relative humidity.
Last but not least, a common caveat with the inverse problem is the robustness of the solution (Tarantola, 2005). As demonstrated in this study, a more robust or even unique constraint on the unknown climate variables could be obtained, if the degrees of freedom of the mechanistic model were reduced through an appropriate parameterization scheme and/or including multiple proxy records. Therefore, in addition to the development and acquisition of proxy records from various archives, an urgent need in quantitative palaeoclimatology is the thorough understanding of modern processes. This could not only deepen our insight into the underlying physical, chemical, and biological processes and thus help us interpret the proxy records mechanistically but also foster the development and improve the parameterization of process-based models that can be used to integrate with proxy records to produce quantitative palaeoclimate information.
Conclusion
A Bayesian inversion of climate history in NE China for the last 6000 years was conducted by integrating peat cellulose stable oxygen and carbon isotope records with process-based models for the fractionation of these isotope species, and quantitative palaeoclimate information, such as mean growing season temperature and relative humidity, was obtained. These data, based on the same chronology, provide a new perspective on the dynamics of the EASM:
There is a significant shift in the oscillatory mode of regional climate from millennial to centennial scales across c. 4000 cal. yr BP. Prior to c. 4000 cal. yr BP, both temperature and relative humidity exhibited millennial-scale oscillations, while after c. 4000 cal yr BP, centennial-scale fluctuations dominated the variability. This shift in the frequency band reveals the relative importance of solar activity over tropical ocean–atmosphere interactions in regulating regional climate changes during the late Holocene.
The changes in mean growing season temperature are out of phase with relative humidity. Relative humidity continued to decrease until 3000 cal. yr BP; thereafter, it turned to rise toward the present. This trend represents the long-term variability of the EASM essentially modulated by orbital insolation. However, mean growing season temperature tends to deviate from this trend significantly, due likely to the complex response to orbital forcing and land cover.
Footnotes
Appendix 1
Appendix 2
Acknowledgements
My gratitude is extended to Professor YT Hong for generously providing me with the Jinchuan peat cellulose stable oxygen and carbon isotope records. I thank the two anonymous reviewers for their constructive comments and suggestions.
Funding
This research was funded by the National Basic Research Program, of China (grant no. 2013CB955903), the ‘100-Talents’ Program, and the Strategic Priority Research Program (grant no. XDA05120401) of the Chinese Academy of Sciences.
