Abstract
Use of multi-model ensembles from global climate models to simulate the current and future climate change has flourished as a research topic during recent decades. This paper assesses the performance of multi-model ensembles in simulating global land temperature from 1960 to 1999, using Nash-Sutcliffe model efficiency and Taylor diagrams. The future trends of temperature for different scales and emission scenarios are projected based on the posterior model probabilities estimated by Bayesian methods. The results show that ensemble prediction can improve the accuracy of simulations of the spatiotemporal distribution of global temperature. The performance of Bayesian model averaging (BMA) at simulating the annual temperature dynamic is significantly better than single climate models and their simple model averaging (SMA). However, BMA simulation can demonstrate the temperature trend on the decadal scale, but its annual assessment of accuracy is relatively weak. The ensemble prediction presents dissimilarly accurate descriptions in different regions, and the best performance appears in Australia. The results also indicate that future temperatures in northern Asia rise with the greatest speed in some scenarios, and Australia is the most sensitive region for the effects of greenhouse gas emissions. In addition to the uncertainty of ensemble prediction, the impacts of climate change on agriculture production and water resources are discussed as an extension of this research.
Keywords
I Introduction
In recent decades there has been growing interest into the assessment of historical, current and future climate variability and trends (Miao et al., 2012; Zanis et al., 2009). The fourth assessment report (AR4) of the Intergovernmental Panel on Climate Change (IPCC) gives evidence that the global mean surface air temperature (SAT) rose 0.74° ± 0.18°C during 1906–2005 (IPCC, 2007). AR4 also indicates that during the 21st century the global surface air temperature is likely to rise further by 1.1–2.9°C, assuming the lowest emissions scenario, and 2.4–6.4°C for the highest (Meehl et al., 2007). Global warming inevitably has a major impact on natural and anthropogenic systems at local and regional scales. In order to mitigate and adapt the risk of climate change, decision-makers in governments, non-governmental organizations and the general public need detailed information on present and future climate.
Global coupled Atmospheric-Ocean General Circulation Models (coupled GCMs) are the modelling tools traditionally used in theoretical investigations of the mechanisms of climatic changes (Covey et al., 2003). Using GCMs it is possible to distinguish natural climate variability from anthropogenic effects. More than 20 GCMs with different resolutions have been developed by several research groups around the world. It is generally accepted that the agreement of models with present observations is currently the only way to assign confidence into the quality of a model (Errasti et al., 2011; McAvaney et al., 2001), and the better a model performance in reproducing present-day climate, the higher the reliability of the climate change simulation (Coquard et al., 2004; Giorgi and Mearns, 2002). Three possible forcing agents have been identified as contributors to 20th-century global warming (Li et al., 2007). These include anthropogenic greenhouse gas forcings (CO2, CH4, N2O, etc.), natural forcings (sulfate aerosols and ozone change) and the internal variability of climate system itself (North Atlantic Oscillation, NAO and El Niño–Southern Oscillation, ENSO). However, because of our limited computational resources, incomplete understanding of the way climate is expected to react (Kiktev et al., 2003), simplifying assumptions in model construction (Phillips and Gleckler, 2006), epistemic or aleatory uncertainties in model parameterization (Duan and Phillips, 2010) and unreasonable natural forcing (Li et al., 2007; Zhou and Yu, 2006), GCMs are not expected to represent the climate characteristics with the same intensity and frequency as in observed data. For instance, it has been shown that warming trends over western Europe are much stronger than expected from climate model projections (Van Oldenborgh et al., 2009). Phillips and Gleckler (2006) found that the AR4 GCMs exhibit systematic model bias, with most models displaying aggregated precipitation variability magnitudes larger than observed.
To maximize GCM value, research increasingly focuses on improving GCM predictions. It is generally believed that multi-model ensembles are superior to single models, and that the ensembles may even outperform the best single participating model (Buser et al., 2009; Duan and Phillips, 2010; Fischer et al., 2012; Gleckler et al., 2008; IPCC, 2001; Lambert and Boer, 2001; Phillips and Gleckler, 2006; Pincus et al., 2008; Reichler and Kim, 2008; Weigel et al., 2008). Because single models are overconfident (Weigel et al., 2008), multi-model ensembles contain the information from all participating models and embrace distinctly different physical parameterizations (Pincus et al., 2008), and consequently moderate the uncertainties arising from different parameterizations and dynamical cores in the different GCMs (Zanis et al., 2009). The simplest multi-model ensemble is simple model averaging (SMA), which can be generated by combining the individual model output with equal weights (Duan and Phillips, 2010; Hagedorn et al., 2005). The empirical benefits of multi-model averaging for enhancing the accuracy of GCMs have motivated the development of more sophisticated averaging approaches wherein the participating single-model contributions are weighted according to their prior performance (e.g. Doblas-Reyes et al., 2005; Rajagopalan et al., 2002; Robertson et al., 2004; Stephenson et al., 2005; Yun et al., 2003).
Among these sophisticated averaging approaches, Bayesian model averaging (BMA) is one of the most promising methods. BMA ‘allows you to start with what you already believe (prior) and then see how new information (likelihood) changes your confidence in that belief (posterior)’ (Malakoff, 1999). BMA is adaptive in the sense that recent realizations of the forecast system are used as a training sample to carry out calibration (Yang et al., 2011a). In addition, BMA is also a method of combining forecasts from different sources into a consensus probability density function (PDF) (Fischer et al., 2012). Due to its advantages, BMA methods have been widely used in various fields of studies, including uncertainty analysis (Zhang et al., 2009), global precipitation simulation (Phillips and Gleckler, 2006) and global surface air temperature (Min and Hense, 2006, 2007) in the 20th century, the spatial distribution of global temperature (Duan and Phillips, 2010), runoff prediction (Duan et al., 2007; Freer et al., 1996), soil moisture (Bou-Zeid and Ei-Fadel, 2002), drought events (Vidal and Wade, 2009) and extreme precipitation (Yang et al., 2011a).
Most of these previous studies focused mainly on the spatial characteristics of climate variables. A detailed comparison among different regions is rare, however, and mostly limited to the 20th century. Indeed, the future trend of climate change within the annual and decadal scales may be more helpful for mitigating, and adapting to, the risk of climate change. This study aims to: (1) compare and evaluate the temporal and spatial changes in global terrestrial SAT using the outputs from GCMs, SMA and BMA; and (2) project the temporal change in global terrestrial SAT in the 21st century using BMA. The uncertainty of the BMA process will be discussed. We intend to contribute to a better understanding of simulation and projection of global terrestrial SAT, while also providing a stronger base for decision-making concerning environmental management and carbon emissions.
II Data and methods
1 Data
Observations of monthly SAT over the global land surface were taken from the National Centers for Environmental Prediction (NCEP) reanalysis data set (available at http://www.cdc.noaa.gov/cdc/data.ncep.reanalysis.derived.html). The resolution of NCEP reanalysis data is 2.5° × 2.5°, and the timeframe in this research is 1960–1999.
IPCC-AR4 used 24 GCMs results as part of the Coupled Model Intercomparison Project Phase 3 (CMIP-3) of the World Climate Research Programme (WCRP) (Table 1). The data include the results of the 20th-century (20C3M) experiment and future climate simulations (2000∼2099) under different emissions scenarios (available at http://esg.llnl.gov:8443/index.jsp). Three main future scenarios, which are derived different assumptions of future social and technological development, were represented as ‘low’ (B1), ‘medium’ (A1B) and ‘high’ (A2) scenarios in terms of cumulative greenhouse gas emissions. As some future scenarios have not been provided by all GCMs, here we only select the data of the first 17 GCMs in Table 1 to carry out the multi-model ensemble estimation and projection. For comparison purposes, all GCMs results were regridded to the same resolution as that of the observed data (2.5° × 2.5°grid).
List of the global climate models in IPCC-AR4.
2 The methodology of simple model averaging (SMA)
In the method of SMA, each model gets the same weight (wk = 1/K, where K is the number of models) in the multi-model forecast. This is the most common method in single-model ensemble weighting (Katz and Ehrendorfer, 2006). Within SMA any knowledge about the performance of the model is neglected (Casanova and Ahrens, 2009).
3 The methodology of Bayesian moving average (BMA)
a General formulation
BMA provides a way to combine different models to a multi-model, and is a promising method for calibrating ensembles in modelling and forecasts (Yang et al., 2011b). BMA has been proposed as a popular technique to correct under-dispersion during ensemble forecasts (Duan and Phillips, 2010; Min and Hense, 2006; Raftery et al., 2005; Yang et al., 2011a). The output of BMA is a probability density function (pdf), which is a weighted average of pdfs centred on the bias-corrected forecasts. BMA weights reflect the relative contributions of the component models to the predictive skill over a training sample. The combined forecast pdf of a variable y is:
where p(y|Mk
,yT
) is the forecast pdf based on model Mk
alone, estimated from the training period of observations y
T, K is the number of models being combined and p(Mk
|yT
) is the posterior probability of model Mk
being correct given the training data. This term is computed with the aid of Bayes’ theory:
Because BMA methodology assumes that the model forecasts are unbiased, the technique of bias-correction should be applied in advance. If the linear regression of y on fk is employed, y=ak *fk +bk . Unique coefficients ak and bk for each model k are determined using a least squares approximation, with the observations in the training period yT as the dependent variable and the forecast as the explanatory variables.
Considering the application of BMA to bias-corrected forecasts from the k model, equation (1) can be rewritten as:
where wk
= p (Mk
|yT
) is the BMA weight for model k computed from the training data set and reflects the relative performance of models k on the training period. The weights wk
add up to 1. The conditional probatilities pk
[y|(fk
,yT
)] may be interpreted as the conditional pdf of y given fk
(i.e. model k has been chosen) and training data yT
. These conditional pdfs are assumed to be normally distributed as:
where the coefficients ak and bk are estimated from the bias-correction procedures described above. This means that the BMA predictive distribution becomes a weighted sum of normal distribution with equal variance but centre at bias-corrected forecast which can also be obtained from the BMA distribution using the conditional expectation of y given the forecasts:
b Model weights
The BMA weights and the variance σ 2 are estimated using maximum likelihood (Raftery et al., 2005). For given parameters to be estimated, the likelihood function is the probability of the training data and is viewed as function of the parameters (Yang et al., 2011b). The weights and variance are chosen so as to maximize this function (i.e. the parameter values for which the observation data were most likely to have been observed). The algorithm used to calculated the BMA weights and variance is called the expectation maximization (EM) algorithm (Dempster et al., 1977). The method is iterative and converges to a local maximum of the likelihood. The detailed description of the BMA method is provided by Raftery et al. (2005).

Implementation of Bayesian multi-model averaging processes: (A) posterior distribution of variable y for each model; (B) normalized likelihood of model giving the correct response; (C) model forecasts weighted by normalized likelihood; and (D) weighted forecast summed to form multi-model density.
c Training and projecting period, and the standard deviation
In this study, we used a 40-year period (1960–1999) to train BMA weights for 17 GCM models under the 20C3M emission scenario. Figure 1 illustrates how BMA produces a multi-model forecast pdf during the training period. The training data include the annual and decadal mean SAT, and the focus area includes global and regional scales. The decadal mean SAT was obtained by the 10-year moving average. We then multiplied the BMA weights for each model during the training period by the outputs of corresponding models during the rest period (2000∼2099) to generate the present and future scenarios of climate change. We applied the coefficients ak and bk to the rest of the period for bias-correction. For a given variable y, the ensemble standard deviation is computed as follows (Arzhanov et al., 2012):
4 Skill score
We used two skill score metrics to evaluate the overall agreement between the observed SAT data (O) and forecasted results (F) (GCMs and their ensemble predictions). The first is the Nash-Sutcliffe model efficiency (E) (Nash and Sutcliffe, 1970), which is commonly used to quantitatively assess the accuracy of model outputs. It is defined as:
where E can range from –∞ to 1, and E ∼ 1 indicates more skill overall, E = 0 indicates the model predictions are as accurate as the mean of the observed data, whereas an efficiency less than zero occurs when the observed mean is a better predictor than the model (Warner et al., 1997).
The second metric is the Taylor diagram, which provides a way of summarizing graphically how closely a pattern (or a set of patterns) matches observations (Taylor, 2001). The similarity between simulation and observation is quantified in terms of their correlation (R), their centred root-mean-square-error (RMSE) and the amplitude of their standard deviations (Std). Three statistics of simulations and observation are plotted on a polar style graph, the radial distances from the origin to the simulation are proportional to the Std, and the azimuthal positions give the correlation coefficient between the simulation and observation. The reason that each point in the two-dimensional space of the Taylor diagram can represent three different statistics simultaneously (RMS difference, R and Std) is that these statistics are related by the following formula:
where
5 The spatial scale of assessment
In addition to the simulation and projection of SAT on the global land scale, 10 regions at finer scale based on the characteristics of observed SAT were also involved in this research (Figure 2). They are Antarctica (ANT), Greenland (GRL), northern North America (NNA), southern North America (SNA), South America (SAM), Africa (AF), Europe (EUR), northern Asia (NAS), southern Asia (SAS) and Australia (AUS). NNA and SNA are divided by the 49°N latitude, and NAS and SAS are separated at 50°N latitude.

Maps of the 1960–1999 climatology of annual mean continental SAT (in units of °C) from the NCEP/NCAR reanalysis data (left) and positioning of the 10 analysed regions (right).
III Results
1 Spatial simulation
Figure 3 shows the error map of each simulation for the 40-year average (1960–1999) global mean SAT simulation. From this map, it is easy to find that the performances of different models are disparate among regions. On the global scale, GCM 1 (bccr_bcm2_0) and GCM 4 (csiro_mk3_0) underestimate the SAT during the period of 1960–1999, but GCM 5 (csiro_mk3_5) overestimate the SAT. Almost all models have poor performance in the regions of ANT, EUR and GRL. Generally, the models from MPI (GCM 13) and NCAR (models 15 and 16) have the best performance; the grey area of the error map is the largest. BMA results exceed any model, even for the best model.

Error map of each simulation for the 40-year average (1960–1999) global mean SAT simulation (in units of °C).
With the help of skill scores, it is easier to judge the performance of each simulation (Figure 4). It is shown that all the E values are invariably over 0.9 among the 19 simulations (17 AR4 GCMs and SMA, BMA). This result indicates that the majority of the AR4 GCMs provide satisfactory simulations of the relative spatial distribution of 40-year average global mean SAT during 1960–1999. The highest efficiency value proves once again that GCM 13 (mpi_echam5), GCM 15 (ncar_ccsm3_0) and GCM 16 (ncar_pcm1) are the most accurate. They even overmatch the SMA results and greatly approximate to the BMA simulation. In term of multi-model ensemble prediction, BMA can improve the simulated accuracy further when compared to the SMA results, and hence gives a higher model efficiency. Using a Taylor diagram, Figure 4 still shows the performances of ensemble predictions in simulating 40-year average regional mean SAT. It is found that the ensemble prediction in Australia (AUS) and northern North America (NNA) are the best among 10 regions (closest distance to the observed point). Moreover, BMA can improve the simulation results in southern Asia (SAS), southern North America (SNA), northern North America (NNA), Europe (EUR) and Antarctica (ANT) when compared to SMA results.

The results of Nash-Sutcliffe efficiency (left) and Taylor diagram (right) for spatial distribution simulation. The Nash-Sutcliffe efficiency aims at spatial distribution simulation of 40-year global average (1960–1999) mean SAT, and E values for model numbers 18 and 19 (reddish colour) are the SMA and BMA results, respectively. The Taylor diagram aims at spatial distribution simulation of 40-year regional average (1960–1999) mean SAT. In a Taylor diagram, the radial distance from the origin represents the standard deviation ratio of simulated SAT to that of observations, and the pattern correlation between two fields is given by the azimuthal position. The nearer to the OBS point in a Taylor diagram, the closer the observations to the simulation. In addition, the denseness of points in the diagram denotes the magnitude of diversity among different models. A denser distribution of points shows smaller diversity among models, and vice versa.
2 Temporal simulations
Figure 5 shows the temporal performance of AR4 GCMs and their ensemble predictions in simulating annual global mean SAT. Most of the GCMs underestimated the annual global mean SAT if lacking the bias-correction. By comparison with Figure 4, Figure 5 shows that the GCMs are less accurate in temporal simulation than spatial simulation, with E < 0.5 and R < 0.7. GCM 14 (mri_cgcm2_3_2a) ranked first among 17 GCMs. The best models are different in simulating spatial distribution and temporal dynamic of global SAT, which is partly affected by the bias-correction process before BMA. Figure 5 also indicates that the single GCM can approximate the trend of global land SAT, but the assessment of accuracy is relatively weak. Compared to the single GCM, ensemble prediction can improve the temporal simulation in annual global mean SAT. Also, the performance of BMA is better than SMA with a higher E value and better performance in the Taylor diagram.

The performance of simulated annual global mean SAT. Top: the trend of global mean SAT from raw model output without bias-correction. Centre: the Nash-Sutcliffe efficiency results after bias-correction. Bottom: the Taylor diagram result after bias-correction (bottom).
Figure 6 presents the comparison between the observed and the BMA simulated SAT trend on annual and decadal scales. All BMA results (no matter whether on global or regional scales) cannot reflect accurately the actual annual SAT dynamic, although the increasing trend was caught. However, the BMA result can identify the 10-year moving average SAT on the global scale. Among 10 regions, the best BMA performance of 10-year moving average simulation appears in AUS, with the highest R and E values. However, the performance in SAM and GLR is not acceptable, with unsatisfactory R and E values. The shading in Figure 6 is two standard deviation of BMA results, implying the uncertainty of simulation. The value is ±0.33°C/decade on the global scale. Among 10 regions, the lowest uncertainty of BMA results appears in AF (±0.38°C/decade), followed by AUS (±0.46°C/decade), SNA (±0.53°C/decade), SAM (±0.68°C/decade), SAS(±0.69°C/decade), ANT (±0.95°C/decade), EUR (±1.06°C/decade), GRL (±1.11°C/decade), NAS (±1.13°C/decade) and NNA (±1.19°C/decade). In addition, the uncertainty range in some regions has not incorporated completely the observed SAT series, such as SAM.

Comparison of observed global mean SAT and BMA results in different regions. Annual global mean SAT change (dotted red line) with 10-year moving average (thick red line), and its BMA results for annual global mean SAT (dotted blue line) and 10-year moving average (thick blue line). Shading denotes the ±2 standard deviation range of BMA results with 10-year moving average.
3 SAT projection over the 21st century
Based on the BMA process in the period 1960–1999, the weight of each GCM was obtained. According to these weights and the different GCMs scenarios data set, BMA simulation of SAT projection over the 21st-century scenario was generated. In this study, we calculated the BMA simulation of three main future emission scenarios (B1, A1B and A2) and plotted them in Figure 7. The SAT on the decadal scale is projected only, due to the poor performance on the annual scale. On the global scale, the BMA simulation shows that the global land SAT increases greatly in the 21st century. On average, the global land SAT rose by 2.20°C/100 yr, 3.63°C/100 yr and 4.39°C/100 yr for the B1, A1B and A2 scenarios, respectively. Among three scenarios, the rise of decadal global mean SAT in NAS is the most significant among the 10 regions, increasing by 3.12°C/100yr, 5.22°C/100yr and 6.08°C/100yr for B1, A1B and A2 scenarios, respectively. Figure 7 also illuminates that the most sensitive region impacted by greenhouse gas emissions is AUS. The warming rates in the A1B and A2 scenarios are about 2.2 and 2.7 times, respectively, that of the B1 scenario. Turning to the uncertainty of projected SAT, we find that the uncertainties of SAT projected by BMA in the three future emission scenarios are similar, regardless of scale. The uncertainty of SAT projected by BMA in the 21st century increases with time.

SAT projection over the 21st-century by use of the BMA method. The evolutions for three emission scenarios are shown. The evolution was derived by applying a 10-year moving average. Solid lines are Bayesian multi-model global averages of surface warming under the scenarios A2 (red line), A1B (green line) and B1 (blue line). The coloured text indicates the average rate of warming under the scenarios A2 (red), A1B (green) and B1 (blue). Shading denotes the ±2 standard deviation range of BMA results with 10-year moving average.
IV Discussion and conclusions
Use of multi-model ensembles from global climate models to simulate climate change has flourished as an important topic of scientific and public interest during recent decades. However, most previous studies focused mainly on the spatial characteristics of climate variables in the 20th century, and few global climate models were used. The use of more ensemble members provides more information and thus optimizes the accuracy of simulations (Zanis et al., 2009). We compared and evaluated the spatiotemporal changes in global land SAT using the outputs from GCMs, SMA and BMA, and projected future global land SAT trends for different emission scenarios. The major findings of this study are summarized and discussed as follows: By simulating the global land SAT from an ensemble of 17 CIMP3-GCMs, BMA ensemble prediction is better at fitting spatial than annual temporal behaviour. Single GCM can reproduce the spatial distribution of global land SAT accurately; however, the improvement by BMA ensemble prediction against the single GCM is more significant when simulating the temporal rather than the spatial distribution. GCM 13 (mpi_echam5), GCM 15 (ncar_ccsm3_0) and GCM 16 (ncar_pcm1) in this study well matched the spatial distribution of global land SAT simulated by SMA, and approximated the BMA results because the single GCM has already performed well at simulating the spatial distribution of global land SAT (Figures 3 and 4), which reduced the ensemble prediction’s ability for improvement. Even so, the ability of BMA to reproduce the historical climate is better than that of SMA generally. In addition, BMA simulation can demonstrate the SAT trend on the decadal scale, but the annual assessment of accuracy is relatively weak because the GCMs in 20C3M were run with a free-running ocean. Perhaps the coupled atmosphere-ocean processes could improve the performance of annual SAT. Hence, direct use of the short-term output from a single GCM or ensemble prediction of GCMs is not recommended. In order to model the short-term dynamic series, some effective techniques to improve the regional simulation accuracy should be considered in advance, such as statistical downscaling (Jia et al., 2011; Moriondo and Bindi, 2006) and regional climate models (RCMs) driven by the GCM output as their boundary conditions (Buser et al., 2009; Nieto and Rodríguez-Puebla, 2006). By simulating the regional SAT from an ensemble of 17 CIMP3-GCMs, BMA ensemble prediction performs best in the AUS, regardless of spatial distribution or interdecadal dynamics. The AR4 GCMs were used to simulate climate change as a result of slow alterations to certain boundary conditions and forcing parameters, which affect the energy balance at the global scale. Some special regions with complicated climate-forming processes (e.g. northern Europe and Greenland) cannot be accurately described by the climate model at the global scale. Consequently, not all GCMs are able to provide a similarly accurate description of the present climate (Errasti et al., 2011; Im et al., 2007; Masson and Knutti, 2011; Nieto and Rodríguez-Puebla, 2006), which impacts the consistency of BMA ensemble performance on the global scale. Thus, accuracy validation of GCMs should be carried out before applying them in different regions. The BMA forecast would be better than SMA or any one member, since it is determined from an ensemble distribution that has its first and second moments bias-corrected using recent verification data for all the ensemble members. It is essentially an ‘intelligent’ consensus forecast, weighted by the recent performance results for the component models. In addition, the uncertainty of ensemble predicted global land SAT is about ±0.33°C at the global scale, which is obviously narrower than that of a single climate model (IPCC, 2007), which implies that ensemble prediction by BMA can effectively reduce the range of uncertainty. Giorgi and Francisco (2000b) believe uncertainties in climate change predictions come from four sources: the forcing scenarios, the use of different GCMs, the predictions by different realizations of a given scenario with a given GCM, and sub-GCM grid scale forcing and processes. Uncertainty can be evaluated by performing several simulations with the same model (Giorgi and Francisco, 2000a) and using different climate models with the same forcing (Crossley et al., 2000). BMA yields posterior probability distributions of all the uncertain quantities of interest, and these posterior distributions provide a wealth of information about GCM reliability and temporal (present to future) correlations (Tebaldi et al., 2005). Because of the advantages of BMA, it is considered an extension and elaboration of the reliability ensemble averaging method. The GCMs generated by different research groups have different resolutions (Table 1). It is believed that higher resolution does not automatically lead to improved model accuracy (Kiktev et al., 2003; Miao et al., 2012), and the present research confirms this view. Whereas GCM 16 (ncar_pcm1) gives the best spatial temperature simulation, its resolution is only moderate. However, the degree of resolution must have an effect on the accuracy of the spatial and temporal simulation of a given model, even after the model outputs have been interpolated onto a grid of uniform resolution (2.5° × 2.5° in the present study). For GCM 1 (bccr_bcm2_0) as an example, the raw resolution of GCM 1 is about 2.8° × 2.8°, yet the annual global mean SAT will be underestimated after regridding the raw resolution to 2.5° × 2.5° (Figure 8). The correlation between raw GCMs outputs and new outputs after regridding is significant (R > 0.99). Hence, the bias-correction is necessary after the regridding process. In this research, the process of bias-correction is considered during the BMA, which could minimize new errors. Multi-model ensemble results indicate that future SAT rises fastest in NAS for the B1, A1B and A2 scenarios. Meanwhile, AUS is the most sensitive region affected by greenhouse gas emissions. According to the IPCC-AR4 report, the ensemble projects an increase in global mean surface air temperature of 1.8°C, 2.8°C and 3.4°C in the B1, A1B and A2 scenarios, respectively, by 2090 to 2099 relative to 1980 to 1999 on average (IPCC, 2007). In this study, the corresponding rises are 2.20°C, 3.63°C and 4.39°C, respectively. The significant increase is mainly because this study only addresses the global terrestrial range. Future climate warming unavoidably will disturb the natural eco-environment system, water resource distribution, agriculture production and local human activities (Miao et al., 2010, 2011). The discrepancy of different regions intensifies the difficulty of mitigation and adaptation strategies when facing the risk of global warming. For example, southern Africa could lose more than 30% of its main crop, maize, by 2030 due to climate change (Lobell et al., 2008), while the crop yields in East and Southeast Asia could increase up to 20% by the mid-21st century (IPCC, 2007). Turning to water resources, Labat et al. (2004) suggested an increase of about 4% in global runoff from an increase of 1°C in global temperature. On a regional scale, the runoff will increase with double CO2 in all rivers in high northern latitudes, with a maximum increase of 47%, whereas at low latitudes both increases and decreases are predicted, ranging from a 96% increase to a 43% decrease (Miller and Russell, 1992). However, if future population increase is considered, water stress increases over 62.0–75.8% of total river basin areas and decreases over 19.7–29.0% of this area by the 2050s (Alcamo et al., 2007). All of these connections challenge decision-makers and scientists facing these potential variations caused by climate change.

The global mean SAT comparison between different resolutions of the GCM 1 (bccr_bcm2_0).
Footnotes
Acknowledgements
We are grateful to the Program for Climate Model Diagnosis and Intercomparison (PCMDI) for collecting and archiving the model data, and to the National Centers for Environmental Prediction (NCEP) for collecting and archiving the observed climate data.
Funding
Funding for this research was provided by the National Key Basic Special Foundation Project of China (2010CB951604) (2010CB428402), the National Natural Science Foundation of China (no. 41001153) and the State Key Laboratory of Earth Surface Processes and Resource Ecology.
