Evaluation and application of Bayesian multi-model estimation in temperature simulations

Abstract

Use of multi-model ensembles from global climate models to simulate the current and future climate change has flourished as a research topic during recent decades. This paper assesses the performance of multi-model ensembles in simulating global land temperature from 1960 to 1999, using Nash-Sutcliffe model efficiency and Taylor diagrams. The future trends of temperature for different scales and emission scenarios are projected based on the posterior model probabilities estimated by Bayesian methods. The results show that ensemble prediction can improve the accuracy of simulations of the spatiotemporal distribution of global temperature. The performance of Bayesian model averaging (BMA) at simulating the annual temperature dynamic is significantly better than single climate models and their simple model averaging (SMA). However, BMA simulation can demonstrate the temperature trend on the decadal scale, but its annual assessment of accuracy is relatively weak. The ensemble prediction presents dissimilarly accurate descriptions in different regions, and the best performance appears in Australia. The results also indicate that future temperatures in northern Asia rise with the greatest speed in some scenarios, and Australia is the most sensitive region for the effects of greenhouse gas emissions. In addition to the uncertainty of ensemble prediction, the impacts of climate change on agriculture production and water resources are discussed as an extension of this research.

Keywords

Bayesian Model Averaging (BMA)climate change prediction general circulation models (GCM)multi-model ensembles regional temperature change

I Introduction

In recent decades there has been growing interest into the assessment of historical, current and future climate variability and trends (Miao et al., 2012; Zanis et al., 2009). The fourth assessment report (AR4) of the Intergovernmental Panel on Climate Change (IPCC) gives evidence that the global mean surface air temperature (SAT) rose 0.74° ± 0.18°C during 1906–2005 (IPCC, 2007). AR4 also indicates that during the 21st century the global surface air temperature is likely to rise further by 1.1–2.9°C, assuming the lowest emissions scenario, and 2.4–6.4°C for the highest (Meehl et al., 2007). Global warming inevitably has a major impact on natural and anthropogenic systems at local and regional scales. In order to mitigate and adapt the risk of climate change, decision-makers in governments, non-governmental organizations and the general public need detailed information on present and future climate.

Global coupled Atmospheric-Ocean General Circulation Models (coupled GCMs) are the modelling tools traditionally used in theoretical investigations of the mechanisms of climatic changes (Covey et al., 2003). Using GCMs it is possible to distinguish natural climate variability from anthropogenic effects. More than 20 GCMs with different resolutions have been developed by several research groups around the world. It is generally accepted that the agreement of models with present observations is currently the only way to assign confidence into the quality of a model (Errasti et al., 2011; McAvaney et al., 2001), and the better a model performance in reproducing present-day climate, the higher the reliability of the climate change simulation (Coquard et al., 2004; Giorgi and Mearns, 2002). Three possible forcing agents have been identified as contributors to 20th-century global warming (Li et al., 2007). These include anthropogenic greenhouse gas forcings (CO₂, CH₄, N₂O, etc.), natural forcings (sulfate aerosols and ozone change) and the internal variability of climate system itself (North Atlantic Oscillation, NAO and El Niño–Southern Oscillation, ENSO). However, because of our limited computational resources, incomplete understanding of the way climate is expected to react (Kiktev et al., 2003), simplifying assumptions in model construction (Phillips and Gleckler, 2006), epistemic or aleatory uncertainties in model parameterization (Duan and Phillips, 2010) and unreasonable natural forcing (Li et al., 2007; Zhou and Yu, 2006), GCMs are not expected to represent the climate characteristics with the same intensity and frequency as in observed data. For instance, it has been shown that warming trends over western Europe are much stronger than expected from climate model projections (Van Oldenborgh et al., 2009). Phillips and Gleckler (2006) found that the AR4 GCMs exhibit systematic model bias, with most models displaying aggregated precipitation variability magnitudes larger than observed.

To maximize GCM value, research increasingly focuses on improving GCM predictions. It is generally believed that multi-model ensembles are superior to single models, and that the ensembles may even outperform the best single participating model (Buser et al., 2009; Duan and Phillips, 2010; Fischer et al., 2012; Gleckler et al., 2008; IPCC, 2001; Lambert and Boer, 2001; Phillips and Gleckler, 2006; Pincus et al., 2008; Reichler and Kim, 2008; Weigel et al., 2008). Because single models are overconfident (Weigel et al., 2008), multi-model ensembles contain the information from all participating models and embrace distinctly different physical parameterizations (Pincus et al., 2008), and consequently moderate the uncertainties arising from different parameterizations and dynamical cores in the different GCMs (Zanis et al., 2009). The simplest multi-model ensemble is simple model averaging (SMA), which can be generated by combining the individual model output with equal weights (Duan and Phillips, 2010; Hagedorn et al., 2005). The empirical benefits of multi-model averaging for enhancing the accuracy of GCMs have motivated the development of more sophisticated averaging approaches wherein the participating single-model contributions are weighted according to their prior performance (e.g. Doblas-Reyes et al., 2005; Rajagopalan et al., 2002; Robertson et al., 2004; Stephenson et al., 2005; Yun et al., 2003).

Among these sophisticated averaging approaches, Bayesian model averaging (BMA) is one of the most promising methods. BMA ‘allows you to start with what you already believe (prior) and then see how new information (likelihood) changes your confidence in that belief (posterior)’ (Malakoff, 1999). BMA is adaptive in the sense that recent realizations of the forecast system are used as a training sample to carry out calibration (Yang et al., 2011a). In addition, BMA is also a method of combining forecasts from different sources into a consensus probability density function (PDF) (Fischer et al., 2012). Due to its advantages, BMA methods have been widely used in various fields of studies, including uncertainty analysis (Zhang et al., 2009), global precipitation simulation (Phillips and Gleckler, 2006) and global surface air temperature (Min and Hense, 2006, 2007) in the 20th century, the spatial distribution of global temperature (Duan and Phillips, 2010), runoff prediction (Duan et al., 2007; Freer et al., 1996), soil moisture (Bou-Zeid and Ei-Fadel, 2002), drought events (Vidal and Wade, 2009) and extreme precipitation (Yang et al., 2011a).

Most of these previous studies focused mainly on the spatial characteristics of climate variables. A detailed comparison among different regions is rare, however, and mostly limited to the 20th century. Indeed, the future trend of climate change within the annual and decadal scales may be more helpful for mitigating, and adapting to, the risk of climate change. This study aims to: (1) compare and evaluate the temporal and spatial changes in global terrestrial SAT using the outputs from GCMs, SMA and BMA; and (2) project the temporal change in global terrestrial SAT in the 21st century using BMA. The uncertainty of the BMA process will be discussed. We intend to contribute to a better understanding of simulation and projection of global terrestrial SAT, while also providing a stronger base for decision-making concerning environmental management and carbon emissions.

II Data and methods

1 Data

Observations of monthly SAT over the global land surface were taken from the National Centers for Environmental Prediction (NCEP) reanalysis data set (available at http://www.cdc.noaa.gov/cdc/data.ncep.reanalysis.derived.html). The resolution of NCEP reanalysis data is 2.5° × 2.5°, and the timeframe in this research is 1960–1999.

IPCC-AR4 used 24 GCMs results as part of the Coupled Model Intercomparison Project Phase 3 (CMIP-3) of the World Climate Research Programme (WCRP) (Table 1). The data include the results of the 20th-century (20C3M) experiment and future climate simulations (2000∼2099) under different emissions scenarios (available at http://esg.llnl.gov:8443/index.jsp). Three main future scenarios, which are derived different assumptions of future social and technological development, were represented as ‘low’ (B1), ‘medium’ (A1B) and ‘high’ (A2) scenarios in terms of cumulative greenhouse gas emissions. As some future scenarios have not been provided by all GCMs, here we only select the data of the first 17 GCMs in Table 1 to carry out the multi-model ensemble estimation and projection. For comparison purposes, all GCMs results were regridded to the same resolution as that of the observed data (2.5° × 2.5°grid).

Table 1.

List of the global climate models in IPCC-AR4.

No.	Model	Source	Grid resolution	20c3m	B1	A1B	A2
1	bccr_bcm2_0	Bjerknes Centre for Climate Research, Norway	64*128	√	√	√	√
2	cccma_cgcm3_1	Canadian Centre for Climate Modelling and Analysis, Canada	48*96	√	√	√	√
3	cnrm_cm3	Centre National de Recherches Meteorologiques, France	64*128	√	√	√	√
4	csiro_mk3_0	Australian Commonwealth Scientific and Research Organization, Australia	96*192	√	√	√	√
5	csiro_mk3_5	Australian Commonwealth Scientific and Research Organization, Australia	96*192	√	√	√	√
6	gfdl_cm2_0	Geophysical Fluid Dynamics Laboratory, USA	90*144	√	√	√	√
7	gfdl_cm2_1	Geophysical Fluid Dynamics Laboratory, USA	90*144	√	√	√	√
8	giss_model_e_r	Goddard Institute of Space Studies (NASA), USA	46*72	√	√	√	√
9	inmcm3_0	Institute for Numerical Mathematics, Russia	45*72	√	√	√	√
10	ipsl_cm4	Institut Pierre Simon Laplace, France	72*96	√	√	√	√
11	miroc3_2_medres	Center for Climate System Research, Japan	64*128	√	√	√	√
12	miub_echo_g	Meteorological Institute of the University of Bonn, Germany	48*96	√	√	√	√
13	mpi_echam5	Max-Planck-Institute for Meteorology, Germany	96*192	√	√	√	√
14	mri_cgcm2_3_2a	Meteorological Research Institute, Japan	64*128	√	√	√	√
15	ncar_ccsm3_0	NCAR Community Climate System Model, USA	128*256	√	√	√	√
16	ncar_pcm1	NCAR Parallel Climate Model, USA	61*128	√	√	√	√
17	ukmo_hadcm3	Hadley Centre for Climate Prediction, UK	73*96	√	√	√	√
18	cccma_cgcm_t63	Canadian Centre for Climate Modelling and Analysis, Canada	64*128	√	√	√
19	giss_aom	Goddard Institute of Space Studies (NASA), USA	60*90	√	√	√
20	giss_model_e_h	Goddard Institute of Space Studies (NASA), USA	46*72	√		√
21	iap_fgoals1_0_g	Institute of Atmospheric Physics, China	60*128	√	√	√
22	ingv_echam4	National Institute of Geophysics and Volcanology, Italy,	160*320	√		√	√
23	miroc3_2_hires	Center for Climate System Research, Japan	160*320	√	√	√
24	ukmo_hadgem1	Hadley Centre for Climate Prediction, UK	145*192	√		√	√

2 The methodology of simple model averaging (SMA)

In the method of SMA, each model gets the same weight (w_k = 1/K, where K is the number of models) in the multi-model forecast. This is the most common method in single-model ensemble weighting (Katz and Ehrendorfer, 2006). Within SMA any knowledge about the performance of the model is neglected (Casanova and Ahrens, 2009).

3 The methodology of Bayesian moving average (BMA)

a General formulation

BMA provides a way to combine different models to a multi-model, and is a promising method for calibrating ensembles in modelling and forecasts (Yang et al., 2011b). BMA has been proposed as a popular technique to correct under-dispersion during ensemble forecasts (Duan and Phillips, 2010; Min and Hense, 2006; Raftery et al., 2005; Yang et al., 2011a). The output of BMA is a probability density function (pdf), which is a weighted average of pdfs centred on the bias-corrected forecasts. BMA weights reflect the relative contributions of the component models to the predictive skill over a training sample. The combined forecast pdf of a variable y is:

p (y | y^{T}) \sum_{k = 1}^{K} p (y | M_{k}, y^{T}) p (M_{k} | y^{T})

where p(y|M_k ,y^T ) is the forecast pdf based on model M_k alone, estimated from the training period of observations y ^T, K is the number of models being combined and p(M_k |y^T ) is the posterior probability of model M_k being correct given the training data. This term is computed with the aid of Bayes’ theory:

p (M_{k} | y^{T}) = \frac{p (y^{T} | M_{k}) p (M_{k})}{\sum_{i = 1}^{k} p (y^{T} | M_{l}) p (M_{l})}

Because BMA methodology assumes that the model forecasts are unbiased, the technique of bias-correction should be applied in advance. If the linear regression of y on f_k is employed, y=a_k *f_k +b_k . Unique coefficients a_k and b_k for each model k are determined using a least squares approximation, with the observations in the training period y^T as the dependent variable and the forecast as the explanatory variables.

Considering the application of BMA to bias-corrected forecasts from the k model, equation (1) can be rewritten as:

p (y | f_{1} . . . . . . f_{k}, y^{T}) = \sum_{k = 1}^{K} w_{k} p_{k} (y | f_{k}, y^{T})

where w_k = p (M_k |y^T ) is the BMA weight for model k computed from the training data set and reflects the relative performance of models k on the training period. The weights w_k add up to 1. The conditional probatilities p_k [y|(f_k ,y^T )] may be interpreted as the conditional pdf of y given f_k (i.e. model k has been chosen) and training data y^T . These conditional pdfs are assumed to be normally distributed as:

y | (f_{k}, y^{T}) ~ N (a_{k} + b_{k} y_{k}, σ^{2})

where the coefficients a_k and b_k are estimated from the bias-correction procedures described above. This means that the BMA predictive distribution becomes a weighted sum of normal distribution with equal variance but centre at bias-corrected forecast which can also be obtained from the BMA distribution using the conditional expectation of y given the forecasts:

E [y | (f_{1} . . . . . . f_{k}, y^{T})] = \sum_{k = 1}^{K} w_{k} (a_{k} + b_{k} f_{k})

b Model weights

The BMA weights and the variance σ ² are estimated using maximum likelihood (Raftery et al., 2005). For given parameters to be estimated, the likelihood function is the probability of the training data and is viewed as function of the parameters (Yang et al., 2011b). The weights and variance are chosen so as to maximize this function (i.e. the parameter values for which the observation data were most likely to have been observed). The algorithm used to calculated the BMA weights and variance is called the expectation maximization (EM) algorithm (Dempster et al., 1977). The method is iterative and converges to a local maximum of the likelihood. The detailed description of the BMA method is provided by Raftery et al. (2005).

Figure 1.

Implementation of Bayesian multi-model averaging processes: (A) posterior distribution of variable y for each model; (B) normalized likelihood of model giving the correct response; (C) model forecasts weighted by normalized likelihood; and (D) weighted forecast summed to form multi-model density.

c Training and projecting period, and the standard deviation

In this study, we used a 40-year period (1960–1999) to train BMA weights for 17 GCM models under the 20C3M emission scenario. Figure 1 illustrates how BMA produces a multi-model forecast pdf during the training period. The training data include the annual and decadal mean SAT, and the focus area includes global and regional scales. The decadal mean SAT was obtained by the 10-year moving average. We then multiplied the BMA weights for each model during the training period by the outputs of corresponding models during the rest period (2000∼2099) to generate the present and future scenarios of climate change. We applied the coefficients ak and bk to the rest of the period for bias-correction. For a given variable y, the ensemble standard deviation is computed as follows (Arzhanov et al., 2012):

σ (y | f_{1} ...... f_{k}) = {\sum_{k = 1}^{K} [σ_{k}^{2} + f_{k}^{2}] * w k - E {(y | f_{1} ...... f_{k})}^{2}}

4 Skill score

We used two skill score metrics to evaluate the overall agreement between the observed SAT data (O) and forecasted results (F) (GCMs and their ensemble predictions). The first is the Nash-Sutcliffe model efficiency (E) (Nash and Sutcliffe, 1970), which is commonly used to quantitatively assess the accuracy of model outputs. It is defined as:

E = 1 - \frac{\sum_{i = 1}^{n} {(O_{i} - F_{i})}^{2}}{\sum_{i = 1}^{n} {(O_{i} - \overset{ˉ}{O})}^{2}}

where E can range from –∞ to 1, and E ∼ 1 indicates more skill overall, E = 0 indicates the model predictions are as accurate as the mean of the observed data, whereas an efficiency less than zero occurs when the observed mean is a better predictor than the model (Warner et al., 1997).

The second metric is the Taylor diagram, which provides a way of summarizing graphically how closely a pattern (or a set of patterns) matches observations (Taylor, 2001). The similarity between simulation and observation is quantified in terms of their correlation (R), their centred root-mean-square-error (RMSE) and the amplitude of their standard deviations (Std). Three statistics of simulations and observation are plotted on a polar style graph, the radial distances from the origin to the simulation are proportional to the Std, and the azimuthal positions give the correlation coefficient between the simulation and observation. The reason that each point in the two-dimensional space of the Taylor diagram can represent three different statistics simultaneously (RMS difference, R and Std) is that these statistics are related by the following formula:

R M S^{2} = σ_{F}^{2} + σ_{O}^{2} - 2 σ_{F} σ_{O} R

where $σ_{F}$ and $σ_{O}$ are the standard deviations of the simulations and observation respectively (Taylor, 2001). The Taylor diagram is especially useful in evaluating multiple aspects of complex models or in gauging the relative skill of many different models (IPCC, 2001).

5 The spatial scale of assessment

In addition to the simulation and projection of SAT on the global land scale, 10 regions at finer scale based on the characteristics of observed SAT were also involved in this research (Figure 2). They are Antarctica (ANT), Greenland (GRL), northern North America (NNA), southern North America (SNA), South America (SAM), Africa (AF), Europe (EUR), northern Asia (NAS), southern Asia (SAS) and Australia (AUS). NNA and SNA are divided by the 49°N latitude, and NAS and SAS are separated at 50°N latitude.

Figure 2.

Maps of the 1960–1999 climatology of annual mean continental SAT (in units of °C) from the NCEP/NCAR reanalysis data (left) and positioning of the 10 analysed regions (right).

III Results

1 Spatial simulation

Figure 3 shows the error map of each simulation for the 40-year average (1960–1999) global mean SAT simulation. From this map, it is easy to find that the performances of different models are disparate among regions. On the global scale, GCM 1 (bccr_bcm2_0) and GCM 4 (csiro_mk3_0) underestimate the SAT during the period of 1960–1999, but GCM 5 (csiro_mk3_5) overestimate the SAT. Almost all models have poor performance in the regions of ANT, EUR and GRL. Generally, the models from MPI (GCM 13) and NCAR (models 15 and 16) have the best performance; the grey area of the error map is the largest. BMA results exceed any model, even for the best model.

Figure 3.

Error map of each simulation for the 40-year average (1960–1999) global mean SAT simulation (in units of °C).

With the help of skill scores, it is easier to judge the performance of each simulation (Figure 4). It is shown that all the E values are invariably over 0.9 among the 19 simulations (17 AR4 GCMs and SMA, BMA). This result indicates that the majority of the AR4 GCMs provide satisfactory simulations of the relative spatial distribution of 40-year average global mean SAT during 1960–1999. The highest efficiency value proves once again that GCM 13 (mpi_echam5), GCM 15 (ncar_ccsm3_0) and GCM 16 (ncar_pcm1) are the most accurate. They even overmatch the SMA results and greatly approximate to the BMA simulation. In term of multi-model ensemble prediction, BMA can improve the simulated accuracy further when compared to the SMA results, and hence gives a higher model efficiency. Using a Taylor diagram, Figure 4 still shows the performances of ensemble predictions in simulating 40-year average regional mean SAT. It is found that the ensemble prediction in Australia (AUS) and northern North America (NNA) are the best among 10 regions (closest distance to the observed point). Moreover, BMA can improve the simulation results in southern Asia (SAS), southern North America (SNA), northern North America (NNA), Europe (EUR) and Antarctica (ANT) when compared to SMA results.

Figure 4.

The results of Nash-Sutcliffe efficiency (left) and Taylor diagram (right) for spatial distribution simulation. The Nash-Sutcliffe efficiency aims at spatial distribution simulation of 40-year global average (1960–1999) mean SAT, and E values for model numbers 18 and 19 (reddish colour) are the SMA and BMA results, respectively. The Taylor diagram aims at spatial distribution simulation of 40-year regional average (1960–1999) mean SAT. In a Taylor diagram, the radial distance from the origin represents the standard deviation ratio of simulated SAT to that of observations, and the pattern correlation between two fields is given by the azimuthal position. The nearer to the OBS point in a Taylor diagram, the closer the observations to the simulation. In addition, the denseness of points in the diagram denotes the magnitude of diversity among different models. A denser distribution of points shows smaller diversity among models, and vice versa.

2 Temporal simulations

Figure 5 shows the temporal performance of AR4 GCMs and their ensemble predictions in simulating annual global mean SAT. Most of the GCMs underestimated the annual global mean SAT if lacking the bias-correction. By comparison with Figure 4, Figure 5 shows that the GCMs are less accurate in temporal simulation than spatial simulation, with E < 0.5 and R < 0.7. GCM 14 (mri_cgcm2_3_2a) ranked first among 17 GCMs. The best models are different in simulating spatial distribution and temporal dynamic of global SAT, which is partly affected by the bias-correction process before BMA. Figure 5 also indicates that the single GCM can approximate the trend of global land SAT, but the assessment of accuracy is relatively weak. Compared to the single GCM, ensemble prediction can improve the temporal simulation in annual global mean SAT. Also, the performance of BMA is better than SMA with a higher E value and better performance in the Taylor diagram.

Figure 5.

The performance of simulated annual global mean SAT. Top: the trend of global mean SAT from raw model output without bias-correction. Centre: the Nash-Sutcliffe efficiency results after bias-correction. Bottom: the Taylor diagram result after bias-correction (bottom).

Figure 6 presents the comparison between the observed and the BMA simulated SAT trend on annual and decadal scales. All BMA results (no matter whether on global or regional scales) cannot reflect accurately the actual annual SAT dynamic, although the increasing trend was caught. However, the BMA result can identify the 10-year moving average SAT on the global scale. Among 10 regions, the best BMA performance of 10-year moving average simulation appears in AUS, with the highest R and E values. However, the performance in SAM and GLR is not acceptable, with unsatisfactory R and E values. The shading in Figure 6 is two standard deviation of BMA results, implying the uncertainty of simulation. The value is ±0.33°C/decade on the global scale. Among 10 regions, the lowest uncertainty of BMA results appears in AF (±0.38°C/decade), followed by AUS (±0.46°C/decade), SNA (±0.53°C/decade), SAM (±0.68°C/decade), SAS(±0.69°C/decade), ANT (±0.95°C/decade), EUR (±1.06°C/decade), GRL (±1.11°C/decade), NAS (±1.13°C/decade) and NNA (±1.19°C/decade). In addition, the uncertainty range in some regions has not incorporated completely the observed SAT series, such as SAM.

Figure 6.

Comparison of observed global mean SAT and BMA results in different regions. Annual global mean SAT change (dotted red line) with 10-year moving average (thick red line), and its BMA results for annual global mean SAT (dotted blue line) and 10-year moving average (thick blue line). Shading denotes the ±2 standard deviation range of BMA results with 10-year moving average.

3 SAT projection over the 21st century

Based on the BMA process in the period 1960–1999, the weight of each GCM was obtained. According to these weights and the different GCMs scenarios data set, BMA simulation of SAT projection over the 21st-century scenario was generated. In this study, we calculated the BMA simulation of three main future emission scenarios (B1, A1B and A2) and plotted them in Figure 7. The SAT on the decadal scale is projected only, due to the poor performance on the annual scale. On the global scale, the BMA simulation shows that the global land SAT increases greatly in the 21st century. On average, the global land SAT rose by 2.20°C/100 yr, 3.63°C/100 yr and 4.39°C/100 yr for the B1, A1B and A2 scenarios, respectively. Among three scenarios, the rise of decadal global mean SAT in NAS is the most significant among the 10 regions, increasing by 3.12°C/100yr, 5.22°C/100yr and 6.08°C/100yr for B1, A1B and A2 scenarios, respectively. Figure 7 also illuminates that the most sensitive region impacted by greenhouse gas emissions is AUS. The warming rates in the A1B and A2 scenarios are about 2.2 and 2.7 times, respectively, that of the B1 scenario. Turning to the uncertainty of projected SAT, we find that the uncertainties of SAT projected by BMA in the three future emission scenarios are similar, regardless of scale. The uncertainty of SAT projected by BMA in the 21st century increases with time.

Figure 7.

SAT projection over the 21st-century by use of the BMA method. The evolutions for three emission scenarios are shown. The evolution was derived by applying a 10-year moving average. Solid lines are Bayesian multi-model global averages of surface warming under the scenarios A2 (red line), A1B (green line) and B1 (blue line). The coloured text indicates the average rate of warming under the scenarios A2 (red), A1B (green) and B1 (blue). Shading denotes the ±2 standard deviation range of BMA results with 10-year moving average.

IV Discussion and conclusions

Use of multi-model ensembles from global climate models to simulate climate change has flourished as an important topic of scientific and public interest during recent decades. However, most previous studies focused mainly on the spatial characteristics of climate variables in the 20th century, and few global climate models were used. The use of more ensemble members provides more information and thus optimizes the accuracy of simulations (Zanis et al., 2009). We compared and evaluated the spatiotemporal changes in global land SAT using the outputs from GCMs, SMA and BMA, and projected future global land SAT trends for different emission scenarios. The major findings of this study are summarized and discussed as follows:

By simulating the global land SAT from an ensemble of 17 CIMP3-GCMs, BMA ensemble prediction is better at fitting spatial than annual temporal behaviour. Single GCM can reproduce the spatial distribution of global land SAT accurately; however, the improvement by BMA ensemble prediction against the single GCM is more significant when simulating the temporal rather than the spatial distribution. GCM 13 (mpi_echam5), GCM 15 (ncar_ccsm3_0) and GCM 16 (ncar_pcm1) in this study well matched the spatial distribution of global land SAT simulated by SMA, and approximated the BMA results because the single GCM has already performed well at simulating the spatial distribution of global land SAT (Figures 3 and 4), which reduced the ensemble prediction’s ability for improvement. Even so, the ability of BMA to reproduce the historical climate is better than that of SMA generally. In addition, BMA simulation can demonstrate the SAT trend on the decadal scale, but the annual assessment of accuracy is relatively weak because the GCMs in 20C3M were run with a free-running ocean. Perhaps the coupled atmosphere-ocean processes could improve the performance of annual SAT. Hence, direct use of the short-term output from a single GCM or ensemble prediction of GCMs is not recommended. In order to model the short-term dynamic series, some effective techniques to improve the regional simulation accuracy should be considered in advance, such as statistical downscaling (Jia et al., 2011; Moriondo and Bindi, 2006) and regional climate models (RCMs) driven by the GCM output as their boundary conditions (Buser et al., 2009; Nieto and Rodríguez-Puebla, 2006).

By simulating the regional SAT from an ensemble of 17 CIMP3-GCMs, BMA ensemble prediction performs best in the AUS, regardless of spatial distribution or interdecadal dynamics. The AR4 GCMs were used to simulate climate change as a result of slow alterations to certain boundary conditions and forcing parameters, which affect the energy balance at the global scale. Some special regions with complicated climate-forming processes (e.g. northern Europe and Greenland) cannot be accurately described by the climate model at the global scale. Consequently, not all GCMs are able to provide a similarly accurate description of the present climate (Errasti et al., 2011; Im et al., 2007; Masson and Knutti, 2011; Nieto and Rodríguez-Puebla, 2006), which impacts the consistency of BMA ensemble performance on the global scale. Thus, accuracy validation of GCMs should be carried out before applying them in different regions.

The BMA forecast would be better than SMA or any one member, since it is determined from an ensemble distribution that has its first and second moments bias-corrected using recent verification data for all the ensemble members. It is essentially an ‘intelligent’ consensus forecast, weighted by the recent performance results for the component models. In addition, the uncertainty of ensemble predicted global land SAT is about ±0.33°C at the global scale, which is obviously narrower than that of a single climate model (IPCC, 2007), which implies that ensemble prediction by BMA can effectively reduce the range of uncertainty. Giorgi and Francisco (2000b) believe uncertainties in climate change predictions come from four sources: the forcing scenarios, the use of different GCMs, the predictions by different realizations of a given scenario with a given GCM, and sub-GCM grid scale forcing and processes. Uncertainty can be evaluated by performing several simulations with the same model (Giorgi and Francisco, 2000a) and using different climate models with the same forcing (Crossley et al., 2000). BMA yields posterior probability distributions of all the uncertain quantities of interest, and these posterior distributions provide a wealth of information about GCM reliability and temporal (present to future) correlations (Tebaldi et al., 2005). Because of the advantages of BMA, it is considered an extension and elaboration of the reliability ensemble averaging method.

The GCMs generated by different research groups have different resolutions (Table 1). It is believed that higher resolution does not automatically lead to improved model accuracy (Kiktev et al., 2003; Miao et al., 2012), and the present research confirms this view. Whereas GCM 16 (ncar_pcm1) gives the best spatial temperature simulation, its resolution is only moderate. However, the degree of resolution must have an effect on the accuracy of the spatial and temporal simulation of a given model, even after the model outputs have been interpolated onto a grid of uniform resolution (2.5° × 2.5° in the present study). For GCM 1 (bccr_bcm2_0) as an example, the raw resolution of GCM 1 is about 2.8° × 2.8°, yet the annual global mean SAT will be underestimated after regridding the raw resolution to 2.5° × 2.5° (Figure 8). The correlation between raw GCMs outputs and new outputs after regridding is significant (R > 0.99). Hence, the bias-correction is necessary after the regridding process. In this research, the process of bias-correction is considered during the BMA, which could minimize new errors.

Multi-model ensemble results indicate that future SAT rises fastest in NAS for the B1, A1B and A2 scenarios. Meanwhile, AUS is the most sensitive region affected by greenhouse gas emissions. According to the IPCC-AR4 report, the ensemble projects an increase in global mean surface air temperature of 1.8°C, 2.8°C and 3.4°C in the B1, A1B and A2 scenarios, respectively, by 2090 to 2099 relative to 1980 to 1999 on average (IPCC, 2007). In this study, the corresponding rises are 2.20°C, 3.63°C and 4.39°C, respectively. The significant increase is mainly because this study only addresses the global terrestrial range. Future climate warming unavoidably will disturb the natural eco-environment system, water resource distribution, agriculture production and local human activities (Miao et al., 2010, 2011). The discrepancy of different regions intensifies the difficulty of mitigation and adaptation strategies when facing the risk of global warming. For example, southern Africa could lose more than 30% of its main crop, maize, by 2030 due to climate change (Lobell et al., 2008), while the crop yields in East and Southeast Asia could increase up to 20% by the mid-21st century (IPCC, 2007). Turning to water resources, Labat et al. (2004) suggested an increase of about 4% in global runoff from an increase of 1°C in global temperature. On a regional scale, the runoff will increase with double CO₂ in all rivers in high northern latitudes, with a maximum increase of 47%, whereas at low latitudes both increases and decreases are predicted, ranging from a 96% increase to a 43% decrease (Miller and Russell, 1992). However, if future population increase is considered, water stress increases over 62.0–75.8% of total river basin areas and decreases over 19.7–29.0% of this area by the 2050s (Alcamo et al., 2007). All of these connections challenge decision-makers and scientists facing these potential variations caused by climate change.

Figure 8.

The global mean SAT comparison between different resolutions of the GCM 1 (bccr_bcm2_0).

Footnotes

Acknowledgements

We are grateful to the Program for Climate Model Diagnosis and Intercomparison (PCMDI) for collecting and archiving the model data, and to the National Centers for Environmental Prediction (NCEP) for collecting and archiving the observed climate data.

Funding

Funding for this research was provided by the National Key Basic Special Foundation Project of China (2010CB951604) (2010CB428402), the National Natural Science Foundation of China (no. 41001153) and the State Key Laboratory of Earth Surface Processes and Resource Ecology.

References

Alcamo

Flörke

Märker

(2007) Future long-term changes in global water resources driven by socio-economic and climatic changes. Hydrological Sciences Journal 52: 247–275.

Arzhanov

Eliseev

Mokhov

(2012) A global climate model based, Bayesian climate projection for northern extra-tropical land areas. Global and Planetary Change 86–87: 57–65.

Bou-Zeid

Ei-Fadel

(2002) Climate change and water resources in Lebanon and the Middle East. Journal of Water Resources Planning and Management 9–10: 343–356.

Buser

Künsch

Lüthi

. (2009) Bayesian multi-model projection of climate: Bias assumptions and interannual variability. Climate Dynamics 33: 849–868.

Casanova

Ahrens

(2009) On the weighting of multimodel ensembles in seasonal and short-range weather forecasting. Monthly Weather Review 137: 3811–3822.

Coquard

Duffy

Taylor

. (2004) Present and future surface climate in the western USA as simulated by 15 global climate models. Climate Dynamics 23: 455–472.

Covey

AchutaRao

Cubasch

. (2003) An overview of results from the Coupled Model Intercomparison Project. Global and Planetary Change 37 (1–2): 103–133.

Crossley

Polcher

Cox

. (2000) Uncertainties linked to land-surface process in climate change simulation. Climatic Dynamics 16: 949–961.

Dempster

Laird

Rubin

(1977) Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society 39B: 1–39.

10.

Doblas-Reyes

Hagedorn

Palmer

(2005) The rationale behind the success of multi-model ensembles in seasonal forecasting. Part II: Calibration and combination. Tellus A57: 234–252.

11.

Duan

Phillips

(2010) Bayesian estimation of local signal and noise in multimodel simulations of climate change. Journal of Geophysical Research 115(D18): D18123.

12.

Duan

Ajami

Gao

. (2007) Multi-model ensemble hydrologic prediction using Bayesian model averaging. Advances in Water Resources 30: 1371–1386.

13.

Errasti

Ezcurra

Sáenz

. (2011) Validation of IPCC AR4 models over the Iberian Peninsula. Theoretical and Applied Climatology 103: 61–79.

14.

Fischer

Weigel

Buser

. (2012) Climate change projections for Switzerland based on a Bayesian multi-model approach. International Journal of Climatology 32(15): 2348–2371.

15.

Freer

Beven

Ambroise

(1996) Bayesian estimation of uncertainty in runoff prediction and the value of data: An application of the GLUE approach. Water Resources Research 32: 2161–2173.

16.

Giorgi

Francisco

(2000a) Uncertainties in regional climate change prediction: A regional analysis of ensemble simulations with the HADCM2 coupled AOGCM. Climatic Dynamics 16: 169–182.

17.

Giorgi

Francisco

(2000b) Evaluating uncertainties in the prediction of regional climate change. Geophysical Research Letters 27: 1295–1298.

18.

Giorgi

Mearns

(2002) Calculation of average, uncertainty range, and reliability of regional climate changes from AOGCM simulations via the ‘Reliability Ensemble Averaging’ (REA) method. Journal of Climate 15: 1141–1158.

19.

Gleckler

Taylor

Doutriaux

(2008) Performance metrics for climate models. Journal of Geophysical Research 113: D06104.

20.

Hagedorn

Doblas-Reyes

Palmer

(2005) The rationale behind the success of multi-model ensembles in seasonal forecasting. Part I: Basic concept. Tellus A57: 219–233.

21.

Kwon

Ahn

. (2007) Multi-decadal scenario simulation over Korea using a one-way double-nested regional climate model system. Part 1: Recent climate simulation (1971–2000). Climate Dynamics 28: 759–780.

22.

Intergovernmental Panel on Climate Change (IPCC) (2001) Climate Change 2001: Third Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge: Cambridge University Press.

23.

Intergovernmental Panel on Climate Change (IPCC) (2007) Climate Change 2007: The Physical Science Basis. Working Group I Contribution to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge: Cambridge University Press.

24.

Jia

Zhu

. (2011) A statistical spatial downscaling algorithm of TRMM precipitation based on NDVI and DEM in the Qaidam Basin of China. Remote Sensing of Environment 115: 3069–3079.

25.

Katz

Ehrendorfer

(2006) Bayesian approach to decision making using ensemble weather forecasts. Weather Forecasting 21: 220–231.

26.

Kiktev

Sexton

DMH

Alexander

. (2003) Comparison of modeled and observed trends in indices of daily climate extremes. Journal of Climate 16: 3560–3571.

27.

Labat

Goddéris

Probst

. (2004) Evidence for global runoff increase related to climate warming. Advances in Water Resources 27: 631–642.

28.

Lambert

Boer

(2001) CMIP1 evaluation and intercomparison of coupled climate models. Climate Dynamics 17: 83–106.

29.

Wang

Zhou

(2007) Impacts of external forcing on the 20th century global warming. Chinese Science Bulletin 52: 3148–3154.

30.

Lobell

Burke

Tebaldi

. (2008) Prioritizing climate change adaptation needs for food security in 2030. Science 319(5863): 607–610.

31.

McAvaney

Covey

Joussaume

. (2001) Model evaluation. In: Houghton

Ding

Griggs

, et al. (eds) Climate Change 2001: The Scientific Basis. Cambridge: Cambridge University Press, 471–524.

32.

Malakoff

(1999) Statistics: A brief guide to Bayes theorem. Science 286: 1461.

33.

Masson

Knutti

(2011) Spatial-scale dependence of climate model performance in the CMIP3 ensemble. Journal of Climate 24: 2680–2692.

34.

Meehl

Stocker

Collins

. (2007) Global climate projections. In: Solomon

Qin

Manning

, et al. (eds) Climate Change 2007: The Physical Science Basis. Cambridge: Cambridge University Press, 747–846.

35.

Miao

Duan

Yang

. (2012) On the applicability of temperature and precipitation data from CMIP3 for China. PLOS ONE 7(9): e44659.

36.

Miao

Borthwick

AGL

(2010) Recent changes in water discharge and sediment load of the Yellow River basin, China. Progress in Physical Geography 34: 541–561.

37.

Miao

Borthwick

AGL

. (2011) A preliminary estimate of human and natural contributions to the changes in water discharge and sediment load in the Yellow River. Global and Planetary Change 76 (3–4): 196–205.

38.

Miller

Russell

(1992) The impact of global warming on river runoff. Journal of Geophysical Research 97(D3): 2757–2764.

39.

Min

Hense

(2006) A Bayesian assessment of climate change using multimodel ensembles. Part I: Global mean surface temperature. Journal of Climate 19: 3237–3256.

40.

Min

Hense

(2007) A Bayesian assessment of climate change using multimodel ensembles. Part II: Regional and seasonal mean surface temperatures. Journal of Climate 20: 2769–2790.

41.

Moriondo

Bindi

(2006) Comparison of temperatures simulated by GCMs, RCMs and statistical downscaling: Potential application in studies of future crop development. Climate Research 30: 149–160.

42.

Nash

Sutcliffe

(1970) River flow forecasting through conceptual models. Part 1 – A discussion of principles. Journal of Hydrology 10: 282–290.

43.

Nieto

Rodríguez-Puebla

(2006) Comparison of precipitation from observed data and general circulation models over the Iberian Peninsula. Journal of Climate 19: 4254–4275.

44.

Phillips

Gleckler

(2006) Evaluation of continental precipitation in 20th century climate simulations: The utility of multimodel statistics. Water Resources Research 42: W03202.

45.

Pincus

Batstone

Hoffman

RJP

. (2008) Evaluating the present-day simulation of clouds, precipitation, and radiation in climate models. Journal of Geophysical Research 113, D14209.

46.

Raftery

Gneiting

Balabdaoui

. (2005) Using Bayesian model averaging to calibrate forecast ensembles. Monthly Weather Review 133: 1155–1174.

47.

Rajagopalan

Lall

Zebiak

(2002) Categorical climate forecasts through regularization and optimal combination of multiple GCM ensembles. Monthly Weather Review 130: 1792–1811.

48.

Reichler

Kim

(2008) How well do coupled models simulate today’s climate? Bulletin of the American Meteorological Society 89: 303–311.

49.

Robertson

Lall

Zebiak

. (2004) Improved combination of multiple atmospheric GCM ensembles for seasonal prediction. Monthly Weather Review 132: 2732–2744.

50.

Stephenson

Coelho

CAS

Doblas-Reyes

. (2005) Forecast assimilation: A unified framework for the combination of the multi-model weather and climate prediction. Tellus 57A: 253–264.

51.

Taylor

(2001) Summarizing multiple aspects of model performance in a single diagram. Journal of Geophysical Research 106: 7183–7192.

52.

Tebaldi

Smith

Nychka

. (2005) Quantifying uncertainty in projections of regional climate change: A Bayesian approach to the analysis of multimodel ensembles. Journal of Climate 18: 1524–1540.

53.

Van Oldenborgh

Drijfhout

van Ulden

. (2009) Western Europe is warming much faster than expected. Climate of the Past 5: 1–12.

54.

Vidal

Wade

(2009) A multimodel assessment of future climatological droughts in the United Kingdom. International Journal of Climatology 29: 2056–2071.

55.

Warner

Stake

Guillard

. (1997) Evaluation of EPIC for a shallow New England Soil: II. Soil nitrate. Transactions of the ASABE 40: 585–593.

56.

Weigel

Liniger

Appenzeller

(2008) Can multi-model combination really enhance the prediction skill of probabilistic ensemble forecasts? Quarterly Journal of the Royal Meteorological Society 134: 241–260.

57.

Yang

Hao

Shao

. (2011a) Multi-model ensemble projections in temperature and precipitation extremes of the Tibetan Plateau in the 21st century. Global and Planetary Change 80–81: 1–13.

58.

Yang

Wang

Zhao

. (2011b) Changes of climate extremes in a typical arid zones: Observations and multimodel ensemble projections. Journal of Geophysical Research, 116: D19106.

59.

Yun

Stefanova

Krishnamurti

(2003) Improvement of the multimodel superensemble technique for seasonal forecasts. Journal of Climate 16: 3834–3839.

60.

Zanis

Kapsomenakis

Philandras

. (2009) Analysis of an ensemble of present day and future regional climate simulations for Greece. International Journal of Climatology 29: 1614–1633.

61.

Zhang

Srinivasan

Bosch

(2009) Calibration and uncertainty analysis of the SWAT model using genetic algorithms and Bayesian model averaging. Journal of Hydrology 374 (3–4): 307–317.

62.

Zhou

(2006) Twentieth century surface air temperature over China and the globe simulated by coupled climate models. Journal of Climate 19: 5843–5858.