Abstract
Palaeoecology provides critical data for establishing ecological ‘baselines’, which can guide restoration efforts and be used to assess ecosystem change. However, statistical analyses can be challenging because of the large number of methods available for establishing palaeoecological baselines combined with a lack of practical guidance, particularly around quantifying baselines that include natural variability. We contribute one solution by providing guidance and an R package baselines for using palaeoecological data to (i) define baselines and (ii) test for change over time that incorporates variability. These methods provide an alternative to single-taxon analyses and allow ecosystem complexity to be captured. We use published pollen records as case studies to demonstrate how to establish vegetation baselines for seven localities in New Zealand where relatively recent (c. 1280 AD) and near-simultaneous human settlement across the country allows background environmental disturbance in the pre-human era to be distinguished from anthropogenic disturbance. We present methods for calculating distance from initial sample, distance from baseline in ordination space, allowing incorporation of ecosystem variability, and analysis of rates of change over time using principal curves. We found conventional and Bayesian ordination methods yielded similar results and were effective at identifying change following human settlement, despite the potential for a positive mean-variance relationship to confound results. Principal response curves were most sensitive to a known period of vegetation disturbance caused by volcanic eruptions at two sites with tephra deposits. Our proposed methods, case study and R package baselines are designed to provide a suite of tools to encourage and enable palaeoecological data to be used by palaeoecologists to assess trajectories and change over time, and monitor whether historical management actions have facilitated a change in direction towards a desired baseline state.
Keywords
Introduction
Over the past 60,000 years, terrestrial ecosystems have been transformed by anthropogenic impacts which have accelerated with the spread of modern humans (Johnson et al., 2017). These impacts have been compounded by global climate change, wildfires, faunal extinctions and invasive species. Such drivers of change continue to place pressure on biodiversity (Choat et al., 2012; Foley et al., 2005; Zedler and Kercher, 2004, 2005). For example, an estimated 64–71% of the world’s remaining wetlands have been lost since 1900 AD (Davidson, 2014), global forest loss has been estimated at a conservative rate of 0.6% per year for the period 2000–2005 (Hansen et al., 2010; Tropek et al., 2014), and global rates of vegetation change have accelerated without precedent during the Late-Holocene (Mottl et al., 2021a; Nogué et al., 2021). Ecosystem restoration has a key role in biodiversity conservation (Benayas et al., 2009; Erwin, 2009; Young, 2000; Zedler, 2000), and complements work to prevent further biodiversity loss. Restoration is defined as the process of assisting the recovery of an ecosystem relative to a reference ecosystem or baseline (McDonald et al., 2016). However, defining a reference ecosystem can require knowledge of the pre-disturbance state and condition of the ecosystem in question (Balaguer et al., 2014; Parker and Wiens, 2005; Stoddard et al., 2006). It is not always easy to define such baselines when the ecological changes have either occurred prior to living memory (Papworth et al., 2009), or prior to most long-term ecological studies, which tend to be less than 50 years old (Willis et al., 2007).
Palaeoecology provides approaches for describing past ecosystems and can therefore provide reference or baseline ecosystem states that help to define restoration aims against which progress can be measured, and for example, to assess ecological character change (Finlayson et al., 2016). Widely used proxies for past ecosystem state and change include preserved pollen (vegetation), charcoal (fire histories), fossils (flora and fauna) and sediment analysis (erosion and nutrient fluxes). However, despite widespread calls to integrate palaeoecology into biodiversity conservation (e.g. Birks, 2012; Dietl and Flessa, 2011; Froyd and Willis, 2008), it has yet to be widely adopted for the purposes of deriving baselines (Willis et al., 2007) or measuring biodiversity change (Willis et al., 2005) in a quantitative fashion.
While examples of palaeoecology-informed baselines exist (Barnosky et al., 2017; Clarke and Lynch, 2016; Lyver et al., 2015; Nogué et al., 2017; van Leeuwen et al., 2008), challenges remain. One such challenge is that desired ‘baselines’ rarely incorporate variability of ecosystem state from normal ecosystem dynamics, for example, in response to climate, and disturbance events such as volcanism, and wildfires (Birks, 2012; Dietl and Flessa, 2011; Froyd and Willis, 2008). Ecologists have argued that the concept of ‘snapshot’ baselines is problematic because ecosystems are spatially and temporally variable (Willis et al., 2005), and the long-term nature of palaeoecological data provides an ideal opportunity for describing that natural variability and allowing unprecedented anthropogenic changes to be distinguished from naturally occurring fluctuations (e.g. Nogué et al., 2021). This presents a problem for some ecological assessments, where ecosystems should be assessed against historical baselines that include variability (Gell et al., 2016). While there are numerous examples of excellent methods-focussed papers on temporal analyses (e.g. Anderson and Thompson, 2004; Buckley et al., 2021), there is less guidance for palaeoecologists wanting to assess trajectories against baselines that incorporate natural variability in ecosystem state. Palaeoecologists undertaking temporal analyses of community composition are confronted with ongoing debates regarding the merits of new methods, such as with conventional and Bayesian ordination methods (e.g. Beissinger et al., 2016; Braak and Šmilauer, 2015; Hui et al., 2015; Warton et al., 2015, 2016). Moreover, palaeoecological data are commonly recorded as counts – such as numbers of pollen grains – and as a result are expected to have a positive mean-variance relationship. Calculating compositional change is therefore problematic as a positive mean-variance relationships can confound traditional methods (i.e. a change in variance wrongly presents as a change in composition) and thus make calculating compositional change difficult (Warton and Hui, 2017). We present guidance for implementing distance-from-baseline methods using a Bayesian ordination, which solves this issue as Bayesian ordinations avoid an assumption of a constant mean-variance relationship.
A key issue with applying palaeoecological data to restoration ecology and conservation is moving from promise to application (Dietl and Flessa, 2011); here we set out a guide to incorporate palaeoecological data using selected conventional and novel techniques. We have extended techniques that are based on familiar concepts for most palaeoecologists and ecologists (e.g. ordination); we have gone for depth of coverage of selected techniques, rather than breadth, and note that other methods may usefully be substituted (e.g. Anderson and Thompson, 2004; Mottl et al., 2021a; Nogué et al., 2021). For example, Anderson and Thompson (2004) cover multivariate control charts, which are derived from industrial applications and are designed at sounding ‘alarm bells’ if change in a condition is outside the bounds of what is expected. As such, this is best used for monitoring purposes (e.g. assessing Ramsar Convention wetlands; see Gell et al., 2016). Mottl et al. (2021a) provides another example where a rate-of-change analysis is used to highlight global acceleration in rates of vegetation change, over multiple pollen cores, since the Late-Holocene. Nogué et al. (2021) used a breakpoint analysis to look for changes in directional rates of change of pollen composition in sediment records following human settlement of islands around the world.
Here, we demonstrate how to (a) quantify a baseline incorporating natural variability; (b) calculate distance from baseline over time; (c) identify periods of significant change; and (d) show how to test whether modern or restored systems are becoming more, or less, similar in their composition to the identified baseline. We apply these techniques to a reference set of seven well-documented, pollen-based vegetation records published from sites in New Zealand (Figure 1) to demonstrate the utility of our methods and show how modern ecosystems can be compared to pre-disturbance baselines. Some of the sites have experienced minor and major natural ecosystem fluctuations before human arrival and associated anthropogenic disturbance. The pollen records cover three key periods: first, the state before initial human arrival, followed by two anthropogenic states beginning with initial human arrival from Polynesia circa 1280 AD (Wilmshurst et al., 2008) and then the later settlement by Europeans (commonly defined as 1840 AD; McGlone (1983). Deforestation by Polynesian fires cleared more than 40% of the mostly lowland and montane dry forest cover of New Zealand within less than 200 years of initial arrival (McWethy et al., 2010; Perry et al., 2014). In the post-European period, further forest clearance continued, reducing remaining forest cover to less than 25% of the land area (Perry et al., 2014) and many alien plant species were introduced. Additionally, the eruption of the Taupō supervolcano (Wilson and Walker, 1985) which has been comprehensively dated to AD 232 ± 10 (Lowe et al., 2013) and the rhyolitic tephra associated with this event is found in two of our cores, where we use it as an additional date and cause of natural disturbance (Wilmshurst and McGlone, 1996). The timing and ecological impact of Polynesian and European settlement are also well-documented and generally well-understood, meaning that changes in the pollen that are detected with our methods can be validated against known events (McGlone and Wilmshurst, 1999; Perry et al., 2014; Wilmshurst and McGlone, 1996; Wilmshurst et al., 2008).

Map of case study locations and summary pollen diagrams from all seven case studies (see references in text for methods) showing the relative abundance of herbaceous, shrub and tree taxa. Ferns and wetland taxa are excluded. Each case study was truncated to the first sample after 1 AD as per age-depth models; each case study ends with the most recent sample (this varies across case studies, ranging from 1916 at Waitutu to 2013 at Enderby). Dotted line at Rotonuiaha and Tutira indicates date of Taupō eruption; dashed lines indicate 1280 and 1840 AD, the dates of Polynesian and European settlement, respectively.
This paper is accompanied by a technical supplement (Supplemental Appendix S1) which contains code and detailed methods to run the analyses described in this paper and a statistical software package to run in R (R Core Team, 2020), called baselines (Burge, 2022), to aid in the application of the methods.
Methods
To evaluate and compare our methods, we use pollen count records from seven New Zealand localities as case studies (Figure 1), including two lake sediment cores from the North Island, two peat cores from the South Island and three peaty soil cores from off-shore islands (Tawhiti Rahi, the largest of the Poor Knights Island group, hereafter referred to as ‘Poor Knights’, and the subantarctic Enderby and Campbell islands). The two North Island sites (Rotonuiaha and Tutira) were directly affected by the Taupō eruption (Wilmshurst and McGlone, 1996; Wilmshurst et al., 1997). All sites show signs of human impact (see references below), but Enderby and Campbell islands only experienced brief periods of post-European impact, and Waitutu remains in native forest. For details of core locations, radiocarbon dating, pollen analyses and interpretations of vegetation change we refer to the following published accounts: Tutira and Rotonuiaha (both described in Wilmshurst, 1997; Wilmshurst and McGlone, 1996); Eweburn (Wilmshurst et al., 2002); Waitutu (Turney et al., 2017); Poor Knights (Wilmshurst et al., 2014); Enderby Island (Wood et al., 2016); and Campbell Island (McGlone et al., 2007). Summary pollen diagrams indicating patterns of vegetation change are presented in Figure 1. We truncated cores by applying a minimum (i.e. oldest) date (1 AD) to avoid incorporating Mid- and Late-Holocene climatic variability into our baseline calculations (see Wilmshurst et al., 2007). We use the dates of 1280 AD and 1840 AD as the beginning of the phases of settlement from Polynesia and Europe, respectively, and we term the period between these two dates the ‘Polynesian period’ and the period after 1840 AD as the ‘European period’ which is named to represent the additional disturbance that accompanied later European settlement of New Zealand. ‘Pre-human’ refers to prior to Polynesian period. Other site-specific anthropogenic disturbances are provided in Supplemental Appendix S2.
Overview of analytical decision making
The choice of analysis method will be influenced by the data properties and research questions. A flowchart of key choices and resulting suggested methods is presented in Figure 2. Here, we give an overview of these choices in more detail, to complement the flowchart, before discussing their implementation.

A flow chart to guide the choice of analysis method. Choose the question at the top that best applies to your research question. Diamonds indicate choices; ellipses are the resulting methods. Conventional (NMDS) and Bayesian ordination choices depend on the suitability of your data and both may need to be evaluated (see text for details).
In all cases, the temporal resolution of a core and years captured between sampling intervals needs to be considered relative to the questions being asked. Morris et al. (2015) note that the ability to detect a disturbance may be limited where it is of short duration, or the sensitivity of the ecosystem to the event is low. Cycles of natural variability may only be captured if sampling resolution is sufficient to capture them, for example, high resolution analysis of climate-fuel-fire cycles in the Northern Great Plains revealed an approximate periodicity of 160 years (Brown et al., 2005), and hence sampling would need to capture intervals less than that. Furthermore, as with any statistical test, whether a time period is considered to be significantly different to another will be influenced by sample size. As such, quantitative analyses may require more intensive sampling than qualitative analyses. We hesitate to provide ‘minima’ as these will depend heavily on the nature of the question, and characteristics of the system studied (above).
A baseline will ideally incorporate natural variability, but there will be situations where it is inappropriate to do so because a pollen record cannot be divided into samples before and after a known event. In these instances, the baseline will need to be derived from one sample. One method is to calculate distance from an initial sample using a distance metric, and multiple distance metrics are available (see Legendre and De Cáceres, 2013; Legendre and Legendre, 2012). This method compares each sample to the initial sample; increasing distance means samples are becoming increasingly dissimilar to the initial one. An example of distance from initial sample being used in an ecological time-series is Magurran et al. (2015), who used Jaccard distance to reflect changes in beta diversity over time, relative to the initial year, in North Atlantic groundfish assemblages, finding a change over time had occurred (and had led to spatial homogenisation). A second method that does not include variability in a baseline, nor requires a known event is principal response curves (PrC), which is a form of multivariate analysis – derived from redundancy analyses – that reduces high dimensional data to a one-dimensional curve (De’ath, 1999; Van Den Brink and Braak, 1999). PrC is well-suited to data for which there is a strong underlying gradient. There has been a recent uptake of PrC in the palaeoecological literature (e.g. Beck et al., 2018; Bennion et al., 2015; Herzschuh et al., 2016) because PrC analyses are useful for quantifying periods of rapid change and identifying whether a site is returning to its original composition (Auber et al., 2017).
Baselines incorporating multiple samples will be feasible where the timing of disturbance (e.g. human settlement; volcanic eruption) is known a priori. In our case, we use 1280 AD as the first anthropogenic disturbance date. This method uses ordination distance, and therefore the next choice is then between conventional or Bayesian ordination methods. If the data exhibit a strong mean-variance relationship (Supplemental Figure S2.1), there is a risk that distances in variance will confound differences in compositional location (i.e. confound compositional change) (Warton et al., 2012). As such, where a strong mean-variance relationship exists, Bayesian methods with an appropriate error structure should, at minimum, be compared to conventional methods, with further exploration if the results are qualitatively different.
A further choice is whether to calculate distance from baseline as a function of the distance from the centroid, or distance from an ellipse incorporating the variability around the reference level data. The latter option has two benefits: (a) that variability in the baseline state is accounted for (i.e. less variable baselines will have a smaller ellipse) and (b) distance from baseline can reach zero once samples fall within the ellipse. A distance of zero from the ‘centroid’ may be impossible to meet in practice as a centroid could represent the average of distinct communities. Conversely, a centroid-type method might be favoured when alternate baseline conditions are identified in the baseline samples. In this case ellipses around each baseline might overlap one another, and thus a sample could have a distance of zero from both ellipses; it will be more informative to test which centroid a sample is moving towards. Multiple baselines defined a priori could be calculated using the methods set out in this paper (although not demonstrated), for an example of calculation of multiple historical baselines not defined a priori, see Woodbridge et al. (2018).
Statistical tests of change from baseline, following baseline calculation
Tests for the statistical significance of changes over time (quantified using the methods above) include simple linear models (as in Magurran et al., 2015), or generalised additive models (GAMs), which are extensions of generalised linear models. Linear models allow testing of whether some measure of ecological distance is increasing, decreasing, or remaining static for a certain time period. Linear models can also test whether the rate of change is different between time periods (e.g. ‘did the rate of ecological change in New Zealand increase after European settlement?’). Methods such as generalised least squared models can be applied if significant temporal autocorrelation is identified (Beguería and Pueyo, 2009), and an example of this is provided in the technical supplement (S1).
GAMs are an alternative to linear models, suited to non-linear data, that also allow for tests of the first derivative (i.e. the slope) of the fitted relationship, to determine if and where the slope deviates from zero. Temporal autocorrelation can be accounted for using a mixed model GAM (GAMM). We fit the more complicated GAMM for demonstration purposes in this paper with a term accounting for temporal autocorrelation, but primarily use the term ‘GAM’ throughout this text unless context requires otherwise, such as in the methods section. For a full treatment of GAMs and palaeoecological data, we refer the reader to Simpson (2018).
Detailed methodology as it relates to case studies
Distance from initial sample
We calculated compositional distance from a defined initial sample separately for each site by first transforming pollen count data (to relative abundance and then log-transforming) using the R package vegan (Oksanen et al., 2019), and then using the Jaccard distance option and the function calculateDistanceStart from the baselines package (Burge, 2022). We fit GAMMs (above) using cubic regression splines to each time series of distances and used diagnostic plots to amend k (the dimension of the basis for the smooth term) which influences the ‘wiggliness’ of the smooth function. We identified periods of rapid change by calculating for which time periods (if any) the confidence interval around the first derivative of the fitted GAMM did not include zero using R package gratia (Simpson, 2020). We excluded the distance of the first observation to itself (which will always be zero, by definition).
Principal response curves
We applied a Hellinger-transformation to pollen count data and modelled these using the ‘prcurve’ function in the analogue package in R (Simpson, 2007; Simpson and Oksanen, 2016). We then fit GAMMs as described above, except that the poor fit of GAMM models for Rotonuiaha and Tutira prompted us to use adaptive splines with a GAM (see S1 for worked example), which are recommended for data with abrupt changes, but cannot yet be used in the GAMM framework (Simpson, 2018).
Distance from baseline incorporating multiple samples: conventional (NMDS) ordination
We calculated compositional distance from baseline (see Technical Supplement S1, and Figure 3 for a visual representation of the methods) separately for each site by first transforming (to relative abundance and then log-transforming – same transformation as for distance from initial sample) and then undertaking a convention ordination of the pollen count data (Jaccard distance, two dimensions) using an NMDS calculated with the R package vegan (Oksanen et al., 2019).

Comparison of conventional and Bayesian ordination methods for Rotonuiaha and subsequent distance from ellipse. Example of calculation of 95% ellipse (purple circle), and centroid (pink X in centre of ellipse) around pre-human period for Rotonuiaha using (a) a conventional NMDS ordination method and (b) a Bayesian ordination method, which allow for a positive mean-variance relationship to be specified. Subsequent distances from baseline, defined as either distance from centroid (pink) or minimum distance from ellipse (purple) for (c) conventional ordination methods and (d) Bayesian ordination methods. Panels (c and d) show ‘raw’ distance from ordination; in the workflow we subsequently apply statistical models to these distances, for each site.
Statistical tests of distance from baseline are demonstrated using a linear model. We compared the distance from baseline for two time periods – post-Polynesian settlement and post-European settlement. We fit a model with distance as a function of time (years AD) and period, and an interaction between the two. We then used model selection to define the favoured model according to the Akaike’s Information Criterion (AIC). We therefore test whether distance increases over time, or by period, and whether the rate of distance change also differs by period. We report for each site which terms were included in the favoured model.
Distance from baseline incorporating multiple samples: Bayesian ordination
Distance to a reference baseline can also be calculated from data ordinated using Bayesian methods which allow for an error family to be specified, allowing for a positive mean-variance relationship to be specified (commonly found with count data, as an example). As expected with count data (refer introduction), we found a positive mean-variance relationship across our datasets (Supplemental Figure S2.1), thus we expected the results to differ between the Bayesian and conventional methods. We calculate ordination distance based on a latent variable model using package boral in R (Hui, 2016) and extracted the scores. We calculated a bivariate ellipse (standard deviation; 95% confidence interval) around the pre-human baseline points and then calculated distance from centroid and distance from ellipse and conducted statistical tests as above. Note this method provides the same results as when the function ordiellipse from the R package vegan is used and where function arguments ‘kind = “sd”’ and ‘conf = 0.95’ are specified, and the ‘show.groups’ argument is set to the same as the ‘reflev’ argument.
Results
Distance from initial sample
When looking at change from the initial sample (in our case, the first sample after 1 AD), we found that three sites (Waitutu, Eweburn, and Campbell Island) had relatively uniform rates of change across both pre- and post-human settlement periods (Figure 4a; full statistical results in Supplemental Table S2.1), and as such appeared amenable for analysis with a linear model instead of a GAM. All other sites exhibited variable rates of change in distance-from-baseline that is, non-linear change, indicated by periods of a relatively flat slope where the derivative did not differ significantly from zero, and other periods of rapid change, where it did (Figure 4a: periods of rapid change are indicated using a thicker line). Derivatives from the GAMM indicated that change at Poor Knights and Rotonuiaha pre-dated human arrival, although we note that in some cases, such as Poor Knights, the modelled slope and derivatives thereof change before the raw data does. For example, at Poor Knights, the modelled slope change is shown to pre-date human arrival, but the raw data indicates a change between the pre- and post-human arrival time periods. As ever, care is needed by the analyst in interpreting modelled outputs; more intensive sampling would have likely reduced extent of this effect. We recommend plotting raw data alongside modelled outputs as we have done in Figure 4. Tutira showed little change in the pre-human period, averaging around 0.4 units of dissimilarity through the period (Figure 4a), followed by rapid change during the Polynesian period which then plateaued (indicating no change in the dissimilarity over time). Enderby Island was the most varied site with change-from-baseline increasing until the Polynesian period, decreasing (i.e. becoming more similar to baseline) during the Polynesian period, and exhibiting rapid change in the European period.

(a) Ecological distance from initial sample (Jaccard dissimilarity). In our case studies the initial sample is the first sample after 1 AD. (b) Changes in principal curve scores, both derived from pollen count data and with a GAMM model fit (thin line). Sites are ordered top to bottom by increasing latitude (see Figure 1). The solid thicker line indicates time periods of rapid change; that is, parts of the modelled fit for which the confidence interval of the slope does not include zero. Vertical lines indicate dates of Polynesian and European settlement (dashed), and the Taupō eruption (dotted, only indicated for affected sites). ‘PrC variance expl’. reports the proportion of variance in the data explained by each principal curve.
Principal response curves
Principal curves (PrC) were used to characterise the multivariate pollen data using one axis, rather than two or more as in other ordination methods. We identified periods of change by calculating for which time periods (if any) the confidence interval around the first derivative of the fitted GAM did not include zero. These time periods are shown in Figure 4b with a thicker black line overlain on the modelled line. We found that Waitutu (least disturbed) and Eweburn (next least disturbed) showed change over time but no substantial evidence of increases in rate of change over time. In comparison, the PrC analysis indicated two separate periods of post-human change at Rotonuiaha and Tutira (the volcanic sites) but also highlighted rapid change around the time of the Taupō eruption (Figure 4b) consistent with earlier interpretations (Wilmshurst, 1997; Wilmshurst and McGlone, 1996). Similarly, Poor Knights show two distinct periods of change coinciding with Polynesian and European settlement also consistent with original interpretation (Wilmshurst et al., 2014). A short period of change was highlighted at Campbell Island during the late Polynesian period although no published evidence exists for Polynesian discovery or subsequent Māori settlement of this island (McGlone et al., 2007). Enderby Island showed rapid and significant change only post-European settlement. Variance explained by the principal curves (a metric of goodness of fit for a single axis) ranged from 0.38 at Waitutu to 0.77 at Enderby and is displayed within Figure 4b; this range is consistent with other published principal curve analyses of pollen (Herzschuh et al., 2016; Rodrigues et al., 2016; Shumilovskikh et al., 2018).
Distance from baseline – based on conventional (NMDS) and Bayesian ordination methods
For each site we compared samples from the Polynesian and European periods to two alternative pre-human reference points: a baseline without variability (the centroid of the pre-human period), and a baseline with variability (the 95% ellipse around all pre-human samples). We do not illustrate the baseline without variability further in terms of the case studies. In terms of a baseline incorporating variability, we found that at several sites (e.g. Enderby, Waitutu, Rotonuiaha) the distance from baseline reached zero (see raw data points; Figure 3). Overall, conventional and Bayesian ordination methods yielded the same favoured model for six of the seven sites (Supplemental Table S2.1; favoured model depicted within each panel of Figure 5a and b).

Comparison of ‘best models’ to model change over time from a global model of year, time period, and an interaction between the two, for conventional (NMDS) and Bayesian ordinations. Distance from baseline as calculated using (a) NMDS and (b) Bayesian ordination, and a 95% CI ellipse around the pre-human period for both methods. Sites are ordered top to bottom by increasing latitude. Points are the raw distances; lines indicate the fit of the best model after conducting model selection using AICc. Shading indicates 95% CI. Vertical lines indicate the date of European settlement (dashed) and therefore the beginning of the ‘European period’ referred to in the text. The x-axis begins at 1280, that is, the date of human settlement and the beginning of the Polynesian period referred to in the text. The formula for the most favoured model for each site and for each ordination type is depicted in each panel.
Sites with consistency of outcomes between conventional and Bayesian ordination
We discuss the distance-from-baseline model results in order of complexity. The simplest model was favoured at one of the most ecologically stable and still-forested sites (Waitutu). This model was intercept-alone, indicating no change over time and no marked effect of human settlement (Figure 5, Supplemental Table S2.1). This intercept-alone model was favoured by both conventional and Bayesian ordination methods.
At three sites (Eweburn, Tutira and Poor Knights) the favoured model included a term for year, indicating increasing distance from baseline over time, but no marked change between Polynesian and European periods (Figure 5). This year-alone model was favoured by both conventional and Bayesian ordination methods.
At two sites (Enderby and Campbell), the favoured model included a term for year, period, and an interaction between year and period, indicating that the rate of change over time differed between the Polynesian and European periods (Figure 5). Specifically, there was very little change during the Polynesian period, while there was comparatively rapid change during the European period. This interaction model was favoured by both conventional and Bayesian ordination methods.
At Campbell Island, at the start of the European period, distance from baseline jumped high initially compared to the end of the Polynesian period, but decreased over time (Figure 5). This is consistent with a brief disturbance early in the European period, and subsequent recovery (McGlone et al., 2007).
Sites with inconsistency of outcomes between conventional and Bayesian ordination
At one site (Rotonuiaha) conventional and Bayesian ordination methods led to differing preferred models (Figure 5). At Rotonuiaha, using conventional ordination, the favoured model included a term for year, indicating increasing distance from baseline over time, but no marked change between Polynesian and European periods – as for the favoured model at Tutira, a relatively close site. However, at Rotonuiaha, using Bayesian ordination, the favoured model was more complex and included a term for year, time period, and an interaction between year and time period, indicating that the rate of change over time differed between the Polynesian and European periods. Specifically, the rate of change was quicker during the European period, compared to the Polynesian period (Figure 5b). However, the difference between the ‘favoured’ and second-most-favoured model at Rotonuiaha was small for both conventional and Bayesian methods, and the second-most-favoured model for each method was the most favoured model for the other method (i.e. the second favoured model under the Bayesian method was the year-alone method, which was the most favoured model under the conventional method; data not shown). This suggests that both models are plausible; outside of the case study, further comparisons (e.g. comparison of relevant model metrics) should be considered if this occurs.
Discussion
Overall, we found all methods detected rapid ecological changes where expected and therefore are likely to be appropriate for analysing a range of palaeoecological questions. The key decision to help choose which method to use is whether change is being assessed against a baseline, or whether continuous change over time is of more interest (see flowchart in Figure 2); subsequent decisions apply to the suitability of the data to different methods. Given the potential for count data, such as ours, to have results confounded by shifts in variance, the results were surprisingly consistent between conventional and Bayesian ordination methods (Figure 5). Principal response curves (PrC) detected ecological change caused by the Taupō eruption, while distance from initial sample did not (Figure 4).
Choosing the right method for the question
We emphasise that our methods are designed to apply to a variety of research questions, and not all methods will perform equally well for all questions. The flowchart in Figure 2 sets out a research question-centred approach to addressing ‘which method’ to evaluate first. We have explored some of these methods in depth and others are referenced with relevant examples from the palaeoecological literature. Our flowchart assumes some assessment of change over time is desired, potentially with reference to a baseline of some description. This change over time can be quantified using the methods set out in this paper (PrC, ordination-based distance, distance from initial sample). Once the method of calculating distance from baseline (whether that baseline incorporates a single sample or multiple samples) has been decided, the subsequent statistical analysis must also be tailored to the research question, and the statistical properties of the data. In this regard it is a ‘mix and match’ between distance from baseline methods, and the subsequent statistical analysis. For example, pre-defined time periods will be best assessed with a model that explicitly incorporates those time periods as categories (such as the linear model in our analysis); non-linear change may be best assessed using a non-linear model, such as the GAM (or GAMM) discussed in our analysis; abrupt (non-linear) shifts with linear change in-between may be best assessed using breakpoint analysis (not covered in our methods, but for an example see Nogué et al., 2021). As an example of whether the data suit the model being applied, we used GAMs to effectively smooth the ‘raw’ data derived from distance from initial sample and PrC. While GAMs are useful for non-linear responses, two of our sites (Eweburn and Waitutu) had linear responses in the PrC analysis, which would have made it unnecessary to assess them using a GAM. The same applies to the GAM modelling of distance from initial sample for Campbell Island, Waitutu, and Eweburn. The GAM model highlights a period of rapid change in distance from initial sample at Waitutu but is not particularly informative and would most certainly be better represented as a linear model. As with all statistical methods, researchers will need to incorporate context, such as their knowledge of the ecosystem and processes that affect it, potentially other proxies, and dynamics from similar ecosystems to help interpret whether signals from the techniques suggested are ecologically meaningful and appropriate for use in monitoring or restoration. We refer to the Technical Supplement (S1) which provides steps through analysis of one pollen core in greater detail than this manuscript.
Long-term change: Assessment and comparison of methods
We use one site (Rotonuiaha) to highlight the key advantages and disadvantages of each method. Rotonuiaha is a lake-core that included evidence of pre-human environmental disturbance in the form of the Taupō volcanic eruption; aside from disturbance caused by this eruption, the pre-human vegetation was characterised by tall, diverse forest taxa. Following human settlement, disturbance during the early post-Polynesian period was indicated by rapid declines in pollen from forest taxa, and an increase in pollen and spores from seral taxa such as Pteridium esculentum and Coriaria spp., with charcoal. Later, following local European settlement around 1870 AD, there were increases of pastoral and planted species (e.g. Poaceae, exotic Pinus spp., exotic Salix spp.). This site is also the subject of the worked examples in the Technical Supplement (S1).
Ability to detect rapid change following natural disturbance
Two methods (PrC, and distance from initial sample) could be used to identify natural variability and substantial compositional turnover in the pre-human phase. In contrast, distance from baseline uses the pre-human period to define a ‘baseline’, so any periods of natural variability may merely contribute to a highly ‘variable’ baseline which incorporates much natural ecosystem change. As such, palaeoecologists interested in compositional shifts and rates of change across an entire core, need to apply a method like PrC or distance from initial sample. Based on our results, we suggest PrC as this was the only method that allowed us to identify the changes that occurred in the short time following the Taupō eruption at Rotonuiaha and Tutira. The principal response curve also clearly picked up natural ecosystem variability at Campbell Island prior to 700 AD (Figure 5b, raw data), although this is not considered to be anthropogenic (McGlone et al., 2010) and was not identified by the subsequent modelling using GAMs. Periods of rapid compositional change were reliably captured by GAMS applied to the principal response curve data. Compared to an alternative method such as breakpoint analysis (e.g. Nogué et al., 2021), which is not covered by this paper, GAMs do not assume linearity in-between ‘breaks’, and thus allow rates of change to vary within a period, such as within the pre-human period. GAM modelling highlighted the period of rapid change around the Taupō eruption at Rotonuiaha. Where breakpoints (sudden change points) and linearity of change between the breakpoints is expected, breakpoint analysis would perform best.
Rapid compositional change at Rotonuiaha following the Taupō volcanic eruption was not well-represented by distance from initial sample. Distance from initial sample averaged a dissimilarity value of just over 0.4 for most of the pre-human period (Figure 4a), which was reflected in the fitted GAM. A small peak in the raw distance occurred around the time of the eruption, which was not reflected in the GAM. The lack of a strong signal from the eruption demonstrates the constraint of this method, that subsequent samples may be substantially different to each other, but equidistant from the initial sample. This is one reason we prefer PrC. An alternative to PrC is an assessment of each sample to the previous one (rather than each sample to the initial sample), and then annualised to an annual rate. Annualisation or similar relativisation accounts for uneven temporal sampling (see Beck et al., 2018 for an example; Mottl et al., 2021b for a rate-of change worked methodology).
Distance from initial sample also indicated periods of ‘change’ prior to human settlement and not associated with volcanic eruptions, and which were not identified by other distance metrics. This was particularly the case for Enderby Island. The history of Enderby Island suggests that most change should be detected in the European period, with only minor changes in the Polynesian period (as found using distance from baseline methods and PrC). Distance from initial sample, by its nature, cannot account for baseline ecosystem variability. It has long been recognised that natural variability through time can occur both at large-scales (Landres et al., 1999; Sprugel, 1991), and, much smaller scales (e.g. where the carousel model of succession applies within a site) (Sykes et al., 1994; van der Maarel and Sykes, 1993). Given that distance from initial sample fails to incorporate much of that natural variability, we consider it should be used with caution to determine distance from ‘baseline’. Furthermore, periods of rapid change can and will occur in natural ecosystems and such rapid change is not necessarily a ‘false positive’ where it is indicated by a method.
Ability to detect rapid and cumulative anthropogenic change
There was an established period of rapid change in New Zealand after Polynesian settlement including deforestation in dry, lowland regions, followed by ongoing impacts associated with European settlement, including further clearance and farming activities. We highlight here that distance from initial sample and PrC were analysed using GAMs, which are well-suited to non-linear responses and for analysing rates of change over time, whereas the distances from baseline (using either conventional NMDS or Bayesian ordination methods) were analysed with a linear regression method designed to test for overall differences in trends between the post-Polynesian and post-European time periods. Both distance from initial sample and PrC analyses picked up the rapid change at Rotonuiaha following Polynesian settlement. Both distance from baseline analyses for the post-Polynesian period indicated there was change over time; the ‘raw’ data from the Bayesian analysis also indicated the shift after the first two samples, which fell within the pre-human baseline.
Cumulative but slow change may not be picked up by certain methods. For example, at Rotonuiaha, all methods indicated that composition during the European period was substantially different to the pre-human period. However, for distance from initial sample, the post-European period was not identified as a period of rapid change. It was already at a high level of compositional dissimilarity to the initial sample baseline (dissimilarity > 0.7; Figure 4a) and potentially reaching saturation (maximal possible dissimilarity). PrC identified the post-European period as being one of rapid and high cumulative change, consistent with the intensive clearance of the remaining bracken shrubland from the catchment and establishment of a pastoral landscape for sheep farming (Wilmshurst, 1997). Distance from baseline indicated slightly different interpretations of change – the NMDS method indicated a uniform rate of change between the post-Polynesian and post-European time – effectively a uniform increase over time in distance from baseline, whereas the Bayesian method indicated an increase in the rate of change in distance from baseline over time.
Difference between conventional and Bayesian ordination techniques
The method of testing distance from ellipse meant we were able to account for variability within our pre-human baseline. One question we assessed is the effect of using different ordination techniques to calculate ordinated distance from baseline. Given the positive mean-variance relationship in our data, the results were surprisingly consistent between conventional and Bayesian ordination methods. The ‘best model’ was the same under both methods for six of the seven sites. If we were analysing this in a ‘real world’ setting, we would have undertaken extra analyses to examine how and why the difference arose (see discussion in Warton and Hui, 2017). In addition to accounting for the positive mean-variance relationship in the data, the other advantages of the Bayesian method are that it does not require transformation from the original count data, and a random effect can be included to account for differing total counts for each replicate (Hui, 2016). However, the Bayesian method is computationally slower – although not intractably slow – and does not have built-in support for ellipses and other analyses that other packages in R provide. We have provided some basic functionality to calculate ellipses with Bayesian ordination data in our baselines (Burge, 2022) package.
Has restoration been successful? Baselines and multiple states
In some cases, the techniques demonstrated in this paper may be used to assess whether an ecosystem has shifted towards a desired state. In such cases, finding zero distance from a baseline incorporating variability may be helpful for answering whether restoration has been successful (Chapman and Underwood, 2010; Hobbs and Norton, 1996; Ruiz-Jaen and Mitchell Aide, 2005). Zero distance will occur most readily when distance from ellipse is calculated. This occurred at Campbell Island towards the end of the European period for several data points, for both conventional and Bayesian ordination, consistent with a brief, early disturbance at the beginning of the European period, and subsequent recovery (Figure 5a and b).
We have demonstrated the methods to calculate variability in a baseline state. However, multiple states can occur through time, and exist outside of degraded systems: examples of several ‘natural’ but quite different hydro-ecological states at Playa Lakes (Australia) have been found in the palaeoecological record (e.g. Gell et al., 2016). In this case, calculating one centroid will be inappropriate, as it will test whether a modern site is returning to some halfway point between the two states. Calculating the ellipse in this case is also problematic, as it will likely encompass extremely high variability. We suggest the best approach is to undertake classification of palynological data at an appropriate scale (e.g. local, regional, national or perhaps larger). It will then be possible to calculate which vegetation associations baseline data can be classified into, and whether modern samples fall within the same associations, other associations, or what might be termed ‘novel ecosystems’ (De Cáceres et al., 2015; Willis et al., 2010). Alternatively, if it is clear how many alternative baseline states there are, one might calculate distance from each state (beyond the scope of this paper, but this would be a natural extension of our methods). Palaeoecology might also be combined with other knowledge systems to define baselines. For example, Lyver et al. (2015) used palaeoecology to define multiple baseline states for four sites in New Zealand. These states were subsequently combined with indigenous knowledge to develop a biocultural restoration paradigm.
Application of the methods described in this paper may reveal that a novel ecosystem has developed at a site (Hobbs et al., 2009). Such a finding may prompt a reconsideration of using historical ecosystem states as ‘baseline’ or restoration goals for a site; the return of a novel ecosystem to a historical state may be intractable in certain circumstances (Higgs et al., 2014; Jackson and Hobbs, 2009). The methods outlined in this paper can identify such systems that appear ‘native’ or ‘natural’ but which lack historical analogues, as was the case at Poor Knights (Wilmshurst et al., 2014).
Conclusions
We present a set of selected methods for analysing multivariate data to provide baselines and to assess change over time. Although the methods presented in this manuscript focus on palynological data, they could be used for any other commonly used long-term proxy records of biodiversity change (such as diatoms, chironomids, testate amoebae, bone assemblages, sedimentary DNA etc). However, palynological data is currently the most commonly available and widely used data type suited for assessing restoration against baselines (Clarke and Lynch, 2016; McCarroll et al., 2016; Wiik et al., 2015).
We demonstrate that ordination-based distance from baseline can incorporate natural variability in a baseline, such that samples may reach ‘zero distance’ when similar to baseline states, and that generalised additive models can capture non-linear and rapid ecological change. Moreover, we show that distance-from-initial-sample is probably less useful, particularly where more recent and transformative change saturates, but might be used where a baseline of multiple samples is not possible. Our results demonstrate where particular methods will be more suited, with the hope that this may encourage quantitative analysis of ecosystem change seen in palaeodata applied to restoration ecology.
Supplemental Material
sj-docx-2-hol-10.1177_09596836231169986 – Supplemental material for A guide to assess distance from ecological baselines and change over time in palaeoecological records
Supplemental material, sj-docx-2-hol-10.1177_09596836231169986 for A guide to assess distance from ecological baselines and change over time in palaeoecological records by Olivia R Burge, Sarah J Richardson, Jamie R Wood and Janet M Wilmshurst in The Holocene
Supplemental Material
sj-pdf-1-hol-10.1177_09596836231169986 – Supplemental material for A guide to assess distance from ecological baselines and change over time in palaeoecological records
Supplemental material, sj-pdf-1-hol-10.1177_09596836231169986 for A guide to assess distance from ecological baselines and change over time in palaeoecological records by Olivia R Burge, Sarah J Richardson, Jamie R Wood and Janet M Wilmshurst in The Holocene
Footnotes
Author contributions
Olivia R Burge: Conceptualisation; Methodology; Software; Formal Analysis; Data Curation; Writing – Original Draft; Visualisation. Sarah J Richardson: Conceptualisation; Methodology; Data Curation; Funding acquisition; Writing – Review and Editing; Supervision. Jamie R Wood: Conceptualisation; Methodology; Data Curation; Funding acquisition; Writing – Review and Editing; Project administration. Janet M Wilmshurst: Conceptualisation; Methodology; Data Curation; Funding acquisition; Writing – Review and Editing; Supervision; Project administration.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the New Zealand Ministry of Business, Innovation and Employment Strategic Science Investment Funding to Crown Research Institutes, and a Smart Ideas grant [C09X1616] funded by New Zealand Ministry of Business, Innovation and Employment. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Supplemental material
Supplemental material for this article is available online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
