MARCO POLO

Abstract

Hitherto, the Landscape Reconstruction Algorithm (LRA) has been the only truly quantitative approach to stand-scale palynology. However, the LRA requires information on pollen productivity and dispersal, which is not always available. The alternative approach MARCO POLO (MAnipulating pollen sums to ReCOnstruct POllen of Local Origin) presented here is solely based on pollen values and does not rely on a pollen dispersal function. In a stepwise fashion, MARCO POLO removes those pollen types from the pollen sum whose values are significantly higher than in a neighbouring large basin. The resulting regional pollen sum is free of the disturbing factor of (extra-)local pollen. Based on this sum, comparison with the pollen record from the large basin allows calculating sharp (extra-)local signals. Treating the (extra-)local pollen portion with representation factors (R-values) then produces a quantitative reconstruction of the stand-scale vegetation composition. We tested MARCO POLO and the LRA on a dataset of pollen surface samples and forest vegetation relevés from northern Central Europe. Both approaches reconstruct the presence or absence of taxa at the stand scale within a small margin of error. Where observed cover was ⩾2%, both models always reconstructed presence; where modelled cover was ⩾2%, the taxon was always present. Overall, both approaches perform well in reconstructing the cover of taxa within a 100-m radius. In our tests, MARCO POLO is slightly better at reconstructing cover values for more taxa. Although some model parameters evidently need revision, the simple correlative approach of MARCO POLO appears to perform at least as well as the complex LRA model.

Keywords

forest community, local pollen deposition, pollen correction factors, pollen productivity estimates, quantitative vegetation reconstruction, surface samples

Introduction

The reconstruction of forest cover and composition has always played a central role in Holocene palaeoecology. Pollen-based quantitative reconstructions of past forest composition commonly use records from large basins focussing on vegetation development in the wider surrounding landscape. The changes on that large scale do not necessarily represent the actual changes in forest communities at the stand scale that result from processes such as competition, succession and disturbance. Actuo-ecological studies at the community level can help interpret the palaeo-record, but they typically cover time periods of a few years to several decades only. Using (sub)fossil records, palaeoecology may achieve a similarly high temporal resolution, but extend the time scale to thousands of years (or more).

Fossil pollen provides particularly abundant records of past vegetation, but pollen is transported over long distances and typically reflects (past) vegetation at much larger spatial scales than are of interest to community-level ecology. Still, most of the pollen is deposited close to its source (Janssen, 1966; Tauber, 1965) and basins of different size record pollen from different sources. Whereas large lakes mainly record pollen from the wider surroundings, pollen from close by dominates in records from terrestrial soils, small ponds or peatlands and mineral soil to peat transitions (regional vs. (extra-)local pollen sensu Janssen, 1966; Jacobson and Bradshaw, 1981). Small basins, such as forest depressions, are therefore well suited to reconstruct the changes in vegetation patterns at the stand scale.

However, in small basins there is still a significant regional component to the pollen load. Hence, one challenge of so-called stand-scale palynology (Bradshaw, 2013) is to distinguish between pollen from close by and pollen from farther away (Jacobson and Bradshaw, 1981; Oldfield, 1970). Attempts have been made to identify pollen of (extra-)local origin by comparing pollen data between sites, using approaches of varying degrees of complexity (Bradshaw, 1981; Heide, 1984; Jacobson, 1979; Sugita et al., 2006), but the results remained semi-quantitative only. The modern analogue approach of Davis et al. (1994, 1998) has produced convincing results, but can only reconstruct communities that have modern analogues, which is rarely the case in Central Europe. A truly quantitative approach to stand-scale reconstructions has been proposed with the Landscape Reconstruction Algorithm (LRA; Overballe-Petersen et al., 2013; Sugita, 2007a, 2007b; Sugita et al., 2010). The LRA approach requires reliable information on pollen productivity and dispersal, which is not available in all cases (Theuerkauf et al., 2013, 2015).

Here, we present MARCO POLO (MAnipulating pollen sums to ReCOnstruct POllen of Local Origin) as an alternative approach to reconstruct stand-scale vegetation that does not rely on a dispersal function. MARCO POLO elaborates an adjusted regional pollen sum that is free of disturbing (extra-)local signals (cf. Janssen, 1959) and that can be used to reconstruct stand-scale vegetation more reliably. A previous version of the model was introduced by Spangenberg (2008). In a first step, the model detects the (extra-)local components of the pollen assemblages from a small site (reconstruction of presence/absence). In a second step, the (extra-)local pollen portion is treated with representation factors (R-values; Davis, 1963) so that stand-scale vegetation composition is also reconstructed quantitatively. For lack of appropriate R-values for other growth forms, the latter step is thus far restricted to tree and shrub taxa. We test MARCO POLO using a dataset of pollen surface samples and forest vegetation relevés from northern Central Europe and compare our model results with reconstructions obtained by the LRA.

Materials and methods

Study sites

We studied 11 forest plots and 3 reference sites located in the federal state of Mecklenburg-Vorpommern in NE Germany (Tables 1 and 2). The lowland area is flat to slightly hilly and characterised by morainic plains, terminal moraines and outwash plains. The climate is temperate with a mean annual air temperature of 8.8°C and mean annual precipitation of 620 mm (1983–2013, German Meteorological Service DWD). The forest plots were selected to cover the variety of forest types found in the area (Table 1). Plots were placed at a distance of 150–650 m from the nearest forest edge. At each plot, tree and shrub cover was recorded and surface samples were collected for pollen analysis. In addition, we collected surface samples from three large basins close to the sampled forest plots to assess regional pollen deposition (Table 2).

Table 1.

Forest sites.

Sample/plot	Study site	Latitude (N)	Longitude (E)	Sample type	Site type	Distance to reference site (km)	Arboreal vegetation
CON17	Conower Werder	53°17′45.88″	13°28′26.29″	Top of core	Small peatland	0.8	Fagus stand
DAMM	Dammbruch	54°08′01.14″	13°16′10.65″	Moss	Forest floor	4.7	Alnus Fraxinus Quercus stand
ELD1	Eldena near Greifswald	54°04′52.6″	13°26′49.3″	Moss	Forest floor	8.7	Quercus Fagus Acer stand
ELD2	Eldena near Greifswald	54°04′16.1″	13°27′56.5″	Moss	Forest floor	10.3	Fagus stand with Fraxinus and Acer
EZA	Eldena near Greifswald	54°04′37.93″	13°28′25.20″	Top of core	Small peatland	10.3	Fagus stand with Quercus and Carpinus
GNOI1	Northeast of Gnoien	54°00′02.43″	12°40′03.18″	Moss	Forest floor	6.4	Fagus stand
GNOI2	Northeast of Gnoien	53°59′34.46″	12°40′10.81″	Moss	Forest floor	7.1	Fagus stand with Quercus
STEF1	Steffenshäger Wald	54°07′26.40″	13°17′44.25″	Moss	Forest floor	3.0	Picea Fagus stand
STEF2	Steffenshäger Wald	54°06′26.9″	13°18′13.2″	Moss	Forest floor	3.4	Fraxinus Corylus Quercus stand with Alnus
WEND1	Wendorfer Holz	54°09′15.0″	13°14′35.3″	Moss	Forest floor	7.0	Pinus Betula stand
WEND2	Wendorfer Holz	54°09′32.7″	13°14′43.9″	Moss	Forest floor	7.1	Picea stand surrounded by Pinus

Table 2.

Reference sites.

Sample	Study site	Latitude	Longitude	Site type	Reference for
CAR	Carwitzer See	53°18′13.50″N	13°28′24.19″E	Lake 175 ha	CON17
KIES	Kieshofer Moor	54°07′43.24″N	13°20′27.88″E	Peatland 32 ha	DAMM, ELD1, ELD2, EZA, STEF1, STEF2, WEND1, WEND2
STAS	Stassower See	54°02′4.20″N	12°35′15.17″E	Lake 7 ha	GNOI1, GNOI2

Pollen analysis

Surface samples for pollen analysis were collected in the centre of each forest plot. In nine of the plots, mosses were sampled at the centre and corners of a 1-m² area and mixed. The remaining two samples were taken from the surface of small peatlands (40 × 80 m for CON17 and 10 × 30 m for EZA; Table 1). In the ‘Kieshofer Moor’, a large open peatland, the upper 5 cm of Sphagnum moss was sampled, and in the lakes ‘Carwitzer See’ and ‘Stassower See’ the upper 3 cm of sediment was sampled using a gravity corer (Uwitec, Austria).

Pollen sample preparation (cf. Fægri and Iversen, 1989) included treatment with HCl, 10% KOH, sieving (120 µm) and acetolysis (7 min); samples rich in silicates were additionally treated with HF. Samples were mounted in silicone oil and counted with 400× magnification. In order to differentiate clearly between plant taxa and pollen types, the latter are displayed in Small Capitals (Table 3; Joosten and De Klerk, 2002). Pollen nomenclature follows Moore et al. (1991).

Table 3.

Taxon names (in regular font), species observed in the relevés (in italics) and associated pollen types (following Moore et al., 1991; in small capitals). Taxon names are used to refer to both observed vegetation and pollen-based reconstructions.

Taxon	Observed species	Pollen type
Acer	Acer platanoides, A. pseudoplatanus	Acer campestre type
Alnus	Alnus glutinosa	Alnus
Betula	Betula pendula, Betula sp.	Betula
Carpinus	Carpinus betulus	Carpinus type
Corylus	Corylus avellana	Corylus
Fagus	Fagus sylvatica	Fagus
Fraxinus	Fraxinus excelsior	Fraxinus
Picea	Picea abies	Picea
Pinus	Pinus sylvestris	Pinus
Quercus	Quercus robur	Quercus
Arboreal Rosaceae	Crataegus sp., Prunus avium, P. padus, P. serotina, P. spinosa, Sorbus aucuparia	Rosaceae non-operculate
Salix	Salix sp.	Salix
Tilia	Tilia cordata	Tilia
Ulmus	Ulmus glabra	Ulmus glabra type

Tree and shrub cover

Forest vegetation relevés were made by estimating the crown cover of trees and shrubs of reproductive age (cf. Lang, 1994) in circular field plots of 100 m radius. Plots were subdivided into four 25-m-wide rings that were visually assessed in subsections. The obtained cover estimates were merged and normalised to a total of 100%. Taxon cover was estimated in steps of 5% – below 5% in steps of 1%. Cover below 1% was recorded by ‘+’ and ‘r’ (cf. Braun-Blanquet, 1964); for numerical analysis, these estimates were transformed to 0.5% and 0.15%, respectively.
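The conversion of field estimates into numerical cover values can be sketched as follows. This is a minimal illustration only: the function names and the example relevé are hypothetical, and only the values 0.5% for '+' and 0.15% for 'r' are taken from the text.

```python
# Numeric values for cover symbols below 1%, as stated in the text.
SYMBOL_TO_PERCENT = {"+": 0.5, "r": 0.15}


def to_percent(estimate):
    """Convert a field estimate (a number or a '+'/'r' symbol) to percent cover."""
    if isinstance(estimate, str):
        return SYMBOL_TO_PERCENT[estimate]
    return float(estimate)


def normalise_cover(estimates):
    """Scale the cover estimates of one plot so that they total 100%."""
    numeric = {taxon: to_percent(value) for taxon, value in estimates.items()}
    total = sum(numeric.values())
    return {taxon: 100.0 * value / total for taxon, value in numeric.items()}


# Hypothetical relevé, not data from this study.
plot = {"Fagus": 80, "Quercus": 15, "Acer": 3, "Tilia": "+", "Ulmus": "r"}
cover = normalise_cover(plot)

for taxon, value in cover.items():
    print(f"{taxon}: {value:.2f}%")
```

Because the raw estimates in this example total less than 100%, normalisation slightly raises every value while keeping the ratios between taxa unchanged.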

Cover estimates were cross-checked using aerial orthophotos (GeoPortal.MV). Where reconstructed presence was not supported by the relevé, we used forest inventory data and additional field observations to check whether corresponding taxa were present within a distance of 2000 m from the sampling point. Whereas elements of the actual vegetation are written in italics, reconstructed taxa are written in regular font (Table 3).

The MARCO POLO approach starts from the common assumption that regional pollen deposition (as recorded in large basins: our reference sites, Table 2) represents the vegetation composition of a larger area (Janssen, 1966). Pollen deposition in small basins (our forest sites, Table 1) deviates from this regional record, because it contains a large proportion of local and extra-local pollen (sensu Janssen, 1966), i.e. pollen produced by vegetation in and directly surrounding the basin. Like the LRA (Sugita, 2007a, 2007b), MARCO POLO uses this difference to reconstruct vegetation composition at the stand scale (Spangenberg, 2008; cf. Jacobson and Bradshaw, 1981). The approach is a logical extension of the work of Andersen (1967, 1970), who found that pollen values increase linearly with local tree cover starting from a non-zero background value. This background pollen he identified as originating from the wider surroundings, i.e. from regional sources (sensu Janssen, 1966).

First step: Presence/absence analysis

In MARCO POLO, a ‘local’ pollen record from a small basin (L-record) and a contemporaneous ‘regional’ record from a neighbouring large basin (R-record) are used. For both records, percentage values are initially calculated based on a pollen sum that includes all pollen types of interest, using the same set of pollen types for both records. For each pollen value, the 1 sigma uncertainty range is calculated following Mosimann (1965). Pollen types for which the percentage value minus 1 sigma in the L-record is larger than its value plus 1 sigma in the R-record are considered to have an (extra-)local component in the L-record. These types are subsequently removed from the pollen sum of both the L- and the R-record, percentage values and uncertainty ranges are recalculated, and again the pollen types with (extra-)local values in the L-record are removed (Figure 1). This procedure is repeated until no further pollen types can be identified to have an (extra-)local component in the L-record. All pollen types still left in the pollen sum are considered regional (sensu Janssen, 1966; i.e. ‘exotic’ sensu Andersen, 1970) and yield an adjusted regional pollen sum that is free of disturbing (extra-)local factors (Janssen, 1959).

Figure 1.

MARCO POLO explained using data from site ELD2. Step 1 (top) identifies taxa with an (extra-)local component in a ‘local’ L-record through comparison with a ‘regional’ R-record. In the first round, percentage values are calculated using a pollen sum that includes all types. Those pollen types for which the percentage value minus 1 sigma in the L-record is larger than the pollen value plus 1 sigma in the R-record (marked *) are removed from the pollen sum (in grey font). Percentage values of both the L- and R-record are recalculated based on this adjusted sum and further types may be removed in subsequent rounds until no types with an (extra-)local component are left. Step 2 (bottom) uses the percentage values resulting from Step 1 and calculates the difference between the L- and the R-record for each taxon with an (extra-)local component (now in black font). The result is a value for the (extra-)local component in the L-record. These values are multiplied by R-values and normalised to 100% cover.

After this identification of (extra-)local components, the pollen values calculated using this adjusted regional pollen sum serve as the basis for quantitative reconstruction of the (extra-)local vegetation in the second step of MARCO POLO.
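The iterative removal described for the first step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the implementation used in the study: the 1 sigma range is approximated by a simple binomial standard error as a stand-in for the formulas of Mosimann (1965), the function names are hypothetical, and the pollen counts are invented.

```python
import math


def percentages_with_sigma(counts, types_in_sum):
    """Percentage and 1-sigma range for every type in the pollen sum.

    Assumption: binomial standard error, used here as a simple stand-in
    for the uncertainty ranges of Mosimann (1965).
    """
    total = sum(counts[t] for t in types_in_sum)
    if total == 0:
        raise ValueError("pollen sum is empty")
    result = {}
    for t in types_in_sum:
        p = counts[t] / total
        result[t] = (100 * p, 100 * math.sqrt(p * (1 - p) / total))
    return result


def find_local_types(local_counts, regional_counts):
    """Return (types with an (extra-)local component, types left in the sum)."""
    in_sum = set(local_counts)  # round 1: all types are in the pollen sum
    flagged = set()
    while True:
        loc = percentages_with_sigma(local_counts, in_sum)
        reg = percentages_with_sigma(regional_counts, in_sum)
        # L-record value minus 1 sigma exceeds R-record value plus 1 sigma
        new = {t for t in in_sum
               if loc[t][0] - loc[t][1] > reg[t][0] + reg[t][1]}
        if not new:
            return flagged, in_sum
        flagged |= new
        in_sum -= new  # recalculate both records on the reduced sum


# Invented pollen counts for an L-record and an R-record.
local = {"Fagus": 400, "Quercus": 60, "Pinus": 100, "Betula": 40}
regional = {"Fagus": 100, "Quercus": 80, "Pinus": 300, "Betula": 120}

local_types, regional_types = find_local_types(local, regional)
print(sorted(local_types), sorted(regional_types))
```

In this invented example only Fagus stands out in the first round; Quercus is masked by the high Fagus values and is detected only after Fagus has been removed from the sum, which is the reason for repeating the comparison.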

Second step: Correction for differences in representation

Once pollen types with an (extra-)local component have been identified, the relative cover of the associated taxa is determined. At this stage, all pollen values are based on the adjusted regional sum. The pollen values of types with an (extra-)local component in the L-record are not free of a regional component; this regional component is removed by subtracting the pollen values in the R-record from the corresponding values in the L-record. The resulting values, which represent the (extra-)local components contained in the L-record only, are finally corrected using representation factors (R-values). To that end, each value is multiplied by the representation factor of the corresponding taxon. The resulting values are normalised to a total of 100%. The result is an estimate of the relative cover of each taxon in close vicinity to the sampling point.
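The second step can be sketched as follows, again as a minimal illustration rather than the authors' implementation. The pollen counts are invented, the function name is hypothetical, and the numeric R-values assume that the notation '1:4' is read as a factor of 0.25 and '1 × 2' as a factor of 2.

```python
# Assumed numeric reading of R-values: '1' -> 1.0, '1:4' -> 0.25.
R_VALUES = {"Fagus": 1.0, "Quercus": 0.25}


def reconstruct_cover(local_counts, regional_counts,
                      local_types, regional_types, r_values):
    """Relative cover (%) of taxa with an (extra-)local component."""
    # Adjusted regional pollen sums of the L- and the R-record
    loc_sum = sum(local_counts[t] for t in regional_types)
    reg_sum = sum(regional_counts[t] for t in regional_types)
    corrected = {}
    for t in local_types:
        loc_pct = 100 * local_counts[t] / loc_sum
        reg_pct = 100 * regional_counts[t] / reg_sum
        # Remove the regional component, then apply the representation factor
        corrected[t] = (loc_pct - reg_pct) * r_values[t]
    total = sum(corrected.values())
    return {t: 100 * value / total for t, value in corrected.items()}


# Invented counts; Pinus and Betula are assumed to form the regional sum.
local = {"Fagus": 400, "Quercus": 60, "Pinus": 100, "Betula": 40}
regional = {"Fagus": 100, "Quercus": 80, "Pinus": 300, "Betula": 120}

cover = reconstruct_cover(local, regional,
                          local_types=["Fagus", "Quercus"],
                          regional_types=["Pinus", "Betula"],
                          r_values=R_VALUES)
for taxon, value in cover.items():
    print(f"{taxon}: {value:.1f}%")
```

Note that percentages of types outside the adjusted regional sum may exceed 100%, because they are expressed relative to a sum that no longer contains them.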

We tested MARCO POLO by combining each of the 11 forest plots (as L-records) with the closest of the three reference sites (as R-records; Table 2). We used the R-values for arboreal taxa of Andersen (1970; Table 4). This large set of R-values was derived from extensive data gathered in a landscape with a similar glacial and species immigration history to our study area and similar physiographical features and climate regime. Moreover, the set includes Fagus, which is a dominant tree taxon in our study area. Thus far, R-values for arboreal Rosaceae and Salix have not been published for Northern and Central Europe. We used an arbitrary R-value of 1:1 to be able to include these taxa in the quantitative reconstruction. These values are close to the 0.8:1 R-value for ‘Others’ (Rosaceae, Ilex, Salix, etc.) of Davis (1963) for a study site in North America. The low pollen values associated with these taxa and the fact that they occur in a limited number of plots only did not allow for an in-depth analysis of the suitability of these values. For the same reasons, the error caused by the arbitrary setting will be small.

Table 4.

Correction factors applied in this study: R-values for MARCO POLO, and PPE and fall speed for LRA.

Taxon	R-value	PPE	Fall speed (m/s)
Acer	1 × 2	1.27^c	0.056^c
Alnus	1:4	9.07	0.021
Betula	1:4	3.09	0.024
Carpinus	1:3	3.55	0.042
Corylus	1:4^a	1.99	0.025
Fagus	1	2.35	0.057
Fraxinus	1 × 2	1.03	0.022
Picea	1:2	2.62	0.056
Pinus	1:4	6.38	0.031
Quercus	1:4	5.83	0.035
Arboreal Rosaceae	1^b	1^d	0.035^d
Salix	1^b	1.22	0.022
Tilia	1 × 2	0.8	0.032
Ulmus	1:2	1.27	0.032

PPE: pollen productivity estimate.

R-values proposed by Andersen (1970), except ^a by Iversen Andersen (1973) and ^b arbitrarily set to 1. PPEs and fall speeds from Mazier et al. (2012), except ^c from Hellman et al. (2008) and ^d estimated based on size and shape of pollen.

At plot CON17, Alnus trees (<5 m high) and Betula shrubs grow on the sampled peatland. Their presence in the area is restricted to this small peatland site. They show very high local pollen values that distort the reconstruction of the remaining taxa in both LRA and MARCO POLO. We therefore removed Alnus and Betula from the pollen record of CON17 and repeated the analyses.

LRA modelling

The LRA (Sugita, 2007a, 2007b) consists of two models: the REVEALS model translates pollen deposition from large sites into mean regional vegetation composition. The LOVE model then uses this result to translate pollen deposition from small sites into mean vegetation composition near these small sites. We applied the LRA by running the ‘REVEALS.v4.2.2.Tallinn.wks.exe’ and ‘LOVE.v3.1.7.Tallinn.wks.exe’ programs provided by S. Sugita. For each of the 11 forest plots we ran the LOVE model using REVEALS results from the closest of the three reference sites (Table 2). Besides pollen data, the LRA requires pollen productivity estimates (PPEs). We used the set of mean PPEs (‘PPE.st2’) compiled by Mazier et al. (2012), except for Acer (Table 4). For Acer, only two substantially different PPEs are available so far, from studies in Southern Sweden and Switzerland. We used the PPE from the less distant and physiographically similar Swedish study area.

Statistics

We compared observed and modelled plant cover values using regression analysis. Regression analysis was performed on the total set of data (pooled across all sites and taxa) and for each single taxon (pooled across all sites). Because both the observed and modelled values are subject to error, the ordinary least squares method would underestimate the true slope of the regression (Riggs et al., 1978). We therefore used the geometric mean method to estimate regression parameters (Riggs et al., 1978; Webb III et al., 1981). Regression analysis in all cases showed that the intercept did not differ significantly from 0. As zero cover would moreover logically result in zero (extra-)local pollen deposition, we decided to use regression analysis with zero-intercept. As all values are presented as percentage values, overestimation in one taxon will automatically result in underestimation of all other taxa. We will address this interacting effect of percentage values in the discussion section.
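The difference between ordinary least squares and the geometric mean method for a zero-intercept regression can be illustrated with a short sketch. The formula used here, the geometric mean of the y-on-x slope and the inverse of the x-on-y slope, is one common formulation and is assumed rather than taken from the text; the data are invented and the function names are hypothetical.

```python
import math


def ols_slope_through_origin(x, y):
    """Ordinary least squares slope of y on x with zero intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def gm_slope_through_origin(x, y):
    """Geometric mean slope with zero intercept.

    Geometric mean of sum(xy)/sum(x^2) and sum(y^2)/sum(xy), which reduces
    to sqrt(sum(y^2)/sum(x^2)); a positive association is assumed.
    """
    return math.sqrt(sum(b * b for b in y) / sum(a * a for a in x))


def rmse(x, y):
    """Root mean square error between paired values."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(x, y)) / len(x))


# Invented observed and modelled cover values (%).
observed = [0.0, 5.0, 20.0, 60.0, 90.0]
modelled = [2.0, 3.0, 25.0, 50.0, 95.0]

ols = ols_slope_through_origin(observed, modelled)
gm = gm_slope_through_origin(observed, modelled)
print(f"OLS slope: {ols:.3f}, geometric mean slope: {gm:.3f}, "
      f"RMSE: {rmse(observed, modelled):.2f}")
```

For positively associated data the geometric mean slope is never smaller than the ordinary least squares slope, which reflects the underestimation of the slope by ordinary least squares mentioned above.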

About 40% of plot STEF1 is covered by young Picea trees, which neither of the models was able to depict. Although we had assumed these trees were of reproductive age, they apparently were not. We therefore removed this taxon from the record before quantitative analysis.

Results

Presence/absence

We recorded 20 arboreal taxa in the relevés that are associated with 13 pollen types. These 13 pollen types plus Salix (Salix was not recorded in the relevés) were used in the analysis (Table 3). In total, 154 sets (11 plots × 14 taxa) of one observed and two reconstructed cover values were obtained.

MARCO POLO and the LRA perform similarly in reconstructing local presence/absence of taxa (Table 5). The presence was correctly reconstructed by both models in 56 cases (true-positives); absence in 45 (true-negatives).

Table 5.

Presence reconstructed by the LRA and MARCO POLO models for different taxa. The number of plots (radius 100 and 500 m) in which a taxon is present (N₁₀₀ and N₅₀₀) is compared with the number of plots at which local presence of the taxon was reconstructed by MARCO POLO (N_MP) and the LRA (N_LRA). Reconstructions are divided into true-positives (T) and false-positives (F). Improvement with greater radius is indicated by bold numbers.

Taxon	N₁₀₀	N_MP T (100 m)	N_MP F (100 m)	N_LRA T (100 m)	N_LRA F (100 m)	N₅₀₀	N_MP T (500 m)	N_MP F (500 m)	N_LRA T (500 m)	N_LRA F (500 m)
Acer	6	4	–	5	1	6	4	–	5	1
Alnus	4	3	–	3	–	4	3	–	3	–
Betula	7	5	2	5	2	9	7	–	7	–
Carpinus	6	5	2	5	3	7	5	2	6	2
Corylus	5	2	2	3	1	6	3	1	3	1
Fagus	10	9	–	8	–	10	9	–	8	–
Fraxinus	9	7	2	9	2	11	9	–	11	–
Picea	4	4	3	4	6	10	7	–	10	–
Pinus	4	4	2	4	3	6	6	–	6	1
Quercus	11	9	–	9	–	11	9	–	9	–
Arboreal Rosaceae	9	4	1	5	2	9	4	1	5	2
Salix	0	–	3	–	4	3	2	1	2	2
Tilia	1	1	2	1	3	2	2	1	2	2
Ulmus	2	1	–	2	2	2	1	–	2	2
Total	78	58	19	63	29	96	71	6	79	13

In 31 sets, at least one of the models reconstructed presence although the taxon was absent in the relevé (false-positives; Figure 2). In 27 of these 31 cases, the reconstructed values are small (<3%). Of the 31 false-positives, 16 can be explained by the presence of a corresponding taxon within 500 m of the sample point. In the remaining sets, the LRA produces 15 false-positives and MARCO POLO only 8. In these 15 cases, the reconstructed cover is (well) below 2%. In 22 sets, at least one of the models indicated local absence, although the taxon was recorded in the relevé (false-negative). In all these cases, the cover of the taxon was below 2%.

Figure 2.

Reconstructed cover for cases where the respective taxon is actually absent in the relevé (100 m radius). Numbers denote the closest distance (m) from the sampling point at which the taxon was observed.

We thus end up with 109 sets of observed and corresponding model values with at least one value larger than 0. These sets are used in the subsequent analysis of cover values.

Reconstructed cover

MARCO POLO and the LRA perform well in reconstructing the cover of taxa within a 100-m radius (Figure 3). Regression analysis of the total of the 109 datasets for both models reveals a near-perfect fit between the reconstructed and observed cover (Table 6). The error of the LRA model (RMSE = 8.58%) is slightly higher than that of MARCO POLO (RMSE = 6.44%). Both models tend to overestimate the cover of rare taxa (cover ⩽10%); for higher values, error is more evenly distributed between over- and underestimations (Figure 4).

Figure 3.

Observed and modelled cover for each plot and taxon. For plot codes, see Table 1.

Table 6.

Similarity between observed (Veg.) and modelled (LRA and MP) cover. Similarity is expressed as regression coefficient and coefficient of determination of a geometric mean linear regression and as root mean square error.

	All model results			Model results >2% cover
	Veg. – MP	Veg. – LRA	LRA – MP	Veg. – MP	Veg. – LRA	LRA – MP
Number of pairs	109	109	109	58	49	58
Regression coefficient (slope)	0.96	0.97	0.99	0.96	0.97	0.99
Coefficient of determination (R²)	0.90	0.82	0.95	0.88	0.81	0.94
Root mean square error (RMSE)	6.44	8.58	4.67	9.04	12.35	6.55

MP: MARCO POLO; LRA: Landscape Reconstruction Algorithm; RMSE: root mean square error.

Figure 4.

Error in reconstructed cover in relation to observed cover. Dots in the upper graph denote the difference between reconstructed and observed cover and dashed lines in the lower graph the RMSE. Note the scaling of the x-axis.

The performance of the models differs considerably between taxa (Table 7). Both models show close correlation (R² > 0.65) between the observed and reconstructed cover for Fagus, Picea, Betula, Corylus, Acer and Alnus; MARCO POLO also for Fraxinus and Pinus. However, the cover tends to be overestimated (slope >1.2) in case of Corylus and Acer (in LRA also Betula) and underestimated (slope <0.8) in case of Alnus (in LRA also Pinus). The mismatch is particularly large in the LRA estimates for Corylus and Alnus. Correlation is weak (R² < 0.65) for Quercus, Carpinus and arboreal Rosaceae.

Table 7.

Regression coefficient (slope) and coefficient of determination (R²) of taxon-specific geometric mean linear regression (with zero-intercept) between observed and reconstructed cover for MARCO POLO (MP) and LRA. Bold values indicate slopes close to 1 (0.8 < α < 1.2) and R² > 0.65.

Taxon	Pairs (MP)	Pairs (LRA)	Slope (MP)	Slope (LRA)	R² (MP)	R² (LRA)
Fagus	10	10	0.90	0.85	0.95	0.93
Picea	6	9	0.97	1.05	0.98	0.97
Fraxinus	11	11	1.10	0.87	0.68	0.39
Pinus	6	7	0.82	0.72	0.78	0.58
Betula	9	9	1.19	1.41	0.99	0.98
Corylus	7	6	1.25	2.37	0.99	0.99
Acer	6	7	1.48	1.32	0.99	0.99
Alnus	4	4	0.78	0.40	0.97	0.82
Quercus	11	11	0.86	0.93	0.56	0.41
Carpinus	8	9	0.53	0.85	0.45	0.44
Arb. Rosaceae	10	11	1.76	2.92	0.27	0.20

MP: MARCO POLO; LRA: Landscape Reconstruction Algorithm.

Discussion

The forest stand reconstructions of the MARCO POLO model are largely consistent with the observed tree cover and similar to the LRA reconstructions. Both MARCO POLO and LRA reliably reconstruct the presence of taxa with more than 2% cover.

MARCO POLO correctly reconstructs local presence in about 75% of the cases (Table 5). LRA performs slightly better at about 80%, but produces more false-positives: in 14% of the cases in which the LRA reconstructs local presence, the taxon is not present within a 500-m radius (<8% for MARCO POLO). However, falsely reconstructed presence is never above 1.3% cover for LRA (n = 12) and never above 0.7% for MARCO POLO (n = 5) (Figure 2).
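The percentages quoted above can be retraced from the totals in Table 5. The sketch below assumes that the share of correctly reconstructed presence is the number of true-positives divided by the number of observed presences within 100 m, and that the share of false reconstructions is the number of false-positives divided by all reconstructed presences (500 m radius); these denominators are our reading, not stated explicitly in the text.

```python
# Totals from the last row of Table 5.
n_present_100 = 78                       # observed presences within 100 m
true_pos_100 = {"MP": 58, "LRA": 63}     # true-positives, 100 m radius
true_pos_500 = {"MP": 71, "LRA": 79}     # true-positives, 500 m radius
false_pos_500 = {"MP": 6, "LRA": 13}     # false-positives, 500 m radius

hit_rate = {}
false_share = {}
for model in ("MP", "LRA"):
    # Share of observed presences that the model reconstructs
    hit_rate[model] = 100 * true_pos_100[model] / n_present_100
    # Share of reconstructed presences without the taxon within 500 m
    reconstructed = true_pos_500[model] + false_pos_500[model]
    false_share[model] = 100 * false_pos_500[model] / reconstructed
    print(f"{model}: presence reconstructed in {hit_rate[model]:.1f}% of cases, "
          f"{false_share[model]:.1f}% of reconstructed presences false")
```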

Both models may fail to reconstruct the presence of rare taxa, resulting in false-negatives. In all these cases, the observed cover of the rare taxon was below 2%. Considering the false-positive thresholds of 1.3% and 0.7%, we propose a conservative 2% threshold: For both models, any reconstructed cover above 2% implies that a taxon is indeed present – if not nearby, then at least within a 500-m radius. A value below 2% may mean a taxon actually is not present locally. In case of reconstructed absence, a taxon may still have been present with a cover below 2%. Besides the 2% threshold, there is no apparent pattern in modelled presence/absence. One might expect that model errors are more common for entomophilous taxa or for taxa with low pollen productivity, but neither seems to be the case.

The 2% error made by the models in reconstructing presence/absence implies that error is equally small in reconstructing near-total dominance (cover close to 100%). In other words, very low and very high values are close to the ideal 1:1 regression between modelled and observed data. In addition, if the cover of a taxon is underestimated at a particular site, the other taxa at that site are overestimated because cover is expressed as percentage and always totals 100%. Thus, any deviation away from the 1:1 regression is accompanied by deviations of the same total size in the other direction. As a result, a regression that uses the total dataset of modelled and observed cover necessarily reveals a near-perfect 1:1 fit and is thus not a good measure for model performance. The general performance of the models should therefore be judged by the variance around the regression (R²) or the deviation from the 1:1 ideal (RMSE). MARCO POLO performs slightly better than the LRA in this respect. When the values below 2% are removed from analysis, R² remains high for both models, but as can be expected when low values with low error are removed, RMSE values increase (by about 40%; Table 6).
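Two points of this argument can be checked with simple arithmetic. The sketch below first uses invented cover values to show that over- and underestimations cancel when both data series total 100%, and then retraces the increase in RMSE from the values in Table 6; variable names are illustrative only.

```python
# Invented observed and modelled cover for one site; both total 100%.
observed = {"Fagus": 70.0, "Quercus": 20.0, "Acer": 10.0}
modelled = {"Fagus": 60.0, "Quercus": 25.0, "Acer": 15.0}

errors = {t: modelled[t] - observed[t] for t in observed}
net_error = sum(errors.values())                          # cancels out
overestimation = sum(e for e in errors.values() if e > 0)
underestimation = -sum(e for e in errors.values() if e < 0)
print(f"net: {net_error:.1f}, over: {overestimation:.1f}, "
      f"under: {underestimation:.1f}")

# RMSE values from Table 6 (all results vs. results above 2% cover).
rmse_all = {"MP": 6.44, "LRA": 8.58}
rmse_above_2 = {"MP": 9.04, "LRA": 12.35}
increase = {m: 100 * (rmse_above_2[m] / rmse_all[m] - 1) for m in rmse_all}
for model, value in increase.items():
    print(f"{model}: RMSE increases by {value:.0f}%")
```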

The correction factors (R-values and PPEs) used in the models are mean values derived from field studies at multiple sites (Andersen, 1970; Mazier et al., 2012). Pollen load shows considerable variation in relation to stand structure (age, health, shading, productivity, patchiness, etc.) and site characteristics (relief, wind direction, air turbulences, etc.) (Matthias et al., 2012; Theuerkauf et al., 2013). Such variation (largely) averages out over large areas, which means that in quantitative reconstruction of regional vegetation, the use of the averaged correction factors is appropriate. However, it will cause error in reconstructions at the stand scale.

For example, Quercus is underrepresented in the model results of plots ELD1, GNOI2 and STEF2 (Figure 3). In these plots, Quercus trees are old and of poor vitality and presumably produce less pollen than the correction factors suggest. Picea is similarly underrepresented in plot STEF1; here probably because trees are not yet reproductively mature. In the case of plot CON17, Betula and Alnus growing in the small peatland at the centre of the plot provide so much pollen to the sample that the dominance of Fagus in the (much larger) relevé is obscured in the reconstruction (Figure 5). The local presence of Betula and Alnus results not only in a high local component from gravitational pollen deposition but the Betula shrubs at the sample point probably block the trunk space pollen component of the surrounding forest as well (Tauber, 1965). Such reconstruction errors in the percentage cover of one taxon necessarily cause error in other taxa as well, and do so in a non-linear way (the ‘Fagerlind effect’; Fagerlind, 1952; Prentice and Webb III, 1986). This type of error was very prominent in the plots STEF1 and CON17. When we removed the taxa that disturb the picture (Picea; Betula and Alnus) from the respective datasets, the reconstructions for the remaining taxa improved significantly.
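How strong local pollen sources distort the percentages of all other taxa can be illustrated with invented pollen counts. In this sketch the pollen input of Fagus is identical in both cases; only the counts of the two taxa growing at the sampling point differ. The counts and names are hypothetical and do not represent data from CON17.

```python
def percentages(counts):
    """Pollen percentages relative to the sum of all counted types."""
    total = sum(counts.values())
    return {taxon: 100 * n / total for taxon, n in counts.items()}


# Invented counts: the same Fagus input with and without strong local sources.
without_local = {"Fagus": 300, "Betula": 50, "Alnus": 50}
with_local = {"Fagus": 300, "Betula": 350, "Alnus": 350}

pct_without = percentages(without_local)
pct_with = percentages(with_local)
print(f"Fagus without local sources: {pct_without['Fagus']:.0f}%")
print(f"Fagus with local sources:    {pct_with['Fagus']:.0f}%")
```

Although the Fagus count is unchanged, its percentage drops from 75% to 30%, which is the kind of interdependence that makes removing the disturbing taxa from the sum worthwhile.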

Figure 5.

Observed and modelled cover for site CON17. Betula and Alnus are growing locally on this peatland site and disturb the reconstruction when included (a; total RMSE over all reconstructions is 30.26% for LRA and 16.58% for MARCO POLO); results are much better when the disturbing taxa are excluded (b; RMSE 3.64% for LRA and 1.05% for MARCO POLO).

A single correction factor can never capture the specific conditions of each site. This inherent error restricts the predictive capacity of stand-level reconstructions by default. A larger dataset would be needed to assess inherent variation and possibly derive general as well as taxon-specific error estimates. Because of the Fagerlind effect, error will be highest when the observed cover is near 50% (Figure 4). Whereas the inherent error cannot be reduced, bias caused by inappropriate correction factors can.

Close correlation between the reconstructed and observed cover (slope close to 1, high R²) suggests that correction factors are (near-)optimal for several taxa in our study: the R-values for Fagus, Picea, Betula, Fraxinus and Pinus used in MARCO POLO and the PPEs for Fagus and Picea used in the LRA (Table 7). For other taxa, the reconstructions appear to be biased. Both models tend to systematically overestimate the cover of Corylus and Acer – the LRA also of Betula – indicating that R-values are too high and PPEs too low. The opposite applies to Alnus. The remaining taxa show a poor fit between the modelled and observed cover. The observed cover of Carpinus and arboreal Rosaceae is low in all cases (⩽10% and ⩽2%, respectively) and any trend is apparently lost in the noise. Tilia and Ulmus occur with even lower cover and only in very few of the plots. Although higher cover was observed for Quercus (up to 35%), this taxon is known to show high variation in pollen–vegetation relationships, making derivation of a robust correction factor difficult (Andersen, 1970; Theuerkauf et al., 2013).

The R-values used in MARCO POLO are derived from closed forest settings similar to our own (Andersen, 1970). In contrast, the PPEs for the LRA have mostly been estimated using surface samples from (semi-)open settings (including lakes, heathlands and large forest openings; Mazier et al., 2012) to avoid (extra-)local overrepresentation of forest taxa. These PPEs may not be fully applicable to closed forests, because pollen dispersal is different. Whereas the canopy component dominates the pollen load of forest taxa arriving at open settings, the trunk space component is more important in closed forest settings (Tauber, 1965). As a result, the representation of taxa forming lower canopy strata in the forest will differ between pollen samples taken inside and outside the forest. The pollen dispersal function that underlies the PPE.st2 data does not capture this difference, which for some taxa may, in part, explain the poor performance of the LRA compared with MARCO POLO. More sophisticated pollen dispersal models (e.g. Lagrangian stochastic models; Kuparinen et al., 2007) that are able to depict pollen dispersal both outside and inside the forest would be needed to produce accurate PPEs that are applicable at both the regional and the (extra-)local scales.

Any reconstruction of local forest cover faces the problem that a few trees close by may contribute as much pollen as many trees at a larger distance (Jacobson and Bradshaw, 1981; Oldfield, 1970). Moreover, taxa differ in their pollen production and dispersal characteristics, so that delineating a common (extra-)local source area for a site is always imperfect (Jackson, 1990; Jacobson and Bradshaw, 1981). Despite these limitations, several studies have shown that pollen data from small forest sites (soils, small ponds, small peatlands) best represent forest composition within a radius of 30–150 m (Andersen, 1970; Bunting et al., 2005; Calcote, 1995; Sugita, 1994; Sugita et al., 2010). Our results support these findings.

In only a few cases, species are reconstructed to be locally present with a cover larger than the 2% threshold although they are absent within the 100-m radius (Figure 2). At plot GNOI1, for example, the presence of pine in the local reconstruction can be attributed to a large pine plantation beginning about 200 m from the sampling point. In WEND2, the presence of Fraxinus about 300 m away can hardly explain the high reconstructed cover. Here, the large amount of pollen must have been deposited in previous years, before many trees were lost to ash dieback (cf. Pautasso et al., 2013). Similarly, large pine trees were cut near plot EZA some years before sampling. These few problems aside, we find good correlation between the pollen-based reconstructions and the plant cover within a 100-m radius.

Like the LRA, the application of MARCO POLO to palaeo-data requires robust chronological linkage between the L- and R-records. Once this synchronicity is established, insight can be gained into long-term vegetation dynamics at small spatial scales; using a combination of sites, spatial patterns and shifting mosaics can be detected (Spangenberg, 2008). In this way, records of (extra-)local pollen (L-records) can help understand the fine-scale patterns and processes that underlie large, landscape-level dynamics of, for example, plant migration and land use. L-records can be found in terrestrial soils (including palaeosoils buried by, for example, peatlands, dunes or earthworks), in small ponds or peatlands and near the edges of larger peatlands or lakes.

The first step of MARCO POLO, in which only the (extra-)local presence or absence of taxa is assessed, does not require R-values. Presence close to the sample point can thus also be reconstructed for the numerous herbs, and especially cultivated plants, for which R-values are still lacking. Such presence/absence analyses may be particularly worthwhile for reconstructing (extra-)local vegetation in archaeological settings, for example, to reveal where particular crops were grown or in which rotation.
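The stepwise logic of this first step can be sketched as follows. This is a simplified illustration, not the authors' implementation: the significance test is replaced by a one-sided two-proportion z-test with a critical value of 1.645, taxa are removed one at a time starting with the most strongly overrepresented, and all function names and pollen counts are hypothetical.

```python
import math


def two_proportion_z(count_a, total_a, count_b, total_b):
    """z statistic for proportion a exceeding proportion b (pooled variance)."""
    pooled = (count_a + count_b) / (total_a + total_b)
    variance = pooled * (1.0 - pooled) * (1.0 / total_a + 1.0 / total_b)
    if variance == 0.0:
        return 0.0
    return (count_a / total_a - count_b / total_b) / math.sqrt(variance)


def locally_present_taxa(local_counts, regional_counts, z_critical=1.645):
    """Remove, one at a time, the taxon most overrepresented at the small site."""
    in_sum = set(local_counts) & set(regional_counts)
    removed = []
    while len(in_sum) > 1:
        # Pollen sums are recalculated from the taxa still included.
        local_total = sum(local_counts[t] for t in in_sum)
        regional_total = sum(regional_counts[t] for t in in_sum)
        if local_total == 0 or regional_total == 0:
            break
        scores = {t: two_proportion_z(local_counts[t], local_total,
                                      regional_counts[t], regional_total)
                  for t in in_sum}
        candidate = max(sorted(scores), key=scores.get)
        if scores[candidate] <= z_critical:
            break  # no remaining taxon is significantly higher locally
        removed.append(candidate)
        in_sum.remove(candidate)
    return removed, sorted(in_sum)


# Hypothetical counts: small forest site versus neighbouring large basin.
local = {"Fagus": 600, "Pinus": 200, "Betula": 80, "Quercus": 40, "Alnus": 80}
regional = {"Fagus": 200, "Pinus": 400, "Betula": 160, "Quercus": 80, "Alnus": 160}

present, regional_sum = locally_present_taxa(local, regional)
```

In this made-up example only Fagus is flagged as (extra-)locally present. Once it is removed from the pollen sum, the remaining taxa have identical proportions at both sites, so the apparent underrepresentation of Pinus in the full sum turns out to be a percentage artefact.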

Conclusion

Both MARCO POLO and the LRA are able to reconstruct the presence or absence of taxa at the stand scale within a small margin of error. Reconstructed cover values below 2% are not reliable and should be interpreted with caution. Statistical reliability, possible contamination or re-deposition, long distance transport and ecological plausibility should be taken into account when drawing conclusions from low values.

MARCO POLO explicitly reconstructs presence/absence in its first step, solely on the basis of pollen values. The LRA is more complex and requires correction factors (PPEs, fall speed) and a dispersal model to deliver results. However, the reconstruction of presence/absence hardly depends on the validity of these model attributes. At this stage, both models produce reliable results and can be applied to reconstruct the species composition of plant communities for which no modern analogue exists.

The full effect of model attributes (R-values in MARCO POLO and PPEs, fall speeds and dispersal model in the LRA) and their validity only becomes noticeable in the taxon-specific performance of the models. Although some model parameters evidently need revision, the present first test shows that the simple correlative approach of MARCO POLO performs at least as well as the complex LRA model. The dispersal model of the LRA (that underlies the PPEs) seems to be the most critical component.

MARCO POLO relies on R-values derived from ‘simple’ linear correlation of pollen and vegetation proportions, which makes it transparent and easy to implement. Studies into (non-distance-weighted) R-values have become virtually non-existent since the introduction of the LRA. Our results show that it is definitely worthwhile to revive research into R-values.

Model performance may be impaired in small peatland sites. If taxa of interest are growing on the site itself (e.g. Alnus, Betula, Calluna, Cyperaceae, Pinus, Poaceae, Salix), this may distort the pollen record. Such local presence may be inferred from macrofossils, from the presence of unripe, not yet separated pollen (pollen clumps) or from exceptionally high values of the (extra-)local component. When using small peatland sites, any taxon that may have been growing on the peatland should be excluded from the final quantitative reconstruction. However, on the surrounding mineral ground, the same taxa may act as pioneer species or may be indicative of openness or early successional stages, all of which are of interest to ecologists. Additional indicators, such as pollen of co-occurring taxa, non-pollen palynomorphs, charcoal or geochemical parameters, together with expert judgement, are then required to draw conclusions. In that sense, models are only an aid to reconstructing past stand-scale vegetation. They cannot replace the ecological expertise of the interpreter.

Footnotes

Acknowledgements

AM carried out the field work and pollen analysis and did the MARCO POLO calculations. AM and HJ developed the idea behind the MARCO POLO model, which AM implemented. MT analysed the forest inventory data and did the LRA calculations. AM and JC analysed the model outcomes and wrote the paper, to which MT and HJ contributed. We thank Uwe Gehlhar (Research Institute of the State Forest Authorities of Mecklenburg-Vorpommern) for supporting data collection from sites CON17 and CAR, thus helping us to further develop MARCO POLO. We also thank an anonymous reviewer for helpful comments. We dedicate this paper to the memory of Roel Janssen for his pioneering work and inspirational ideas.

Funding

The work was supported by the Research Institute of the State Forest Authorities of Mecklenburg-Vorpommern, the Helmholtz Association (ICLEA VH-VI-415) and a PhD scholarship of the Heinrich Boell Foundation.

References

Andersen (1967) Tree-pollen rain in a mixed deciduous forest in South Jutland (Denmark). Review of Palaeobotany and Palynology 3: 267–275.

Andersen (1970) The relative pollen productivity and pollen representation of North European trees, and correction factors for tree pollen spectra: Determined by surface pollen analyses from forests. Danmarks geologiske undersøgelse Række 2(96): 99.

Andersen (1973) The differential pollen productivity of trees and its significance for the interpretation of a pollen diagram from a forested region. In: Birks JHB and West (eds) Quaternary Plant Ecology. New York; Toronto, ON, Canada: Wiley & Sons, pp. 109–115.

Bradshaw RHW (1981) Quantitative reconstruction of local woodland vegetation using pollen analysis from a small basin in Norfolk, England. Journal of Ecology 69(3): 941–955.

Bradshaw RHW (2013) Stand-scale palynology. In: Elias and Mock (eds) Encyclopedia of Quaternary Science. Amsterdam: Elsevier, pp. 847–853.

Braun-Blanquet (1964) Pflanzensoziologie: Grundzüge der Vegetationskunde. Wien; New York: Springer.

Bunting, Armitage, Binney, et al. (2005) Estimates of ‘relative pollen productivity’ and ‘relevant source area of pollen’ for major tree taxa in two Norfolk (UK) woodlands. The Holocene 15(3): 459–465.

Calcote (1995) Pollen source area and pollen productivity: Evidence from forest hollows. Journal of Ecology 83(4): 591–602.

Davis (1963) On the theory of pollen analysis. American Journal of Science 261: 897–912.

Davis, Calcote, Sugita, et al. (1998) Patchy invasion and the origin of a hemlock-hardwoods forest mosaic. Ecology 79(8): 2641–2659.

Davis, Sugita, Calcote, et al. (1994) Historical development of alternate communities in a hemlock-hardwood forest in northern Michigan, USA. In: Edwards (ed.) Large-Scale Ecology and Conservation Biology. Oxford: The British Ecological Society by Blackwell Scientific Publications, pp. 19–39.

Fægri and Iversen (1989) Textbook of Pollen Analysis. Chichester: Wiley.

Fagerlind (1952) The real significance of pollen diagrams. Botaniska Notiser 105: 185–224.

GeoPortal.MV (2014) Digitale Orthophotos (DOP20). Available at: https://www.geoportal-mv.de/land-mv/GeoPortalMV_prod/de/Startseite/index.jsp.

Heide (1984) Holocene pollen stratigraphy from a lake and small hollow in north-central Wisconsin, USA. Palynology 8(1): 3–19.

Hellman, Gaillard M-J, Broström, et al. (2008) Effects of the sampling design and selection of parameter values on pollen-based quantitative reconstructions of regional vegetation: A case study in southern Sweden using the REVEALS model. Vegetation History and Archaeobotany 17(5): 445–459.

Jackson (1990) Pollen source area and representation in small lakes of the northeastern United States. Review of Palaeobotany and Palynology 63(1–2): 53–76.

Jacobson (1979) The palaeoecology of White Pine (Pinus strobus) in Minnesota. Journal of Ecology 67: 697–726.

Jacobson and Bradshaw RHW (1981) The selection of sites for paleovegetational studies. Quaternary Research 16: 80–96.

Janssen (1959) Alnus as a disturbing factor in pollen diagrams. Acta Botanica Neerlandica 8(1): 55–58.

Janssen (1966) Recent pollen spectra from the deciduous and coniferous-deciduous forests of northeastern Minnesota: A study in pollen dispersal. Ecology 47(5): 804–825.

Joosten and De Klerk (2002) What’s in a name? Some thoughts on pollen classification, identification, and nomenclature in Quaternary palynology. Review of Palaeobotany and Palynology 122(1–2): 29–45.

Kuparinen, Markkanen, Riikonen, et al. (2007) Modeling air-mediated dispersal of spores, pollen and seeds in forested areas. Ecological Modelling 208: 177–188.

Lang (1994) Quartäre Vegetationsgeschichte Europas. Jena; Stuttgart; New York: Gustav Fischer Verlag.

Matthias, Nielsen and Giesecke (2012) Evaluating the effect of flowering age and forest structure on pollen productivity estimates. Vegetation History and Archaeobotany 21(6): 471–484.

Mazier, Gaillard, Kuneš, et al. (2012) Testing the effect of site selection and parameter setting on REVEALS-model estimates of plant abundance using the Czech Quaternary Palynological Database. Review of Palaeobotany and Palynology 187: 38–49.

Moore, Webb and Collinson (1991) Pollen Analysis. London: Blackwell.

Mosimann (1965) Statistical methods for the pollen analyst: Multinomial and negative multinomial techniques. In: Kummel and Raup (eds) Handbook of Paleontological Techniques. San Francisco, CA: W. H. Freeman, pp. 636–673.

Oldfield (1970) Some aspects of scale and complexity in pollen-analytically based palaeoecology. Pollen et Spores 12: 163–172.

Overballe-Petersen, Nielsen and Bradshaw RHW (2013) Quantitative vegetation reconstruction from pollen analysis and historical inventory data around a Danish small forest hollow. Journal of Vegetation Science 24(4): 755–771.

Pautasso, Aas, Queloz, et al. (2013) European ash (Fraxinus excelsior) dieback – A conservation biology challenge. Biological Conservation 158: 37–49.

Prentice and Webb III (1986) Pollen percentages, tree abundances and the Fagerlind effect. Journal of Quaternary Science 1(1): 35–43.

Riggs, Guarnieri and Addelman (1978) Fitting straight lines when both variables are subject to error. Life Sciences 22: 1305–1360.

Spangenberg (2008) 2000 Jahre Waldentwicklung auf nährstoff- und basenreichen Standorten im mitteleuropäischen Jungpleistozän – Fallstudie Naturschutzgebiet Eldena (Vorpommern, Deutschland). PhD Thesis, University of Greifswald. Available at: http://ub-ed.ub.uni-greifswald.de/opus/volltexte/2008/535/.

Sugita (1994) Pollen representation of vegetation in Quaternary sediments: Theory and method in patchy vegetation. Journal of Ecology 82(4): 881–897.

Sugita (2007a) Theory of quantitative reconstruction of vegetation I: Pollen from large sites REVEALS regional vegetation composition. The Holocene 17(2): 229–241.

Sugita (2007b) Theory of quantitative reconstruction of vegetation II: All you need is LOVE. The Holocene 17(2): 243–257.

Sugita, Parshall and Calcote (2006) Detecting differences in vegetation among paired sites using pollen records. The Holocene 16(8): 1123–1135.

Sugita, Parshall, Calcote, et al. (2010) Testing the Landscape Reconstruction Algorithm for spatially explicit reconstruction of vegetation in northern Michigan and Wisconsin. Quaternary Research 74(2): 289–300.

Tauber (1965) Differential pollen dispersion and the interpretation of pollen diagrams, with a contribution to the interpretation of the elm fall. Danmarks geologiske undersøgelse Række 2(89): 1–69.

Theuerkauf, Dräger, Kienel, et al. (2015) Effects of changes in land management practices on pollen productivity of open vegetation during the last century derived from varved lake sediments. The Holocene 25(5): 733–744.

Theuerkauf, Kuparinen and Joosten (2013) Pollen productivity estimates strongly depend on assumed pollen dispersal. The Holocene 23(1): 14–24.

Webb III, Howe, Bradshaw RHW, et al. (1981) Estimating plant abundances from pollen percentages: The use of regression analysis. Review of Palaeobotany and Palynology 34: 269–300.