Abstract
This study aims to obtain relative pollen productivity estimates (PPEs) of major pollen types and to estimate the relevant source area of pollen (RSAP) in a semi-open landscape in western Norway. Extended R-value (ERV) models are used to analyse a data set consisting of surface pollen assemblages from 34 lakes and vegetation surveys around each site. Ordination techniques indicate relatively short gradients (<2.3 SD) in both the vegetation and pollen data sets. The lake sizes vary from 0.4 ha to 19.0 ha (mean=5.7 ha, sd=4.3 ha), a departure from the assumption of the ERV models that lake size is constant among sites, but they follow a normal distribution. Simulations demonstrate that, if the sizes of circular lakes follow a normal distribution, the ERV model-based methods provide the expected PPE and RSAP values using a standardized lake radius. If the distribution is highly skewed or equally random around the mean, the results are not reliable. We apply the analytical strategy implied by the simulations to obtain relative PPEs and the RSAP in western Norway. PPEs for ten taxa (Alnus, Fagus, Picea, Pinus, Quercus, Juniperus, Salix, Calluna, Cyperaceae, Rumex) relative to Poaceae are comparable with some of those previously obtained in different parts of Europe, indicating that there are general patterns of high and low pollen producers that will be useful for reconstruction purposes. The RSAP estimate is the area within a radius of 900–1100 m. This study demonstrates the importance of carefully evaluating the extent to which departures from the model assumptions affect the outcomes of the ERV model-based analysis.
Introduction
Since the introduction of agriculture c. 6000 years ago in Northern Europe, human activity has transformed the vegetation into open landscapes: fields, pastures, meadows and heathland. There is no simple pattern of change; it varies in space, time and intensity depending on cultural influences and population pressure. Since the birth of the field of palynology (Von Post, 1918), it has been an aim to transform fossil pollen data into quantitative estimates of past vegetation and thereby be able to estimate the degree of, e.g., open landscape through time. The relationship between arboreal pollen (AP) and non-arboreal pollen (NAP) has been commonly used (Andersen and Berglund, 1994; Berglund, 1994; Firbas, 1934) and gives an idea of the opening of the landscape, but is affected by differences in pollen production among species and long-distance pollen transport (Favre et al., 2008; Sugita et al., 1999). Other approaches based on modern analogues (e.g. Jackson and Williams, 2004; Overpeck et al., 1985; Tarasov et al., 2007) and the biomization method (e.g. Fyfe et al., 2010; Prentice et al., 1996, 1998) have provided semi-quantitative and typological estimates of past vegetation; results that are appropriate for model–data comparison for past climate and land use changes (Kaplan et al., 2009; Olofsson and Hickler, 2008). Despite those successful applications, substantial issues in quantitative reconstruction are still poorly understood. As the spatial scales of vegetation reconstruction become smaller, the non-linear nature of the relationship between pollen percentages and vegetation (Davis, 1963; Fagerlind, 1952) and the influence of long-distance pollen from regional sources (Sugita et al., 1999) make quantitative reconstruction increasingly complicated. However, recent advances in the theory of pollen analysis have paved the way to overcome those issues (Parsons and Prentice, 1981; Prentice, 1985; Sugita, 1994, 2007a,b).
Both theoretical (Sugita, 2007a, b) and empirical studies (Gaillard et al., 2010; Hellman et al., 2008a, b; Nielsen and Odgaard, 2010; Soepboer et al., 2010; Sugita et al., 2010b) have shown that a new theory-based approach – the Landscape Reconstruction Algorithm (LRA) – is effective and applicable for vegetation reconstruction at various spatial scales.
Two sets of parameters are necessary for successful applications of the LRA approach: pollen productivity estimates (PPEs) of the taxa used for reconstruction and the relevant source area of pollen (RSAP) in the past (Sugita, 1994, 2007b, c). The RSAP is defined as the area, or distance, at and beyond which the correlation between pollen loading and distance-weighted vegetation abundance for all taxa no longer improves (Sugita, 1994). Theoretically, the RSAP is the smallest spatial unit of vegetation that can be detected using pollen assemblages from similarly sized sites in a given region. RSAP is found to vary with basin size/type and with the spatial pattern of vegetation patches in the landscape (Broström et al., 2005; Bunting et al., 2004; Hellman et al., 2009a, b; Nielsen and Sugita, 2005; Sugita, 1994).
The Extended R-value (ERV) models (Parsons and Prentice, 1981; Sugita, 1994) provide PPEs relative to a reference taxon, when vegetation abundances around similarly sized sites are properly distance-weighted to reduce biases caused by the inter-taxonomic differences in pollen dispersal and deposition, and patchy distribution of source plants (Sugita, 1994, 1998, 2007c; Sugita et al., 2010a). Over the last decade, the number of regions and pollen/palynological equivalent plant taxa for which relative PPEs have been estimated has increased in Europe (Broström et al., 2004; Bunting et al., 2005; Hjelle, 1998; Mazier et al., 2008; Räsänen et al., 2007; Soepboer et al., 2007; Sugita et al., 1999; Von Stedingk et al., 2008) and elsewhere (Calcote, 1995; Duffin and Bunting, 2008). However, there are still variations in the relative PPEs obtained for several taxa (Broström et al., 2008). Relative pollen productivity can potentially vary, depending on, e.g., climate (temperature, precipitation), soil, plant succession, anthropogenic impact and plant taxa included in a pollen type. In addition, field survey methods of vegetation abundance, sampling strategies for pollen data, selection of ERV submodels and parameterization for calculating distance-weighted plant abundance, also influence the outcomes (Broström et al., 2008; Bunting and Hjelle, 2010).
Another, potentially important, factor that has rarely been discussed is the variation in size and shape of the sampling basins from which pollen data are obtained. The ERV model-based analysis assumes that the sedimentary basins are circular and of similar size (Sugita, 1994). These assumptions are valid, or not vastly violated, for the previous studies that used pollen samples from moss polsters or surface sediments from small hollows (Broström et al., 2004; Bunting et al., 2005; Calcote, 1995; Hjelle, 1998; Mazier et al., 2008; Räsänen et al., 2007; Sugita et al., 1999; Von Stedingk et al., 2008). When pollen samples are taken from lakes, it is often difficult to select similarly sized sites in practice (Nielsen and Sugita, 2005; Soepboer et al., 2007). It is still unknown how to evaluate the extent to which the departure from those assumptions affects the reliability of the estimates of relative pollen productivity and the RSAP.
The major objectives of this study are: (1) to evaluate, and seek a sound strategy to minimize, the effects of the size-variability of pollen sampling basins on relative PPEs and the RSAP, and (2) by using a data set of surface pollen from lakes and surrounding vegetation in a semi-open landscape in Norway, to obtain RSAP of the present landscape and reliable relative PPEs of major pollen types to be used in quantitative reconstructions of past vegetation in other studies and regions.
What do we expect when lake size varies? Simulations
Simulation design
Landscapes with different vegetation communities are designed to simulate pollen loadings on selected lakes based on the Ring-Source model of pollen dispersal and deposition (Sugita et al., 1999). The spatial patterns of vegetation simulated are similar to those used in Sugita (2007a, b), including three patch types and the matrix of different plant communities. Each community/stand type has specific plant composition (Table 1), consisting of five taxa (Poaceae, Picea, Betula, Quercus and Fagus). Vegetation patches are circular and distributed randomly without overlap; each patch type has a specific mean size and occupies a specific proportion in area in the simulated vegetation plots of 60 km × 60 km. Vegetation beyond the plots out to 400 km is assumed homogeneous, and its plant composition is the same as the mean of the composition in the plots. There are no systematic gradients in the spatial patterns of vegetation in the landscape – one of the important conditions for the simulations (Sugita, 1994, 1998, 2007a, b, c).
Parameter setting (landscape design and species composition, pollen productivity relative to Poaceae and fall speed of pollen) used for simulations. The species distribution of constituent taxa is assumed to be homogeneous in each stand type
Simulations are carried out using POLLSCAPE (Sugita, 1994) with the following parameter values: u (wind speed) = 3 m/s, Cz (vertical diffusion coefficient) = 0.12 m^(1/8), Cy (horizontal diffusion coefficient) = 0.21 m^(1/8), n (empirical turbulence parameter) = 0.25 (dimensionless), and γ (empirical parameter defined as n/2) = 0.125 (dimensionless) (Prentice, 1985; Sugita et al., 1999). PPEs and fall speed of pollen for the five taxa are listed in Table 1. For every combination of simulation scenarios, thirty 60 km × 60 km vegetation plots are created; in each plot a circular lake of a given radius is located randomly in a central 10 km × 10 km square in the plot to allow at least 20 km buffer within the plot and avoid edge effects on pollen representation of vegetation. Total pollen count from each lake, a number necessary to run the ERV models (Parsons and Prentice, 1981; Sugita, 1994), is fixed at 1000 grains. Major assumptions of the Ring-Source model and POLLSCAPE simulations are summarized in Sugita (1994, 1998, 2007c) and Sugita et al. (1999).
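The dispersal parameters listed above combine into a single deposition coefficient. As a hedged illustration only, the sketch below uses the point-deposition form of Prentice (1985), in which the fraction of pollen still airborne at distance z is exp(−b·z^γ) with b = 4·v_g / (n·u·√π·Cz). This is a simplification of the Ring-Source model used in POLLSCAPE, which also integrates deposition over the lake surface. The function names and the fall speed of 0.035 m/s are assumptions made for the example, not values taken from Table 1.

```python
import math


def deposition_coefficient(fall_speed, wind_speed=3.0, cz=0.12, n=0.25):
    """b = 4*v_g / (n * u * sqrt(pi) * Cz); fall_speed and wind_speed in m/s."""
    return 4.0 * fall_speed / (n * wind_speed * math.sqrt(math.pi) * cz)


def fraction_airborne(distance_m, fall_speed, wind_speed=3.0, cz=0.12, n=0.25):
    """Fraction of released pollen still airborne at distance_m from its source."""
    gamma = n / 2.0  # 0.125 when n = 0.25
    b = deposition_coefficient(fall_speed, wind_speed, cz, n)
    return math.exp(-b * distance_m ** gamma)


def ring_weight(inner_m, outer_m, fall_speed):
    """Relative weight of homogeneous vegetation lying between inner_m and
    outer_m from a central point (point-deposition simplification)."""
    return fraction_airborne(inner_m, fall_speed) - fraction_airborne(outer_m, fall_speed)


v_g = 0.035  # m/s, hypothetical fall speed used only for illustration
print(round(deposition_coefficient(v_g), 4))
print(ring_weight(100.0, 1000.0, v_g))
```

With these parameter values b reduces to roughly 75.2·v_g/u, so heavier pollen (larger v_g) is deposited closer to its source.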
Three scenarios of lake-size distribution in addition to the uniform scenario (all lakes of similar size) are considered: normal, random and Weibull (Figure 1). Because all simulated lakes are circular in shape, we use lake radius, R, to represent the lake size. An appropriate random-number generator (Press et al., 1992) is used to obtain radii of 30 lakes for each scenario, except for the uniform scenario (R = 100 m at all sites). The mean and minimum R are set to 100 m and 25 m, respectively, to make comparison to the uniform scenario reasonable. The normal scenario uses a standard deviation of R of 30 m; the random scenario uses a random number generator assuming that R varies between 25 m and 175 m; the Weibull scenario uses the Weibull distribution function (Ruckdeschel, 1981) in such a way that the lake-size distribution is skewed to smaller sizes with a long tail (the parameters u and v of the function are set to 0.5 and 1.1, respectively). These three scenarios are intended to mirror potential distribution patterns of lake sizes in empirical studies.
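The four lake-size scenarios can be sketched with Python's standard random module. This is a minimal sketch under stated assumptions, not the generator of Press et al. (1992): the 25 m minimum is enforced here by redrawing, and the Weibull case uses a shape of 1.1 with a 25 m shift and a scale chosen so that the expected mean is about 100 m, which may not match the u and v parameterization of Ruckdeschel (1981). The function names are hypothetical.

```python
import math
import random


def _redraw_until(draw, minimum):
    """Redraw until the value reaches `minimum` (assumed handling of the floor)."""
    while True:
        r = draw()
        if r >= minimum:
            return r


def lake_radii(scenario, n_lakes=30, seed=0):
    """Return n_lakes radii (m) for one lake-size scenario."""
    rng = random.Random(seed)
    if scenario == "uniform":   # every lake has R = 100 m
        return [100.0] * n_lakes
    if scenario == "normal":    # mean 100 m, SD 30 m, floor 25 m
        return [_redraw_until(lambda: rng.gauss(100.0, 30.0), 25.0)
                for _ in range(n_lakes)]
    if scenario == "random":    # equally likely anywhere in 25-175 m
        return [rng.uniform(25.0, 175.0) for _ in range(n_lakes)]
    if scenario == "weibull":   # skewed to small lakes with a long tail
        shape = 1.1
        scale = 75.0 / math.gamma(1.0 + 1.0 / shape)  # mean of shifted draw = 100 m
        return [25.0 + rng.weibullvariate(scale, shape) for _ in range(n_lakes)]
    raise ValueError(f"unknown scenario: {scenario}")


for name in ("uniform", "normal", "random", "weibull"):
    radii = lake_radii(name)
    print(name, round(min(radii)), round(sum(radii) / len(radii)), round(max(radii)))
```

With only 30 lakes per draw, the sample mean will scatter around 100 m, which is one reason the simulations are repeated.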

Lake-size distribution used in the simulations. Lake radius (R) is used as the variable to represent the lake size. Three probability functions (i.e. normal, random and Weibull) are used to generate R with a mean of 100 m
For the normal, random and Weibull scenarios, a lake radius – an input parameter necessary for the ERV model-based analysis (Sugita, 1994) – is chosen in two different ways (Figure 2): the smallest radius among sites, Rmin, and the average radius, Rmean. When using Rmin (Figure 2, option A), plant abundance data at all sites are used in their entirety for data analysis. With the average radius, the vegetation abundance is calculated in two different ways (Figure 2, options B and C). At some sites plants grow within the average radius and inclusion or exclusion of the within-radius plant abundance may affect the ERV-analysis outcomes. One option is to calculate the within-radius plant abundance and use it as the total plant abundance at the lake shore, referred to as Rmean1. The other option uses plant abundance data only at and beyond the average radius at all sites, referred to as Rmean2.
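The difference between the Rmin, Rmean1 and Rmean2 options can be made concrete with a small sketch. It assumes that vegetation at a site is stored as plant abundance per concentric ring, keyed by the inner edge of each ring, and that rings on open water are simply absent. The 10 m ring width, the function name and the abundance values are hypothetical, and the real analysis applies distance weighting to the rings afterwards.

```python
RING_WIDTH = 10  # m; ring width assumed for this sketch


def vegetation_for_option(site_rings, option, r_min, r_mean, max_dist):
    """Return {inner_edge_m: abundance} from the standard radius to max_dist."""
    if option not in ("Rmin", "Rmean1", "Rmean2"):
        raise ValueError(f"unknown option: {option}")
    start = r_min if option == "Rmin" else r_mean
    # Rings without recorded plants (e.g. open water) count as zero abundance.
    rings = {edge: site_rings.get(edge, 0.0)
             for edge in range(start, max_dist, RING_WIDTH)}
    if option == "Rmean1":
        # Plants growing inside the mean radius are added to the first ring.
        rings[start] += sum(a for edge, a in site_rings.items() if edge < start)
    return rings


# Hypothetical small lake whose shore lies at 40 m from the centre.
small_lake = {40: 1.0, 60: 2.0, 120: 4.0, 130: 3.0}
for option in ("Rmin", "Rmean1", "Rmean2"):
    print(option, vegetation_for_option(small_lake, option,
                                        r_min=40, r_mean=120, max_dist=140))
```

For this small lake, Rmin keeps every ring, Rmean1 moves the inner abundance (1.0 + 2.0) onto the 120 m ring, and Rmean2 discards it.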

Diagrams showing the relationship between lake size, radius used in ERV analysis and vegetation data included in the analysis. The Rmin option (A) assumes that all lakes have a radius equivalent to that of the smallest lake; this means that vegetation abundance from Rmin to the lake shore is assumed nil at most sites, because no plants grow on lakes. The Rmean1 and Rmean2 options (B and C) use the mean radius of all the sites, calculated based on the mean lake area; the former assumes that within-radius vegetation abundance, when available, is used for calculating distance-weighted plant abundance at the radius, and the latter assumes that within-radius vegetation abundance, when available, is ignored. The Rmean2ring option (D) is used for data analysis using the pollen-vegetation training set from western Norway. Because lake shape is not circular at all sites, we use the mean of the radius obtained when the first concentric ring hits vegetation (see text) for this option. For calculating the distance-weighted plant abundance, we use only the vegetation data beyond the radius, Rmean2ring, in the same way as plant abundance in Rmean2 is calculated
We use ERV submodel 3 (Sugita, 1994, 2007b) for data analysis. The RSAP is estimated based on the log-likelihood at every 1 m out to 3000 m from lake centers, using a moving-window regression method (Sugita, 2007b). The results are compared to the distance giving the maximum log-likelihood in the entire sequence. We repeat the entire set of simulations ten times using all combinations of scenarios, and then test the results statistically using the permutation test for paired samples (Siegel and Castellan, 1988).
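For orientation, the forward relationship behind ERV submodel 3 can be sketched as follows: the pollen loading of taxon i is modelled as α_i·x_i + ω_i, where x_i is the distance-weighted absolute plant abundance, α_i the pollen productivity and ω_i the background loading, and pollen proportions are the loadings divided by their sum. The sketch also scores observed counts with a multinomial log-likelihood up to an additive constant, which is an assumption about the likelihood form rather than a description of the ERV software. All names and numbers are hypothetical.

```python
import math


def expected_pollen_proportions(dwpa, alpha, omega):
    """Submodel-3-style forward model: loading_i = alpha_i * x_i + omega_i."""
    loading = {t: alpha[t] * dwpa[t] + omega[t] for t in dwpa}
    total = sum(loading.values())
    return {t: v / total for t, v in loading.items()}


def log_likelihood(counts, proportions):
    """Multinomial log-likelihood of pollen counts, dropping the constant term."""
    return sum(n * math.log(proportions[t]) for t, n in counts.items() if n > 0)


# Hypothetical two-taxon site; values are illustrative only.
dwpa = {"Poaceae": 0.6, "Picea": 0.4}
omega = {"Poaceae": 0.0, "Picea": 0.0}
counts = {"Poaceae": 429, "Picea": 571}  # 1000 grains in total

for picea_alpha in (1.0, 2.0):
    alpha = {"Poaceae": 1.0, "Picea": picea_alpha}  # Poaceae as reference taxon
    p = expected_pollen_proportions(dwpa, alpha, omega)
    print(picea_alpha, round(log_likelihood(counts, p), 1))
```

In this toy case the counts are better explained when Picea is given twice the productivity of Poaceae, which is the kind of comparison the likelihood-based fitting formalizes across many sites and distances.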
Simulation results
When lake radius is the same at all sites (the uniform scenario), log-likelihood curves of ERV submodel 3 always show the expected pattern of approaching an asymptote (Figure 3). For the normal and random scenarios the same patterns of the log-likelihood occur when Rmin is used, whereas multiple optima often occur when Rmean1 and Rmean2 are used. When the Weibull scenario is used, the multiple optima always occur regardless of the lake-radius options. The log-likelihood is lower when multiple optima occur.

Examples of simulated log-likelihood curves (1) when pollen data are from lakes with the same radius (R = 100 m; an ideal condition for the ERV model-based method) and (2) when pollen data are from lakes with different lake radii, causing multiple optima of the log-likelihood
It is often difficult to determine the RSAP using the moving-window regression when multiple optima of the log-likelihood occur; instead we use the distance that provides the maximum log-likelihood in the entire sequence as the RSAP estimate (Table 2A). We consider the RSAP estimate from the uniform scenario the most reliable and as expected in the simulated landscapes when the lake radius is 100 m. The permutation test for paired samples (Siegel and Castellan, 1988) shows that the RSAP estimates are not significantly different from the expected value when all three radius options for the normal scenario and Rmin for the random scenario are used (Table 2B). The RSAP estimates for other combinations of scenarios and options are significantly larger than the expected.
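A permutation test for paired samples can be sketched as an exact sign-flip test on the paired differences: under the null hypothesis each difference is equally likely to be positive or negative, so the observed sum of differences is compared with the sums obtained under every possible assignment of signs. The sketch below is a generic two-sided version and is not claimed to reproduce the exact procedure or sidedness used for Table 2B; the RSAP values are hypothetical.

```python
from itertools import product


def paired_permutation_test(a, b):
    """Exact two-sided sign-flip permutation test on paired differences."""
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        permuted = abs(sum(s * d for s, d in zip(signs, diffs)))
        if permuted >= observed - 1e-12:  # tolerance for floating-point sums
            hits += 1
    return hits / total


# Hypothetical RSAP estimates (m) from ten paired simulation runs.
control = [1500, 1620, 1480, 1550, 1600, 1530, 1570, 1490, 1610, 1540]
option = [1700, 1650, 1900, 1580, 2100, 1520, 1800, 1750, 1640, 1990]
print(paired_permutation_test(option, control))
```

With ten pairs there are 2^10 = 1024 sign assignments, so the exact test is inexpensive to enumerate.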
Estimated relevant source area of pollen (RSAP) and corresponding log-likelihood values. Ten simulated landscapes are used. (A) shows the RSAP means and standard errors (SEs) in different lake-size distributions and lake-size selection. (B) presents the permutation test results (p-values) for whether the RSAP differs between the control and a given lake-size selection (p < 0.05)
The log-likelihood that corresponds to the RSAP is highest when the uniform scenario is considered and the Rmin option always provides the highest log-likelihood in each of the three lake-size distribution scenarios (Table 2C).
As expected, pollen productivity estimates (PPEs) relative to Poaceae are reliable (as seen by comparing the resulting PPEs with the input PPEs) for all taxa when the uniform scenario is considered (Figure 4). When Rmin is used, PPEs are often significantly higher than expected regardless of the lake-size distribution, considering the 99% confidence intervals (CI). Rmean1 and Rmean2 tend to provide similar PPEs in all lake-size distribution scenarios. In the normal scenario, none of the PPEs is significantly different from the expected considering the 99% CI. The random scenario provides a PPE of Betula significantly higher than the expected; the Weibull scenario provides PPEs of Quercus and Fagus significantly lower than the expected.

Simulated pollen productivity estimates (PPEs) and their standard errors using four options of lake-size distributions. PPEs for Picea, Betula, Quercus and Fagus relative to Poaceae are calculated using ERV submodel 3. The expected values of the PPEs are the input PPE used for the simulations. When the lake size is uniform, PPEs for all taxa are reliable as expected
Simulation summary and implications for empirical studies
The simulation results show that when the lake-radius distribution is normal, the estimates are robust and similar to those when all lake radii are 100 m. The two Rmean options give similar results, and although they provide lower log-likelihood than the Rmin option, they appear to provide more reliable relative PPEs.
The Rmin option appears to provide an RSAP estimate that is more similar to the expected than the Rmean options. However, the differences between the options are statistically not significant. In summary these results suggest that, even though the log-likelihood is expected to be lower than the ideal case, we are able to use the estimates of the RSAP and relative pollen productivity with confidence in the normal scenario.
These results provide insights into analytical strategies that are appropriate for empirical studies to estimate relative pollen productivity and the RSAP using sampling sites of varying sizes. The lake radii in western Norway vary but follow a normal distribution (Table 3). It is therefore reasonable to apply the above strategies to the empirical data set, even though most of the lakes studied are not circular.
Information on the 34 investigated sites in western Norway with geographical position, lake size (radius R) and sampling year. The Kolmogorov-Smirnov one-sample test shows that the lake-size distribution is not significantly different from the normal: KS = 0.056 (p = 0.999) using R calculated from the lake area and KS = 0.150 (p = 0.396) using R based on the PolFlow program
Estimates of relative pollen productivity and relevant source area of pollen for small lakes in western Norway
Study area
The study area (Figure 5; Table 3) is within the boreo-nemoral and southern boreal vegetation zones (Moen, 1999), where vegetation varies with climate, topography, soil and human impact. Common taxa included in this study are Pinus sylvestris, Betula pubescens, B. pendula, Alnus glutinosa, A. incana, Corylus avellana and Quercus robur, as well as Picea abies and Fagus sylvatica which are mainly planted or spread from plantations. The open vegetation types include several communities (e.g. meadows, pastures, heathland, bogs, lake fringes and road sides). In this study, Cyperaceae, Poaceae, Rumex acetosa and Calluna vulgaris are used to characterize the open and semi-open vegetation types. Juniperus communis and Salix are also found in these communities.

Location of the investigated sites in western Norway between 59°31′ and 60°42′N, and 5°4′ and 6°37′E. The codes used for the sites and further information are given in Table 3
Surface pollen samples are collected from 34 lakes that are located along a gradient from the coast to the inner fjord region and from 3 m to 936 m a.s.l. All sites are below the treeline. Lake size varies from 0.4 to 19 ha (mean=5.7 ha, sd=4.3 ha), with lake radii from 37 to 247 m. The size distribution of lake radii is not significantly different from normal (the Kolmogorov-Smirnov one-sample test: p=0.999) (Table 3).
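The Kolmogorov-Smirnov statistic behind such a normality check is the largest vertical gap between the empirical distribution of the radii and a fitted normal distribution. The sketch below computes only that statistic, using the sample mean and standard deviation as the reference parameters, which is an assumption since the text does not state how the reference distribution was specified. The p-value calculation is omitted, and the radii listed are hypothetical rather than those of Table 3.

```python
import math
from statistics import mean, stdev


def ks_statistic_normal(values):
    """One-sample KS statistic against a normal fitted with sample mean and SD."""
    m, s = mean(values), stdev(values)
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = 0.5 * (1.0 + math.erf((x - m) / (s * math.sqrt(2.0))))
        # Compare the model CDF with the empirical step just above and below x.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


# Hypothetical lake radii (m), for illustration only.
radii = [37, 62, 80, 95, 104, 110, 118, 121, 127, 135, 142, 150, 163, 181, 247]
print(round(ks_statistic_normal(radii), 3))
```

A small statistic indicates that the empirical and fitted distributions stay close over the whole range of radii.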
Methods
Vegetation and pollen data
Digital land resource maps, DMK (Digitalt Markslagskart) made available through Geovekst (http://www.statkart.no/nor/Land/Fagomrader/Geovekst), are used to identify main vegetation/land-cover types. At each site, maps of 3 m × 3 m resolution are made from the centre of the lake out to 2000 m. A field survey crew visited most vegetation types surrounding each lake, to check vegetation boundary positions and to produce site-specific vegetation data. At three random points for each community, percentage cover was estimated for trees in the woodlands and for herbaceous taxa, dwarf-shrubs, shrubs and trees in open and semi-open communities. The mean cover for each vegetation type was then calculated. Species composition in vegetation types which were not surveyed in the field is estimated, based on the cover from comparable communities. Percentage cover of individual taxa in concentric rings with 10 m width from the centre of each lake out to 2000 m – input data necessary for the ERV model-based analysis – is calculated using PolSack, PolFlow and PolLog within the HUMPOL package (Bunting and Middleton, 2005). In PolFlow, vegetation measurement starts from the first concentric ring in which plant abundance data are available.
Surface sediment samples are collected using a Hongve type gravity corer (Boyle, 1995; Wright, 1991) with a diameter of 8 cm. Top sediments are taken from 0–0.5 cm and 0.5–1 cm. The top 0.5 cm is analysed from all lakes, except for two sites (LEK and SNE; Table 3) where the top two samples (0–1 cm) are used because of low pollen concentration. A minimum of 1000 pollen grains of terrestrial taxa are counted for each sample except for three sites (LEK, SKO, BJØ) with pollen sums of 500, 414 and 427, respectively, because of low pollen concentration.
Thirteen plant taxa commonly found in both the modern pollen assemblages and the vegetation survey data are selected for data analysis. These contribute between 84 and 99% of the pollen sum from the 34 lakes (Figure 6). With the exception of Fagus, present in ten of the 34 lakes, all taxa are present in (nearly) all lakes, with pollen percentages varying between sites.

Pollen samples from the 34 investigated lakes in western Norway ordered according to the sample scores of the first PCA axis. Black columns show percentages, grey shadow the percentages × 10
Data analysis
Detrended Correspondence Analysis (DCA) and Principal Component Analysis (PCA) in the program CANOCO (Ter Braak and Smilauer, 2002) are used to show the distribution gradients in the vegetation and pollen data sets. This is done to identify potential outliers and to evaluate the gradient length and spread of species and sites in each data set, as well as that in the simulation results.
Estimates of relative pollen productivity (PPEs) and the relevant source area of pollen (RSAP) are obtained using ERV Analysis v1.2.4 (Sugita et al., 2010a). Fall speeds of pollen are given in Table 4. Poaceae is used as the reference taxon in all analyses. We use the Ring-Source model of pollen dispersal and deposition, ERV submodel 3 and a wind speed of 3 m/s to calculate PPEs (Sugita, 1994; Sugita et al., 1999, 2010b), and a moving window regression method to estimate preliminary RSAP values with a window width of 100 m (Sugita, 2007b). However, since the log-likelihood did not approach an asymptote but decreased after having reached a maximum, the moving window regression method was found to be less effective than in the simulations, and the distance that provided the highest log-likelihood is referred to as the estimate of the RSAP (see the simulation section). For the final estimates of relative pollen productivity for individual taxa, the mean of the PPEs obtained at the points providing the five highest log-likelihoods is used (i.e. within a 70–100 m window around the RSAP).
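The selection rule described above can be written down in a few lines: the RSAP is taken as the distance with the highest log-likelihood, and the final PPE of a taxon is the mean of the PPEs at the five distances with the highest log-likelihoods. The sketch assumes that the ERV analysis has already produced one log-likelihood and one PPE per analysed distance; the function name and the numbers are hypothetical.

```python
def rsap_and_mean_ppe(distances, loglik, ppe, n_best=5):
    """Return (RSAP, mean PPE over the n_best highest-log-likelihood distances).

    distances, loglik and ppe are equal-length sequences, one entry per
    analysed distance.
    """
    # Indices ordered from highest to lowest log-likelihood.
    order = sorted(range(len(distances)), key=lambda i: loglik[i], reverse=True)
    rsap = distances[order[0]]
    best = order[:n_best]
    mean_ppe = sum(ppe[i] for i in best) / len(best)
    return rsap, mean_ppe


# Hypothetical output of an ERV run for one taxon.
distances = [1000, 1010, 1020, 1030, 1040, 1050, 1060]
loglik = [-10.0, -8.0, -6.0, -5.0, -6.5, -7.0, -9.0]
ppe = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(rsap_and_mean_ppe(distances, loglik, ppe))  # (1030, 4.0)
```

Averaging over several near-optimal distances damps the effect of a single noisy optimum when the log-likelihood curve peaks and then declines instead of levelling off.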
Fall speed of pollen for each taxon used in the analysis (from Gaillard et al., 2008) and pollen productivity estimates (PPEs) relative to Poaceae from western Norway obtained in the present study using ERV submodel 3
The ERV model-based analysis assumes that sampling sites are circular with the same radius. When the lake shape differs from a circle, the frequently used method (Hellman et al., 2008a,b; Nielsen and Sugita, 2005; Soepboer et al., 2007, 2010) is to calculate a radius of a lake based on the assumption that the lake is circular; once the total area is known the radius is obtained. In the PolFlow program (Bunting and Middleton, 2005) vegetation data are collated in concentric rings from a sample point centrally located in the lake. The distance at which the rings hit the vegetation, which corresponds to the largest circle that can fit into an irregularly shaped lake, is in the following referred to as the ring radius, Rring. In our data set Rring varies from 20 to 130 m, with a mean of 70 m. The size distribution of these radii is not significantly different from normal (the Kolmogorov-Smirnov one-sample test: p=0.396) (Table 3).
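The area-based radius follows directly from the area of a circle, R = sqrt(A/π). The short sketch below applies this to the smallest and largest lake areas reported for the data set (0.43 and 19.14 ha); the function name is hypothetical.

```python
import math


def equivalent_radius_m(area_ha):
    """Radius (m) of a circle with the same area as a lake of area_ha hectares."""
    area_m2 = area_ha * 10_000.0  # 1 ha = 10,000 m^2
    return math.sqrt(area_m2 / math.pi)


print(round(equivalent_radius_m(0.43)))   # 37
print(round(equivalent_radius_m(19.14)))  # 247
```

These values agree with the reported range of lake radii, 37 to 247 m. The ring radius Rring is necessarily no larger than this equivalent radius for a non-circular lake sampled at its centre, because the largest circle that fits inside the lake cannot exceed the lake's area.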
For data analysis four options for the lake radius are used (Figure 2): Rmin, Rmean1 and Rmean2 defined as in the simulation section, and Rmean2ring, as the average radius of the lakes when the ring radii are used and ignoring the within-radius vegetation.
Results
Main gradients in the vegetation and pollen data
The vegetation data set has a gradient length of 2.26 SD along the first DCA axis which reflects a gradient from Calluna and Juniperus on the negative side to Alnus and Picea on the positive side (Figure 7a). The second axis reflects a gradient from Fagus and Quercus to Betula and Pinus. A few sites are grouped and separated from the others along the first axis; the coastal sites EIK, SKO and LIS to the left and two inland sites from an area of Picea forests, OPE and SKU, to the right. MYR is separated from the remaining sites along the second axis.

The DCA results using the pollen data show a gradient length of 0.74 SD along the first axis, indicating a narrow range in the pollen assemblages among sites. PCA is then applied for analyzing further the relationships between pollen taxa and sites (Ter Braak and Prentice, 1988). The first PCA axis explains 59% of the total variance in the data and shows a gradient from Pinus dominance on the negative side to dominance of deciduous trees (Alnus, Corylus, Betula, Quercus) on the positive side (Figure 7b). The second axis explains 12.7% of the total variance and separates Poaceae, Rumex, Calluna and Fagus from Picea.
There is no clear region-specific grouping of the sites in the pollen data; thus we assume that there are no geographic gradients in the pollen records – an important assumption for the ERV model-based analysis (Sugita, 1994). The distance between BJØ and LEK, which are found at the opposite ends of the first PCA axis (Figure 7b), is c. 2500 m. Sites from the southern part of the investigated region (e.g. TIN, BJØ, LIS, and LIN) and the two most coastal sites (SKO and EIK) are spread along the first axis.
Short gradients are also found in the simulated data. DCA on the five taxa used in the simulations shows gradient lengths between 0.2 and 0.5 SD both for the simulated pollen assemblages and the vegetation data.
Relevant source area of pollen (RSAP) in western Norway
The log-likelihood increases out to around 1000–1100 m, then starts declining (Figure 8). The Rmin option gives the highest log-likelihood; Rmean1 gives a higher log-likelihood than Rmean2; the Rmean2ring option gives a log-likelihood intermediate between those of Rmin and Rmean1.

Log-likelihood obtained from ERV submodel 3 using (a) 34 lakes and radius 40 m, (b) 34 lakes and four different lake radius options. Note different scales on the vertical axes. RSAP in each case (the distance having the highest log-likelihood score) is indicated by an arrow
Using the highest log-likelihood, the Rmin option gives the RSAP estimate of 1040 m. With the Rmean1 and Rmean2 options the RSAP estimate is 920 and 1020 m, respectively, whereas Rmean2ring gives an RSAP of 1070 m (Figure 8b). Simulations show that, if the size distribution follows a normal distribution, the RSAP estimates can be similar when using the Rmin and Rmean options. It is therefore reasonable to conclude that the RSAP in the study region is between 900 and 1100 m when using lakes with mean radius 125 m.
Relative pollen productivity estimates (PPEs) and lake radius
The simulation results imply that relative PPEs are more reliable when the Rmean1 and Rmean2 options are used than when the Rmin option is used (Figure 4). In our empirical data, too, the two Rmean options produce largely the same results (Figure 9), and the PPE results using the Rmean2 option (Table 4) show: (1) Pinus and Alnus have high pollen productivities relative to Poaceae, (2) Cyperaceae, Quercus and Picea have slightly higher PPEs than Poaceae, and (3) Calluna, Fagus, Juniperus, Salix and Rumex have lower pollen productivity than Poaceae. The relative PPEs for two common taxa in the Norwegian vegetation, Corylus and Betula, are not significantly different from zero considering the standard error estimates, and are thus not reliable. When the Rmin option is used, taxa such as Alnus, Calluna, Fagus, Picea, Pinus and Quercus show higher PPEs than those using the Rmean1 and Rmean2 options, which is consistent with what the simulations show.

Alpha estimates and their standard errors using 34 lakes from western Norway and different lake radius options. As Rmin the radius 40 m (ring 40–50 m) is used, Rmean2ring is 70 m (ring 70–80 m), and the radius 120 m (ring 120–130 m) represents Rmean1 and Rmean2. In Rmean1 the vegetation of the inner circles is added to the vegetation data of the mean ring, whereas the within-radius vegetation abundance is ignored using Rmean2
The results using the Rmean2ring option show that relative PPEs for nearly all taxa are more comparable with those using the Rmin option than to those using the Rmean1 and Rmean2 options (Figure 9). For some taxa, Juniperus, Salix, Rumex and Corylus, the differences between options are not significant considering the error estimates.
Discussion
Implications for a strategy for estimating relative pollen productivity and the RSAP
As with any statistical and mechanistic model, the ERV model-based method is based on several key assumptions, including: (1) the locations of sampling sites are randomly selected in a given vegetation region where the spatial structure of vegetation is invariant, and (2) the size and shape of sampling sites are similar and circular (Sugita, 1994). As long as these assumptions and conditions are justifiable, we expect that results are reliable. Palynologists often find it difficult to have proper sampling designs, however. In the present study the heathland area and the mountains above the treeline were excluded to reduce the possible impact of Calluna dominance by the coast (Nielsen, 2004) and long-distance pollen transport in the open mountain vegetation. In the selection of sites, the lake size, shape of the lake and distance between lakes were considered. The size range of the lakes (0.43–19.14 ha) is within the same order of magnitude as the lakes in the studies in Denmark (3.5–33 ha (Nielsen, 2004)) and Switzerland (0.93–38.26 ha (Soepboer et al., 2007)). The conclusions from the present study are therefore relevant also for other lake-based studies. One of the major findings from the simulations in this paper is that, as long as the size distribution of lakes (i.e. radius) follows a normal distribution, PPEs and the RSAP can be reliable. We did not look into the effects of different site shapes on the outcomes. Modelling and empirical studies using the HUMPOL package (Bunting and Middleton, 2005) and other GIS-related methods will be useful on this subject.
Lake size is one of the most critical factors affecting the RSAP estimates (Bunting et al., 2004; Hellman et al., 2009a, b; Sugita, 1994). The RSAP estimates in western Norway are similar to each other: 1040 m, 1070 m, 920 m and 1020 m when the four options for the radius (Rmin, Rmin2ring, Rmean1 and Rmean2, respectively) are used. Even though using the Rmin option may seem unrealistic, with this option the plant abundance data at all sites are used in their entirety for the data analysis. The Rmean1 option gives the smallest RSAP in the present study, which may be connected to the way the vegetation abundance data are calculated. The Rmean1 and Rmean2 options calculate the distance-weighted plant abundance at individual sites in specific ways that differ from the case where all sites have the same size (Sugita, 1994, 1998). Nielsen and Sugita (2005) and Soepboer et al. (2007) estimated the RSAP in their respective regions in the same way as the Rmean2 option. We recommend that Rmin, Rmean1 and Rmean2 be used in future studies of the RSAP and relative PPEs, and that the results be compared and evaluated carefully.
When using the mean radius, which provided the most reliable results for estimating relative pollen productivity, we had two options: to include the vegetation of the inner circles (Rmean1) or to exclude it (Rmean2) (Figure 2). The mean radius of the lakes used is 120 m; thus, for the smallest lake (LIS, Table 3), for example, the plant abundance data between 20 m and 120 m are not used. Source plants close to study sites are expected to affect pollen loading on lakes and bogs more than those growing far away. However, the differences in the plant abundance data between Rmean1 and Rmean2 do not influence the estimates of relative pollen productivity significantly (Figure 9). Our results imply that a detailed survey of plant abundance in the immediate surroundings of the lakes would not be critical for estimating relative pollen productivity from lake sediment samples.
When calculating relative PPEs, the mean of the values obtained from the RSAP to the maximum vegetation surveyed distance has commonly been used (Broström et al., 2004; Mazier et al., 2008; Soepboer et al., 2007). In the present study, where we identify the RSAP between 900 and 1100 m and observe a decrease in the log-likelihood after the optimum, the mean of the values obtained at the five points providing the highest log-likelihood was used to estimate the relative PPEs instead. These estimates were compared with values obtained using the values from the RSAP to the maximum vegetation surveyed distance, using the RSAP and the two or four following points, and using the ten points providing the highest log-likelihood. The results (not shown) are mostly consistent between the methods, although some estimates differ. There is no objective rule for which method to prefer when the log-likelihood curves do not follow the expected pattern of reaching an asymptote. Selecting the relative pollen productivity estimates at the distances where the five highest log-likelihoods occur and calculating their means is a robust approach in theory and practice.
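The selection rule described above can be sketched as follows. This is a minimal illustration, not the software used in the study: the function name `summarise_ppe` and all numerical values are hypothetical, and the sketch assumes that an ERV run has already produced, for each vegetation survey distance, a log-likelihood and a PPE estimate for one taxon.

```python
def summarise_ppe(distances_m, log_likelihoods, ppes, n_best=5):
    """Return (rsap_m, mean_ppe) for one taxon.

    rsap_m   : distance with the maximum log-likelihood
    mean_ppe : mean PPE over the n_best distances with the highest log-likelihood
    """
    if not (len(distances_m) == len(log_likelihoods) == len(ppes)):
        raise ValueError("all inputs must have the same length")
    if not 1 <= n_best <= len(distances_m):
        raise ValueError("n_best must be between 1 and the number of distances")

    # Rank the distance indices by log-likelihood, highest first.
    ranked = sorted(range(len(distances_m)),
                    key=lambda i: log_likelihoods[i],
                    reverse=True)
    best = ranked[:n_best]

    rsap_m = distances_m[ranked[0]]
    mean_ppe = sum(ppes[i] for i in best) / n_best
    return rsap_m, mean_ppe


# Hypothetical values for illustration only (not data from this study).
distances = [700, 800, 900, 1000, 1100, 1200, 1300]
loglik = [-520.0, -512.0, -505.0, -503.0, -504.0, -509.0, -515.0]
ppe_taxon = [4.0, 5.0, 5.5, 6.0, 6.5, 7.0, 7.5]

rsap, ppe = summarise_ppe(distances, loglik, ppe_taxon)
print(rsap, ppe)
```

Setting `n_best=10` corresponds to the alternative of averaging over the ten points with the highest log-likelihood, so the sensitivity of an estimate to this choice can be checked directly.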
Differences in methods for collecting pollen and vegetation data can potentially affect relative PPEs and the RSAP (Broström et al., 2008; Bunting and Hjelle, 2010). In addition, the spatial characteristics of vegetation and land cover in a study region are major factors affecting the RSAP. The spatial structures of the hypothetical vegetation landscapes used for simulations are stationary even though the vegetation structure is patchy (Broström et al., 2004, 2005; Bunting et al., 2004; Hellman et al., 2009a, b; Sugita, 1994, 1998, 2007a, b, c; Sugita et al., 1999). Thus, DCA-based gradient lengths for simulated pollen assemblages and vegetation abundance among sites in our study tend to be shorter than those empirically obtained from western Norway. It is still unclear how the differences in the gradient lengths influence the reliability of relative PPEs and the RSAP. Excluding sites on the edges of the pollen and vegetation gradients in this study changes the relative PPEs for taxa such as Alnus, Calluna, Fagus and Picea (results not shown). As a rule of thumb, pollen sampling sites need to be selected within a given region where the spatial patterns and structure of vegetation and land cover are stationary (Sugita, 1994, 2007c); however, the size and location of sampling basins, the spatial patchiness and heterogeneity of plant distribution, and the spatial gradients in plant community distribution in the region all interact with each other when the ERV model-based method is used to estimate relative pollen productivity and the RSAP (Bunting et al., 2004; Hellman et al., 2009a; Nielsen and Sugita, 2005; Sugita, 1994, 1998).
The topography and vegetation cover in the region used for this study are patchy and heterogeneous. Pollen assemblages from herbaceous communities in western Norway have a clear west–east gradient (Hjelle, 1998, 1999), reflecting different species growing in the cultural landscapes at the coast and in the fjord region. However, the PCA results using pollen assemblages including both tree and herbaceous pollen (Figure 7b) show a complex mixture of patterns that cannot be explained simply by the geographical locations of the sites. Thus it is reasonable to assume that the pollen assemblages do not have systematic geographic biases and are appropriate for the ERV model-based analysis.
Relative PPEs from Norway
Among the tree taxa included in this study, the relative PPEs of two common taxa in Norwegian vegetation, Betula and Corylus, are problematic. The same low values were obtained in all the lake radius/vegetation calculation options (Figure 9). Sugita et al. (2010a) also found it difficult to estimate the pollen productivity of Betula in northern Finland using pollen trap and tree volume data in absolute units. There are several factors influencing the representation of Betula in the pollen assemblages. Two species of Betula and their varieties (B. pendula, B. pubescens with subspecies) grow in the region with different growth forms and heights; thus the amounts of pollen produced per unit area can potentially differ depending on the species growing locally. The climatic gradient from west to east in the region may influence pollen production (Autio and Hicks, 2004); year-to-year variation of pollen production (Ramfjord and Brobakk, 2005) and the time periods that the top 0.5–1.0 cm of sediments represent are other factors that would matter. The pollen samples may in some of the lakes cover only one or two years of pollen accumulation (Appleby and Piliposyan, 2009); the sampling was carried out over four years (see Table 3), thus annual variation in flowering could obscure the pollen representation of vegetation (Andersen, 1974; Hicks, 1985). The diverse topography, with steep hillsides in some areas, as well as small inlets into some of the sites (Bonny, 1976) are also factors that may have an effect on the pollen deposition. Another point, which may be more important for Betula and Corylus than for the other taxa, is the vegetation survey method, in which only species present in the upper canopy were included in the woodlands. Both Betula and Corylus often grow as shrubs in the forests and may thus be underestimated in the vegetation data. Although most vegetation communities were surveyed, some communities were not, which may have had an effect on a common taxon such as Betula.
The possible implications of the field methods on the relative PPEs should be investigated in future studies.
The relative PPEs for the other taxa are mostly comparable with estimates previously reported in northern Europe (Broström et al., 2004, 2008; Bunting et al., 2005; Hjelle, 1998; Nielsen, 2004; Räsänen et al., 2007; Sugita et al., 1999; Von Stedingk et al., 2008). The estimate for Pinus (5.73) is similar to that in southern Sweden (5.7) and comparable with those in Estonia (6.8) and Finland (8.4), whereas a higher estimate from central Sweden (21.58) is based on few samples, and a lower estimate from Denmark (1.41) is based on historical maps; Alnus (3.22) is slightly lower than that in southern Sweden (4.2), and both are lower than that in England (11.4); the estimate for Picea (1.2) is comparable with that in southern Sweden (1.8) and lower than other estimates from northern Europe (2.78–4.8); the PPE for Fagus (0.8) is lower than those in southern Sweden (6.7) and Denmark (3.6), and that for Quercus (1.3) is also lower than other estimates from northern Europe (7.6 in southern Sweden). The fact that the values of the relative PPEs change when some sites are deleted from the analysis, e.g. for Fagus, may indicate that the low PPE for Fagus in Norway reflects the rareness of that species in the pollen assemblages and vegetation. Quercus also has low representation within the study area. Another possibility is that the low PPE for Fagus reflects its northern distribution limit in the study area. Soepboer et al. (2007), however, reported relative PPEs for Fagus (0.76), Quercus (2.56) and Picea (0.57) that are comparable with those in western Norway.
The PPE for Calluna (0.87) relative to Poaceae is intermediate between those obtained in central Sweden (0.3) and in western Norway using moss polsters (1.07). Salix (0.62) is higher than that in central Sweden (0.09) but lower than those in southern Sweden (1.3) and England (1.37). Juniperus (0.79) is intermediate between those in central (0.11) and southern Sweden (2.1). Rumex (0.39) is similar to that in Denmark (0.35) based on historical maps. The relative PPE of Cyperaceae (1.37) is higher than that previously obtained in western Norway using moss polsters (0.29) but is more comparable with those obtained in central (0.89) and southern Sweden (1.0).
The different PPEs that have been published are based on different sample types (moss polsters, lake sediments) representing different basin types and field methods (Broström et al., 2008). For reconstruction purposes it is important to evaluate the methodology behind the relative PPEs and use appropriate estimates (Bunting and Hjelle, 2010). The similarity in relative PPEs for some taxa obtained in the present study based on lake sediments and the study from southern Sweden based on moss polsters (Sugita et al., 1999) is, however, encouraging for identifying main groups of relative pollen productivity for individual taxa with wide application possibilities for vegetation reconstructions.
Conclusions
Departures from the assumptions of the ERV model-based analysis, which may often occur in empirical studies, can lead to unexpected patterns of log-likelihood and unreliable estimates of relative PPEs and the RSAP. We have used simulations to evaluate appropriate research strategies and applied these to an empirical lake study from western Norway.
These include:
check that the lake sizes (radius) follow a normal distribution
use the minimum and mean lake radius (calculated from the lake area) to estimate the RSAP
use the mean lake radius to calculate the PPEs, compare the results when the within-radius plant abundance is included and excluded, and compare these with the results using the minimum lake radius
when the log-likelihood curve differs from the expected pattern (i.e. increasing and approaching an asymptote), the distance that provides the maximum log-likelihood can be used as the RSAP and the PPEs estimated as the means at the distances where the 5–10 highest log-likelihoods occur
Applying these strategies to the case study from western Norway gives results comparable with the simulations and shows that, even though the lake size varies, the relative PPEs of ten major taxa in the region are mostly within the range of estimates previously obtained in Europe.
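The first two strategies can be illustrated with a short sketch. It is not the procedure used in this study: the lake areas are hypothetical, the helper names are our own, and sample skewness is used only as a rough screening statistic. A skewness near zero is necessary but not sufficient for normality; a symmetric but non-normal distribution (e.g. sizes spread evenly around the mean) would also pass, so a formal normality test is still needed.

```python
import math


def equivalent_radius_m(area_ha):
    """Radius (m) of a circle with the same area as the lake (1 ha = 10 000 m^2)."""
    return math.sqrt(area_ha * 10_000 / math.pi)


def skewness(values):
    """Sample skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical lake areas in hectares (not the lakes of this study).
areas_ha = [0.5, 2.0, 3.5, 5.0, 6.5, 9.0, 12.0]

radii = [equivalent_radius_m(a) for a in areas_ha]
r_min = min(radii)                # radius for the minimum-radius option
r_mean = sum(radii) / len(radii)  # radius for the mean-radius options

print(f"minimum radius: {r_min:.1f} m")
print(f"mean radius:    {r_mean:.1f} m")
print(f"skewness of radii: {skewness(radii):.2f}")
```

Note that the mean is taken over the radii, not over the areas, because the simulations concern the distribution of lake radius.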
Acknowledgements
We are grateful to Lene S. Halvorsen, Jorunn Larsen and Ingvild K. Mehl for field assistance, to Linn C. Krüger and Jan Berge for processing pollen samples, to Gidske L. Andersen and Henrik Espedal for providing vegetation/land-cover maps, and to Beate Helle for making some of the final figures. Thanks also to all the members of the POLLANDCAL network for inspiring discussions on PPEs and the RSAP during the numerous workshops in the network between 2001 and 2005 and in the following years. The referees, M. Jane Bunting and Anne Birgitte Nielsen, have given valuable comments on the manuscript.
Financial support has been provided by the Norwegian Research Council (Small research grants), the Olaf Grolle Olsen foundation and Bergen University Museum research grants. Sugita was supported by funding from the Estonian Science Foundation Mobilitas Programme (MTT3) and Ministry of Education (SF 0280016 S07).
