Abstract
Grassland habitats provide many ecosystem services but are threatened by agricultural intensification and urbanization. While the lack of accurate and comprehensive inventories at the national scale makes them difficult to manage, advances in spatial modeling using open remote sensing data and open-source software, as well as the increasing use of ecological archives, provide new perspectives for mapping natural habitats. In this context, this study evaluated the contribution of spectral and environmental variables to discriminate and then map grassland habitats throughout France. To this end, 92 spectral variables derived from moderate-resolution imaging spectroradiometer data, 19 bioclimatic variables derived from WorldClim data, 4 topographic variables derived from the European Union Digital Elevation Model (DEM), and 8 soil variables derived from SoilGrids data were combined at a spatial resolution of 250 m. Reference plots that characterized 6 and 21 grassland ecosystems at European Nature Information System (EUNIS) levels 2 (broad habitats) and 3 (habitats), respectively, were collected from vegetation archives. We first performed descriptive analysis that included habitat description, ordination, and pairwise separability. We then performed predictive analysis of grassland habitats using a cross-validated random forest model that included a spatial constraint. While environmental and spectral variables characterized most grassland habitats well and consistently, some confusion occurred between habitats with similar abiotic conditions. The main grassland habitat types were correctly mapped at EUNIS level 2 (F1 score = 0.68), but not at EUNIS level 3 (F1 score = 0.52). In addition, the two variables that contributed most to the model were the near-infrared spectral band in spring and the minimum temperature of the coldest month. The model’s prediction at EUNIS level 2 for mainland France provides the map of grassland habitats at a new spatial scale.
1 Introduction
Grassland habitats are ecosystems in which naturally occurring or indigenous herbaceous vegetation dominates (Allen et al., 2011). They are one of the most widespread ecosystems on Earth and include a wide diversity of habitats such as dry, wet, or alpine grasslands (Dixon et al., 2014). While grassland habitats support many ecosystem services such as biodiversity maintenance, carbon storage, and forage provision, the intensity of their performance is habitat-specific (Zhao et al., 2020). Furthermore, whereas agricultural intensification, urbanization, and climate change stress all grassland habitats (Ali et al., 2016; Van Oijen et al., 2018), some of them are significantly threatened (Gigante et al., 2018; Janssen et al., 2016). Thus, large-scale mapping of grassland habitats is essential to monitor their extent and fragmentation and to measure the efficiency of conservation policies (Mücher et al., 2009). Despite recent progresses, national/continental maps of grassland habitats still suffer from incompleteness and/or spatial inaccuracy (Lomba et al., 2014). For example, distribution of grassland habitats (presence/absence) has been mapped at national and continental scales from vegetation archives, either independently (Carli et al., 2020; Chytrý et al., 2020) or in combination with kilometer-scale environmental variables (Mücher et al., 2009), the spatial resolution of these maps was restricted to a grid size ranging from 1 to 50 km given the approximation of georeferencing and the spatial incompleteness of vegetation plots.
However, a wide diversity of spectral, bioclimatic, soil and topographic variables derived from remotely sensed data are now open data, and can be embedded in open-source geospatial software to model natural vegetation at high spatial resolution (He et al., 2015; Rocchini et al., 2017). Although these variables cover the globe, their use for modeling grassland habitats remains limited to local (Buck et al., 2015; Cheţan et al., 2018; Gavish et al., 2018; Schuster et al., 2015) or regional areas (Álvarez-Martínez et al., 2018; Dalle Fratte et al., 2019; Xu et al., 2019). This may be due to the difficulties involved in collecting accurately georeferenced and areal field data (Corbane et al., 2015; Jongman et al., 2019), the quality of which may vary across regions (Suarez-Seoane et al., 2020) and producers (Jarić et al., 2019; Ullerud et al., 2018). Another reason for the absence of national-scale modeling of grassland habitats using these variables may be inadequate model transferability (Yates et al., 2018), which is related not only to the spatial autocorrelation of the field data, but also to model complexity, which in turn is related to the large number of variables used (Maxwell et al., 2018).
In contrast, national modeling of the land-use/land-cover (LULC) grassland class (Inglada et al., 2017), which includes both natural and non-natural grasslands (Allen et al., 2011), as well as the description of their management practices (Griffiths et al., 2020) is operational through the use of satellite time series. On this basis, many studies have translated the LULC grassland class into general grassland habitat types using decision rules based on very-high-spatial-resolution remote-sensing data acquired at local sites (Adamo et al., 2015, 2020; Moran et al., 2017). However, this approach raises the problem of one-to-many correspondence (Kosmidou et al., 2014), as well as that of the transferability of decision rules related to variation in the quality, number of acquisitions and type of remote sensing data available among local sites (Corbane et al., 2015). A recent study accurately discriminated natural and artificial grasslands over the whole of metropolitan France using moderate-resolution imaging spectroradiometer (MODIS) decennial time series (Hubert-Moy et al., 2019), but did not discriminate grassland habitat types.
This study builds on the research initiated by Hubert-Moy et al. (2019) and aims to discriminate and model the distribution of grassland habitats in France based on spectral and environmental variables according to the European Union Nature Information System (EUNIS; https://eunis.eea.europa.eu/). EUNIS is defined mainly on the basis of phytosociological classification systems (Rodwell et al., 2018) and has five nested hierarchal levels: levels 1–3 describe mostly natural habitats using physiognomic and environmental criteria and a few floristic criteria, while levels 4 and 5 use floristic criteria alone (Davies et al., 2004). Thus, we hypothesized that grassland habitats described as EUNIS level 2 (broad habitats) and level 3 (habitats) could be retrieved from bioclimatic, soil, topographic, and spectral variables. To this end, we first performed descriptive analysis using open spectral and environmental data and open vegetation archives. We then predicted grassland habitats at the national scale using a random forest (RF) model cross-validated with a spatial constraint.
2 Materials and methods
2.1 Study site
This study focused on the European territory of France, which covers 550,000 km2, but excluded intertidal areas since they are rarely included in available spectral and environmental predictive variables (Figure 1). France encompasses four biogeographical regions: Atlantic, Continental, Mediterranean and Alpine (European Environment Agency, 2016). This induces a wide variety of ecological and climatic conditions, which results in a great diversity of grassland habitats (Bensettiti et al., 2005). More specifically, France includes approximately 20 grassland habitats classified into 6 main types (Table 1): littoral sediments (A2), which include coastal brackish grasslands that are no longer subject to the influence of tides owing to the construction of embankments; dry grasslands (E1), which include well-drained or dry soils on which short, unfertilized and unproductive herbaceous vegetation grows; mesic grasslands (E2), which include eutrophic and mesotrophic pastures and hay meadows that are fertilized and reseeded by agriculture; seasonally wet and wet grasslands (E3), which include herbaceous plant communities on hydromorphic soils that receive little or no fertilization; alpine and subalpine grasslands (E4), which include primary or secondary plant communities dominated by grasses or sedges at alpine and subalpine levels; woodland fringes and clearings and tall-forb stands (E5), which include tall-grass plant communities that result from absence of farm management at the edges of urban, agricultural, woodland, or riverine areas.

Study site location and reference data showing (a) the selection of a reference plot within a field habitat polygon and a 250 m MODIS pixel of permanent grasslands (Hubert-Moy et al., 2019), and (b) the distribution of the 1970 reference plots per 10 × 10 km grid (European Environmental Agency (EEA)) within the four European biogeographic regions in France.
The main EUNIS level 2 (bold font) and level 3 (regular font) grassland habitats studied and their respective reference plot sizes. A full description of the EUNIS grassland habitats is available in the EUNIS habitat classification report (Davies et al., 2004).
A full description of the grassland habitats in France is available in the French version of the EUNIS habitat classification report (Louvel et al., 2013). Although no exhaustive national inventory of grassland habitats exists at the field scale in France, analysis of the land parcel information system performed in the framework of the EU common agricultural policy indicates that permanent grasslands (> 5 years), which support natural but also non-natural grasslands (Plantureux et al., 2012), cover approximately 23% of metropolitan France (AGRESTE, 2017).
2.2 Field reference data
The reference data were obtained from field maps produced since the 2000s. These archive data were regionally collected and standardized by the French national botanical conservatories and then publicly released by the regional environmental departments according to the EU Directive 2007/2/EC INSPIRE (Bartha and Kocsis, 2011). They are delivered in vector format in which each polygon represents a natural habitat according to the EUNIS, CORINE Biotopes, or Natura 2000 classification systems. When necessary, correspondence was made with the EUNIS level 3 classification system according to the French national reference system for habitats (HABREF) (Clair et al., 2017). Then, the polygons that corresponded to grassland habitats were selected.
To supplement the field data, a national map of permanent grasslands (Hubert-Moy et al., 2019) was used as a mask, first to discard the reference polygons identified in the early 2000s that no longer corresponded to grassland habitats (e.g., due to urbanization or fallowing), and second to discriminate grassland habitats from the rest of the landscape (especially temporary grasslands) during the spatial modeling process. This map, which was derived from MODIS time series (250 m), accurately predicts (F1 score = 0.89–0.93) the grassland frequency over a 12-year period (2006–2017). A frequency close to zero indicates temporary grassland, while a frequency close to one indicates permanent grassland. A threshold frequency of 0.8 was defined to identify grassland habitat (Figure 1).
The reference plots used for the analysis, which were visually selected from the permanent grassland map, correspond to 250 m pixels that were completely enclosed within a field data polygon (Figure 1). The maximum number of plots per EUNIS level 3 habitat was set at 200 to prevent an unbalanced distribution of reference plots per class. Rare grassland habitats with less than 20 plots were excluded from further analysis. The plots were then aggregated to EUNIS level 2. A total of 1970 reference plots were retrieved (Figure 1).
2.3 Spectral and environmental variables
The spectral variables, which describe temporal biophysical variations of vegetation, were derived from the MODIS-based MOD13Q1 (Terra) and MYD13Q1 (Aqua) collection 6 (Didan, 2015) for 2003–2018. They were retrieved from the NASA EOSDIS Land Processes Distributed Active Archive Center (https://lpdaac.usgs.gov) in WGS84 projection using the USGS Application for Extracting and Exploring and Extracting Analysis Ready Samples (AppEEARS) (Neeley, 2018). These data provide composite images without cloud cover in the red (620–670 nm) and near-infrared (NIR; 841–876 nm) spectra at a temporal resolution of 16 days and a spatial resolution of 250 m. The combination of MOD13Q1 and MYD13Q1 datasets achieves an effective 8-day temporal resolution, as their 16-day intervals are shifted by 8 days. Both red and NIR spectral bands are recognized to characterize the intensity of photosynthetic process, the phenology of vegetation, and vegetation ground cover (Xie et al., 2008). In addition, the NIR spectrum derived from MODIS data is a valuable indicator of biomass (Muukkonen and Heiskanen, 2007), open water (Sun et al., 2011), and snow cover (Gautam et al., 2013). Although vegetation indices such as the normalized difference vegetation index (NDVI) or enhanced vegetation index (EVI) were available in the MOD13Q1/MYD13Q1 products, we did not use them since they do not increase modeling accuracy significantly (Khatami et al., 2016). To generate the spectral variables, the MOD13Q1 and MYD13Q1 data were combined each vegetation year (from 1 November of year n to 30 October of year n + 1), and the Terra and Aqua time-series from 2003–2018 were averaged together on a band-wise basis. A total of 92 spectral variables (i.e., 46 8-day periods averaged annual acquisitions × 2 spectral bands) were considered in the analysis (Table 2).
Description of the spectral and environmental variables considered to model grassland habitats. The variables selected following correlation analysis are highlighted in bold. For spectral variables, two red spectrum variables (red.201 and red.225) and nine near-infrared spectrum variables (NIR.001, NIR.105, NIR.145, NIR.185, NIR.201, NIR.297, NIR.305, NIR.313 and NIR.321) were selected from the 46 8-day periods average annual acquisitions. DOY = day of year.
Environmental variables, which describe bioclimatic conditions, soil properties, and topography, were collected depending on the ecological conditions of the grassland habitats. Specifically, 19 bioclimatic variables were extracted from the WorldClim version 2 dataset (Fick and Hijmans, 2017). These variables, which have an approximate spatial resolution of 1 km, were generated from measurements collected from a global network of meteorological stations combined with land surface temperature images derived from a MODIS satellite. In addition, soil property variables were extracted from the SoilGrids dataset version 1 (Hengl et al., 2017). These variables were derived at 250 m spatial resolution from combined modeling of global soil profiles (WoSIS database) with climate, land cover, and terrain satellite-based covariates. We selected seven variables from SoilGrids: pH, soil organic carbon content, sand content, silt content, clay content, cation exchange capacity (as an indicator of soil fertility (Charman and Murphy, 2007)), and depth to bedrock (Table 2). Variables that characterized soil classes were not considered due to their low modeling accuracy (< 50%). Four additional topographic variables were generated from the Europe-wide EU-DEM product version 1.1, which has a 25 m spatial resolution and a vertical accuracy of 7 m (García González et al., 2015). The four topographic variables were a topographic wetness index calculated with a multiple flow algorithm (Kopecký and Čížková, 2010), a topographic position index calculated from a multi-scale (150–2000 m) moving window (Guisan et al., 1999), the eastness aspect (equal to sin(aspect)) and the northness aspect (equal to cos(aspect)). All variables were resampled at a 250 m grid size according to the MODIS images.
All spectral and environmental variables were scaled, due to their wide and heterogeneous ranges of values, which can strongly bias estimates of their importance in the model (Strobl et al., 2007). For this purpose, they were standardized by subtracting the mean and dividing by the standard deviation to have a zero mean and unit variance. Moreover, combining spectral, bioclimatic, pedological and topographical criteria in this study generated a large number of predictive variables (n = 123), which may result in a complex model that is prone to overfitting (Maxwell et al., 2018). For this reason, only the least correlated variables were retained for further analysis. Based on reference plots, a Spearman correlation coefficient was calculated for each pair of variables. When the correlation between two variables exceeded 0.8, the variable with the highest mean correlation coefficient (i.e., among all pairs) was discarded. Ultimately, 11 (out of 92) spectral variables, 7 (out of 19) bioclimatic variables, 5 (out of 7) soil variables, and all 4 topographic variables were retained for further analysis (Table 2).
2.4 Descriptive analysis
Descriptive analysis was performed in three steps for grassland habitats at EUNIS levels 2 and 3 based on the reference plots. First, each habitat was characterized by its values of spectral and environmental variables. For each variable, a v-test, which compares the mean of the plots of one habitat with the mean of the plots of all habitats, was performed (Husson et al., 2017). An absolute value of the v-test of 1.96 corresponds to a p-value of 0.05. The higher the absolute value of the v-test, the more typical the variable is of the habitat. In other words, if the v-test value is positive or negative, the values of the variable are higher or lower, respectively, for the habitat than the general average. Second, the proportion of inter-class variance explained by spectral and environmental variables was quantified using linear discriminant analysis (LDA), which is a dimension-reduction approach that maximizes inter-class variance (Balakrishnama and Ganapathiraju, 1998) so that the separability and distribution of each habitat can be visualized in a two-dimensional space based on ordination of reference plots. Third, the separability of each habitat pair was estimated using the Jeffries–Matusita distance (JMD) (Richards, 2013), which ranges from 0 (similar and inseparable classes) to √2 (i.e., 1.41; distinctly separable classes).
2.5 Predictive analysis
Discrimination of grassland habitats by RF modeling (Breiman, 2001) was tested by creating one RF model for each EUNIS level. To prevent overfitting due to the small number of reference plots for certain habitats and to estimate model accuracy better, fivefold cross-validation repeated twice was performed (Kuhn and Johnson, 2013). In addition, to decrease the influence of spatial autocorrelation between close reference plots, spatial constraint by “spatial block” was applied when assigning each reference plot to a fold (Valavi et al., 2019). Specifically, reference plots were assigned to a fold depending on their location within 10 × 10 km spatial blocks (Supplementary material S1). In the first iteration, two of the three folds were used to train the model, while the third fold was used to test it. Then, in each of two additional iterations, another fold was held out for testing, while the other two were used for training. This spatial cross-validation process was repeated 10 times to estimate model accuracy better (Refaeilzadeh et al., 2009).
RF models depend on two parameters: “ntree,” which sets the number of decision trees that are generated, and “mtry,” which sets the number of variables selected at each branch of the trees (Breiman, 2001). Since “ntree” has little influence on RF model accuracy (Belgiu and Drăguţ, 2016), its default value (500) was kept constant when training the model, while all possible values of “mtry” were tested. For each model, the value of “mtry” with the highest Kappa index during cross-validation was considered the optimal value (Kuhn and Johnson, 2013). For each EUNIS level, we calculated a confusion matrix, accuracy indices (kappa, overall accuracy, and F1 score) and variable importance, expressed as the mean decrease in the Gini index (MDGini) (Breiman, 2001). The most accurate model was then applied to predict grassland habitats throughout mainland France, using the permanent grassland map as a mask (Hubert-Moy et al., 2019).
All analyses were performed with R software (R. Core Team, 2019) using the packages “randomForest” (Breiman, 2001), “caret” (Kuhn, 2008), and “blockCV” (Valavi et al., 2019), as well as with QGIS software (QGIS Development Team, 2018).
3 Results
3.1 Characterization of grassland habitats
The spectral and environmental characteristics of grassland habitats varied at EUNIS levels 2 (Figure 2) and 3 (Figure 3). EUNIS habitat A2 (littoral sediment) had lower NIR values in winter, probably related to flooding. Its climate had high isothermality, low temperature seasonality, warm winters, and low annual precipitation but which had clear seasonality. This habitat occurred on wet and basic soils, probably related to salt tolerance, with high clay and low organic carbon content.

Characteristics of EUNIS level 2 grassland habitats according to spectral and environmental variables (see Table 2 for acronyms). The more positive or negative the v-test value, the more the habitat had higher or lower variable values, respectively. An absolute value of the v-test value >1.96 corresponds to p < 0.05.

Characteristics of EUNIS level 3 grassland habitats according to spectral and environmental variables (see Table 2 for acronyms). The more positive or negative the v-test value, the more the habitat had higher or lower variable values, respectively. An absolute value of the v-test value >1.96 corresponds to p < 0.05.
EUNIS habitat E1 (dry grasslands) had low biomass throughout the year. Its climate had a high diurnal range and high temperature seasonality. Topographically, it occurred on southern, elevated, and dry sites. At EUNIS level 3, the six sub-habitats showed several differences (Figure 3). Habitat E1.1 was located in bioclimatic areas with higher isothermality. Habitat E1.2 had lower precipitation seasonality. Habitat E1.3 had the lowest biomass and extremely warm winters, high precipitation seasonality, and basic soils. Habitat E1.5 also had more basic soils. Habitat E1.7 had higher biomass, lower diurnal range, lower isothermality, lower summer temperatures, acid soils, and higher elevation.
EUNIS habitat E2 (mesic grasslands) had higher biomass in spring and summer. From a bioclimatic perspective, this habitat was particularly common in areas with high diurnal range and low precipitation seasonality. It grew on silty and low-carbon-content soils located in depressions and slightly wet areas. At EUNIS level 3, the sub-habitats clearly differed from each other. Habitat E2.1 had lower summer temperatures and a mostly northern orientation. Habitat E2.2 had a higher diurnal range, warmer minimum winter temperatures, less annual precipitation, as well as a more basic and wetter soil. Habitat E2.3 had winter snow cover, low minimum winter temperatures, high annual precipitation, relatively fertile soil with high organic matter content but that was relatively acidic. Habitat E2.6 had lower summer biomass, lower temperature seasonality, and higher silt content.
EUNIS habitat E3 (seasonally wet and wet grasslands) had high biomass in spring and summer and flooding in winter. This habitat occurred in an environment with warm winter climate, low temperature seasonality, and low annual precipitation. It also occurred in depressions, on clayey and very wet soils. At EUNIS level 3, several spectral and environmental characteristics discriminated sub-habitats from one another. Habitat E3.1 had lower summer biomass, higher diurnal range, clearer precipitation seasonality, and more basic soil. Habitat E3.4 had soils that were slightly more silty. Habitat E3.5 had lower temperature seasonality and more organic soils.
EUNIS habitat E4 (alpine and subalpine grasslands) had winter snow cover and low biomass in summer. This habitat occurred in a mountain climate with low isothermality, cold winter temperatures, and high annual precipitation. It was found on dry, relatively fertile, carbon-rich, and non-clayey soils. At EUNIS level 3, habitat characteristics did not differ greatly, except for habitat E4.5, which had normal summer biomass and less snow cover in winter.
EUNIS habitat E5 (woodland fringes) had few unique spectral or environmental characteristics, except for slightly higher annual precipitation and a relatively fertile and carbon-rich soil. Nevertheless, at EUNIS level 3, habitat E5.4 had slightly higher biomass in summer. Habitat E5.5 had snow cover in winter, higher annual precipitation, as well as more fertile, organic, and dry soil.
3.2 Spectral and environmental ordination
When the reference plots of grassland habitats were plotted along the first two axes of the LDA, the spectral and environmental variables discriminated the grassland habitats clearly at EUNIS level 2 and relatively clearly at EUNIS level 3 (Figure 4). At EUNIS level 2, axis 1 distinctly discriminated habitat E4 (alpine) and, to a lesser extent, E5 (woodland fringes), while axis 2 discriminated E1 (dry), E2 (mesic), E3 (wet), and A2 (littoral). At EUNIS level 3, axis 1 revealed a gradient among alpine grassland habitats (E4.1, E4.4, E4.3, and E4.5) and mesophilic grassland habitats (E2.3, E2.1, E2.6, and E2.2). Axis 2 revealed a gradient of dry grassland habitats (E1.7, E1.2, E1.1, E1.9, E1.5, and E1.3) and wet grassland habitats (E3.5, E3.4, and E3.1). Nevertheless, several habitats overlapped at EUNIS level 2, e.g., E1 (dry) and E2 (mesic), E2 (mesic) and E3 (wet); and level 3 (e.g., E1.1, E1.2, E1.9, and E5.2) (Figure 4).

3.3 Pairwise separability of grassland habitats
The pairwise separability of grassland habitats calculated from the JMD varied at EUNIS levels 2 (Table 3) and 3 (Table 4). At EUNIS level 2, habitats A2 (littoral) and E4 (alpine) were clearly discriminated. Habitat E1 (dry) was well separated from habitat E3 (wet) and relatively well separated from habitats E2 (mesic) and E5 (woodland fringes). However, separability between habitats E2 (mesic) and E3 (wet) was weak. At EUNIS level 3, separability was generally satisfactory between habitats not in the same EUNIS level 2 group. The separability between EUNIS level 3 habitats in the same EUNIS level 2 group was strong in group E1 (dry), except for E1.1 and E1.2, and was satisfactory in groups E3 (wet) and E5 (woodland fringes). In contrast, habitats in group E2 (mesic) were poorly discriminated, especially E2.1 and E2.2, E2.1 and E2.6, and E2.2 and E2.6. Similarly, separability between habitats was limited in E4 (alpine) group, particularly between habitat E4.3 and habitats E4.4 and E4.5.
Pairwise separability of EUNIS level 2 grassland habitats based on Jeffries–Matusita (JM) distance. Lower JM distances are indicated in light gray (1.1–1.3), gray (0.9–1.1), and dark gray (< 0.9).
Pairwise separability of EUNIS level 3 grassland habitats based on Jeffries–Matusita (JM) distance. Black lines group upper-level EUNIS level 2 habitats. Lower JM distances are indicated in light gray (1.1–1.3), gray (0.9–1.1), and dark gray (<0.9).
3.4 Modeling the accuracy of predicted grassland habitats by EUNIS level
The optimal value of the “mtry” parameter during RF model calibration was 10 for EUNIS level 2 and 11 for EUNIS level 3. In general, RF model accuracy was acceptable at EUNIS level 2 (Table 5) but insufficient at EUNIS level 3 (Table 6). Analysis of the habitats highlighted differences in model accuracy. At EUNIS level 2, modeling accuracy was very high for habitats A2 (littoral) and E4 (alpine), high for E1 (dry) and E2 (mesic), but low for E3 (wet) and E5 (woodland fringes). At EUNIS level 3, model accuracy was very high for habitats A2.5, E1.3, and E3.1; high for E1.7, E3.5, and E4.1; and low for other habitats. Confusion matrices, with corresponding overall accuracy and kappa index, are provided in the Supplementary material (S2 and S3).
RF modeling accuracy (F1 score) per grassland habitat at EUNIS level 2 (overall F1 score = 0.68).
RF modeling accuracy (F1 score) per grassland habitat at EUNIS level 3 (overall F1 score = 0.52).
3.5 Map of grassland habitat types
Given the accuracy of the models, the RF model at EUNIS level 2 was applied to predict grassland habitats throughout mainland France (Figure 5). At the national scale, habitat E2 (mesic) dominated, especially in the Atlantic and Continental biogeographical regions. Habitat E4 (alpine) was consistently predicted in the Alpine region, while A2 (littoral) was clustered in the coastal Atlantic region. Habitat E1 (dry) was widely predicted in the south of the Continental region and in the Mediterranean region. Habitat E5 (woodland fringes) displayed a linear pattern that was poorly identifiable given the spatial resolution of the map. At a finer scale, habitat mosaics and the degree of uncertainty in the RF model could be discerned (Figure 5A–D). In the coastal area of the Atlantic region (Figure 5A), habitats A2 (littoral) and E3 (wet) dominated, while E2 (mesic) occurred away from the coast. In the north of the Continental region (Figure 5B), habitat E2 (mesic) dominated and was predicted with certainty, except in wetlands, where it was mixed with E3 (wet). Interestingly, some patches of habitat E1 (dry) occurred in this area. The north of the Alpine region (Figure 5C) contained a great diversity of grassland habitats patterned by mountain ridges. Habitat E4 (alpine) was located at the highest elevations, and its prediction certainty decreased as elevation decreased (i.e., edges of patches). Habitats gradually transitioned (low certainty) to E5 (woodland fringes) and E2 (mesic) as elevation decreased. On the island of Corsica, in the Mediterranean region (Figure 5D), habitat E4 (alpine) covered the highest elevations (>2000 m above seal level). Interestingly, a gradient appeared between habitats E4 (alpine) and E1 (dry) as elevation decreased. However, habitat E1 (dry) was predicted with high certainty at lower elevations. Habitat E3 (wet) was located on an alluvial plain in the southeast of the area.

Prediction of EUNIS level 2 grassland habitats throughout France based on the combination of spectral and environmental variables. The color gradients reflect the uncertainty of the model for each habitat. A national map of permanent grasslands (Hubert-Moy et al., 2019) was used as a mask (white). Boxes represent typical sites: A, the Loire estuary; B, the Ardennes hills; C, the Aravis mountains; and D, the island of Corsica.
A comparison of EUNIS level 2 grassland habitat modeling results with very-high-spatial-resolution satellite imagery acquired during the summer season and field polygons not used for modeling is presented in some typical sites in Figure 6. The results emphasize that while the 250 m model grid cells encompass a mosaic of habitats at fine scales, the dominant grassland habitat was correctly classified. For example, grid cells classified as littoral sediments (A2) included mostly mowed or senescent grasslands surrounded by brackish channel networks typical of Atlantic salt marshes (Figure 6A), whereas grid cells classified as wet grasslands (E3) characterized dense green grasslands surrounded by freshwater ditches typical of bottom valley marshes (Figure 6D). In southeastern France, grid cells classified as dry grasslands (E1) contained senescent and rocky lawns mosaicked with sclerophyllous shrubby vegetation patches, typical of Mediterranean landscapes (Figure 6B). In the continental biogeographic region, grid cells classified as Mesic grasslands (E2) captured a farm bordered by grazed grasslands (Figure 6C). In the Continental biogeographic region, grid cells classified as mesic grasslands (E2) comprised grazed grasslands around a farm (Figure 6C). In the mountains, grids classified as alpine grasslands (E4) included lawns but also long snow-covered mineral areas (Figure 6E), while lower grids classified as woodland fringes (E5) covered a mosaic of grasslands and coniferous trees (Figure 6F). Comparison of modeling results of EUNIS level 2 grassland habitats (color-coded grids) with field habitat polygons and Google Earth (GE) ™ images in typical sites (one per habitat) where vegetation archives (field habitat polygons) not used to calibrate the model were available. Each of these sites was selected at more than 5 km from a reference plot to minimize spatial autocorrelation bias.
Among the variables used in the RF model at EUNIS level 2, the two NIR spectral variables acquired in the spring were important, but the others were less so (Figure 7). Bioclimatic variables were also important, in particular bio.6 and bio.4. In contrast, the topographic variables TPI and TWI were moderately important, but northness and eastness were not important. Soil variables were also of little importance, except for ORCRDC.

Importance of variables (see Table 2 for acronyms) in the RF model at EUNIS level 2. Spectral variables are in red, bioclimatic variables in blue, soil variables in orange, and topographic variables in green.
4 Discussion
4.1 Complementarity between environmental and spectral variables
The descriptive approach emphasizes that the spectral and environmental variables derived from Earth observation data clearly characterize the grassland habitats in a manner consistent with their ecological description provided by the EUNIS reference system (Davies et al., 2004; Louvel et al., 2013). Although these results were expected given recent studies of plant species mapping (Randin et al., 2020), it nevertheless supports the synergy among the many types of variables, especially the new 250 m SoilGrids soil variables (Hengl et al., 2017), which have rarely been used to model natural habitats. Specifically, soil variables effectively discriminated grassland habitats as a function of soil acidity (e.g., low pH for habitat E1.7, high pH for habitat E1.3) and soil drainage capacity, with poorly drained soil (i.e., clayey) for wet grassland habitats (A2 or E3) and well-drained soil (i.e., silty) for mesophilic grassland habitats (E2). Bioclimatic variables discriminated between grassland habitats related to biogeographic regions, such as Mediterranean (e.g., E1.3, E1.5) and Alpine (E4) grassland habitats. Topographic variables, such as TWI, also helped to discriminate wet grassland habitats (A2, E3). As Moran et al. (2017) reported, TPI also discriminates depressional wet grassland habitats. In this study, TPI also identifies dry grasslands (E1), which are frequently clustered on hills. The two orientation variables were less important than expected, perhaps because the levels of the grassland classification system were too general. Spectral variables were used to characterize mountain grassland habitats (e.g., E4, E2.3) based on snow cover, wet grassland habitats (e.g., A2, E3) based on open water in winter, and fertilized mesic (E2) and unfertilized dry (E1) grassland habitats based on biomass in spring.
Among the 27 variables used in the RF models, the spectral variable acquired in the NIR spectrum in spring contributed the most (Figure 6). This supports the recent research of Chignell et al. (2018) and Shiferaw et al. (2019), who highlighted that spectral variables are more useful than environmental variables for modeling natural vegetation. The spring spectral variable (NIR.105, i.e., April 15) featured a key period during which each grassland habitat has a specific spectral response related to phenological or environmental conditions. Specifically, this period corresponds to: (i) the beginning of photosynthetic activity (Franke et al., 2012), (ii) a peak of biomass in littoral sediments (A2) (Kerneis et al., 2007) and dry grasslands (E1) (Weber et al., 2018), but also (iii) the seasonal draining of wet grasslands (E3) (Rapinel et al., 2019), and (iv) the snowmelt in alpine grasslands (E4) (Fontana et al., 2008). The summer spectral variable (NIR.185, i.e., July 4) is the one of the most contributing to the RF model: the senescence process occurs during early summer in dry grasslands (E1) (Weber et al., 2018) and littoral sediments (A2) (Kerneis et al., 2007), whereas peak growth occurs in wet (E3) (Franke et al., 2012) and alpine (E4) grasslands (Fontana et al., 2008). In addition to spectral variables, temperature variables contribute strongly to the discrimination of grassland habitats at the EUNIS 2 level. However, the contribution of bioclimatic variables may have been affected by sampling biases of reference plots given the availability of vegetation archives with a minimum habitat coverage (250 × 250 m). Hence, reference plots of dry grasslands (E1), wet grasslands (E3), and woodland fringes (E5) may be over-sampled on the Continental, Atlantic, and Alpine biogeographic regions, respectively. Though, the minimum temperature of the coldest month (bio.6) clearly discriminates grassland habitats located in biogeographic regions with warm winters (Atlantic and Mediterranean), such as littoral sediment (A2) or wet grasslands (E3), from those located in biogeographic regions with cold winters (Alpine), such as alpine grasslands (E4) or woodland fringes (E5). To a lesser extent, temperature seasonality (bio.4) discriminates grassland habitats occurring in biogeographic regions with strong seasonality (Continental), such as dry grasslands (E1), from those occurring in biogeographic regions with little seasonality (Atlantic and Mediterranean) such as littoral sediment (A2) or wet grasslands (E3).
4.2 Influence of data quality on grassland habitat discrimination
Although grassland habitats were consistently characterized using spectral and environmental variables, discriminating them was sometimes challenging, especially EUNIS level 3 grassland habitats that belonged to the same EUNIS 2 group (Table 4). This could have been due to the ecological gradient that inevitably occurs in a classification system of natural vegetation such as EUNIS (Chytrý et al., 2020). Thus, the overlaps of several habitats, e.g., E2 (mesic) and E3 (wet), E2.1, and E2.2 observed on the LDA plots (Figure 4) are partly consistent since they reflect an ecological reality. In addition, inadequate discrimination between certain habitats may have been due to outliers embedded in reference plots (He et al., 2015; Moran et al., 2017). These outliers may be related to misassignment of reference plots to EUNIS classes during field surveys, especially in areas with an ecological gradient or fine-grained pattern, or when standardizing data (Jarić et al., 2019). Interestingly, recent studies demonstrated that these outliers can be identified via their spectral and/or environmental properties (Halladin-Dąbrowska et al., 2020; Liu et al., 2018; Raab et al., 2018).
Difficulty in discriminating certain grassland habitats can also be due to the quality of spectral and environmental variables (He et al., 2015). Although the remote-sensing-based variables used in this study cover the continental/global extent, their artefacts and moderate spatial resolution are two disadvantages for discriminating grassland habitats. Hence, spectral variables (i.e., monthly synthesis MOD13Q1) may contain aberrant values caused by haze or shadows (Didan, 2015), soil variables inevitably contain uncertainties because they are generated using machine-learning models (Hengl et al., 2017), and topographic variables from digital elevation models (DEMs) derived from optical and synthetic aperture radar (SAR) remote-sensing data may be biased in wooded or steep-slope areas (Moudrý et al., 2018). In addition, these variables were re-sampled at 250 m spatial resolution to be consistent with the research on MODIS-based grassland mapping (Hubert-Moy et al., 2019). Although the reference plots were extracted within field polygons that contained a dominant habitat, these polygons often encompass a mosaic of fine-grained pattern habitats (Rapinel et al., 2018). The use of high-spatial-resolution satellite time series would help characterize grassland habitats at an intra-plot scale (Raab et al., 2018; Schuster et al., 2015) and introduce new variables such as grassland management (Griffiths et al., 2020).
4.3 Modeling considerations
The model accurately mapped (overall F1 score = 0.68) grassland habitats at EUNIS level 2 that had been previously identified from a MODIS time series (Hubert-Moy et al., 2019). This map provides an original view of grassland habitat distribution at two levels. First, the thematic accuracy of six EUNIS grassland classes is an improvement compared with maps of French national land cover (Inglada et al., 2017) or European CORINE Land Cover (Büttner et al., 2016) which were unable to estimate grassland floristic composition (Amici et al., 2017). Second, spatial grid size (250 × 250 m2) and comprehensiveness have been improved compared with kilometer-scale presence/absence maps derived from spatially incomplete vegetation archives (Carli et al., 2020; Chytrý et al., 2020). In addition, spatialization of model uncertainty (Figure 5) provides a critical interpretation of the map, because areas of low certainty may correspond to ecological gradients (Raab et al., 2018) or rare or non-grassland habitats that were excluded from the classification (Rapinel et al., 2020). However, this national map of EUNIS level 2 habitats has two disadvantages. First, although the mask used to pre-locate grassland habitats was accurately modeled at the national scale, some regional disparities were observed, particularly in the Mediterranean biogeographic region, where natural grasslands were under-detected (Hubert-Moy et al., 2019). Consequently, some grassland habitats in this region were not accurately masked (Figure 5). Second, although the predictive analysis was based on a rigorous automated approach, including model hyper-tuning with a spatial cross-validation constraint, several expert- or literature-based decisions (e.g., threshold settings, maximum number of reference plots per habitat, number of folds during cross-validation, spatial block size) were inevitably subjective and could have decreased accuracy of the predictions.
The prediction at EUNIS level 2 was satisfactorily accurate, but showed disparities for certain grassland habitats. As expected, habitats A2 (littoral) and E4 (alpine) were predicted best due to their distinct characteristics (Figure 4). Habitats E1 (dry) and E2 (mesic) were predicted more accurately than E3 (wet) or E5 (woodland fringes), perhaps because E1 (dry) and E2 (mesic) had more reference plots (487 and 555, respectively) than E3 (wet) and E5 (woodland fringes) habitats (252 and 232, respectively), despite efforts to prevent unbalanced sampling. Since the least-frequent habitats have less influence on the overall accuracy index, which is used to calibrate the model, they are predicted less accurately (Maxwell et al., 2018). A second explanation is related to the ecological diversity of each habitat class. Habitats E3 (wet) and E5 (woodland fringes) are much more diverse at lower levels of the EUNIS classification system than E1 (dry), E2 (mesic), or E4 (alpine) habitats (Davies et al., 2004). Consequently, habitats E3 (wet) and E5 (woodland fringes) had higher ecological variance and, thus, lower discrimination. Moreover, habitat E5 (woodland fringes) is difficult to determine and map in the field given its linear and transitional pattern. Therefore, it is often under-represented or absent from vegetation maps of Natura 2000 sites. Thus, it is challenging to model habitat E5 (woodland fringes) with a 250 × 250 m2 grid, except for the long edges between high-elevation forests and alpine grasslands (Figure 5).
Predictions at EUNIS level 3 were not satisfactorily accurate (Table 6), likely due to the large number of classes (n = 21) and the inherent ecological gradients between certain grassland habitats (Chytrý et al., 2020). To improve the prediction accuracy of EUNIS level 3 grassland habitats, we considered using the approach developed by Gavish et al. (2018) by applying a hierarchical configuration within the habitats predicted at EUNIS level 2. However, this approach assumes that habitats are accurately modeled at the higher hierarchical level (EUNIS level 2), which was not the case in our study. Moreover, it should kept in mind that the number of spectral variables decreased greatly (from 96 to 11) after the correlation analysis performed to decrease complexity of the RF model. Thus, the annual temporal profile of grassland habitats was simplified greatly, which could have influenced the characterization of certain habitats. One interesting approach that could be applied to discriminate grassland habitats better is functional principal component analysis, which reduces the dimensionality of spectral variables while preserving the full temporal profile of each vegetation unit (Pesaresi et al., 2020). The predictive accuracy of grassland habitats at EUNIS level 3 would also be improved by characterizing the dominant plant species using hyperspectral data. Whereas until recently airborne hyperspectral data restricted characterization of herbaceous plant species to sites of a few squared kilometers (Halladin-Dąbrowska et al., 2020), recent (e.g., PRISMA, DESIS, HyspIRI) or upcoming (e.g., EnMAP) launches of hyperspectral space missions offer new perspectives for characterization of dominant plant species of grassland habitats at regional or national scales (Zhang et al., 2021).
We initially considered predicting the distribution of grassland habitats using decision rules defined according to their environmental characteristics (Mücher et al., 2009), since this approach appeared simple and straightforward. However, this approach was too complicated to implement given the ecological gradients available (Figure 4), which would have required setting fuzzy thresholds, and given the large number of spectral and environmental variables initially available (n = 122), which would have required defining complex decision rules. Thus, we let the artificial intelligence run using an RF model that is widely recognized to be able to perform complex and high-dimensional classification analysis (Belgiu and Drăguţ, 2016). The large number of variables used at each branch of the tree in the RF model (mtry = 10–11) highlights the complexity of the decision rules required to discriminate grassland habitats. Although the results are somewhat weak at EUNIS level 3 (Table 5), they are encouraging at EUNIS level 2 according to the F1 scores and the consistency of the mapping results (Figure 5). Model accuracy would likely improve using new deep-learning models (Christin et al., 2019).
We predicted grassland habitats directly by combining spectral and environmental variables since grassland locations were already known from a MODIS time series (Hubert-Moy et al., 2019). Had we not already known grassland locations, the approach of Álvarez-Martínez et al. (2018), which models potential habitats using environmental variables, and then actual habitats using spectral variables, would have been more appropriate.
Although the present study’s approach was evaluated in France using a Europe-specific classification system, an adaptation of the global EcoVeg classification system, which also describes natural habitats using several physionomic, bioclimatic, topographic, edaphic and floristic criteria (Faber-Langendoen et al., 2014), could be extended to other parts of the world.
4.4 Importance of vegetation databases
We used only open data and processed them only with open-source software. Although the software and spectral and environmental variables were easily available, collecting vegetation archives of natural habitats was challenging due to the current lack of a national database. We had to contact each regional environmental department individually, standardize the regional GIS layers at the national level (e.g., format, projection system), and then sometimes harmonize the classification systems to EUNIS. While most of the regional departments supplied field data, some areas were poorly covered, especially in northeastern and southwestern France (Figure 1). Therefore, there is an urgent need to build and contribute to standardized field vegetation databases at the national or European scales. Several projects are working on such databases, such as Information System on Nature and Landscapes (Jomier et al., 2019) or GeoNature (Corny et al., 2019) in France, and Copernicus inSitu (Marconcini et al., 2020) or the European Vegetation Archive (Chytrý et al., 2020) in Europe.
4.5 Implications for conservation
This new national grassland habitat map should contribute to the field of ecology. First, an improved comprehension of grassland habitat functioning could be provided by: (i) deriving landscape indicators of conservation status such as fragmentation (Fletcher et al., 2018), (ii) quantifying the value of ecosystem services such as carbon storage (Zhao et al., 2020), or (iii) modeling the resilience of grassland habitats to climate change (Van Oijen et al., 2018). Second, the protection and conservation of these ecosystems could be strengthened by: (i) a better delineation of protected areas, checking the consistency of the current protected areas with the spatial distribution of grassland habitats, (ii) a better evaluation of the effectiveness of conservation policies in protected areas, which has been done to date with land-cover maps (Kallimanis et al., 2015), (iii) a more comprehensive inventory of grassland habitats, especially outside protected areas (Deák et al., 2020). Third, the spatialization of ecological corridors could be improved with the integration of a “grassland habitat” grid. Fourth, linkages between EUNIS and red-list habitats classification schemes (Rodwell et al., 2018) would support the monitoring of threatened habitats such as dry grasslands (Deák et al., 2020; Gigante et al., 2018), but also flora (Gauthier et al., 2017) and fauna (Trull et al., 2018) on the IUCN red list of threatened species.
5 Conclusion
This study shows that while the combined use of open spectral, environmental, and ground-vegetation data can characterize grassland habitats consistently, it does not discriminate them completely, especially at EUNIS level 3, likely due to ecological gradients and artefacts embedded in predictive variables and vegetation archives. RF modeling combined with a spatial cross-validation constraint yielded a consistent national map of grassland habitats at EUNIS level 2. Representation of model uncertainty provides critical interpretation of the map since it highlights ecological gradients and rare habitats. Future studies will focus on detecting outliers in vegetation archives, the use of finer-scale variables derived from Sentinel satellites, and the development of deep-learning modeling.
Supplemental Material
Supplemental Material, sj-pdf-1-ppg-10.1177_03091333211023689 - Combined use of environmental and spectral variables with vegetation archives for large-scale modeling of grassland habitats
Supplemental Material, sj-pdf-1-ppg-10.1177_03091333211023689 for Combined use of environmental and spectral variables with vegetation archives for large-scale modeling of grassland habitats by Sébastien Rapinel, Léa Panhelleux, Arnault Lalanne and Laurence Hubert-Moy in Progress in Physical Geography: Earth and Environment
Supplemental Material
Supplemental Material, sj-pdf-2-ppg-10.1177_03091333211023689 - Combined use of environmental and spectral variables with vegetation archives for large-scale modeling of grassland habitats
Supplemental Material, sj-pdf-2-ppg-10.1177_03091333211023689 for Combined use of environmental and spectral variables with vegetation archives for large-scale modeling of grassland habitats by Sébastien Rapinel, Léa Panhelleux, Arnault Lalanne and Laurence Hubert-Moy in Progress in Physical Geography: Earth and Environment
Supplemental Material
Supplemental Material, sj-xlsx-1-ppg-10.1177_03091333211023689 - Combined use of environmental and spectral variables with vegetation archives for large-scale modeling of grassland habitats
Supplemental Material, sj-xlsx-1-ppg-10.1177_03091333211023689 for Combined use of environmental and spectral variables with vegetation archives for large-scale modeling of grassland habitats by Sébastien Rapinel, Léa Panhelleux, Arnault Lalanne and Laurence Hubert-Moy in Progress in Physical Geography: Earth and Environment
Supplemental Material
Supplemental Material, sj-xlsx-2-ppg-10.1177_03091333211023689 - Combined use of environmental and spectral variables with vegetation archives for large-scale modeling of grassland habitats
Supplemental Material, sj-xlsx-2-ppg-10.1177_03091333211023689 for Combined use of environmental and spectral variables with vegetation archives for large-scale modeling of grassland habitats by Sébastien Rapinel, Léa Panhelleux, Arnault Lalanne and Laurence Hubert-Moy in Progress in Physical Geography: Earth and Environment
Footnotes
Acknowledgements
The authors are grateful to Clémence Rozo (University of Rennes) for GIS data pre-processing. MODIS images were retrieved from https://lpdaac.usgs.gov, maintained by the NASA EOSDIS Land Processes Distributed Active Archive Center (LP DAAC) at the USGS Earth Resources Observation and Science (EROS) Center, Sioux Falls, South Dakota, USA. WorldClim v2 data were downloaded from https://www.worldclim.org/, maintained by the University of California, Berkeley, USA. SoilGrids data were downloaded from https://www.soilgrids.org and were developed as part of the Global Soil Information Facilities (GSIF) initiative. EU-DEM, retrieved from
, was funded by the European Union in the framework of the Copernicus program. The regional field databases were collected by the Conservatoires Botaniques Nationaux (Alpin, Bailleul, Bassin Parisien, Brest, Franche-Comté, Massif Central, Méditerranéen) and delivered by the Directions Régionales de l’Environnement, de l’Aménagement et du Logement (Auvergne-Rhône-Alpes, Bourgogne-Franche-Comté, Bretagne, Corse, Hauts-de-France, Nouvelle-Aquitaine, Normandie, Pays de la Loire, Provence-Alpes-Côte d’Azur).
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the French Ministry of Ecology.
Supplemental Material
Supplemental material for this article is available online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
