Abstract
The area of karst terrain in China covers 3.63×10⁶ km², with more than 40% in the southwestern region over the Guizhou Plateau. Of this area, approximately 1.30×10⁶ km² consists of exposed carbonate bedrock, which suffers from soil degradation and poor crop yield. This paper aims to gain a better understanding of the environmental controls on crop yield in order to enable more sustainable use of natural resources for food production and development. Specifically, four kinds of artificial neural network were used to analyse and simulate the spatial patterns of crop yield for seven crop species grown in Guizhou Province, exploring the relationships with meteorological, soil, irrigation and fertilization factors. The spatial classification showed that most regions with high crop yield per area and high total crop yield are located in central-northern Guizhou. The three artificial neural networks used to simulate the spatial patterns of crop yield all achieved good correlation between simulated and true yield, and the Back Propagation (BP) network offered the best balance between accuracy and runtime. Among the 13 influencing factors investigated, temperature (16.4%), radiation (15.3%), soil moisture (13.5%), and N (13.5%) and P (12.4%) fertilization made the largest contributions to the spatial distribution of crop yield. These results suggest that neural networks have potential for identifying environmental controls on crop yield and for modelling its spatial patterns, which could help local stakeholders achieve sustainable development and crop production goals.
I Introduction
Karst landscapes cover vast areas of the globe, including over 30% of China. They are characterized by exposed carbonate rocks that weather rapidly and are highly susceptible to environmental change and natural erosion. In China, the karst landscape in the southwest has experienced rapid and intensive land-use change and associated ecosystem degradation over the last 50 years (Chen et al., 2018; Li et al., 2018; Moore et al., 2017). The intensification of agriculture since the late twentieth century has led to rapid soil deterioration, reflected in reduced crop production and accelerated soil loss (Green et al., 2019). Under the Grain for Green Program, millions of hectares of farmland have been converted to non-crop vegetation to combat ‘rocky desertification’ (Cheng et al., 2015). Ensuring both ecological and food security is a top priority for all stakeholders in China.
The karst environment has unique characteristics, such as soluble rock, a calcium-rich and alkaline nature, soil scarcity, a double-layer structure and water leakage through cavernous channels that rapidly link topsoil to groundwater. These environmental stresses adversely affect vegetation growth in the karst region (Tong et al., 2017; Yuan, 2001). Studying the ecosystem features of karst benefits from a comprehensive understanding of the interactions among different elements of the critical zone (CZ) in this region. CZ observatories (CZOs) have therefore been established in China’s karst region to gain a holistic understanding of soil formation from bedrock, water transport to the groundwater below and beyond, and the interactions with vegetation (Anderson et al., 2008; Banwart et al., 2012; Grant and Dietrich, 2017; Lin et al., 2011). This research not only advances knowledge of CZ processes but also provides fundamental information that can be applied in practice, through applying, adapting and developing decision support tools (DSTs) that help local people and guide practices such as crop production (Banwart et al., 2013; Davis et al., 2005; Menon et al., 2014; Rose et al., 2016).
As agriculture is one of the largest drivers of land cover change (Scholes et al., 2018), it is the focus of numerous CZOs across the globe (Guo and Lin, 2016; Kumar et al., 2018). In karst regions, food production is critically affected by the environment, as manifested in persistently poor and declining crop yield (Liu, 2006; Wang et al., 2004; Zhang et al., 2013). For example, karst rocky desertification and serious soil erosion from poor farming practice decrease land productivity (Nguyen et al., 1996; Tan et al., 2010; Yan and Cai, 2015). Existing models for estimating crop yield are mainly statistical approaches or process-based models (including large-scale global gridded crop models), neither of which is well parameterized for the unique and heterogeneous properties of karst landscapes (Zhao et al., 2016; Zhao et al., 2017). Process-based models have three limitations in complex karst landscapes. Firstly, assumptions about the processes relevant to crop growth vary greatly among models, leading to different parameterizations (Rötter et al., 2011). Secondly, most process-based models focus primarily on aboveground crop biomass, whereas belowground processes and soil parameters also play an important role in influencing or even controlling crop growth (Folberth et al., 2016), especially in karst systems where soil depth is often a limiting factor (Zhang et al., 2020). Lastly, the impact of climate change on the environmental factors affecting crop growth needs to be incorporated into existing process-based models (Rosenzweig et al., 2014). Statistical approaches are also limited in karst systems, because their predictions rest on direct relationships between crop yield and meteorological data (or other environmental factors) (Reynolds et al., 2000; Wu et al., 2015; Van Wart et al., 2013).
Moreover, traditional models cannot handle multiple groups of factors and crop parameters with non-linear relationships (Kogan et al., 2018; Prasad et al., 2006). In addition, the complexity and heterogeneity of karst landscapes, the importance of both sub-surface and surface soil and water resources, and the prevalence of small-scale subsistence farming in Guizhou all limit the applicability of existing crop models. In recent years, new technologies such as artificial neural networks (ANNs) have developed rapidly. Because they fit non-linear relationships and capture interactions between factors, they may provide cost-effective and comprehensive solutions for improving crop yield, environmental management and DSTs (Chlingaryan et al., 2018; Everingham et al., 2016; Panda et al., 2010).
In this paper, we address the current limitations of crop modelling for karst landscapes by feeding spatial data on crop yield per unit area (hereinafter crop yield or YPA) and its influencing factors into ANNs, in order to analyse and simulate the spatial patterns of crop yield for seven crop species in Guizhou Province. This is the first multi-factorial analysis to simulate and explain spatial patterns of crop yield in this environment, made possible by applying machine learning to simulate those patterns. Four kinds of artificial neural network were used to: (1) detect and classify the spatial patterns of crop yield in Guizhou Province; and (2) simulate these patterns from different influencing factors and evaluate the contribution of each factor. This research can inform the further development of DSTs to guide land management and farming decisions in karst regions, and the approach could be extended to crops in other complex landscapes worldwide.
II Data and methods
2.1 Study region
Guizhou is located in southwest China (Figure 1). It has an area of 1.76×10⁵ km² and a population of 36 million (as of 2018), ranking nineteenth in population among China’s 34 provincial-level divisions, and its gross domestic product in 2018 ranked twenty-fifth among Chinese provinces (NSBC, 2019). Guizhou Province lies at the heart of the East Asia Karst, one of the three largest areas of almost unbroken karst in the world (He et al., 1998; Sweeting, 1993). About 73% of the total area is underlain by carbonate rocks, and karst landforms are widely distributed (Su, 2002). Geomorphologically, Guizhou Province comprises 87% plateau-mountains, 10% hills and 3% basins (He et al., 1998).

Figure 1. Location and administrative map of Guizhou Province and its nine prefectures.
In the study area, the most widely grown food crops are paddy rice, maize, wheat, soybean and potato. According to government statistics, over the past 60 years total crop yield in Guizhou has increased more than threefold, crop yield per unit planting area (t/ha) has doubled, and the economy has grown substantially. However, because of environmental limitations, crop yield per unit planting area (t/ha) and income in this region were only 75.6% and 61.1% of the national average over 2005 and 2007, respectively (NSBC, 2019).
2.2 Data sources
In this study, we selected the seven main crop species produced in Guizhou: five food crops (maize, potato, rice, soybean and wheat) and two commercial crops (rapeseed and groundnut). The crop-specific data for the seven species (crop yield, harvested area and fertilization rates) were compiled from the EarthStat datasets, and irrigation data from the MIRCA2000 dataset (Table 1). We also included nine additional factors influencing crop yield, comprising meteorological, topographic (from a digital elevation model, DEM) and soil property data (Table 1). Prior to analysis, we harmonized all datasets to a common 5′ resolution by aggregation and resampling (cubic method) and extracted all datasets for the year 2000.
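The resampling half of this harmonization step can be sketched in Python as below. This is a minimal illustration, not the authors’ workflow (the software they used is not stated). It assumes NumPy and SciPy 1.6 or later, and it assumes a 0.5° field such as WFDEI temperature or CPC soil moisture is refined to 5′ with cubic-spline interpolation via `scipy.ndimage.zoom(order=3)`. Because 0.5° equals 30′, each coarse cell maps onto a 6 × 6 block of 5′ cells. The helper name `resample_to_5min` is hypothetical.

```python
import numpy as np
from scipy import ndimage

COARSE_ARCMIN = 30  # 0.5-degree grids (e.g. WFDEI, CPC soil moisture)
TARGET_ARCMIN = 5   # common 5-arcminute grid used for all layers


def resample_to_5min(coarse):
    """Refine a 0.5-degree grid to 5 arcminutes with cubic-spline interpolation.

    Hypothetical helper. order=3 selects cubic splines; grid_mode=True scales
    the full pixel extent (not the distance between pixel centres), so each
    6 x 6 block of fine cells lines up with the edges of one coarse cell.
    """
    factor = COARSE_ARCMIN // TARGET_ARCMIN  # 6 fine cells per coarse cell per axis
    return ndimage.zoom(np.asarray(coarse, dtype=float), factor,
                        order=3, mode="nearest", grid_mode=True)


if __name__ == "__main__":
    temp = np.array([[14.0, 15.0, 16.0],
                     [13.0, 14.5, 15.5]])  # illustrative values only, degC
    fine = resample_to_5min(temp)
    print(fine.shape)  # (12, 18)
```

Finer layers, such as the soil grids, would instead be aggregated by block statistics. Cubic interpolation gives smooth fields but, unlike block averaging, does not exactly preserve coarse-cell means.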
Table 1. Overview of data sources (* denotes crop-specific data).
2.2.1 EarthStat datasets
EarthStat provides geographic datasets intended to help meet the grand challenge of feeding a growing global population while reducing agriculture’s impact on the environment. EarthStat is a collaboration between the Global Landscapes Initiative at the University of Minnesota’s Institute on the Environment and the Land Use and Global Environment lab at the University of British Columbia. The datasets contain several kinds of agricultural data, including harvested area, crop yield and fertilization rates (of which we used nitrogen (N), phosphorus (P) and potassium (K)). The harvested area data were derived by combining agricultural inventory data with satellite-derived land cover data (Ramankutty et al., 2008). The EarthStat data were produced by combining national, state and county level census statistics with a recently updated global dataset of croplands on a 5′ by 5′ latitude/longitude grid. Together, these harvested area and yield data cover 175 distinct crops worldwide, circa the year 2000 (Monfreda et al., 2008).
2.2.2 Soil property data
The Harmonized World Soil Database (HWSD, version 1.2) is a global soil database framed within a geographic information system (GIS) that contains up-to-date information on world soil resources (Nachtergaele et al., 2009, 2012; Shangguan et al., 2013). It is a raster database with over 15,000 soil mapping units that combines existing regional and national updates of soil information worldwide (Batjes and Bridges, 1994; Shi et al., 2004, 2006). In this study, we analysed five soil properties that have been shown to influence crop growth (soil bulk density, soil organic carbon, pH, cation exchange capacity and carbonate content) to investigate the relationships between soil features and the spatial distribution of crop yield (Letey, 1958).
2.2.3 Meteorological data
The EU WATCH (Water and Global Change) project (http://www.eu-watch.org) provides the gridded WFDEI product, which applies the WATCH Forcing Data methodology to ERA-Interim (Ren et al., 2018; Weedon et al., 2014). It contains eight meteorological variables from 1979 onwards at a spatial resolution of 0.5°. In this study, we calculated the annual average temperature and shortwave radiation for the year 2000 as influencing factors on crop yield.
2.2.4 Soil moisture data
For soil moisture, we used the product released by the Climate Prediction Center (CPC) of NOAA’s National Centers for Environmental Prediction (NCEP), which has global coverage at 0.5° resolution from 1948 to the present (Ibrahim et al., 2015). The dataset provides monthly averaged soil moisture, expressed as equivalent water height. Values are calculated by a one-layer hydrological model rather than measured directly (Huang et al., 1996; Van den Dool et al., 2003). We extracted the data for 2000 and calculated the annual average soil moisture in Guizhou Province.
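The paper does not state whether months were weighted equally when forming the annual average. The sketch below assumes a day-weighted mean, so that February 2000 (a leap year, 29 days) counts slightly less than the 31-day months. The same function would apply to any stack of twelve monthly grids. The function name is hypothetical.

```python
import calendar

import numpy as np


def annual_mean_from_monthly(monthly, year=2000):
    """Day-weighted annual mean of a (12, rows, cols) stack of monthly means.

    Hypothetical helper; weighting by days per month is an assumption.
    """
    monthly = np.asarray(monthly, dtype=float)
    if monthly.shape[0] != 12:
        raise ValueError("expected 12 monthly layers on axis 0")
    # calendar.monthrange returns (weekday of first day, number of days)
    days = np.array([calendar.monthrange(year, m)[1] for m in range(1, 13)],
                    dtype=float)
    # 1-D weights are applied along axis 0 (the month axis) only
    return np.average(monthly, axis=0, weights=days)
```

With equal weights the result differs only slightly, because month lengths vary by at most three days.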
2.2.5 Irrigation information
The MIRCA2000 global dataset (Monthly Irrigated and Rainfed Crop Areas around the year 2000) distinguishes irrigated and rainfed areas for 26 crop classes, comprising 21 major crops and the crop groups of pulses, citrus, fodder grasses, other perennial crops and other annual crops (Portmann et al., 2010). The dataset refers to the period 1998–2002 and has a spatial resolution of 5′ by 5′ (Neumann et al., 2011).
2.2.6 DEM (digital elevation model)
The US Geological Survey (USGS) and the National Geospatial-Intelligence Agency (NGA) collaborated to develop a notably enhanced global elevation model, the Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010). It replaces GTOPO30 as the elevation dataset of choice for global and continental scale applications (Danielson and Gesch, 2011). The GMTED2010 product suite contains seven raster elevation products at each of several spatial resolutions and incorporates the best global elevation data currently available. These products were produced using the following aggregation methods: minimum elevation, maximum elevation, mean elevation, median elevation, standard deviation of elevation, systematic subsample and breakline emphasis (Athmania and Achour, 2014; Carabajal et al., 2011). The slope data were aggregated to different spatial grains, including the 5′ resolution used here, using several aggregation approaches (Amatulli et al., 2018).
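Most of the aggregation statistics listed above can be illustrated with a simple block-aggregation sketch. It assumes a fine elevation grid whose dimensions are divisible by the aggregation factor. For example, 10 × 10 cells at 30″ make one 5′ cell. Breakline emphasis is omitted because it needs drainage-network information. Taking the central cell as the systematic subsample is an assumption, and the function name is hypothetical.

```python
import numpy as np


def aggregate_blocks(fine, factor):
    """Aggregate a fine elevation grid into factor x factor blocks.

    Returns a dict of coarse grids for several GMTED2010-style statistics
    (breakline emphasis omitted). Grid dimensions must be divisible by factor.
    """
    fine = np.asarray(fine, dtype=float)
    rows, cols = fine.shape
    if rows % factor or cols % factor:
        raise ValueError("grid shape must be divisible by factor")
    # (block_row, row_in_block, block_col, col_in_block) -> move the two
    # within-block axes to the end and flatten them into one axis
    blocks = (fine.reshape(rows // factor, factor, cols // factor, factor)
                  .swapaxes(1, 2)
                  .reshape(rows // factor, cols // factor, factor * factor))
    centre = (factor // 2) * factor + factor // 2  # flattened index of central cell
    return {
        "min": blocks.min(axis=-1),
        "max": blocks.max(axis=-1),
        "mean": blocks.mean(axis=-1),
        "median": np.median(blocks, axis=-1),
        "std": blocks.std(axis=-1),
        "subsample": blocks[..., centre],  # one fixed cell per block
    }
```

Usage: `aggregate_blocks(dem_30s, 10)["mean"]` would give a 5′ mean-elevation grid from a 30″ grid (`dem_30s` is a hypothetical array).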
2.3 Four kinds of artificial neural network
Herein, we adopted four kinds of ANN: the Self-Organizing Feature Map (SOFM), the Back Propagation (BP) network, the General Regression Neural Network (GRNN) and the Recurrent Neural Network (RNN). SOFM was used to perform unsupervised classification of Guizhou Province into high-, medium- and low-level crop yield regions for the seven crop species. BP, GRNN and RNN were used to simulate the spatial patterns of crop yield for the seven species from the influencing factors introduced above. We then compared the simulations of these three networks using several accuracy indices and runtime. Details of the networks are included in the supplementary material.
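For readers unfamiliar with SOFM, the unsupervised three-level classification can be sketched as a one-dimensional Kohonen map with three nodes. This is a minimal illustration, not the configuration used in the paper, which is given in the supplementary material. It makes three assumptions:

- each pixel is described by a feature vector, such as the (standardized) YPA of the seven crops;
- the nodes are initialized deterministically, spread evenly between the feature-wise minimum and maximum;
- the learning rate and neighbourhood radius decay linearly.

The function names and default parameters are hypothetical.

```python
import numpy as np


def train_sofm(x, n_nodes=3, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D Kohonen self-organizing feature map on the rows of x.

    x: (n_pixels, n_features) array. Returns (n_nodes, n_features) weights.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Spread the nodes evenly between the feature-wise minimum and maximum
    w = np.linspace(x.min(axis=0), x.max(axis=0), n_nodes)
    grid = np.arange(n_nodes)  # node positions on the 1-D lattice
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = max(sigma0 * (1.0 - frac), 0.1)  # neighbourhood radius shrinks
        for i in rng.permutation(len(x)):
            # best-matching unit: node whose weights are closest to the sample
            bmu = np.argmin(((w - x[i]) ** 2).sum(axis=1))
            # Gaussian neighbourhood on the lattice, centred on the BMU
            h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))
            w += lr * h[:, None] * (x[i] - w)
    return w


def classify_levels(x, w):
    """Map each pixel to its nearest node, relabelled 0/1/2 = low/medium/high."""
    x = np.asarray(x, dtype=float)
    d2 = ((x[:, None, :] - w[None, :, :]) ** 2).sum(axis=2)
    bmu = np.argmin(d2, axis=1)
    # rank nodes by the mean of their weight vector so labels follow yield level
    rank = np.argsort(np.argsort(w.mean(axis=1)))
    return rank[bmu]
```

Usage: `labels = classify_levels(x, train_sofm(x))` returns 0, 1 or 2 for each pixel. Ranking nodes by their mean weight is one simple way to turn unordered SOFM clusters into ordered low, medium and high levels.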
Figure 2 shows the workflow for simulating crop yield in Guizhou Province. First, we input the four groups of influencing factors into each of the three ANNs (BP, GRNN and RNN) and randomly assigned the crop yield pixels to a training group (75% of the total) and a validation group (25% of the total). Second, we trained the networks and simulated crop yield for each of the seven species. Finally, we compared the three networks in terms of accuracy and runtime. The accuracy indices were R, RMSE (root mean square error) and MRE (mean relative error). R is the correlation coefficient between the true (observed) and forecasted (simulated) values of the validation group. RMSE and MRE indicate the absolute and relative deviation of the simulation, respectively (equations (1) and (2)), where T and F represent the true and forecasted values, respectively, and n is the total number of validation samples.
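Equations (1) and (2) are not reproduced in this text. The block below gives the standard definitions consistent with the symbols defined above: $T_i$ and $F_i$ are the true and forecasted values of validation sample $i$, and $n$ is the number of samples. The exact form used by the authors is an assumption, in particular the absolute value and percentage scaling in the relative error (reported as MRE in Table 2). The correlation coefficient R is included for completeness, with bars denoting means over the validation samples.

```latex
\begin{align}
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(T_i - F_i\right)^{2}} \tag{1}\\
\mathrm{MRE}  &= \frac{100\%}{n}\sum_{i=1}^{n}\frac{\left|T_i - F_i\right|}{T_i} \tag{2}\\
R &= \frac{\sum_{i=1}^{n}\left(T_i-\bar{T}\right)\left(F_i-\bar{F}\right)}
         {\sqrt{\sum_{i=1}^{n}\left(T_i-\bar{T}\right)^{2}\,\sum_{i=1}^{n}\left(F_i-\bar{F}\right)^{2}}} \notag
\end{align}
```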

Figure 2. Workflow for simulating crop yield using ANNs.
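The random 75/25 split and the three accuracy indices can be computed with NumPy as sketched below. The following are assumptions rather than details from the paper:

- the seed;
- the helper names;
- the absolute-value form of MRE (mean of |T − F|/T, in percent), which also requires all true yields to be positive.

```python
import numpy as np


def split_pixels(n_pixels, train_frac=0.75, seed=0):
    """Randomly partition pixel indices into training and validation sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_pixels)
    n_train = int(round(train_frac * n_pixels))
    return idx[:n_train], idx[n_train:]


def evaluate(true, forecast):
    """Return R, RMSE and MRE (%) between true and forecasted yields."""
    true = np.asarray(true, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    r = np.corrcoef(true, forecast)[0, 1]                    # Pearson correlation
    rmse = np.sqrt(np.mean((true - forecast) ** 2))          # absolute deviation
    mre = 100.0 * np.mean(np.abs(true - forecast) / true)    # relative deviation, true > 0
    return r, rmse, mre
```

Runtime, the fourth criterion, could be recorded by wrapping training and prediction in `time.perf_counter()` calls.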
III Results
3.1 Distribution of harvested area and agricultural management
The harvested area of the seven species varies spatially among prefectures in Guizhou (Figure 3a). Zunyi and Bijie have the largest total area of the selected crops, at 641,807 ha and 586,506 ha, respectively; both are located in northwestern Guizhou Province. Conversely, Guiyang, Liupanshui and Anshun have the smallest total harvested areas, at 193,178 ha, 137,444 ha and 193,631 ha, respectively. Among species, maize, rice and wheat have the largest total harvested areas in Guizhou (653,805 ha, 754,212 ha and 545,407 ha, respectively), accounting for 22.9%, 26.5% and 19.1% of the total crop area. However, the share of harvested area for each species differs greatly across prefectures. For example, maize covers 166,121 ha in Bijie, or 28.3% of that prefecture’s total crop area, but only 52,897 ha in Qiandongnan, or 9.8% of its total. All crop species received fertilizer (N, P and K), whereas the extent of irrigation was species-dependent (Figure 3b). Rice had the highest percentage of irrigated area (monthly mean 52.2%), while rapeseed was not irrigated in the study region. The fertilization data show that the most commonly cultivated crops (maize, rice and wheat) tend to receive higher fertilization rates, with totals of 204.3 kg/ha, 224.9 kg/ha and 198.4 kg/ha, respectively. Overall, N fertilizer accounted for 74.5% of fertilizer use across all crop species, followed by P (16.5%) and K (9.0%).
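The shares quoted above are simple ratios of harvested areas. The short Python check below recomputes the Bijie maize share from the reported areas. It also shows the Qiandongnan total crop area implied by the reported 9.8% share. Because that percentage is rounded, the implied total is approximate (roughly 537,000 to 543,000 ha).

```python
# Harvested areas (ha) reported in Section 3.1
bijie_total_ha = 586_506
bijie_maize_ha = 166_121
qiandongnan_maize_ha = 52_897
qiandongnan_maize_share = 0.098  # reported share, rounded to 0.1%

# Share of maize in Bijie's total crop area
bijie_maize_pct = 100 * bijie_maize_ha / bijie_total_ha
print(f"Bijie maize share: {bijie_maize_pct:.1f}%")  # Bijie maize share: 28.3%

# Total crop area in Qiandongnan implied by the rounded 9.8% share (approximate)
qiandongnan_total_ha = qiandongnan_maize_ha / qiandongnan_maize_share
print(f"Implied Qiandongnan total: {qiandongnan_total_ha:,.0f} ha")
```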

Figure 3. Harvested area (a), and fertilization and irrigation (b), of the seven selected crop species in Guizhou Province. Error bars indicate the standard deviation across all prefectures.
3.2 Relationship between slope and harvested area/yield
Sloping cropland is widely distributed in Guizhou Province. For all seven crop species, most of the harvested area is concentrated on slopes between 7.5° and 20°, and over 85% lies on slopes steeper than 7.5°. Among the seven species, potato has the largest share of harvested area on slopes steeper than 7.5° (92.7% of its total harvested area). A slope of 15° is an important threshold for implementing the Grain for Green Program in many prefectures, and many local governments intend to take sloping cropland above 15° out of production to meet the programme’s goals. Figure 4 shows that this policy could affect roughly a quarter of all cropland, ranging from 23.6% for rapeseed to 30.1% for maize.
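Both the cumulative curves in Figure 4 and the share of cropland above 15° reduce to one quantity: the percentage of harvested area on pixels steeper than a threshold. A minimal NumPy sketch follows. The pixel values are made up purely for illustration, and the function name is hypothetical.

```python
import numpy as np


def share_above(slope_deg, area_ha, threshold_deg):
    """Percentage of harvested area on pixels steeper than threshold_deg."""
    slope_deg = np.asarray(slope_deg, dtype=float)
    area_ha = np.asarray(area_ha, dtype=float)
    return 100.0 * area_ha[slope_deg > threshold_deg].sum() / area_ha.sum()


# Hypothetical pixels for one crop: slope (degrees) and harvested area (ha)
slope = np.array([3.0, 9.0, 12.0, 16.0, 22.0])
area = np.array([100.0, 250.0, 300.0, 200.0, 150.0])
print(share_above(slope, area, 15.0))  # 35.0

# Evaluating share_above over a range of thresholds traces a cumulative curve
curve = [share_above(slope, area, t) for t in np.arange(0.0, 25.0, 2.5)]
```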

Figure 4. Distribution of harvested area along the slope gradient (the dashed line in each subplot indicates the cumulative percentage of harvested area on slopes greater than the x-axis value).
We calculated the correlation coefficients between slope and YPA, and between slope and harvested area, at the pixel scale for each of the seven crop species. As shown in Figure 5, all cases show a significant negative relationship: as slope increases, both yield and harvested area tend to decrease in the study region. Of the seven species, slope is most strongly correlated with the YPA of maize (R = −0.31) and least with the YPA of rice (R = −0.05). For harvested area, rapeseed has the strongest negative correlation with slope (R = −0.42), while potato has the weakest (R = −0.10).
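A pixel-scale correlation of this kind can be sketched with SciPy’s `scipy.stats.pearsonr`, dropping pixels where either layer is missing. The synthetic data and the helper name are assumptions for illustration. With thousands of pixels, even a weak correlation such as R = −0.05 can pass a significance test. R itself is therefore the better guide to the strength of the association.

```python
import numpy as np
from scipy import stats


def slope_correlation(slope_deg, value):
    """Pearson R and two-sided p-value between slope and a per-pixel variable.

    Pixels where either layer is missing (NaN) are dropped first.
    """
    slope_deg = np.asarray(slope_deg, dtype=float)
    value = np.asarray(value, dtype=float)
    valid = ~(np.isnan(slope_deg) | np.isnan(value))
    r, p = stats.pearsonr(slope_deg[valid], value[valid])
    return float(r), float(p)


# Synthetic pixels for illustration only: yield declines with slope, plus noise
rng = np.random.default_rng(42)
slope = rng.uniform(0.0, 30.0, 500)
ypa = 6.0 - 0.05 * slope + rng.normal(0.0, 0.2, 500)
slope[:5] = np.nan  # a few pixels without slope data
r, p = slope_correlation(slope, ypa)
print(f"R = {r:.2f}, P = {p:.1e}")
```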

Figure 5. Linear regression between slope and yield per area (YPA; left panel) and total harvested area (right panel) for the seven crop species (all linear correlations pass the significance test at P < 0.001, except for rice YPA against slope, P = 0.03).
3.3 Classification using SOFM
We used SOFM to classify Guizhou Province into regions with high, medium and low levels of crop production for the seven species (Figure 6). Production was measured both as crop YPA and as total crop yield, that is, YPA multiplied by the corresponding harvested area. The regions of high-level YPA are mainly located in the central area of Guizhou Province, which occupies a large proportion of Guizhou and Tongren prefectures, and some of Bijie and Anshun. The prefectures of Qiannan and Qiandongnan have relatively large regions of low-level YPA, especially in the southeast, and some western and southwestern areas of Guizhou are also at a low level. Total crop yield shows spatial traits similar to those of crop YPA. First, most high-level regions are concentrated in the central and northern areas of Guizhou. Second, prefectures such as Qiannan and Qiandongnan also have a large proportion of low-level regions, mainly in the south and southeast, with others in the far east and north. However, whereas crop YPA levels form relatively large spatial clusters, the levels of total crop yield are more heterogeneously distributed, fragmenting the region into small patches of different levels.

Figure 6. Spatial classification of Guizhou Province by level of crop production: (a) crop YPA; (b) total crop yield.
3.4 Simulation of crop yield using three artificial neural networks
Table 2 presents the indices used to evaluate the three ANNs in simulating crop yield of the seven selected species. We randomly divided the thousands of pixels within Guizhou into a training group (75%) and a validation group (25%) and calculated the indices for each group separately. Overall, all three networks performed well, with R exceeding 0.40 and passing the significance test (P < 0.001). However, the networks differ. BP consistently performs best, with R ranging from 0.65 (soybean) to 0.87 (groundnut), while GRNN and RNN are less accurate. In particular, the MRE of GRNN and RNN is relatively large, indicating a greater deviation between forecasted and true values. For example, in simulating rapeseed yield, the MRE of GRNN and RNN is 25.6% and 28.1%, respectively, compared with 17.8% for BP. Accuracy is also crop-specific for all three networks, since yield is easier to simulate for some species than for others; for example, groundnut has the highest R within each network. When the training and validation groups are compared, the training group consistently shows better accuracy in terms of R, RMSE and MRE. This is because the parameters of each network are fitted to the training group rather than the validation group. Finally, runtime indicates the efficiency of each simulation. GRNN has the shortest runtime (less than 1 second in most cases; Table 2) and RNN the longest (over 1 minute in all cases). Although BP took longer than GRNN for each simulation, the difference was small compared with RNN (less than 7 seconds on average; Table 2).
Therefore, based on all the simulation indices, BP offered the best balance between accuracy and temporal efficiency.
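The short runtime of GRNN follows from how it works. A GRNN is equivalent to Gaussian-kernel (Nadaraya–Watson) regression, so "training" only stores the samples, with no iterative weight updates as in BP or RNN. The cost moves to prediction instead, which scales with the number of stored training pixels. A minimal NumPy sketch follows; the spread parameter `sigma` and the class name are assumptions, not the paper’s settings.

```python
import numpy as np


class GRNN:
    """General regression neural network as Gaussian-kernel weighted averaging."""

    def __init__(self, sigma=0.5):
        self.sigma = sigma  # spread (smoothing) parameter; value assumed here

    def fit(self, x, y):
        # "Training" just stores the patterns: no iterative weight optimization
        self.x_ = np.asarray(x, dtype=float)
        self.y_ = np.asarray(y, dtype=float)
        return self

    def predict(self, x_new):
        x_new = np.asarray(x_new, dtype=float)
        # squared Euclidean distance from every query to every stored pattern
        d2 = ((x_new[:, None, :] - self.x_[None, :, :]) ** 2).sum(axis=2)
        # shifting by the row minimum cancels in the ratio but avoids underflow
        k = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * self.sigma ** 2))
        return (k @ self.y_) / k.sum(axis=1)


model = GRNN(sigma=0.5).fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0])
print(model.predict([[1.0], [10.0]]))
```

The only tunable quantity is `sigma`. This is one reason GRNN is quick to set up, but its accuracy depends strongly on that single smoothing choice.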
Table 2. Results of crop yield simulation by the three artificial neural networks.
* All values of R (correlation coefficient) are significant (P < 0.001).
3.5 Factor contribution
Two factor-contribution methods for the BP network (see supplementary material) were used to assess the relative weight of each variable on crop yield, and both gave similar results for the seven selected crop species (Figure 7). Based on the average of the two methods, temperature (16.4%), radiation (15.3%), soil moisture (13.5%), and N (13.5%) and P (12.4%) fertilization made the largest contributions to crop yield among the 13 factors. In contrast, slope, irrigation and the other soil properties have lower mean contributions, ranging from 2.1% (slope) to 6.1% (pH). Compared with N and P fertilizer, K fertilizer has a relatively small impact on crop yield, with an average contribution of 3.3%. Figure 7 also shows some inter-species differences in influencing factors. For example, irrigation has a mean contribution of 12.2% to rice yield, compared with 0% for rapeseed yield, for which no irrigation was recorded.
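The two methods used here are defined as equations (4) and (5) in the supplementary material and are not reproduced in this text. As an illustration only, the sketch below implements Garson’s algorithm, a widely used weight-based importance measure for a BP network with one hidden layer. Whether it matches either of the paper’s methods is not established here, and the function name is hypothetical.

```python
import numpy as np


def garson_importance(w_in_hidden, w_hidden_out):
    """Relative importance (%) of each input in a one-hidden-layer network.

    w_in_hidden: (n_inputs, n_hidden) input-to-hidden weights
    w_hidden_out: (n_hidden,) hidden-to-output weights for a single output
    """
    w_in_hidden = np.asarray(w_in_hidden, dtype=float)
    w_hidden_out = np.asarray(w_hidden_out, dtype=float)
    c = np.abs(w_in_hidden) * np.abs(w_hidden_out)[None, :]  # |w_ij * v_j|
    r = c / c.sum(axis=0, keepdims=True)  # share of each input within each hidden unit
    s = r.sum(axis=1)                     # accumulate shares over hidden units
    return 100.0 * s / s.sum()            # normalize so contributions sum to 100%


print(garson_importance([[1.0, 0.0], [1.0, 2.0]], [1.0, 1.0]))  # [25. 75.]
```

Because the method uses absolute weights, it ranks how strongly each factor drives the output but not the direction of the effect.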

Figure 7. Factor contribution to crop yield for the seven species, calculated with the two methods given as equations (4) and (5) in the supplementary material (labelled (1) and (2) in the figure).
IV Discussion
The Grain for Green Program was first introduced in China in the late 1990s (Song et al., 2015). The programme has focused on restoring ecosystem integrity by allowing low-yielding cropland on slopes greater than 15° to revert to natural vegetation where synthetic nutrient inputs have been withdrawn (Wang et al., 2017; Zhang et al., 2015). However, there has been conflict between conservation and food security, with critics blaming the policy as one of the main causes of the recent surge in grain prices and rising food imports (Xu et al., 2006). Implementing the programme rationally is therefore vitally important for both the environment and stakeholders. In this study, the distribution of cropland along the slope gradient, the relationship between slope and crop yield/area, and the spatial distribution of yield levels can all inform the Grain for Green Program and other land-use policies. When implementing the programme and other land-use policies, we should consider 1) the distribution of sloping cropland, 2) differences in cropland distribution among species, and 3) potential crop yield per unit area in different regions. First, because harvested area is unevenly distributed across prefectures in Guizhou, some prefectures, such as Zunyi and Bijie, will be most affected by the policy. Second, the percentage of cropland on slopes steeper than 15° is greatest for maize, which is also one of the most widely cultivated crops in Guizhou; 30.1% of the maize area would be affected by the policy target. Third, the SOFM results show that the high-level regions of crop YPA and of total crop yield do not strictly coincide.
Thus, retiring some cropland with low yield potential (such as southern Qianxinan) and developing additional cropland with high yield potential (such as eastern Tongren) may bring greater benefits in terms of total crop yield.
In the past, some researchers have used statistical approaches to simulate the spatial distribution of crop YPA (Buchholz et al., 2004; Drummond et al., 1995). Most of these studies were based on field-scale data. Many relied on vegetation parameters such as NDVI (Normalized Difference Vegetation Index) or LAI (Leaf Area Index) as inputs, without adequately considering the influence of environmental factors (Doraiswamy et al., 2004). Compared with previous research, this study included more environmental factors to simulate the spatial patterns of crop yield and examined their effects by evaluating the ANN results. This work brings the critical zone concept into the study of crop yield from a systems-science perspective. It considers multiple elements, from the subsurface (soil moisture and soil properties) to vegetation (crops of seven species) and the atmosphere (meteorological factors), together with the interactions among them. We employed three artificial neural networks (BP, GRNN and RNN) to conduct the simulations and related analyses, building 21 networks (7 species × 3 network types) in total. The interrelationships between crop yield and environmental factors can be very complicated, as meteorological, lithological, soil and land management factors can all have an impact, mostly in non-linear ways (Cassman, 1999; Godfray et al., 2010). ANNs are therefore well suited to this problem, improving both the performance of the simulation and the credibility of the factor contribution analysis. Performance varied among the networks: BP had the best accuracy, while GRNN had the lowest time cost. Although RNN also had acceptable accuracy, it took much longer to train.
Therefore, although ANNs can greatly improve the simulation, care is still needed in choosing the network that best balances accuracy and time cost.
In this study, we focused on the spatial distribution of crop yield and its relationship with environmental factors, rather than on temporal dynamics. From a temporal perspective, changes in meteorological conditions, or climate, can affect crop production through different pathways (Liang et al., 2019, 2020; Poulter et al., 2009; Zhang et al., 2004). For example, daytime warming can increase or decrease net photosynthesis (photosynthesis minus respiration), depending on the temperature relative to the optimum. Warmer nights, however, can raise respiration costs without any potential benefit for photosynthesis (Lobell and Gourdji, 2012). Furthermore, rising temperature, along with higher atmospheric CO₂, may favour the growth and survival of pests and diseases that target crops (Ziska et al., 2011). In addition, the response of crop yield to climate change varies with the spatial distribution of the crop (Leng and Huang, 2017). From a spatial perspective, the factor contribution analysis indicated that annual average temperature and radiation together accounted for 31.7% of the variation in crop yield in the study region. We used soil moisture (13.5%) rather than rainfall, as rainfall may not be a direct driver of vegetation growth (Leuschner and Lendzion, 2009; Singh and Sasahara, 1981). Overall, climatic conditions set the basic environmental background for crop growth, as shown by their significant influence on the spatial patterns of crop yield.
Crops differ from natural vegetation in two respects. First, most crops grow in the topsoil and have no direct contact with the bedrock below. In contrast, natural vegetation, particularly in karst regions, can grow in thin soils that would not typically be cultivated, and even in thicker soils its roots may penetrate fissures in weathered rock (Kosmas et al., 2000; Stehfest and Bouwman, 2006). Previous research has also revealed the importance of bedrock for natural vegetation growth (Jiang et al., 2020; Zhang et al., 2013). Moreover, climate is considered the most important determinant of natural vegetation species and distribution at the global scale. Within a region without obvious climatic differentiation, geomorphic features and geological substrates may drive the spatial heterogeneity of natural vegetation at smaller scales. This influence has been verified worldwide, especially for some lithophytes (Dasti et al., 2013; Moore and Attwell, 1999; Yetemen et al., 2010). Second, crop growth is strongly influenced by human activities such as fertilization, irrigation and ploughing. These management practices directly change the physical and chemical properties of soil, potentially affecting processes deep in the critical zone that are reflected in surface vegetation (García-Orenes et al., 2010; Sanchez et al., 2002; Tugel et al., 2005). For example, irrigation can be managed to offset insufficient precipitation in a specific period or to address climate change impacts, thus reducing the influence of meteorological factors (Da Cunha et al., 2015; Schütze and Schmitz, 2010). The impact of agricultural management (irrigation and fertilization) is also reflected in its combined factor contribution (31.6%).
As the most widely applied fertilizers, nitrogen (N) and phosphate (P) fertilizers had the largest impact, accounting for 13.5% and 12.4%, respectively. This effect is reflected in the marked increase of total phosphorus, potassium and other elements in the soil (Zhang et al., 2007). In contrast, soil properties had less total impact on the spatial variation of crop yield (21.0% on average for the five soil factors combined). Among them, pH had the greatest influence (6.1% on average), as it shapes the edaphic environment by 1) controlling the activity of microorganisms and 2) changing the solubility of metals (e.g. the potential toxicity of Al, Mn and Cd in soils), as well as the base saturation of the soil, which can further restrict root growth (Falkengren-Grerup, 1989; Falkengren-Grerup et al., 1987; Tyler et al., 1987).
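As a consistency check on the percentages quoted in this discussion, the short tally below groups them and sums each group. Placing soil moisture with the climatic factors follows the paragraph on climate but is an editorial assumption, as are the group labels; the slope value (2.1%) is the one reported in the discussion of topography. With these groupings the shares sum to 99.9%, a shortfall that plausibly reflects rounding of the reported values, and N plus P leave 5.7% of the management share for the remaining management factor(s).

```python
# Factor contributions (%) as reported in this discussion. Grouping soil
# moisture with the climatic factors is an editorial assumption.
CONTRIBUTIONS = {
    "climate": {"temperature": 16.4, "radiation": 15.3, "soil moisture": 13.5},
    "management": {"irrigation and fertilization (combined)": 31.6},
    "soil properties": {"five soil factors (combined)": 21.0},
    "topography": {"slope": 2.1},
}


def group_totals(contributions):
    """Sum the reported percentages within each factor group (rounded to 0.1)."""
    return {group: round(sum(parts.values()), 1) for group, parts in contributions.items()}


totals = group_totals(CONTRIBUTIONS)
grand_total = round(sum(totals.values()), 1)

# Temperature and radiation together (the 31.7% quoted for climate).
temp_plus_rad = round(16.4 + 15.3, 1)

# N and P are reported separately (13.5% and 12.4%); the rest of the 31.6%
# management share belongs to the remaining management factor(s).
n_plus_p = round(13.5 + 12.4, 1)
other_management = round(totals["management"] - n_plus_p, 1)

for group, share in totals.items():
    print(f"{group:16s} {share:5.1f}%")
print(f"{'total':16s} {grand_total:5.1f}%")
print(f"temperature + radiation = {temp_plus_rad}%")
print(f"N + P = {n_plus_p}%, other management = {other_management}%")
```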
For natural vegetation, topographical variation has been shown to influence the spatial distribution of species in the karst region of southwest China (Zhang et al., 2010). Several studies have suggested that soil erosion is very severe in the karst areas of southwest China because of the low rate of soil formation from carbonate bedrock, steep slopes, high annual precipitation and poor vegetation cover (Lin and Zhu, 1999; Yan and Cai, 2015). Sloping cropland is widespread in the karst region because of its climatic and geological features. In addition to steep slopes, tillage practices can accelerate soil and nutrient loss in the study region, reducing the water, nutrients and productive capacity available for crop growth (He et al., 1998; Peng and Wang, 2012). Tillage erosion and water erosion are therefore the two main causes of reduced crop yield on slopes: they transfer soil material from upper to lower slope positions, increasing soil depth and nutrients downslope (Zheng-An et al., 2010). However, this influence of topography is modulated by local meteorology. For example, under wet weather conditions the yield difference between low and high slopes is distinctly larger than under dry conditions (Kravchenko et al., 2000). Our results are consistent with this, as the karst region combines a humid climate with steep topography (Chen et al., 2009). Linear regression showed that crop yield decreased linearly with increasing slope for all seven species (P < 0.001). However, the factor contribution analysis following ANN suggested that slope accounts for only 2.1% of the influence on average. This indicates that slope does not act directly on vegetation, but rather through changes in soil water conditions, nutrient content or other soil properties. Therefore, when considering land use, we should not rely on slope as the only index, but should also include other soil variables that have a more direct influence on crop growth in order to achieve the best management practices.
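The per-species slope analysis described above can be sketched as one simple linear regression per crop, with species ranked by how steeply yield falls as slope increases (the basis for saying that maize declines fastest). This is a minimal sketch assuming paired slope (degrees) and yield-per-area values for each species, for example at county level; the function name `slope_trends` is hypothetical, and `scipy.stats.linregress` stands in for whatever regression software the authors used.

```python
import numpy as np
from scipy.stats import linregress


def slope_trends(data):
    """Fit a simple linear regression of yield on cropland slope per species.

    Hypothetical helper. data: {species: (slope_deg, yield_per_area)},
    two equal-length sequences per species.
    Returns {species: (regression_coefficient, p_value)}, ordered from the
    steepest decline (most negative coefficient) to the shallowest.
    """
    results = {}
    for species, (slope_deg, yield_per_area) in data.items():
        x = np.asarray(slope_deg, dtype=float)
        y = np.asarray(yield_per_area, dtype=float)
        # Ordinary least squares; pvalue tests the null hypothesis of zero slope.
        fit = linregress(x, y)
        results[species] = (fit.slope, fit.pvalue)
    # Ascending by coefficient, so the fastest-declining species comes first.
    return dict(sorted(results.items(), key=lambda item: item[1][0]))
```

A highly significant coefficient from this univariate fit is compatible with a small share in the multivariate ANN contribution analysis: the regression captures the total association with slope, much of which may be carried by correlated soil variables.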
In future work using ANNs with remote sensing data, additional optimization could be undertaken. Firstly, during network training we mostly used default values for hyper-parameters such as the number of iterations and hidden nodes rather than optimizing them, which may affect simulation accuracy. However, as the differences were relatively small, we considered the default settings appropriate for this dataset. Secondly, the available remote sensing data impose constraints. The acquisition times and spatial resolutions of the data sources differed, which may introduce uncertainty into comparative analyses; for instance, soil property data were available for 1995, whereas the other datasets used in this study were more recent. In addition, the unrestricted use of inorganic fertilizers and the unique environmental conditions of karst soils have substantially changed soil mineralogy over the past 30 years (Richardson and Kumar, 2017), which could affect the comparability of the soil property data. Lastly, the crop yield data could also be influenced by breeding improvements. The introduction of hybrid maize has markedly increased yields over recent decades (Bai et al., 2007; Ping et al., 2007). However, the adoption of improved varieties varies greatly among counties within the province, and given the available data this influence of varietal change could not be accounted for.
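The first optimization point, tuning the number of hidden nodes and iterations instead of using defaults, could be approached with a cross-validated grid search. The sketch below assumes scikit-learn's `MLPRegressor` (a multilayer perceptron trained by back-propagation, using the Adam optimizer by default) as a stand-in for the BP network; the software the authors used is not stated here, and the function name `tune_bp_network` and the grid values are illustrative assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def tune_bp_network(X, y, hidden_sizes=(4, 8, 16), max_iters=(200, 1000), cv=5, seed=0):
    """Cross-validated search over hidden-node count and iteration budget.

    Hypothetical helper.
    X: (n_samples, n_factors) explanatory factors; y: (n_samples,) crop yield.
    Returns the best parameter combination and its mean cross-validated R^2.
    """
    # Scaling sits inside the pipeline so it is refit on each training fold.
    model = make_pipeline(StandardScaler(), MLPRegressor(random_state=seed))
    param_grid = {
        # A single hidden layer with h nodes.
        "mlpregressor__hidden_layer_sizes": [(h,) for h in hidden_sizes],
        "mlpregressor__max_iter": list(max_iters),
    }
    search = GridSearchCV(model, param_grid, cv=cv, scoring="r2")
    search.fit(np.asarray(X, dtype=float), np.asarray(y, dtype=float))
    return search.best_params_, search.best_score_
```

Comparing the best score with that of scikit-learn's defaults (`hidden_layer_sizes=(100,)`, `max_iter=200`) would quantify how small the difference actually is. Because neighbouring spatial units tend to be similar, randomly assigned folds can give optimistic scores; spatially blocked folds would be a more cautious choice for gridded or county-level data.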
V Conclusions
The karst region of southwest China has experienced rapid population and economic growth, producing many competing demands on the soil and water resources that support livelihoods and ensure food security. In this study, we used four kinds of ANN to analyse and simulate the spatial patterns of crop yield and their relationships with meteorological factors, soil properties, irrigation and fertilization across Guizhou Province. We draw the following conclusions. First, crop yield is distinctly negatively related to cropland slope. Among all species, maize yield decreases fastest with increasing slope; because maize, a staple crop, also has the largest proportion of its cropland on slopes over 15°, this should be considered when implementing the Grain for Green Program. Second, the spatial distribution of crop yield in Guizhou Province is uneven. Most high-yield regions are located in central-north Guizhou, although some regions with high yield per area do not coincide spatially with those of high total crop yield. Third, all crop-specific ANNs showed significant correlations between simulated and true crop yield; among them, the BP network performed best, balancing accuracy and runtime. Fourth, the factor contribution analysis showed that temperature, radiation, soil moisture, and N and P fertilizers have the greatest impact on the yield of the seven selected species.
By combining analysis of processes in the critical zone (from the belowground environment to vegetation and the atmosphere) with ANN modelling, this study improves the prospects for parameterizing and refining other models that simulate crop growth in karst regions with high accuracy and credibility. It can also help develop informed decision-support tools to guide both regional land-use decisions and local farming practices, enhancing crop productivity and delivering wider societal benefits through farming that is more efficient, less polluting and more sustainable for food, land and water.
Supplemental material
Supplementary material for "Analysing and simulating spatial patterns of crop yield in Guizhou Province based on artificial neural networks" by Boyi Liang, Hongyan Liu, Timothy A Quine, Xiaoqiu Chen, Paul D Hallett, Elizabeth L Cressey, Xinrong Zhu, Jing Cao, Shunhua Yang, Lu Wu and Iain P Hartley in Progress in Physical Geography: Earth and Environment
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by the National Natural Science Foundation of China under grant no. 41571130044 and by a scholarship from the China Scholarship Council. The UK team was supported by the Natural Environment Research Council MIDST-CZO project (NE/S009167/1, NE/S009175/1).
Supplemental material
Supplemental material for this article is available online.
References
