Ensemble machine learning methods for spatio-temporal data analysis of plant and ratoon sugarcane

Abstract

Recent advances in information technology and statistical techniques have enabled sophisticated and reliable analysis based on machine learning methods. A number of machine learning tools may be exploited for classification and regression problems. These tools and techniques can be used effectively for highly data-intensive operations such as agricultural and meteorological applications, bioinformatics, and stock market analysis based on daily market prices. Machine learning ensemble methods, namely Decision Tree (C5.0), Classification and Regression Tree (CART), Gradient Boosting Machine (GBM) and Random Forest (RF), have been investigated in the proposed work. The proposed work demonstrates that temporal variations in the spectral data, together with the computational efficiency of machine learning methods, may be used effectively to discriminate between types of sugarcane. The discrimination has been treated as a binary classification problem that separates ratoon from plantation sugarcane. Variable importance measures based on Mean Decrease in Accuracy (MDA) and Mean Decrease in Gini (MDG) have been used to create an appropriate dataset for the classification. The binary classification model based on RF performs best for all the tested combinations of input images. Feature selection based on the MDA and MDG measures of RF is also important for dimensionality reduction. The RF model performed best, with 97% accuracy, whereas the GBM method performed worst. Binary classification based on remotely sensed data can be handled effectively using the random forest method.

Keywords

Random forest, CART, GBM, KNN, NDVI, Landsat-8

1. Introduction

Timely and accurate agricultural statistics are essential to support decision making related to crop production, to meet future food needs and to ensure food security [1]. Remotely sensed data may assist policy makers in extracting agricultural information effectively [2]. The crop information so generated may be coupled with sophisticated statistical techniques for effective and optimized decision making. Sugarcane is one of the important cash crops, and India is the second largest producer of sugarcane after Brazil.

Sugarcane regrown from the buds remaining on the stubble of plantation sugarcane is known as ratoon sugarcane. Ratooning has been adopted as a frequent farming practice all over the world, and it covers approximately half of the entire area under sugarcane [3]. Despite its lower yield, the authors of [3] identified various benefits of ratoon sugarcane, such as early maturity, better transport management, and economic advantages. The authors of [4] explored the impact of the ratoon crop on various biometric parameters such as plant height, stalk diameter and number of tillers. In addition to the biometric characteristics, they examined the effect of ratooning on physiological parameters such as Leaf Area Index (LAI) and leaf weight. LAI is a crucial parameter for analysing photosynthesis and related activities during the growth season of a crop.

The growth and yield of ratoon sugarcane have a strong relationship with the Number of Millable Canes (NMC), stalk height and cane weight [5]. The growth rate of ratoon sugarcane is high during the initial stages of the season, but the growth of plant sugarcane shoots up during the grand growth phase. This distinct variation in growth may be captured for different kinds of analytical studies. The growth rate of sugarcane starts declining around 160 days after planting, or after the previous harvest in the case of ratoon sugarcane [3]. Consequently, ratoon sugarcane becomes available for crushing earlier than planted sugarcane. Sugarcane mills in the area purchase the sugarcane from the farmers for crushing. Ratoon sugarcane therefore plays a significant role in sugarcane production. An increasing number of stakeholders prefer ratoon sugarcane because of its economic benefits as well as the time saved in planting and other operations [6]. On the other hand, the yield decline in successive ratoons discourages farmers from extensive ratooning. Hence, there is a need to analyze and explore the growth patterns of ratoon and plant sugarcane.

The growth status of a plant is associated with the energy absorbed by its leaves and its later conversion to other forms of energy by photosynthesis. For that reason, knowledge of the variations in LAI is highly significant for the design of classification, growth and production assessment models [7]. The amount of water or moisture also has a significant role in the development of leaves in a plant. Hence it is essential to consider irrigation scheduling during the development of growth models. Moreover, LAI has a strong relationship with the spectral information obtained from satellite images.

The information obtained from LAI and other related parameters is important for understanding the growth pattern in the temporal domain. The modelling accuracy of a multi-temporal analysis is a function of the number and timing of the participating spectral images [8]. The different images over the entire growth season may be stacked to generate a temporal profile for studying the behaviour of the crop in the study area. The accuracy of a classification model may be significantly enhanced by the use of multi-spectral and multi-temporal remotely sensed data [9]. The study [10] demonstrated that the behaviour of a long-duration crop is commonly sigmoidal: it starts with slow growth in the initial stage, reaches a peak at the middle stage and starts decreasing after attaining the peak. The temporal-profile crop growth model based on Landsat data is shown in Fig. 1 [11, 12]. The function for the crop growth model is given by:

$\displaystyle G(t)=G_{o}+(G_{m}-G_{o})(2\beta e/\alpha)^{\alpha/2}(t-t_{o})^{2}\exp[-\beta(t-t_{o})^{2}]$ (1)

where $G_{o}$ is the greenness value of the soil at time $t_{o}$, $G_{m}$ is the greenness at the peak time $t_{p}$ of the growth season, and $\alpha$ and $\beta$ are crop-specific constants. Eq. (1) has two inflection points, $t_{1}$ and $t_{2}$, corresponding to the onset and offset of greenness respectively. The difference between these two points is denoted by $\sigma$ and is given by $\sqrt{1/\beta}$. The width of the temporal profile $\sigma$, the time of maximum greenness $t_{p}$ and $G_{m}$ contain most of the information needed by the growth model [13].

Figure 1.

Crop growth curve [12].
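As an illustration, Eq. (1) can be evaluated directly. The following is a minimal Python sketch that implements the equation as printed; the parameter values, the function name `greenness` and the clamping to $G_{o}$ before $t_{o}$ are assumptions made for the example and are not taken from the study. With $\alpha=2$ the printed form reaches exactly $G_{m}$ at $t_{o}+\sqrt{1/\beta}$, which the sketch uses as a sanity check.

```python
import math

def greenness(t, g0, gm, t0, alpha, beta):
    """Evaluate Eq. (1) as printed for a time t given in days."""
    if t <= t0:
        # Assumption: before emergence only the soil background is observed.
        return g0
    dt = t - t0
    scale = (2.0 * beta * math.e / alpha) ** (alpha / 2.0)
    return g0 + (gm - g0) * scale * dt ** 2 * math.exp(-beta * dt ** 2)

# Illustrative (assumed) parameter values, not taken from the study.
G0, GM, T0, ALPHA, BETA = 0.15, 0.80, 40.0, 2.0, 1.0 / 90.0 ** 2

# Stationary point of (t - t0)^2 * exp(-beta * (t - t0)^2).
t_peak = T0 + math.sqrt(1.0 / BETA)

# Sample the profile every 16 days, the Landsat 8 revisit interval.
profile = [greenness(day, G0, GM, T0, ALPHA, BETA) for day in range(0, 366, 16)]
```

Sampling at a 16-day step mirrors the revisit interval of the imagery, so `profile` has the same temporal density as a cloud-free Landsat 8 time series would.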

Recent advances in information technology and statistical techniques have enabled sophisticated and reliable analysis. A number of machine learning tools may be exploited for classification and regression problems. The purpose of this study is to design and develop a binary classification model to discriminate ratoon from plantation sugarcane. However, the selection of appropriate input to the machine learning model is a major concern. Besides the design and development of the discrimination model, the study also focuses on the optimization of the input data as well as the model parameters.

2. Related work

The application of satellite sensors, machine learning methods, digital imagery and relevant geoinformatics tools to explore agricultural information has been found to be more accurate, robust and economical [14].

Multi-temporal spectral data have been used to discriminate ratoon from plantation sugarcane in the Muzaffarnagar and Meerut districts of Uttar Pradesh, India [15]. The results of the study indicated that a single remote sensing image was not adequate for the discrimination. The study explored the imagery in the temporal domain to generate the rules for the classification. In addition to the temporal images, the study also incorporated ancillary data to build an expert knowledge base and achieve higher discrimination accuracy. Decision trees play an important role in the multi-temporal discrimination of ratoon from plantation sugarcane [16].

Another experimental work was conducted at the Sugarcane Breeding Institute (SBI), Coimbatore, India, to demonstrate the relationship between ratoon and plant sugarcane. Tiller production was recorded as maximum during the initial stage, i.e., around 90–100 days after the previous harvest, for both plant and ratoon crops. It was, however, somewhat lower in the successive ratoons. Different parameters such as LAI, stalk height, cane diameter and leaf size exhibit different behaviour at various stages for ratoon and plant sugarcane. These differences are much smaller in the initial stages than in the later stages of the growth period. These variations can be used for the discrimination of ratoon and plant sugarcane [3]. An experimental work [17] was carried out in Lucknow, India, to demonstrate the variations of different parameters related to ratoon and plant sugarcane.

The discrimination of specific crops may be fruitful for the prediction of crop yield, the estimation of area under cultivation, and the monitoring of crop growth [18, 19]. The temporal profile of the Normalized Difference Vegetation Index (NDVI) has been used as an efficient and reliable indicator to discriminate a specific crop at the field scale or the global scale [20]. Recent advancements in information technology have led to various machine learning models. These models are flexible enough to integrate ground truth information with the classification and regression process [21]. Various methods such as Random Forest (RF), Decision Tree (DT), Classification and Regression Tree (CART) [22], Support Vector Machine (SVM) [23] and Artificial Neural Network (ANN) have been explored in recent studies related to the remote sensing of agriculture. The researchers in [24] presented a comprehensive review of the research related to the applicability of machine learning models in agriculture. The review also suggested the integration of remote sensing datasets and ancillary data into the machine learning models, which may lead to artificial intelligence and knowledge-based agriculture. In addition, various ensemble machine learning methods such as bagging and boosting have been developed in the recent past for classification. A robust classification system based on a combination of classifiers may significantly increase the classification accuracy [25]. These methods further enhance the prediction and classification accuracy through proper selection of the training data [26, 27]. The proposed work employs random forest bagging and gradient boosting for the multi-temporal discrimination of a specific plantation.

A multi-temporal profile of satellite data often generates a large number of features or predictors [28]. A high number of features may lead to poor performance of the classifier or regressor due to the complex computations [29]. The performance of a machine learning model may be regulated by selecting the most appropriate predictors as input variables. This selection may be termed dimensionality reduction [30]. Dollar et al. [31] proposed a modelling framework for the automatic extraction and mining of features to improve the overall accuracy. The same work also suggested learning important features from the input data itself. A method based on RF has been used in the past for the optimized selection of features for multi-temporal crop classification [32]. Shuai et al. [33] discussed the importance of extracting discriminative features from social network observations to precisely detect potential cases of Social Network Mental Disorders (SNMD). The CART feature selection method may also be used effectively for the multi-temporal classification of remotely sensed data [34]. Hence, the present work is intended to make optimized use of the recent expansion of satellite-based remote sensing and machine learning tools for the acquisition of consistent and near real-time crop information. The obtained information may be associated with the growth status of the plants in the spatial as well as the temporal domain.

3. Modelling framework

The proposed research is focused on the discrimination of ratoon sugarcane from plant sugarcane in the study area. Ensemble machine learning methods have been explored for the binary classification process. Prior to the discrimination, an optimised selection of important variables has been carried out to enhance the performance of the underlying model. For the study, data and information from multiple sources have been fused to synthesize the analysis process. The design of a machine learning model starts with an appropriate selection of input data. Adequate and representative data collection is the backbone of the overall model development [35]. Unnecessary data may lead to redundancy, whereas too little data leads to the loss of information that is important for guiding the model.

3.1 Data acquisition

3.1.1 Reference data

Ground truth data have been obtained at regular intervals to gather information related to the crop calendar of the study area. The entire crop calendar for sugarcane is covered in four stages. These stages are (i) GS1 – Germination, (ii) GS2 – Tillering, (iii) GS3 – Grand growth period and (iv) GS4 – Ripening stage. The duration of each stage is shown in Fig. 2. To assess the possibilities of binary classification, agricultural data have been collected from farmers and the sugarcane industry. The boundary of each sugarcane field has been recorded with a hand-held GPS device. The information about ratoon and plantation sugarcane has been collected separately. Additional ancillary information about the biometric and biophysical parameters has been collected from the sugarcane mills and farmers of the area.

Figure 2.

Growth stages of the sugarcane.

Figure 3.

Biophysical and biometric parameters.

3.1.2 Biometric and biophysical parameters

Separate measurements of sugarcane ratoon and plantation fields have been recorded to perform the analysis based on LAI, stalk height and stalk diameter. All the required parameters of the sugarcane have been collected during regular visits to the experimental areas. The visits have been planned according to the crop calendar of the study area, as well as the availability of the satellite data. These parameters have been normalized to index values from 0 to 1 for detailed analysis and are shown in Fig. 3. It has been observed from the graph that there was a significant difference in the growth of ratoon and plant sugarcane. The differences were quite visible during the germination stage (GS1) and the tillering stage (GS2). Consequently, these variations may act as an important input to the binary classification model. However, the acquisition of these metrics is hindered by bias, time lag and inaccuracy.

3.1.3 Remote sensing data

Landsat 8 satellite imagery from the years 2015 to 2019 has been acquired for the proposed study. The band details of the Landsat 8 Operational Land Imager (OLI) and Thermal Infrared Sensor (TIRS) are presented in Table 1. The spatial resolution of these satellite images is 30 m, whereas the temporal resolution is 16 days.

The Digital Number (DN) obtained from the satellite images has been converted to the corresponding radiance. Further, these radiance values have been converted to the corresponding reflectance values. The details of the available satellite data for the year 2015 are given in Table 2 [36]. These satellite images after the preprocessing operations are further used to generate the vegetation indices images. Open-source software, QGIS has been used to obtain the temporal profile of different vegetation indices.
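The conversion step can be illustrated with a short Python sketch. It uses the direct DN-to-reflectance rescaling documented by USGS for Landsat 8 Level-1 products (a multiplicative and an additive coefficient followed by a sun-elevation correction) rather than the two-step radiance route described above, so it is an approximation of the workflow and not the study's exact procedure. The default coefficients are the values commonly found in Landsat 8 metadata files, and the sun elevation is an assumed value; both should be read from the scene's own metadata.

```python
import math

def dn_to_toa_reflectance(dn, mult=2.0e-5, add=-0.1, sun_elevation_deg=60.0):
    """Convert a Landsat 8 OLI digital number to top-of-atmosphere reflectance.

    mult and add stand for the per-band reflectance rescaling coefficients
    from the scene metadata; the defaults here are assumptions.
    """
    # Rescale the quantized DN to reflectance without solar-angle correction.
    rho_prime = mult * dn + add
    # Correct for the local sun elevation angle at acquisition time.
    return rho_prime / math.sin(math.radians(sun_elevation_deg))

# Example: a single pixel value converted with the assumed coefficients.
reflectance = dn_to_toa_reflectance(10000)
```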

Table 1
Landsat 8 bands

No.	Designation	Spatial resolution ( $m$ )	Spectral range ( $\mu m$ )
Band 1	Coastal/Aerosol	30	0.435–0.451
Band 2	Blue	30	0.452–0.512
Band 3	Green	30	0.533–0.590
Band 4	Red	30	0.636–0.673
Band 5	NIR	30	0.851–0.879
Band 6	SWIR-I	30	1.566–1.651
Band 7	SWIR-II	30	2.107–2.294
Band 8	Pan	15	0.503–0.676
Band 9	Cirrus	30	1.363–1.384
Band 10	TIR-I	100	10.60–11.19
Band 11	TIR-II	100	11.50–12.51

Table 2

Details of the Landsat dataset used in the study

Date	DOY	Cloud cover (%)	Sun angle (°)
Feb 12, 2015	43	14.8	146.8982601
April 01, 2015	91	32.9	134.9770484
April 17, 2015	107	2.48	129.05180483
May 03, 2015	123	2.7	121.86937466
May 19, 2015	139	3.37	113.71697489
June 04, 2015	155	38.07	106.84325657
August 23, 2015	235	42.47	126.65193402
September 08, 2015	251	1.16	136.0425281
September 24, 2015	267	19.46	144.27069689
October 10, 2015	283	4.28	150.69751018
October 26, 2015	299	39.61	155.09744127
November 11, 2015	315	2.54	157.5269078

3.1.4 Spectral vegetation indices

Vegetation indices may act as an important tool in qualitative and quantitative measures of growth parameters, classification and regression problems of remote sensing in agriculture [37, 38, 39, 40]. A significant number of indices such as NDVI, Soil Adjusted Vegetation Index (SAVI), Ratio Vegetation Index (RVI) have been proposed and investigated by the researchers in the past. Diverse studies related to the review and characteristics of these indices exist in the literature [41, 37]. NDVI images of different time periods have been shown in Fig. 4.

Figure 4.

NDVI images.

The multispectral vegetation indices and the specific spectral bands have been extracted from the selected polygons. The selection of these indices and bands depends upon their ability to discriminate the classes. After the extraction of the indices, the random forest model has been used to identify the most appropriate variables for further analysis. The descriptions and mathematical expressions of the indices are given in Table 3.

Table 3

Spectral vegetation indices

No.	Index	Formula	Reference
1	RVI	$\frac{\textit{NIR\_ref}}{\textit{R\_ref}}$	[42]
2	NDVI	$\frac{\textit{NIR\_ref}-\textit{R\_ref}}{\textit{NIR\_ref}+\textit{R\_ref}}$	[43]
3	SAVI	$\frac{(\textit{NIR\_ref}-\textit{R\_ref})(1+L)}{\textit{NIR\_ref}+\textit{R\_ref}+L}$	[44]
4	GNDVI	$\frac{\textit{NIR\_ref}-\textit{G\_ref}}{\textit{NIR\_ref}+\textit{G\_ref}}$	[45]
5	OSAVI	$\frac{(\textit{NIR\_ref}-\textit{R\_ref})(1+L)}{\textit{NIR\_ref}+\textit{R\_ref}+0.16}$	[46]
6	DVI	$\textit{NIR\_ref}-\textit{R\_ref}$	[47]
7	ARVI	$\frac{\textit{NIR\_ref}-((2\textit{R\_ref})-\textit{B\_ref})}{\textit{NIR\_ref}+((2\textit{R\_ref})-\textit{B\_ref})}$	[48]
8	GCI	$\frac{\textit{NIR\_ref}}{\textit{G\_ref}}-1$	[49, 45]
9	EVI	$\frac{G(\textit{NIR\_ref}-\textit{R\_ref})}{\textit{NIR\_ref}+C1(\textit{R\_ref})-C2(\textit{B\_ref})+L}$	[50, 51]
10	VARI	$\frac{\textit{G\_ref}-\textit{R\_ref}}{\textit{G\_ref}+\textit{R\_ref}-\textit{B\_ref}}$	[52]
11	NDWI	$\frac{\textit{G\_ref}-\textit{NIR\_ref}}{\textit{G\_ref}+\textit{NIR\_ref}}$	[53]
12	NDMI	$\frac{\textit{NIR\_ref}-\textit{SWIR\_ref}}{\textit{NIR\_ref}+\textit{SWIR\_ref}}$	[54]
13	NR	$\frac{\textit{R\_ref}}{\textit{NIR\_ref}+\textit{R\_ref}+\textit{G\_ref}}$	[55]
14	NG	$\frac{\textit{G\_ref}}{\textit{NIR\_ref}+\textit{R\_ref}+\textit{G\_ref}}$	[55]
15	NN	$\frac{\textit{NIR\_ref}}{\textit{NIR\_ref}+\textit{R\_ref}+\textit{G\_ref}}$	[55]
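A few of the indices in Table 3 can be computed from band reflectances as in the following minimal Python sketch. The function names and the sample pixel values are illustrative assumptions; the soil adjustment factor `L = 0.5` is the commonly used SAVI default and is not stated in the text, and NDMI is written in the standard normalized-difference form.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)

def savi(nir, red, L=0.5):
    """Soil Adjusted Vegetation Index; L = 0.5 is an assumed default."""
    return (nir - red) * (1.0 + L) / (nir + red + L)

def gndvi(nir, green):
    """Green NDVI."""
    return (nir - green) / (nir + green)

def ndmi(nir, swir):
    """Normalized Difference Moisture Index."""
    return (nir - swir) / (nir + swir)

def normalized_bands(nir, red, green):
    """Return the NR, NG and NN fractions, which sum to one."""
    total = nir + red + green
    return red / total, green / total, nir / total

# Hypothetical reflectances for one vegetated pixel.
pixel = {"nir": 0.45, "red": 0.08, "green": 0.10, "swir": 0.20}
features = {
    "NDVI": ndvi(pixel["nir"], pixel["red"]),
    "SAVI": savi(pixel["nir"], pixel["red"]),
    "GNDVI": gndvi(pixel["nir"], pixel["green"]),
    "NDMI": ndmi(pixel["nir"], pixel["swir"]),
}
```

Applied per pixel and per acquisition date, such functions yield the index layers that are stacked with the spectral bands as predictors.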

3.2 Temporal profile of vegetation indices

A review of the available literature revealed that multi-temporal information from remotely sensed data is central to classification and regression [56, 57, 58]. The research work [59] explored the noticeable NDVI distinction between vegetated and non-vegetated areas and demonstrated the superior capability of Landsat-8 NDVI for agricultural applications such as crop growth monitoring. Nevertheless, a satellite-derived NDVI temporal profile may be attenuated by a variety of factors, including clouds, snow, geometric errors and other atmospheric effects. In most cases, these errors and attenuations reduce the consistency, robustness and applicability of the spectral data, especially for agricultural applications [60, 61]. A study [62] investigated the Savitzky-Golay filter, the Fourier transform and the wavelet transform for the reconstruction of time-series satellite data.

The generated temporal NDVI profile of the few agricultural fields with normal crop growth trend and containing both types of sugarcane in the study area for the year 2015 is shown in Fig. 5. The visual analysis of Fig. 5, in conjunction with general crop growth information pertaining to the study area, confirms the belief that some kind of noise is present in the computed values. An attempt has been made in the present work to lower the effect of attenuations for the effective spatial as well as temporal analysis.

Figure 5.

NDVI temporal profile of sugarcane fields (P1–P5).
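One of the reconstruction approaches cited above, the Savitzky-Golay filter, can be sketched in a few lines of Python. The text does not state which filter was applied in this study, so the choice of a 5-point quadratic window, the decision to leave the two end points on each side unchanged, and the sample NDVI values are all assumptions made for illustration.

```python
# Standard 5-point quadratic Savitzky-Golay smoothing weights (divided by 35).
SG5 = (-3.0, 12.0, 17.0, 12.0, -3.0)

def savgol5(values):
    """Smooth a series with a 5-point quadratic Savitzky-Golay filter.

    Assumption: the first and last two samples are returned unchanged
    because a full window is not available there.
    """
    out = list(values)
    for i in range(2, len(values) - 2):
        window = values[i - 2:i + 3]
        out[i] = sum(c * v for c, v in zip(SG5, window)) / 35.0
    return out

# Hypothetical NDVI profile with a cloud-induced dip at the fourth date.
ndvi_series = [0.18, 0.22, 0.31, 0.12, 0.48, 0.61, 0.66, 0.64, 0.52, 0.41]
smoothed = savgol5(ndvi_series)
```

The filter raises the depressed fourth value towards its neighbours while leaving a smooth quadratic trend untouched, which is the behaviour wanted when noise appears as isolated drops in the profile.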

3.3 Random forest selection of predictors

The accuracy of binary classification may be enhanced, though not necessarily, by combining multiple spectral bands and indices for agricultural applications [63]. The list of spectral bands (6), vegetation indices (15) and biophysical as well as biometric parameters (4) used as predictors (24) is presented in Table 4. However, efficiency starts to fall once the number of predictors exceeds a specific threshold.

Table 4
Predictors used for the modelling in the study

Sr. no.	Category	Specific variables used in the study
1	Spectral bands	Blue Band (B), Green Band (G), Red Band (R), Near Infrared (NIR),
		Shortwave infrared-1 (SWIR1), Shortwave Infrared-2 (SWIR2)
2	Spectral indices	ARVI, DVI, EVI, GCI, GNDVI, LAI, NDMI, NDVI, NDWI, NG, NN,
		NR, OSAVI, RVI, SAVI, VARI
3	Biophysical and biometric parameters	LAI, Diameter, Height

The selection of such optimal parameters, which provide storage economy as well as maximized accuracy, is called “feature selection” or dimensionality reduction [30]. The Random Forest machine learning method has the potential to handle both classification and regression problems [22]. The research work [64] explored the application of random forest in multi-temporal classification. In addition to classification, RF may be significantly useful for the optimized selection of predictors. RF estimates the importance of a variable by determining the amount by which the prediction error increases when the out-of-bag (OOB) data for that variable are permuted while all other data items are left unchanged. Two critical parameters, ntree and mtry, control the performance and the complexity of models based on Random Forest. The parameter ntree sets the number of bootstrap samples drawn from the original dataset, and each bootstrap sample is used to grow a separate tree. Within each tree, only a random selection of mtry predictor variables is considered at each split [65]. RF has also been used to rank the variables on the basis of two measures known as Mean Decrease Gini (MDG) and Mean Decrease Accuracy (MDA). These measures act as filters to remove unnecessary and unimportant variables. The MDA measure assumes that a variable which does not affect the predictive accuracy of the underlying model when permuted may not be important enough to participate in the further classification or regression process [66]. MDG ranks the variables by how much the Gini impurity decreases when that variable is selected for the split at a node. The R package “randomForest” has been used to implement the concept of MDA and MDG (source: http://cran.r-project.org/).
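The permutation idea behind MDA can be sketched as follows in Python. This is a simplified illustration and not the randomForest package's implementation: it permutes one column of a held-out toy dataset and measures the drop in accuracy of a fixed rule-based classifier, whereas the package performs the permutation on the OOB samples of every tree. The data, the `predict` rule and all names are hypothetical.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows whose predicted label matches the reference label."""
    hits = sum(1 for row, y in zip(rows, labels) if predict(row) == y)
    return hits / len(rows)

def permutation_importance(predict, rows, labels, column, repeats=20, seed=0):
    """Mean decrease in accuracy after shuffling one predictor column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(repeats):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        # Replace only the chosen column; all other values stay in place.
        permuted = [row[:column] + [v] + row[column + 1:]
                    for row, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / repeats

# Toy data: column 0 is an informative NDVI-like value, column 1 is noise.
rng = random.Random(42)
rows, labels = [], []
for i in range(200):
    label = i % 2  # 1 = ratoon, 0 = plant (hypothetical coding)
    rows.append([0.3 + 0.3 * label + rng.uniform(-0.05, 0.05), rng.random()])
    labels.append(label)

def predict(row):
    """Stand-in for a fitted model: a fixed threshold on column 0."""
    return 1 if row[0] > 0.45 else 0

importance_ndvi = permutation_importance(predict, rows, labels, column=0)
importance_noise = permutation_importance(predict, rows, labels, column=1)
```

A predictor whose permutation leaves the accuracy unchanged, like the noise column here, is a candidate for removal, which is how the MDA ranking acts as a filter.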

3.4 Machine learning classification

Recent advancements in information technology and in computing techniques such as high-performance computing, grid computing and cloud computing, together with algorithms based on machine learning, have allowed researchers and data scientists to build models that extract reliable and accurate information. These tools and techniques can be used effectively for highly data-intensive operations such as agricultural and meteorological applications, bioinformatics, and stock market analysis based on daily market prices [24].

Various implementations of machine learning algorithms have allowed researchers to explore multiple ensemble methods. Ensemble methods combine several predictive models to deliver higher accuracy. Bagging (Bootstrap Aggregation) and Boosting are the two most commonly used ensemble algorithms. Bagging reduces variance by training on many random bootstrapped samples rather than on a single small collection of samples. Random Forest is another model based on the concept of bagging. Random Forest works by considering only a random subset of the variables when selecting the split at each node during classification or regression. The size of this subset may be assigned as a parameter (mtry) to the random forest. By default, most implementations of RF set this parameter to $p/3$ ( $p$ is the number of predictors) for regression problems, whereas the value for classification problems is $\sqrt{p}$ .
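The default mtry rule and the bootstrap sampling used by bagging can be illustrated with a small Python sketch. The rounding (floor, with a minimum of one) follows the convention of the R randomForest package; the helper names and the sample size of 100 are assumptions, while $p=24$ is the number of predictors reported for this study.

```python
import math
import random

def default_mtry(p, task="classification"):
    """Default number of candidate predictors per split for p predictors."""
    if task == "classification":
        return max(int(math.floor(math.sqrt(p))), 1)
    return max(p // 3, 1)  # regression

def bootstrap_sample(n, rng):
    """Draw n row indices with replacement; unused rows are out-of-bag."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag

rng = random.Random(1)
in_bag, oob = bootstrap_sample(100, rng)

# Candidate predictors considered at one split for the 24 predictors.
split_candidates = rng.sample(range(24), default_mtry(24))
```

With 24 predictors the defaults are 4 candidates per split for classification and 8 for regression; the rows in `oob` are the ones available for the OOB error estimate of that tree.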

The ensemble algorithm Boosting improves the performance of the classifier in each iteration. Models are added iteratively to the weak classifiers until the desired accuracy is achieved. AdaBoost and Gradient Boosting Machine (GBM) are the most commonly used boosting methods. Decision Tree (C5.0), CART, GBM and RF have been investigated in the proposed work.

3.5 Performance evaluation metrics

Performance evaluation of binary classification methods may be handled effectively with Accuracy, Cohen’s Kappa Coefficient, Logarithmic Loss and the Area under the Receiver Operating Characteristic (ROC) Curve. Other useful metrics are Precision, Recall (or Sensitivity), Specificity and F-Measure [67]. These latter metrics, together with Accuracy and Kappa, may be derived from a single matrix known as the “Confusion Matrix” or “Contingency Matrix”. The concept diagram for the confusion matrix is shown in Fig. 6. The four quadrants of the matrix are:

•
Quadrant 1: True Positive (TP)
•
Quadrant 2: False Positive (FP)
•
Quadrant 3: True Negative (TN)
•
Quadrant 4: False Negative (FN)

The quadrant TP represents the observations predicted as positive that are actually positive. FP represents the observations predicted as positive that are actually negative. TN represents the observations predicted as negative that are actually negative. FN represents the observations predicted as negative whose actual status is positive. All other parameters may be derived from these four matrix entries.

Figure 6.
Confusion matrix.

3.5.1 Accuracy

The proportion of correctly discriminated observations among the given records is known as “Accuracy”. Accuracy may be expressed either as a percentage or on a scale from 0 to 1. This metric describes the capability and reliability of the underlying model in detecting the negative and positive classes. It may be expressed as:

$\displaystyle\mathrm{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}$ (2)

The F1 score, generated from precision and recall may be expressed as:

$\displaystyle\mathrm{F1}=2\times\frac{\textit{Precision}\times\textit{Recall}}{\textit{Precision}+\textit{Recall}}$ (3)
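Eqs. (2) and (3) can be checked on a small example with the following Python sketch. The labels, the choice of ratoon as the positive class and the function names are assumptions made for illustration only.

```python
def confusion_counts(actual, predicted, positive="ratoon"):
    """Count TP, FP, TN and FN for a binary problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def classification_metrics(tp, fp, tn, fn):
    """Accuracy (Eq. 2), precision, recall and F1 (Eq. 3)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical reference and predicted labels for eight fields.
actual = ["ratoon", "ratoon", "ratoon", "plant",
          "plant", "plant", "plant", "ratoon"]
predicted = ["ratoon", "ratoon", "plant", "plant",
             "plant", "ratoon", "plant", "ratoon"]
counts = confusion_counts(actual, predicted)
scores = classification_metrics(*counts)
```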

3.5.2 Cohen’s Kappa coefficient

Cohen’s Kappa coefficient [68], or Kappa, is similar to the overall accuracy but is normalized against the baseline of random chance on the dataset. Kappa is a score that represents the level of agreement between two annotators on a binary classification problem; here the two annotators are the classifier and the reference labels. Mathematically, it is defined as:

$\displaystyle\textit{Kappa Score}=\frac{p_{0}-p_{e}}{1-p_{e}}$ (4)

where $p_{0}$ is the empirical probability of agreement on the label assigned to any sample (the observed agreement ratio), and $p_{e}$ is the expected agreement when both annotators assign labels randomly [69].
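Eq. (4) can be computed from the four confusion matrix entries as in the following minimal Python sketch, in which the classifier and the reference labels play the role of the two annotators. The function name and the example counts are illustrative assumptions.

```python
def cohens_kappa(tp, fp, tn, fn):
    """Cohen's Kappa (Eq. 4) from binary confusion matrix counts."""
    n = tp + fp + tn + fn
    # Observed agreement: share of samples on which both labelings agree.
    p0 = (tp + tn) / n
    # Expected chance agreement from the marginal totals of each labeling.
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if pe == 1.0:
        # Kappa is undefined when both labelings use a single class only.
        raise ValueError("expected agreement is 1; kappa is undefined")
    return (p0 - pe) / (1.0 - pe)

# Hypothetical counts: 3 TP, 1 FP, 3 TN, 1 FN.
kappa = cohens_kappa(3, 1, 3, 1)
```

For these counts the observed agreement is 0.75 and the chance agreement is 0.5, so Kappa is 0.5, noticeably lower than the raw accuracy.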

Figure 7.

Flow diagram of the methodology.

The methodology adopted in the present study is given in Fig. 7. The steps of the proposed methodology may be implemented as:

Step 1: Initialization

– Acquisition of data: meteorological, Landsat images ( $n$ ), ancillary data, and ground truth

Step 2: Pre-processing and spectral analysis: Algorithm 7

Step 3: Selection of appropriate data: Algorithm 7

Step 4: Temporal analysis: Algorithm 7

Step 5: Comparative assessment

– Assessment based on the image set scenarios ( $n=$ 1 to 6)

– Analysis based on the obtained results

– Application of predictive model

Algorithm: Creation of LayerStack

j ← 1
for i = 1 to n do
    if image_i_meta_cloud < 20 then
        Convert_DNtoRef(Ti)
        for k = 2 to 7 do
            Push Ti(Band_k) to LayerStack
        end for
        for m = 1 to 15 do
            Push Ti(VI_m) to LayerStack
        end for
        j ← j + 1
    end if
end for

Algorithm: Variable Importance

for i = 1 to j do
    plot_MDA(layerstack(i), LAI, Height, Diameter)
    plot_MDG(layerstack(i), LAI, Height, Diameter)
    optimize(ntree, mtry, oob_min)
    for x = 1 to ntree do
        calculate_OOB(mtry, oob_min)
        calculate_OOB(mtry/2, oob_min)
        calculate_OOB(sqrt(mtry), oob_min)
    end for
    select ntree and corresponding mtry with minimum OOB
end for

Algorithm: Binary Classification

validationIndex ← createDataPartition(mydata, p = 0.80)
validation ← mydata[-validationIndex, ]
dataset ← mydata[validationIndex, ]
trnControl ← trainControl(method = "repeatedcv", n = 10, rep = 3)
model.fit ← train(Crop ~ ., dataset, method = (treebag, rf, C5.0, gbm), Accuracy, trnControl)
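The resampling scheme named in the binary classification step (an 80/20 partition followed by 10-fold cross-validation repeated three times) can be sketched in Python as follows. This is an illustration of the scheme only and not the caret implementation: the helper names, the seed, the plain shuffled folds and the toy labels are assumptions.

```python
import random

def stratified_split(labels, train_fraction=0.80, seed=7):
    """Split row indices per class so both parts keep the class balance."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train, held_out = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train += indices[:cut]
        held_out += indices[cut:]
    return sorted(train), sorted(held_out)

def repeated_kfold(indices, k=10, repeats=3, seed=7):
    """Yield (train, test) index lists for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = list(indices)
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]
        for i in range(k):
            train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
            yield train, folds[i]

# Toy labels: 50 ratoon and 50 plant fields (hypothetical).
labels = ["ratoon"] * 50 + ["plant"] * 50
train_idx, validation_idx = stratified_split(labels)
resamples = list(repeated_kfold(train_idx))
```

Each of the 30 resamples would be used to fit and score one candidate model, and the 20 held-out rows are kept aside for the final validation.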

4. Results and discussion

The proposed work is focused on the discrimination of ratoon sugarcane from plant sugarcane. The discrimination has been treated as a binary classification problem. The model for binary classification has been designed, developed and tested with machine learning methods. Tree ensemble methods based on boosting and bagging algorithms (RF, C5.0, GBM and CART (BAG)) have been explored in the proposed work. The development of the classification model encompasses the following phases:

•
Preliminary analysis of spectral, biophysical and biometric parameters
•
Dataset generation (split into training and testing data)
•
Temporal analysis
•
Performance evaluation

Prior to the discrimination, the optimised selection of the important variables has been carried out to enhance the performance of the underlying model. For the study, the data and information from multiple sources have been fused to synthesize the analysis process. The results obtained from the binary classification, as well as analysis based on the selection of predictors have been presented in this section.

4.1 Analysis of spectral parameters

The cloud-free Landsat 8 OLI data of the year 2015 have been investigated in the analysis. The details of the satellite data are presented in Table 2. The proposed algorithm has been trained, tested and validated with geometrically and atmospherically corrected reflectance values. The six most appropriate cloud-free images (T1 to T6) have been selected for further analysis. The distribution of the images throughout the sugarcane growing season is shown in Fig. 8. One image (T1) represents the germination stage, and two images (T2 and T3) represent the tillering stage of the sugarcane growth season. Two images (T4 and T5) represent the grand growth stage, and the last image (T6) represents the maturity stage of the sugarcane.

Figure 8.

Distribution of images.

Figure 9.

Response – spectral bands.

After the preprocessing steps, the six spectral bands (B2 to B7) and 14 spectral vegetation indices have been extracted for the spectro-temporal analysis. The spectral range and the spatial resolution of each band have been presented in Table 1. The spectral response of the bands for the extracted sugarcane plant and ratoon fields has been presented in Fig. 9. It has been observed that NIR (B5), Red (B4) and Green (B3) have more variations than other bands.

The correlations of the bands (B2 to B7) have been shown in Fig. 10. The visual, as well as statistical analysis of Figs 9 and 10, indicated that the spectral bands SWIR-II (B7), NIR (B5), Red (B4), and Green (B3) might be used to discriminate the ratoon and plant sugarcane. However, the SWIR-I (B6) and Blue (B2) bands have been also tested for the proposed model of binary classification. In addition to the bands, the temporal profile of the spectral bands and vegetation indices has been considered for the classification process.

Figure 10.

Correlation – spectral bands.

The temporal profile of the spectral bands for the extracted sugarcane plant and ratoon fields has been presented in Fig. 11. The difference in the growth of ratoon and plant sugarcane has been observed from the temporal profile of the bands. Most of these variations have been observed during the initial stages rather than the later stages. The Red band (B4) and the Green band (B3) have been the most influential in demonstrating these variations in the initial stage. This may be attributed to the fact that during the germination phase, the greenness of the ratoon sugarcane is more than that of the plant sugarcane. The growth rate of both types of sugarcane is almost the same towards the end of the growth period, and the temporal profiles of the bands for ratoon and plant sugarcane behave similarly there. However, researchers in the past suggested the use of a combination of spectral bands for agricultural applications [70].

Figure 11.

Temporal profile (a) B2 (b) B3 (c) B4 (d) B5 (e) B6 (f) B7.

Vegetation indices enhanced the discrimination because they do not represent the absolute reflectance values of specific bands. Rather, these vegetation indices represent the variations in the slope of the reflectance curve of the participating bands [8]. These variations may be used for the classification of crops and their types. Although a number of vegetation indices have been proposed in the past, the most commonly used are NDVI, SAVI, RVI and Optimized Soil Adjusted Vegetation Index (OSAVI).
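The four indices named above may be sketched in Python using their commonly cited textbook formulations. These forms are assumptions and may differ in detail from the definitions given in Table 3 (for instance, some authors multiply OSAVI by 1.16). The reflectance values used here are invented for illustration and are not taken from the study data.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)

def rvi(nir, red):
    """Ratio Vegetation Index (simple NIR/Red ratio)."""
    return nir / red

def savi(nir, red, soil_factor=0.5):
    """Soil Adjusted Vegetation Index with soil adjustment factor L."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)

def osavi(nir, red, offset=0.16):
    """Optimized SAVI in its commonly cited form (offset 0.16, no multiplier)."""
    return (nir - red) / (nir + red + offset)

# Hypothetical surface reflectance of a vigorous canopy: NIR (B5) and Red (B4).
nir, red = 0.45, 0.05
indices = {
    "NDVI": ndvi(nir, red),
    "RVI": rvi(nir, red),
    "SAVI": savi(nir, red),
    "OSAVI": osavi(nir, red),
}
```

Because each index is a ratio of band differences and sums, a uniform brightness change in both bands affects it far less than it affects the raw reflectance values.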

For the preliminary analysis, the different vegetation indices have been extracted from five different sugarcane fields. During the field traversal, the fields with a normal crop growth pattern have been chosen to demonstrate the distinctive phases of the plant and ratoon sugarcane. The temporal profiles of the spectral indices have been generated from these five fields. The obtained temporal profiles of NDVI, GNDVI and OSAVI have been presented in Fig. 12. Visual analysis of Fig. 12 indicated the presence of an attenuated profile, particularly in the initial stage of the crop growth. An attempt has been made in the proposed work to reconstruct the temporal profile of vegetation indices, so as to remove the attenuations.

Figure 12.

Temporal profile of NDVI, GNDVI and SAVI.

In addition to the spectral variables such as bands and vegetation indices, the biometric and biophysical parameters may play a significant role in the discrimination process. The next section has been devoted to the analysis of biophysical parameters with reference to the proposed discrimination model.

4.2 Analysis of biometric and biophysical parameters

The temporal profile of LAI for the plantation, as well as ratoon sugarcane, has been extracted for the various fields. The temporal domain plot for the LAI has been shown in Fig. 3. These profiles show that sugarcane starts with slow growth during the germination stage, followed by faster growth during the tillering stage, another slow phase, and finally a decline in growth until the harvesting stage. However, the temporal profile of plant LAI is slightly different from that of ratoon LAI. The study also found that the rate of growth of ratoon LAI is higher during the grand growth stage than in other stages. The range of ratoon LAI remains lower than that of the plant LAI. These results are similar to those reported in [3]. The lower ratoon LAI may be attributed to low tillering and to the soil conditions after the harvest of the plant sugarcane.

The biometric parameters, height and diameter of the stalk, have been recorded for the fields of the study area. The temporal variations of these parameters have been shown in Fig. 3. The differences between plant and ratoon sugarcane are larger in the maturity stage (GS4) and germination stage (GS1). A comprehensive study of the graphs indicated that ratoon sugarcane exhibits distinctive growth stages as compared to the plant sugarcane. This distinction may be captured by the binary classification models for efficient mapping. The analysis points out that biometric and biophysical variables may play a vital role in the classification process. However, one major hindrance to the applicability of these variables is the collection of data at regular intervals. In addition to the complexity of the data collection, bias is another primary concern.

4.3 Dataset generation

Crop types and even crop varieties have distinctive growth patterns during the entire season. These distinctive patterns may assist the classification and regression models to achieve higher accuracy [71]. Multispectral remotely sensed data in the temporal domain may lead to higher classification accuracy than single-date images [72]. Nevertheless, it is vital to identify the optimal set of temporal images to discriminate the different targets in the study area. The optimal selection of the images depends upon the crop growth season and the particular study area for the agricultural applications of remote sensing [73]. Moreover, an increase in the number of temporal images may increase the computational complexity as well as data redundancy [74]. The accuracy of the discrimination process may be significantly enhanced by the optimized selection of the most appropriate input variables of the dataset [75].

The dataset for the machine learning models has been created from spectral bands, spectral indices and other crop growth parameters (Table 4). These parameters have been recorded in the temporal domain of the entire crop growing season. To design and evaluate the models, the available datasets need to be divided into training and testing datasets. Training data is used to build the predictive model, whereas the testing data is required to measure the prediction accuracy. The selection of training data and testing data affects the performance of the model, hence this selection should be tuned appropriately. Most machine learning methods commonly use k-fold cross-validation for parameter tuning. This validation mechanism divides the input dataset of $n$ samples into $k$ disjoint subsets of equal size ( $n/k$ ). The parameter $k$ is normally taken as 5 or 10. It becomes leave-one-out cross-validation when the values of $k$ and $n$ are equal. 10-fold cross-validation has been used in the proposed work to divide the datasets into training and testing datasets.
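The k-fold partitioning described above can be sketched in a few lines of Python. This is an illustrative sketch and not the software used in the study; the function names, the seed and the sample count of 100 are assumptions. The `repeats` argument repeats the whole partitioning with a different shuffle, in the manner of repeated cross-validation.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]   # fold sizes differ by at most one

def repeated_cv_splits(n, k=10, repeats=3, seed=0):
    """Yield (train, test) index lists for every fold of every repeat."""
    for r in range(repeats):
        for held_out in k_fold_indices(n, k, seed + r):
            held = set(held_out)
            train = [i for i in range(n) if i not in held]
            yield train, sorted(held_out)

# Hypothetical dataset of 100 samples, 10 folds, 3 repeats.
splits = list(repeated_cv_splits(n=100, k=10, repeats=3))

# Leave-one-out is the special case k == n: every fold holds one sample.
loo_folds = k_fold_indices(n=5, k=5)
```

Each sample is used for testing exactly once per repeat, so every repeat yields an accuracy estimate based on all samples.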

The random forest model has been used in numerous classification and mapping experiments based on agriculture. The RF model is reliable and computationally efficient, even in the case of higher-dimensional inputs and a lower number of training data samples. RF model is based on the selection of the most appropriate parameters to guide the classification process. The classification accuracy and the computational time are the two main factors towards the evaluation of the classification models. The performance of the classification model may be significantly enhanced by the appropriate selection of the predictors and random forest parameters [28].

4.3.1 Random forest parameters (mtry and ntree)

The parameters mtry and ntree have been considered for the time-space trade-off regarding the performance of the classifiers. The parameter ntree represents the number of trees generated for the given dataset, whereas mtry represents the number of dataset variables considered at each tree split. A large value of ntree may generate reliable models, but at the cost of more storage space and higher computational time. The value of ntree may be chosen iteratively, starting from a small value and stopping when the error rate stabilizes. The default value for ntree in software packages such as R is usually 500. On the other hand, the default value of mtry is commonly fixed as $\sqrt{p}$ ( $p$ is the number of predictors in the dataset). The same iterative procedure may be adopted to select the ideal value of mtry.
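The iterative choice of ntree and the default choice of mtry may be sketched as follows. This is a minimal Python illustration, not the tuning code of the study: the helper names, the tolerance, the patience and, in particular, the OOB error curve are invented for the example and are not the values plotted in Fig. 13.

```python
import math

def default_mtry(p):
    """Common classification default: floor(sqrt(p)), at least 1."""
    return max(1, math.isqrt(p))

def stable_ntree(curve, tol=0.005, patience=2):
    """Smallest ntree whose OOB error stays within tol for the next `patience` steps."""
    curve = sorted(curve)
    for i in range(len(curve) - patience):
        base = curve[i][1]
        following = curve[i + 1:i + 1 + patience]
        if all(abs(err - base) < tol for _, err in following):
            return curve[i][0]
    return curve[-1][0]   # never stabilised: fall back to the largest value tried

# Hypothetical (ntree, OOB error) pairs, for illustration only.
oob_curve = [(50, 0.120), (100, 0.095), (200, 0.082),
             (300, 0.071), (400, 0.070), (500, 0.071)]

chosen_ntree = stable_ntree(oob_curve)
mtry_for_16_predictors = default_mtry(16)
```

Stopping at the first stable value keeps the forest as small as possible, which limits storage and computation without a measurable loss in OOB error.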

A reliable classification model generally requires higher values of ntree and lower values of mtry. However, beyond some threshold value, neither parameter affects the performance, positively or negatively. The out-of-bag (OOB) error rate for different values of the parameters ntree and mtry has been shown in Fig. 13. The effective value of ntree has been recorded as 300, whereas mtry has been set to 6 for the input dataset in the proposed classification process.

Figure 13.

Selection of mtry and ntree.

4.3.2 Random Forest Selection of Predictors – RFSP

Random Forest Selection of Predictors (RFSP) has been used prior to the classification process for the initial selection of relevant input data. This process may also be referred to as “Dimensionality Reduction”. MDA and MDG scores, computed on the basis of the OOB error, have been utilized to select the important variables for the discrimination process. MDA assumes that if an input variable is not relevant, then randomly permuting its values will leave the accuracy of the underlying model essentially unchanged. MDG, in contrast, grades the input variables of the dataset on the basis of their ability to differentiate the target classes. The rankings based on the MDA and MDG have been shown in Fig. 14.
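The ideas behind the two scores may be sketched in Python under strong simplifications. A real random forest averages the accuracy drop over the OOB samples of every tree (MDA) and sums the impurity decreases over all splits using a variable (MDG). Here a single hypothetical fixed rule stands in for a trained tree, and the toy data are invented for illustration.

```python
import random

def gini(labels):
    """Gini impurity 1 - sum(p_c^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_decrease(parent, left, right):
    """Impurity decrease obtained by splitting parent into left and right."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted

def accuracy(rows, labels, predict):
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)

def permutation_drop(rows, labels, predict, column, seed=0):
    """Accuracy lost when the values of one column are randomly permuted."""
    values = [r[column] for r in rows]
    random.Random(seed).shuffle(values)
    permuted = [r[:column] + [v] + r[column + 1:] for r, v in zip(rows, values)]
    return accuracy(rows, labels, predict) - accuracy(permuted, labels, predict)

def predict(row):
    """Hypothetical fixed rule standing in for a trained tree."""
    return "plant" if row[0] < 0.45 else "ratoon"

# Invented toy data: column 0 separates the classes, column 1 is ignored noise.
rows = [[0.1 * i, ((i * 7) % 10) / 10] for i in range(10)]
labels = [predict(r) for r in rows]

left = [y for r, y in zip(rows, labels) if r[0] < 0.45]
right = [y for r, y in zip(rows, labels) if r[0] >= 0.45]
mdg_like = gini_decrease(labels, left, right)
mda_informative = permutation_drop(rows, labels, predict, column=0)
mda_noise = permutation_drop(rows, labels, predict, column=1)
```

In this toy case the noise column yields a drop of exactly zero because the rule never reads it, which mirrors the MDA assumption stated above; the informative column can only lose accuracy when permuted.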

Figure 14.

Feature rankings (MDA).

It has been observed from the rankings that spectral bands and spectral indices are at the top of the list of most important variables. The biophysical variables did not get any spot in the top rankings. The six bands B2 to B7 and nine spectral indices (GNDVI, NDVI, SAVI, RVI, NDWI, NDMI, NN, ARVI, NG) have been selected for the temporal analysis. A digital layer of these sixteen predictor variables has been selected as the dataset for the discrimination model. On the basis of these rankings, the top 16 features have been selected for the proposed binary classification. All these features have been employed as input to the underlying model, as the plant growth is a function of the composition of these features. Some of these parameters are related to greenness, such as GNDVI, NDVI and SAVI, and some are related to the moisture content of the plant, such as NDWI, NDMI and NN. The mathematical formulation and significance of each parameter have been presented in Table 3.

4.4 Temporal analysis based on machine learning

Boosting and Bagging tree ensemble methods RF, C5.0, GBM and CART (BAG) have been comparatively investigated over the temporal domain of the dataset. The proposed work focused on the enhancement of the classification accuracy of the discrimination model by utilizing multiple images acquired throughout the cropping season of the same year. The classification accuracy and the Kappa values have been recorded for the various combinations of the images in the temporal domain. The proposed work was initialized with a single image in each iteration from the dataset of T1 to T6 images. Each set of images has been tested separately, and the analysed results have been presented in this section.
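The image sets examined in the following subsections are all the combinations of the six images T1 to T6. A short Python sketch enumerates them; the naming helper is an assumption that simply mirrors the set labels used in the tables (for example T35 or T2456).

```python
from itertools import combinations

IMAGES = ["T1", "T2", "T3", "T4", "T5", "T6"]
MODELS = ["RF", "BAG", "C50", "GBM"]

def image_sets(size):
    """All image combinations of a given size, labelled as in the tables."""
    return ["T" + "".join(name[1:] for name in combo)
            for combo in combinations(IMAGES, size)]

# Number of distinct image sets for each set size (1 to 6 images).
counts = {size: len(image_sets(size)) for size in range(1, 7)}
total_sets = sum(counts.values())
total_model_runs = total_sets * len(MODELS)
```

The counts (6, 15, 20, 15, 6 and 1) agree with the number of rows per model in Tables 5 to 10, giving 63 image sets and 252 model and image-set pairs in total.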

4.4.1 Single image analysis

One image from the dataset of six images (T1 to T6) has been selected at a time as the input for the classification model. The mean accuracy and Kappa values for single date image analysis have been shown in Fig. 15 and presented in Table 5. It has been observed that the earliest image of the season, T1 (April 17, 2015), is able to discriminate the sugarcane with a maximum accuracy of 91%, and the corresponding Kappa coefficient is 0.81. The lowest accuracy is 0.71 for the image T2 (May 03, 2015).
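The relation between overall accuracy and the Kappa coefficient can be made concrete with a small Python sketch of Cohen's kappa computed from a confusion matrix. The counts below are hypothetical and were chosen only so that the accuracy equals 0.91; they are not the confusion matrix of the T1 model.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows: actual, columns: predicted)."""
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1.0 - expected)

# Hypothetical counts for 100 samples (50 plant, 50 ratoon).
confusion = [[45, 5],
             [4, 46]]

overall_accuracy = (confusion[0][0] + confusion[1][1]) / 100
kappa = cohen_kappa(confusion)
```

For these balanced toy counts the chance agreement is 0.5, so an accuracy of 0.91 corresponds to a kappa of about 0.82, which is why Kappa values are systematically lower than the accuracies reported alongside them.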

Figure 15.

Comparative performance (single images).

It has also been observed that the performance of the RF model is the best in all the cases. The single date image analysis suggests that the germination period is crucial for differentiating ratoon and plant sugarcane when only one image is used. However, high accuracy values have also been recorded during the later stages of the season. The appropriate combination of the different images may enhance the overall accuracy of the discrimination.

Table 5

Single image analysis

Image	Accuracy	Kappa	Model	Image	Accuracy	Kappa	Model
T1	0.91	0.81	RF	T4	0.88	0.76	RF
T1	0.88	0.77	BAG	T4	0.87	0.74	BAG
T1	0.83	0.65	C50	T4	0.83	0.66	GBM
T1	0.81	0.62	GBM	T4	0.78	0.55	C50
T2	0.88	0.76	RF	T5	0.9	0.79	RF
T2	0.87	0.75	BAG	T5	0.89	0.78	BAG
T2	0.79	0.59	GBM	T5	0.88	0.77	C50
T2	0.71	0.43	C50	T5	0.86	0.71	GBM
T3	0.9	0.79	RF	T6	0.88	0.76	RF
T3	0.88	0.76	BAG	T6	0.87	0.75	BAG
T3	0.81	0.61	GBM	T6	0.8	0.6	GBM
T3	0.74	0.47	C50	T6	0.75	0.51	C50

Figure 16.

Comparative performance (two-set images).

4.4.2 Two-set image analysis

A set of possible pairs of two images from the growing season has been formed and evaluated in this section. The decreasing order of the accuracies and the Kappa values have been presented in Table 6. The mean accuracy and Kappa values for the two-set images have been shown in Fig. 16. It has been revealed that the image set (T35), i.e., combined images from May 19, 2015, and October 10, 2015 topped the table in both measures. Set T35 is able to discriminate the sugarcane with a maximum accuracy of 94% and Kappa coefficient as 0.88.

Image T3 belongs to the tillering stage, whereas T5 belongs to the grand growth stage of the plant growth season. Consequently, the non-availability of images from all the stages of the growth period, due to clouds or other reasons, may be compensated by images from these two crucial stages to discriminate the ratoon from the plantation. The minimum value for the accuracy has been recorded for set T23, i.e., the combined images from May 03 and May 19, 2015. Hence, combined images from the same stage did not classify the ratoon and plant sugarcane efficiently. Once again, the RF model exhibited the best performance in terms of both measures. It has been revealed that the tillering and grand growth stages together performed well for the discrimination process.

Table 6
Two-set image analysis

Images	Accuracy	Model	Images	Accuracy	Model	Images	Accuracy	Model
T35	0.94	RF	T25	0.92	C50	T56	0.9	BAG
T15	0.94	RF	T14	0.92	C50	T45	0.9	BAG
T25	0.93	RF	T56	0.92	C50	T23	0.9	RF
T26	0.93	RF	T46	0.92	C50	T26	0.9	GBM
T24	0.93	C50	T26	0.92	C50	T14	0.9	GBM
T25	0.93	BAG	T13	0.92	C50	T46	0.9	BAG
T35	0.93	C50	T26	0.92	BAG	T23	0.9	BAG
T35	0.93	BAG	T25	0.91	GBM	T36	0.9	BAG
T24	0.93	RF	T36	0.91	C50	T12	0.89	RF
T15	0.93	C50	T13	0.91	BAG	T46	0.89	GBM
T45	0.93	C50	T34	0.91	RF	T56	0.89	GBM
T34	0.92	C50	T16	0.91	C50	T12	0.88	BAG
T16	0.92	RF	T34	0.91	BAG	T36	0.88	GBM
T14	0.92	BAG	T46	0.91	RF	T12	0.88	C50
T13	0.92	RF	T35	0.91	GBM	T34	0.88	GBM
T14	0.92	RF	T24	0.91	GBM	T16	0.88	GBM
T45	0.92	RF	T15	0.91	GBM	T23	0.87	GBM
T15	0.92	BAG	T56	0.91	RF	T13	0.87	GBM
T24	0.92	BAG	T36	0.9	RF	T12	0.85	GBM
T16	0.92	BAG	T45	0.9	GBM	T23	0.84	C50
Images	Kappa	Model	Images	Kappa	Model	Images	Kappa	Model
T35	0.88	RF	T26	0.84	C50	T56	0.81	BAG
T15	0.87	RF	T14	0.84	C50	T45	0.81	BAG
T25	0.87	RF	T46	0.84	C50	T23	0.81	RF
T26	0.86	RF	T56	0.84	C50	T26	0.8	GBM
T24	0.86	C50	T26	0.84	C50	T14	0.8	GBM
T25	0.86	BAG	T13	0.83	C50	T46	0.79	BAG
T35	0.85	C50	T26	0.83	BAG	T23	0.79	BAG
T35	0.85	BAG	T25	0.83	GBM	T36	0.79	BAG
T24	0.85	RF	T36	0.83	C50	T12	0.78	RF
T15	0.85	C50	T34	0.83	RF	T46	0.78	GBM
T45	0.85	C50	T13	0.83	BAG	T56	0.77	GBM
T16	0.85	RF	T16	0.82	C50	T12	0.77	BAG
T34	0.85	C50	T34	0.82	BAG	T36	0.76	GBM
T14	0.85	BAG	T46	0.82	RF	T12	0.76	C50
T45	0.84	RF	T35	0.82	GBM	T34	0.76	GBM
T13	0.84	RF	T24	0.82	GBM	T16	0.75	GBM
T14	0.84	RF	T15	0.81	GBM	T23	0.74	GBM
T15	0.84	BAG	T56	0.81	RF	T13	0.73	GBM
T24	0.84	BAG	T36	0.81	RF	T12	0.7	GBM
T16	0.84	BAG	T45	0.81	GBM	T23	0.67	C50

Table 7

Three-set image analysis

Set	Acc.	Model	Set	Acc.	Model	Set	K	Model	Set	K	Model
T256	0.95	RF	T235	0.93	BAG	T256	0.89	RF	T235	0.85	BAG
T245	0.94	BAG	T136	0.93	RF	T246	0.89	RF	T136	0.85	RF
T246	0.94	RF	T125	0.93	GBM	T245	0.89	BAG	T125	0.85	GBM
T246	0.94	C50	T156	0.93	C50	T246	0.88	C50	T156	0.85	C50
T235	0.94	RF	T136	0.93	C50	T235	0.88	RF	T136	0.85	C50
T345	0.94	RF	T156	0.93	BAG	T345	0.88	RF	T156	0.85	BAG
T135	0.94	RF	T146	0.93	C50	T135	0.88	RF	T146	0.85	C50
T245	0.94	RF	T156	0.93	GBM	T245	0.88	RF	T156	0.85	GBM
T245	0.94	C50	T146	0.93	RF	T245	0.88	C50	T146	0.85	RF
T256	0.94	BAG	T356	0.92	GBM	T256	0.88	BAG	T356	0.85	GBM
T125	0.94	RF	T134	0.92	C50	T125	0.88	RF	T134	0.85	C50
T246	0.94	BAG	T235	0.92	GBM	T246	0.87	BAG	T235	0.85	GBM
T134	0.94	BAG	T156	0.92	RF	T134	0.87	BAG	T156	0.85	RF
T245	0.93	GBM	T126	0.92	C50	T245	0.87	GBM	T126	0.85	C50
T356	0.93	RF	T234	0.92	GBM	T235	0.87	C50	T234	0.85	GBM
T235	0.93	C50	T356	0.92	BAG	T356	0.87	RF	T124	0.85	GBM
T135	0.93	BAG	T124	0.92	GBM	T126	0.87	RF	T356	0.85	BAG
T126	0.93	RF	T234	0.92	BAG	T135	0.87	BAG	T234	0.85	BAG
T256	0.93	GBM	T346	0.92	RF	T256	0.87	GBM	T346	0.84	RF
T345	0.93	C50	T456	0.92	RF	T345	0.87	C50	T456	0.84	RF
T356	0.93	C50	T246	0.92	GBM	T356	0.87	C50	T246	0.84	GBM
T125	0.93	BAG	T126	0.92	GBM	T125	0.87	BAG	T126	0.84	GBM
T124	0.93	C50	T145	0.92	GBM	T124	0.87	C50	T145	0.84	GBM
T135	0.93	C50	T236	0.92	GBM	T234	0.87	C50	T236	0.84	GBM
T234	0.93	C50	T135	0.92	GBM	T135	0.87	C50	T135	0.84	GBM
T236	0.93	RF	T236	0.92	BAG	T236	0.86	RF	T236	0.84	BAG
T236	0.93	C50	T345	0.92	GBM	T236	0.86	C50	T345	0.84	GBM
T124	0.93	RF	T346	0.92	BAG	T124	0.86	RF	T346	0.83	BAG
T256	0.93	C50	T123	0.92	RF	T256	0.86	C50	T123	0.83	RF
T234	0.93	RF	T456	0.91	BAG	T234	0.86	RF	T456	0.83	BAG
T346	0.93	C50	T136	0.91	BAG	T346	0.86	C50	T136	0.83	BAG
T126	0.93	BAG	T136	0.91	GBM	T126	0.86	BAG	T136	0.83	GBM
T345	0.93	BAG	T146	0.91	BAG	T345	0.86	BAG	T146	0.82	BAG
T145	0.93	C50	T146	0.91	GBM	T145	0.86	C50	T146	0.82	GBM
T125	0.93	C50	T134	0.91	GBM	T125	0.86	C50	T134	0.81	GBM
T124	0.93	BAG	T346	0.91	GBM	T124	0.85	BAG	T346	0.81	GBM
T456	0.93	C50	T123	0.91	C50	T456	0.85	C50	T123	0.81	C50
T145	0.93	RF	T456	0.90	GBM	T145	0.85	RF	T456	0.81	GBM
T145	0.93	BAG	T123	0.90	BAG	T134	0.85	RF	T123	0.81	BAG
T134	0.93	RF	T123	0.90	GBM	T145	0.85	BAG	T123	0.79	GBM

4.4.3 Three-set image analysis

The machine learning models have been tested for the set of three images in each iteration. The comprehensive study of Table 7 and Fig. 17 revealed that the combined image set (T256) from the tillering, grand growth and maturity stages has been able to discriminate the ratoon and plant sugarcane with an accuracy of 95%. On the other hand, the image set T123, which contains only the initial stages, and the image set T456, which contains only the later stages, recorded the lowest accuracy value of 0.90. These results confirmed that images only from the initial stages or only from the later stages are not sufficient for the discrimination. The majority of the higher accuracy values belonged to the bagging ensemble models RF and CART (BAG). The observations from the three-set image analysis gave an interesting indication that approximately one image from each of the stages is crucial to discriminate the ratoon from the plant sugarcane.

Figure 17.

Comparative performance (three-set images).

4.4.4 Four-set image analysis

The results obtained for the four-set images have been presented in Table 8. An accuracy of 97% has been attained for the image set T2456 under the RF model. The Kappa value is also high, at 0.93, for the random forest model. The performance of the C50 model is the lowest for this set of images. The graph for the comparison of the accuracy and Kappa values has been plotted in Fig. 18. It has been observed that approximately one image from each season has been able to discriminate the type of sugarcane effectively. However, the other image sets (five-set and six-set) have also been investigated in the proposed work. The next sections are devoted to the analysis of these image sets.

Figure 18.

Comparative performance (four-set images).

Table 8

Four-set image analysis

Set	Acc.	Model	Set	Acc.	Model	Set	K	Model	Set	K	Model
T2456	0.97	RF	T1235	0.94	RF	T2456	0.93	RF	T1235	0.87	RF
T1356	0.95	RF	T2346	0.94	GBM	T1356	0.91	RF	T1235	0.87	C50
T2456	0.95	BAG	T1235	0.94	C50	T2456	0.9	BAG	T2346	0.87	GBM
T2345	0.95	RF	T1236	0.93	C50	T2345	0.9	RF	T1236	0.87	C50
T2356	0.95	RF	T2356	0.93	BAG	T2356	0.9	RF	T2356	0.87	BAG
T2456	0.95	GBM	T1234	0.93	RF	T2456	0.89	GBM	T1234	0.87	RF
T2356	0.94	GBM	T1235	0.93	BAG	T2356	0.89	GBM	T1235	0.87	BAG
T1246	0.94	RF	T1456	0.93	RF	T1246	0.89	RF	T1456	0.87	RF
T2345	0.94	BAG	T1234	0.93	C50	T2345	0.89	BAG	T1234	0.87	C50
T1256	0.94	RF	T1346	0.93	C50	T1256	0.89	RF	T1346	0.87	C50
T2346	0.94	RF	T2346	0.93	BAG	T2346	0.89	RF	T2346	0.87	BAG
T1256	0.94	BAG	T3456	0.93	GBM	T1256	0.89	BAG	T3456	0.86	GBM
T2456	0.94	C50	T1234	0.93	BAG	T2456	0.89	C50	T1234	0.86	BAG
T1345	0.94	RF	T1356	0.93	BAG	T1345	0.89	RF	T1356	0.86	BAG
T1256	0.94	C50	T1346	0.93	RF	T1256	0.89	C50	T1346	0.86	RF
T1235	0.94	GBM	T2345	0.93	GBM	T1235	0.88	GBM	T2345	0.86	GBM
T1245	0.94	C50	T1246	0.93	GBM	T1245	0.88	C50	T1246	0.86	GBM
T2345	0.94	C50	T1456	0.93	C50	T2345	0.88	C50	T1456	0.86	C50
T3456	0.94	RF	T1245	0.93	GBM	T3456	0.88	RF	T1245	0.86	GBM
T1245	0.94	BAG	T1346	0.93	GBM	T1245	0.88	BAG	T1346	0.86	GBM
T3456	0.94	C50	T1356	0.93	GBM	T3456	0.88	C50	T1356	0.85	GBM
T1245	0.94	RF	T1236	0.93	BAG	T1245	0.88	RF	T1236	0.85	BAG
T1345	0.94	C50	T1246	0.93	BAG	T1345	0.88	C50	T1246	0.85	BAG
T1356	0.94	C50	T1236	0.93	GBM	T1356	0.88	C50	T1236	0.85	GBM
T1345	0.94	BAG	T1346	0.92	BAG	T1345	0.88	BAG	T1346	0.85	BAG
T1246	0.94	C50	T1234	0.92	GBM	T1256	0.87	GBM	T1234	0.85	GBM
T1256	0.94	GBM	T1456	0.92	GBM	T1246	0.87	C50	T1456	0.85	GBM
T1236	0.94	RF	T3456	0.92	BAG	T1236	0.87	RF	T3456	0.85	BAG
T2346	0.94	C50	T1345	0.92	GBM	T2346	0.87	C50	T1345	0.84	GBM
T2356	0.94	C50	T1456	0.92	BAG	T2356	0.87	C50	T1456	0.84	BAG

4.4.5 Five-set image analysis

Five images from the entire growth season have been combined to investigate their effect on the discrimination. Six separate image sets have been generated and tested for all four models. Image set T12456 has topped the table (Table 9) with an accuracy value of 0.95, which is 2% lower than that of the set T2456. The observations indicated that, although T1 is the most accurate single image, its addition to T2456 did not improve the classification (Fig. 19). The stages between germination and tillering are more effective than the germination stage alone. The investigations indicated that the stages between germination and tillering, as well as the stages between grand growth and maturity, are more significant for the classification.

Table 9
Five-set image analysis

Set	Acc.	Model	Set	Acc.	Model	Set	K	Model	Set	K	Model
T12456	0.95	RF	T23456	0.94	BAG	T12456	0.9	RF	T23456	0.88	BAG
T23456	0.95	RF	T12356	0.94	GBM	T23456	0.9	RF	T12356	0.88	GBM
T12356	0.95	RF	T12346	0.94	C50	T12356	0.89	RF	T12346	0.88	C50
T12345	0.94	RF	T12345	0.94	BAG	T12345	0.89	RF	T12345	0.88	BAG
T23456	0.94	C50	T12456	0.94	GBM	T23456	0.89	C50	T12456	0.88	GBM
T12346	0.94	RF	T12456	0.94	BAG	T12346	0.89	RF	T12456	0.88	BAG
T12345	0.94	C50	T12345	0.94	GBM	T12356	0.89	C50	T12345	0.87	GBM
T12356	0.94	C50	T13456	0.94	GBM	T12345	0.89	C50	T13456	0.87	GBM
T13456	0.94	RF	T13456	0.93	C50	T13456	0.89	RF	T13456	0.87	C50
T12356	0.94	BAG	T12346	0.93	GBM	T12356	0.89	BAG	T12346	0.87	GBM
T23456	0.94	GBM	T12346	0.93	BAG	T23456	0.88	GBM	T12346	0.86	BAG
T12456	0.94	C50	T13456	0.93	BAG	T12456	0.88	C50	T13456	0.85	BAG

4.4.6 Six-set image analysis

The accuracy and Kappa value obtained for the six-set images were 0.94 and 0.90, respectively (Fig. 20). The accuracy is 3% lower than that of T2456. The results revealed that a combination of all the images from the growth season is redundant and increases the computational complexity and storage space. Addition of the T1 and T3 images to T2456 did not increase the performance of the underlying model in terms of either accuracy or Kappa value. The RF model exhibits superior performance on the six-set images.

Table 10
Six-set image analysis

Image set	Accuracy	Model	Image Set	Kappa	Model
T123456	0.94	RF	T123456	0.90	RF
T123456	0.94	C50	T123456	0.88	C50
T123456	0.94	BAG	T123456	0.87	BAG
T123456	0.94	GBM	T123456	0.87	GBM

Figure 19.

Comparative performance (five-set images).

Figure 20.

Comparative performance (six-set images).

4.5 Comparative performance of models

The performance of the RF classifier was observed to be superior for all the image datasets, and higher accuracy and Kappa values were achieved in almost all of the sets. The comparative performance of all the ensemble models has been presented in Fig. 21. It has been observed that the RF model performed well on all the image sets, and the highest accuracy obtained was 0.97. The highest accuracy has been recorded for the four-image set T2456, as shown in Fig. 22. The Kappa coefficient is also highest for the RF model and the four-image set T2456. The plot for the comparison of Kappa coefficient values has been given in Fig. 23. The comparison of models showed that the GBM model had the lowest performance. Among the image sets, the single-image results were noticeably lower than those of the other sets.

From the comparison, it can be ascertained that a single image was not sufficient to discriminate the ratoon from the plantation with higher accuracy. Also, the addition of more than four images to the dataset did not improve the performance of the binary classification process. Both the accuracy and the Kappa value saturated at the five-set and six-set images. Therefore, it can be concluded that one image from each season is sufficient for the discrimination of ratoon and plant sugarcane.

Figure 21.

Comparative performance (ensemble models).

Figure 22.

Accuracy comparison.

Figure 23.

Kappa comparison.

The RF model has been selected for further analysis on the validation datasets and on unseen datasets from different sites in the study area and from different years. Various metrics such as Accuracy, Kappa, Precision, Recall and F-Score have been observed. The confusion matrix and the other metrics obtained for the validation dataset have been presented in Fig. 24. The overall accuracy for the validation dataset has been recorded as 96%, whereas the Kappa coefficient value was 0.93. The precision, recall and F1-Score were also high for the discrimination model. The discrimination model has been tested on the unseen data from the different years. The analysis for the new data has been presented in the next section.
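The metrics named above follow directly from the four cells of a binary confusion matrix, as the Python sketch below shows. The counts are hypothetical and are not those of Fig. 24, and treating ratoon as the positive class is an assumption made only for the example.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard metrics from a 2x2 confusion matrix (assumes non-zero denominators)."""
    precision = tp / (tp + fp)        # predicted positives that are correct
    recall = tp / (tp + fn)           # actual positives that are found
    specificity = tn / (tn + fp)      # actual negatives that are found
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }

# Hypothetical counts: 50 ratoon (positive) and 50 plant (negative) samples.
metrics = binary_metrics(tp=48, fn=2, fp=1, tn=49)
```

Recall and specificity describe the two classes separately, so reporting both shows whether errors are concentrated in one class, which overall accuracy alone cannot reveal.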

4.6 Predictive performance

Field experimentation confirms that the greenness of the sugarcane shows maximum variation from the germination stage until the grand growth stage. This variation can be used to discriminate the type of sugarcane as well as sugarcane varieties. Stages of the sugarcane growth based on the temporal variations may be effectively used for the classification. The accuracy analysis and measurement of the model’s performance have been conducted based on the on-field survey and sample data collected from different sugarcane fields for the years 2016 to 2019.

The comparison of the performance metrics for all the years has been presented in Table 11. A detailed study of Table 11 indicated that the performance of the model in the Dhanauri area is the highest. This may be attributed to the fact that the data employed to train the model belonged to this area. The irrigation system of this area is also better than that of the other areas. The lowest performance has been noted for the Sirmaur area. The poor performance of the developed model in this area may be due to slight differences in the climatic conditions and the soil properties. Hence, the model may be enhanced for this area by incorporating a digital soil mapping model.

Table 11
Performance metrics

Dataset	Season	Accuracy	Kappa	Recall	Specificity
Site 1 – Dhanauri	2016	0.970	0.941	0.965	0.976
Site 2 – Khelri	2016	0.946	0.893	0.964	0.929
Site 3 – Sirmaur	2016	0.874	0.748	0.750	1.000
Site 1 – Dhanauri	2017	0.979	0.957	0.972	0.986
Site 2 – Khelri	2017	0.976	0.953	0.953	1.000
Site 3 – Sirmaur	2017	0.875	0.750	0.750	1.000
Site 1 – Dhanauri	2018	0.975	0.949	0.960	0.990
Site 2 – Khelri	2018	0.953	0.905	0.953	0.952
Site 3 – Sirmaur	2018	0.866	0.732	0.786	0.946
Site 1 – Dhanauri	2019	0.973	0.947	0.965	0.982
Site 2 – Khelri	2019	0.965	0.929	0.953	0.976
Site 3 – Sirmaur	2019	0.769	0.540	0.541	1.000

Figure 24.

Confusion matrix – validation dataset.

The overall results and the analysis based on the different performance evaluation metrics indicated that the discrimination model proposed in the study may be effectively used for the binary classification of ratoon and plant sugarcane based on the remote sensing data.

5. Summary and implications

The present study demonstrated that temporal variations in the spectral bands and the vegetation indices may be effectively used for the discrimination of ratoon sugarcane from the plantation. The discrimination has been considered as a binary classification problem. The preliminary analysis and the streamlining of the temporal profile of remote sensing data have been carried out to extract the potential information for the classification. Different variables such as spectral bands, spectral indices, biometric and biophysical variables have been employed. Variable importance selection based on the random forest method has been used to create the appropriate dataset for the classification. In spite of their temporal variations, the biometric variables have not shown noticeable importance.

The machine learning ensemble methods of bagging and boosting have been utilized for the binary classification problem. The machine learning models RF, CART (BAG), GBM and C5.0 have been used for the temporal image analysis. It has been found that a single image from the entire season is not sufficient to discriminate the ratoon and plant sugarcane. One image from each stage of the growth may produce quality results. Four images (May 03, September 08, October 10 and November 11, 2015) taken together have shown the best performance in terms of classification accuracy and Kappa value. However, the addition of extra images into the dataset did not improve the accuracy. It has been observed that the RF model performed best with 97% accuracy, whereas the performance of the GBM model is the lowest. The proposed model achieved the desired performance on the testing data. Although the present model has been developed from a single-year training dataset, it is reasonable to deduce that the proposed method is suitable and operational for achieving comparable accuracy in Himalayan foothill areas having diverse agricultural practices.
