Abstract
The presence of non-lint materials (or botanic trash) within commercial cotton bales degrades their market value, requires further cleaning, and compromises finished product quality. A number of approaches have been used to assess trash content. In the US, one measure of the amount of trash is leaf grade, which was originally determined by qualified classers of the US Department of Agriculture’s AMS through visual inspection. Recently, the AMS revised the protocol for cotton leaf grade classification by replacing the classer’s leaf determination with instrumental leaf measurement from the cotton classification HVI™ system. In this study, visible/NIR spectra were acquired to explore the potential for discriminating cotton samples of various leaf grade categories. Seven-class classification models in different spectral regions were developed to optimize the identification efficiency. Results indicated that the model in the 1105–1700 nm NIR region could reach an acceptable separation of ∼95.0%, with 89.9% correct identification in the validation set and 100% in the calibration set. Furthermore, factors influencing correct classification were discussed briefly.
Cotton fiber is harvested by machines in the US and Australia and also in other countries.1 Mechanically picked cottons contain some degree of plant-related contaminants and other irregular foreign matter.2 Considerable efforts have been made to remove foreign matter (e.g. botanic trash) as much as possible during subsequent ginning and cleaning practices.3 However, it remains a challenge to remove all trash from lint fiber without damaging fibers.
Trash in commercial cotton bales compromises their market value, requires further cleaning steps, and impacts yarn and fabric end-products. Various techniques, such as the Shirley analyzer (SA),4 high volume instrument (HVI),5 advanced fiber information system (AFIS),6 micro dust and trash analyzer (MDTA),7 and FibroLab,8 are commonly used to assess cotton trash in the US and other countries. The US Department of Agriculture (USDA)’s Agricultural Marketing Service (AMS) has implemented the automation-based HVI™ procedure, as a universal testing method and official classification system, to identify the number of non-lint particles (particle count or
In July 2011, the AMS revised the cotton classification protocol for determining cotton leaf grade. Beginning with the 2011 cotton classing season, the AMS replaced the classers’ leaf determination with instrumental leaf measurement from the HVI™ system. In the developmental stage, the AMS applied mathematical algorithms to correlate
To evaluate the possibility of a rapid and low-cost method that can be used, away from the laboratory, in places such as ginning sites, near infrared (NIR) spectroscopy has been attempted for the prediction of trash contents from HVI™ measurement.11,12 Thomasson and Shearer reported the development of optimal NIR models for eight HVI™ cotton quality characteristics including HVI™ trash content.11 In evaluating different trash determinations by NIR technique, trash models built from
The main objective of this study was to examine the potential of NIR spectroscopy, with an extension to the visible (400–750 nm) region, for the classification efficiency among seven leaf grade categories of commercial cottons through the approach of soft independent modeling of class analogy of principal component analysis (SIMCA/PCA). This strategy differs from earlier quantitative prediction of HVI™ trash content from the NIR technique.11,12 The ultimate goal is to develop this technique for rapid, accurate, nondestructive, and routine determination of cotton fiber qualities (including leaf grade) in cotton fields and ginning sites.
Materials and methods
Cotton samples and official instrumental leaf grade readings
A total of 650 lint cotton samples and their instrumental leaf grade assignments were utilized, of which 350 and 300 samples were retained from the 2010 and 2011 crop years, respectively, to represent the diversity in upland cotton varieties, growth locations, crop years, and ginning practices within the US. Before spectral acquisition, the samples were conditioned under a standard procedure at a constant relative humidity of 65 ± 2% and a temperature of 21 ± 2℃ for at least 48 hours.
Visible/NIR reflectance spectral measurement
All visible/NIR reflectance spectra of samples from the two crop years were acquired on a Foss XDS rapid content analyzer (Foss NIRSystems Inc., Laurel, MD). About 10 g of cotton fibers was pressed into a Foss coarse granular cell, which is rectangular with internal dimensions of 3.8 cm wide × 15.2 cm long × 4.8 cm deep. With the granular cell moving across the optical window and stopping at 8 locations, the spectral scan covered a large surface sampling area of approximately 36 cm². To maintain good contact between the cotton sample and the optical window, 750 g of extra weight (∼0.18 PSI, pound-force per square inch) was loaded on top of the fiber samples consistently throughout the entire experiment. A background was recorded with an internal ceramic reference tile before scanning the samples. The log(1/reflectance) or log(1/R) readings were acquired over the 400–2500 nm wavelength range at a 0.5 nm interval with 32 scans. Three spectra were collected for each cotton sample by repacking, and the mean spectrum was used for model development. Each sample took less than 5 min from start to finish.
Basics of classification models
Multivariate quantitative models, from such a procedure as partial least-squares (PLS) regression, require a large number of training samples to build accurate and reliable calibration equations. It takes time to collect the samples and to measure the reference values of the targeted properties by established or standard methods. In general, quantitative models will predict reasonable values for the calibrated constituents, provided the spectra of the unknown samples are fairly similar to those of the training set. In some scenarios, the constituent information is not easy to determine or the samples are hard to collect. Nevertheless, the spectrum of a sample is unique to its composition, and samples of similar composition should have very similar spectra as well. Therefore, it may be possible to show a difference between two samples by only comparing the spectra with such simple methods as visual inspection or spectral subtraction. These subjective methods cannot be applied to large spectral sets. Consequently, a principal component analysis (PCA)-based discriminant protocol, a process of classifying the samples on the basis of their spectral characteristics and logical assignment, is a preferred technique.13 PLS and PCA are thus different approaches that provide complementary information.
PCA is a very effective variable reduction technique for spectroscopic data. It decomposes the training (or calibration) set spectra into mathematical spectra (called loading vectors, factors, principal components, etc.) that represent the variations most common to all data. When the scores (or scaling coefficients) are multiplied by the loading vectors, and the results summed, the original spectra are reconstructed. The reconstructed spectrum can be subtracted from the original spectrum to determine how well the PCA model is performing for the sample. The result of this subtraction is known as the spectral residual, which effectively measures the portion of each spectrum left unexplained by the model. By calculating the sum of the squares of the spectral residual across the wavelengths, an additional value can be generated for each spectrum; this value is the basis of the discrimination method known as soft independent modeling of class analogy (SIMCA).
During the discrimination process, the spectrum of the ‘unknown’ sample is compared against multiple class models and a spectral residual value is calculated for each. The sample (or spectrum) is then classified as a member of the class whose model yields the smallest spectral residual value.
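The residual-based assignment described above can be sketched in a few lines. This is a minimal illustration on synthetic data, not the PLSplus/IQ implementation used in the study: the function names, the per-class mean centering, the single factor per class, and the sine/cosine "spectra" are all assumptions made for the example.

```python
import numpy as np

def fit_class_model(spectra, n_factors):
    """Fit a PCA model to one class; return (class mean, loading vectors)."""
    mean = spectra.mean(axis=0)
    # Rows of vt are the loading vectors of the mean-centered class spectra.
    _, _, vt = np.linalg.svd(spectra - mean, full_matrices=False)
    return mean, vt[:n_factors]

def residual_sum_of_squares(spectrum, model):
    """Sum of squared spectral residuals of one spectrum against one class model."""
    mean, loadings = model
    centered = spectrum - mean
    scores = loadings @ centered          # project onto the loading vectors
    reconstructed = scores @ loadings     # scores times loadings, summed
    residual = centered - reconstructed   # part of the spectrum left over
    return float(residual @ residual)

def classify(spectrum, models):
    """Assign the spectrum to the class whose model leaves the smallest residual."""
    return min(models, key=lambda label: residual_sum_of_squares(spectrum, models[label]))

# Synthetic two-class demo (hypothetical data, not cotton spectra).
rng = np.random.default_rng(0)
axis = np.linspace(0.0, 1.0, 50)
shape_low = np.sin(2 * np.pi * axis)
shape_high = np.cos(2 * np.pi * axis)

def simulate(shape, n_samples=30):
    """Spectra that share one band shape but vary in intensity, plus small noise."""
    scale = 1.0 + 0.2 * rng.standard_normal((n_samples, 1))
    noise = 0.01 * rng.standard_normal((n_samples, axis.size))
    return scale * shape + noise

models = {
    "low": fit_class_model(simulate(shape_low), n_factors=1),
    "high": fit_class_model(simulate(shape_high), n_factors=1),
}
print(classify(1.1 * shape_low, models))
print(classify(0.9 * shape_high, models))
```

Full SIMCA implementations often add a statistical test on the residual before accepting a class; the sketch keeps only the least-residual rule that the text describes.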
Development of SIMCA/PCA classification models
Sample distribution and assignment in calibration and validation set a
Leaf grade 8 and partial leaf grade 7 samples were not available.
All spectra were imported into the PLSplus/IQ package in Grams/AI (Version 7.01, Thermo Fisher Scientific, Waltham, MA), and discriminant models were subsequently developed. Classification models were established using seven classes (representing leaf grade 1 through 7) with mean centering (MC) and the first (1st) derivative spectral pretreatment in various spectral regions.14 For each of the seven classes in the different models, the optimal number of factors, which the software package suggested to be mostly between 9 and 13, was utilized. By applying the SIMCA/PCA models to validation (and calibration) samples and employing the class assignment rule of the least spectral residual value, each sample was identified as belonging to one of the seven modeled classes, ranging from leaf grade 1 to 7. During the SIMCA/PCA process, the leave-one-out cross-validation method was used.
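As a rough illustration of the pretreatment named above, the sketch below takes a first derivative along the wavelength axis, keeps one spectral region, and mean-centers the result. The finite-difference derivative (`np.gradient`), the order of the steps, the function name `pretreat`, and the synthetic sloping-baseline spectra are assumptions; the derivative algorithm used by the software package is not stated in the text.

```python
import numpy as np

# Wavelength axis as described in the text: 400-2500 nm at a 0.5 nm interval.
wavelengths = np.arange(400.0, 2500.0 + 0.25, 0.5)

def pretreat(spectra, wavelengths, lo_nm, hi_nm):
    """First derivative along wavelength, region selection, then mean centering."""
    # Finite-difference derivative (an assumption; other derivative filters exist).
    deriv = np.gradient(spectra, wavelengths, axis=1)
    mask = (wavelengths >= lo_nm) & (wavelengths <= hi_nm)
    region = deriv[:, mask]
    # Mean centering: subtract the average pretreated spectrum from each row.
    return region - region.mean(axis=0), wavelengths[mask]

# Hypothetical log(1/R) spectra: a constant offset plus a sloping baseline.
rng = np.random.default_rng(2)
offsets = rng.uniform(0.1, 0.3, size=(5, 1))
slopes = rng.uniform(1e-5, 3e-5, size=(5, 1))
spectra = offsets + slopes * wavelengths

X, region_nm = pretreat(spectra, wavelengths, 1105.0, 1700.0)
print(X.shape, region_nm[0], region_nm[-1])
```

The derivative removes the constant offset of each spectrum, so only the baseline slope differences between samples survive in this toy data.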
Results and discussion
Visible/NIR reflectance spectra of cotton fibers with various leaf grade
Figure 1 shows the average log(1/R) spectra in the spectral region of 400–2500 nm of cotton fibers with instrumental leaf grade 1, 3, 5, and 7, respectively. There are at least five intense and broad bands, with one (<700 nm) in the visible region (400–750 nm) and four (1490, 1935, 2105, and 2340 nm) in the NIR region (750–2500 nm). In this study, samples were from commercial cotton bales, thus the interferences from non-lint or foreign contaminants (such as botanical trash) could not be neglected. In general, the visible region of 400–750 nm contains the color information and represents a mixture of contributions not only from the natural pigmentation compounds present in cotton fibers, for example, flavonoids and/or degraded products between a reducing sugar and an amino acid,15,16 but also from the pigmentation species of cotton plant parts (i.e. non-lint components), such as the chlorophylls and their degradation derivatives in leaves and bracts.17
The NIR bands are mainly due to the first (1st) and second (2nd) overtones and combinations of OH and CH stretching vibrations of both cotton cellulose and trash cellulosic-related compounds, in which the cotton cellulose contributes more to the total mass (and in turn the spectral intensity) than the trash portion. The broad absorptions between 1150 nm and 1300 nm are from the 2nd overtones of CH stretching modes, and their 1st overtones appear in the 1675–1860 nm region.14 Features in the 1300–1400 nm region are ascribed to combination bands of the CH vibrations. Broad and intense bands in the 1400–1675 nm region are due to the overlap of the 1st overtones of the OH stretching modes in hydrogen bonded forms. The strong bands at 1935 and 2105 nm are most likely attributed to the combination of OH stretching and deformation modes and the combination of OH and CO stretching vibrations in cellulose, respectively.14
Representative visible/NIR log(1/R) spectra of cotton fibers with four leaf grade readings in the 400–2500 nm region.
Although Figure 1 suggests that cotton fibers with low leaf grade have the visible/NIR bands in common with those having high leaf grade, there appear to be some intensity variations as leaf grade increases. For example, the spectra of high leaf grade samples show a large log(1/R) intensity increase in the visible/short-wavelength (SW) NIR region (<1100 nm) and a relatively weak intensity reduction in the range from 1100 to 2500 nm. This is reasonable, mainly because of the large variation and complexity in color and composition between low and high leaf grade cottons, as higher leaf grade samples should have the characteristics of larger
A comparison of non-trash lint and trash components in the 400–2500 nm region is given in Figure 2. Both non-trash lint and trash samples were acquired during the SA cleaning process and were then cut into powders before the respective spectra were collected. The cutting ensured close contact between the optical window and such trash portions as plant stems and hulls. As expected, there are spectral differences between trash and non-trash lint, owing to the clear difference in chemical composition between the trash, a mixture of contributions mainly from plant parts, and the cotton fibers, which consist mostly (>94%) of cellulose. However, it is difficult to interpret the origins of the broad and overlapped trash bands at the level of molecular vibration modes, since trash is a mixture of such plant parts as leaves, seed coats, hulls, and stems, and the distribution of individual plant parts in trash may change from one sample to another. It is of great interest to understand the spectral features of individual or pure plant parts and to use them for identification, and this concept was reported by Fortier et al. previously.18
Typical visible/NIR log(1/R) spectra of non-trash lint and trash components, which were retained from SA cleaning operation and cut into powders for comparison.
Classification models
Seven-class classification of instrumental leaf grade 1 through 7 in cotton fibers from visible/NIR reflectance a
Total number of calibration (calib) and validation (valid) samples were 442 and 208.
Mean of % correct classification in calibration and validation set.
Of the 208 validation and 442 calibration samples, 170 and 435, respectively, were correctly classified into their classes by the model from the 405–2495 nm visible/NIR region, giving an overall correct classification of 90.0%. The discriminant model from the 1105–2495 nm NIR region slightly improved the accuracy of correct identification from 90.0% to 90.9%, whereas the model from the 405–1095 nm visible/SW-NIR region reduced the identification capability to 86.3%.
To further examine the effect of spectral regions on classification modeling power, four additional spectral ranges were considered. It is encouraging to observe an enhanced separation of around 95.0% by the model from the 1105–1700 nm region, with 89.9% correct identification in the validation set and 100% in the calibration set (shown in
Unexpectedly, the models that extended into the visible region down to 405 nm generated poorer separations than those created from the regions higher than 725 nm. For instance, the three models extending to the 405 nm region yielded classification rates of 86.3% to 90.0%, while the ones excluding the 405 nm region produced identification success ranging from 90.9% to 95.0%.
Despite more apparent intensity variations in the 400–1100 nm region than in the 1100–2500 nm region (Figure 1), the optimal model given in Table 2 utilized a portion of the bands from the latter range. A probable explanation is the increase in non-leaf trash among high leaf grade cottons.
Spectral intensity variations and classification efficiency
Analysis of correct classification in the individual validation sets of leaf grades 1–7 from the optimal 1105–1700 nm model indicated that 3, 5, 0, 2, 5, 3, and 3 samples, respectively, were misclassified (Table 3). In general, the total numbers of samples incorrectly determined in the validation set increased along with those in the calibration set (Table 2). These two observations may suggest that the classification error likely arose from the degree of diversity and representation of both calibration and validation samples within each leaf grade category. In this regard, typical plots of average log(1/R) spectra ± one standard deviation (SD) of four leaf grade classes are depicted in Figure 3, and the average SD values for all samples in this region are included in Table 3.
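The reported percentages can be checked against these counts with a short calculation. It assumes, following the table footnote, that the overall figure is the mean of the validation and calibration percentages rather than a pooled count; the variable names are illustrative only.

```python
# Misclassified validation samples for leaf grades 1-7 (optimal 1105-1700 nm model).
misclassified = [3, 5, 0, 2, 5, 3, 3]
n_validation = 208  # total validation samples reported in the text

total_misclassified = sum(misclassified)
validation_pct = 100.0 * (n_validation - total_misclassified) / n_validation

# Calibration set was reported as 100% correct for this model.
calibration_pct = 100.0

# Assumption: "overall" is the mean of the two set percentages.
overall_pct = (validation_pct + calibration_pct) / 2

print(total_misclassified, round(validation_pct, 1), round(overall_pct, 1))
```

Under that assumption, the 21 misclassified validation samples reproduce both the 89.9% validation figure and the overall value of about 95.0%.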
Plots of average log(1/R) spectra ± one standard deviation (SD) of four leaf grade samples.
Seven-class classification of instrumental leaf grade 1 through 7 fibers in validation set from the 1105–1700 nm NIR region
It can be seen from Table 3 that samples in the leaf grade 1 pool showed the lowest mean SD value, which is reasonable as they contain little trash. Overall, the correct classification would be better than 90% for the leaf grade 2 through 7 categories when the SD values are larger than 0.0080. This may point to the importance of diversity and variation of samples within each leaf grade category during model development.
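One way to obtain a per-class mean SD value of the kind discussed here is sketched below: compute the SD at each wavelength across the samples of a class, then average over the region. The function name, the use of the sample SD (`ddof=1`), and the synthetic "tight" and "diverse" classes are assumptions for illustration; the exact calculation behind Table 3 is not spelled out in the text.

```python
import numpy as np

def class_spread(spectra):
    """Return (mean spectrum, SD spectrum, SD averaged over all wavelengths)."""
    mean_spectrum = spectra.mean(axis=0)
    sd_spectrum = spectra.std(axis=0, ddof=1)  # sample SD at each wavelength
    return mean_spectrum, sd_spectrum, float(sd_spectrum.mean())

# Hypothetical classes: same average spectrum, different sample-to-sample spread.
base = np.linspace(0.2, 0.4, 100)
shifts = np.linspace(-1.0, 1.0, 10)[:, None]
tight_class = base + 0.001 * shifts
diverse_class = base + 0.010 * shifts

_, _, tight_sd = class_spread(tight_class)
_, _, diverse_sd = class_spread(diverse_class)
print(tight_sd < diverse_sd)
```

The mean spectrum plus and minus the SD spectrum gives the kind of band plotted for each leaf grade class, while the averaged SD is a single number summarizing within-class diversity.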
As another approach to interpret the spectral intensity variation, all 442 spectra in calibration set were subjected to PCA characterization in the 1105–1700 nm region with full cross validation method. The first three principal components (PCs) accounted for 93.2% of the total variation, with the first PC (PC1), the second PC (PC2), and the third PC (PC3) explaining 55.6%, 21.2%, and 16.4% of the spectral variation, respectively.
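The percentage of variation explained by each PC can be obtained from the singular values of the mean-centered spectra, as in the sketch below. The data are synthetic (three hypothetical underlying factors plus noise), the function name is illustrative, and the cross-validation step mentioned above is omitted.

```python
import numpy as np

def explained_variance_percent(X):
    """Percentage of total variance carried by each principal component."""
    centered = X - X.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variance = singular_values ** 2
    return 100.0 * variance / variance.sum()

# Hypothetical data: 40 spectra built from 3 orthonormal loading vectors plus noise.
rng = np.random.default_rng(1)
scores = rng.standard_normal((40, 3)) * np.array([5.0, 2.0, 1.0])
loadings = np.linalg.qr(rng.standard_normal((60, 3)))[0].T
X = scores @ loadings + 0.01 * rng.standard_normal((40, 60))

pct = explained_variance_percent(X)
print(np.round(pct[:3], 1))

# Consistency check on the values reported in the text for PC1-PC3.
reported = [55.6, 21.2, 16.4]
print(round(sum(reported), 1))
```

The last line confirms that the three reported contributions add up to the stated 93.2% of total variation.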
The plot of PC1 vs. PC3 scores provided better visualization of the separation among the seven leaf grade groups than such a combination as PC1 vs. PC2, and a representative version for four selected leaf grades is shown in Figure 4. It reveals that the fiber clusters could not be separated clearly by leaf grade category with the use of a single PC score. This is anticipated, because samples having an identical leaf grade reading could result from two extreme cases, either with high
PC1 vs. PC3 score–score plot of four leaf grade samples from the 1105–1700 nm NIR region: leaf grade 1 (▪), leaf grade 3 (▴), leaf grade 5 (•), and leaf grade 7 (♦).
Relationship between principal component (PC) scores and selected leaf grade categories a
68 samples for leaf grade 1, 3, and 5 class; 34 samples for leaf grade 7 group.
Conclusions
The results of the present study demonstrate the usefulness and effectiveness of visible/NIR spectroscopy in determining leaf grade of commercial cotton fibers. In order to optimize the classification efficiency, seven-class SIMCA/PCA models were developed in different spectral regions and then compared. Results revealed that the discrimination model from the 1105–1700 nm NIR region could distinguish one class of leaf grade fibers from other six groups at a satisfactory level of ∼95.0%. Meanwhile, factors impacting the correct identification efficiency were discussed, by characterizing the SDs within individual leaf grade category and also performing PCA on all calibration samples. The finding is most promising in the development of optical spectral sensing system for in-situ measurement of instrumental leaf grade at ginning facilities.
Notably, the current results are based solely on US cotton samples. With a wider range of cottons from other origins, the percentage of correct classification will most likely be affected, as both the cotton and the trash show different characteristics in different regions (or countries). Future work may be necessary to broaden the study accordingly.
Footnotes
Acknowledgements
Part of the work was done at ARS Clemson facility (officially closed in November 2011). We sincerely thank Mr. James Knowlton (USDA, AMS, Memphis, TN) for meeting our request of providing the diversified cotton samples and also Ms. Mattie Morris (retired from ARS) for technical assistance.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
