Potential of visible and near infrared spectroscopy in the determination of instrumental leaf grade in lint cottons

Abstract

The presence of non-lint materials (or botanical trash) in commercial cotton bales degrades their market value, requires a further cleaning process, and compromises finished product quality. A number of approaches have been used in practice to assess trash content. In the US, one measure of the amount of trash is leaf grade, which was originally determined by qualified US Department of Agriculture’s AMS cotton classers via a visual inspection procedure. Recently, the AMS revised the protocol for cotton leaf grade classification by replacing the classer’s leaf determination with instrumental leaf measurement from the cotton classification HVI™ system. In this study, visible/NIR spectra were acquired to explore their potential for discriminating cotton samples of various leaf grade categories. Seven-class classification models were developed in different spectral regions to optimize identification efficiency. Results indicated that the model in the 1105–1700 nm NIR region could reach an acceptable separation of ∼95.0%, with 89.9% correct identification in the validation set and 100% in the calibration set. Furthermore, factors influencing correct classification are discussed briefly.

Keywords

Cotton trash, leaf grade, cotton classification, visible and near infrared spectroscopy, NIR

Cotton fiber is harvested by machines in the US and Australia and also in other countries.¹ Mechanically picked cottons contain some degree of plant-related contaminants and other irregular foreign matter.² Considerable efforts have been made to remove foreign matter (e.g. botanic trash) as much as possible during subsequent ginning and cleaning practices.³ However, it remains a challenge to remove all trash from lint fiber without damaging fibers.

Trash in commercial cotton bales compromises their market value, requires further cleaning steps, and impacts yarn and fabric end-products. Various techniques, such as the Shirley analyzer (SA),⁴ high volume instrument (HVI),⁵ advanced fiber information system (AFIS),⁶ micro dust and trash analyzer (MDTA),⁷ and FibroLab,⁸ are commonly used to assess cotton trash in the US and other countries. The US Department of Agriculture (USDA)’s Agricultural Marketing Service (AMS) has implemented the automation-based HVI™ procedure, as a universal testing method and official classification system, to identify the number of non-lint particles (particle count or ${HVI}_{count}^{TM}$ ) and to measure the surface area covered by non-lint (or trash) particles (area or ${HVI}_{area}^{TM}$ ).⁹ Meanwhile, qualified AMS human classers visually examined the cotton samples and compared them to the Universal Cotton Standards for manual determination of leaf grade and extraneous matter.

In July 2011, the AMS revised the cotton classification protocol for determining cotton leaf grade. Beginning with the 2011 cotton classing season, the AMS replaced the classers’ leaf determination with instrumental leaf measurement from the HVI™ system. In the developmental stage, the AMS applied mathematical algorithms to correlate ${HVI}_{count}^{TM}$ and ${HVI}_{area}^{TM}$ data to the Universal Cotton Standards for leaf grade during the 2009 and 2010 cotton classing seasons. The trial results showed that the HVI™ was capable of determining leaf grade as defined by the Universal Cotton Standards for Leaf Grade more accurately than cotton classers, as factors such as grader fatigue were eliminated. Hence, the AMS has determined that instrumental leaf grading has improved the repeatability, consistency, and accuracy of leaf grade classification data provided to the cotton industry, while improving operational efficiency.¹⁰

To evaluate the possibility of a rapid and low-cost method that can be used, away from the laboratory, in places such as ginning sites, near infrared (NIR) spectroscopy has been attempted for the prediction of trash contents from HVI™ measurement.^11,12 Thomasson and Shearer reported the development of optimal NIR models for eight HVI™ cotton quality characteristics including HVI™ trash content.¹¹ In evaluating different trash determinations by NIR technique, trash models built from ${HVI}_{count}^{TM}$ and ${HVI}_{area}^{TM}$ were observed to be promising in the quantitative determination of both trash components.¹²

The main objective of this study was to examine the potential of NIR spectroscopy, with an extension to the visible (400–750 nm) region, for classifying commercial cottons into seven leaf grade categories through the approach of soft independent modeling of class analogy of principal component analysis (SIMCA/PCA). This strategy differs from the earlier quantitative prediction of HVI™ trash content by the NIR technique.^11,12 The ultimate goal is to develop this technique for rapid, accurate, nondestructive, and routine determination of cotton fiber qualities (including leaf grade) in cotton fields and at ginning sites.

Materials and methods

Cotton samples and official instrumental leaf grade readings

A total of 650 lint cotton samples and their instrumental leaf grade assignments were utilized, of which 350 and 300 samples were retained from the 2010 and 2011 crop years, respectively, to represent the diversity in upland cotton varieties, growth locations, crop years, and ginning practices within the US. Spectra were acquired after the samples had been conditioned under the standard procedure, at a constant relative humidity of 65 ± 2% and a temperature of 21 ± 2℃ for at least 48 hours.

Visible/NIR reflectance spectral measurement

All visible/NIR reflectance spectra of samples from the two crop years were acquired on a Foss XDS rapid content analyzer (Foss NIRSystems Inc., Laurel, MD). About 10 g of cotton fibers was pressed into a Foss coarse granular cell, which is rectangular with internal dimensions of 3.8 cm wide × 15.2 cm long × 4.8 cm deep. With the granular cell moving across the optical window and stopping at 8 locations, the spectral scan covered a large surface sampling area of approximately 36 cm². To maintain good contact between the cotton sample and the optical window, 750 g of extra weight (∼0.18 PSI, pound-force per square inch) was loaded on top of the fiber samples consistently throughout the entire experiment. A background was recorded with an internal ceramic reference tile before scanning the samples. The log(1/reflectance) or log(1/R) readings were acquired over the 400–2500 nm wavelength range at 0.5 nm intervals with 32 scans. Three spectra were collected for each cotton sample by repacking, and the mean spectrum was used for model development. Each sample took less than 5 min from start to finish.
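The stated contact pressure can be checked with a short calculation. The sketch below assumes (this is not stated in the text) that the 750 g load is spread evenly over the full 3.8 cm × 15.2 cm internal footprint of the cell; the function name is illustrative only.

```python
# Rough check of the stated contact pressure.
# Assumption: the 750 g load acts evenly on the 3.8 cm x 15.2 cm cell footprint.
G = 9.80665            # standard gravity, m/s^2
PA_PER_PSI = 6894.757  # pascals per pound-force per square inch

def contact_pressure_psi(mass_g, width_cm, length_cm):
    """Pressure in PSI exerted by a mass resting on a rectangular area."""
    force_n = (mass_g / 1000.0) * G
    area_m2 = (width_cm / 100.0) * (length_cm / 100.0)
    return force_n / area_m2 / PA_PER_PSI

pressure = contact_pressure_psi(750, 3.8, 15.2)
print(f"{pressure:.2f} PSI")
```

Under that assumption the result agrees with the ∼0.18 PSI quoted above.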

Basics of classification models

Multivariate quantitative models, from a procedure such as partial least-squares (PLS) regression, require a large number of training samples to build accurate and reliable calibration equations. Collecting the samples and measuring reference values of the targeted properties by established or standard methods takes time. In general, quantitative models predict reasonable values for the calibrated constituents, provided the spectra of the unknown samples are fairly similar to those of the training set. In some scenarios, the constituent information is not easy to determine or the samples are hard to collect. Nevertheless, the spectrum of a sample is unique to its composition, and samples of similar composition should have very similar spectra. Therefore, it may be possible to show a difference between two samples merely by comparing their spectra with simple methods such as visual inspection or spectral subtraction. These subjective methods, however, cannot be applied to large spectral sets. Consequently, a principal component analysis (PCA)-based discriminant protocol, which classifies samples on the basis of their spectral characteristics and logical assignment, is a preferred technique.¹³ PLS and PCA are thus different approaches that provide complementary information.

PCA is a very effective variable reduction technique for spectroscopic data. It decomposes the training (or calibration) set spectra into mathematical spectra (called loading vectors, factors, principal components, etc.) that represent the variations most common to all the data. When the scores (or scaling coefficients) are multiplied by the loading vectors and the results summed, the original spectra are reconstructed. The reconstructed spectrum can be subtracted from the original spectrum to determine how well the PCA model performs for the sample. The result of this subtraction, known as the spectral residual, effectively measures the portion of each spectrum left unexplained by the model. Calculating the sum of squares of the spectral residual across the wavelengths generates an additional value for each spectrum, which is the basis of the discrimination method known as soft independent modeling of class analogy (SIMCA).
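The decomposition, reconstruction, and residual steps can be illustrated with a minimal sketch. It uses a singular value decomposition on synthetic data and is only an illustration of the general idea, not the algorithm of the software used in this study; the function names and the synthetic spectra are hypothetical.

```python
import numpy as np

def fit_pca(spectra, n_factors):
    """Return the mean spectrum and the first n_factors loading vectors."""
    mean = spectra.mean(axis=0)
    # Rows of vt are orthonormal loading vectors, ordered by variance explained.
    _, _, vt = np.linalg.svd(spectra - mean, full_matrices=False)
    return mean, vt[:n_factors]

def spectral_residual(spectrum, mean, loadings):
    """Sum of squares of (spectrum - PCA reconstruction) across wavelengths."""
    centered = spectrum - mean
    scores = loadings @ centered        # scaling coefficients
    reconstructed = scores @ loadings   # scores times loadings, summed
    return float(np.sum((centered - reconstructed) ** 2))

rng = np.random.default_rng(0)
# Synthetic training set: 40 spectra x 50 wavelengths from 2 hidden factors plus noise.
hidden = rng.normal(size=(2, 50))
train = rng.normal(size=(40, 2)) @ hidden + 0.01 * rng.normal(size=(40, 50))
mean, loadings = fit_pca(train, n_factors=2)

inside = spectral_residual(train[0], mean, loadings)              # fits the model
outside = spectral_residual(rng.normal(size=50), mean, loadings)  # unrelated spectrum
print(inside < outside)
```

A spectrum that resembles the training set leaves a small residual, whereas an unrelated spectrum leaves a large one.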

During the discrimination process, the spectrum of the ‘unknown’ sample is compared against multiple class models, and a spectral residual value is calculated for each. The sample (or spectrum) is then classified as a member of the class whose model yields the smallest spectral residual value.
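The least-residual assignment rule can be sketched as follows. This is a simplified illustration on synthetic two-class data; the class labels, helper names, and data are hypothetical, and commercial SIMCA implementations may scale or test the residuals differently.

```python
import numpy as np

def fit_class_model(spectra, n_factors):
    """PCA model of one class: mean spectrum plus leading loading vectors."""
    mean = spectra.mean(axis=0)
    _, _, vt = np.linalg.svd(spectra - mean, full_matrices=False)
    return mean, vt[:n_factors]

def residual(spectrum, model):
    """Sum of squared spectral residual of a spectrum against one class model."""
    mean, loadings = model
    centered = spectrum - mean
    return float(np.sum((centered - (loadings @ centered) @ loadings) ** 2))

def classify(spectrum, models):
    """Return the label of the class model giving the smallest residual."""
    return min(models, key=lambda label: residual(spectrum, models[label]))

rng = np.random.default_rng(1)

def make_class(offset, n=30, n_wavelengths=40):
    """Synthetic class: baseline offset + 2 latent factors + small noise."""
    basis = rng.normal(size=(2, n_wavelengths))
    noise = 0.01 * rng.normal(size=(n, n_wavelengths))
    return offset + rng.normal(size=(n, 2)) @ basis + noise

data = {"grade_A": make_class(0.0), "grade_B": make_class(5.0)}
models = {label: fit_class_model(x, n_factors=2) for label, x in data.items()}
predicted = classify(data["grade_B"][0], models)
print(predicted)
```

In the study itself, seven such class models (one per leaf grade) would be compared in the same way.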

Development of SIMCA/PCA classification models

The 2010 and 2011 crop year samples were received in 2010 and 2012, respectively, at two USDA Agricultural Research Service (ARS) laboratories located in Clemson (SC) and New Orleans (LA). In each shipment, the samples having the same leaf grade category were randomly packed into a large bag. At the laboratory, the samples within each leaf grade class were assigned identification numbers at random. In increasing order of sample identification number, every third sample was used for model validation while the remaining samples were used for SIMCA/PCA classification model development. This led to a set consisting of 442 calibration and 208 validation samples (Table 1).

Table 1.

Sample distribution and assignment in calibration and validation sets^a

Leaf grade	1	2	3	4	5	6	7	Total
Year 2010	50	50	50	50	50	50	50	350
Year 2011	50	50	50	50	50	50	0	300
Calibration no.	68	68	68	68	68	68	34	442
Validation no.	32	32	32	32	32	32	16	208

^a Leaf grade 8 and partial leaf grade 7 samples were not available.
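The calibration and validation counts in Table 1 can be reproduced with a short sketch of the every-third-sample rule. It assumes (an inference from the table, not stated explicitly) that the rule was applied separately to the 50 samples of each leaf grade within each crop-year shipment; the identification numbers are hypothetical.

```python
def split_every_third(sample_ids):
    """Send every third sample (3rd, 6th, ...) to validation, the rest to calibration."""
    ordered = sorted(sample_ids)
    validation = [s for i, s in enumerate(ordered, start=1) if i % 3 == 0]
    calibration = [s for i, s in enumerate(ordered, start=1) if i % 3 != 0]
    return calibration, validation

# Samples per (crop year, leaf grade) from Table 1; grade 7 was absent in 2011.
counts = {(year, grade): 50 for year in (2010, 2011) for grade in range(1, 8)}
counts[(2011, 7)] = 0

n_calibration = 0
n_validation = 0
for (year, grade), n in counts.items():
    ids = list(range(1, n + 1))  # hypothetical identification numbers
    cal, val = split_every_third(ids)
    n_calibration += len(cal)
    n_validation += len(val)

print(n_calibration, n_validation)  # prints "442 208"
```

Each group of 50 contributes 34 calibration and 16 validation samples, which matches the 68/32 split for leaf grades 1 to 6 and the 34/16 split for leaf grade 7.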

All spectra were imported into the PLSplus/IQ package in Grams/AI (Version 7.01, Thermo Fisher Scientific, Waltham, MA), and discriminant models were subsequently developed. Classification models were established using seven classes (representing leaf grades 1 through 7) with mean centering (MC) and first (1^st) derivative spectral pretreatment in various spectral regions.¹⁴ For each of the seven classes in the different models, the optimal number of factors suggested by the software package, mostly between 9 and 13, was used. When the SIMCA/PCA models were applied to validation (and calibration) samples, each sample was assigned, under the rule of least spectral residual value, to one of the seven classes (leaf grades 1 to 7). The leave-one-out cross-validation method was used during the SIMCA/PCA process.
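The effect of the MC + 1^st derivative pretreatment can be illustrated with a small sketch. The derivative algorithm used by the software is not specified in the text, so a plain finite-difference derivative is assumed here; the function names and synthetic spectra are hypothetical.

```python
import numpy as np

def first_derivative(spectra, step_nm):
    """Finite-difference first derivative along the wavelength axis (an assumption)."""
    return np.gradient(spectra, step_nm, axis=1)

def mean_center(spectra):
    """Subtract the mean spectrum of the set from every spectrum."""
    return spectra - spectra.mean(axis=0)

wavelengths = np.arange(1105.0, 1700.5, 0.5)  # 0.5 nm interval, as acquired
# Two synthetic spectra sharing one band near 1490 nm but with different baseline offsets.
band = np.exp(-((wavelengths - 1490.0) / 60.0) ** 2)
spectra = np.vstack([0.30 + band, 0.45 + band])

pretreated = mean_center(first_derivative(spectra, step_nm=0.5))
print(np.allclose(pretreated, 0.0))
```

The two synthetic spectra differ only by a constant baseline offset, which the derivative removes, so after mean centering nothing remains to distinguish them. This is one reason derivative pretreatment is commonly applied to reflectance spectra of packed samples.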

Results and discussion

Visible/NIR reflectance spectra of cotton fibers with various leaf grade

Figure 1 shows the average log(1/R) spectra in the 400–2500 nm region of cotton fibers with instrumental leaf grades 1, 3, 5, and 7. There are at least five intense and broad bands, one (<700 nm) in the visible region (400–750 nm) and four (1490, 1935, 2105, and 2340 nm) in the NIR region (750–2500 nm). Because the samples in this study came from commercial cotton bales, interference from non-lint or foreign contaminants (such as botanical trash) could not be neglected. In general, the visible region of 400–750 nm contains the color information and represents a mixture of contributions not only from the natural pigmentation compounds present in cotton fibers, for example, flavonoids and/or degradation products formed between a reducing sugar and an amino acid,^15,16 but also from the pigmentation species of cotton plant parts (i.e. non-lint components), such as chlorophylls and their degradation derivatives in leaves and bracts.¹⁷ The NIR bands are mainly due to the first (1^st) and second (2^nd) overtones and combinations of OH and CH stretching vibrations of both cotton cellulose and trash cellulosic-related compounds, in which the cotton cellulose contributes more to the total mass (and in turn to the spectral intensity) than the trash portion. The broad absorptions between 1150 nm and 1300 nm are from the 2^nd overtones of CH stretching modes, and their 1^st overtones appear in the 1675–1860 nm region.¹⁴ Features in the 1300–1400 nm region are ascribed to combination bands of the CH vibrations. Broad and intense bands in the 1400–1675 nm region are due to the overlap of the 1^st overtones of the OH stretching modes in hydrogen-bonded forms. The strong bands at 1935 and 2105 nm are most likely attributed to the combination of OH stretching and deformation modes and the combination of OH and CO stretching vibrations in cellulose, respectively.¹⁴

Figure 1.

Representative visible/NIR log(1/R) spectra of cotton fibers with four leaf grade readings in the 400–2500 nm region.

Although Figure 1 suggests that cotton fibers with low leaf grade share the same visible/NIR bands as those with high leaf grade, there appear to be some intensity variations as leaf grade rises. For example, the spectra of high leaf grade samples show a large log(1/R) intensity increase in the visible/short-wavelength (SW) NIR region (<1100 nm) and a relatively weak intensity reduction in the range from 1100 to 2500 nm. This is reasonable and mainly reflects the large variation and complexity in color and composition between low and high leaf grade cottons, because higher leaf grade samples should have larger ${HVI}_{area}^{TM}$ and/or ${HVI}_{count}^{TM}$ trash readings. It remains unclear how the natural color of the fiber may affect the results presented here; further study is necessary and will be reported when available.

A comparison of non-trash lint and trash components in the 400–2500 nm region is given in Figure 2. Both non-trash lint and trash samples were obtained during the SA cleaning process and then cut into powders before their respective spectra were collected. The cutting ensured close contact between the optical window and trash portions such as plant stems and hulls. The expected spectral differences between trash and non-trash lint exist, owing to the clear difference in chemical composition between the trash, a mixture of contributions mainly from plant parts, and cotton fibers, which consist largely (>94%) of cellulose. However, it is very difficult to interpret the origins of the broad and overlapped trash bands at the level of molecular vibration modes, since trash is a mixture of plant parts such as leaves, seed coats, hulls, and stems, and the distribution of individual plant parts in trash may change from one sample to another. It is of great interest to understand the spectral features of individual or pure plant parts and to use them for identification, a concept reported previously by Fortier et al.¹⁸

Figure 2.

Typical visible/NIR log(1/R) spectra of non-trash lint and trash components, which were retained from SA cleaning operation and cut into powders for comparison.

Classification models

The classification models were developed using different combinations of full/narrow spectral regions with the MC + 1^st derivative spectral pretreatment. In addition to the entire 405–2495 nm region, the log(1/R) spectra were analyzed in six subjectively chosen narrow regions: 405–1095 nm, 1105–2495 nm, 1105–1700 nm, 900–1700 nm, 725–1700 nm, and 405–1700 nm. The reasons for choosing narrow spectral regions were (1) to compare the model performances from the characteristic bands in different spectral ranges, and (2) to facilitate the development of portable optical and spectral imaging sensors in either the visible or NIR region.^19,20 For each model, the classification efficiency was assessed by the number of samples correctly classified, and the statistics for the calibration and validation sets from the seven spectral regions are compared in Table 2.

Table 2.

Seven-class classification of instrumental leaf grade 1 through 7 in cotton fibers from visible/NIR reflectance^a

Spectral region (nm)	405–2495	405–1095	1105–2495	1105–1700	900–1700	725–1700	405–1700
No. correct classified (valid)	170	161	175	187	178	177	165
% correct classified (valid)	81.7	77.4	84.1	89.9	86.6	85.1	79.3
No. correct classified (calib)	435	421	432	442	441	433	434
% correct classified (calib)	98.4	95.2	97.7	100	99.8	98.0	98.2
% average^b	90.0	86.3	90.9	95.0	93.2	91.6	88.8

^a Total number of calibration (calib) and validation (valid) samples were 442 and 208.

^b Mean of % correct classification in calibration and validation set.

Of the 208 and 442 samples in the validation and calibration sets, 170 and 435 samples, respectively, were correctly classified into their classes by the model from the 405–2495 nm visible/NIR region, for an overall classification rate of 90.0%. The discriminant model from the 1105–2495 nm NIR region slightly improved the rate of correct identification from 90.0% to 90.9%, whereas the model from the 405–1095 nm visible/SW-NIR region reduced it to 86.3%.

To further examine the effect of spectral regions on classification modeling power, four additional spectral ranges were considered. It is encouraging to observe an enhanced separation of around 95.0% by the model from the 1105–1700 nm region, with 89.9% correct identification in the validation set and 100% in the calibration set (Table 2). Meanwhile, inclusion of more bands, in the order of the 1105–1700 nm, 900–1700 nm, 725–1700 nm, and 405–1700 nm models, gradually decreased the likelihood of positive determination from 95.0% to 88.8%.
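The figures quoted for the 1105–1700 nm model follow directly from the counts in Table 2, with the "% average" taken as the mean of the validation and calibration percentages (Table 2, footnote b). A minimal check, with an illustrative function name:

```python
def percent_correct(n_correct, n_total):
    """Percentage of samples assigned to their true class."""
    return 100.0 * n_correct / n_total

# Counts for the 1105-1700 nm model, taken from Table 2.
validation = percent_correct(187, 208)
calibration = percent_correct(442, 442)
average = (validation + calibration) / 2.0
print(f"{validation:.1f} {calibration:.1f} {average:.1f}")  # prints "89.9 100.0 95.0"
```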

Unexpectedly, the models that extended into the visible region down to 405 nm generated poorer separations than those built from regions above 725 nm. For instance, the three models extending to 405 nm yielded classification rates of 86.3% to 90.0%, while those excluding the visible region produced identification success ranging from 90.9% to 95.0%.

Despite more apparent intensity variations in the 400–1100 nm region than in the 1100–2500 nm region (Figure 1), the optimal model given in Table 2 utilized a portion of the bands from the latter range. A probable reason is the increase in non-leaf trash among high leaf grade cottons.

Spectral intensity variations and classification efficiency

Analysis of the validation sets of leaf grades 1–7 with the optimal 1105–1700 nm model indicated that 3, 5, 0, 2, 5, 3, and 3 samples, respectively, were misclassified (Table 3). In general, the total number of samples incorrectly determined in the validation set increased along with that in the calibration set (Table 2). These two observations suggest that the classification error likely arose from the degree of diversity and representation of both calibration and validation samples within each leaf grade category. In this regard, typical plots of average log(1/R) spectra ± one standard deviation (SD) for four leaf grade classes are depicted in Figure 3, and the average SD values for all samples in this region are included in Table 3.

Figure 3.

Plots of average log(1/R) spectra ± one standard deviation (SD) of four leaf grade samples.

Table 3.

Seven-class classification of instrumental leaf grade 1 through 7 fibers in validation set from the 1105–1700 nm NIR region

Leaf grade	1	2	3	4	5	6	7	Total
No. total	32	32	32	32	32	32	16	208
No. correct classified	29	27	32	30	27	29	13	187
% correct classified	90.6	84.4	100	93.8	84.4	90.6	81.2	89.9
Average SD	0.0063	0.0078	0.0082	0.0084	0.0074	0.0100	0.0077

It can be seen from Table 3 that samples in the leaf grade 1 pool showed the lowest mean SD value, which is reasonable as they contain little trash. Overall, correct classification was better than 90% for those of the leaf grade 2 through 7 categories whose SD values were larger than 0.0080. This might point to the importance of diversity and variation of samples within each leaf grade category during model development.
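One plausible way to obtain an "average SD" per class is sketched below. The exact computation is not described in the text, so this assumes the SD is taken per wavelength across the samples of a class (sample SD, `ddof=1`) and then averaged over the spectral region; the function name and the synthetic data are hypothetical.

```python
import numpy as np

def average_sd(class_spectra):
    """Mean over wavelengths of the per-wavelength SD within one class (assumed definition)."""
    sd_per_wavelength = class_spectra.std(axis=0, ddof=1)
    return float(sd_per_wavelength.mean())

rng = np.random.default_rng(2)
base = np.linspace(0.2, 0.4, 100)                   # synthetic mean spectrum
tight = base + 0.005 * rng.normal(size=(68, 100))   # homogeneous class
broad = base + 0.010 * rng.normal(size=(68, 100))   # more diverse class
print(average_sd(tight) < average_sd(broad))
```

A class whose samples vary more around the mean spectrum gives a larger average SD, which is the sense in which the value is used as a diversity indicator here.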

As another approach to interpreting the spectral intensity variation, all 442 spectra in the calibration set were subjected to PCA characterization in the 1105–1700 nm region with the full cross-validation method. The first three principal components (PCs) accounted for 93.2% of the total variation, with the first PC (PC1), the second PC (PC2), and the third PC (PC3) explaining 55.6%, 21.2%, and 16.4% of the spectral variation, respectively.
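The percentage of variation explained by each PC can be computed from the singular values of the mean-centered spectra, as in the following sketch. The data are synthetic (three hidden factors plus noise), so the printed percentages are illustrative and are not the values reported above; the function name is hypothetical.

```python
import numpy as np

def explained_variance_percent(spectra):
    """Percentage of total variance captured by each principal component."""
    centered = spectra - spectra.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return 100.0 * variances / variances.sum()

rng = np.random.default_rng(3)
# Synthetic set: 442 spectra x 200 wavelengths built from 3 orthonormal factors plus noise.
scores = rng.normal(size=(442, 3)) * np.array([3.0, 2.0, 1.5])
loadings = np.linalg.qr(rng.normal(size=(200, 3)))[0].T
spectra = scores @ loadings + 0.3 * rng.normal(size=(442, 200))

pct = explained_variance_percent(spectra)
print(pct[:3].round(1), round(float(pct[:3].sum()), 1))
```

The cumulative figure for the first three PCs is simply the sum of the individual percentages, as in 55.6% + 21.2% + 16.4% = 93.2% for the calibration set.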

The plot of PC1 vs. PC3 scores provided better visualization of the separation among the seven leaf grade groups than other combinations such as PC1 vs. PC2, and a representative version for four selected leaf grades is shown in Figure 4. It reveals that the fiber clusters could not be clearly separated by leaf grade category with the use of a single PC score. This is anticipated, because samples having an identical leaf grade reading could result from two extreme cases, either high ${HVI}_{count}^{TM}$ and low ${HVI}_{area}^{TM}$ or low ${HVI}_{count}^{TM}$ but high ${HVI}_{area}^{TM}$ . Variations in ${HVI}_{count}^{TM}$ and ${HVI}_{area}^{TM}$ could cause relative intensity fluctuations in the NIR region, as NIR bands were correlated very well with an increase in ${HVI}_{count}^{TM}$ or a low ${HVI}_{area}^{TM}$ .¹⁰

Figure 4.

PC1 vs. PC3 score–score plot of four leaf grade samples from the 1105–1700 nm NIR region: leaf grade 1 (▪), leaf grade 3 (▴), leaf grade 5 (•), and leaf grade 7 (♦).

However, careful analysis indicated that some samples did vary by leaf grade level. For example, among leaf grade 1 samples, 62 and 52 samples had negative PC1 (PC1 < 0) and PC3 (PC3 < 0) values, respectively, while among leaf grade 7 samples, 26 and 33 samples had positive PC1 (PC1 > 0) and PC3 (PC3 > 0) values, respectively (Table 4). Meanwhile, the number or percentage of samples with both negative PC1 and PC3 scores decreased, and that of samples with positive values on both PC1 and PC3 increased, as leaf grade increased from 1 to 7. This observation suggests that, in general, samples tended to be positive on PC1 and PC3 with increasing leaf grade level, indicating a subtle relationship between the characteristics of samples among leaf grade categories and their respective determination by the NIR technique.

Table 4.

Relationship between principal component (PC) scores and selected leaf grade categories^a

PC score	Leaf grade 1	Leaf grade 3	Leaf grade 5	Leaf grade 7
PC1 < 0	62	48	23	8
PC1 > 0	6	20	45	26
PC3 < 0	52	36	24	1
PC3 > 0	16	32	44	33
PC1 < 0 and PC3 < 0	47	22	3	0
% in each grade	69%	32%	4%	0%
PC1 > 0 and PC3 > 0	1	7	25	25
% in each grade	1%	10%	37%	74%

^a 68 samples for leaf grade 1, 3, and 5 class; 34 samples for leaf grade 7 group.
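The percentage rows of Table 4 are the quadrant counts divided by the number of calibration samples in each leaf grade. A short check using only the tabulated counts (variable and function names are illustrative):

```python
# Calibration samples per leaf grade and quadrant counts, from Table 4.
class_sizes = {1: 68, 3: 68, 5: 68, 7: 34}
both_negative = {1: 47, 3: 22, 5: 3, 7: 0}   # PC1 < 0 and PC3 < 0
both_positive = {1: 1, 3: 7, 5: 25, 7: 25}   # PC1 > 0 and PC3 > 0

def percent_by_grade(counts):
    """Whole-number percentage of each grade's samples falling in a quadrant."""
    return {g: round(100.0 * n / class_sizes[g]) for g, n in counts.items()}

neg = percent_by_grade(both_negative)
pos = percent_by_grade(both_positive)
print(neg)  # {1: 69, 3: 32, 5: 4, 7: 0}
print(pos)  # {1: 1, 3: 10, 5: 37, 7: 74}
```

The opposite trends of the two rows with increasing leaf grade are the basis for the tendency described in the text.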

Conclusions

The results of the present study demonstrate the usefulness and effectiveness of visible/NIR spectroscopy in determining the leaf grade of commercial cotton fibers. To optimize the classification efficiency, seven-class SIMCA/PCA models were developed in different spectral regions and then compared. Results revealed that the discrimination model from the 1105–1700 nm NIR region could distinguish one class of leaf grade fibers from the other six groups at a satisfactory level of ∼95.0%. Meanwhile, factors affecting the correct identification efficiency were discussed by characterizing the SDs within each leaf grade category and by performing PCA on all calibration samples. The finding is promising for the development of an optical spectral sensing system for in-situ measurement of instrumental leaf grade at ginning facilities.

Notably, the current results are based solely on US cotton samples. With a wider range of cottons from other origins, the percentage of correct classification will most likely be affected, as both the cotton and the trash show different characteristics in different regions (or countries). Future work may be necessary to broaden the study accordingly.

Footnotes

Acknowledgements

Part of the work was done at the ARS Clemson facility (officially closed in November 2011). We sincerely thank Mr. James Knowlton (USDA, AMS, Memphis, TN) for meeting our request to provide the diversified cotton samples and Ms. Mattie Morris (retired from ARS) for technical assistance.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

References

1. Wakelyn, Edwards, Bertoniere. Cotton fiber chemistry and technology, Boca Raton, FL: CRC Press, 2007.

2. Funk, Armijo, Hanson. Converting gin and dairy wastes to methane. Trans ASAE 2005; 48: 1197–1201.

3. Anthony. The harvesting and ginning of cotton. In: Gordon, Hsieh Y-L (eds). Cotton: Science and technology, Cambridge, UK: Woodhead Publishing Limited, 2007.

4. American Society for Testing and Materials (ASTM). Standard test method for non-lint content of cotton (Designation: D2812-05). West Conshohocken, PA: ASTM International, 2012.

5. American Society for Testing and Materials (ASTM). Standard test method for measurement of physical properties of cotton fibers by high volume instruments (Designation: D5867-05). West Conshohocken, PA: ASTM International, 2012.

6. American Society for Testing and Materials (ASTM). Standard test method for neps in cotton fibers (Designation: D5866-05). West Conshohocken, PA: ASTM International, 2012.

7. Boykin, Armijo, Whitelock. Fractionation of foreign matter in ginned lint before and after lint cleaning. Trans ASABE 2009; 52: 419–426.

8. Matusiak, Walawska. Important aspect of cotton color measurement. Fibres Text East Eur 2010; 18: 17–23.

9. USDA, AMS. Cotton Classification. http://www.ams.usda.gov/AMSv1.0/getfile?dDocName=stelprdc5074569 (2005, accessed February 19, 2013).

10. USDA, AMS. Revision of cotton classification procedures for determining cotton leaf grade. Fed Regist 2012; 77: 20503–20505.

11. Thomasson, Shearer. Correlation of NIR data with cotton quality characteristics. Trans ASAE 1995; 38: 1005–1010.

12. Liu Y, Gamble G and Thibodeaux D. Evaluation of three cotton trash measurements methods by visible/near-infrared reflectance spectroscopy. ASABE Paper No. 1008708, St. Joseph, MI, 2010.

13. Sanguansat. Principal component analysis – engineering applications, Rijeka, Croatia: Intech, 2012.

14. Burns, Ciurczak. Handbook of near-infrared analysis, New York: Marcel Dekker, 2001.

15. Hua, Wang, Yuan. Characterization of pigmentation and cellulose synthesis in colored cotton fibers. Crop Sci 2007; 47: 1540–1546.

16. Gamble. Method for the prediction of the rate of +b color change in upland cotton (Gossypium hirsutum L.) as a function of storage temperatures. J Cotton Sci 2008; 12: 171–177.

17. Croce, Cinque, Holzwarth. The Soret absorption properties of carotenoids and chlorophylls in antenna complexes of higher plants. Photosynth Res 2000; 64: 221–231.

18. Fortier, Rodgers, Cintron. Identification of cotton and cotton trash components by Fourier transform near-infrared spectroscopy. Text Res J 2011; 81: 230–238.

19. Jia, Ding. Detection of foreign materials in cotton using a multi-wavelength imaging method. Meas Sci Technol 2005; 16: 1355–1362.

20. Sui, Thomasson. Multispectral sensor for in-situ cotton fiber quality measurement. Trans ASABE 2008; 51: 2201–2208.