Abstract
Cotton extraneous matter (EM) and special conditions are the only cotton quality attributes still determined manually by US Department of Agriculture Agricultural Marketing Service (USDA-AMS) classers. To develop a machine EM classing system, a better understanding of what triggers a classer EM call is needed. The goal of this work was to develop new information about cotton EM, such as bark and grass, and leaf particles, using machine measurements, to aid in the development of instrumentation for cotton quality measurements. AMS classers were tasked with identifying and denoting bark/grass in large-area color images of cotton samples. Image segmentation analysis was applied to detect non-cotton items, such as leaf particles, and the classer-denoted bark/grass objects were segmented manually. Further image analysis was used to measure shape and color parameters of these bark/grass objects and leaf particles in the sample images. These measurements of the bark/grass objects and leaf particles were compared, and logistic regression analyses were conducted to evaluate classification. For every shape and color parameter, there were significant differences between the bark/grass objects and the detected leaf particles in the images. The differences were greater for the shape parameters than for the color parameters. A classification model with shape, color, and log-transformed shape parameters consistently classified the bark/grass objects and leaf particles most accurately, with correct classification rates of 99.5% and 97.6%, respectively. However, classification models that were 99% correct in classifying manually segmented bark/grass were only about 77% correct when applied to the machine-detected bark/grass particles.
Almost all of the cotton grown in the USA is classed by the US Department of Agriculture Agricultural Marketing Service (USDA-AMS) using standardized procedures for measuring physical attributes that are related to the cotton’s quality and manufacturing performance. 1 Cotton quality attributes currently measured as part of AMS classification are fiber length, length uniformity, fiber strength, micronaire, color, trash, leaf, and extraneous matter (EM). Cotton classification relies heavily on instrumentation for cotton quality measurements. In fact, all attributes are determined using the high volume instrument (HVI, Uster Technologies, Knoxville, TN) with the exception of EM and special conditions (e.g. mixture of Upland and Pima, fire or water damaged, and reginned or repacked), which are determined manually by a human classer. These few quality attributes that are still determined manually suffer from the limitations of the human classer. Manual classing is subjective and can be influenced by fatigue and differences in skill level, visual acuity, and lighting. 2
Cotton foreign matter refers to non-lint materials. Classing designations for non-lint content are leaf grade, trash, and EM. 1 EM is any substance in cotton other than the fiber or leaf and includes various types of materials and conditions such as bark, grass, seed coat fragments, dust, oil, and spindle twist. EM, when present, is noted by the classer along with the level as a coded number. For example, light and heavy bark are coded 11 and 12, respectively, and light grass is coded 21.
About 10% of the 2014 US Upland cotton crop of 15.3 million bales had EM calls. 3 Bark and grass EM calls are common, with light bark accounting for 97% of all the Upland cotton EM calls from the 2014 crop. According to Bragg et al. 4 and Bargeron et al., 5 bark and grass significantly reduce spinning efficiency by causing yarn breaks. Thus, the presence of bark and grass results not only in a price discount to the producer, ranging from −6.5 to −15.9 US cents per kg depending on level and location in the USA, 6 but also in an additional cost to the textile mill.
Current instruments to determine the physical properties of cotton fall under two categories: gravimetric and imaging. Gravimetric foreign matter instruments currently used include the Shirley Analyzer 7 and the Micro Dust and Trash Analyzer III (Uster Technologies, Knoxville, TN). These devices employ mechanical and pneumatic principles to separate foreign matter from a known mass of cotton lint for subsequent weighing and calculation of percentage of original sample mass. The HVI currently used by the AMS Classing Offices and the Advanced Fiber Information System (AFIS, Uster Technologies, Knoxville, TN) utilize optical sensors to determine fiber properties, including trash content. None of these methods differentiate EM.
Several methods have been investigated for classifying cotton foreign matter. Xu et al.8,9 developed a system that utilized a charge-coupled device (CCD) camera and clustering analyses to categorize cotton foreign matter based on chromatic and geometric features. They concluded that color attributes were more reliable than geometric attributes in categorizing foreign matter. A neural network-based Cotton Trash Identification System, developed by Siddaiah et al.,10–12 identified and categorized cotton foreign matter, including bark and grass, using a camera and a scanner, and small (58 cm2) and large (181 cm2) area images. The system categorized a much higher number of objects as bark/grass, mainly due to misclassification of buried objects and very small objects (normally ignored by the classer). Himmelsbach et al. 13 utilized attenuated total reflectance/Fourier transform infrared measurements to identify cotton foreign matter based on chemical composition. The spectral method consistently identified the foreign matter, regardless of the particle size.
The US cotton classification system has been in place for almost 100 years, and for most of that time, has relied on human senses to classify cotton samples. 1 Current cotton grading, which is almost entirely determined by HVI measurements, is based on methods and standards that were developed from the historical manual classing system. Thus, new instrument measurements of cotton attributes like EM must be representative of the current standard, the manual classer. The goal of this work was to develop new information about cotton EM, such as bark and grass, and leaf particles, using machine measurements, to aid in the development of instrumentation for cotton quality measurements. The objectives of this work were to evaluate color and shape characteristics of bark and grass objects identified manually and leaf material detected by imaging techniques in cotton samples using imaging analysis and to assess the importance of these characteristics in differentiating between bark and grass and leaf material.
Methods
In an effort to categorize bark/grass EM, cotton lint samples with varying levels of bark/grass were selected from AMS Checklot samples from the 2007/2008 crop year. Checklot samples are classing samples randomly selected from AMS cotton classification facilities across the USA and retested at the AMS Quality Assurance Division in Memphis, TN, for quality assurance assessment. 1
Each cotton lint sample was split in two, lengthwise, and red, green, blue (RGB) color images were acquired from each of the four sample faces at 15.7 pixels (px) per mm (400 px/in.) resolution with an EPSON Perfection 3170 photo scanner (Epson America, Inc., Long Beach, CA) and saved in uncompressed tagged image file format (TIF). The scanner imaging window was fitted with a template that provided a 10.2 cm × 17.8 cm (4 in. × 7 in.) cotton image for analysis (Figure 1). Two AMS classers with the Standardization & Engineering Branch, Memphis, TN viewed each image on a PC monitor and by consensus denoted objects on the printed sample image that were either bark or grass (Figure 1), but did not differentiate between the two. The classers then assigned a call to the sample image for bark/grass: either “No EM” for no bark/grass, 11 or 12 for light or heavy bark, respectively, or 21 or 22 for light or heavy grass, respectively.
Scanned image of cotton sample face (a) and image with bark/grass objects denoted by human classers (b).
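The scan geometry stated above can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function name and its defaults are hypothetical and simply restate the 400 px/in. resolution and the 4 in. × 7 in. template from the text.

```python
MM_PER_INCH = 25.4

def scan_geometry(px_per_inch=400, width_in=4.0, height_in=7.0):
    """Return pixel density, pixel pitch, pixel area, and image size in pixels."""
    px_per_mm = px_per_inch / MM_PER_INCH        # about 15.7 px per mm
    pixel_mm = MM_PER_INCH / px_per_inch         # side length of one pixel, mm
    pixel_mm2 = pixel_mm ** 2                    # area of one pixel, mm2
    size_px = (round(width_in * px_per_inch), round(height_in * px_per_inch))
    return px_per_mm, pixel_mm, pixel_mm2, size_px

px_per_mm, pixel_mm, pixel_mm2, size_px = scan_geometry()
print(f"{px_per_mm:.1f} px/mm, {pixel_mm:.4f} mm/px, {pixel_mm2:.4f} mm2/px2, {size_px}")
```

Under these assumptions each sample face is a 1600 × 2800 px image, and one pixel covers roughly 0.004 mm2.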
Each of the scanned images of the cotton samples was then analyzed using ImageJ image processing and analysis software (v. 1.49j; National Institutes of Health, US Department of Health and Human Services, Bethesda, MD). 14
The RGB color images were first copied, and the copy was then transformed to the L*a*b* color space using the ImageJ plugin Color Transformer 2 (v. 2.0; Maria E. Barilla-Perez, Birmingham University, UK). 15
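For readers unfamiliar with the transformation, the following is a minimal sketch of a standard RGB to L*a*b* conversion. It assumes 8-bit sRGB input and a D65 reference white; the settings actually used by the Color Transformer 2 plugin are not stated here, so its output may differ from this sketch.

```python
def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIE L*a*b* (D65 white point assumed)."""
    def linearize(c):
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)

    # Linear sRGB to CIE XYZ
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white (assumption)

    def f(t):
        delta = 6.0 / 29.0
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        return t / (3.0 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)

white = srgb_to_lab(255, 255, 255)   # high L*, a* and b* near zero
red = srgb_to_lab(255, 0, 0)         # positive a*
blue = srgb_to_lab(0, 0, 255)        # negative b*
```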
The L*a*b* color space was designed for defining color differences and to model human visual perception.16,17 To describe an object’s color in the L*a*b* color space, the value of L* indicates the level of brightness (low values [0–50] indicate dark and high values [51–100] indicate light), the value of a* indicates the redness (positive value) or greenness (negative value), and the value of b* indicates the yellowness (positive value) or blueness (negative value). The transformation resulted in individual images for each L*, a*, and b* channel. A preliminary investigation showed that satisfactory item segmentation could be achieved by thresholding only the L* image. Thus, each L* image was then thresholded to segment the image into non-cotton particles and background using the maximum entropy automatic thresholding method in ImageJ with automatically set upper and lower threshold levels. These detected particles were also numbered and labeled automatically by the ImageJ software (Figure 2). To reduce the number of particles in the analysis and computational time, particles less than 12 px2 (0.048 mm2) were ignored. These particles were smaller than the criteria that define cotton trash particles as having an equivalent diameter greater than about 0.25–0.5 mm.18,19
L* image with detected and labeled particles.
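The idea behind maximum entropy thresholding can be sketched as follows. This is a simplified, pure-Python illustration of a Kapur-style criterion (choose the level that maximizes the summed entropies of the two resulting classes), not ImageJ's implementation. The histogram is synthetic, and the assumption that non-cotton items are darker than the lint is made only for the example.

```python
import math

def _entropy(counts):
    """Shannon entropy of a histogram segment, ignoring empty bins."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def max_entropy_threshold(hist):
    """Return bin index t maximizing entropy(hist[:t+1]) + entropy(hist[t+1:])."""
    best_t, best_h = None, -math.inf
    for t in range(len(hist) - 1):
        low, high = hist[: t + 1], hist[t + 1 :]
        if sum(low) == 0 or sum(high) == 0:
            continue  # one class would be empty
        h = _entropy(low) + _entropy(high)
        if h > best_h:
            best_t, best_h = t, h
    return best_t

# Synthetic 8-bit L* histogram: a few dark pixels (particles) and many
# bright pixels (cotton lint). Values are hypothetical.
hist = [0] * 256
for level in range(30, 51):
    hist[level] = 5
for level in range(180, 231):
    hist[level] = 200

threshold = max_entropy_threshold(hist)
min_area_px2 = 12  # particles smaller than this were ignored in the text
```

Pixels at or below `threshold` would be treated as candidate particle pixels, and connected groups smaller than `min_area_px2` would then be dropped.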
In each of the analyzed images, detected particles that belonged to each bark/grass object denoted by the classers were identified (Figure 3). Also, the bark/grass objects denoted by the classers were outlined or segmented manually using a graphics pen and tablet (Intuos CTH680, Wacom Technology Corp., Vancouver, WA) with the ImageJ software and added to the list of particles in each image (Figure 3). These manually segmented objects were considered as the best representation of the bark/grass objects that the classers observed in the samples. After these operations, each image included three types of items: manually segmented bark/grass objects, detected particles belonging to denoted bark/grass objects, and detected leaf particles (all other particles not associated with the denoted bark/grass).
Magnified image of a classer identified bark/grass object (a), detected particles belonging to the bark/grass object (b), and the manually segmented bark/grass object (c).
Using ImageJ, characteristics of all items (manually segmented bark/grass objects, detected bark/grass particles, and detected leaf particles) in the images were measured. These characteristics included the following.
Shape parameters (measurements in pixels [1 px length = 0.0635 mm; 1 px2 = 0.004 mm2]):
area – area of items; perimeter – length of the outside boundary of the items; height, width, and BoxAR – height, width, and aspect ratio (height/width) of smallest bounding rectangle with sides parallel to the image axes enclosing the item (Figure 4); major, minor, and EllipseAR – primary and secondary axes and aspect ratio (primary/secondary) of the ellipse that best fits (same area, orientation and centroid) the items (Figure 4); MaxFeret, MinFeret, and FeretAR – the maximum and minimum distance between two parallel lines enclosing the item (Feret diameter or caliper diameter) and aspect ratio (MaxFeret/MinFeret) (Figure 4); circularity – 4π × area/perimeter², where a value approaching “0” indicates an elongated item; roundness – 4 × area/(π × major²).
Illustration of shape parameters – bounding rectangle height and width, best fit ellipse major and minor axes, and maximum and minimum Feret diameter.
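The derived shape descriptors listed above follow directly from the primary measurements. The sketch below restates those formulas; the function name and the two example items (an ideal disc and a thin, bark-like strip) are hypothetical and use made-up measurements in pixels.

```python
import math

def shape_descriptors(area, perimeter, major, minor,
                      max_feret, min_feret, height, width):
    """Compute the dimensionless shape descriptors from primary measurements."""
    return {
        "circularity": 4 * math.pi * area / perimeter ** 2,
        "roundness": 4 * area / (math.pi * major ** 2),
        "EllipseAR": major / minor,
        "FeretAR": max_feret / min_feret,
        "BoxAR": height / width,
    }

# Ideal disc of radius 10 px: circularity and roundness are both 1.
disc = shape_descriptors(area=math.pi * 100, perimeter=2 * math.pi * 10,
                         major=20, minor=20, max_feret=20, min_feret=20,
                         height=20, width=20)

# Hypothetical thin strip, about 100 px long and 2 px wide.
strip = shape_descriptors(area=200, perimeter=204, major=112.84, minor=2.257,
                          max_feret=100.02, min_feret=2, height=2, width=100)
```

For the strip, both circularity and roundness fall well below 0.1, which is the kind of elongation signal these descriptors are meant to capture.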

Color parameters (measurements made for each channel in the RGB and L*a*b* color spaces):
mean and median of the color values of all the pixels in the item; integrated density (IntDen) – sum of the color values of all the pixels in the item.
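As a small illustration of these color measures, the sketch below computes the mean, median, and integrated density for one channel of an item. The pixel values are hypothetical. Because IntDen is a sum over pixels, it grows with item area even when the mean color is unchanged.

```python
from statistics import mean, median

def color_stats(values):
    """Mean, median, and integrated density (sum) of one channel's pixel values."""
    return {"mean": mean(values), "median": median(values), "IntDen": sum(values)}

small_item = color_stats([170, 176, 180, 178])          # 4 pixels
large_item = color_stats([170, 176, 180, 178] * 100)    # same colors, 400 pixels
```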
Due to the skewness of the shape parameter distributions, shape parameter data were also log-transformed and included in the subsequent statistical analyses.
Statistical analyses were conducted using JMP statistical software (v. 11.2.1, SAS Institute, Inc., Cary, NC) to explore differences in measured parameters between the bark/grass objects and leaf particles in 200 of the analyzed images from 50 of the checklot bale samples. Response screening was conducted to identify significant shape and/or color parameters.
To investigate how the measured shape and color parameters could be used to classify items in the images as bark/grass objects or leaf particles, classification models were constructed. Firstly, training and validation datasets were formed using the images utilized in the previous statistical analyses. From the entire set of 200 images, 615 out of 769 (80%) bark/grass objects and 615 leaf particles were randomly sampled for training data. The remaining bark/grass objects (154) and the remaining leaf particles (72,532 out of 73,147) from the 200 images were set aside for validation.
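The split described above can be sketched as follows. The item counts come from the text; the seed, the use of integer IDs in place of measured items, and the function name are assumptions made for illustration, and the original sampling was done in JMP rather than with this code.

```python
import random

def split_items(n_bark=769, n_leaf=73147, n_train=615, seed=0):
    """Randomly draw equal-sized training sets; the rest is held out for validation."""
    rng = random.Random(seed)
    bark_train = set(rng.sample(range(n_bark), n_train))
    leaf_train = set(rng.sample(range(n_leaf), n_train))
    bark_val = [i for i in range(n_bark) if i not in bark_train]
    leaf_val = [i for i in range(n_leaf) if i not in leaf_train]
    return bark_train, leaf_train, bark_val, leaf_val

bark_train, leaf_train, bark_val, leaf_val = split_items()
print(len(bark_train), len(leaf_train), len(bark_val), len(leaf_val))
```

Sampling the same number of items from each class balances the training data, while the validation data keep the strong imbalance between leaf particles and bark/grass objects found in the images.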
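Using the training dataset, nominal logistic regression analyses were performed to construct the models in the JMP Fit Model platform. The logistic function estimates the probability of a response based on a set of independent factors. In this analysis, it described the probability that an object was bark/grass. The logistic function was defined by
$$P(y) = \frac{1}{1 + e^{-y}} \qquad (1)$$
$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n \qquad (2)$$
where $x_1, \dots, x_n$ are the measured parameters included in the model and $\beta_0, \dots, \beta_n$ are the fitted coefficients.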
For P(y) > 0.5, the object was bark/grass. The stepwise model selection method with three effects selection criteria (minimum Bayesian Information Criterion [BIC], minimum Corrected Akaike Information Criterion [AICc], and P-value) incorporating forward, backward, and mixed selection directions was used to develop candidate models. The values of alpha used for parameters entering and leaving were 0.05 and 0.01, respectively. The best fitting model from the stepwise methods was selected based on the number of correctly classified objects (classification rate), AICc, BIC, and R2 (U) (uncertainty coefficient). This model building approach was used to construct best fitting models derived from each type of parameter (shape, log-transformed shape, and color) individually and their combinations for classifying items as bark/grass objects or leaf particles. Six models resulted: (1) shape parameter model; (2) color parameter model; (3) log-transformed shape parameter (log) model; (4) shape and color parameters (shape|color) model; (5) log-transformed shape and color parameters (log|color) model; and (6) shape, log-transformed shape, and color parameters (shape|log|color) model.
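The decision rule can be sketched as below, assuming the standard logistic form P(y) = 1/(1 + exp(−y)) with y a linear combination of measured parameters. The single predictor, the coefficient, and the intercept are hypothetical values chosen for illustration; they are not the fitted coefficients of any of the six models.

```python
import math

def logistic(y):
    """Numerically stable logistic function."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)

def classify(features, coefficients, intercept, cutoff=0.5):
    """Return (label, probability) for one item; label is bark/grass if P > cutoff."""
    y = intercept + sum(coefficients[name] * features[name] for name in coefficients)
    p = logistic(y)
    return ("bark/grass" if p > cutoff else "leaf"), p

# Hypothetical one-parameter model on log10(MaxFeret), for illustration only.
coefficients = {"log_MaxFeret": 4.0}
intercept = -6.4

long_item = classify({"log_MaxFeret": math.log10(189)}, coefficients, intercept)
short_item = classify({"log_MaxFeret": math.log10(15.1)}, coefficients, intercept)
```

With these made-up coefficients the long item is labeled bark/grass and the short item is labeled leaf.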
The six nominal logistic models were then applied to the remaining validation dataset and compared based on the number of correctly classified objects. To further evaluate the types of constructed models, the models were then used to classify bark/grass objects and leaf particles in a test dataset formed from 55 additional images from 15 checklot bale samples that was independent of the training and validation datasets.
Results
Measured parameters
Classer identified bark/grass objects, detected particles associated with the bark/grass objects, and detected leaf particles in 200 sample images with assigned bark/grass and No extraneous matter (EM) calls.
Response screening in JMP showed that there were differences in the measured shape and color parameters between classer bark/grass objects and detected leaf particles. Figure 5 shows the transformed p-value, LogWorth (–log10[p-value]), plotted against the effect size (the extent to which response values differ between bark/grass objects and leaf particles) for the shape and color parameters. LogWorth gives a clearer representation of significance level when p-values are small. A LogWorth value greater than 2 corresponds to a p-value less than 0.01. The difference between bark/grass and leaf was significant at the 0.01 level (LogWorth > 2.0) for all measured parameters. Also, it is apparent in Figure 5 that the shape parameters had greater LogWorth values (greater significance) than nearly all color parameters. Only the color parameters that were related to object or particle area (–IntDen) had significance levels of the same magnitude as the shape parameters. Shape parameters would therefore likely play a more significant role than color parameters in classifying objects as bark/grass.
Significance (LogWorth = –log10[p-value]) of difference in measured shape and color parameters between bark/grass objects and leaf particles. LogWorth > 2 corresponds to p-value < 0.01.
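The LogWorth transformation is simple to reproduce. The sketch below uses a hypothetical helper name and also shows why a log scale is convenient for results of this size: a p-value of 10 to the power −10,351 cannot be represented as a double-precision number, whereas its LogWorth can.

```python
import math

def logworth(p_value):
    """LogWorth = -log10(p-value)."""
    return -math.log10(p_value)

threshold_logworth = logworth(0.01)     # 0.01 significance level
tiny_p = float("1e-10351")              # underflows to zero in double precision
```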
As seen in Table 2, shape parameters that described overall size (MaxFeret to Minor) were more significantly different between bark/grass objects and leaf particles (greater LogWorth) than those that described elongation or roundness (FeretAR to BoxAR). MaxFeret was the most significant shape parameter (LogWorth = 10,351) with bark/grass object mean (189 px) more than 10 times the mean for leaf particles (15.1 px). Histograms of MaxFeret illustrate the prominent difference between the bark/grass objects and leaf particles (Figure 6). Less than 1% of the bark/grass objects had MaxFeret less than 40 px, while MaxFeret was less than 40 px for more than 95% of the leaf particles.
Distributions of MaxFeret for bark/grass objects and leaf particles.
Mean shape parameter values and analysis of variance observed significance probabilities of equal means for manually segmented bark/grass objects and detected leaf particles. Bark/grass and leaf means were all significantly different at α = 0.01. LogWorth = –log10(p-value) ≥ 2.0 corresponds to p-value ≤ 0.01.
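To make the MaxFeret comparison concrete, the sketch below computes a maximum caliper distance by brute force over a set of boundary points and applies the 40 px cutoff discussed in the text as a single-parameter size rule. It is a rough illustration: ImageJ's own Feret measurement may treat pixel edges differently, and the point sets and function names are hypothetical.

```python
import math
from itertools import combinations

PX_TO_MM = 0.0635  # pixel side length from the scan resolution

def max_feret(points):
    """Largest distance between any two points (brute force, fine for small sets)."""
    return max(math.dist(a, b) for a, b in combinations(points, 2))

def size_rule(max_feret_px, cutoff_px=40):
    """Single-parameter rule: long items are flagged as possible bark/grass."""
    return "bark/grass" if max_feret_px >= cutoff_px else "leaf"

small = max_feret([(0, 0), (3, 0), (3, 4)])            # a 3-4-5 triangle
long_strip = max_feret([(0, 0), (150, 10), (75, 4)])   # hypothetical strip
cutoff_mm = 40 * PX_TO_MM
```

The 40 px cutoff corresponds to about 2.5 mm at this scan resolution.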
Integrated color density parameters (sum of the color values of all the pixels in an item, Green-IntDen to a-IntDen) that reflect item size were more significant (LogWorth > 500) than the raw color measures (Red-mean to Blue-median, LogWorth ≤ 110) (Table 3). The difference in mean Green-IntDen values between bark/grass objects (579,993) and leaf particles (16,475) was the most significant (LogWorth = 5859) among color parameters. Red-mean was the most significant raw color measure (LogWorth = 110) with average values for bark/grass and leaf equal to 176 and 164, respectively. Due to the integration of item size, the histograms for Green-IntDen were similar to those of MaxFeret (Figure 7). Green-IntDen was greater than 35,000 for almost 98% of bark/grass objects and less than 35,000 for more than 90% of leaf particles. On the other hand, the distributions of Red-mean values for bark/grass objects and leaf particles did not show obvious differences and overlapped considerably (Figure 8).
Distributions of Green-IntDen for bark/grass objects and leaf particles.
Distributions of Red-mean for bark/grass objects and leaf particles.
Mean color parameter values and analysis of variance observed significance probabilities of equal means for manually segmented bark/grass objects and detected leaf particles. Bark/grass and leaf means were all significantly different at α = 0.01. LogWorth = −log10(p-value) ≥ 2.0 corresponds to p-value ≤ 0.01.
Log-transformation reduced the overall spread of the shape parameter data without compromising the differences between the bark/grass objects and leaf particles. This is illustrated in Figure 9, which shows the distributions for the log-transformed MaxFeret.
Distributions of log-transformed MaxFeret for bark/grass objects and leaf particles.
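The effect of the log transformation on a right-skewed size measurement can be illustrated with a small sketch. The sample values below are hypothetical and are not taken from the measured data; the skewness estimate is the simple population moment form.

```python
import math
from statistics import mean, pstdev

def skewness(values):
    """Population skewness: mean cubed standardized deviation."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

# Hypothetical MaxFeret-like values in pixels: many small items, a few long ones.
raw = [8, 9, 10, 12, 15, 18, 25, 40, 90, 300]
logged = [math.log10(v) for v in raw]

raw_skew = skewness(raw)
log_skew = skewness(logged)
raw_range_ratio = max(raw) / min(raw)
log_range = max(logged) - min(logged)
```

In this example the transformed values are less skewed and span a much narrower range, while their ordering (and therefore the separation between short and long items) is unchanged.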
Model building
Parameters and coefficients for Equation (2) included in constructed models for classification of bark/grass objects and leaf particles.
Classification rates and model fit statistics from models constructed using shape, log-transformed shape (log), and color parameters for manually segmented bark/grass objects and detected leaf particles in the training dataset.
Total count: bark/grass objects = 615, leaf particles = 615, and combined = 1230.
AICc = Corrected Akaike Information Criterion, BIC = Bayesian Information Criterion, R2 (U) = uncertainty coefficient.
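For reference, the three fit statistics can be computed from a model's negative log-likelihood as sketched below. These are the textbook definitions and are assumed, not confirmed, to match what JMP reports. The negative log-likelihood of the fitted model and the parameter count are hypothetical; the intercept-only value follows from the balanced training set of 1230 items.

```python
import math

def fit_statistics(neg_loglik_full, neg_loglik_reduced, k, n):
    """Return (AICc, BIC, R2(U)) for a model with k parameters fit to n items."""
    aic = 2 * neg_loglik_full + 2 * k
    aicc = aic + (2 * k * (k + 1)) / (n - k - 1)      # small-sample correction
    bic = 2 * neg_loglik_full + k * math.log(n)
    r2_u = 1 - neg_loglik_full / neg_loglik_reduced   # uncertainty coefficient
    return aicc, bic, r2_u

n = 1230                              # 615 bark/grass + 615 leaf
neg_ll_reduced = n * math.log(2)      # intercept-only model on balanced classes
aicc, bic, r2_u = fit_statistics(neg_loglik_full=50.0,   # hypothetical
                                 neg_loglik_reduced=neg_ll_reduced,
                                 k=7, n=n)                # hypothetical k
```

Lower AICc and BIC and a higher R2 (U) indicate a better fit, with BIC penalizing additional parameters more heavily at this sample size.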
Model validation
Classification rates from models constructed using shape, log-transformed shape (log), and color parameters for manually segmented bark/grass objects and detected leaf particles in the validation dataset.
Total count: bark/grass objects = 154, leaf particles = 72,532, and combined = 72,686.
Calculated as the average of the bark/grass object and the leaf particle percentages.
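The reason for averaging the two class percentages, rather than pooling all items, can be seen in a short sketch. The class totals are those of the validation dataset, but the numbers of correctly classified items are hypothetical.

```python
def classification_rates(bark_correct, bark_total, leaf_correct, leaf_total):
    """Return per-class percentages, their average, and the pooled percentage."""
    bark_pct = 100.0 * bark_correct / bark_total
    leaf_pct = 100.0 * leaf_correct / leaf_total
    class_average = (bark_pct + leaf_pct) / 2
    pooled = 100.0 * (bark_correct + leaf_correct) / (bark_total + leaf_total)
    return bark_pct, leaf_pct, class_average, pooled

# Hypothetical correct counts with the validation class totals.
bark_pct, leaf_pct, class_average, pooled = classification_rates(
    bark_correct=150, bark_total=154, leaf_correct=70000, leaf_total=72532)
```

Because leaf particles outnumber bark/grass objects by several hundred to one, the pooled percentage is almost identical to the leaf percentage; the class average gives the bark/grass result equal weight.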
Model testing
Classification rates from models constructed using shape, log-transformed shape (log), and color parameters for manually segmented bark/grass objects and detected leaf particles in the test dataset.
Total count: bark/grass objects = 200, leaf particles = 11,024, and combined = 11,224.
Average total count per image: bark/grass objects = 5.13 (for images with bark/grass indicated by classer), leaf particles = 200.44.
Calculated as the average of the bark/grass object and the leaf particle percentages.
Analysis of only the images in the test dataset that had bark/grass objects indicated by the classers showed that the models derived from log, shape|color, and shape|log|color parameters had the highest average number of correctly classified bark/grass objects per image, at 5.10 of the 5.13 objects per image on average (Table 7). The average number of correctly classified leaf particles per image was greater than 195 for all of the models, except for the color parameter model, which averaged 193 per image.
These results show that there are clear differences between bark/grass objects that the human classer sees in a cotton sample and leaf particles, and those differences can be detected with imaging analysis. Also, the classification model that included shape, log-transformed shape, and color parameters (shape|log|color parameter model) consistently had the highest classification rates of bark/grass objects and leaf particles in the training, validation, and test datasets.
Conclusions
Bark and grass objects were identified in and denoted on images of cotton samples by AMS classers. The shape and color characteristics of these bark/grass objects were then compared to the leaf particles detected in the images using image analysis techniques. There were significant differences in all the characteristics between the classer identified bark/grass objects and the detected leaf particles. These differences were greater for parameters describing shape than for color parameters. A nominal logistic classification model with shape, log-transformed shape, and color parameters (best fit ellipse minor axis; minimum Feret diameter; log-transformed circularity; log-transformed best fit ellipse minor axis; log-transformed perimeter; and Mean-red color value) consistently classified the bark/grass objects and leaf particles most accurately with 99.5% and 97.6% correct classification rate, respectively, for the test dataset.
Future research
Classification rates from models constructed using shape, log-transformed shape (log), and color parameters for detected particles associated with classer identified bark/grass objects in the test dataset.
Total bark/grass particle count = 634.
Disclaimer
Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the USDA. USDA is an equal opportunity provider and employer.
Acknowledgements
The authors would like to thank the technical staff and classers of the USDA-AMS, Cotton and Tobacco Programs, Standardization & Engineering Branch for their collaboration on this project.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
