Machine Learning Approaches for the Fusion of Near-Infrared, Mid-Infrared, and Raman Data to Identify Cartilage Degradation in Human Osteochondral Plugs

Abstract

Vibrational spectroscopy methods such as mid-infrared (MIR), near-infrared (NIR), and Raman spectroscopies have been shown to have great potential for in vivo biomedical applications, such as arthroscopic evaluation of joint injuries and degeneration. Considering that these techniques provide complementary chemical information, in this study, we hypothesized that combining the MIR, NIR, and Raman data from human osteochondral samples can improve the detection of cartilage degradation. This study evaluated 272 osteochondral samples from 18 human knee joints, comprising both healthy and damaged tissue according to the reference Osteoarthritis Research Society International grading system. We established one-block and multi-block classification models using partial least squares discriminant analysis (PLSDA), random forest, and support vector machine (SVM) algorithms. Feature modeling by principal component analysis was tested for the SVM (PCA-SVM) models. The best one-block models were built using MIR and Raman data, discriminating healthy cartilage from damaged with an accuracy of 77.8% for both MIR and Raman using the PCA-SVM algorithm, whereas the NIR data did not perform as well, achieving only 68.5% accuracy for the best model using PCA-SVM. The multi-block approach allowed an improvement with an accuracy of 81.4% for the best model by PCA-SVM. Fusing three blocks using MIR, NIR, and Raman by multi-block PLSDA significantly improved the performance of the single-block models to 79.1% correct classification. The significance was confirmed by statistical testing using analysis of variance. Thus, the study suggests the potential and the complementary value of the fusion of different spectroscopic techniques and provides valuable data analysis tools for the diagnostics of cartilage health.

Keywords

Osteoarthritis, OA, human cartilage, Raman, near-infrared, NIR, mid-infrared, MIR, spectroscopy, machine learning

Introduction

Today, more than 250 million people are affected by osteoarthritis (OA), of whom over 50% are older than 65 years.¹ In Europe, the annual cost of OA ranges between €1330 and €10 452 per person.² OA is a leading contributor to the global socio-economic burden owing to population growth, aging, and obesity, and it is estimated that OA will become the most prevalent disease in developed countries.^1,3

The exact mechanism of the progression of OA is poorly understood, and it is characterized by cartilage and subchondral bone degradation.⁴ Without any intervention and due to the low self-regeneration capabilities of the articular cartilage, an injury can progress to OA. No cure exists for OA. However, post-traumatic OA can be slowed down with surgical interventions, medication, and conventional physiotherapy.⁴

Although modern imaging techniques are effective for verifying the initial suspicion of OA, during arthroscopic joint repair surgeries, surgeons rely on visual evaluation and manual palpation to evaluate the cartilage condition.⁵ Although this method is the gold standard, conventional arthroscopy is highly subjective.^6,7 It has been suggested that the effectiveness of arthroscopic evaluation could be radically improved by complementary spectroscopy-based diagnostic methods that are minimally invasive, nondestructive, sensitive, and objective.^8,9 For example, vibrational spectroscopic techniques such as near-infrared (NIR), mid-infrared (MIR), and Raman spectroscopies have demonstrated potential as diagnostic tools to evaluate cartilage properties.^8,10–14 Moreover, there are multiple attempts to develop multi-modal spectroscopy/imaging systems.^15–17

Vibrational spectroscopic techniques can be combined as a multi-modal approach for providing accurate and objective diagnostic information on connective tissue integrity. The penetration depth of electromagnetic radiation in tissue varies substantially between NIR, MIR, and Raman spectroscopic techniques; thus, these methods could provide depth-sensitive diagnostics. For example, in MIR, the penetration depth is approximately 5–10 µm, compared to 1–5 mm for NIR and Raman spectroscopies.¹⁸ Hence, MIR-based spectroscopic techniques could provide chemical information from superficial articular cartilage. This information is essential to understand the pathogenesis of tissue in early OA.

In contrast to MIR, NIR and Raman spectroscopies can be applied for monitoring OA-related changes in deep cartilage tissue and subchondral bone. Moreover, it is well known that Raman and infrared (IR) spectroscopies have complementary sensitivity to different molecular vibrations through distinct measurement mechanisms: IR spectroscopy is based on dipole moment changes in molecular bonds, while Raman spectroscopy is based on polarizability changes.¹⁹ The critical characteristics of MIR, NIR, and Raman spectroscopic techniques are summarized in Table I. By combining spectral information from these spectroscopic techniques in a multi-modal approach using data fusion, their strengths and limitations can be amplified and minimized, respectively. However, surgeons need to understand what underlies the model's decision. This requires that the data fusion approach allows the building of interpretable models. Such an approach could provide a powerful arthroscopic tool to diagnose connective tissue injuries and related degeneration.¹⁹

Table I.

Characteristics of NIR, MIR, and Raman spectroscopies.

Characteristics	NIR	MIR	Raman
Phenomenon	Absorption	Absorption	Scattering
Interference from water signal	Yes	Yes	No
Acquisition time	<1 s	<1 min	<1 min
Arthroscopic adaptation via a fiber-optic probe	Yes	Yes	Yes
Depth of penetration	1–5 mm	5–10 µm	1–5 mm
Lateral spatial resolution	∼1 µm	∼2.5–20 µm	<1 µm (in the visible range)

All methods in machine learning (ML) can be split into so-called black-box models and the models that allow interpretation. This is one of the crucial differences that should be considered when selecting the modeling approach. Since spectroscopic data are information-rich and provide important chemical and physical information about the sample quality,^20,21 it is very valuable to use a method that would preserve the information, as well as allow model interpretation. Powerful nonlinear methods such as artificial neural networks, as well as very robust linear methods such as support vector machines (SVMs), which can also be turned into nonlinear models by introducing nonlinear kernels, do not allow interpretation of the results at all or only to some extent. Random forest (RF) is a very popular method in the ML community that allows model interpretation, but it is sub-optimal for the analysis of spectroscopic data due to the high collinearity of spectral variables. In contrast, standard chemometric methods such as principal component analysis (PCA) and partial least squares discriminant analysis (PLSDA), developed for the analysis of spectroscopic data, are suitable for the task.

Loading vectors in PCA and regression coefficients of the PLSDA model allow interpretation of the modeling results. Data fusion is specifically addressed by methods such as multi-block PCA (also known as consensus PCA^22,23) and multi-block PLSDA (MB-PLSDA^24,25). The multi-block methods allow interpretation of the results on several levels: through scores and loadings in a “global” space across modalities and in each “local” space of the individual modality. An important aspect that is specific to the chemometric methods is the dimensionality reduction. This removes the necessity of feature modeling when PLSDA-based methods are used. Feature modeling by PCA allows extracting important features that can be interpreted and understood and further merged with the algorithm of choice. However, the weakness of such an approach is the model interpretation that becomes difficult, as well as the validation of the models that should be done with extra care in this case.

We hypothesized that combining the MIR, NIR, and Raman data from human osteochondral samples can improve the models used to assess cartilage health. To address this hypothesis, we evaluated different data fusion techniques to analyze human osteochondral samples measured by NIR, MIR, and Raman spectroscopies. PLSDA, RF, and SVM were compared, and feature modeling by PCA was tested using the SVM algorithm.

Materials and Methods

Sample Preparation

Both the specimen collection and the experiment were approved by the Research Ethics Committee of the Hospital District of Northern Savo, 123/2015 (58/2013). The osteochondral samples (n = 272), 4 mm in diameter, were extracted from 18 knees of nine human cadavers using a dental drill (NTI-Kahla Rotary Dental Instruments). The samples were taken from the central location of the femur, tibia, and patella. After extraction, the samples were stored in phosphate-buffered saline (PBS) at −80 °C until spectroscopic measurement.

Spectroscopic Data Acquisition

Before the spectroscopic measurements, samples were thawed for 30 min at room temperature; subsequently, samples were glued (Cyanoacrylate, Loctite 401, Henkel Corporation) from subchondral bone to a plastic petri dish.

Fourier Transform Infrared Spectroscopy (FT-IR)

The FT-IR spectra were measured with a Bruker Alpha HR spectrometer (Bruker Optics GmbH) equipped with a deuterated triglycine sulfate detector and an IR source that utilizes Bruker's CenterGlow technology. System control and data acquisition were performed using Bruker Opus 8.1 software (Bruker Optics GmbH). The spectrometer is equipped with a Bruker Platinum attenuated total reflection (ATR) sampling cell (Bruker Optics GmbH), which has a single reflection diamond ATR crystal embedded in tungsten carbide. The cell provides an inert environment for the biomaterial, and the ATR crystal contains a regulated pressure clamp that enables the ideal contact with the specimen for reproducible acquisition of spectra. Each spectrum was acquired at a resolution of 2 cm⁻¹, over the range of 4000 cm⁻¹ to 600 cm⁻¹, with a digital spacing of 1.029 cm⁻¹ and a total of 128 scans. The sampling size for ATR measurements was 0.5 mm² for each specimen. All samples were immersed in PBS to avoid dehydration and measured in triplicate; however, we did not notice any water interference in the MIR spectra, as the ATR surface was in direct contact with the specimen. The ATR crystal was cleaned with isopropanol between measurements, and a background (reference) spectrum was recorded using the sample-free setup.

Near-Infrared (NIR) Spectroscopy

An NIR spectrometer AvaSpec ULS2048XL (Avantes BV) with a grating of 75 lines/mm, a slit size of 50 µm, which gives λ = 1.0–2.5 µm, a light source Avalight-HAL-S, a resolution of 6.4 nm, and digital spacing equal to 6.39 cm⁻¹ was used to measure spectra in the 8333–4545 cm⁻¹ range. A custom-made stainless-steel fiber-optic arthroscopic probe was connected to the spectrometer during measurements. The probe includes 100 illuminating and seven signal-collecting optical fibers, and its tip diameter is 3.25 mm. Each spectrum was measured using 16 ms integration time and 50 accumulations. The NIR spectra were measured in triplicate. Avasoft 8.7.0 software (Avantes BV) was used to acquire all spectra. All samples were immersed in PBS to avoid dehydration. However, as for the MIR spectra, we did not notice any water interference in the NIR spectra, as the NIR probe was in direct contact with the specimen.
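Because the NIR range is reported in wavenumbers while NIR instrument specifications are usually quoted in wavelength, a short conversion can help when comparing the two. The following is a minimal Python sketch (the study itself used Matlab); the function name is hypothetical and only the 8333–4545 cm⁻¹ limits are taken from the text.

```python
def wavenumber_to_wavelength_nm(wavenumber_cm1: float) -> float:
    """Convert a wavenumber in cm^-1 to a wavelength in nm (lambda = 1e7 / nu)."""
    return 1e7 / wavenumber_cm1


# Measured NIR range as stated in the text, in cm^-1.
nir_range_cm1 = (8333.0, 4545.0)
nir_range_nm = tuple(wavenumber_to_wavelength_nm(v) for v in nir_range_cm1)

print([round(v) for v in nir_range_nm])  # [1200, 2200]
```

The stated range therefore corresponds to roughly 1.2–2.2 µm.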

Raman Spectroscopy

The DXR2xi Raman confocal microscope (Thermo Fisher Scientific) with a 10× objective and 50 μm confocal pinhole was used. OMNICxi software (Thermo Fisher Scientific) was used to measure the spectra in the 3400–200 cm⁻¹ range with a resolution of 5 cm⁻¹ at full width at half maximum (FWHM) and a digital spacing of 1.9285 cm⁻¹. The wavelength of the laser was 785 nm, and the laser power at the sample was 30 mW. The diameter of data collection for the Raman microscope was 50 μm. The white light image was used to focus on the sample surface before acquiring the spectra. Each spectrum was acquired using 0.5 s acquisition time and 120 co-added scans. All samples were immersed in PBS to avoid dehydration and measured in triplicate.

Histopathology

After all spectroscopic measurements, the samples were fixed and decalcified in formalin containing a 5% ethylenediaminetetraacetic acid solution. After decalcification, the samples were embedded in paraffin, cut into 3 μm thick sections, and stained with Safranin O stain. These stained histological slides were examined in random order, according to the Osteoarthritis Research Society International (OARSI) grading system,²⁶ by three independent graders who were blinded to the sample names. The OARSI system evaluates the lesion depth of the cartilage surface to assess the severity of OA; the grades range from 0 (healthy) to 6 (severe OA). The OARSI grades of the three observers were averaged to reduce errors. Table S1 (Supplemental Material) summarizes the OARSI grades for all samples according to the location in the knee, i.e., femur, tibia, and patella (Figure 1). The final grades were combined to form two classes, one with an intact cartilage matrix (OARSI grades 0–2) and another with a damaged cartilage matrix (OARSI grades 2.5–6). In total, there were 148 healthy cartilage samples (OARSI grades 0–2) and 124 damaged cartilage samples (OARSI grades 2.5–6). The two groups were therefore reasonably balanced for binary classification modeling.
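The grade averaging and class assignment described above can be sketched as follows. This is an illustrative Python example with made-up grades, not the procedure used in the study. In particular, the text does not say how an averaged grade falling between 2 and 2.5 was handled; snapping the mean to the nearest half grade is an assumption made here.

```python
import numpy as np

# Hypothetical grades: rows are samples, columns are the three graders.
grades = np.array([
    [0.0, 1.0, 0.5],
    [2.0, 2.5, 2.0],
    [3.0, 2.5, 3.0],
    [5.0, 6.0, 5.5],
])

mean_grade = grades.mean(axis=1)

# Assumption: snap the mean to the nearest 0.5 so it lies on a half-grade scale.
snapped = np.floor(mean_grade * 2 + 0.5) / 2

# 0 = intact matrix (grades 0-2), 1 = damaged matrix (grades 2.5-6).
labels = (snapped >= 2.5).astype(int)

print(snapped.tolist(), labels.tolist())  # [0.5, 2.0, 3.0, 5.5] [0, 0, 1, 1]
```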

Figure 1.

Anatomical locations of the extracted osteochondral samples.

Data Analysis

The NIR, MIR (FT-IR), and Raman data sets were first pre-processed using the Savitzky–Golay (SG) method for differentiation or smoothing, followed by selection of the spectral region of interest (SROI). The following pre-processing was applied to the data sets: (i) MIR data were pre-processed by taking the second derivative using the SG algorithm with window size 101 and a third-order polynomial, followed by an SROI of 1900–900 cm⁻¹. (ii) NIR data were smoothed using the SG algorithm with window size 17 and a third-order polynomial, followed by an SROI of 1900–1050 cm⁻¹. (iii) Raman data were pre-processed by taking the second derivative using the SG algorithm with window size 69 and a third-order polynomial, followed by an SROI of 1800–750 cm⁻¹. The difference in window size is justified by the spectral resolution and digital spacing, and the window size was optimized for each data set. Outlier detection was done using leverage and residual distance in a PCA model. One outlier was removed from the NIR data set and ten from the Raman data set (results not shown).
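As an illustration of the pre-processing chain, the sketch below applies an SG second derivative and an SROI crop to synthetic spectra using SciPy's `savgol_filter`. It is a Python stand-in for the Matlab routines used in the study. The helper name `sg_preprocess` and the random spectra are hypothetical, and only the MIR settings (window 101, third-order polynomial, 1900–900 cm⁻¹) are taken from the text.

```python
import numpy as np
from scipy.signal import savgol_filter


def sg_preprocess(spectra, wavenumbers, window, deriv, sroi, polyorder=3):
    """Apply a Savitzky-Golay filter along the spectral axis, then crop to the SROI."""
    # With the default delta=1.0 the derivative is taken per data point, not per cm^-1.
    filtered = savgol_filter(spectra, window_length=window, polyorder=polyorder,
                             deriv=deriv, axis=1)
    low, high = min(sroi), max(sroi)
    keep = (wavenumbers >= low) & (wavenumbers <= high)
    return filtered[:, keep], wavenumbers[keep]


# Synthetic stand-in for MIR spectra: 5 spectra on a 4000-600 cm^-1 axis.
rng = np.random.default_rng(0)
wn = np.linspace(4000, 600, 3400)
mir = rng.normal(size=(5, wn.size)).cumsum(axis=1)

mir_d2, wn_roi = sg_preprocess(mir, wn, window=101, deriv=2, sroi=(1900, 900))
print(mir_d2.shape, wn_roi.min(), wn_roi.max())
```

Filtering before cropping keeps the SG window from running into the edges of the retained region.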

Binary classification models were built using an individual modality, i.e., one block, and a multi-modal approach, i.e., two blocks and three blocks, to classify the normal cartilage (OARSI grades 0–2) and damaged cartilage (OARSI grades 2.5–6). Since the number of samples in each class (healthy vs. damaged) is reasonably balanced, no class balancing was applied. The performance of the models was estimated using the leave-one-joint-out cross-validation (CV) approach, which was selected to ensure that samples from the knee being predicted were never used to train the model. The following classification algorithms were used.

Partial Least Squares Discriminant Analysis (PLSDA)

PLSDA is a multivariate classification method that regresses matrix Y (response variables) onto matrix X (predictor variables),²⁷ where Y is a matrix of class labels, i.e., 0 (normal cartilage) or 1 (damaged cartilage), while X is a matrix of spectral data. The method is very powerful for highly multivariate data and allows dimensionality reduction, which is the core strength for spectral data with highly collinear variables. The method has also been shown to be powerful for the analysis of sparse data when only a handful of spectral variables are available.^28,29 To build a PLSDA model, the optimal number of components or latent variables (LVs) needs to be estimated. This optimization was done by leave-one-joint-out CV, and the number of LVs was selected to minimize the misclassification error.

Random Forest (RF)

RF is a powerful statistical method that works for classification and regression analysis.³⁰ It is an ensemble method based on decision trees. The strength of a forest-based approach lies in its random selection of samples and variables to build an individual tree. For tree building, a subset of the training set, called a bootstrap sample, is selected by random sampling with replacement. For node building in a tree, a subset of predictors or variables is randomly selected. For each node, the algorithm chooses the variable that performs the best split, i.e., the one that minimizes Gini impurity.³¹ When the model is built, each sample is predicted using all trees, and majority voting is applied to identify the class label. RF works especially well for classification problems even when the number of classes is very large.³² The method normally does not require any parameter optimization; the only parameter that was set in this study was the number of trees, which was set to 100.
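For illustration, the configuration described above maps onto scikit-learn's `RandomForestClassifier` roughly as follows. This is a sketch on synthetic data, not the implementation used in the study. Note that scikit-learn combines trees by averaging their predicted class probabilities rather than by a strict majority vote, so it is only an approximation of the voting scheme described in the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
y = np.tile([0, 1], 40)                  # 80 synthetic samples, two classes
X = rng.normal(size=(80, 30))
X[:, :3] += 2.0 * y[:, None]             # synthetic class signal in three variables

# 100 trees, bootstrap sampling of the training set, Gini impurity as split criterion.
rf = RandomForestClassifier(n_estimators=100, criterion="gini",
                            bootstrap=True, random_state=0)
rf.fit(X[:60], y[:60])
acc = rf.score(X[60:], y[60:])
print(len(rf.estimators_), acc)
```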

Support Vector Machines (SVM)

SVM is a linear classification method that can be made nonlinear using a kernel function. SVM maps the data samples belonging to two categories to a new space to maximize the width of the gap between the categories. For SVM models, linear kernels were used with the sequential minimal optimization (SMO) algorithm. The box constraint parameter b was optimized in the range (0.01, 3), and the optimal values were b_MIR= 0.7, b_NIR= 2.12, and b_Raman= 0.01 for MIR, NIR, and Raman data, respectively.

Principal Component Analysis Support Vector Machines (PCA-SVM)

Feature modeling using PCA was tested using the SVM algorithm. Feature modeling is known to improve classification and regression modeling results and is widely used in the ML community to reduce computational costs and improve modeling results.^33,34 In this case, simple PCA compression was performed, and the principal components (PCs) were used as variables in the SVM algorithm. The number of PCs (N_PCs) was optimized for each data set separately, resulting in N_PCs= 38, 36, and 16 for MIR, NIR, and Raman data, respectively. For SVM models, linear kernels were used with the SMO algorithm. Box constraint b parameter was optimized in a range (0.01, 3), and the optimal values were b_MIR= 0.04, b_NIR= 1, and b_Raman= 0.8.
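The PCA-SVM idea can be sketched as a scikit-learn pipeline in which PCA scores feed a linear SVM. This is a Python illustration on synthetic data, not the Matlab code used in the study. In scikit-learn the box constraint is the parameter `C`, and `SVC` relies on libsvm, an SMO-type solver. The values 16 PCs and `C=0.8` are borrowed from the Raman settings above purely as an example, and the group structure is hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(3)
y = np.tile([0, 1], 30)                  # 60 synthetic samples, two classes
groups = np.repeat(np.arange(6), 10)     # 6 hypothetical joints
X = rng.normal(size=(60, 200))
X[:, :20] += 2.0 * y[:, None]            # synthetic class signal

# PCA is refit inside every training fold, so held-out joints never shape the PCs.
model = make_pipeline(PCA(n_components=16), SVC(kernel="linear", C=0.8))
scores = cross_val_score(model, X, y, groups=groups, cv=LeaveOneGroupOut())
print(scores.mean())
```

Wrapping both steps in one pipeline is a design choice that guards against information leaking from the held-out fold into the PCA compression.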

Data Fusion: Two-Block Analysis

Data fusion was carried out in a two-block fashion by combining two spectroscopic data blocks at a time (MIR and NIR; MIR and Raman; and NIR and Raman) using the RF, SVM, PCA-SVM, and MB-PLSDA methods. For RF, SVM, and PCA-SVM, the blocks were fused simply by concatenating the variables of one block with those of the other.

The same linear kernels were used with the SMO algorithm in SVM and PCA-SVM. For the SVM algorithm, parameter optimization resulted in box constraint parameters b_MIR&NIR= 0.31, b_NIR&Raman= 0.02, and b_MIR&Raman= 0.04 for the fusions of MIR and NIR, NIR and Raman, and MIR and Raman, respectively. For PCA-SVM, the corresponding values were b_MIR&NIR= 1 and N_PCs= 19; b_NIR&Raman= 0.15 and N_PCs= 22; and b_MIR&Raman= 0.06 and N_PCs= 21.

Multi-Block PLSDA (MB-PLSDA)

Data fusion was carried out in a two-block fashion by combining two spectroscopic data blocks at a time (MIR and NIR; MIR and Raman; and NIR and Raman data) using the MB-PLSDA method.²⁴ The multi-block analysis allows understanding of the relations between different blocks of data and their influence on the classification performance. The model builds the global space with scores representing samples across modalities and loadings representing blocks X_MIR, X_NIR, and X_Raman, as well as local spaces of each block of data with scores and loadings corresponding to the block's samples and variables, respectively. To build the models, each X block was normalized by its Frobenius norm, i.e., the square root of the sum of the absolute squares of its elements, to compensate for the scale of the variables. This is the only weighting that is used in this case; otherwise, each block is given equal weight in the model.
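The block normalization can be illustrated with a few lines of NumPy. In this hypothetical sketch the three blocks are random matrices with deliberately different scales and sizes. After division by the Frobenius norm each block has a total sum of squares of one, so no block dominates the fused matrix merely because of its units or number of variables. Placing the scaled blocks side by side is shown only as the simplest way to arrange them and is not a full MB-PLSDA implementation.

```python
import numpy as np


def scale_block(X):
    """Divide a data block by its Frobenius norm."""
    return X / np.linalg.norm(X, "fro")


rng = np.random.default_rng(4)
# Hypothetical blocks for the same 20 samples, with very different scales.
blocks = {
    "MIR": rng.normal(scale=1e-3, size=(20, 500)),
    "NIR": rng.normal(scale=1.0, size=(20, 120)),
    "Raman": rng.normal(scale=50.0, size=(20, 300)),
}

scaled = {name: scale_block(X) for name, X in blocks.items()}
fused = np.hstack([scaled[name] for name in ("MIR", "NIR", "Raman")])

# Each scaled block now has a sum of squares equal to one.
print({name: round(float(np.sum(X ** 2)), 6) for name, X in scaled.items()})
print(fused.shape)  # (20, 920)
```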

Data Fusion: Three-Block Analysis

The fusion of three blocks was done by combining all three data blocks (MIR, NIR, and Raman) using RF, PCA-SVM, and MB-PLSDA. In the case of PCA-SVM, scaling of blocks by the Frobenius norm was done to improve the prediction performance. The same linear kernels were used with the SMO algorithm, and the N_PCs was optimized to be N_PCs= 24 with box constraint parameter b = 0.06. Otherwise, RF and MB-PLSDA were applied in the same manner as for the two-block analysis.

Data analyses were performed by standard algorithms, algorithms developed in-house, and open-source algorithms in Matlab v.R2022a (The Mathworks Inc.).

Results and Discussion

Classification modeling results using different algorithms in a single-block fashion are shown in Figure 2, where classification accuracy, sensitivity, and specificity of cross-validation (CV) are presented. As can be seen, the best results were obtained for the MIR and Raman spectral data: CV accuracy (Acc_CV) ranged between 69.7% and 77.8% (Table S2, Supplemental Material). The best models were obtained using PCA-SVM for both MIR and Raman with the same Acc_CV = 77.8%. The NIR data set achieved a classification accuracy below 70%, even for the best model. Among the classification algorithms, the best results were achieved by PCA-SVM, followed by SVM and PLSDA. RF models underperformed in all cases.

Figure 2.

Classification performance of one-block models using different ML techniques: PLSDA, RF, and SVM, with and without feature modeling using PCA.

As can be seen from Figure 2 and Table S2 (Supplemental Material), some skewness is observed for the models, which have a somewhat higher specificity and lower sensitivity. This means that the models have a low false positive rate and a higher false negative rate. The most balanced predictions are observed for the PLSDA, SVM, and PCA-SVM models built using MIR spectra. When Raman spectra are used, the skewness is slightly higher than that for the MIR spectra, while the worst predictions are observed using the NIR spectra. The skewness can be explained by the fact that the class of healthy cartilage is over-represented. From the classification results, we can conclude that SVM and PCA-SVM worked best. However, those methods required time-consuming optimization routines, while PLSDA and RF models are simple and require minimal parameter optimization. Feature modeling in SVM improves the prediction results slightly; however, since the optimization is done in a CV manner, an external test set is needed to evaluate the significance of the improvement.
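The three metrics reported in Figure 2 follow directly from the confusion matrix. The sketch below is a small Python illustration with made-up predictions. Treating the damaged class (label 1) as the positive class is an assumption based on the label coding used in this study.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity with class 1 (damaged) as positive."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    return {
        "accuracy": (tp + tn) / y_true.size,
        "sensitivity": tp / (tp + fn),   # equals 1 - false negative rate
        "specificity": tn / (tn + fp),   # equals 1 - false positive rate
    }


# Hypothetical labels: 4 damaged and 6 healthy samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
m = binary_metrics(y_true, y_pred)
print(m)
```

In this toy case the sensitivity is 0.5 and the specificity is about 0.83, i.e., damaged samples are missed more often than healthy samples are flagged.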

Data fusion was done using RF, SVM, PCA-SVM, and MB-PLSDA, combining two blocks and all three blocks together. The results of classification are presented in Figure 3 and Table S3 (Supplemental Material) for the best models only. In SVM, two-block fusion improved classification only when the two best-performing blocks, MIR and Raman, were fused. The model accuracy increased from 76.6% for SVM on Raman and 76.4% on MIR as one-block models (Figure 2) to 78.8% for SVM on MIR and Raman fused (Figure 3 and Table S3, Supplemental Material). A similar result was obtained for the PCA-SVM two-block model of MIR and Raman, Acc_CV= 79.2% (Figure 3). Fusing three blocks by PCA-SVM raised this further to Acc_CV= 81.4% (Figure 3). This is a rather minor improvement; however, it shows the potential of using different blocks of data. MB-PLSDA models also achieved better classification performance than the one-block models.

Figure 3.

Classification performance of models by two-block fusion and three-block fusion using different ML techniques: PLSDA, RF, and SVM, with and without feature modeling using PCA.

Fusing MIR and Raman spectra improved the models from 75.1% for Raman and 74.5% for MIR one-block models using PLSDA (Figure 2) to 77.7% by MB-PLSDA for MIR and Raman fused (Figure 3). Fusing NIR and Raman spectra improved the models from 75.1% for Raman and 66.7% for NIR one-block models by PLSDA (Figure 2) to 77.2% for NIR and Raman fused (Figure 3). Fusing the three blocks MIR, NIR, and Raman improved the models even further to 79.1% correct classification by MB-PLSDA (Figure 3). The number of LVs for PLSDA and MB-PLSDA models is an important measure of the complexity of the models. As given in Tables S2 and S3 (Supplemental Material), the complexity of the models is quite modest for the one-block PLSDA models, with the number of LVs = 6–7. However, it increases slightly for the MB-PLSDA models, with the number of LVs = 10. This shows that the multi-block algorithm utilized the complementary information of the fused data. To test the significance of the results, analysis of variance (ANOVA) was performed. This was done to test the improvement of the three-block model by MB-PLSDA compared to the single-block models by PLSDA. For this purpose, a five-fold CV was employed in which samples were randomly split into folds, and the accuracy of classification into healthy and damaged samples was used to compare the models. In total, 30 runs were done for each model, and the CV accuracies were collected. To make sure the comparison is valid, the model accuracies were collected for exactly the same CV folds, meaning exactly the same splitting of samples in each segment of CV when building and testing models across single-block and multi-block models. The ANOVA revealed that the three-block MB-PLSDA model is significantly better than PLSDA_MIR (p-value < 0.001), PLSDA_NIR (p-value < 0.001), and PLSDA_Raman (p-value < 0.05).
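The two ingredients of this comparison, identical fold assignments across models and an ANOVA on the collected accuracies, can be sketched as follows. This is a hypothetical Python illustration. The accuracies are synthetic numbers, not results from this study, and the use of a one-way ANOVA (`scipy.stats.f_oneway`) is an assumption, since the exact ANOVA design is not specified in the text.

```python
import numpy as np
from scipy.stats import f_oneway


def shared_folds(n_samples, n_folds, seed):
    """Random fold label per sample; the same seed gives every model the same split."""
    rng = np.random.default_rng(seed)
    return rng.permutation(np.arange(n_samples) % n_folds)


# The same seed is reused for the single-block and the multi-block model in a run.
folds_a = shared_folds(272, 5, seed=0)
folds_b = shared_folds(272, 5, seed=0)
assert np.array_equal(folds_a, folds_b)

# Synthetic per-run accuracies standing in for 30 repeated CV runs of two models.
rng = np.random.default_rng(5)
acc_single = rng.normal(0.70, 0.01, size=30)
acc_fused = rng.normal(0.80, 0.01, size=30)

f_stat, p_value = f_oneway(acc_single, acc_fused)
print(p_value < 0.001)
```

Reusing the fold assignment removes split-to-split variation from the comparison, so differences in accuracy reflect the models rather than the partitioning.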

Another important improvement in models lies in the fact that the skewness became much less pronounced even for the data that produced very skewed models in a one-block fashion like NIR data (see sensitivity and specificity, Figures 2 and 3). Fusing different blocks in RF did not improve classification (results are not presented). This could be due to the vast number of variables that represent somewhat similar information inside each modality and across modalities, making the RF model sub-optimal.

The strength and value of the data fusion approach presented in this study lie in the opportunity to learn from the models and interpret the results. However, the interpretation of the models depends on the algorithm used. It is not a straightforward task to interpret the results of the RF or SVM models, especially when feature modeling is applied in PCA-SVM models, as the PCs are used as features for the modeling and not the original variables. In contrast, MB-PLSDA models allow the discovery of the patterns in the data, the relationships between blocks of data, and their contribution to the model. Figure 4 shows the MB-PLSDA model fusing three blocks: MIR, NIR, and Raman. Figures 4a and 4b present the global space of the model, where the scores plot represents the sample pattern in the global space and super weights show the contribution of each block (MIR, NIR, and Raman) in LV1 and LV2. As can be seen, the samples of both classes are quite close to each other and indistinguishable in the first two LVs. This explains why the model is complex and requires 10 LVs to separate the classes. The most significant contributions in LV1 and LV2 are from Raman and MIR data blocks, respectively. The NIR block is not contributing to the first two LVs. Looking at the block scores, we can observe different sample patterns in the score plots among the spectral techniques MIR, NIR, and Raman. The patterns seen in the global scores are combinations of the patterns in the respective blocks along the respective LVs. Therefore, for example, the trend in the separation of the classes along LV1 in Raman is reflected in the global scores plot along the x-axis. The cluster patterns along LV2 in the MIR score plot are reflected in the global score plot along the y-axis. Thus, mostly MIR and Raman contributions are seen in the first two LVs. The NIR block contributes substantially to LV5, thus contributing to the higher-order LVs (results not shown).

Figure 4.

Three-block model using MB-PLSDA for the first two LVs: (a) Global scores plot; (b) super weights; (c–e) block scores representing the MIR, NIR, and Raman blocks, respectively. Blue circles represent healthy samples (group 0) and red squares represent damaged samples (group 1).

Figure 5 presents the regression coefficients of the three-block model for the MIR (upper row), NIR (middle row), and Raman (lower row) data. The regression coefficient for the MIR block shows that the most pronounced peaks that contribute to the differentiation between healthy and damaged cartilage are observed in the amide I (1720–1580 cm⁻¹), amide II (1580–1490 cm⁻¹), and amide III (1300–1200 cm⁻¹) regions. The MIR regression coefficient contains several peaks associated with collagen at 1458, 1317, 1238, 1200, and 1080 cm⁻¹. The results are consistent with our previously reported studies.^17,35 The regression coefficient corresponding to the NIR spectral block selects the peaks between 6667 and 7200 cm⁻¹ representing the overtone OH band due to cartilage water content, as well as several peaks in the regions 5280–6400 cm⁻¹ and 7250–8700 cm⁻¹ representing the proteoglycan content and collagen in the cartilage, respectively.³⁶ The regression coefficient of the Raman spectral block contains peaks in the amide I (1780–1600 cm⁻¹), δCH₂ (1580–1490 cm⁻¹), and amide III (1300–1200 cm⁻¹) regions. The most important peaks for the differentiation of the cartilage quality were the following: the peak at 920 cm⁻¹ and the phenylalanine ring vibrations at 1005 and 1033 cm⁻¹ representing a difference in collagen type,³⁷ peaks at 820, 867, and 1274 cm⁻¹ associated with the degraded cartilage,³⁸ and ring and C–N stretching at 1365 cm⁻¹ as well as methylene CH₂ deformation at 1459 cm⁻¹ associated with the differences in protein secondary structures in cartilage.³⁹

Figure 5.

Regression coefficients for the three-block fusion model using MB-PLSDA for the first two LVs for the MIR block, and NIR and Raman blocks, respectively. The regression coefficient for the healthy samples (group 0) is in the blue solid line, while for the damaged samples (group 1) is in the red dashed line.

A correlation loading plot is another way to interpret modeling results using PLSDA models. The plot allows us to discover correlations of different variables, such as design parameters and spectral variables. An example of the plot is presented in Figure 6. The variables between red and blue circles show the highest regression coefficients, where the correlation coefficient is close to one near the blue circle, and it is 0.5 close to the dotted red circle. The correlation is close to zero at the center of the correlation plot. Positively correlated variables are clustered close to each other, and negatively correlated variables are placed opposite of each other in the plot.
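The quantity plotted in such a figure is the correlation between each variable and each score vector. The sketch below computes it with NumPy on synthetic data. The scores here come from a plain SVD of the centered data as a stand-in for the MB-PLSDA global scores, and the function name `correlation_loadings` is hypothetical.

```python
import numpy as np


def correlation_loadings(X, scores):
    """Pearson correlation of every column of X with every score vector."""
    Xc = X - X.mean(axis=0)
    Tc = scores - scores.mean(axis=0)
    cov = Xc.T @ Tc
    norm = np.outer(np.linalg.norm(Xc, axis=0), np.linalg.norm(Tc, axis=0))
    return cov / norm


rng = np.random.default_rng(6)
X = rng.normal(size=(40, 8))
X[:, 0] *= 3                                        # make variable 0 dominant
X[:, 1] = -X[:, 0] + 0.05 * rng.normal(size=40)     # variable 1 mirrors variable 0

# Stand-in scores: first two principal component scores of the centered data.
Xc = X - X.mean(axis=0)
U, s, _ = np.linalg.svd(Xc, full_matrices=False)
scores = U[:, :2] * s[:2]

R = correlation_loadings(X, scores)
radius = np.sqrt((R ** 2).sum(axis=1))   # distance from the plot origin, at most 1
print(R.shape, bool(radius.max() <= 1 + 1e-9))
```

In this toy example the two mirrored variables land on opposite sides of the origin, which is the pattern described above for negatively correlated variables.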

Figure 6.

Correlation loading plot for the three-block fusion model using MB-PLSDA for the first two LVs. The spectral bands for the MIR spectra are plotted in black color starting with a letter M, NIR bands in red starting with an N, and Raman bands in blue starting with an R, respectively.

Figure 6 shows the correlation loading plot for the three-block model (MIR, NIR, and Raman) MB-PLSDA. Only the highest peaks were selected to represent the correlations and only the top of the peaks are provided. The MIR bands are highlighted in black, and provide the important variables representing the amide I (1700, 1667, 1623 cm⁻¹), II (1515, 1479 cm⁻¹), and III (1198, 1242 cm⁻¹) regions. NIR spectral variables, represented in red, represent the influence of OH overtone (6689, 6990, 7151, and 7184 cm⁻¹) and proteoglycan content (5355, 5687, 5708, 5728, and 5750 cm⁻¹). The Raman spectral peaks of the PLSDA three-block model are represented in blue. We observed the influence of the peaks from δCH₂ around 1501 cm⁻¹ region, amide III around 1219 cm⁻¹ region, as well as a peak at 868 cm⁻¹ associated with the degraded cartilage, 920 cm⁻¹ representing difference in collagen type, and C–N stretching at 1366 cm⁻¹.¹⁴ The design parameter related to the cartilage quality is represented in green by 0 (healthy) and 1 (damaged). As can be seen, the difference between healthy and damaged cartilage is represented mostly by the first LV (along the x-axis), while the second LV does not contribute to the differentiation since the points are close to 0 (along the y-axis). The highly correlated spectral variables associated with the damaged cartilage are the peaks 1242 cm⁻¹ from MIR and 868 and 920 cm⁻¹ from Raman spectra. This is in accordance with the studies that identified peaks related to collagen quality and degraded cartilage.^35,38 The peaks associated with the healthy cartilage are the peaks from MIR spectra at 1198, 1515, and 1700 cm⁻¹ associated with the collagen, amide II and I bands, respectively, and Raman spectra represented by δCH₂ around 1501 cm⁻¹, amide III at 1219 cm⁻¹, ring and C–N stretching at 1366 cm⁻¹. The NIR peaks, all highly correlated to each other, do not show any correlation to the cartilage groups.

Thus, we have demonstrated that the MB-PLSDA models offer an advantage over the other methods in terms of interpretability. The modeling approach allows the results to be interpreted and the data blocks to be compared, in terms of their similarities and differences, using scores and loadings (weights) plots of the global space of the model as well as of the local space of each block. The super weights plot showed that the MIR and Raman blocks were more important than the NIR block for discriminating cartilage quality. The regression coefficients of the model allow interpretation of the chemical variation underlying the difference between healthy and damaged cartilage, while correlation loading plots reveal patterns in the spectral data across modalities and their relation to the experimental design parameters. Important peaks were identified in all modalities and were shown to be related to collagen type and quality, cartilage degradation, and protein structure. This indicates that the model is meaningful and supports the validity of the results.
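As an illustration of how block importance can be compared in a fused model, here is a minimal sketch under stated assumptions; it is not the MB-PLSDA procedure used in this study. It assumes that each block is mean-centered and scaled to unit Frobenius norm before concatenation, and it uses the norm of each block's segment of the first PLS weight vector as a simple proxy for a block-level (super) weight. The function names and the toy data are hypothetical.

```python
import math


def scale_block(block):
    """Mean-center a block and scale it to unit Frobenius norm."""
    n, p = len(block), len(block[0])
    means = [sum(row[j] for row in block) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in block]
    fro = math.sqrt(sum(v * v for row in centered for v in row))
    return [[v / fro for v in row] for row in centered]


def fuse_blocks(blocks):
    """Concatenate scaled blocks; return the fused matrix and column ranges."""
    scaled = {name: scale_block(block) for name, block in blocks.items()}
    n = len(next(iter(scaled.values())))
    fused = [[v for name in blocks for v in scaled[name][i]] for i in range(n)]
    ranges, start = {}, 0
    for name, block in blocks.items():
        width = len(block[0])
        ranges[name] = (start, start + width)
        start += width
    return fused, ranges


def block_importance(fused, ranges, y):
    """Norm of each block's segment of the first (unit-length) PLS weight."""
    n, p = len(fused), len(fused[0])
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    w = [sum(fused[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    return {
        name: math.sqrt(sum(v * v for v in w[a:b]))
        for name, (a, b) in ranges.items()
    }


# Made-up toy blocks: 3 healthy (0) and 3 damaged (1) samples.
# The "MIR" and "Raman" columns separate the classes; the "NIR" columns do not.
blocks = {
    "MIR": [[1.0, 5.0], [1.2, 4.8], [0.9, 5.1],
            [2.0, 4.0], [2.1, 3.9], [1.9, 4.2]],
    "NIR": [[2.0, 1.0, 3.0], [3.0, 1.5, 2.0], [2.5, 0.8, 2.6],
            [2.4, 1.1, 2.9], [2.9, 1.4, 2.2], [2.2, 0.9, 2.5]],
    "Raman": [[0.5, 3.0], [0.6, 3.2], [0.4, 2.9],
              [1.5, 2.0], [1.4, 2.1], [1.6, 1.9]],
}
y = [0, 0, 0, 1, 1, 1]

fused, ranges = fuse_blocks(blocks)
importance = block_importance(fused, ranges, y)
```

Scaling each block to the same total variance keeps a block with many variables from dominating the fused model purely because of its size, so the per-block norms can be compared directly. In this toy case the "NIR" block receives a much smaller value than the other two blocks because its columns carry no class information.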

Certain points are worth discussing. The methods and models presented here are simplified versions of the real world. The spectroscopic data were collected under highly controlled conditions, which are hard to achieve in a clinical setting. Developing single-beam methods for spectroscopic analysis poses significant challenges, particularly in overcoming probe-to-probe variation. Variations due to probes and operators can affect measurements, and any model developed for practical application must be validated with these factors in mind. Integrating multiple spectroscopic methods into a single probe is an area of ongoing research.⁴⁰

We acknowledge that hydration levels, the presence of living cells (chondrocytes), extracellular matrix, and body fluids can complicate in vivo spectroscopic analysis. During arthroscopy, the knee joint is typically filled with saline, which helps mitigate issues related to synovial fluid and hydration. While living cells and extracellular matrix can influence spectra, chondrocytes constitute less than 5–10% of cartilage tissue. The state of the chondrocytes and the extracellular matrix reflects cartilage health, and if spectroscopy could assess this without histology, it would be highly beneficial for clinicians performing cartilage repair surgery.

The performance of a machine learning (ML) model is limited by the error of the reference method used to label the samples. The reference histological method used here, the OARSI grading system, is commonly used for evaluating cartilage health. Another method used to grade cartilage samples is the Mankin score, developed in 1971. The Mankin method assesses cartilage structure, cellularity, Safranin O staining, and tidemark integrity, with scores ranging from 0 (normal) to 14 (severe OA). However, it has limitations, especially for mild and moderate OA, and its reproducibility and validity have been questioned.⁴¹ The OARSI system, which we used here, is more sensitive to mild OA grades and can be applied consistently by less experienced observers, providing a reliable measure of disease progression.

The methods presented in this study could be adapted but need to be further developed to provide the tools for real-world clinical arthroscopic procedures. We believe the results presented in this study have significance in the field of cartilage diagnostics, and the modeling approaches can provide a valuable tool for the analysis of multimodal spectroscopic data.

Conclusion

This study shows the potential value of data fusion for evaluating cartilage health and diagnosing cartilage degeneration. Cartilage quality was assessed by discriminating between healthy and damaged tissue according to the OARSI grading system. The three vibrational spectroscopy techniques, MIR, NIR, and Raman, were shown to be complementary: data fusion with the MB-PLSDA, SVM, and PCA-SVM algorithms improved the modeling results compared with single-block analysis. Feature modeling improved the results even further. Only for the RF algorithm did data fusion fail to improve on the single-block models. The study also demonstrated the advantage of MB-PLSDA models over the other methods: they provide several tools to interpret and explore the results, a property that is very valuable for the surgeon who needs to understand what underlies the model decisions.

Supplemental Material

Supplemental material, sj-docx-1-asp-10.1177_00037028241285583 for Machine Learning Approaches for the Fusion of Near-Infrared, Mid-Infrared, and Raman Data to Identify Cartilage Degradation in Human Osteochondral Plugs by Valeria Tafintseva, Ervin Nippolainen, Vesa Virtanen, Johanne Heitmann Solheim, Boris Zimmermann, Simo Saarakkala, Heikki Kröger, Achim Kohler, Juha Töyräs, Isaac O Afara and Rubina Shaikh in Applied Spectroscopy

Footnotes

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The study was financially supported by the MIRACLE project-Horizon 2020 research and innovation programme-H2020-ICT-2017-1 (grant agreement 780598), Academy of Finland (projects 315820, 310466), Finnish Cultural Foundation (65211977 North Savo Regional Fund), and Kuopio University Hospital (VTR project 5203111 and 5203127).

ORCID iD

Valeria Tafintseva

Supplemental Material

All supplemental material mentioned in the text is available in the online version of the journal.

References

1. Hunter D.J., Bierma-Zeinstra. “Osteoarthritis”. Lancet. 2019. 393(10182): 1745–1759.

2. Hiligsmann, Reginster J.Y. “The Economic Weight of Osteoarthritis in Europe”. Medicographia. 2013. 35(1): 197–202.

3. Kiadaliri A.A., Lohmander L.S., Moradi-Lakeh, Petersson I.F., Englund. “High and Rising Burden of Hip and Knee Osteoarthritis in the Nordic Region: Findings from the Global Burden of Disease Study 2015”. Acta Orthopaedica. 2018. 89(2): 177–183.

4. Bennell K.L., Hinman R.S. “A Review of the Clinical Evidence for Exercise in Osteoarthritis of the Hip and Knee”. J. Sci. Med. Sport. 2011. 14(1): 4–9.

5. Brittberg, Winalski C.S. “Evaluation of Cartilage Injuries and Repair”. J. Bone Jt. Surg. 2003. 85(Suppl. 2): 58–69.

6. Brismar B.H., Wredmark, Movin, Leandersson, Svensson. “Observer Reliability in the Arthroscopic Classification of Osteoarthritis of the Knee”. J. Bone Jt. Surg. Brit. 2002. 84(1): 42–47.

7. Spahn, Klinger H.M., Baums, Pinkepank, Hofmann G.O. “Reliability in Arthroscopic Grading of Cartilage Lesions: Results of a Prospective Blinded Study for Evaluation of Inter-Observer Reliability”. Arch. Orthop. Unfall.-Chir. 2011. 131(3): 377–381.

8. Prakash, Joukainen, Torniainen, Honkanen M.K.M., et al. “Near-Infrared Spectroscopy Enables Quantitative Evaluation of Human Cartilage Biomechanical Properties During Arthroscopy”. Osteoarthr. Cartil. 2019. 27(8): 1235–1243.

9. Sarin J.K., Nykänen, Tiitu, Mancini I.A.D., et al. “Arthroscopic Determination of Cartilage Proteoglycan Content and Collagen Network Structure with Near-Infrared Spectroscopy”. Ann. Biomed. Eng. 2019. 47(8): 1815–1826.

10. Sarin J.K., Te Moller N.C.R., Mancini I.A.D., Brommer, et al. “Arthroscopic Near Infrared Spectroscopy Enables Simultaneous Quantitative Evaluation of Articular Cartilage and Subchondral Bone In Vivo”. Sci. Rep. 2018. 8(1): 13409–13409.

11. Sarin J.K., Torniainen, Prakash, Rieppo, et al. “Dataset on Equine Cartilage Near Infrared Spectra, Composition, and Functional Properties”. Sci. Data. 2019. 6(1): 164–164.

12. Hanifi, Yang, Kavukcuoglu, et al. “Infrared Fiber Optic Probe Evaluation of Degenerative Cartilage Correlates to Histological Grading”. Am. J. Sports Med. 2012. 40(12): 2853–2861.

13. Esmonde-White K.A., Esmonde-White F.W.L., Morris M.D., Roessler B.J. “Fiber-Optic Raman Spectroscopy of Joint Tissues”. Analyst. 2011. 136(8): 1675–1685.

14. Rieppo, Töyräs, Saarakkala. “Vibrational Spectroscopy of Articular Cartilage”. Appl. Spectrosc. Rev. 2017. 52(3): 249–266.

15. Tuck, Blanc, Touti, Patterson N.H., et al. “Multimodal Imaging Based on Vibrational Spectroscopies and Mass Spectrometry Imaging Applied to Biological Tissue: A Multiscale and Multiomics Review”. Anal. Chem. 2021. 93(1): 445–477.

16. Summers K.L., Fimognari, Hollings, Kiernan, et al. “A Multimodal Spectroscopic Imaging Method to Characterize the Metal and Macromolecular Content of Proteinaceous Aggregates (‘Amyloid Plaques’)”. Biochemistry. 2017. 56(32): 4107–4116.

17. Shaikh, Tafintseva, Nippolainen, Virtanen, et al. “Characterisation of Cartilage Damage via Fusing Mid-Infrared, Near-Infrared, and Raman Spectroscopic Data”. J. Pers. Med. 2023. 13(7): 1036.

18. Padalkar M.V., Pleshko. “Wavelength-Dependent Penetration Depth of Near Infrared Radiation into Cartilage”. Analyst. 2015. 140(7): 2093–2100.

19. Gowen A.A., Dorrepaal M.R. “Multivariate Chemical Image Fusion of Vibrational Spectroscopic Imaging Modalities”. Molecules. 2016. 21(7): 870.

20. Magnussen E.A., Zimmermann, Blazhko, Dzurendova, et al. “Deep Learning-Enabled Inference of 3D Molecular Absorption Distribution of Biological Cells from IR Spectra”. Commun. Chem. 2022. 5(1): 175.

21. Zimmerman, Tafintseva, Bagcıoglu, Høegh Berdahl, Kohler. “Analysis of Allergenic Pollen by FT-IR Microspectroscopy”. Anal. Chem. 2016. 88(1): 803–811.

22. Wold, Hellberg, Lundstedt, Sjostrom, Wold. “IPLS Model Building: Theory and Applications”. In: Proceedings of the Symposium on PLS Model Building: Theory and Application. Frankfurt am Main; 23–25 September 1987.

23. Diehn, Zimmermann, Tafintseva, Seifert, et al. “Combining Chemical Information From Grass Pollen in Multimodal Characterization”. Front. Plant Sci. 2020. 10: 1788. https://doi.org/10.3389/fpls.2019.01788

24. Westerhuis J.A., Kourti, MacGregor J.F. “Analysis of Multiblock and Hierarchical PCA and PLS Models”. J. Chemom. 1998. 12(5): 301–321.

25. Curtasu M.V., Tafintseva, Bendiks Z.A., Marco M.L., et al. “Obesity-Related Metabolome and Gut Microbiota Profiles of Juvenile Gottingen Minipigs-Long-Term Intake of Fructose and Resistant Starch”. Metabolites. 2020. 10(11): 456.

26. Pritzker K.P.H., Gay, Jimenez S.A., Ostergaard, et al. “Osteoarthritis Cartilage Histopathology: Grading and Staging”. Osteoarthr. Cartil. 2006. 14(1): 13–29.

27. Martens. Multivariate Analysis of Quality: An Introduction. Chichester, UK: John Wiley and Sons, 2001.

28. Tafintseva, Lintvedt T.A., Solheim J.H., Zimmermann, et al. “Preprocessing Strategies for Sparse Infrared Spectroscopy: A Case Study on Cartilage Diagnostics”. Molecules. 2022. 27(3): 873.

29. Aledda, Kohler, Zimmermann, Patel, et al. “Sparse Wavelengths Data in Mid-Infrared Spectroscopy: Modelling Approaches and Channel Sampling”. J. Biophotonics. 2023. 16(10): e202300049.

30. Breiman. “Random Forests”. Mach. Learning. 2001. 45(1): 5–32.

31. Duda R.O., Hart P.E., Stork D.G. Pattern Classification. Hoboken, NJ: John Wiley and Sons, 2012.

32. Tafintseva, Vigneau, Shapaval, Cariou, et al. “Hierarchical Classification of Microorganisms Based on High-Dimensional Phenotypic Data”. J. Biophotonics. 2018. 11(3): e201700047.

33. Müller, Schuhmacher, Schörner, Großerueschkamp, et al. “Dimensionality Reduction for Deep Learning in Infrared Microscopy: A Comparative Computational Survey”. Analyst. 2023. 148(20): 5022–5032.

34. Zebari, Abdulazeez, Zeebaree, Zebari, Saeed. “A Comprehensive Review of Dimensionality Reduction Techniques for Feature Selection and Feature Extraction”. J. Appl. Sci. Technol. Trends. 2020. 1(2): 56–70.

35. Virtanen, Nippolainen, Shaikh, Afara I.O., et al. “Infrared Fiber-Optic Spectroscopy Detects Bovine Articular Cartilage Degeneration”. Cartilage. 2021. 13(Suppl. 2): 285S–294S.

36. Afara I.O., Oloyede. “Resolving the Near-Infrared Spectrum of Articular Cartilage”. Cartilage. 2021. 13(Suppl. 1): 729S–737S.

37. Martinez M.G., Bullock A.J., MacNeil, Rehman I.U. “Characterisation of Structural Changes in Collagen with Raman Spectroscopy”. Appl. Spectrosc. Rev. 2019. 54(6): 509–542.

38. Abdulwahab. “Introductory Chapter: Cartilage Disorders”. In: Almqvist, El Hamaky A.E., Abdulwahab, editors. Cartilage Disorders: Recent Findings and Treatment. London: IntechOpen, 2023.

39. Pezzotti, Zhu, Terai, Marin, et al. “Raman Spectroscopic Insight Into Osteoarthritic Cartilage Regeneration by mRNA Therapeutics Encoding Cartilage-Anabolic Transcription Factor Runx1”. Mater. Today Bio. 2022. 13: 100210.

40. Olson N.E., Xiao, Lei, Ault A.P. “Simultaneous Optical Photothermal Infrared (O-PTIR) and Raman Spectroscopy of Submicrometer Atmospheric Particles”. Anal. Chem. 2020. 92(14): 9932–9939.

41. Pauli, Whiteside, Heras F.L., Nesic, et al. “Comparison of Cartilage Histopathology Assessment Systems on Human Knee Joints at All Stages of Osteoarthritis Development”. Osteoarthr. Cartil. 2012. 20(6): 476–485.
