Abstract
Intervertebral disc diseases are a leading cause of chronic low back pain and disability worldwide. Conventional imaging techniques such as X-ray, CT, and MRI are limited in diagnostic accuracy and efficiency. This review examines recent advances in artificial intelligence (AI)-integrated medical imaging for diagnosing intervertebral disc disorders. We first assess the current roles and limitations of these conventional modalities, then outline the technical foundations of machine learning (ML) and deep learning (DL) in this field. The review also surveys the current state of AI applications in spinal imaging, detailing specific implementations of AI combined with X-ray, CT, and MRI. Both common multi-modal approaches and distinctive single-modal applications are examined. Additionally, the review addresses current challenges in AI technology, including constrained sample size and quality, as well as limitations in model performance. It concludes by outlining promising future directions, including multi-modal data fusion and the development of end-to-end diagnostic workflows, which support the translation of efficient, standardized AI tools into clinical practice.
Keywords
1. Introduction
Spine imaging serves a wide range of purposes in clinical practice, aiming to provide a complete evaluation of spinal anatomy, including the vertebral bodies, spinal canal, neural structures, and intervertebral discs. Among these structures, pathological conditions of the intervertebral disc (such as disc herniation, degeneration, and spinal stenosis) are major contributors to chronic low back pain and neurological impairment worldwide, with an estimated annual incidence ranging from 5 to 20 cases per 1000 individuals. 1
In clinical practice, precise characterization of the morphology, function, and pathology of intervertebral discs is critical for formulating both conservative and surgical treatment strategies. Current clinical workflows are often challenged by the oversight of early or subtle lesions and the misdiagnosis of complex cases, yet the ability to diagnose these conditions rapidly and accurately directly affects patient prognosis. Conventional diagnosis relies heavily on imaging, requiring clinicians to perform tasks ranging from lesion detection to classification and grading across diverse imaging platforms, and each modality has distinct clinical utilities (Figure 1) and inherent limitations. MRI, owing to its superior soft tissue contrast, serves as the preferred modality for evaluating disc morphology, herniation type, and grading severity. However, the diagnostic workflow is cumbersome: radiologists must first detect pathological segments on the images, visually delineate the discs to assess the extent of involvement, and then perform subjective classification and grading based on established standards. This process is not only time-consuming but also susceptible to inter-observer variability. 2
CT provides excellent osseous detail. Although it can reveal disc herniations, particularly calcified herniations, its diagnostic utility similarly depends on a physician’s ability to detect abnormalities on reconstructed images and classify the herniation type. Furthermore, its insufficient soft tissue contrast may hinder the identification of subtle pathologies. X-ray, as the most accessible imaging modality, is primarily employed for initial assessment of spinal alignment and osseous structures. For disc-related diagnosis, radiologists must identify indirect signs of degeneration, such as disc space narrowing or spondylolisthesis. However, the inability of X-ray to directly visualize soft tissues such as intervertebral discs results in a high rate of missed diagnoses for early-stage disc pathologies. 3
Notably, accurate segmentation can substantially facilitate precise quantitative characterization of pathological changes, thereby supporting more reliable subsequent diagnosis and quantitative analysis. 4 However, conventional imaging workflows for disc disorders rarely incorporate such segmentation tasks; instead, clinicians predominantly rely on subjective visual assessment to classify and grade lesions according to established criteria. Even when performed for specific research purposes, manual segmentation is labor-intensive and exhibits poor reproducibility, 5 leading to inefficient workflows and compromised reliability. Collectively, the inherent technical limitations of conventional imaging modalities, coupled with significant deficiencies in efficiency, accuracy, and inter-observer agreement associated with manual assessment, highlight the critical need for improved diagnostic approaches.
Figure 1. Imaging evaluation and diagnostic flowchart for intervertebral disc lesions using X-ray, CT, and MRI.
Artificial intelligence (AI), the field of research that aims to build intelligent machines, 6 has emerged as a transformative force in medical imaging (Figure 2). Combined with traditional imaging techniques, it offers a new direction for addressing the abovementioned limitations by optimizing the image post-processing workflow and making fuller use of existing data (Figure 3). For instance, segmentation models, such as the U-Net architecture used in the deep learning framework Spine Explorer, 7 can automatically delineate vertebral and disc contours in MRI scans within seconds, providing precise morphometric data far more rapidly than manual segmentation. Classification models, employing architectures like ResNet, can be trained to categorize disc pathologies (e.g., distinguishing normal, bulging, and herniated discs) in MRI images with an accuracy exceeding 90%, 8 thereby reducing subjective bias. Furthermore, AI can extract diagnostic information from modalities with inherent limitations: DL-based detection models can identify signs of disc degeneration or spondylolisthesis on X-ray images with accuracy surpassing that of human assessment, lowering the rate of missed diagnoses. 9
Figure 2. The transformative role of AI in medical imaging. This schematic illustrates the integration of AI, encompassing ML and DL paradigms, into the diagnostic workflow for intervertebral disc diseases. It highlights how AI technologies facilitate automated, quantitative analysis from image acquisition to diagnostic reporting, enhancing accuracy and efficiency beyond conventional methods.
Figure 3. Comparison of accuracy between traditional methods and AI models.

This article reviews the recent advances in artificial intelligence technologies for diagnosing disc diseases across different imaging modalities, with a focus on how AI models can assist in performing indispensable diagnostic tasks within clinical workflows—specifically detection, segmentation, classification, and grading. It aims to provide a systematic overview of how AI addresses existing bottlenecks, thereby laying the groundwork for the development of more efficient, standardized, and accessible diagnostic tools in spinal imaging.
2. Current status of imaging diagnosis for intervertebral disc diseases
2.1 Applications and limitations of conventional X-ray and CT in disc diagnosis
Conventional X-ray imaging is widely employed as an initial diagnostic modality for intervertebral disc diseases. Its primary value lies in assessing spinal alignment (e.g., scoliosis, spondylolisthesis) and osseous changes: it can reveal imaging manifestations such as intervertebral space narrowing and spondylolisthesis, which may indicate intervertebral disc degeneration. 10 Owing to its operational convenience and low cost, X-ray is frequently used as a preliminary screening tool in outpatient clinics.
In traditional diagnostics, physicians must rely on their own expertise to correctly identify spinal segments and radiographic projections, manually annotate key anatomical landmarks, and perform a series of geometric measurements to obtain quantitative assessment data. 11 They also need to search for and recognize various imaging features of degenerative changes on specific views. Ultimately, a clinical diagnosis is made by integrating all measured data and observed characteristics. Nevertheless, X-ray has considerable limitations: its poor soft-tissue resolution prevents direct visualization of disc herniation or neural compression, resulting in low detection rates for early degenerative changes and subtle lesions (e.g., non-calcified herniations), with reported sensitivity as low as 51%–61% in diagnosing lumbar pathologies. 12 Meanwhile, the manual diagnostic process is time-consuming and relatively inaccurate: one study reported that traditional manual diagnosis based on X-rays achieved an accuracy of only 68.3% for cervical spondylosis. 9 Additionally, the use of ionizing radiation limits its suitability for repeated examinations.
CT provides high spatial resolution of bony anatomy, allowing detailed evaluation of vertebral osteophytes, pedicle morphology, and spinal canal stenosis. Coupled with 3D reconstruction techniques, CT enables precise visualization of vertebral structures and pedicles, offering critical guidance for surgical interventions, including the planning of percutaneous endoscopic approaches. 13
Traditional CT diagnosis involves two primary scanning protocols: direct axial scanning, and helical scanning of the spine followed by manual post-processing reconstruction. The former requires stepwise adjustment of the gantry tilt to align the scanning plane with the target intervertebral disc. However, in conditions such as scoliosis it sometimes fails to adequately visualize the disc and its adjacent structures, carrying a risk of misdiagnosis. The latter protocol, while eliminating the need for disc-by-disc angle adjustment, requires manual, slice-by-slice reconstruction of standardized axial disc images, which are then subjectively analyzed by radiologists. The entire workflow is cumbersome, time-consuming, and labor-intensive. Moreover, even on these reconstructed images, the accuracy of manual detection of disc herniation (Figure 4) and its subsequent classification reaches only 77.16%. 4 Meanwhile, the soft-tissue contrast of CT remains insufficient to discriminate between the nucleus pulposus and annulus fibrosus within the disc. Furthermore, CT entails substantially higher radiation exposure than X-ray, raising concerns about cumulative dose risks with repeated imaging. 10
Figure 4. Intervertebral disc pathology on X-ray and CT. Left: X-ray demonstrating intervertebral space narrowing. Right: CT image showing a disc herniation.
In summary, while both X-ray and CT provide valuable information regarding osseous abnormalities, neither modality adequately captures the full spectrum of pathophysiological changes in disc degeneration. Moreover, the manual diagnostic process is labor-intensive, time-consuming, and potentially prone to significant variability. This highlights the need for more advanced assistive technologies to help overcome these considerable limitations.
2.2 Applications and limitations of MRI in disc diagnosis
MRI is widely regarded as the “gold standard” for diagnosing intervertebral disc diseases, owing to its superior soft tissue resolution. Through multi-sequence protocols such as T1-weighted and T2-weighted imaging, MRI clearly delineates disc anatomy (including the nucleus pulposus and annulus fibrosus), hydration status, and adjacent soft tissue structures (e.g., nerve roots and thecal sac).10,14 It effectively captures microstructural changes associated with disc degeneration, including alterations in signal intensity of the nucleus pulposus, integrity of the annulus fibrosus, and the presence of nerve root compression. 15 Furthermore, MRI can accurately depict the compression of soft tissues in cases of disc herniation, thereby aiding diagnosis (Figure 5). The diagnostic workflow for radiologists interpreting MRI typically involves: detecting and localizing the affected intervertebral disc; differentiating and grading disc degeneration (using systems such as the Pfirrmann grading scale 16); classifying the type of herniation (e.g., protrusion, extrusion, sequestration) and its location (e.g., central, foraminal); and assessing the severity of secondary complications, such as central canal stenosis and nerve root compression. Additionally, as a non-ionizing modality, MRI is well suited for longitudinal monitoring and safe for use in sensitive populations, including pregnant women.
Figure 5. MRI findings of intervertebral disc pathology. The left image shows a disc with a high-intensity zone (HIZ) sign at the L4-L5 level, and the right image shows the MRI of a patient with disc herniation.
Despite these advantages, MRI has several limitations. Although exceptional in soft tissue contrast, it underperforms compared to CT in visualizing calcified tissues, often necessitating complementary CT imaging for cases involving ossified or calcified disc herniations. 17 Beyond this, conventional MRI biomarkers such as the high-intensity zone (HIZ) exhibit limited sensitivity—only 62.8% in detecting annular fissures—which may contribute to underdiagnosis of early degenerative changes. 18 Other constraints include relatively long acquisition times, contraindications in patients with certain metallic implants, and a dependence on radiologist expertise that introduces potential subjectivity and variability in the interpretation of complex or atypical cases.
More importantly, the traditional manual interpretation process itself faces significant bottlenecks in both efficiency and accuracy. First, when confronted with large volumes of images, manual measurement, localization, and classification by physicians are not only time-consuming and labor-intensive but also lack standardized, quantitative evaluation criteria. Diagnoses may be inconsistent among physicians, with studies indicating that such variability can reach up to 15%, 19 primarily stemming from subjective differences in assessing imaging features. Second, the accuracy of manual diagnosis is highly dependent on the type of lesion and the imaging modality, and it is generally lower than that of AI-assisted systems. For example, in identifying calcified lumbar disc herniation on lateral lumbar MRI, manual recognition achieved an accuracy of only 70.87%, substantially lower than the 91.67% achieved by an AI model. 20
To summarize, while MRI offers unparalleled capabilities in soft tissue evaluation and avoids ionizing radiation, its inherent limitations, coupled with the inefficiencies, inaccuracies, and subjectivity of human evaluation, highlight the need for advanced complementary approaches—including AI-enhanced analysis—to improve diagnostic precision, efficiency, and consistency in the evaluation of intervertebral disc diseases.
3. Foundation and development of AI in intervertebral disc imaging diagnosis
3.1 Technical foundation
AI refers to a branch of computer science dedicated to developing systems capable of performing tasks that traditionally require human intelligence. 21 Broadly speaking, AI methodologies can be classified into three categories: 22 symbolic approaches (e.g., rule-based expert systems), Bayesian methods (relying on probabilistic reasoning), and connectionist models (based on neural networks). In terms of technical subfields, AI encompasses domains such as natural language processing (which focuses on enabling computers to comprehend and generate human language), computer vision (concerned with interpreting images and videos, including tasks like object detection and facial recognition), ML, and DL. Among these, ML, a central subset of AI, entails the design of algorithms that improve automatically through experience. 23 These systems learn mappings between inputs and outputs directly from data without being explicitly programmed, enabling applications in prediction, recognition, and decision-making.
Based on the learning paradigm, ML can be divided into four categories: 24 supervised learning uses labeled datasets to learn input-output relationships, supporting tasks such as classification and regression (e.g., with artificial neural networks); unsupervised learning identifies hidden patterns or groupings within unlabeled data (e.g., clustering) and is suited to exploratory data analysis; semi-supervised learning combines labeled and unlabeled samples, often blending techniques from the two preceding categories; and reinforcement learning involves an agent learning optimal behaviors through environmental feedback and reward maximization.
In the field of medical imaging, ML enables quantitative analysis of image features, such as texture and morphology, with radiomics playing a significant role. Radiomics transforms conventional image data into minable high-dimensional data by high-throughput extraction of a large number of quantitative features from medical images, thereby revealing subtle characteristics that are difficult to recognize with the human eye and providing more precise information for clinical practice. Studies have demonstrated that ML-based radiomics models applied in conventional imaging can effectively differentiate between primary and secondary lesions, outperforming traditional radiological evaluation. 25
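To make the idea of quantitative feature extraction concrete, the following is a minimal sketch of a few first-order (intensity-based) radiomic features computed inside a region of interest. The function name `first_order_features`, the choice of features, and the 32-bin histogram are illustrative assumptions, not the feature set of any study cited here; real radiomics pipelines typically extract hundreds of shape, texture, and filter-based features.

```python
import numpy as np

def first_order_features(image, mask, bins=32):
    """Illustrative first-order intensity features inside a region of interest."""
    roi = image[mask > 0].astype(float)
    if roi.size == 0:
        raise ValueError("mask selects no pixels")
    mean = roi.mean()
    std = roi.std()
    # Skewness: third standardized moment (defined as 0 for a flat region).
    skew = 0.0 if std == 0 else float(np.mean(((roi - mean) / std) ** 3))
    # Shannon entropy (in bits) of the intensity histogram.
    hist, _ = np.histogram(roi, bins=bins)
    p = hist[hist > 0] / roi.size
    entropy = float(-np.sum(p * np.log2(p)))
    return {"mean": float(mean), "std": float(std),
            "skewness": skew, "entropy": entropy}
```

Such feature vectors, one per segmented disc, would then serve as inputs to a conventional classifier.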
DL, a rapidly advancing subfield of ML, employs deep neural networks with multiple hidden layers to model complex hierarchical representations. 26 Commonly used architectures include convolutional neural networks (CNNs), 27 recurrent neural networks (RNNs), generative adversarial networks (GANs), and, more recently, Transformers based on self-attention mechanisms. CNNs, among the most widely used architectures, typically consist of convolutional layers (for local feature extraction), pooling layers (for dimensionality reduction), nonlinear activation functions (which enable the network to learn and model complex nonlinear relationships), and fully connected layers (which aggregate the extracted features to perform the final classification or regression task). 28 Compared to traditional ML, DL excels at processing large-scale unstructured data such as images and text, autonomously learning discriminative features without manual feature engineering. In medical imaging applications, DL has demonstrated remarkable performance in analyzing complex image datasets, capturing subtle diagnostic cues beyond human perception.29,30 However, these models generally demand greater computational resources and are often considered “black boxes” due to their limited interpretability. 31
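The four CNN building blocks named above can be illustrated with a toy forward pass. This is a minimal NumPy sketch with random, untrained weights; the 8×8 input, the single 3×3 kernel, and the three output classes are arbitrary assumptions chosen only to show how the layers connect, and it is not a model from any cited study.

```python
import numpy as np

def conv2d(x, kernel):
    """Convolutional layer: valid 2-D cross-correlation for local features."""
    kh, kw = kernel.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Nonlinear activation function."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Pooling layer: keep the maximum of each size-by-size block."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def tiny_cnn(image, kernel, weights, bias):
    features = max_pool(relu(conv2d(image, kernel)))
    # Fully connected layer: aggregate features into class scores.
    return softmax(weights @ features.ravel() + bias)

rng = np.random.default_rng(0)
image = rng.random((8, 8))              # stand-in for an image patch
kernel = rng.standard_normal((3, 3))    # untrained filter
weights = rng.standard_normal((3, 9))   # 3 hypothetical classes, 3x3 pooled map
probs = tiny_cnn(image, kernel, weights, np.zeros(3))
```

In a trained network the kernel and weights would be learned from labeled images, and many filters and layers would be stacked.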
3.2 Current state of AI technology in spinal imaging
The integration of AI into spinal imaging has driven a shift from traditional qualitative morphological observation toward an automated, quantitative diagnostic framework. By leveraging ML and DL algorithms, AI enables automated and precise image analysis, providing clinicians with efficient diagnostic tools that enhance both the accuracy and efficiency of spinal evaluations. 32 At the front end of the clinical workflow, AI research has extended into the optimization of image acquisition and reconstruction protocols. DL-based reconstruction algorithms, such as GAN-based compressed sensing techniques, can substantially accelerate MRI acquisition or reduce radiation exposure in CT imaging without compromising diagnostic fidelity, facilitating high-throughput screening and early intervention.32,33
At the foundational level, AI technologies facilitate the rapid automated localization and precise segmentation (Figure 6) of critical anatomical units, including the vertebrae, intervertebral discs, and neural structures. By establishing reliable anatomical benchmarks, these automated approaches effectively circumvent the inherent limitations of manual annotation. 34
Figure 6. AI applications in spinal imaging. This figure illustrates a two-stage automated pipeline for analyzing spinal images (e.g., MRI), leveraging deep learning for precise structural analysis. The process begins with a raw spinal image (left) as input. Stage 1 (localization and labeling): the raw image is processed by a deep neural network (DNN). Stage 2 (precise segmentation): the labeled image from Stage 1 serves as the input for a CNN, which conducts a per-pixel analysis to achieve precise segmentation.
Beyond structural parsing, AI demonstrates significant potential in computer-aided detection and pathological recognition. In the context of spinal trauma, algorithms assist in identifying vertebral fractures on radiographs or CT scans while providing objective quantitative metrics to enhance the reproducibility of clinical assessments.35,36 In spinal oncology, specialized models are designed to improve the sensitivity and specificity of metastatic lesion detection on MRI, facilitating the discrimination of malignant pathologies within complex backgrounds to minimize diagnostic omissions and false positives.37,38 Furthermore, for pervasive degenerative conditions, AI is increasingly utilized to automate the identification and standardized grading of disc degeneration, herniation, and spinal stenosis, bridging the gap between imaging findings and standardized clinical reporting.
4. Application of AI technology in disc diagnosis across imaging modalities (X-ray, CT, and MRI)
AI applications in disc diagnosis across the three major imaging modalities—MRI, CT, and X-ray—share a unified objective: enhancing the accuracy and efficiency of disc lesion assessment. For example, the RIMNet model proposed in one study achieved simultaneous disc identification and segmentation in multi-modal MRI with an identification accuracy of 94%. 39 Similarly, the SpineTK system delivered median Dice Similarity Coefficient scores exceeding 0.95 for disc segmentation across MRI, CT, and X-ray, with an average processing time under 1.7 seconds per modality. 5 A recent meta-analysis further supports these advances, reporting a pooled Dice coefficient of 0.90 for AI-based lumbar disc segmentation, underscoring the robustness and generalizability of these models. 40 By automating the localization, detection, and segmentation of intervertebral discs, AI effectively addresses the labor-intensive and time-consuming nature of manual annotation, significantly shortening preprocessing time and establishing a reliable foundation for downstream pathological evaluation.
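Because the Dice Similarity Coefficient is the main segmentation metric quoted here, a short sketch of its computation may help: for a predicted mask P and a reference mask T, Dice = 2|P ∩ T| / (|P| + |T|). The function name and the convention of returning 1.0 when both masks are empty are assumptions made for this illustration.

```python
import numpy as np

def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks (1.0 means identical)."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(pred, truth).sum() / total
```

For example, a prediction covering two pixels of which one lies inside a one-pixel reference mask gives 2·1 / (2 + 1) ≈ 0.67, whereas the scores above 0.90 reported in the literature indicate near-complete overlap.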
At the same time, due to fundamental differences in imaging mechanisms, clinical strengths, and inherent limitations among these modalities, AI applications have also evolved in a modality-specific manner to address unique diagnostic challenges and opportunities.
4.1 Application of AI combined with X-ray and CT in disc diagnosis
4.1.1 X-ray-based inference of disc pathology via “indirect signs”
Conventional X-ray infers disc pathology primarily through secondary osseous changes; however, this process is inherently constrained by the modality’s limited sensitivity and the subjectivity of manual assessment. AI-driven computational approaches now play an essential role in extracting objective, quantifiable biomarkers from these indirect features.
In addressing disc space narrowing, a key surrogate marker of degeneration, a DL framework integrating High-Resolution Network (HRNet) and Deformable Convolution (DAC) achieved end-to-end automated lumbar intervertebral disc height measurement, reporting an intraclass correlation coefficient of 0.93–0.98 against radiological standards. 11 This illustrates the ability of structurally aware architectures to capture fine-grained anatomical details under projective distortion. Clinically, this automated measurement provides an objective longitudinal tool for monitoring degenerative progression, addressing the subjectivity inherent in manual interpretation. Further exploiting transfer learning, a VGG-16 model optimized for lateral (LAT) cervical X-rays attained 95.8% sensitivity in identifying disc space narrowing and osteophyte formation, highlighting how domain-adapted convolutional networks can excel even with limited disc-level signal. 3
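Once a network has located the endplate corners, disc height reduces to simple geometry. The sketch below assumes one simplified definition (the mean of the anterior and posterior endplate-to-endplate distances) and hypothetical argument names; the cited framework may define and compute disc height differently.

```python
import numpy as np

def disc_height(upper_ant, upper_post, lower_ant, lower_post, mm_per_pixel=1.0):
    """Mean of anterior and posterior disc heights from four corner landmarks.

    upper_* are corners of the inferior endplate of the vertebra above the
    disc; lower_* are corners of the superior endplate of the vertebra below.
    Points are (x, y) pixel coordinates; mm_per_pixel is an assumed scale.
    """
    anterior = np.linalg.norm(np.subtract(upper_ant, lower_ant))
    posterior = np.linalg.norm(np.subtract(upper_post, lower_post))
    return 0.5 * (anterior + posterior) * mm_per_pixel
```

Tracking this value across follow-up radiographs is one way such landmark-based measurements could support longitudinal monitoring.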
For curvature-based assessment, MVC-Net 41 (Figure 7) introduced a multi-view correlation mechanism that combines information from anteroposterior (AP) and LAT projections to mitigate vertebral occlusion. This mechanism mimics a radiologist’s spatial reasoning by cross-referencing orthogonal views, enabling the automated calculation of the Cobb angle and sagittal alignment parameters. These measurements are clinically essential for diagnosing spinal deformities such as scoliosis or abnormal kyphosis, which serve as critical indirect indicators of underlying disc stress. Complementing this, a Mask R-CNN-based pipeline achieved real-time vertebral instance segmentation using region-based feature alignment, providing pixel-level anatomical grounding for fully automated Cobb angle measurement while significantly reducing inter-observer variability. 42 The SpineTK system 43 further introduced a hardware-invariant calibration technique that ensures measurement consistency across different imaging devices.
Figure 7. Schematic of the Multi-view Correlation Network (MVC-Net) for automated spinal curvature assessment. The model takes anteroposterior (AP) and lateral (LAT) radiographs as input. The core X-module explicitly learns a joint feature representation from both views to address vertebral occlusion in the LAT view. These enriched features are then fed into two parallel output branches: one for spinal landmark estimation (vertebral corner detection in both views) and another for direct Cobb angle estimation. The predicted landmarks are used to calculate key coronal (e.g., Cobb angle) and sagittal (e.g., thoracic kyphosis, lumbar lordosis) parameters, enabling comprehensive and automated adolescent idiopathic scoliosis (AIS) diagnosis and severity assessment.
Table 1. Key AI models for intervertebral disc localization and multi-angle disc degeneration assessment.
In summary, recent AI methodologies significantly advance X-ray-based disc diagnosis by combining precise anatomical segmentation with multi-feature integration and cross-view reasoning. These technical developments not only enhance objective quantification of indirect signs but also establish a new paradigm for automated spinal pathology assessment in routine radiography.
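The Cobb angle computed from predicted landmarks, as described above, is the angle between the endplate lines of the two most tilted vertebrae. The following is a minimal sketch under stated assumptions: each endplate is given as a pair of (x, y) corner points, the end vertebrae have already been selected, and the result is folded into the range 0 to 90 degrees, which ignores the rare curves exceeding 90 degrees. The function names are illustrative and do not come from MVC-Net or the Mask R-CNN pipeline.

```python
import numpy as np

def endplate_angle(left, right):
    """Inclination of an endplate line in degrees."""
    dx, dy = np.subtract(right, left)
    return np.degrees(np.arctan2(dy, dx))

def cobb_angle(upper_endplate, lower_endplate):
    """Angle between two endplate lines, folded into [0, 90] degrees."""
    diff = abs(endplate_angle(*upper_endplate)
               - endplate_angle(*lower_endplate)) % 180.0
    # A line has no direction, so angles d and 180 - d describe the same tilt.
    return min(diff, 180.0 - diff)
```

Folding the difference makes the result independent of the order in which the two corners of each endplate are listed, which is convenient when landmarks come from an automated detector.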
4.1.2 Application of AI combined with CT in disc diagnosis
Current research on AI-integrated CT for disc diagnosis remains relatively limited, yet emerging studies indicate transformative potential across multiple dimensions.
A CNN-based system enabled automatic segmentation and Modic change (MC) classification in lumbar disc CT images, demonstrating high segmentation accuracy and diagnostic performance while enhancing radiologists’ interpretive efficacy. 4
In the surgical domain, AI-generated 3D fusion images combining MRI and CT data successfully simulated a full endoscopic transforaminal discectomy (FED-TF) approach. This provides intuitive visual guidance for assessing the safe zone of Kambin’s triangle and identifying bony obstructions, which is critical for minimizing nerve root injury and optimizing portal placement during complex surgical planning. 44
Further advancing classification performance, a Vision Transformer (ViT) framework outperformed conventional CNNs in both vertebral localization and disc abnormality classification on CT, while providing improved model interpretability through Grad-CAM-generated attention maps 45 (Figure 8). Additionally, an active contour-based AI segmentation system quantitatively evaluated treatment outcomes in lumbar disc herniation (LDH) by measuring disc height reduction and vertebral slippage, enabling precise comparison of therapeutic efficacy 46 (Table 2).
Figure 8. A Vision Transformer (ViT) framework for lumbar disc herniation diagnosis and interpretability analysis in CT imaging. This figure demonstrates the full pipeline of an automated lumbar disc herniation diagnostic model based on a ViT. The stages are: (1) Preprocessing: original lumbar CT images undergo normalization and resizing. (2) Patching and embedding: images are divided into 32×32-pixel patches, which are then linearly projected and combined with positional encodings to form a sequence of embedding vectors. (3) Feature extraction: the embedded sequence is processed by a Transformer encoder composed of 12 multi-head self-attention blocks. This architecture globally models spatial dependencies among multiple vertebrae. (4) Dual-task output: the model features two parallel output branches: a localization branch identifies the specific intervertebral disc level, and a qualitative classification branch determines its status as normal, bulging, or herniated. (5) Interpretability analysis: Gradient-weighted Class Activation Mapping (Grad-CAM) generates heatmaps to visualize the key image regions (highlighted) that the model relies on for decision-making. The focused areas are consistent with anatomical landmarks used in clinical diagnosis, enhancing the model’s trustworthiness and transparency.
Table 2. Key AI models for pathological classification and diagnosis of intervertebral disc diseases.
In conclusion, the integration of AI with CT imaging not only elevates diagnostic precision and operational efficiency but also actively supports surgical planning and outcome assessment. Despite the currently modest volume of studies, the field exhibits considerable potential for further technical innovation and clinical translation.
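The patching and embedding stage of the ViT pipeline described above can be sketched in a few lines. The 32×32 patch size comes from the text; the 224×224 input size, the 64-dimensional embedding, and the random projection and positional encodings are assumptions for illustration only (in a trained model both are learned, and the cited framework may use different dimensions).

```python
import numpy as np

def patchify(image, patch=32):
    """Split a 2-D image into flattened, non-overlapping square patches."""
    h, w = image.shape
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")
    gh, gw = h // patch, w // patch
    patches = image.reshape(gh, patch, gw, patch).transpose(0, 2, 1, 3)
    return patches.reshape(gh * gw, patch * patch)

def embed(image, projection, positions, patch=32):
    """Linearly project each patch and add its positional encoding."""
    return patchify(image, patch) @ projection + positions

rng = np.random.default_rng(0)
image = rng.random((224, 224))                        # assumed input size
dim = 64                                              # assumed embedding width
projection = rng.standard_normal((32 * 32, dim)) * 0.02
positions = rng.standard_normal((49, dim)) * 0.02     # 7 x 7 patch grid
tokens = embed(image, projection, positions)          # one row per patch
```

The resulting token sequence is what the Transformer encoder's self-attention blocks would operate on, which is how the model relates distant vertebral levels to one another.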
4.2 Application of AI combined with MRI in disc diagnosis
4.2.1 Identification of subtle lesions
Owing to their small size and subtle imaging manifestations, subtle disc lesions such as annular fissures are often challenging to detect reliably on conventional MRI evaluation, leading to considerable diagnostic uncertainty. AI possesses pronounced advantages in identifying such pathologies, offering a powerful tool for enhancing diagnostic precision.
Specifically, Waldenberg et al. proposed a method that extracts textural features from standard MRI sequences and integrates attention mapping mechanisms with AI classification models to accurately detect annular fissures (Figure 9), a subtle yet clinically significant lesion strongly associated with chronic low back pain. Their approach achieved 100% sensitivity and 87% spatial localization accuracy, substantially outperforming the conventional HIZ criterion and enabling confident identification of lesions that are otherwise imperceptible to the human eye. 18
Further reinforcing this capability, research by Lagerstrand et al. found that an ML model leveraging global and local MRI biomarkers could classify subtle outer annular fissures with 97% accuracy, effectively compensating for the limitations of subjective radiological assessment in early degenerative change detection. 47
Figure 9. Workflow of the AI model for intervertebral disc annular fissure detection. The flowchart illustrates the architecture of the proposed AI model. The process begins with conventional T2-weighted MRI. Following disc segmentation, 480 radiomic features are extracted. These features are fed into an ensemble artificial neural network (ANN) for classification, determining the presence of an annular fissure extending to the outer annulus. For positive cases, the model generates a localization heatmap via an attention mapping module based on 22 selected features. The model achieves high accuracy in fissure detection (sensitivity 100%, specificity 96.6%) and localization (accuracy 87%).
Overall, AI markedly enhances the diagnostic accuracy of disc pathology by capturing subvisual imaging features that are frequently overlooked in conventional MRI analysis. This capability reduces the rate of missed diagnoses and mitigates interpreter subjectivity, thereby providing critical support for both clinical decision-making and scientific research in spinal disorders.
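Since the fissure-detection results above are reported as sensitivity and specificity, a brief sketch of how these are derived from per-disc predictions may be useful. The function name and the label convention (1 = fissure present, 0 = absent) are assumptions for this illustration.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = lesion present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # lesions found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # lesions missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy, cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy, flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity
```

A sensitivity of 100% therefore means no true fissure was missed in the evaluated set, while specificity below 100% reflects some healthy discs being flagged.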
4.2.2 Diagnosis of disc herniation
Disc herniation represents one of the most prevalent conditions in spinal practice. The integration of AI—particularly through DL and advanced image recognition technologies—into MRI-based evaluation of LDH has enabled automated, high-fidelity analysis of imaging data, substantially improving both the efficiency and accuracy of diagnostic processes.
The diagnostic capability of AI in LDH has been validated across multiple studies. A DL model utilizing the PP-YOLOv2 algorithm achieved a mean average precision (mAP) of 90.08% in distinguishing among normal discs, LDH, and spondylolisthesis in lumbar MRI, with LDH-specific precision reaching 91.74%.
48
The diagnostic value of such models lies in their ability to precisely delineate the boundaries of disc displacement and the degree of spinal canal compromise, helping clinicians differentiate between stable protrusions and high-risk extrusions that may require urgent intervention. For specific subtypes such as calcified lumbar disc herniation (CLDH), a ResNet34-based model (Figure 10) attained accuracies of 91.67% and 88.76% in the internal test set and an external validation cohort, respectively, demonstrating robust generalizability.
20
Further advancing model architecture, the GE-YOLOv8 framework (Figure 11) incorporated a Gradient Search (GS) module and Efficient Channel Attention (ECA) mechanism, achieving superior accuracy and operational efficiency compared to both conventional models and manual diagnosis.
49
Even beyond human medicine, a two-stage AI model with dedicated spine localization achieved a mAP of 75.32% in detecting disc herniation in veterinary MRI, illustrating the cross-species applicability of DL approaches.
50
A systematic review corroborates that CNN and YOLO-based models frequently exceed 85% accuracy in LDH diagnosis, highlighting considerable potential to standardize interpretations and reduce inter-reader variability, despite ongoing challenges related to limited dataset size and insufficient external validation.
19
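The mAP figures quoted above depend on deciding when a predicted box counts as a correct detection, which is conventionally done with intersection over union (IoU). The following is a minimal sketch of that matching step only, not the evaluation code of any cited study; the box format, the function names, and the 0.5 threshold (the one implied by the "mAP50" notation) are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero when boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def is_true_positive(pred_box, truth_box, pred_label, truth_label, threshold=0.5):
    """A detection counts as correct if the class matches and IoU >= threshold."""
    return pred_label == truth_label and iou(pred_box, truth_box) >= threshold


# Hypothetical predicted and annotated disc boxes, in pixels (illustrative only).
predicted = (10, 10, 50, 30)
annotated = (12, 12, 52, 32)
overlap = iou(predicted, annotated)
hit = is_true_positive(predicted, annotated, "LDH", "LDH")
```

Average precision for one class is then the area under the precision-recall curve built from such matches, and mAP is the mean of that value over classes.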
Schematic architecture of a deep Residual Neural Network (ResNet). The diagram illustrates the key innovation of ResNet: the residual block (detailed in the top inset). Each residual block utilizes shortcut connections that bypass one or more convolutional layers (e.g., 3×3 conv), allowing the input to be added directly to the output of the block. This identity mapping mitigates the vanishing gradient problem in very deep networks, enabling stable training and superior performance. The overall architecture begins with an initial convolution and pooling layer, followed by a series of stacked residual blocks with increasing feature dimensions (e.g., 64, 128, 256), and concludes with global average pooling and a fully connected output layer.
Architectural comparison between the baseline YOLOv8 model (left) and the proposed GE-YOLOv8 model (right) for LDH detection. Key improvements in the GE-YOLOv8 model include: (1) Replacement of the C2f module with a GS module to enhance multi-scale feature extraction while reducing computational complexity; (2) Integration of an ECA module within the head network to optimize feature channel weights and improve sensitivity to small lesions; (3) Retention of the anchor-free double-branch head structure for efficient bounding box regression and classification. These modifications collectively contribute to the model’s superior performance in accuracy and efficiency.
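The shortcut connection described in the ResNet caption above can be sketched in a few lines. This is a deliberately simplified illustration: small dense layers stand in for the 3×3 convolutions, batch normalization is omitted, and all names and weights are hypothetical rather than taken from the ResNet34 model in the cited study.

```python
def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, value) for value in vector]


def linear(vector, weights):
    """Dense layer without bias; each row of `weights` yields one output unit."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]


def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two layers with a ReLU in between."""
    hidden = relu(linear(x, w1))
    residual = linear(hidden, w2)
    # Shortcut connection: the input is added directly to the block output.
    return relu([r + xi for r, xi in zip(residual, x)])


# With all-zero weights F(x) = 0, so the block reduces to the identity mapping
# for non-negative inputs; this is why stacking blocks need not degrade a network.
zeros = [[0.0, 0.0], [0.0, 0.0]]
passthrough = residual_block([1.0, 2.0], zeros, zeros)
```

Because the identity path is always present, gradients can reach early layers through the additions even when the learned branch contributes little.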

In summary, AI technology—aided by diverse and optimized model architectures—delivers highly efficient and precise MRI diagnosis of LDH, offering substantial clinical utility. Future efforts should prioritize large-scale, annotated multi-center datasets and rigorous external validation to facilitate translation into routine practice.
4.2.3 Classification and grading of degenerative disc diseases
Lumbar disc degeneration represents a leading etiology of chronic low back pain. Although MRI serves as the gold standard for diagnosis, conventional manual interpretation is often limited by subjectivity and inefficiency. Recent advances in AI, particularly DL models, have demonstrated substantial potential in automating the classification and grading of degenerative disc diseases on MRI, thereby offering precise and standardized solutions for clinical evaluation.
CNNs have been employed to detect multiple degenerative changes—such as disc herniation and bulging—within lumbar MRI studies, while simultaneously performing accurate disc localization and labeling. This approach validates the feasibility of using a unified DL architecture for multi-pathology detection. 51 Multi-task learning frameworks further extend the functionality of AI, enabling concurrent diagnosis of several related conditions. For instance, one multi-task model achieved accuracies exceeding 80% for grading LDH, lumbar central canal stenosis (LCCS), and lumbar nerve root compression (LNRC) on an internal test set, and maintained performance between 74.1% and 79.6% on an external validation set, showing strong agreement with clinical standards. 52 Cross-sequence generalization has also been realized through innovative model designs. The YOLOv7-WRN-SVM model successfully predicted T1ρ-based disc degeneration stages using conventional T1-weighted MR images, attaining an accuracy of 84.0%. This method provides a promising alternative for T1ρ-MR applications in intervertebral disc degeneration (IDD) without requiring specialized sequences. 53
In fine-grained grading tasks, YOLOv8-based models excelled in classifying both severity (4 grades) and spatial distribution (8 categories) of lesions, achieving kappa coefficients of 0.88 and 0.77, respectively, which reflects a high level of detail capture and diagnostic consistency. 54 Ensemble methods such as WDRIV-Net—which integrates DenseNet169 and ResNet101 through weighted fusion—attained a classification accuracy of 96.25% for single and combined degeneration types, with Area Under the Curve (AUC) improvements of ≥2% compared to individual models, significantly outperforming conventional approaches. 55
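Weighted fusion of two classifiers, as mentioned for WDRIV-Net, can be illustrated by averaging their class probabilities with a tunable weight. The sketch below shows the general idea only; the actual fusion scheme and weights used in WDRIV-Net are not given here, so the function name, the weight of 0.4, and the probability vectors are hypothetical.

```python
def weighted_fusion(probs_a, probs_b, weight_a=0.5):
    """Blend two class-probability vectors and return (fused, predicted_class)."""
    if len(probs_a) != len(probs_b):
        raise ValueError("both models must score the same set of classes")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    fused = [weight_a * a + (1.0 - weight_a) * b for a, b in zip(probs_a, probs_b)]
    predicted_class = max(range(len(fused)), key=fused.__getitem__)
    return fused, predicted_class


# Hypothetical softmax outputs for three degeneration classes (illustrative only).
densenet_probs = [0.6, 0.3, 0.1]
resnet_probs = [0.2, 0.7, 0.1]
fused_probs, fused_class = weighted_fusion(densenet_probs, resnet_probs, weight_a=0.4)
```

In practice the weight would be chosen on a validation set, which is one way an ensemble can outperform either of its members.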
Quantitative analysis has been enhanced through models like BianqueNet (Figure 12), which combines signal intensity and geometric parameters (e.g., disc height index) to objectively assess degeneration. The extracted parameters showed strong correlation with Pfirrmann grades, establishing an imaging biomarker framework for degeneration severity.
56
CNNs also improved grading reliability across different systems, as measured by the kappa statistic. Kappa assesses classification agreement between two or more observers rating the same set of subjects and quantifies the agreement beyond that expected by chance alone, with a value greater than 0.60 generally indicating substantial agreement. The CNNs achieved a kappa of 0.68 in Pfirrmann grading, substantially higher than the average human performance (0.38), and demonstrated similar benefits in Fujiwara grading for facet joint degeneration.
57
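To make the kappa statistic concrete, the following minimal sketch computes unweighted Cohen's kappa for two raters. It assumes the unweighted two-rater form; the cited studies may have used a weighted or multi-rater variant, and the grades below are invented for illustration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same subjects."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of subjects")
    n = len(rater_a)
    # Observed agreement: fraction of subjects given identical labels.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the two raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical Pfirrmann grades assigned by two readers (illustrative only).
reader_1 = [1, 2, 2, 3, 3, 4, 4, 5]
reader_2 = [1, 2, 3, 3, 3, 4, 5, 5]
kappa = cohen_kappa(reader_1, reader_2)
```

Here the readers agree on 6 of 8 discs (75%), but because some of that agreement would occur by chance, kappa is lower than the raw agreement rate.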
Schematic of an advanced DL architecture for fine-grained intervertebral disc analysis. The model employs an encoder-decoder structure. The Encoder Path extracts multi-scale features through initial convolution/pooling and residual blocks. The ST-SC Module (likely Spatial Transformer-Self-Calibration or similar) utilizes window-based multi-head self-attention (W-MSA) and multilayer perceptrons (MLP) to model long-range spatial dependencies. The DFE Module captures multi-context information using Spatial Pyramid Pooling (SPP) and Atrous Spatial Pyramid Pooling (ASPP). The MFF Module integrates high- and low-level features before transposed convolution and 2× upsampling reconstruct the high-resolution output for precise segmentation or grading.
Key AI models for quantitative assessment and grading of intervertebral disc degeneration.
In conclusion, AI technologies, through diverse and increasingly sophisticated model architectures, now support end-to-end assessment of disc degeneration, spanning basic detection, fine-grained grading, cross-sequence and cross-species generalization, and ensemble-driven performance gains. These advances significantly enhance the accuracy, efficiency, and standardization of MRI-based diagnostic workflows, laying a robust foundation for future clinical integration.
5. Current limitations of AI technology in intervertebral disc imaging diagnosis
Despite the considerable advancements offered by AI, particularly DL, in the imaging diagnosis of disc diseases, several critical limitations persist in current research that must be addressed before widespread clinical adoption.
5.1 Limitations in sample size and data quality
Many studies rely on datasets that are limited in scale and derived from single institutions,59,60 resulting in small or homogeneous samples that do not represent the general population. This increases the risk of overfitting and constrains model generalizability. A small-sample study demonstrated that while the model achieved an accuracy of 80% on the internal test set, its accuracy dropped to 63.23% on the external validation set. 20 In addition, incomplete or biased patient data collection further restricts the spectrum of pathologies that models can accurately recognize. 61 Class imbalance compounds the problem in classification and grading tasks. In one study, Pfirrmann grade 1 constituted only 3.9% of the baseline sample, and the model’s accuracy for this grade was 68.3%, more than 20 percentage points below the 92.6% achieved for grade 2, which comprised 64.9% of the initial sample. 62 Additionally, many studies fail to account for inter- and intra-observer variability during manual annotation. This compromises label consistency, embeds biases into the model, and ultimately undermines the reliability of both model training and evaluation.
5.2 Constraints in model performance
Several technical shortcomings remain evident in current AI applications. Certain segmentation methods exhibit tendencies toward oversegmentation—particularly in critical regions such as the thecal sac and intervertebral discs. 63 In challenging scenarios like spinal stenosis, discontinuous or incomplete segmentation of the thecal sac is frequently observed, reflecting limited robustness in complex anatomical contexts. Some studies also employ outdated algorithm versions or omit state-of-the-art auxiliary techniques (e.g., iterative SAM architectures or advanced attention mechanisms), thereby limiting overall model capability. 64 Performance inconsistency across studies 65 raises concerns regarding diagnostic reliability, as erroneous predictions could lead to misdiagnosis or delayed treatment.
5.3 Insufficient validation and generalizability
The evaluation protocols adopted in many studies lack rigor and comprehensiveness. A limited number of studies have employed K-fold cross-validation. 43 This method divides the dataset into K folds, iteratively using K-1 folds for training and the remaining fold for testing, repeating the process K times. The final performance metrics (e.g., accuracy, AUC) are averaged over the K test results. This approach yields more stable and reproducible outcomes than a single random train/test split, providing a better estimate of a model’s true generalization capability. Despite its proven efficacy, it is still frequently overlooked, 66 leading to insufficient stability in performance evaluation. More critically, many models are not validated on external datasets, 67 even though studies have shown substantial performance gaps between internal and external datasets (e.g., GE-YOLOv8 achieved an mAP50 of 78% on the internal validation set versus 62.9% on the external test set). 49 Furthermore, performance varies across different imaging protocols. 4 For instance, one model’s classification accuracy on spiral CT was 89.5%, over ten percentage points higher than on axial CT. These findings illustrate that the lack of external validation can lead to overly optimistic performance metrics and severely compromise a model’s generalizability across different populations or imaging protocols. The absence of multi-center validation further exacerbates these issues, because models then fail to establish their applicability to real-world data derived from diverse institutions and devices. Conversely, studies that have undergone rigorous multi-center validation 54 can provide robust evidence of their model’s generalization capability, which constitutes an indispensable step when planning the clinical deployment of an AI model.
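The K-fold procedure described above can be sketched as follows. This is a minimal illustration using only the standard library; the helper names, the majority-class "model", and the toy labels are hypothetical, and a real study would typically also stratify folds by class and split by patient rather than by image.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k folds of near-equal size
    for i in range(k):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, folds[i]


def cross_validate(features, labels, k, fit, score):
    """Train on K-1 folds, test on the held-out fold, and average the K scores."""
    fold_scores = []
    for train_idx, test_idx in k_fold_indices(len(labels), k):
        model = fit([features[i] for i in train_idx], [labels[i] for i in train_idx])
        fold_scores.append(score(model,
                                 [features[i] for i in test_idx],
                                 [labels[i] for i in test_idx]))
    return sum(fold_scores) / k, fold_scores


def fit_majority(x_train, y_train):
    """Toy 'model' that always predicts the most frequent training label."""
    return max(set(y_train), key=y_train.count)


def accuracy(model, x_test, y_test):
    return sum(model == y for y in y_test) / len(y_test)


# Ten hypothetical cases: seven negative (0) and three positive (1).
toy_x = list(range(10))
toy_y = [0] * 7 + [1] * 3
mean_acc, per_fold = cross_validate(toy_x, toy_y, k=5, fit=fit_majority, score=accuracy)
```

The spread of `per_fold` is as informative as the mean: large variation between folds signals that a single train/test split could have given a misleadingly high or low estimate.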
5.4 Barriers to clinical translation and implementation
Even with algorithmically superior models, the practical deployment of AI systems in clinical environments still faces multiple challenges.
Firstly, computational resources and deployment costs pose a substantial real-world barrier. Complex learning models often place high demands on computational resources. 40 The significant computational requirements of advanced models such as Vision Transformers can lead to high hardware costs, creating a practical obstacle for resource-limited clinical institutions aiming to deploy AI systems. 45 Secondly, integration with existing clinical workflows remains inadequate. Most research prototypes are not designed as software capable of seamless integration with hospital systems, and models often operate as isolated systems. 51 This inevitably increases the operational burden and time costs for clinical staff. Moreover, the “black-box” nature of many DL models 68 remains a significant barrier to clinical integration. Without clear explanations for AI-driven decisions, clinicians may be hesitant to adopt such tools in high-stakes diagnostic settings. The lack of interpretability not only affects trust but also complicates compliance with emerging regulatory standards for AI-based medical devices.
In conclusion, while AI shows transformative potential in disc disease imaging, overcoming these limitations—through improved data curation, model design, validation practices, and explainability—is essential to achieving clinically reliable and widely applicable diagnostic systems.
6. Future development trends
Although deep learning models, particularly CNNs, have achieved remarkable success in medical image analysis, their inherent architectural limitations continue to constrain further performance improvements and clinical translation. Traditional architectures such as CNNs suffer from perceptual constraints: their local receptive fields hinder the modeling of inter-vertebral relationships across multiple spinal levels, while pooling operations tend to diminish fine-grained disc abnormalities. To address these issues, Vision Transformer (ViT) models have been proposed. 45 By leveraging a global self-attention mechanism, ViT directly captures long-range dependencies between any two regions within the scan, such as those between different vertebrae, intervertebral discs, and neural structures. Typically implemented with a 12-layer Transformer architecture, ViT integrates features across different levels of abstraction through stacked attention layers, thereby constructing complex hierarchical representations without relying on pooling operations that discard spatial details. This design enables better preservation and utilization of subtle image information.
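The global self-attention mechanism underlying ViT can be illustrated with a single-head, scaled dot-product sketch. It is a simplified illustration under stated assumptions: real ViT layers use learned projections, multiple heads, positional embeddings, and layer normalization, whereas the identity projections and the three two-dimensional "patch tokens" below are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over all tokens at once."""
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = math.sqrt(len(k[0]))
    k_transposed = [list(col) for col in zip(*k)]
    scores = [[s / scale for s in row] for row in matmul(q, k_transposed)]
    # Each row holds one token's attention over every token, near or far.
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


# Three hypothetical patch embeddings and identity projections (illustrative only).
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
attended, attention = self_attention(patches, identity, identity, identity)
```

Every token receives a weight for every other token regardless of distance, which is how attention can relate, for example, two non-adjacent spinal levels in a single layer.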
At the same time, the trajectory of AI in the domain of intervertebral disc diagnosis is advancing beyond conventional image analysis toward integrated, intelligent, and clinically deployable systems. This evolution is anticipated to shift the focus from automation alone toward predictive and personalized medicine—a transformation propelled by several critical technological innovations.
A principal driver of this shift is multi-modal data integration. Radiomics serves as a powerful feature-engineering methodology, enabling the high-throughput extraction of quantitative features—such as texture descriptors based on the gray-level co-occurrence matrix and the gray-level difference matrix—from medical images (e.g., MRI), thereby transforming visual information into mineable data. By integrating these radiomic features with clinically relevant variables, AI predictive models can achieve diagnostic and prognostic accuracy surpassing that of single-modality approaches. This strategy has already demonstrated enhanced efficacy in LDH research, where combined radiomic-clinical models have outperformed those utilizing imaging or clinical data in isolation. 69 Significantly, recent architectural advances in hierarchical vision transformers (e.g., Swin Transformer) and cross-modal attention mechanisms 70 offer powerful frameworks for synthesizing multi-source data, including imaging, clinical metrics, and genetic information. These approaches facilitate a comprehensive pathophysiological assessment that transcends traditional visual evaluation.
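As an illustration of the texture descriptors mentioned above, the sketch below builds a gray-level co-occurrence matrix (GLCM) for one pixel offset and derives the contrast feature from it. It is a minimal, assumption-laden example: the 4×4 image with four gray levels is invented, only a horizontal offset is used, and the gray-level difference matrix is not implemented. Radiomics toolkits compute many such features over several offsets and gray-level discretizations.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix of gray levels at offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    if total == 0:
        raise ValueError("offset leaves no pixel pairs inside the image")
    return [[value / total for value in row] for row in counts]


def glcm_contrast(p):
    """Contrast: co-occurrence probability weighted by squared level difference."""
    size = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(size) for j in range(size))


# Hypothetical 4x4 region of interest quantized to four gray levels.
roi = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
cooccurrence = glcm(roi, levels=4)
contrast = glcm_contrast(cooccurrence)
```

A homogeneous region yields a contrast near zero, while abrupt intensity changes between neighboring pixels raise it, which is what makes such features candidates for capturing subtle disc signal heterogeneity.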
To overcome the persistent challenges of data scarcity and model generalizability, federated learning is emerging as a key enabling technology. This framework supports the development of robust algorithms trained across multi-institutional datasets 71,72 without exchanging sensitive patient information, thereby markedly improving the generalizability, fairness, and robustness of models when applied to heterogeneous data sources (e.g., varying hospitals, scanner protocols, or population demographics).
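One common way to realize this idea is federated averaging, in which each institution trains locally and only model parameters are combined centrally. The sketch below shows the aggregation step alone; naming federated averaging is an assumption, since the specific algorithm used in the cited work is not stated here, and the hospital parameter vectors and sample counts are hypothetical.

```python
def federated_average(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts."""
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("clients must hold at least one sample in total")
    n_params = len(client_weights[0])
    return [
        sum(weights[i] * size for weights, size in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Parameters after one local training round at three hypothetical hospitals.
hospital_weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
hospital_sizes = [100, 100, 200]  # number of local scans (illustrative only)
global_weights = federated_average(hospital_weights, hospital_sizes)
```

Only the parameter vectors and sample counts leave each site; the images and patient records stay local, which is the privacy property the paragraph above refers to.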
Moreover, incorporating Explainable AI methodologies 73 will be crucial for elucidating model decision processes, fostering clinician trust, and facilitating regulatory approval. Ultimately, these technologies are expected to be embedded seamlessly into end-to-end clinical workflows, offering real-time decision support—potentially even during image acquisition. The automatic incorporation of AI-derived insights into structured reporting systems—such as through Digital Imaging and Communications in Medicine (DICOM) Structured Reporting for auto-completion of diagnostic templates 74 —will substantially elevate workflow efficiency and provide prognostic insights tailored to individual treatment planning.
This transformative progression will not only improve diagnostic precision and operational efficiency but also redefine diagnostic and therapeutic standards in spinal care, enabling earlier interventions, improving patient outcomes, and promoting more sustainable clinical practices.
7. Conclusion
AI technology is advancing the field of intervertebral disc imaging by introducing quantitative, data-driven methods that complement and enhance traditional qualitative assessment. Its integration across multi-modal imaging data has proven valuable for tasks ranging from lesion detection to outcome prediction.
Nevertheless, significant challenges remain, including limited and heterogeneous datasets, insufficient model interpretability, and barriers to seamless clinical integration. These issues must be urgently addressed through interdisciplinary efforts spanning radiology, orthopedics, computer science, and bioethics.
Looking forward, the continued collaboration between clinical and technical disciplines will be essential to translate algorithmic innovation into tangible clinical impact. By fostering robust, generalizable, and ethically deployed AI systems, we can advance toward a future of precise and personalized management of disc-related diseases—benefiting hundreds of millions of patients worldwide.
