Abstract
Background:
Axial spondyloarthritis (axSpA) is an inflammatory disease in which, despite expanding therapeutic options, a substantial proportion of patients do not achieve the desired treatment target, highlighting the emerging concept of difficult-to-manage (D2M) axSpA.
Objectives:
To identify characteristics and predictive factors of D2M axSpA and to develop machine learning models for early identification.
Design:
Longitudinal observational cohort study with external validation.
Methods:
Patients with axSpA from the SpA-Paz cohort initiating a first biological or targeted synthetic disease-modifying antirheumatic drug (b/tsDMARD) between 2004 and 2019 were included. D2M was defined as failure of ⩾2 b/tsDMARDs, and very good responders (GR) as retention of the first b/tsDMARD ⩾3 years or discontinuation due to improvement. Baseline clinical data and baseline/6-month disease activity measures were collected. Factors associated with D2M were assessed using descriptive, comparative, and logistic regression analyses. Classification and Regression Tree (CART) models were developed and externally validated with the REGISPONSERBIO registry.
Results:
Of 311 patients initiating b/tsDMARDs, 101 were included (42 D2M, 59 GR), with a D2M prevalence of 13.5%. D2M patients were more often smokers, Human Leukocyte Antigen B27 (HLA-B27) negative, and had higher rates of enthesitis and comorbidities. Baseline Axial Spondyloarthritis Disease Activity Score (ASDAS) and Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) did not differ between groups, but after 6 months D2M patients showed higher disease activity (ASDAS 2.8 vs 1.6, BASDAI 5.4 vs 3.3; both p < 0.001). Multivariable models identified ASDAS, or BASDAI plus C-reactive protein (all at 6 months), as predictors of D2M. CART models achieved areas under the receiver operating characteristic curve of 0.70 (95% confidence interval (CI) 0.46–0.93; ASDAS model) and 0.76 (95% CI 0.55–0.97; BASDAI model), with external validation confirming discrimination.
Conclusion:
D2M axSpA affects approximately 1 in 8 patients initiating advanced therapy and is associated with smoking, HLA-B27 negativity, enthesitis, comorbidities, and poor 6-month response to first-line b/tsDMARDs. CART models using routine clinical data may support early identification.
Introduction
Axial spondyloarthritis (axSpA) is a chronic inflammatory disease that predominantly affects the spine and the sacroiliac joints. According to previous studies, up to 1.4% of the population may suffer from axSpA, making it one of the most common rheumatic conditions. 1
For many decades, non-steroidal anti-inflammatory drugs (NSAIDs) were the only available therapy for axSpA. In recent years, new therapies have emerged, significantly improving the management and prognosis of patients with axSpA. Three different families of disease-modifying antirheumatic drugs (DMARDs) are now approved for axSpA: two classes of biological DMARDs (bDMARDs), namely tumor necrosis factor inhibitors (TNFi) and interleukin-17 inhibitors (IL-17i), and Janus kinase inhibitors (JAKi), a class of targeted synthetic DMARD (tsDMARD). 2
Notably, axSpA typically begins in early adulthood, and because conventional synthetic DMARDs (csDMARDs) are not recommended for purely axial disease, treatment often escalates to advanced therapy early and at younger ages.1,3 This paradigm is associated with substantial and sustained pharmaceutical costs, making early identification of responders, and especially non-responders, clinically and economically relevant. 4
Beyond its economic implications, axSpA also involves a substantial human burden, affecting multiple domains of patients’ lives. Patients with axSpA may experience impaired health-related quality of life, an increased risk of mental health disorders, including depression and anxiety, and considerable work-related difficulties. Importantly, these outcomes have been consistently associated with higher disease activity,5–7 highlighting the importance of achieving and maintaining optimal disease control.
According to the current Assessment of SpondyloArthritis International Society (ASAS)-European Alliance of Associations for Rheumatology (EULAR) recommendations, axSpA management should be individualized and guided by regular monitoring of clinical manifestations, disease activity, and a predefined treatment target. 3 Clinical assessment should consider not only axial involvement, but also peripheral manifestations and extra-musculoskeletal manifestations (EMMs), as these may influence therapeutic choice and have an important impact on disease activity and patients’ quality of life. 8 The Axial Spondyloarthritis Disease Activity Score (ASDAS) is the preferred instrument for disease activity assessment, complemented by objective measures such as C-reactive protein (CRP). 3 Although the TICOSPA trial did not meet its primary endpoint, several secondary outcomes supported the potential value of a treat-to-target strategy in axSpA. 9 Currently, the recommended target is achieving remission (ASDAS <1.3) or, alternatively, low disease activity (ASDAS <2.1). 2
Despite a broader therapeutic arsenal, approximately one in three patients with axSpA still do not achieve the desired treatment target. 10 One of the first steps in understanding the reasons behind treatment failure is to characterize this group of patients. This is why the concept of “difficult-to-manage” (D2M) axSpA has emerged. This term originated in the area of rheumatoid arthritis (RA), the disease in which the first consensus definition of “difficult-to-treat” was established in 2020. 11
While both conditions share similarities, axSpA has different clinical manifestations, fewer available therapeutic options, and distinct disease activity indices. For all these reasons, the definition of D2M axSpA should not be directly extrapolated from RA and requires further research. In this context, ASAS convened a task force in 2022 that has recently developed a definition of D2M axSpA. 12 However, this ASAS definition is based on expert consensus rather than evidence, as the available literature on D2M axSpA remains limited and heterogeneous. These considerations highlight the need to deepen our understanding of this subgroup of patients, which has been insufficiently characterized to date.
The main objectives of this study are: (i) to determine the characteristics of a group of patients with D2M axSpA, comparing them with a population of very good responders (GRs); (ii) to identify predictive factors for D2M axSpA at the initiation of b/tsDMARD therapy; and (iii) to develop a classification tool for D2M axSpA using machine learning models.
Methods
Study population and inclusion criteria
This longitudinal study used data from the SpA-Paz cohort, an ongoing prospective cohort of patients with axSpA initiating b/tsDMARD therapy at La Paz University Hospital (Madrid, Spain). Patients who started a first b/tsDMARD between January 2004 and October 2019 were initially considered. Baseline assessment was performed before or at treatment initiation, with follow-up visits every 3–12 months until December 2022.
The inclusion criteria were as follows: (1) adult patients (⩾18 years of age) diagnosed with axSpA by their treating rheumatologist and fulfilling the ASAS classification criteria for axSpA; and (2) initiation of a first b/tsDMARD. Subsequently, these patients were divided into two groups according to the following definitions:
(1) “D2M”: discontinuation of at least two b/tsDMARDs, irrespective of the reason for discontinuation or their mechanism of action (MoA).
(2) “GRs”: patients who had been treated with only one b/tsDMARD and received it for at least 3 years, or patients who were able to stop treatment due to clinical improvement. The 3-year cut-off was established because the median retention of the first b/tsDMARD was equal to or below this point.
Patients who did not meet either of the two definitions (mainly patients who had received treatment with 2 b/tsDMARDs) were excluded in order to improve discrimination between groups. No additional exclusion criteria were applied.
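The group definitions above can be expressed as a simple rule. The following is a minimal sketch rather than the procedure actually used in the study: the names `TreatmentHistory` and `assign_group` are hypothetical, and two points are assumptions, namely that the D2M rule takes precedence over the GR rule and that stopping for clinical improvement refers to patients treated with a single b/tsDMARD.

```python
from dataclasses import dataclass


@dataclass
class TreatmentHistory:
    """Hypothetical per-patient summary; field names are illustrative only."""
    n_received: int                 # number of b/tsDMARDs ever received
    n_discontinued: int             # number discontinued, for any reason
    first_drug_years: float         # retention time of the first b/tsDMARD
    stopped_for_improvement: bool   # first drug stopped due to clinical improvement


def assign_group(h: TreatmentHistory) -> str:
    # D2M: at least two b/tsDMARDs discontinued, irrespective of reason or MoA.
    # Assumption: this rule is checked first.
    if h.n_discontinued >= 2:
        return "D2M"
    # GR: a single b/tsDMARD retained for >= 3 years, or stopped for improvement.
    if h.n_received == 1 and (h.first_drug_years >= 3 or h.stopped_for_improvement):
        return "GR"
    # Everyone else (for example, patients on their second drug) is excluded.
    return "excluded"
```

Under these assumptions, a patient who has discontinued one drug and is still receiving a second one is assigned to neither group, which matches the exclusion described in the text.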
For external validation of the classification models, the REGISPONSERBIO cohort, a multicentre prospective registry involving 17 Spanish centres, was used. This cohort included 257 patients with axSpA, of whom 83 initiated a first TNFi between 2013 and 2014, and 174 were already receiving TNFi therapy at inclusion, with follow-up every 6 months for 3 years.13,14 Since both baseline and 6-month data from treatment initiation were required, only the 83 patients starting TNFi at baseline were considered for external validation. Of these, 56 fulfilled one of the two study definitions (10 D2M and 46 GR) and were therefore included in the analysis.
Data collection
For all the included patients, the following data were recorded at baseline (before starting their first b/tsDMARD): demographic, clinical, and serological characteristics, musculoskeletal manifestations (enthesitis, peripheral arthritis, and dactylitis), EMMs, including anterior uveitis, psoriasis, or inflammatory bowel disease (IBD), associated comorbidities, available imaging tests (both pelvic radiograph and sacroiliac magnetic resonance imaging), and concomitant therapy, comprising NSAIDs, glucocorticoids, and csDMARDs.
In addition, simple and composite disease activity indices, including ASDAS and the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI), and acute phase reactants (CRP and erythrocyte sedimentation rate (ESR)) were collected at baseline and 6 months after initiating the first b/tsDMARD. Clinically important improvement and major improvement at 6 months were defined as a decrease in ASDAS (ΔASDAS) of ⩾1.1 or ⩾2 points, respectively. 15 The detailed description of the collected variables is presented in Table S1.
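The ΔASDAS thresholds can be illustrated with a short helper. This is a sketch under stated assumptions, not the authors' code: the function name `asdas_improvement` is hypothetical, improvement is taken as baseline minus 6-month ASDAS, and the difference is rounded to two decimals only to avoid floating-point artefacts at the thresholds.

```python
def asdas_improvement(baseline: float, month6: float) -> str:
    """Classify the 6-month ASDAS response with the thresholds quoted in the text.

    Assumption: a positive delta (baseline minus 6-month value) means that
    disease activity decreased.
    """
    delta = round(baseline - month6, 2)  # rounding guards against float noise
    if delta >= 2.0:
        return "major improvement"
    if delta >= 1.1:
        return "clinically important improvement"
    return "no clinically important improvement"
```

Because the categories are nested, the helper returns only the highest one reached; a patient with major improvement has, by definition, also achieved clinically important improvement.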
Finally, b/tsDMARD courses and reasons for their discontinuation were recorded until December 2022. The reasons for b/tsDMARD discontinuation were classified into five different categories: inefficacy (primary or secondary), adverse effects, EMMs (which were not covered or fully controlled by the current therapy and caused a change of treatment), remission or significant improvement, and miscellaneous.
The reporting of this study conforms to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement. 16
Statistical analyses
Descriptive statistics were stratified into two groups (D2M vs GR axSpA) and are presented as frequencies and percentages for qualitative variables, and as mean and standard deviation (SD) or median and interquartile range for quantitative variables, as appropriate. The Chi-square or Fisher’s exact tests were used for comparative analysis of qualitative variables, whereas the Student’s t-test or Mann–Whitney U test were used for quantitative variables in order to find differences between D2M and GR patients.
Logistic regression models (univariable and multivariable) were used to identify factors associated with D2M. After univariable regression analyses, variables with p < 0.10 were considered for inclusion in the multivariable model. Given the sample size, the final number of variables was limited to 10, selected according to their clinical relevance and previous literature on factors associated with treatment response. Due to collinearity between ASDAS and BASDAI/CRP, and since both are the most commonly used indices in clinical practice, two different multivariable models were obtained.
Separately, Classification and Regression Tree (CART) models, which are based on decision tree algorithms, were created as a potentially useful tool to classify D2M. This kind of model uses the variables most strongly associated with the desired outcome (D2M in our study) and creates successive splits until a terminal node is reached, where no further significant splits can be made. First, a random forest was built to identify the variables most strongly associated with D2M axSpA, using both the Gini index and accuracy as metrics. From these variables, the CART model then selected those that best discriminated between D2M and GR and determined the optimal cut-off points for classification. When generating the tree, the population was successively divided into two branches at each node, guided by the Gini index. The value at each terminal node was defined as the mode of the observations in that node. To ensure data integrity, variables with 100% missingness in the external validation cohort were removed, without data imputation. These were psychiatric disorders, fibromyalgia, enthesitis, arthritis, dactylitis, and tender and swollen joint counts (TJC and SJC) at baseline and 6 months, since REGISPONSERBIO excluded patients with fibromyalgia or peripheral manifestations and did not record mental health disorders. Finally, only cases without missing values were included in the analyses (86 from the SpA-Paz cohort and 34 from the REGISPONSERBIO registry). In addition, a maximum tree depth of three was imposed to avoid overfitting.17,18
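To make the Gini-guided splitting step concrete, the sketch below searches for the cut-off of a single continuous variable that minimizes the weighted Gini impurity of the two resulting branches. It is an illustration of the general technique in plain Python, not the rpart implementation used in the study; the function names and the toy data in the usage note are assumptions.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (1 = D2M, 0 = GR)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)  # equals 1 - p**2 - (1 - p)**2 for two classes


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule `value < threshold`.

    Candidate thresholds are midpoints between consecutive distinct values.
    Returns (None, parent_gini) if no split improves on the unsplit node.
    """
    n = len(labels)
    distinct = sorted(set(values))
    best = (None, gini(labels))
    for lo, hi in zip(distinct, distinct[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x < thr]
        right = [y for x, y in zip(values, labels) if x >= thr]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (thr, score)
    return best
```

For example, with invented 6-month scores `[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]` and labels `[0, 0, 0, 1, 1, 1]`, `best_split` returns a threshold of 3.5 with zero impurity. A full tree repeats this search over all candidate variables within each branch, here up to the maximum depth of three.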
Once the CART models were developed, internal validation was performed via 10-fold cross-validation, followed by external validation using the REGISPONSERBIO cohort. The characteristics of the patients who met the definitions for D2M or GR and were included from REGISPONSERBIO for external validation are summarized in Tables S2 and S3. Model performance was assessed by calculating the area under the receiver operating characteristic (ROC) curve (AUC).
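The AUC can be read as the probability that a randomly chosen D2M patient receives a higher predicted probability than a randomly chosen GR patient, with ties counted as one half. The sketch below computes it directly from that definition. It is a plain-Python illustration, not the pROC routine used in the study, and the function name `auc` is an assumption.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 1/2).

    `scores` are predicted probabilities of D2M; `labels` are 1 (D2M) or 0 (GR).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one D2M and one GR case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Because a shallow tree outputs only a few distinct probabilities (one per terminal node), ties are frequent, which is why they are handled explicitly here.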
Statistical analyses were performed using SPSS v25 software (IBM, Armonk, NY, USA), whereas R v3.5.3 software was used to develop the CART models. In particular, the caret_7.0-1 and rpart_4.1.24 packages were used to set up a grid of tuning parameters and develop the CART models. The pROC_1.18_4 package was used to build the ROC curves and compute the AUC.
Results
Sociodemographic, clinical, and treatment-related characteristics
Out of 311 patients with axSpA initiating a first b/tsDMARD identified in the SpA-Paz cohort, 101 were included: 42 (41.6%) were classified as D2M and 59 (58.4%) as GR, giving a D2M prevalence of 13.5% (42/311) in our axSpA cohort (Figure S1). Mean follow-up time was 9.0 ± 4.3 years and was slightly longer in the D2M group than in the GR group (10.0 ± 4.6 vs 8.2 ± 3.9 years, p = 0.04). Among D2M patients, the mean time from first b/tsDMARD initiation to fulfilment of the D2M definition was 5.0 ± 3.5 years.
Baseline characteristics are summarized in Table 1. Among all patients, mean age at first b/tsDMARD was 43.0 ± 12.5 years and almost 60% were male. Most patients initiated their first b/tsDMARD after 2010 (81.2% overall, 73.8% in the D2M group, and 86.4% in the GR group), and a higher proportion of GR patients started treatment after 2015 compared with D2M patients (54.2% vs 31.0%), while no differences were observed across the earlier 5-year periods. No significant differences in sex or age were observed between D2M and GR. Approximately three out of four patients in both groups had radiographic axSpA. Interestingly, the duration of the disease before first b/tsDMARD initiation was significantly shorter in the D2M group than in GR (5.5 ± 7.7 vs 10.5 ± 10.7 years, p = 0.007). Compared with GR, D2M patients were more often current smokers (31.0% vs 11.9%, p = 0.02) and less often HLA-B27 positive (64.3% vs 82.8%, p = 0.04).
Sociodemographic, clinical, and therapy-related characteristics in D2M and GR patients.
Results are shown as number (%) for qualitative variables and as mean ± standard deviation for continuous variables.
Statistically significant (p < 0.05).
b/tsDMARD, biological or targeted synthetic disease-modifying antirheumatic drug; BMI, body mass index; csDMARDs, conventional synthetic disease-modifying antirheumatic drugs; D2M, difficult-to-manage; EMMs, extra-musculoskeletal manifestations; GR, very good responders; HLA-B27, Human Leukocyte Antigen B27; IBD, inflammatory bowel disease; NSAIDs, non-steroidal anti-inflammatory drugs; r-axSpA, radiographic axial spondyloarthritis.
Regarding peripheral manifestations, D2M patients had more peripheral involvement than GR (97.6% vs 78.0%, p = 0.005), specifically more enthesitis (95.2% vs 59.3%, p < 0.001). The prevalence of arthritis was 58.4% overall (69.0% in the D2M group and 50.8% in the GR group, p = 0.07). Moreover, D2M patients had a significantly higher burden of comorbidities, including hypertension, dyslipidemia, psychiatric disorders, and fibromyalgia, compared with GR. Among EMMs, although psoriasis and IBD were numerically more common in D2M patients than in GR, no statistically significant differences were found. NSAID prescription was more frequent in D2M patients than in GR (97.2% vs 81.1%, p = 0.02).
All patients started a TNFi as their first b/tsDMARD, which was also the most frequent class of b/tsDMARDs used, with a total of 179 courses of treatment. Among D2M patients, only TNFi and IL-17i were used as second b/tsDMARD (in 35 and 7 patients, respectively). Regarding the pattern of successive drug prescriptions in D2M, the most frequent sequence was TNFi–TNFi–TNFi as first–second-third drugs, which was used in 20 patients; followed by TNFi–TNFi–IL-17i (12 patients) and TNFi–IL-17i–TNFi (4 patients), whereas only 2 patients received TNFi–IL-17i–JAKi (Figure 2(a)).
Disease activity during the first b/tsDMARD
Disease activity outcomes before (baseline) and 6 months after starting the first b/tsDMARD are shown in Figure 1 and Table S4.

Disease activity measures before (a) and 6 months after (b) starting the first b/tsDMARD. Results are shown as mean and p-value.
At baseline, mean ASDAS and BASDAI were 3.6 ± 0.9 and 6.4 ± 1.7 in the D2M group, and 3.3 ± 1.0 and 5.6 ± 2.0 in the GR group. No significant differences were found in ASDAS, BASDAI, or acute phase reactants (CRP and ESR). Nevertheless, D2M patients scored higher than GR patients on some baseline variables, such as BASDAI questions for spinal pain (7.5 ± 2.1 vs 6.4 ± 2.7, p = 0.04), stiffness severity (7.1 ± 2.5 vs 5.9 ± 2.8, p = 0.04), or stiffness duration (6.1 ± 2.8 vs 4.7 ± 2.7, p = 0.02), as well as TJC (4.4 ± 6.7 vs 1.1 ± 2.7, p = 0.004), patient global assessment (7.0 ± 1.9 vs 6.0 ± 2.2, p = 0.02), and physician global assessment (5.0 ± 2.0 vs 4.0 ± 2.0, p = 0.03).
As early as 6 months after starting b/tsDMARD therapy, the D2M group scored significantly higher in almost every simple activity measure, except for SJC (over 66 joints) and pain visual numeric scale. They also had higher scores than GR in composite activity indices, both ASDAS (2.8 ± 1.1 vs 1.6 ± 0.9, p < 0.001) and BASDAI (5.4 ± 2.0 vs 3.3 ± 2.1, p < 0.001), as well as higher CRP (5.9 ± 7.9 vs 1.7 ± 2.7, p = 0.002) and ESR (14.0 ± 10.6 vs 8.7 ± 7.3, p = 0.007). Moreover, the proportion of D2M patients achieving clinically important improvement (20.5% vs 66.7%, p < 0.001) or major improvement (7.7% vs 42.6%, p < 0.001) according to ΔASDAS was significantly lower than in GR.
Reasons for b/tsDMARD discontinuation
In the D2M group and among all courses of treatment, inefficacy (78.0%), both primary (36.2%) and secondary (41.7%), was the main cause of b/tsDMARD discontinuation. Other reasons for discontinuation were adverse events (17.3%), which included toxicoderma, gastrointestinal intolerance, and hepatotoxicity, among others; EMMs (5.1%) that were not controlled or covered by the current b/tsDMARD treatment; and other infrequent reasons (6.3%), such as lack of adherence, loss to follow-up, and desire for pregnancy. None of the D2M patients stopped their treatment on account of recurrent infections (Figure 2(b)).

Sankey diagram showing consecutive changes of treatment until the 3rd b/tsDMARD in the D2M group (a) and reasons for b/tsDMARD discontinuation in the D2M group (b). Percentages may exceed 100% because more than one reason for discontinuation may occur simultaneously within the same treatment course.
Regarding b/tsDMARD discontinuation until becoming D2M (Table S5), secondary inefficacy was the main reason for stopping the first b/tsDMARD (64.3%), while primary inefficacy was the leading cause for discontinuing the second b/tsDMARD (42.9%).
Regression models using BASDAI and ASDAS
Both univariable and multivariable logistic regression analyses are shown in Table 2. The first model identified 6-month ASDAS (OR 3.1 (95% confidence interval (CI) 1.9–5.1), p < 0.001), whereas the second one found 6-month BASDAI (OR 1.5 (95% CI 1.2–1.9), p = 0.001) and 6-month CRP (OR 1.2 (95% CI 1.0–1.3), p = 0.03), as predictive factors for D2M.
Univariable and multivariable logistic regression analyses.
Multivariable model using 6-month ASDAS.
Multivariable model using 6-month BASDAI + 6-month CRP.
Statistically significant (p < 0.05).
ASDAS, Axial Spondyloarthritis Disease Activity Score; b/tsDMARD, biological or targeted synthetic disease-modifying antirheumatic drug; BASDAI, Bath Ankylosing Spondylitis Disease Activity Index; CI, confidence interval; CRP, C-reactive protein; HLA-B27, Human Leukocyte Antigen B27; OR, odds ratio.
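The odds ratios reported for the multivariable models are expressed per one-unit increase in the predictor. Under the usual logistic regression assumption that the log-odds are linear in the predictor, an odds ratio for a different increment follows by exponentiation. The helper below is a minimal sketch of that arithmetic; the name `rescale_or` is hypothetical, and the linearity assumption is inherited from the model rather than verified here.

```python
def rescale_or(or_per_unit: float, units: float) -> float:
    """Odds ratio for a change of `units`, given the odds ratio per 1 unit.

    Assumes the log-odds are linear in the predictor, as in a standard
    logistic regression model.
    """
    return or_per_unit ** units
```

For instance, with the reported odds ratio of 3.1 per ASDAS point at 6 months, `rescale_or(3.1, 2)` gives about 9.6 for a two-point difference, and `rescale_or(3.1, 0.5)` gives about 1.8 for a half-point difference.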
Classifying D2M axSpA: CART models with ASDAS and BASDAI
All collected variables were considered when developing the CART models, and those with the best classification capability were selected. For the same reasons that led to two multivariable regression models, two CART models were built: after a first model with BASDAI, a second one including ASDAS was created. The CART models, their cut-off points, and the probability of D2M axSpA after each step are shown in Figure 3. In both models, the pre-test probability of D2M was 41%.

CART models classifying D2M axSpA using ASDAS (a) or BASDAI (b). CART models are based on decision tree algorithms: they take the variables most strongly associated with D2M, choose the optimal cut-off point for classification, and create successive splits until a terminal node is reached, where no further significant splits can be made. Each terminal node shows the modal class of its observations (green for GR and red for D2M) and the probability of being classified as D2M (pD2M).
The first model (Figure 3(a)) recognized patient global assessment after 6 months of treatment with the first b/tsDMARD and ASDAS before starting the first b/tsDMARD as the main variables. In the first step, 40 patients (47%) had patient global assessment less than 4.2 with a D2M probability of 15%, whereas 46 patients (53%) had a patient global assessment greater than or equal to 4.2. In this second group, another step was applied using baseline ASDAS, which was less than 2.9 in 13 patients (15%) and greater than or equal to 2.9 in 33 patients (38%) with a D2M probability of 23% and 79%, respectively.
On the other hand, the second model (Figure 3(b)) identified BASDAI scores obtained after 6 months of treatment with the initial b/tsDMARD and the disease duration until the first b/tsDMARD, as the key variables. In the first step, 6-month BASDAI with a cut-off of 4 was considered. A total of 40 patients (47%) achieved a BASDAI less than 4 with a D2M probability of 15%, whereas 46 patients (53%) had a BASDAI greater than or equal to 4, with a D2M probability of 63%. In this latter group, a second step was applied based on the disease duration until the first b/tsDMARD, which was greater than or equal to 2.9 years in 15 patients (17%) and less than 2.9 years in 31 patients (36%), with a D2M probability of 27% and 81%, respectively.
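The two trees described above can be written as nested rules. The sketch below encodes only the cut-offs and terminal-node probabilities reported in the text; the function and argument names are hypothetical, and the code is an illustration of how the trees are read, not a validated clinical tool.

```python
def cart_asdas_model(ptga_6m: float, asdas_baseline: float) -> float:
    """Probability of D2M from the first tree (Figure 3(a)).

    ptga_6m: patient global assessment after 6 months of the first b/tsDMARD.
    asdas_baseline: ASDAS before starting the first b/tsDMARD.
    """
    if ptga_6m < 4.2:
        return 0.15
    if asdas_baseline < 2.9:
        return 0.23
    return 0.79


def cart_basdai_model(basdai_6m: float, disease_duration_years: float) -> float:
    """Probability of D2M from the second tree (Figure 3(b)).

    disease_duration_years: disease duration until the first b/tsDMARD.
    """
    if basdai_6m < 4:
        return 0.15
    if disease_duration_years >= 2.9:
        return 0.27
    return 0.81


def classify(p_d2m: float) -> str:
    """Label a terminal node by its modal class (D2M if pD2M exceeds 0.5)."""
    return "D2M" if p_d2m > 0.5 else "GR"
```

As a usage example, a patient with a 6-month BASDAI of 5.5 and a disease duration of 1 year before the first b/tsDMARD falls in the terminal node with pD2M of 0.81 and is classified as D2M.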
After validation using the REGISPONSERBIO cohort (Table S6), the first CART model reached an AUC of 0.70 (95% CI 0.46–0.93), with a sensitivity of 66.7%, specificity of 75.0%, positive predictive value of 36.4%, and negative predictive value of 91.3%, correctly classifying 73.5% of patients. The second model reached an AUC of 0.76 (95% CI 0.55–0.97), with a sensitivity of 66.7%, specificity of 78.6%, positive predictive value of 40.0%, and negative predictive value of 91.7%, correctly classifying 76.5% of patients.
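The relationship between these validation metrics can be illustrated from a 2×2 confusion matrix. The article does not report the underlying counts; the values in the sketch below are hypothetical, chosen only because, with 34 complete cases, they reproduce the percentages published for the first model. The function name `diagnostic_metrics` is likewise an assumption.

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard diagnostic accuracy measures from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),   # D2M patients correctly flagged
        "specificity": tn / (tn + fp),   # GR patients correctly cleared
        "ppv": tp / (tp + fp),           # flagged patients who are truly D2M
        "npv": tn / (tn + fn),           # cleared patients who are truly GR
        "accuracy": (tp + tn) / total,   # overall correct classification
    }


# Hypothetical counts (NOT reported in the article), consistent with n = 34.
metrics = diagnostic_metrics(tp=4, fp=7, fn=2, tn=21)
percentages = {name: round(100 * value, 1) for name, value in metrics.items()}
```

With these illustrative counts only a minority of the validation patients are D2M, which would explain why the positive predictive value is modest while the negative predictive value is high: in a low-prevalence setting, a negative result is more informative than a positive one.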
Post hoc analysis: D2M cases fulfilling the ASAS-D2M definition
Among the 42 patients classified as D2M in our cohort, a post hoc analysis was performed to identify those also fulfilling the recent ASAS-D2M definition. 12 A total of 26 patients (61.9%) met this definition. These patients had failed at least two b/tsDMARDs with different mechanisms of action (criterion 1), showed insufficient control of signs or symptoms of the disease, defined as ASDAS >2.1 (criterion 2), and had disease perceived as problematic by the patient and/or the rheumatologist, assessed as patient and/or physician global assessment ⩾5/10 (criterion 3).
A sensitivity analysis comparing these ASAS-D2M cases with GR patients is shown in Table S7. Consistent with the main analysis, ASAS-D2M patients had a significantly shorter disease duration before initiation of the first b/tsDMARD, more peripheral involvement (enthesitis and arthritis), more comorbidities (diabetes mellitus, dyslipidemia, psychiatric disorders, and fibromyalgia), and higher disease activity at 6 months after the first b/tsDMARD (ASDAS, BASDAI, BASDAI subitems, SJC and TJC, patient global assessment, CRP, and ESR). In addition, differences were also observed in some baseline disease activity measures, including higher BASDAI and some of its subitems, joint counts, and both patient and physician global assessments.
When ASAS criteria 2 and 3 were assessed among the 42 D2M patients in our cohort, all fulfilled them except for one patient who discontinued treatment because of neoplasia.
Discussion
This study provides one of the first descriptions of a population of D2M axSpA patients, offering new insights into the characteristics and predictive factors of this subgroup of individuals after 6 months of treatment with the first b/tsDMARD. Moreover, D2M patients were compared with a subgroup of individuals showing an exceptionally good response to advanced therapy, representing a novel and previously unexplored approach. Finally, two classification tools using machine learning models were developed, incorporating data commonly collected during routine clinical follow-up of patients with axSpA, which could serve as a valuable resource for decision-making in the early phases of b/tsDMARD therapy.
In this cohort, D2M axSpA had a frequency of 13.5%. In other cohorts, the frequency of D2M axSpA has been reported to range from 5.3% to 28.3%.19–23 The variability in prevalence across studies may be explained by the definitions of D2M axSpA applied (in this cohort defined as failure of ⩾2 b/tsDMARDs irrespective of MoA) and by the characteristics of the populations assessed, as some cohorts included all patients with axSpA, whereas others, including the present cohort, focused on patients already receiving advanced therapies.
D2M axSpA patients from our cohort were more often current smokers and HLA-B27 negative, had greater peripheral involvement (particularly enthesitis), a higher prevalence of comorbidities such as hypertension, dyslipidemia, psychiatric disorders, and fibromyalgia, shorter disease duration before initiating advanced therapy, and more frequent concomitant NSAID use. Smoking is a well-known risk factor for poorer response to advanced therapies, 2 whereas HLA-B27 positivity has been identified as a predictor of remission in previous studies, 24 both findings consistent with our results. Similarly, other cohorts have also reported higher smoking rates 19 and lower HLA-B27 positivity among D2M patients. 20 Peripheral manifestations and EMMs, particularly psoriasis, have also been found more frequently in D2M patients in other cohorts,21–23 as they can complicate management and lead to treatment switching. 3
The shorter disease duration before the first b/tsDMARD observed in our cohort, contrasting with the longer duration reported by Philippoteaux et al., 21 may reflect a more aggressive or symptomatic disease course requiring earlier treatment initiation. Alternatively, it could indicate patients with a shorter history of symptoms, making precise assessment of diagnosis or inflammatory activity more challenging and potentially leading to poorer treatment response; however, given the study design, an underlying diagnostic delay cannot be excluded. In addition, the broad inclusion period (2004–2019) should be considered when interpreting our findings, as it was intended to better reflect real-world clinical practice, where patients who initiated b/tsDMARD therapy many years ago coexist with those treated more recently. Nevertheless, most patients initiated their first b/tsDMARD after 2010, and the only difference across calendar periods was a higher proportion of GR patients starting after 2015, likely because these patients had a shorter follow-up time and therefore fewer opportunities to discontinue their first b/tsDMARD.
Regarding the high rates of enthesitis and fibromyalgia, fibromyalgia is a common comorbidity in axSpA and can interfere with disease activity evaluation, since nociplastic pain may elevate composite indices such as ASDAS and BASDAI, thus potentially overestimating true inflammatory activity.25,26 Moreover, the prevalence of fibromyalgia in our cohort (6.9%) appears lower than that reported in other published series (approximately 16.3%), further complicating its relationship with measured activity. 27 Similarly, the high rate of enthesitis observed, particularly in the D2M group, could be overestimated due to reliance on clinical scores like MASES, whose tender points overlap substantially with those used to diagnose fibromyalgia, leading to false-positive enthesitis in patients with coexisting fibromyalgia. 28 Future studies could incorporate objective imaging, such as Doppler ultrasound, to distinguish true enthesitis from fibromyalgia entheseal pain. 29 Likewise, a higher prevalence of psychiatric disorders, including depression and anxiety, was observed in D2M patients (54.8%), in line with prior studies reporting an association between depression and higher disease activity. 6 Nevertheless, these conditions were also common in the overall cohort (36.6%) and even among GR patients (23.7%), with rates higher than those previously reported, 30 emphasizing the need to consider comorbidities in clinical management.
Patients with D2M axSpA scored higher on certain “subjective” disease activity measures, including patient and physician global assessment, some BASDAI subitems and TJC, before starting the first b/tsDMARD. Worse patient-reported outcomes in D2M patients have also been reported in other cohorts. 22 However, there were no differences in the main composite disease activity indices (ASDAS and BASDAI) between groups. In contrast, after only 6 months of therapy, the D2M group exhibited significantly higher values across almost all disease activity measures, including ASDAS, BASDAI, and acute phase reactants. Further, either 6-month ASDAS or the combination of 6-month BASDAI and 6-month CRP proved to be the best independent predictors of D2M in the multivariable analyses.
One of the most significant contributions of our study is the development of CART models to classify D2M axSpA. Both models, incorporating baseline ASDAS or 6-month BASDAI, achieved acceptable performance in the external validation cohort, with AUCs of 0.70 and 0.76, respectively. The BASDAI model showed a higher AUC and slightly higher accuracy than the ASDAS model. Similar models have also been developed for other diseases, such as RA. 31 The utility of these models lies in their ability to promptly identify individuals at risk of failing at least two advanced therapies, allowing for closer monitoring or therapeutic adjustments.
Six-month longitudinal follow-up with activity measures was a major strength of this study. Furthermore, to our knowledge, this is among the first reports to identify lack of response to the first b/tsDMARD at 6 months as a primary predictor of ultimately becoming D2M, which may enable early identification of these patients. Other strengths include the use of a multicenter cohort for external validation of the CART models.
Limitations
A key limitation of this study is that our operational definition of D2M axSpA differed from the recent ASAS consensus definition. 12 Specifically, we defined D2M axSpA as failure of ⩾2 b/tsDMARDs irrespective of MoA, whereas the ASAS definition requires failure of ⩾2 b/tsDMARDs with different mechanisms of action. This approach reflects the therapeutic landscape during the period in which our cohort was established (2004–2019), when TNFi were virtually the only advanced therapy available for most patients and IL-17i and JAKi were introduced only in the final years. Importantly, when ASAS criteria 2 and 3 were assessed post hoc in our D2M patients, all fulfilled them except for one patient. Furthermore, in a post hoc sensitivity analysis applying the full ASAS-D2M definition, including not only failure of ⩾2 b/tsDMARDs with different mechanisms of action but also insufficient control of signs/symptoms and disease being perceived as problematic, 61.9% of our D2M cases fulfilled the ASAS criteria. When these ASAS-D2M cases were compared with GR patients, the main findings were confirmed, supporting the robustness of our results. Regarding the third criterion, disease being perceived as problematic was operationalized using patient and/or physician global assessment ⩾5/10, in line with approaches used in previous studies in RA 32 and axSpA. 19 However, this remains a subjective criterion and is particularly difficult to capture retrospectively in cohorts established before the publication of the consensus definition; indeed, some previous studies did not include it. 21 Therefore, although we fully endorse the new ASAS consensus and support its adoption in future studies, our data still provide a valuable historical framework for understanding D2M axSpA in a real-world setting.
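The difference between the two definitions can be sketched as two small predicates. This is only an illustrative Python sketch under stated assumptions: the `TreatmentHistory` record and its field names are hypothetical, insufficient control of signs/symptoms is represented as a clinician-supplied flag, and the "perceived as problematic" criterion uses the patient and/or physician global assessment ⩾5/10 operationalization described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TreatmentHistory:
    """Hypothetical per-patient record; field names are assumptions."""
    failed_moas: List[str]       # mechanism of action of each failed b/tsDMARD
    symptoms_uncontrolled: bool  # insufficient control of signs/symptoms
    patient_global: float        # patient global assessment, 0-10
    physician_global: float      # physician global assessment, 0-10


def is_d2m_operational(h: TreatmentHistory) -> bool:
    """Study definition: failure of >=2 b/tsDMARDs, irrespective of MoA."""
    return len(h.failed_moas) >= 2


def is_d2m_asas(h: TreatmentHistory) -> bool:
    """Sketch of the full ASAS definition as operationalized post hoc."""
    different_moa = len(set(h.failed_moas)) >= 2
    problematic = h.patient_global >= 5 or h.physician_global >= 5
    return different_moa and h.symptoms_uncontrolled and problematic


# A patient failing two TNFi meets the study definition but not the ASAS one.
two_tnfi = TreatmentHistory(["TNFi", "TNFi"], True, 6.0, 4.0)
tnfi_then_il17i = TreatmentHistory(["TNFi", "IL-17i"], True, 6.0, 4.0)
print(is_d2m_operational(two_tnfi), is_d2m_asas(two_tnfi))
print(is_d2m_operational(tnfi_then_il17i), is_d2m_asas(tnfi_then_il17i))
```

The first example shows why a cohort treated mostly with TNFi can contain patients who are D2M under the study definition yet fall outside the ASAS definition.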
Apart from the differences between our operational definition and the recent ASAS consensus definition, another important limitation of this study is the sample size, both in the primary cohort and in the external validation cohort. This was partly driven by the strict selection criteria applied to ensure clear discrimination between study groups. Although external validation in an independent multicentre cohort adds value to our findings, this cohort had specific limitations, including the exclusion of patients with fibromyalgia or peripheral manifestations of the disease, the lack of data on mental health problems, and a more restricted bDMARD initiation period (2013–2014).
In addition, CART models may be mainly applicable to populations of patients with axSpA with similar characteristics to those included in this study, particularly those initiating advanced therapy with a TNFi as the first b/tsDMARD. Moreover, the models were derived from extreme phenotypes (D2M vs “GRs”), which may inflate discrimination and require evaluation in unselected cohorts. Another limitation is the proportion of patients excluded due to missing data, both in the derivation cohort and in the external validation cohort, which may have further affected the generalizability of the models.
These limitations highlight the need for future research focused on characterizing larger cohorts of both D2M and treatment-refractory axSpA patients according to ASAS definitions. 12 It will be particularly important to delineate the latter group, in whom objective evidence indicating disease activity (including elevated CRP or active inflammation on MRI) is mandatory, since ASDAS and BASDAI may be insufficient to demonstrate true inflammatory activity. 25 For this reason, incorporating imaging data, particularly MRI of the sacroiliac joints and spine, may be crucial for the adequate characterization of these patients. Future studies should also address the time dimension of D2M disease, as the ASAS consensus definition does not specify a time interval for treatment failure, and it remains unclear whether patients who fail several therapies rapidly differ from those who progress to D2M over many years. Finally, patients who fail only one b/tsDMARD, who were not included in the present study, should also be investigated to determine whether their profile is closer to that of GR or D2M patients, and to assess how the CART models would perform in discriminating D2M from non-D2M patients, including both GR patients and those failing only one b/tsDMARD.
Conclusion
In summary, these findings highlight the need for comprehensive management of patients with axSpA that considers EMMs, peripheral manifestations, and comorbidities, beyond axial inflammatory activity. Early identification of D2M patients could be feasible using tools such as the CART models we developed. Recognizing and characterizing this subgroup of patients with axSpA may contribute to a more personalized and timely therapeutic approach.
Supplemental Material
sj-docx-1-tab-10.1177_1759720X261456162 to sj-docx-10-tab-10.1177_1759720X261456162 – Supplemental material (10 files) for Early identification of difficult-to-manage axial spondyloarthritis using machine-learning decision trees by Manuel Juárez-García, Chamaida Plasencia-Rodríguez, Diego Benavent, Mariana Díaz-Almirón, Marta Novella-Navarro, Carolina Tornero, Diana Peiteado, Xavier Juanola, Mireia Moreno, Eugenio de Miguel, Jordi Gratacós and Victoria Navarro-Compán in Therapeutic Advances in Musculoskeletal Disease
Footnotes
Acknowledgements
The authors would like to acknowledge the valuable contribution of all members of the REGISPONSERBIO Study Group, including Pedro Zarco Montejo, Jesús Sanz Sanz, Beatriz Joven, Eduardo Cuende Quintana, Miram Almirall, Mª Cruz Fernandez Espartero, Enrique Batlle Gualda, Cristina Campos, Eduardo Collantes Estevez, Maria Dolores Ruiz Montesinos, Pilar Font, Teresa Clavaguera Poch, Luis F. Linares Ferrando, Carlos Rodríguez Lozano, Beatriz Yoldi, and María Llop.
Supplemental material
Supplemental material for this article is available online.
