Optimized OF-Score: Therapy Prediction in Successfully Treated Patients With Osteoporotic Spine Fractures Using Minimal Clinically Important Difference

Abstract

Study Design

Multicenter study with prospectively collected data.

Objectives

This study investigates the relationship between individual score components and treatment success, defined by achieving Minimal Clinically Important Difference (MCID) thresholds in functional outcomes, using multicenter prospectively collected data. This work aimed to optimize the OF-Score while maintaining its original structure. By using outcome-oriented data, we refined this established decision-support tool to improve its predictive accuracy for successful clinical outcomes.

Methods

Data from 518 patients from the EOFTT study with osteoporotic vertebral fractures were analyzed. Only patients with clinically successful outcomes, defined by improvement beyond MCID thresholds in functional scores after conservative or surgical treatment, were selected from this cohort. Optimization was performed using a data-driven reweighting approach combined with structured clinical expert evaluation, adjusting variable weights within predefined limits to improve alignment with successful therapies.

Results

Data from the subset of 374 successfully treated patients were analyzed. Before optimization, the OF-Score showed an accuracy of 73%, with pain and mobilization being the most important parameters. After optimization, the OF-Score showed an accuracy of 80.7% (sensitivity 85.2%, specificity 71.2%). The number of nonapplicable (indifferent) therapy recommendations dropped from 144 (37%) to 72 (19%). A threshold of 5 points provided optimal discrimination between conservative treatment (≤5) and surgical treatment (>5).

Discussion

The optimized OF-Score, based on targeted weight adjustments, improves the alignment between clinical recommendations and treatment success. This refinement enhances the score’s predictive accuracy for treatment responders while maintaining its simple, practical structure.

Keywords

Osteoporotic vertebral fractures; OF-Score; minimal clinically important difference (MCID); treatment decision-making; spinal fracture outcomes; data-driven optimization

Introduction

Due to the aging population, osteoporotic vertebral fractures (OVFs) are becoming a growing medical and socioeconomic challenge. Since OVFs differ significantly from fractures in patients with healthy bone, specific classifications and treatment concepts have been developed.^1,2 Among these, the OF-Score was developed and has become more widely used in recent years (Table 1). However, variability in treatment outcomes and patient heterogeneity remain key challenges. The multicenter study “Evaluation of the Osteoporotic Fracture Classification, Treatment Score and Therapy Recommendations (EOFTT)” enrolled 518 patients in 17 centers and evaluated the OF-Score.^3,4

Table 1.

Original Weights of the OF-Score and Limits Applied for Optimization. The Table Summarizes the Original Scoring System and the Weight Limits of the Variables in the Optimization Process. Only Integer Values Within These Limits Were Evaluated.

Parameter	OF-score		Optimization limits
Parameter	Grade	Points	Points	Explanation
OF classification (morphology)	1-5	2-10	1-30	Original Factor 2, tested range 1 to 3
Severity of osteoporosis	T-Score < -3	+1
Deformity progression	Yes/No	+1/-1
Pain (under adequate analgesia)	VAS ≥5/<5	+1/-1	Threshold 4 to 7, +1 to +5/-1 to -5	Threshold and both directions (≤ and ≥) were tested
Fracture related neurological deficit	Yes	+2
Able to mobilize without help	No/Yes	+1/-1	+1 to +5/-1 to -5	Limits are tested for both presence and absence
Health status	ASA>3, BMI<20 kg/m², nursing case, anticoagulation	Each parameter -1; maximum -2
OF-Score recommendation (Threshold 6)
	≤5 – conservative		Threshold 5 to 8	Thresholds in both directions (≤ and ≥) were tested, eliminating the indifferent recommendation
	=6 – indifferent
	≥7 – surgical

While 71% of patients received treatment in line with OF-Score recommendations, nearly one-third were treated contrary to the score’s guidance.⁴ The OF-Score showed a potential theoretical accuracy of 80%. Moreover, the OF-Score and its component variables have demonstrated strong validity and reliability.^5,6 While some studies confirmed the score’s utility, others raised concerns about its robustness.^5,7,8

There is potential to further refine the clinical utility of the OF-Score. Previous evaluations mainly examined concordance between the recommended and applied therapy, rather than actual patient outcomes.^3,5,9-11 The OF-Score has not yet been re-evaluated retrospectively in terms of treatment success. However, prior studies have shown that adherence to OF-Score recommendations is associated with significant improvements in patient outcomes.

In addition, the functional outcome and changes in functional outcome measures over time are essential when re-evaluating the clinical performance of a score. To distinguish statistically detectable from clinically meaningful improvements in patient-reported outcomes such as the Oswestry Disability Index (ODI), the concept of the Minimal Clinically Important Difference (MCID) is commonly applied and well-established in spine research.^12-15

This study aims to assess the influence of individual OF-Score components on treatment choice and success, assessed by applying MCID thresholds for relevant outcome measures (ODI, VAS, Barthel Index, EQ5D-5L). Based on these findings, score modifications will be sought to improve agreement between the OF-Score therapy recommendation and clinically successful treatment, without fundamentally changing the score’s structure or adding new variables. The analysis focuses on patients with successful outcomes following either conservative or surgical therapy.

Methods

Data from 518 patients with osteoporotic spine fractures enrolled in the prospective, observational multicenter EOFTT study were used to evaluate the OF-Score and its parameters for therapy recommendations.^3,4 By focusing exclusively on patients with clinically successful outcomes, this analysis seeks to identify the specific factors that drive effective treatment pathways, rather than merely documenting failures. The OF-Score uses key parameters, including fracture morphology (OF classification), severity of osteoporosis, deformity progression, pain, fracture-related neurological deficits, the ability to mobilize without assistance, and overall health status, to determine treatment recommendations for patients with osteoporotic vertebral fractures. Patients were evaluated at baseline prior to the treatment decision, at discharge, and at follow-up at least 6 weeks after treatment. Scheduled follow-ups were to be conducted at 6 weeks, 12 weeks, 6 months, and 12 months. The most recent available follow-up data were used for the analyses.

Successful therapy was defined based on the functional clinical outcomes (ODI, VAS pain, Barthel index, EQ5D-5L) at the time of the follow-up examination, as follows:

The threshold for minimal impairment was set at 20% for the ODI,¹⁶ and for pain, a VAS score ≤3 was used.^17,18

The Barthel index measures the functional independence of patients, specifically in terms of mobility and daily activities. Patients with a Barthel index score between 80 and 100 have only minimal functional limitations, as they are generally able to perform most of the daily activities independently, including self-care, mobility, and using the toilet, with occasional support if needed.¹⁹

The EuroQol Group (2015) outlines detailed procedures for calculating and interpreting EQ5D-5L scores, including the translation of individual health profiles into index values.²⁰ These procedures imply that index values between 0.7 and 1.0 represent health states that are not perfect but still relatively good, with minimal limitations.²⁰ The threshold was therefore set at 0.7 for the EQ5D-5L index value and at 70 or better for the self-reported health status (EQ5D-VAS).

Patients who showed only minor impairments in the ODI, low pain scores, and minimal limitations in the Barthel index or EQ5D-5L on the day of treatment decision and maintained these levels throughout the follow-up period were included. Additionally, patients who improved by the minimal clinically important difference (MCID) in these outcome variables were also included.

The MCID was calculated for all outcome variables on the day of treatment decision using a distribution-based method,^21,22 as no validated anchor measures were available in this secondary analysis of a multicenter observational cohort. This method multiplies the baseline standard deviation by a factor of 0.3 for conservative estimates or 0.5 for moderate estimates of the MCID.²³ For our study, we used a factor of 0.5. This stricter MCID threshold was deliberately chosen to ensure that only clinically meaningful improvements were classified as treatment success. It corresponds to a medium effect size according to Cohen’s guideline (d=0.5), meaning that patients needed to show a more substantial improvement to be considered as having a clinically relevant change.^23,24
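The distribution-based MCID described above amounts to multiplying the baseline standard deviation by a fixed factor. The Python sketch below illustrates only that arithmetic; the function names are hypothetical, and because Table 3 reports rounded standard deviations, recomputing from those rounded values reproduces the reported MCIDs only approximately.

```python
from statistics import stdev


def mcid_from_sd(sd: float, factor: float = 0.5) -> float:
    """Distribution-based MCID: a fixed fraction of the baseline standard deviation.

    factor=0.5 corresponds to the moderate (Cohen's d = 0.5) estimate used here;
    factor=0.3 gives the smaller estimate mentioned in the text.
    """
    return factor * sd


def mcid_from_values(baseline_values: list[float], factor: float = 0.5) -> float:
    """Same calculation starting from raw baseline scores (sample SD, n - 1)."""
    return mcid_from_sd(stdev(baseline_values), factor)


# Rounded baseline SDs as reported in Table 3; illustrative only.
reported_sd = {"VAS pain": 2.2, "Barthel index": 27.0, "EQ5D-VAS": 23.0}
for name, sd in reported_sd.items():
    print(f"{name}: MCID = {mcid_from_sd(sd):.2f}")
```

Working from the raw baseline values (the second function) avoids the rounding drift visible when starting from the published, rounded SDs.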

In summary, patients were considered successfully treated if they fulfilled one of two criteria. First, patients with only minor impairment at baseline (ODI <20%, VAS pain ≤3, Barthel index ≥80, EQ5D-5L index ≥0.7, EQ5D-VAS ≥70) who maintained these levels throughout follow-up were included. Second, patients with relevant baseline impairment were included if they achieved clinically meaningful improvement during follow-up based on the MCID of the respective outcome variables. Patients were only included if they met the criteria for at least four of these five outcome variables. Additionally, only patients who showed an improvement equal to or greater than the MCID in all outcome variables were included. In other words, only patients who either improved or maintained a relatively good functional level were included; patients who deteriorated were excluded.
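The two inclusion concepts can be expressed as a small rule-based classifier. The Python sketch below is illustrative only and rests on several assumptions: it compares one baseline value with one follow-up value per outcome (the study used the most recent follow-up), takes the MCIDs from Table 3, treats the Barthel and EQ5D thresholds as inclusive bounds, and implements "deterioration" as a worsening of at least one MCID in any outcome. All names in the code are hypothetical, and this is not the study's exact algorithm.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class OutcomeRule:
    is_good: Callable[[float], bool]  # "minimal impairment" state
    mcid: float                       # distribution-based MCID (Table 3)
    higher_is_better: bool


RULES = {
    "ODI": OutcomeRule(lambda v: v < 0.20, 0.10, higher_is_better=False),  # ODI as fraction 0-1
    "VAS": OutcomeRule(lambda v: v <= 3, 1.1, higher_is_better=False),
    "Barthel": OutcomeRule(lambda v: v >= 80, 13, higher_is_better=True),
    "EQ5D_index": OutcomeRule(lambda v: v >= 0.7, 0.152, higher_is_better=True),
    "EQ5D_VAS": OutcomeRule(lambda v: v >= 70, 11, higher_is_better=True),
}
TOL = 1e-9  # guards against floating-point noise exactly at the MCID boundary


def improvement(rule: OutcomeRule, baseline: float, follow_up: float) -> float:
    """Positive means the patient got better, whatever the direction of the scale."""
    delta = follow_up - baseline
    return delta if rule.higher_is_better else -delta


def outcome_met(rule: OutcomeRule, baseline: float, follow_up: float) -> bool:
    maintained = rule.is_good(baseline) and rule.is_good(follow_up)
    improved = improvement(rule, baseline, follow_up) >= rule.mcid - TOL
    return maintained or improved


def is_successful(baseline: dict, follow_up: dict, min_outcomes: int = 4) -> bool:
    for name, rule in RULES.items():
        if improvement(rule, baseline[name], follow_up[name]) <= -rule.mcid + TOL:
            return False  # assumed: worsening by >= 1 MCID in any outcome excludes the patient
    met = sum(outcome_met(rule, baseline[n], follow_up[n]) for n, rule in RULES.items())
    return met >= min_outcomes


before = {"ODI": 0.70, "VAS": 7, "Barthel": 60, "EQ5D_index": 0.30, "EQ5D_VAS": 35}
after = {"ODI": 0.40, "VAS": 3, "Barthel": 85, "EQ5D_index": 0.65, "EQ5D_VAS": 60}
print(is_successful(before, after))  # True: all five outcomes improve by at least one MCID
```

Encoding the direction of each scale once (`higher_is_better`) keeps the improvement and deterioration checks identical for ODI and pain, where lower is better, and for Barthel and EQ5D, where higher is better.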

Treatments were categorized as conservative, cement augmentation only, or instrumentation.

Statistical Methods

Distributional differences between the OF-Score recommendation and the therapy performed were examined using Chi² tests. Differences in OF-Score variables and baseline functional outcomes between conservatively and surgically treated patients were analyzed using a general linear model for ordinal and continuous variables, and Fisher’s exact test for nominal data.

Binary discriminant analysis was applied to identify and compare variables using standardized canonical discriminant coefficients. High coefficients, whether positive or negative, indicate a strong contribution to group separation, while values near zero suggest minimal influence. A logistic regression model was used to assess the total explanatory power of the OF-Score variables on the day of the treatment decision. The logistic regression model quality was assessed using Nagelkerke’s pseudo-R².
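Nagelkerke's pseudo-R² rescales the Cox and Snell R², 1 − exp(2(LL₀ − LL_M)/n), by its maximum attainable value, 1 − exp(2·LL₀/n), so that a perfect model scores 1. Here LL₀ and LL_M are the log-likelihoods of the intercept-only and fitted logistic models. The Python sketch below shows this standard formula; the function names are illustrative, since SPSS reports the statistic directly.

```python
from math import exp, log


def null_log_likelihood(n_events: int, n_total: int) -> float:
    """Log-likelihood of an intercept-only logistic model for a binary outcome."""
    p = n_events / n_total
    return n_events * log(p) + (n_total - n_events) * log(1 - p)


def nagelkerke_r2(ll_null: float, ll_model: float, n: int) -> float:
    """Cox and Snell R^2 divided by its maximum attainable value."""
    cox_snell = 1 - exp(2 * (ll_null - ll_model) / n)
    max_cox_snell = 1 - exp(2 * ll_null / n)
    return cox_snell / max_cox_snell


# Null model for 256 surgically vs 118 conservatively treated patients (Table 2).
ll0 = null_log_likelihood(256, 374)
print(f"LL0 = {ll0:.1f}")
```

A model no better than the intercept (LL_M = LL₀) yields 0, and a model that fits perfectly (LL_M = 0) yields 1.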

Optimization Process

The OF-Score was optimized through four sequential phases to determine the ideal weighting of its variables. Importantly, the final reduction to a streamlined set of key parameters was not a predefined constraint but emerged as a direct result of this data-driven optimization process. These phases comprised: (1) identification of variables with the highest discriminatory power for the performed therapy, (2) narrowing of the permissible weight ranges, (3) systematic evaluation of weight combinations, and (4) selection of the optimal solution. The optimization aimed to minimize misclassification rates and to eliminate cases resulting in ambiguous treatment recommendations.

Phase 1: Identification of Variables With the Highest Discriminatory Power

A discriminant analysis was performed to identify the variables with the greatest influence on treatment decision-making. Standardized canonical discriminant coefficients were used to determine the variables with the highest discriminatory power and to guide the subsequent narrowing of parameter ranges.
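For two groups, the standardized canonical discriminant coefficients can be derived from the pooled within-group covariance matrix. The pure-Python sketch below uses hypothetical helper names (the study used SPSS). It solves for the discriminant direction, scales it so the discriminant scores have unit pooled within-group variance, and multiplies each weight by the pooled within-group standard deviation of its variable. The overall sign is arbitrary; here positive coefficients point toward group 1 (for example, surgical treatment).

```python
from math import sqrt


def _mean(rows):
    k = len(rows[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(k)]


def pooled_within_cov(groups):
    """Pooled within-group covariance matrix (denominator: N minus number of groups)."""
    k = len(groups[0][0])
    s = [[0.0] * k for _ in range(k)]
    for rows in groups:
        m = _mean(rows)
        for r in rows:
            d = [r[j] - m[j] for j in range(k)]
            for a in range(k):
                for b in range(k):
                    s[a][b] += d[a] * d[b]
    dof = sum(len(g) for g in groups) - len(groups)
    return [[v / dof for v in row] for row in s]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def standardized_coefficients(group0, group1):
    sw = pooled_within_cov([group0, group1])
    diff = [b - a for a, b in zip(_mean(group0), _mean(group1))]
    w = solve(sw, diff)  # raw discriminant direction
    k = len(w)
    within_var = sum(w[a] * sw[a][b] * w[b] for a in range(k) for b in range(k))
    w = [x / sqrt(within_var) for x in w]  # unit within-group variance of the scores
    return [w[j] * sqrt(sw[j][j]) for j in range(k)]  # standardize by pooled SDs


# Toy data: variable 0 shifted up, variable 1 unchanged, variable 2 shifted down.
group0 = [(a, b, c) for a in (-1, 1) for b in (-1, 1) for c in (-1, 1)]
group1 = [(a + 2, b, c - 1) for a, b, c in group0]
print(standardized_coefficients(group0, group1))
```

As in Table 4, coefficients near zero indicate little contribution to group separation, and the sign shows the direction of the association.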

Phase 2: Narrowing of the Permissible Weight Ranges

In a second step, Microsoft Excel Solver (version 2021) was used to narrow the weight limits of the optimized OF-Score while preserving its basic structure. By restricting both the variables and their weighting ranges in advance, the number of possible solutions was reduced to a manageable range for subsequent analyses. For each of the previously identified variables, upper and lower bounds for integer weighting were defined. The optimization process considered only integer values within these predefined limits. The final applied bounds are presented in Table 1. All variables of the original OF-Score were retained for calculation of the total score and its resulting treatment recommendation. However, only the weightings of OF classification, pain, and mobility were varied, as these variables demonstrated the highest discriminatory power. The remaining variables were included using their original weights.

Phase 3: Systematic Evaluation of Weight Combinations

In Phase 3, all possible combinations of weights within the limits established for OF classification, pain, and mobility were systematically evaluated, resulting in a total of 60,000 solutions. For each solution, the OF-Score and its corresponding treatment recommendation were calculated for each patient and compared with the treatment actually performed. Accuracy, number of misclassified cases, sensitivity, and specificity were assessed for each weight combination.
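The exhaustive evaluation in Phase 3 is a grid search. The Python sketch below is hypothetical in several respects. The patient records are invented toy data. The variables with unchanged weights are collapsed into a single `other_points` field. Table 1 lists a threshold direction (≤ and ≥) for both pain and the cut-off, but the sketch models only the cut-off direction, as an inclusive versus exclusive boundary. Under that reading of the Table 1 limits (OF factor 1 to 3, pain threshold 4 to 7, pain and mobility weights 1 to 5 in each direction, cut-off 5 to 8), the grid contains 3 × 4 × 5⁴ × 4 × 2 = 60,000 combinations, which matches the number of solutions reported above. The authors' actual grid, implemented with Excel Solver, may have been defined differently.

```python
from itertools import product

# Hypothetical toy records (not study data). "other_points" sums the OF-Score
# items whose weights stay unchanged (osteoporosis, deformity, neurology, health).
PATIENTS = [
    {"of_class": 2, "vas": 3, "mobile": True, "other_points": 0, "surgical": False},
    {"of_class": 3, "vas": 8, "mobile": False, "other_points": 1, "surgical": True},
    {"of_class": 4, "vas": 6, "mobile": False, "other_points": 0, "surgical": True},
    {"of_class": 3, "vas": 4, "mobile": True, "other_points": -1, "surgical": False},
]

GRID = {
    "of_factor": range(1, 4),
    "pain_threshold": range(4, 8),
    "pain_up": range(1, 6),      # points added if VAS >= threshold
    "pain_down": range(1, 6),    # points subtracted if VAS < threshold
    "immobile_up": range(1, 6),  # points added if unable to mobilize
    "mobile_down": range(1, 6),  # points subtracted if able to mobilize
    "cutoff": range(5, 9),
    "inclusive": (True, False),  # True: score <= cutoff -> conservative
}


def score(p, w):
    pain = w["pain_up"] if p["vas"] >= w["pain_threshold"] else -w["pain_down"]
    mobility = -w["mobile_down"] if p["mobile"] else w["immobile_up"]
    return w["of_factor"] * p["of_class"] + pain + mobility + p["other_points"]


def evaluate(weights, patients):
    tp = tn = fp = fn = 0
    for p in patients:
        s = score(p, weights)
        if weights["inclusive"]:
            predicted_surgical = s > weights["cutoff"]
        else:
            predicted_surgical = s >= weights["cutoff"]
        if predicted_surgical and p["surgical"]:
            tp += 1
        elif predicted_surgical:
            fp += 1
        elif p["surgical"]:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / len(patients),
        "misclassified": fp + fn,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


def grid_search(patients):
    keys = list(GRID)
    for values in product(*GRID.values()):
        weights = dict(zip(keys, values))
        yield weights, evaluate(weights, patients)


results = list(grid_search(PATIENTS))
print(len(results))  # 60000
```

Because every combination is scored on the same patients, the results can be filtered and ranked afterwards without re-running the search.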

Phase 4: Selection of the Optimal Solution

In the final phase, all solutions were reviewed and filtered, and the optimal solution was selected based on three predefined criteria:

• overall accuracy,

• a low number of misclassified patients, and

• simplicity and feasibility for clinical use.

Following an initial Delphi-based consensus process, the study group agreed to keep the OF-Score a simple and user-friendly tool. The weights assigned to pain and mobility were therefore required to be symmetrical (e.g., −4 and +4 or −2 and +2), and combinations with asymmetrical weights (e.g., −3 and +4 or −5 and +2) were excluded. The final weighting scheme was then reviewed and confirmed by expert consensus within the study group, ensuring both statistical robustness and clinical feasibility.
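The filtering in Phase 4 can be sketched as a constrained ranking. In the hypothetical Python example below, each candidate stores its pain and mobility weights as (points added, points subtracted) pairs. Asymmetrical candidates are discarded, and the rest are ranked by accuracy, then by the number of misclassified patients, then by the total magnitude of the weights. That last criterion is an assumed proxy for clinical simplicity. The final expert review is not captured by code.

```python
def is_symmetric(candidate):
    """Pain and mobility must add and subtract the same number of points."""
    up, down = candidate["pain"]
    immobile, mobile = candidate["mobility"]
    return up == down and immobile == mobile


def select_optimal(candidates):
    symmetric = [c for c in candidates if is_symmetric(c)]
    if not symmetric:
        raise ValueError("no symmetric candidate available")
    return min(
        symmetric,
        key=lambda c: (
            -c["accuracy"],                       # 1) highest overall accuracy
            c["misclassified"],                   # 2) fewest misclassified patients
            sum(c["pain"]) + sum(c["mobility"]),  # 3) smallest weights (simplicity proxy)
        ),
    )


# Hypothetical candidates on a toy cohort of 20 patients, not study results.
candidates = [
    {"pain": (4, 3), "mobility": (2, 2), "accuracy": 0.85, "misclassified": 3},
    {"pain": (5, 5), "mobility": (2, 2), "accuracy": 0.80, "misclassified": 4},
    {"pain": (3, 3), "mobility": (2, 2), "accuracy": 0.80, "misclassified": 4},
    {"pain": (1, 1), "mobility": (1, 1), "accuracy": 0.75, "misclassified": 5},
]
best = select_optimal(candidates)
print(best["pain"], best["mobility"])  # (3, 3) (2, 2)
```

The most accurate toy candidate is rejected because it is asymmetrical. Of the two tied symmetric candidates, the one with smaller weights is kept.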

Descriptive and inferential statistical analyses were performed using IBM SPSS Statistics Version 29.0 (IBM Corp., Armonk, NY, USA), with a significance level of p<0.05.

Results

Among the 518 patients included in the study (128 men, 390 women), 174 (34%) were treated conservatively and 344 (66%) underwent surgical treatment. In total, 374 patients (72% of the EOFTT cohort; 91 men, 283 women; mean age 74±10 years) met the inclusion criteria based on good functional outcomes or improvements. The OF-Score optimization was conducted in this subset, comprising 118 (32%) conservatively and 256 (68%) surgically treated patients. The mean follow-up period for this group was 7.4±4.9 months, ranging from 6 weeks to 2.5 years. Descriptive data for the variables analyzed in the OF-Score are provided in Table 2.

Table 2.

Descriptive Data of the Analyzed Patients and the Variables Used in the OF-Score and the Performed Therapy. The Percentages Represent the Proportions Within the Respective Groups. Pain Values are Reported as Mean and Standard Deviation.

OF-score variables	Total	Conservative	Surgical	p
Treatment	374	118	256
Cement augmentation only			97 (38%)
Instrumentation			159 (62%)
Pain (VAS 0-10)	6.1±2.2	4.5±1.9	6.9±1.9	<0.001
OF classification				<0.001
OF1	1 (0%)	1 (1%)	-
OF2	92 (25%)	47 (40%)	45 (18%)
OF3	155 (41%)	46 (39%)	109 (43%)
OF4	108 (29%)	24 (20%)	84 (33%)
OF5	18 (5%)	-	18 (7%)
ASA				0.003
I	45 (12%)	24 (20%)	21 (8%)
II	148 (40%)	48 (41%)	100 (39%)
III	168 (45%)	41 (35%)	127 (50%)
IV	13 (3%)	5 (4%)	8 (3%)
Deformity progression	142 (38%)	40 (34%)	102 (40%)	0.303
Fracture related neurological deficit	8 (2%)	1 (1%)	7 (3%)	0.444
BMI<20 kg/m²	19 (5%)	3 (3%)	16 (6%)	0.203
Anticoagulation	100 (27%)	29 (25%)	71 (28%)	0.615
Severe osteoporosis	228 (61%)	67 (57%)	161 (63%)	0.305
Nursing case	30 (8%)	8 (7%)	22 (9%)	0.603
Able to mobilize without help	230 (61%)	110 (93%)	155 (61%)	<0.001

The calculated MCID for the outcome variables and the descriptive values of the functional outcome measures on the day of treatment decision are given in Table 3.

Table 3.

Descriptive Statistics of Functional Outcomes Including Mean, Standard Deviation, Range, and Corresponding MCID at the Time of Treatment Decision in Patients With Osteoporotic Spine Fractures.

	Total (mean±SD)	Range	MCID	Conservative	Surgical	p
ODI	0.68±0.21	(0-1)	0.10	0.58±0.21	0.71±0.19	<0.001
Pain (VAS 0-10)	6.1±2.2	(0-10)	1.1	4.5±1.9	6.9±1.9	<0.001
Barthel index	67±27	(0-100)	13	77±23	64±27	<0.001
EQ5D-5L index value	0.339±0.309	(-0.205 to 1)	0.152	0.480±0.296	0.272±0.296	<0.001
EQ5D-VAS	39±23	(0-100)	11	48±23	34±22	<0.001

Treatment and OF-Score

Of the 374 patients included, 310 (82.9%) received a definitive treatment recommendation according to the OF-Score, comprising 128 patients (34.2%) with a recommendation for conservative treatment and 182 patients (48.7%) with a recommendation for surgical treatment. In 64 patients (17.1%), the OF-Score yielded an indifferent treatment recommendation. Overall, 223 of the 310 patients with a definitive recommendation (71.9%) were treated in accordance with the OF-Score recommendation. With surgical treatment as the positive class, the sensitivity and specificity of the OF-Score for predicting the treatment performed were 72.5% and 70.7%, respectively. A significant association between the OF-Score recommendation and the treatment ultimately performed was observed (phi coefficient φ=0.409, p<0.001). When patients with an indifferent recommendation were counted as misclassified, the overall accuracy of the OF-Score for predicting successful treatment allocation was 59.6%. Among patients with a conservative recommendation, 70 of 128 (54.7%) received adherent conservative treatment. In contrast, adherence was higher in patients with a surgical recommendation, with 153 of 182 (84.1%) undergoing surgical treatment as recommended. In total, 151 patients (40.4%) either received a non-adherent treatment or had an indifferent OF-Score recommendation.
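The percentages above follow from a 2 × 2 table that can be reconstructed from the reported counts. Of the 128 conservative recommendations, 70 were followed and 58 patients were operated on. Of the 182 surgical recommendations, 153 were followed and 29 patients were treated conservatively. The Python sketch below treats surgical treatment as the positive class, an assumption consistent with the reported values, and reproduces sensitivity, specificity, accuracy, and the phi coefficient. The helper name is hypothetical.

```python
from math import sqrt


def two_by_two(tp, fn, fp, tn):
    """Metrics for a 2 x 2 table with surgical treatment as the positive class.

    tp: surgical recommended and performed
    fn: conservative recommended, surgical performed
    fp: surgical recommended, conservative performed
    tn: conservative recommended and performed
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    phi = (tp * tn - fp * fn) / sqrt((tp + fp) * (fn + tn) * (tp + fn) * (fp + tn))
    return sensitivity, specificity, accuracy, phi


sens, spec, acc, phi = two_by_two(tp=153, fn=58, fp=29, tn=70)
indifferent = 64
acc_all = (153 + 70) / (153 + 58 + 29 + 70 + indifferent)  # indifferent counted as misses
print(f"{sens:.1%} {spec:.1%} {acc:.1%} phi={phi:.3f} {acc_all:.1%}")
# 72.5% 70.7% 71.9% phi=0.409 59.6%
```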

The mean OF-Score was 6.5±2.5 (range 0–14) and differed significantly between conservatively (4.8±2.4) and surgically (7.3±2.2) treated patients (p<0.001). Post hoc analysis by treatment subgroup also showed significant differences, with mean scores of 4.8±2.4 for conservative treatment, 6.2±1.8 for cement augmentation, and 7.9±2.2 for instrumentation (each p<0.001). Seven patients with neurological deficits underwent surgical treatment. One additional patient had radicular neurological symptoms that improved with conservative management despite a surgical treatment recommendation. On the day of treatment decision, conservatively treated patients had less pain and better functional scores than surgically treated patients (all p<0.001; Table 3).

The standardized canonical coefficients of the discriminant analysis are given in Table 4. Among the variables included, pain, OF classification, and mobility accounted for the highest explained variance in therapy selection in successfully treated patients (p<0.001). Pain and the OF classification showed the highest positive coefficients, while the ability to mobilize without help showed the strongest negative coefficient. This indicates that higher pain levels, a higher OF classification, and reduced mobility were associated with a higher likelihood of surgical treatment. The overall accuracy of the discriminant analysis in predicting the type of therapy based on the OF-Score variables was 79.7%.

Table 4.

Standardized Canonical Discriminant Coefficients, Sorted by Their Value.

	Standardized canonical discriminant coefficients
Pain	0.747
OF classification	0.426
ASA	0.163
Deformity progression	0.075
BMI<20 kg/m²	0.064
Fracture-related neurological deficit	0.040
Anticoagulation	0.038
Severe osteoporosis	0.021
Nursing case	-0.170
Able to mobilize without help	-0.393

The logistic regression model achieved an overall accuracy of 80.7% in predicting the type of therapy performed (Nagelkerke’s pseudo-R²=0.548), with a sensitivity of 87.4% and a specificity of 69.5%. The logistic regression model demonstrated moderate explanatory power with a Nagelkerke’s pseudo-R² of 0.594 (p<0.001). Using the three variables with the highest absolute standardized canonical discriminant coefficients, the logistic regression yielded an accuracy of 80.4%, with a Nagelkerke’s pseudo-R² of 0.573 (p<0.001).

OF-Score Optimization

The group of indifferent therapy recommendations was eliminated as part of the optimization process. Through the four sequential phases of the optimization process, the number of patients whose treatment did not match the OF-Score recommendation or who had an indifferent recommendation was reduced from 151 to 74 (Table 5). The optimized OF-Score showed an accuracy of 80.2%, with a sensitivity of 84.4% and a specificity of 71.2%. Table 5 gives an overview of the predicted and performed therapy. In both versions, a cut-off value of 5 points yielded the best results (≤5 points: conservative treatment recommendation; >5 points: surgical recommendation). Table 6 compares the previous OF-Score with the optimized OF-Score.

Table 5.

Results of the Optimized OF-Score After Changing the Weights for Pain and Mobility, Using a Cut-Off Value of 5 Points (≤5 Points: Conservative Treatment Recommendation; >5 Points: Surgical Recommendation).

	Recommendation (optimized OF-Score)
Performed therapy	Conservative	Surgical	Total
Conservative	84	34	118
Surgical	40	216	256
Total	124	250	374
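As a cross-check, the headline metrics of the optimized score can be recomputed directly from Table 5. The short Python sketch below reads the rows as the performed therapy, because the row totals (118 and 256) match the performed treatments in Table 2, and the columns as the optimized recommendation, with surgical treatment as the positive class.

```python
# Table 5 counts: (performed therapy, optimized recommendation) -> patients.
TABLE5 = {
    ("conservative", "conservative"): 84,
    ("conservative", "surgical"): 34,
    ("surgical", "conservative"): 40,
    ("surgical", "surgical"): 216,
}

n = sum(TABLE5.values())
correct = TABLE5[("conservative", "conservative")] + TABLE5[("surgical", "surgical")]
sensitivity = TABLE5[("surgical", "surgical")] / (
    TABLE5[("surgical", "surgical")] + TABLE5[("surgical", "conservative")]
)
specificity = TABLE5[("conservative", "conservative")] / (
    TABLE5[("conservative", "conservative")] + TABLE5[("conservative", "surgical")]
)
print(n, n - correct, f"{correct / n:.1%} {sensitivity:.1%} {specificity:.1%}")
# 374 74 80.2% 84.4% 71.2%
```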

Table 6.

Comparison Between the Original and Optimized OF Score. The Optimized Score Includes Only the Key Variables OF Classification, Pain, Mobility, and Fracture-Related Neurological Deficit, Whereas All Other Variables Were Omitted. Optimization Yielded a Cut-Off Value of ≤5 Points for Recommending Conservative Treatment and >5 Points for Recommending Surgical Treatment.

Parameter	Previous OF-score [3]		Optimized OF-score
Parameter	Grade	Points	Points
OF classification (morphology)	1-5	2-10	2-10
Severity of osteoporosis	T-Score < -3	+1	+1
Deformity progression	Yes / No	+1 / -1	+1 / -1
Pain (under adequate analgesia)	VAS ≥5 / <5	+1 / -1	+3 / -3
Fracture related neurol. deficit	Yes	+2	+2
Able to mobilize without help	No / Yes	+1 / -1	+2 / -2
Health status	ASA>3, BMI<20 kg/m², Nursing case, Anticoagulation	Each Parameter -1; Maximum -2	Each Parameter -1; Maximum -2


OF-Score points and recommendation
	≤5 – Conservative		≤5 – Conservative
	=6 – Indifferent		>5 – Surgical
	≥7 – Surgical
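The two scoring schemes in Table 6 differ in the pain and mobility weights and in the recommendation rule. The Python sketch below encodes both. The function and parameter names are hypothetical, and the pain threshold is taken as VAS ≥5. The original mobility weight is oriented as in Table 1, adding 1 point if the patient is unable to mobilize. The four health-status items are passed as a simple count that is capped at 2.

```python
SCHEMES = {
    "original": {"pain": 1, "mobility": 1},
    "optimized": {"pain": 3, "mobility": 2},
}


def of_score(of_class, severe_osteoporosis, deformity_progression, vas,
             neuro_deficit, mobile, health_flags, scheme="optimized"):
    """OF-Score; health_flags counts ASA>3, BMI<20, nursing case, anticoagulation."""
    w = SCHEMES[scheme]
    score = 2 * of_class                              # OF1-OF5 -> 2-10 points
    score += 1 if severe_osteoporosis else 0          # T-score < -3
    score += 1 if deformity_progression else -1
    score += w["pain"] if vas >= 5 else -w["pain"]    # pain under adequate analgesia
    score += 2 if neuro_deficit else 0
    score += -w["mobility"] if mobile else w["mobility"]
    score -= min(health_flags, 2)                     # each -1, maximum -2
    return score


def recommend(score, scheme="optimized"):
    if score <= 5:
        return "conservative"
    if scheme == "original" and score == 6:
        return "indifferent"
    return "surgical"


# Hypothetical patient: OF3, no severe osteoporosis, no deformity progression,
# VAS 6, no neurological deficit, unable to mobilize, one health-status item.
args = dict(of_class=3, severe_osteoporosis=False, deformity_progression=False,
            vas=6, neuro_deficit=False, mobile=False, health_flags=1)
for scheme in SCHEMES:
    s = of_score(**args, scheme=scheme)
    print(scheme, s, recommend(s, scheme))
```

For this hypothetical patient, the original scheme gives 6 points (indifferent), whereas the optimized scheme gives 9 points and a surgical recommendation. This illustrates how the single cut-off removes the indifferent band.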

Discussion

A specific and individualized therapy for OVFs with predictable results would be desirable. The OF-Score was developed to guide surgeons in their decision-making process. The aim of this study was to improve the accuracy of the OF-Score’s therapy recommendation by simultaneously reducing the number of patients with incorrectly predicted therapy and eliminating the group of patients with indifferent therapy recommendations. Patients who had undergone successful therapy served as the study population. After optimization of the OF-Score, two variants showed a significant improvement in accuracy. The achieved accuracy approaches the theoretical maximum of the logistic regression model, while retaining simplifications such as symmetrical weight ranges, a reduced number of variables, and the elimination of indifferent therapy recommendations. The variables with the greatest influence on the treatment decision are the OF classification, pain, mobility, and neurological deficits, as recently shown [23]. It is particularly noteworthy that, despite reducing the OF-Score to four variables (OF classification, pain, mobility, and fracture-related neurological deficit), a higher accuracy can be achieved than with the original score.

The elimination of the group with an indifferent recommendation is a clinically relevant advance. This increases the selectivity and facilitates the use of the OF-Score, especially for less experienced practitioners. Previous studies had shown that adherence to the OF-Score recommendation was not always guaranteed in clinical practice [1, 24]. With improved accuracy, the score can now be used more consistently as a basis for decision-making.

A large proportion of the EOFTT cohort could be included in this study, which may indicate a generally successful therapeutic approach, even though the predictive accuracy of the OF-Score was initially limited. However, it should also be noted that 28% of patients from the EOFTT cohort did not meet the inclusion criteria of our study. A detailed analysis of this subgroup was not performed. Some of these patients were excluded due to missing functional outcome parameters at follow-up, leaving it unclear whether their treatment was successful or not. Additionally, it is reasonable to assume that some patients were not successfully treated. In future analyses, it would be of interest to identify factors associated with treatment failure.

A key methodological aspect of our work is the use of the minimal clinically important difference (MCID) as a criterion for evaluating treatment success. This ensured that only clinically relevant changes, not merely statistically significant ones, were counted as therapeutic success. For the ODI, MCID thresholds between 12 and 18 points have been described in the literature,^25-27 and for back pain between 1.2 and 4 points.^12,25 Established MCID thresholds also exist for the EQ-5D, but these vary depending on the population.²⁸ In addition, the concept of the patient acceptable symptom state is gaining importance; for the ODI, an acceptable symptom state has been described at values ≤25.^25,29 Distribution-based methods for determining the MCID, such as those based on standard deviation or effect size, are well established in studies involving patients with spinal conditions.^21,22 This focus on MCID could further enhance the clinical usability of future score applications. In our work, we also provide MCID values for the ODI, pain, and the EQ5D-5L in patients with osteoporotic vertebral fractures to inform future thresholds for this patient group. Nevertheless, MCID thresholds should be interpreted with caution, as they are influenced by baseline severity, follow-up duration, and the methodological approach used for their derivation.

Pain emerged as the most influential factor in therapy decision-making within our multicenter cohort. The assignment of ±5 points in the OF-Score, depending on whether the pain threshold is exceeded, represents a substantial weighting. Although this threshold and its weight were derived from our data and are statistically supported, they rest on a highly subjective parameter, which may limit their generalizability. Nevertheless, pain is widely used to describe functional impairment, has proven to be a meaningful and clinically relevant measure in various settings,^5,30-32 and is commonly integrated into clinical decision-support tools. However, pain was assessed in general and not in relation to physical activity or load, which limits the clinical interpretability of the results. Future studies should differentiate between resting and activity-related pain to allow for more nuanced evaluations. In addition, external validation in independent cohorts is necessary, especially considering possible differences in pain perception across populations. While the comprehensive methodological approach enhances reproducibility, the resulting complexity may itself constitute a limitation, potentially affecting the feasibility of implementation in everyday clinical practice.

Nevertheless, around 20% of patients were not correctly assigned to the predicted therapy. Together with the results of the logistic regression and discriminant analysis, this indicates influencing factors that are not yet captured by the OF-Score. Psychosocial factors could play a central role here: anxiety and depression are common in patients with osteoporotic fractures and are associated with poorer functional outcomes in the ODI, EQ5D, and Barthel index.^10,33,34 A prospective analysis showed that anxiety during hospitalization correlates with significantly poorer functional scores after therapy.³⁵ These findings suggest that taking psychosocial variables into account could further improve the predictive accuracy of the OF-Score. Such factors could also be considered during the therapy process to improve patient outcomes.

Despite the improvement in the OF-Score achieved through a targeted combination of methods, our study has limitations. The number of patients with very mild (OF1) or very severe (OF5) fractures was low, so no reliable conclusions can be drawn for these subgroups. OF3 fractures were the most frequent type; for these fractures, the choice between conservative and surgical treatment, and among specific surgical treatments, may vary correspondingly, which was not analyzed within the scope of this study. Selection bias is possible, as patients with very mild symptoms may not have presented to the participating clinics because they were treated in outpatient practices. This highlights the difference between real-world treatment patterns and outcome-based optimization and underscores the exploratory nature of the proposed modification. Nevertheless, despite the non-randomized treatment allocation, the majority of patients included in this study achieved clinically meaningful improvement or maintained a good functional status according to the predefined outcome thresholds. This does not necessarily imply that the performed treatment represented the optimal strategy for every individual patient, but it indicates that the applied treatment pathways were associated with generally favorable short- to mid-term outcomes within the investigated cohort. The follow-up period ranged from 6 weeks to 2.5 years, which can lead to bias. This should be taken into account, especially when evaluating long-term effects such as re-fractures, sagittal balance, or late complications. A further differentiation between augmentation procedures and instrumented stabilization was beyond the scope of this study but represents an important area for future investigation.
The OF-Score’s generalizability is limited, as validation to date has focused primarily on Central European populations.^2,3 Furthermore, patients with clinical deterioration or unsuccessful treatment outcomes were excluded by study design, which may further limit the generalizability of the findings. Comparative analyses between successfully and unsuccessfully treated patients may provide additional insights into predictors of treatment failure, but this was beyond the scope of the present optimization study. As optimization and evaluation of the score were performed within the same cohort, overfitting cannot be excluded. Therefore, the optimized score should be regarded as exploratory and requires validation in independent patient cohorts. In our study, the follow-up period showed considerable variability, which may have affected the assessment of clinically meaningful improvement. Long-term outcomes and complications, such as adjacent segment degeneration or junctional problems in instrumented cases, were not assessed in this study and require dedicated long-term follow-up studies.

Ultimately, the decision on treatment is always part of a participatory process in which patient preferences and individual expectations can be taken into account.^36,37 Deviations from the score recommendation are, therefore, unavoidable and should not necessarily be considered misclassifications.

Despite these limitations, our study indicates that the OF-Score achieves improved accuracy and practicality after adjustment. The OF-Score has demonstrated its clinical utility in guiding the treatment of many patients, and only minimal modifications were required to enhance its precision. This refinement represents the next step in the development of the OF-Score, improving accuracy and usability while preserving its core principles. The optimized OF-Score supports treatment decisions that are aligned with clinical guidelines and adapted to the individual patient. To date, no other validated algorithm supports the choice between conservative and surgical treatment in these patients, and the available evidence is based on comparable methods. A distinctive aspect of this study is the methodical use of MCID values: for the first time, the recommendations of the OF-Score are evaluated against outcomes that are meaningful to the patient. This may help to avoid unnecessary surgery, reduce costs, and improve recovery, effects that matter not only for individual patients but also for the healthcare system. Better-informed decisions can prevent inappropriate treatment, reduce the risk of dependency on care, and support the return to daily life. While the current structure of the OF-Score remains unchanged for now, the proposed weight adjustments require prospective validation in an independent patient cohort. Future studies will determine whether these refinements consistently improve outcomes before a formal update to the decision-support tool is implemented. At the same time, our analysis points to existing inequalities in care, for example between outpatient and inpatient settings or across international contexts, which future studies and the use of the score in practice should examine more closely.

Conclusion

Our analysis of successfully treated patients revealed that pain and mobilization were the most important parameters of the OF-Score. Higher pain levels, higher OF classification, and reduced mobility were associated with a higher likelihood of surgical treatment. After optimization, the OF-Score showed an accuracy of 80.7% (sensitivity 85.2%, specificity 71.2%). The optimized OF-Score retained its simple structure while eliminating the indifferent treatment recommendation.
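Accuracy, sensitivity, and specificity of this kind are derived from a confusion matrix once each score has been mapped to a recommendation. The following is a minimal sketch, not the study's analysis code. It assumes the 5-point cut-off described for the optimized score (≤5 conservative, >5 surgical) and treats surgical treatment as the positive class; the helper names `recommend` and `evaluate`, the class labels, and the toy values are hypothetical and purely illustrative.

```python
from dataclasses import dataclass
from typing import Sequence

# Assumed cut-off for the optimized score: <=5 conservative, >5 surgical.
THRESHOLD = 5


def recommend(score: int, threshold: int = THRESHOLD) -> str:
    """Map an OF-Score value to a recommendation (hypothetical helper)."""
    return "surgical" if score > threshold else "conservative"


@dataclass
class Metrics:
    accuracy: float
    sensitivity: float  # share of surgically treated patients recommended surgery
    specificity: float  # share of conservatively treated patients recommended conservative care


def evaluate(scores: Sequence[int], treatments: Sequence[str],
             threshold: int = THRESHOLD) -> Metrics:
    """Compare score recommendations with the treatment actually given.

    Assumption: "surgical" is the positive class; both groups must be present.
    """
    if len(scores) != len(treatments):
        raise ValueError("scores and treatments must have the same length")
    tp = fp = tn = fn = 0
    for score, actual in zip(scores, treatments):
        predicted = recommend(score, threshold)
        if predicted == "surgical":
            if actual == "surgical":
                tp += 1  # surgery recommended and performed
            else:
                fp += 1  # surgery recommended, conservative care given
        else:
            if actual == "surgical":
                fn += 1  # conservative care recommended, surgery performed
            else:
                tn += 1  # conservative care recommended and given
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both treatment groups must be represented")
    return Metrics(
        accuracy=(tp + tn) / (tp + fp + tn + fn),
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
    )


# Toy example with invented values (not study data).
example = evaluate(
    scores=[3, 4, 7, 8, 6, 2],
    treatments=["conservative", "surgical", "surgical",
                "surgical", "conservative", "conservative"],
)
print(example)
```

Because a single cut-off assigns every score value to one of the two recommendations, this sketch has no indifferent category. Whether the study defined the positive class in this way is an assumption.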

Footnotes

Author Note

AO membership of first, last, and coauthor: First author: Philipp Schenk, AO Membership ID: pp500128579; Last author: Klaus Schnake, AO Membership ID: pp100015128; Coauthor: Georg Osterhoff, AO Membership ID: pp100036738.

Group author: In the final position, after the last author, the following group should be listed: Working Group Osteoporotic Fractures, Spine Section of the German Society of Orthopedics and Trauma.

Acknowledgments

The authors thank all participating centers of the EOFTT study and the members of the Working Group Osteoporotic Fractures of the German Society of Orthopedics and Trauma for their contribution and expertise.

ORCID iDs

Philipp Schenk

Bernhard Wilhelm Ullrich

Marx Ribeiro

Ulrich J. Spiegl

Gregor Schmeiser

Martin Bäumlein

Kai Sprengel

Falko Schwarz

Ethical Considerations

The EOFTT study was conducted in accordance with the Declaration of Helsinki and approved by the responsible institutional ethics committees of the participating centers.

Consent to Participate

Written informed consent was obtained from all patients prior to inclusion. The reference number of the ethics committee of the lead clinical center is: Ethics Committee of the Medical Association Saxony-Anhalt, file 31/17.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The data supporting the findings of this study are available upon reasonable request to the corresponding author and after consultation with the Working Group Osteoporotic Fractures, Spine Section of the German Society of Orthopedics and Trauma.

Anonymity Statement

All identifying information related to authors, institutions, and ethics committees is provided on this title page and has been removed from the blinded manuscript.

References

1. Blattert, Schnake, Gonschorek, et al. Nonsurgical and surgical management of osteoporotic vertebral body fractures: recommendations of the spine section of the German Society for Orthopaedics and Trauma (DGOU). Global Spine Journal. 2018;8(2_suppl):50S-55S.

2. Schnake, Blattert, Hahn, et al. Classification of osteoporotic thoracolumbar spine fractures: recommendations of the spine section of the German Society for Orthopaedics and Trauma (DGOU). Global Spine Journal. 2018;8(2_suppl):46S-49S.

3. Ullrich, Schenk, Scherer, et al. OF-score for osteoporotic thoracolumbar fractures–which parameter is decisive for the therapy decision?-a prospective multicentric cohort study. European Spine Journal. 2025;34:1-8.

4. Ullrich, Schenk, Scheyerer, et al. Georg Schmorl prize of the German spine society (DWG) 2022: current treatment for inpatients with osteoporotic thoracolumbar fractures—results of the EOFTT study. European Spine Journal. 2023;32(5):1525-1535.

5. Mekariya, Santipas, Khamnurak, et al. Validity and reliability of the osteoporotic fracture treatment score (OF score) and outcomes across various treatments in osteoporosis vertebral compression fracture patients. Journal of Orthopaedic Surgery and Research. 2024;19(1):750.

6. Mitani, Takahashi, Tokunaga, et al. Therapeutic prediction of Osteoporotic Vertebral Compression Fracture using the AO Spine-DGOU osteoporotic fracture classification and classification-based score: a single-Center Retrospective Observational Study. Neurospine. 2023;20(4):1166-1176.

7. Tarawneh, Narayanan, McCurdy, et al. Evaluation of perioperative care and drivers of cost in geriatric thoracolumbar trauma. Brain and Spine. 2024;4:102780. doi:10.1016/j.bas.2024.102780.

8. Ba-Ali, Bech, Hallager. Inter- and Intrarater Agreement of the AO Spine-DGOU Osteoporotic Fracture Classification System Using Radiography and Computed Tomography Imaging. Global Spine Journal. 2025;15:21925682251318654.

9. Osterhoff, Schenk, Katscher, et al. Treatment and outcome of osteoporotic thoracolumbar vertebral fractures with anterior or posterior tension band failure (OF 5): short-term results from the prospective EOFTT multicenter study. Global Spine Journal. 2023;13(1_suppl):44S-51S.

10. Schwarz, Klee, Schenk, et al. Impact of anxiety during hospitalization on the clinical outcome of patients with osteoporotic thoracolumbar vertebral fracture. Global Spine Journal. 2023;15(2):417-424. doi:10.1177/21925682231192847.

11. Spiegl, Schenk, Schnake, et al. Treatment and outcome of osteoporotic thoracolumbar vertebral body fractures with deformation of both endplates with or without posterior wall involvement (OF 4): short-term results from the prospective EOFTT multicenter study. Global Spine Journal. 2023;13(1_suppl):36S-43S.

12. Power, Perruccio, Canizares, et al. Determining minimal clinically important difference estimates following surgery for degenerative conditions of the lumbar spine: analysis of the Canadian Spine Outcomes and Research Network (CSORN) registry. The Spine Journal. 2023;23(9):1323-1333.

13. Ostelo, de Vet. Clinically important outcomes in low back pain. Best Practice & Research Clinical Rheumatology. 2005;19(4):593-607.

14. Schwind, Learman, O’Halloran, Showalter, Cook. Different minimally important clinical difference (MCID) scores lead to different clinical prediction rules for the Oswestry disability index for the same sample of patients. Journal of Manual & Manipulative Therapy. 2013;21(2):71-78.

15. Lauridsen, Hartvigsen, Manniche, Korsholm, Grunnet-Nilsson. Responsiveness and minimal clinically important difference for pain and disability instruments in low back pain patients. BMC Musculoskeletal Disorders. 2006;7(1):82. doi:10.1186/1471-2474-7-82.

16. Fairbank, Couper, Davies, O’Brien. The Oswestry low back pain disability questionnaire. Physiotherapy. 1980;66(8):271-273.

17. Chiarotto, Boers, Deyo, et al. Core outcome measurement instruments for clinical trials in nonspecific low back pain. Pain. 2018;159(3):481-495.

18. Chiarotto, Deyo, Terwee, et al. Core outcome domains for clinical trials in non-specific low back pain. European Spine Journal. 2015;24(6):1127-1142.

19. Mahoney, Barthel. Functional evaluation: the Barthel Index: a simple index of independence useful in scoring improvement in the rehabilitation of the chronically ill. Maryland State Medical Journal. 1965;14:61-65.

20. Devlin, Roudijk, Ludwig. Value sets for EQ-5D-5L: a compendium, comparative review & user guide. Bielefeld, Germany: Springer; 2022. doi:10.1007/978-3-030-89289-0.

21. Sedaghat. Understanding the minimal clinically important difference (MCID) of patient-reported outcome measures. Otolaryngology–Head and Neck Surgery. 2019;161(4):551-560.

22. Franceschini, Boffa, Pignotti, Andriolo, Zaffagnini, Filardo. The minimal clinically important difference changes greatly based on the different calculation methods. The American Journal of Sports Medicine. 2023;51(4):1067-1073.

23. Mouelhi, Jouve, Castelli, Gentile. How is the minimal clinically important difference established in health-related quality of life instruments? Review of anchors and methods. Health and Quality of Life Outcomes. 2020;18(1):136. doi:10.1186/s12955-020-01344-w.

24. Cohen. Statistical power analysis for the behavioral sciences. New York, USA: Routledge; 2013.

25. Copay, Glassman, Subach, Berven, Schuler, Carreon. Minimum clinically important difference in lumbar spine surgery patients: a choice of methods using the Oswestry Disability Index, Medical Outcomes Study questionnaire Short Form 36, and pain scales. The Spine Journal. 2008;8(6):968-974.

26. Parker, Adogwa, Paul, et al. Utility of minimum clinically important difference in assessing pain, disability, and health state after transforaminal lumbar interbody fusion for degenerative lumbar spondylolisthesis. Journal of Neurosurgery: Spine. 2011;14(5):598-604.

27. Yoshida, Hasegawa, Yamato, et al. Minimum clinically important differences in Oswestry Disability Index domains and their impact on adult spinal deformity surgery. Asian Spine Journal. 2018;13(1):35-44. doi:10.31616/asj.2018.0077.

28. Coretti, Ruggeri, McNamee. The minimum clinically important difference for EQ-5D index: a critical review. Expert Review of Pharmacoeconomics & Outcomes Research. 2014;14(2):221-233.

29. Shahi, Shinn, Singh, et al. ODI <25 denotes patient acceptable symptom state after minimally invasive lumbar spine surgery. Spine. 2023;48(3):196-202.

30. Smith, Norbury, Hunt, Mauger. Intra- and interindividual reliability of muscle pain induced by an intramuscular injection of hypertonic saline injection into the quadriceps. European Journal of Pain. 2023;27(10):1216-1225. doi:10.1002/ejp.2151.

31. Ferreira-Valente, Pais-Ribeiro, Jensen. Validity of four pain intensity rating scales. Pain. 2011;152(10):2399-2404. doi:10.1016/j.pain.2011.07.005.

32. Amir, Rose-McCandlish, Weger, et al. Test-Retest Reliability of an Adaptive Thermal Pain Calibration Procedure in Healthy Volunteers. The Journal of Pain. 2022;23(9):1543-1555. doi:10.1016/j.jpain.2022.01.011.

33. Wang, Shi, Sun Y-D, Dong. Correlation between anxiety, depression, and social stress in young patients with thoracolumbar spine fractures. World Journal of Psychiatry. 2025;15(1):101373. doi:10.5498/wjp.v15.i1.101373.

34. Hajilo, Imani, Zandi, Mehrafshan, Khazaei. Risk factors analysis and risk prediction model for failed back surgery syndrome: A prospective cohort study. Heliyon. 2025;11(1):e40607. doi:10.1016/j.heliyon.2024.e40607.

35. Sun, Tang, You, et al. The Role of the Lumbar Paravertebral Muscles in the Development of Short-term Residual Pain After Lumbar Fusion Surgery. Spine. 2025;50(8):537-547.

36. Menendez, Omar, Chagoya, et al. Patient satisfaction in spine surgery: a systematic review of the literature. Asian Spine Journal. 2019;13(6):1047-1057.

37. Lehrich, Goshtasbi, Brown, et al. Predictors of patient satisfaction in spine surgery: a systematic review. World Neurosurgery. 2021;146:e1160-e1170.