The Application of Computerized Adaptive Testing to the International Knee Documentation Committee Subjective Knee Evaluation Form

Abstract

Background:

Patient-reported outcome measures (PROMs) are commonly used to monitor functional outcomes for clinical and research purposes; unfortunately, many PROMs include redundant, burdensome questions for patients. The use of predictive models to implement computerized adaptive testing (CAT) offers a potential solution to reduce question burden in outcomes research.

Purpose:

To validate the usage of an appropriate CAT system to improve the efficiency of the International Knee Documentation Committee (IKDC) Subjective Knee Form.

Study Design:

Cohort study (Diagnosis); Level of evidence, 2.

Methods:

Validation was based on electronically collected patient responses from 2 separate orthopaedic sports medicine clinics. Diagnoses included, but were not limited to, meniscal lesions, ligamentous injuries, and chondral defects. The CAT system was previously developed through analysis of an electronic knee PROM database that did not contain any of these cases.

Results:

A total of 2173 patient responses (1229 patients) were collected. The CAT model was able to reduce the question burden by a mean of 7.67 questions (45.1%), to a mean of 9.33 questions per patient. Higher CAT-predicted scores correlated strongly with higher actual scores (r = 0.99; intraclass correlation coefficient = 0.99). The mean difference between the CAT-predicted score and the actual PROM score was 0.48 of a point on a scale of 0 to 100.

Conclusion:

The use of CAT systems, in conjunction with electronic PROMs, can accurately predict outcome scores for IKDC PROMs, while dramatically decreasing the number of questionnaire items needed for any given patient. By decreasing questionnaire burden, clinicians and researchers can potentially increase patient participation and follow-up in both clinical assessments and research trials.

Keywords

IKDC subjective knee form; CAT; patient-reported outcome metrics; knee functional outcome score

The modern orthopaedic patient is often tasked with several patient-reported outcome measures (PROMs) to provide a quantitative assessment of orthopaedic interventions. PROMs are an important tool for a variety of purposes. They provide an objective data point on subjectively based patient self-reported outcomes that can be used to assess patient outcomes for research and quality improvement, track patient satisfaction, and evaluate the economics of health care; they may even be used in the evaluation of reimbursement rates. Yet despite these benefits, they place significant burdens on both the patient and the clinician. Redundant questions create a time-consuming and repetitive process that can be mentally exhausting for patients. It can be equally time-consuming for clinicians to collect, evaluate, and securely store patient outcomes data.^5,7 Streamlining PROMs and optimizing the efficiency of data collection stand to benefit patients and clinicians alike.

A reduction in the number of questions asked of a patient has been shown to increase the response rate and completion of PROM surveys.¹⁵ Thus, any effort to reduce question burden, while maintaining accuracy, is likely to be of great benefit in data collection. It is possible to predict a patient’s score on a given PROM based on a subset of questions from that PROM; however, the questions needed for each prediction vary from patient to patient. Predictive models of computer learning, such as computerized adaptive testing (CAT), offer a potential solution. CAT systems can identify which questions are needed to predict a patient’s score on a given PROM. By adapting the questionnaire to an individual patient’s response in real time, we can significantly reduce question burden, while still obtaining accurate outcome data.⁴

Before usage, CAT models must be validated for each specific PROM. Data featuring the efficacy and reliability of CAT models for hand, shoulder, elbow, and even other knee PROMs have already been published for orthopaedic patients.^1,2,16,20 However, to our knowledge, this would be the first publication for the International Knee Documentation Committee (IKDC) form, a well-validated PROM that has been shown to be a superior outcomes measure in specific orthopaedic patient populations.^9,10 The IKDC form was developed with the intention of being applicable across a range of orthopaedic conditions.¹⁵ Relative to other PROMs, it has been shown to contain the most important question items to patients across a multitude of diagnoses, thus making it an indispensable tool in evaluation of the orthopaedic knee patient.^10,11,22

The purpose of this study is to assess the CAT system’s ability to predict outcome scores for the IKDC form while decreasing the question burden.

Methods

A total of 2173 complete patient responses (1229 patients) were electronically collected from 2 separate orthopaedic sports medicine clinics within a single institution. The mean age of patients was 46.47 ± 17.05 years (range, 13-87 years). Data were retrospectively collected through the Outcomes Based Electronic Research Database (OBERD) (Universal Research Solutions LLC, www.oberd.com). Patient diagnoses included, but were not limited to, meniscal lesions, ligamentous injuries, and chondral defects of the knee. PROMs were collected from preoperative and postoperative assessments. Only complete PROM responses were included.

OBERD is a clinical outcomes measurement tool that can administer and collect data on PROMs. The OBERD CAT system was developed through the analysis of a separate electronic knee PROM database (see the Appendix, available in the online version of this article). This CAT system uses machine learning programs that analyze a completed database on how the individual patient response patterns affect their overall final IKDC score. This allows the CAT system to “learn” and refine its algorithm, developing a population model based on an aggregation of individual outcomes. This is known as the “training set.” Once the CAT system develops its model based on the training set, it must be validated for accuracy. Validity for a CAT model is determined by applying it to a completely new set of cases that are entirely separate from the training set. The outcomes measured by the CAT must be compared and contrasted against the scores measured by the full form model—in our case, the full form IKDC score.
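The requirement that validation cases be entirely separate from the training set can be illustrated with a minimal sketch. In this study the training data came from a separate database, so no split was needed; the code below only shows, under assumed names, how separation could be enforced at the patient level when a single database contains several responses per patient. The record layout `(patient_id, payload)` and both function names are hypothetical.

```python
import random


def split_by_patient(responses, validation_fraction=0.3, seed=0):
    """Split response records so that each patient lands in exactly one set.

    `responses` is a list of (patient_id, payload) tuples. Both names are
    hypothetical placeholders for whatever a real database stores.
    """
    patient_ids = sorted({pid for pid, _ in responses})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)
    n_val = max(1, round(len(patient_ids) * validation_fraction))
    validation_ids = set(patient_ids[:n_val])
    training = [r for r in responses if r[0] not in validation_ids]
    validation = [r for r in responses if r[0] in validation_ids]
    return training, validation


def patients_overlap(training, validation):
    """Return True if any patient contributes responses to both sets."""
    return bool({p for p, _ in training} & {p for p, _ in validation})
```

Splitting by patient rather than by response matters whenever a patient has both preoperative and postoperative questionnaires, because two responses from the same patient are not independent cases.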

Our PROM database was used to assess the predictive accuracy and questionnaire burden reduction power of the CAT IKDC to validate its use. The predictive accuracy was assessed by comparing the actual PROM scores with the CAT-predicted scores via multiple methods of statistical analysis, following the Bland and Altman³ recommendations for comparison between 2 independent groups of measurement. First, the mean and standard deviation between long-form IKDC and CAT model IKDC scores were compared. Next, the Pearson correlation coefficient (R) was calculated to measure the linear correlation between the scores, followed by the intraclass correlation coefficient (ICC) to help distinguish the inherent variability in scores intrinsic to the IKDC from the variation due to use of the CAT version. Then, the frequency of distribution of outcomes was plotted to ensure similar results between scores, with the differences (long form minus CAT score) plotted for analysis. Finally, a Bland-Altman plot was generated to assess the patterns of score differences. Analyses were performed with the R software suite V 3.4.2 (R Foundation for Statistical Computing), with the Python V 3.4.5 programming language (Python Software Foundation), or using Microsoft Excel spreadsheets.
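The agreement statistics described above can be sketched in a few lines of Python. This is an illustration only, not the code used in the study: the article does not state which ICC form was computed, so the sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement), and it assumes the conventional 1.96 SD limits of agreement for the Bland-Altman summary. The example scores are synthetic.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def icc_2_1(x, y):
    """ICC(2,1) for two measurements per subject.

    Assumption: the ICC form is not specified in the article.
    """
    n, k = len(x), 2
    values = list(x) + list(y)
    grand = mean(values)
    row_means = [(a + b) / 2 for a, b in zip(x, y)]
    col_means = [mean(x), mean(y)]
    ss_rows = k * sum((r - grand) ** 2 for r in row_means)
    ss_cols = n * sum((c - grand) ** 2 for c in col_means)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-subject mean square
    msc = ss_cols / (k - 1)               # between-method mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def bland_altman(full, cat):
    """Return (bias, lower limit, upper limit) for full minus CAT scores."""
    diffs = [f - c for f, c in zip(full, cat)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Synthetic example scores on the 0 to 100 scale (not study data).
full_scores = [20.0, 35.0, 50.0, 65.0, 80.0, 95.0]
cat_scores = [22.0, 33.0, 51.0, 64.0, 82.0, 93.0]
print(pearson_r(full_scores, cat_scores))
print(icc_2_1(full_scores, cat_scores))
print(bland_altman(full_scores, cat_scores))
```

Pearson's coefficient measures only linear association, whereas an absolute-agreement ICC also penalizes a systematic offset between the two forms, which is why both are reported.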

These values were reviewed in relation to the minimal clinically important difference (MCID) for the IKDC form, the change score that discriminates between patients who perceive a difference in functional outcome from those who do not. Reports in the literature of MCID for the IKDC range from 11.5 to 20.5 and can vary based on the procedure.⁸ We selected a mean MCID value of 15, a priori, believing that it would be most representative of all knee procedures as a whole.

Results

The full IKDC form is composed of 18 items: 17 questions and 1 set of subjective functional scores. The CAT IKDC form asked a mean of 9.33 questions, for a mean reduction of 7.67 questions per patient, or 45.1% of the 17 questions. The median reduction was 8.0 questions, or 47.1% of the full IKDC (Table 1). The single most informative IKDC question was found to be “How does your knee affect your ability to go upstairs?” Therefore, this became the first question that our CAT IKDC asked of all individuals. Other questions that were frequently asked concerned pain, swelling, stiffness, kneeling, rising from a chair, and general activity level.
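The reduction figures follow directly from the question counts. A minimal sketch of the arithmetic, assuming the percentage is taken against the 17 regular questions listed in Table 1 (the function name is illustrative only):

```python
FULL_FORM_QUESTIONS = 17  # "No. of regular questions" in Table 1


def burden_reduction(cat_questions, full_questions=FULL_FORM_QUESTIONS):
    """Return (questions saved, percentage saved) relative to the full form."""
    saved = full_questions - cat_questions
    return saved, 100.0 * saved / full_questions


mean_saved, mean_pct = burden_reduction(9.33)    # mean CAT length
median_saved, median_pct = burden_reduction(9.0)  # median CAT length
print(f"mean: {mean_saved:.2f} questions ({mean_pct:.1f}%)")
print(f"median: {median_saved:.2f} questions ({median_pct:.1f}%)")
```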

Table 1

CAT Model Results: The IKDC CAT Resulted in a Significant Mean and Median Reduction in Question Burden^a

CAT Model Results
No. of individual patients	1229
No. of completed questionnaires	2173
No. of regular questions	17
Mean No. of CAT questions	9.33
Mean question reduction	7.67
Mean percentage reduction, %	45.1
Median No. of CAT questions	9.00
Median question reduction	8.00
Median percentage reduction, %	47.1

CAT, computerized adaptive testing; IKDC, International Knee Documentation Committee.

The questions on the IKDC are used to determine a score of 0 to 100 points, where 100 represents no functional deficits.¹⁵ The mean CAT score was 0.48 points lower than the mean full score for the IKDC with similar standard deviations (52.55 ± 22.69 vs 53.03 ± 23.30, respectively). The Pearson R was 0.99, representing a strong linear relationship between the scores, and linear regression shows the linear relationship to be near equality (Figure 1). The ICC was 0.99, indicating a strong agreement between scores as well (Table 2). The distribution of CAT and full scores showed a significant overlap (Figure 2). The difference between these scores was less than the MCID in >97% of cases (Figure 3). The Bland-Altman plot showed that the differences between the CAT and full scores were largely independent of the overall score. There was also no bias in either direction (differences at higher or lower scores) (Figure 4).

Figure 1.

Linear regression of the computerized adaptive testing (CAT) vs long International Knee Documentation Committee (IKDC) forms.

Table 2

Summary of Statistical Data: Compares the IKDC CAT and IKDC Long Form Models^a

Summary of Statistical Data
CAT	Long IKDC	Difference (long minus CAT)	R	ICC
52.551 ± 22.691	53.033 ± 23.299	0.482 ± 3.256	0.99	0.99

Data are reported as mean ± SD. Pearson correlation coefficient (R) and intraclass correlation coefficient (ICC) demonstrate good fit. CAT, computerized adaptive testing; IKDC, International Knee Documentation Committee.

Figure 2.

Distribution of final patient International Knee Documentation Committee (IKDC) scores for the computerized adaptive testing (CAT) and long forms.

Figure 3.

Distribution of score differences of the computerized adaptive testing (CAT) vs long International Knee Documentation Committee (IKDC) forms.

Figure 4.

Bland-Altman plot. The difference between the computerized adaptive testing (CAT) and long International Knee Documentation Committee (IKDC) scores is independent of the final patient score.

Discussion

The IKDC is a well-validated and widely used PROM tool endorsed by a multitude of orthopaedic societies. Many institutions have adopted it to evaluate care and conduct research in a variety of knee conditions. Studies consistently report no floor or ceiling effects for the IKDC form, making it a valuable tool.^15,19 Accurate and reliable PROMs have become ever more important, as they are utilized at every step of the health care process. Tasks ranging from the evaluation of Food and Drug Administration trials for new treatments to system-wide health care economic assessments utilize PROMs to evaluate patient outcomes.¹⁵ The development of a CAT model is an attempt to streamline the patient experience and reduce the burden that the data collection process places upon patients.

The mean reported time to completion for the full IKDC form is 10 minutes.¹⁹ Assuming patients spend an equal amount of time per question, we can estimate that the average patient will be tasked with 7.67 fewer questions and save 4.5 minutes. While this may seem like a trivial amount of time, it may be critical in avoiding respondent fatigue, a well-established occurrence with administered surveys.¹⁴ Indeed, this was a consideration during the initial creation of the IKDC subjective knee form. The first version of the IKDC form contained 41 questions, but the authors discovered that an unacceptable number of questions went unanswered. The final version of the IKDC form settled on 18 items and an acceptable noncompletion rate, with 9.7% of 590 patients failing to answer ≥3 questions on the full IKDC form.¹⁵ This improved rate still leaves a significant amount of uncollected or incomplete data. Methods that reduce the question or time burden on a respondent are likely to result in an increased survey completion rate. If this can be accomplished while maintaining accuracy, as we have shown here, it is of significant research and clinical value.
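The time estimate can be reproduced with a one-line proportional calculation. The sketch below assumes, as the text does, equal time per question, and it assumes that the 10 minutes are spread over the 17 regular questions; the function name is illustrative. Spreading the time over all 18 items instead gives a slightly smaller estimate.

```python
def estimated_minutes_saved(full_minutes, questions_saved, full_questions):
    """Scale the full-form completion time by the share of questions removed.

    Assumes every question takes the same amount of time to answer.
    """
    return full_minutes * questions_saved / full_questions


# Assumption: 17 regular questions as the denominator.
print(f"{estimated_minutes_saved(10.0, 7.67, 17):.1f} minutes")
# Alternative assumption: all 18 items as the denominator.
print(f"{estimated_minutes_saved(10.0, 7.67, 18):.1f} minutes")
```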

We followed the Bland and Altman³ method to validate and compare the CAT IKDC model to the full form score. One of the values calculated according to this method is the ICC. The repeatability of a test, or the internal consistency after accounting for measurement errors, is demonstrated by the ICC value. Repeatability is typically determined via a test-retest method, in which the test in question is administered twice to the same individual. The consistency in the test versus retest scores is expressed as the ICC, which has been reported to be in the range of 0.90 to 0.95 for the full IKDC form.⁶ This is viewed as an excellent ICC value, showing strong agreement between the initial and retest values for the IKDC, and thus showing that the IKDC is a reliable tool. The ICC for the CAT IKDC versus the full form IKDC was found to be 0.99. This value demonstrates that CAT IKDC is incredibly consistent with the full form IKDC.

The accuracy and reliability of the CAT model were examined in context with the MCID. The MCID, by definition, reflects the subjective patient experience. This value determines the point at which an individual patient believes their clinical condition has improved; it quantifies the minimum change in value that occurs in that patient’s PROM score before this experience occurs. The MCID for the IKDC has a considerable range in its reported value throughout the literature, from 6.3 to 20.5, with variations between procedures/diagnoses measured.^6,8 An MCID of 11.5 has been shown to be a highly sensitive value for a mixed group of knee pathologies, and 20.5 to be a highly specific value.⁸ An MCID score of 15 was then selected to be a compromise between these 2 values, although it is possible to review the results with a smaller or larger MCID score by viewing the presented figures (Figures 1 and 3). With >97% of CAT IKDC scores being within the MCID of the full form IKDC, this CAT model accurately reflects the individual patient’s clinical experience.
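Reviewing the results against a smaller or larger MCID amounts to recounting how many paired scores differ by less than the chosen threshold. A minimal sketch, using synthetic scores rather than study data and an illustrative function name:

```python
def share_within_mcid(full_scores, cat_scores, mcid):
    """Fraction of paired scores whose absolute difference is below the MCID."""
    pairs = list(zip(full_scores, cat_scores))
    within = sum(1 for f, c in pairs if abs(f - c) < mcid)
    return within / len(pairs)


# Synthetic placeholder scores on the 0 to 100 scale (not study data).
full_scores = [40.0, 55.0, 62.0, 71.0, 88.0]
cat_scores = [41.5, 53.0, 75.0, 70.0, 66.0]

# Thresholds discussed in the text: sensitive, selected, and specific values.
for mcid in (11.5, 15.0, 20.5):
    print(mcid, share_within_mcid(full_scores, cat_scores, mcid))
```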

There are several limitations to this study. Because of its retrospective design, this study is limited in the outcomes it is able to measure. While this study demonstrated a mean question burden reduction of 45.1%, it cannot definitively conclude that the final result is a reduced survey completion time or a higher compliance in question completion. A prospective study design with the CAT IKDC and full form IKDC administered to patients would have been able to reveal if reduced question burden actually resulted in shorter completion times and increased question completion rates or patient satisfaction associated with reduced questionnaire burden. Due to its retrospective design, this study was also unable to concurrently administer another validated knee survey to serve as an anchor or control group for the study. With the results that have been shown here, these prospective study designs may be of value to further validate our CAT model for the IKDC form.

Questions also remain as to the validity of a CAT model for PROM surveys. We demonstrated the accuracy of our CAT model compared with the full IKDC through a variety of methods. Summary statistics, including the mean, standard deviation, and Pearson correlation coefficient, all displayed a strong resemblance between scores. But there remain inherent questions about the CAT model itself. Individual patients are unaware that questions administered via the survey are generated through a set of algorithms; since questions are administered one at a time, they are never presented with the questions deemed unnecessary for completion. This consistency in the overall questionnaire experience between individual patients is important for valid outcomes scores after the administration of PROM surveys.¹⁷ However, this entire model assumes that individual questions are independent of one another by both design and when read through a patient’s perspective. The question order may vary from patient to patient, as the CAT model determines which questions are necessary to administer. There is evidence that the order in which a question is answered can influence responses on subsequent questions on the Short Form-36 survey,²¹ a survey that has shown high correlation with the IKDC form.^12,18 However, the IKDC instrument has been scrutinized in light of item response theory, a key requirement of which is independence of the questions from each other,¹³ and found to be valid.

There are also limitations in our CAT IKDC itself. As previously noted, a multitude of patient diagnoses were included in our patient population, in line with the original intent of the IKDC. Had our training included International Classification of Diseases or Current Procedural Terminology codes, it may have improved the precision of the CAT IKDC for patients with a known diagnosis. This is an area of potential future enhancement. We also noted that several IKDC questions were administered by the CAT IKDC at a higher frequency. Given this information, it may be possible to construct a short form IKDC questionnaire and validate it against the original IKDC form. Without further investigation, it is difficult to comment on such a PROM and how it would compare with our CAT IKDC. Finally, the CAT IKDC relies upon the original IKDC and thus suffers from the same limitations. Given the nature of the IKDC’s design for use across numerous knee pathologies,¹⁵ it may be incapable of detecting subtle, but meaningful, clinical differences for some patients. Our CAT IKDC’s ability to identify such differences is constrained by the limitations of the original IKDC itself.

In conclusion, this study demonstrates that the CAT IKDC model is accurate, reliable, and comparable with the full form, while managing to reduce question burden. This model may serve as a valid alternative for the administration of the IKDC for clinical, research, and administrative purposes. Further studies may be necessary to quantify its ability to reduce time burden on patients or improve response rates. As with all software, there is room for future enhancement.

Supplemental Material

Supplemental material, sj-pdf-1-ajs-10.1177_03635465211021000 for The Application of Computerized Adaptive Testing to the International Knee Documentation Committee Subjective Knee Evaluation Form by Donghoon Lee, Somnath Rao, Richard E. Campbell, Otho R. Plummer, Fotios P. Tjoumakaris, Steven B. Cohen and Kevin B. Freedman in The American Journal of Sports Medicine

Footnotes

Submitted October 17, 2020; accepted March 1, 2021.

One or more of the authors has declared the following potential conflict of interest or source of funding: O.R.P. is Chief Scientific Officer of Universal Research Solutions for OBERD. F.P.T. has received consulting fees from Medical Device Business Services and research support from Smith & Nephew and Medtronic USA; and holds stock or stock options in Franklin/Keystone Biosciences and Trice Medical. S.B.C. has received research support from Arthrex and Major League Baseball; consulting fees from CONMED Linvatec and Zimmer; and royalties from Zimmer and Slack Inc. K.B.F. has received consulting fees from DePuy, Vericel, and Medical Device Business Services and education support from Liberty Surgical. AOSSM checks author disclosures against the Open Payments Database (OPD). AOSSM has not conducted an independent investigation on the OPD and disclaims any liability or responsibility relating thereto.

References

1. Banerjee, Deirmengian, Levicoff, Abboud, Plummer, Courtney. Accuracy and validity of computer adaptive testing for outcome assessment in patients undergoing total knee arthroplasty. J Arthroplasty. 2020;35(7):1819-1825.

2. Banerjee, Plummer, Abboud, Deirmengian, Levicoff, Courtney. Accuracy and validity of computer adaptive testing for outcome assessment in patients undergoing total hip arthroplasty. J Arthroplasty. 2020;35(3):756-761.

3. Bland, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1:307-310.

4. Brodke, Hung, Bozic KJ. Item response theory and computerized adaptive testing for orthopaedic outcomes measures. J Am Acad Orthop Surg. 2016;24:750.e4.

5. Brogan, DeMuro, Barrett, D’Alessio, Bal, Hogue SL. Payer perspectives on patient-reported outcomes in health care decision making: oncology examples. J Manag Care Spec Pharm. 2017;23:125-134.

6. Collins, Misra, Felson, Crossley, Roos EM. Measures of knee function: International Knee Documentation Committee (IKDC) Subjective Knee Evaluation Form, Knee Injury and Osteoarthritis Outcome Score (KOOS), Knee Injury and Osteoarthritis Outcome Score Physical Function Short Form (KOOS-PS), Knee Outcome Survey Activities of Daily Living Scale (KOS-ADL), Lysholm Knee Scoring Scale, Oxford Knee Score (OKS), Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC), Activity Rating Scale (ARS), and Tegner Activity Score (TAS). Arthritis Care Res (Hoboken). 2011;63(11)(suppl):S208-S228.

7. Dawson, Doll, Fitzpatrick, et al. The routine use of patient reported outcome measures in healthcare settings. BMJ. 2010;340:C186.

8. Greco, Anderson, Mann, et al. Responsiveness of the International Knee Documentation Committee Subjective Knee Form in comparison to the Western Ontario and McMaster Universities Osteoarthritis Index, modified Cincinnati Knee Rating System, and Short Form 36 in patients with focal articular cartilage defects. Am J Sports Med. 2010;38(5):891-902.

9. Grevnerts, Terwee, Kvist. The measurement properties of the IKDC-subjective knee form. Knee Surg Sports Traumatol Arthrosc. 2015;23(12):3698-3706.

10. Hambly, Griva. IKDC or KOOS: which one captures symptoms and disabilities most important to patients who have undergone initial anterior cruciate ligament reconstruction? Am J Sports Med. 2010;38(7):1395-1404.

11. Hambly, Griva. IKDC or KOOS? Which measures symptoms and disabilities most important to postoperative articular cartilage repair patients? Am J Sports Med. 2008;36(9):1695-1704.

12. Haverkamp, Sierevelt, Breugem, Lohuis, Blankevoort, van Dijk CN. Translation and validation of the Dutch version of the International Knee Documentation Committee subjective knee form. Am J Sports Med. 2006;34:1680-1684.

13. Higgins, Taylor, Park, et al. Reliability and validity of the International Knee Documentation Committee (IKDC) Subjective Knee Form. Joint Bone Spine. 2007;74(6):594-599.

14. Hochheimer, Sabo, Krist, Day, Cyrus, Woolf SH. Methods for evaluating respondent attrition in web-based surveys. J Med Internet Res. 2016;18(11):e301.

15. Irrgang, Anderson, Boland, et al. Development and validation of the International Knee Documentation Committee subjective knee form. Am J Sports Med. 2001;29:600-613.

16. Kane, Namdari, Plummer, Beredjiklian, Vaccaro, Abboud JA. Use of computerized adaptive testing to develop more concise patient-reported outcome measures. JB JS Open Access. 2020;5(1):e0052.

17. McHorney, Ware Jr, Lu, Sherbourne CD. The MOS 36-item Short-Form Health Survey (SF-36): III. Tests of data quality, scaling assumptions, and reliability across diverse patient groups. Med Care. 1994;32(1):40-66.

18. Metsavaht, Leporace, Riberto, de Mello Sposito, Batista LA. Translation and cross-cultural adaptation of the Brazilian version of the International Knee Documentation Committee subjective knee form: validity and reproducibility. Am J Sports Med. 2010;38:1894-1899.

19. Padua, Bondi, Ceccarelli, et al. Italian version of the International Knee Documentation Committee subjective knee form: cross-cultural adaptation and validation. Arthroscopy. 2004;20:819-823.

20. Plummer, Abboud, Bell, et al. A concise shoulder outcome measure: application of computerized adaptive testing to the American Shoulder and Elbow Surgeons Shoulder Assessment. J Shoulder Elbow Surg. 2019;28(7):1273-1280.

21. Selim, Rogers, Qian, Rothendler, Kent, Kazis LE. A new algorithm to build bridges between two patient-reported health outcome instruments: the MOS SF-36 and the VR-12 Health Survey. Qual Life Res. 2018;27(8):2195-2206.

22. Tanner, Dainty, Marx, Kirkley. Knee-specific quality-of-life instruments: which ones measure symptoms and disabilities most important to patients? Am J Sports Med. 2007;35(9):1450-1458.
