Abstract
Background:
Looking up information regarding a medical condition is the third most popular activity online, and there are a variety of web-based symptom-checking programs available to the patient. However, the authors are not aware of any that have been scientifically evaluated as an accurate measure for the cause of one’s knee pain.
Purpose/Hypothesis:
The purpose of this study was to design and evaluate an Internet-based program that generates a differential diagnosis based on a history of knee pain entered by the patient. The hypothesis was that the program would accurately generate a differential diagnosis for patients presenting with knee pain.
Study Design:
Cohort study (diagnosis); Level of evidence, 2.
Methods:
A web-based program was created to collect knee pain history and generate a differential diagnosis for ambulatory patients with knee pain. The program selected from 26 common knee diagnoses. A total of 527 consecutive patients aged ≥18 years, who presented with a knee complaint to 7 different board-certified orthopaedic surgeons during a 3-month period, were asked to complete the questionnaire in the program. Upon completion, patients were examined by a board-certified orthopaedic surgeon. Both the patient and physician were blinded to the differential diagnosis generated by the program. A third party was responsible for comparing the diagnosis(es) generated by the program with that determined by the physician. The level of matching between diagnoses determined the accuracy of the program.
Results:
A total of 272 male and 255 female patients, with an average age of 47 years (range, 18-84 years), participated in the study. The median number of diagnoses generated by the program was 4.8 (range, 1-10), with this list containing the physician’s diagnosis(es) 89% of the time. The specificity was 27%.
Conclusion:
Despite a low specificity, the results of this study show the program to be an accurate method for generating a differential diagnosis for knee pain.
Knee pain affects nearly 1 in 5 adults in the United States every year. 2 For patients interested in learning more about knee problems, there is seemingly an unlimited amount of information available on the Internet. However, for those seeking a diagnosis for their knee pain, accurate diagnostic information is difficult to obtain. Looking up information regarding a medical condition is the third most popular activity online, and there are a variety of web-based symptom-checking programs available to the patient. 5 However, we are not aware of any knee diagnostic programs that have been scientifically evaluated.
The purpose of our study was to design and evaluate an Internet-based program that generates a differential diagnosis based on a history of knee pain entered by the patient. The overall goal of the program was to provide an accurate diagnostic tool to assist patients with determining the cause of their knee pain. Our hypothesis was that the program would be able to accurately generate a differential diagnosis for patients presenting with knee pain.
Materials and Methods
Approval for this study was granted by the Health Sciences Institutional Review Board at The State University of New York at Buffalo. We created a web-based program to collect knee pain history and generate a differential diagnosis for ambulatory patients with knee pain. The program selects from 26 common knee diagnoses (Table 1). These diagnoses were chosen because they are the most common diagnoses seen among new patient encounters in the corresponding orthopaedic practice. Accuracy was tested on 615 consecutive patients presenting to 7 different board-certified orthopaedic surgeons during a 3-month period (December 2012–March 2013) with knee pain complaints. Of these, 46 patients were excluded because they were under 18 years of age, 29 because there was no progress note, and 13 because there was no clinical diagnosis recorded. Therefore, data for 527 patients were analyzed for this study. No patients declined an invitation to participate in the study.
Table 1. 26 Diagnoses Possibly Generated by the Program
Patients aged 18 years or older who presented to the office during a 3-month period with a knee complaint were asked to complete the questionnaire in the program. About one half of the patients completed the questionnaire in the office before being examined by the physician, while the other half completed the program at their home within days of being seen in the office. Questions included age, sex, history of injuries in cases of traumatic onset of pain, location of pain, and prior treatments. All questions can be seen in the Appendix (available online at http://ajsm.sagepub.com/supplemental). During their office appointment, patients were examined by a board-certified orthopaedic surgeon, who performed history taking, a physical examination, and imaging studies as part of a standard procedure. The physician provided a clinical diagnosis as well as treatment options based on the examination findings, which were also part of the standard procedure. Both the patient and physician were blinded to the differential diagnoses generated by the program. A third party was responsible for comparing the diagnosis(es) generated by the program with that determined by the physician. The level of matching between diagnoses determined the accuracy of the program.
The algorithm used in the program is one that generates a set of secondary questions based on responses to the primary set of questions. For example, if a patient indicated a traumatic onset of pain, the program would generate questions to further categorize the type of trauma to help narrow the differential. Each diagnosis within the scope of the program has a set of criteria that must be met in order for it to be generated as part of a differential. Some criteria include mechanism of injury, location of pain, and aggravating factors. If the patient’s answers match a certain number of the criteria needed to generate a given diagnosis, then that diagnosis will show up in the program’s differential for that patient. The more criteria met for a given diagnosis, the more likely that diagnosis becomes.
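The matching logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the criteria, the answer labels, the thresholds, and the names `DiagnosisRule`, `secondary_questions`, and `differential` are all hypothetical, since the program's actual rules are not listed in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiagnosisRule:
    """One diagnosis with its criteria and the number that must be met."""
    name: str
    criteria: frozenset  # set of (question, answer) pairs
    min_matches: int


# Hypothetical rules for illustration only; not the study's real criteria.
RULES = [
    DiagnosisRule(
        "ACL tear",
        frozenset({("onset", "traumatic"), ("mechanism", "twisting"),
                   ("symptom", "giving way")}),
        2,
    ),
    DiagnosisRule(
        "osteoarthritis",
        frozenset({("onset", "gradual"), ("aggravating", "stairs"),
                   ("age_group", "over_50")}),
        2,
    ),
]


def secondary_questions(answers):
    """Return follow-up question keys triggered by the primary answers."""
    if answers.get("onset") == "traumatic":
        return ["mechanism"]  # further categorize the type of trauma
    return []


def differential(answers):
    """List diagnoses whose threshold is met, most criteria met first."""
    given = set(answers.items())
    scored = []
    for rule in RULES:
        met = len(rule.criteria & given)
        if met >= rule.min_matches:
            scored.append((met, rule.name))
    # More criteria met ranks higher; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]
```

For example, a patient who reports a traumatic, twisting onset with giving way would meet all three hypothetical ACL criteria and none of the osteoarthritis criteria, so only the ACL tear would appear in the list.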
For the purposes of analysis, the 26 diagnoses were consolidated to 21. First, osteoarthritis and osteoarthritis exacerbation were considered a single diagnosis. For example, if the clinical diagnosis was osteoarthritis, and the program generated osteoarthritis exacerbation, this was considered an accurate match. The rationale for combining osteoarthritis and osteoarthritis exacerbation was that they are symptomatically the same diagnosis, with the only difference being a traumatic onset of pain in the case of osteoarthritis exacerbation. Also, they are treated the same, that is, with nonsteroidal anti-inflammatory drugs (NSAIDs), physical therapy, injections, or some combination of these treatments. The same logic was used for patellar arthritis and patellar arthritis exacerbation. Additionally, 4 of the diagnoses generated by the program were grouped together as “patellofemoral pain” and considered 1 diagnosis. These included patellar chondromalacia/patellofemoral syndrome, patellar contusion/saphenous nerve contusion, plica syndrome, and trochlear chondromalacia. The rationale for combining these diagnoses was that the treatment approach to these 4 diagnoses is the same (rest, ice, NSAIDs, and physical therapy). Combining them is therefore unlikely to change the next steps in the care of that patient compared with keeping them as separate diagnoses. Therefore, the accuracy of the program was tested using 21 possible diagnoses.
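The consolidation can be expressed as a simple lookup applied to both the clinical diagnosis and the program's output before they are compared. The sketch below takes the groupings from the paragraph above; the function names `canonical` and `is_match` and the choice of canonical labels are assumptions made for illustration.

```python
# Groupings as described in the text; the dictionary layout is our own.
CONSOLIDATION = {
    "osteoarthritis exacerbation": "osteoarthritis",
    "patellar arthritis exacerbation": "patellar arthritis",
    "patellar chondromalacia/patellofemoral syndrome": "patellofemoral pain",
    "patellar contusion/saphenous nerve contusion": "patellofemoral pain",
    "plica syndrome": "patellofemoral pain",
    "trochlear chondromalacia": "patellofemoral pain",
}


def canonical(diagnosis):
    """Map a program diagnosis to its consolidated label."""
    return CONSOLIDATION.get(diagnosis, diagnosis)


def is_match(clinical, program_differential):
    """True if the clinical diagnosis appears in the differential
    once both sides are consolidated."""
    return canonical(clinical) in {canonical(d) for d in program_differential}
```

Six of the original labels are folded away and one new label ("patellofemoral pain") is introduced, which gives 26 − 6 + 1 = 21 diagnoses.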
For patients with a clinical diagnosis that did not match any of the diagnoses generated by the program, the history as entered into the electronic medical record was reviewed and divided into 4 categories based on the following criteria:
Category 1: The clinical diagnosis was not a diagnosis that could be generated by the program.
Category 2: User error: The patient inputted their history incorrectly; however, the diagnosis would have been correct if the history had been inputted correctly. A designation of incorrect history was assigned when the history inputted by the patient did not match the history recorded by the attending orthopaedic surgeon during the patient’s visit.
Category 3: Program error: The patient inputted their history incorrectly; however, the diagnosis would have remained incorrect even if the history had been inputted correctly.
Category 4: Program error: The patient’s input was correct; however, the program missed the diagnosis.
The 4 categories represent a total of 112 cases in which the clinical diagnosis was not included within the program’s differential for that patient. Category 1 represents the cases in which the final clinical diagnosis was not among the 26 available computer options, while categories 2, 3, and 4 comprise the total number of missed diagnoses in which the final clinical diagnosis was among the 26 diagnoses capable of being generated by the program. An incorrect history was allocated as user error and was assigned when the history inputted in the program by the patient did not match the history recorded by the attending orthopaedic surgeon during the patient’s visit. The history recorded by the attending orthopaedic surgeon was considered the correct history. These patients were further analyzed to determine if the program would have generated the correct diagnosis if their history had been inputted correctly (category 2) or if the program would have missed the diagnosis even if the history had been inputted correctly (category 3). Category 1 was not considered when determining the sensitivity and specificity of the program. This was done so that the program would not be penalized for missing a diagnosis that it was not capable of generating by design. Therefore, categories 2, 3, and 4 were used when calculating the sensitivity and specificity of the program.
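The four categories amount to a short decision procedure. A minimal sketch follows, assuming three yes/no judgments per missed diagnosis; the function name and parameter names are hypothetical and stand in for the manual chart review described above.

```python
def classify_miss(in_program_list, history_matches_surgeon,
                  correct_with_surgeon_history):
    """Assign a missed clinical diagnosis to category 1, 2, 3, or 4.

    in_program_list: the clinical diagnosis is one the program can generate.
    history_matches_surgeon: the patient's entered history agrees with the
        history recorded by the attending orthopaedic surgeon.
    correct_with_surgeon_history: the program would have generated the
        diagnosis had the surgeon-recorded history been entered.
    """
    if not in_program_list:
        return 1  # outside the program's scope by design
    if history_matches_surgeon:
        return 4  # program error despite correct input
    # Entered history was incorrect: user error if fixing it fixes the miss.
    return 2 if correct_with_surgeon_history else 3
```

Only results of 2, 3, or 4 would count against the program when sensitivity and specificity are calculated.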
By design, when the program generates an anterior cruciate ligament (ACL) tear as one of the diagnoses, it will not also generate a meniscal tear, even if the criteria for diagnosing a meniscal tear are otherwise present. There is a known difficulty in clinically diagnosing a meniscal tear in the setting of an acute ACL tear. 8 Therefore, when the clinical diagnosis included an ACL tear and a meniscal tear, only the ACL tear was considered for scoring purposes.
For any patient whose clinical diagnosis did not match the program’s differential diagnosis and who underwent follow-up magnetic resonance imaging (MRI), the MRI results were used to determine the true diagnosis for that patient. This occurred approximately 21 times during the study. It was done so that if, for example, a clinically suspected meniscal tear was later ruled out by MRI, the program would not be penalized for accurately omitting a meniscal tear diagnosis. Conversely, if there was nothing in the history suggestive of a meniscal tear, yet one was found on MRI, it was considered an incidental/asymptomatic finding and was not included as a clinical diagnosis during analysis. Magnetic resonance imaging was not used in cases in which the clinical diagnosis was included in the program’s differential because if these diagnoses matched, then something in the history indicated the diagnosis to be present in the patient. Therefore, if that diagnosis was later ruled out by MRI, the program would not be penalized for suspecting the diagnosis based on the history if the physician also suspected the diagnosis based on the history and physical examination.
The sensitivity and specificity of the program were calculated using Microsoft Excel 2010 (Microsoft Corp). Sensitivity was defined as the ability of the program to generate the physician’s diagnosis as part of its differential. The sensitivity of the program was calculated as the total number of matches divided by the number of clinical diagnoses. The specificity was defined as how often a given diagnosis produced by the program was the correct diagnosis. Specificity was calculated as the total number of matches divided by the number of times that a given diagnosis was produced by the program.
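As a sketch of these two ratios, the function below tallies matches per diagnosis from paired lists of clinical and program diagnoses. It follows the definitions given above (note that matches divided by the number of times a diagnosis was produced is the quantity often called positive predictive value elsewhere). The record layout, the function name `score`, and the toy data in the test are assumptions; the study itself used Microsoft Excel.

```python
from collections import Counter


def score(records):
    """records: non-empty iterable of (clinical_dx, program_dx) pairs,
    each a set of consolidated diagnosis labels for one patient."""
    matches, clinical, generated = Counter(), Counter(), Counter()
    for clinical_dx, program_dx in records:
        clinical.update(clinical_dx)               # times diagnosed clinically
        generated.update(program_dx)               # times produced by program
        matches.update(clinical_dx & program_dx)   # appears in both

    per_dx = {}
    for dx in clinical.keys() | generated.keys():
        # None marks a ratio that is undefined for this diagnosis.
        sens = matches[dx] / clinical[dx] if clinical[dx] else None
        spec = matches[dx] / generated[dx] if generated[dx] else None
        per_dx[dx] = (sens, spec)

    total = sum(matches.values())
    overall = (total / sum(clinical.values()),
               total / sum(generated.values()))
    return per_dx, overall
```

The overall figures pool all diagnoses, so a patient with two clinical diagnoses contributes two entries to the denominator of sensitivity.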
Results
There were 272 male and 255 female patients, with an average age of 47 years (range, 18-84 years), analyzed for the study. The most common diagnoses were osteoarthritis (n = 204), meniscal tear (n = 188), patellofemoral-generated pain (n = 145), patellar arthritis (n = 73), and ACL tear (n = 55). The median number of diagnoses generated by the program was 4.8 (range, 1-10), with this list containing the physician’s diagnosis(es) 89% of the time. For the program as a whole, there were 674 matches among a total of 758 clinical diagnoses, for an overall sensitivity of 89%. Additionally, there were 674 matches among a total of 2512 diagnoses in the program’s differential, for an overall specificity of 27%. The sensitivity and specificity for each individual diagnosis were calculated in the same manner as for the program as a whole and can be seen in Table 2.
Table 2. Sensitivity and Specificity of Each Individual Diagnosis
A summary of the reasons that a diagnosis was missed can be seen in Table 3. For category 1 errors (ie, diagnoses not generated by the program because they are not contained in the program’s list of potential diagnoses), the clinical diagnoses included fractures, lacerations, loose bodies, ganglion cysts, and residual pain from prior surgery as seen in Table 4. For category 2 and 3 errors, discrepancies between the inputted history and that recorded by the attending orthopaedic surgeon included differences in the location of pain, history of trauma, or lack of trauma as well as mechanism of injury. Table 5 displays the number of times that a given diagnosis was missed because of a program error.
Table 3. Summary of Reasons That a Diagnosis Was Missed
Table 4. List of Category 1 Diagnoses
Table 5. Number of Times That a Diagnosis Was Missed Because of a Program Error
If category 1 errors were considered when calculating the sensitivity and specificity of the program, the total number of clinical diagnoses would have been 786, the total number of diagnoses generated by the program would have been 2616, and the number of matched diagnoses would have remained 674. Therefore, using the same formulas as mentioned in the materials and methods section above, if category 1 errors were considered when calculating the sensitivity and specificity of the program, the sensitivity would have been 86%, while the specificity would have been 26%.
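The reported percentages can be reproduced directly from the counts given in the text. The short check below uses only those counts; the helper name `pct` and the dictionary layout are our own.

```python
def pct(numerator, denominator):
    """Ratio expressed as a whole-number percentage."""
    return round(100 * numerator / denominator)


MATCHES = 674  # matched diagnoses, the same in both scenarios

# Denominators as reported in the text.
reported = {
    "excluding category 1": {"clinical": 758, "generated": 2512},
    "including category 1": {"clinical": 786, "generated": 2616},
}

for label, counts in reported.items():
    sensitivity = pct(MATCHES, counts["clinical"])
    specificity = pct(MATCHES, counts["generated"])
    print(f"{label}: sensitivity {sensitivity}%, specificity {specificity}%")
```

The same counts imply 758 − 674 = 84 missed diagnoses in categories 2 to 4 and 786 − 758 = 28 in category 1, which agrees with the 112 total cases described in the materials and methods section.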
Discussion
Our hypothesis was supported by the results of the study. We found that an Internet-based program was able to generate a differential diagnosis for knee pain in 527 consecutive ambulatory patients that was 89% sensitive and 27% specific.
The goal of the program was to provide a differential for the patient, not a single diagnosis. A differential diagnosis in any field of medicine is meant to have a high sensitivity; one will never diagnose a disease if it is not thought of as a possibility. This differential can change throughout the course of a diagnosis, such as when more testing is performed. A differential, however, will not necessarily have a high specificity, and in fact, specificity will often be low. This is simply based on the idea that a differential is meant to be a list of diagnoses that seem appropriate for the patient being treated, even though usually 1 diagnosis is accurate. Based on this fact, we expected the program to have a lower specificity than if it were meant to provide the single most likely diagnosis for that patient.
According to the latest study, 81% of adults in the United States use the Internet; of those users, 72% have used it to directly search for medical information, making it one of the most popular online activities (behind e-mail and online shopping). 4,5 Furthermore, 1 of every 3 adults in the United States has used the Internet to look up information regarding a medical condition that they or someone they know may have. 4 Users will navigate through many sites researching medical information, and once their search is exhausted, patients will often consult a physician as their second opinion to an already suspected diagnosis. 6
This approach to health care can be ineffective and even dangerous to the patient. Patients are often not qualified to interpret the information displayed because of many factors including a limited perspective, inexperience, and a general lack of medical knowledge. 6 The answers that they find may influence the choices that they make, oftentimes without consulting a health care professional. With the enormous amount of information available, the patient must often determine which information is relevant to their problem. Our goal was to simplify this task by creating a diagnostic program to provide the user with a focused differential as well as accurate information regarding their knee pain.
Previous studies have assessed the accuracy of web-based programs in diagnosing disease. One evaluated the accuracy of WebMD for the diagnosis of otolaryngological (ENT) disease. 3 Information for 61 patients previously diagnosed with an ENT disease was entered into WebMD to determine its capability to correctly diagnose these patients. According to the study, WebMD generated the correct diagnosis 70% of the time, with a median number of 13 diagnoses (range, 1-20) being generated per patient. Importantly, WebMD generated an average of 3 inappropriate diagnoses per patient, and only 11% of the patients had no inappropriate diagnoses generated at all. An inappropriate diagnosis was defined as a diagnosis that was unlikely based on the combination of symptoms. For example, symptoms of earache and hearing loss being diagnosed as thyroid cancer would be considered an inappropriate diagnosis. 3 Our study was not designed to categorize diagnoses as inappropriate, so we are unable to comment on the number of inappropriate diagnoses in our cohort.
Another recent study looked at the ability of 4 smartphone applications (apps) to evaluate photographs of skin lesions and provide the user with feedback regarding the likelihood of malignancy. 9 Three of 4 apps missed at least 30% of melanoma diagnoses, and it was concluded that they were dangerous to the consumer because 30% of users with malignant melanoma were falsely assured that their lesions were benign and would therefore not seek medical advice. 9 Similar results were seen in a study using the Internet for the detection of undiagnosed patients with systemic lupus erythematosus and rheumatoid arthritis, with the authors concluding that the Internet could possibly delay affected patients from seeking medical attention. Reasons cited for the possible delay included an advanced reading level used by the websites and a general inability of these sites to provide a focused differential diagnosis for the users. 7 It is clear that patients are using the Internet to research medical information; however, there seems to be a lack of Internet sites that supply the patient with accurate diagnostic information that is pertinent to their symptoms. With so many patients researching medical information online, an accurate and trusted site seems warranted. Our program allows the user to accurately narrow the cause of their knee pain to the 4 or 5 most likely causes, which could then be linked to informative content that they can use to learn about their current problem.
This program is not meant to direct provider care in any way; however, it could have a positive influence on the future care of knee injuries. It may lead to more informed patients asking direct, pertinent questions rather than questioning extraneous information that they found on the Internet, which could lead to more efficient office visits. In addition, it is certainly plausible for a patient to be encouraged to see a physician based on the results of the program’s differential, whereas the pain may have been ignored and a physician never consulted if the research stopped after endless searches through various web pages. This may allow more patients who are suffering from knee pain to come to clinical attention earlier, possibly reducing the likelihood of chronic knee pain issues in certain circumstances. However, patient behavior in response to the information displayed was not studied, and therefore, we cannot comment on the ability of the program to direct patients to seek or not to seek physician care because all patients were seeing a physician regardless of the results of the program.
One advantage to having a computerized diagnostic program is that it is not subject to human error such as fatigue or inattention. 1 At least 15 times during our analysis, we found a clinically diagnosed meniscal tear even though nothing in the recorded history suggested a meniscal tear. Consequently, the program did not generate a meniscal tear as part of its differential. In these cases, the clinical diagnosis of a meniscal tear was frequently disproven at a later date based on the results of MRI, emphasizing the importance of careful history taking. One possible disadvantage of a computerized program could be the lack of human interpretation of the patient’s complaint. The finite list of answers from which the patient has to choose when answering a certain question may inadvertently lead to a wrong diagnosis being generated by the program, whereas the physician may be able to more accurately interpret what the patient is trying to describe, yielding a more accurate diagnosis. Another disadvantage is the possibility of deterring a patient from seeing a physician based on the differential provided by the program. Based on the design of the study, it is unknown how many patients would have been discouraged from seeing a physician when a visit was in fact needed.
Analysis of the study findings showed that discrepancies between the inputted history and the history told to the physician contributed to missed diagnoses. As seen in Table 3, 42 of the 84 total missed diagnoses, excluding the situations in which the final diagnosis was not among the available computer options (category 1), had some discrepancy between the inputted history and that told to the physician (categories 2 and 3); 30 of those 42 directly resulted in the wrong diagnosis being generated (category 2). This may have been caused by some questions not being properly understood by the user, specifically with regard to the location of pain. Inconsistent location of pain was seen as a cause of discrepancy at least 21 times, with 16 of them resulting in an incorrect diagnosis being generated by the program. Future improvements to the program include methods to increase the user’s ability to precisely localize his or her pain. The other missed diagnoses (categories 3 and 4) did not have as clear a reason for being missed. One explanation could be an atypical presentation of a specific diagnosis. Meniscal tears and patellofemoral pain were the most commonly missed diagnoses, likely because they were among the most commonly diagnosed problems for new patients (188 and 145 times, respectively), second only to osteoarthritis/osteoarthritis exacerbation, which was diagnosed clinically 204 times.
Limitations of the study include the use of the program by a group of patients presenting to an orthopaedic practice rather than a primary care physician. The results therefore should not be generalized beyond this group, as including patients in a primary care clinic may influence the results in an unknown way. Furthermore, our patients were all seeing a physician for their knee problem, and the results of the diagnostic program may not generalize to a different population, such as those with knee pain who have not decided to see a physician. Another limitation is the inclusion of MRI findings only in those patients who underwent MRI and did not have agreement between the physician and the program, which may have biased the results in favor of the program to some degree. We found this to most commonly occur when the physician ordered the study to rule out a meniscal tear, even though the history may not have seemed suggestive of this diagnosis. It is probable that some patients who were considered to have a meniscal tear on physical examination could have undergone MRI at a later date that would not show a meniscal tear. It is also possible that patients who were not thought to have a meniscal tear on initial evaluation, and therefore did not return to the office, would have had a meniscal tear if MRI had been performed. Both of these instances could have skewed our results in an unknown way. Another limitation of the study is not including category 1 errors when determining the sensitivity and specificity of the program. The diagnoses included in category 1 are generally much less common or often do not need a physician or a computer program to make the diagnosis. For example, if a patient has a laceration, it is fairly obvious to determine that there is a laceration present without the need for a physician. Likewise, if the patient recently had undergone surgery, residual pain from that surgery should be fairly simple to deduce. These were excluded from analysis so that the program would not be penalized for missing a diagnosis that it was not capable of generating by design. This does, however, bias the results of the study in favor of the program to a small degree as well as exemplify the limits of a program with a finite number of diagnoses. Finally, the program was created by one of the physicians involved in the study, and he evaluated approximately one seventh of the patients. His knowledge of the logic behind the program may have skewed his evaluation of the patients that he saw, artificially elevating the accuracy of the program to a mild degree.
In conclusion, we tested the ability of a web-based program to provide an accurate differential diagnosis of knee pain in orthopaedic outpatients. The program generated a median of 4.8 diagnoses per patient, with 89% sensitivity and 27% specificity in 527 consecutive patients.
Footnotes
One or more of the authors has declared the following potential conflict of interest or source of funding: This study was funded by the Ralph and Mary Wilson Fund.
