Abstract
As physicians strive to provide evidence-based care, challenges arise if different entities disseminate divergent Appropriate Use Criteria (AUC) or clinical guidelines on the same topic. To characterize these challenges in one field, this study reviews the literature on comparisons of clinical recommendations regarding medical imaging. The PubMed database was searched for the years 2013-2018 for studies describing discordance among clinical recommendations regarding the performance of imaging. Of the 406 articles identified, 15 met the selection criteria: 8 qualitative and 7 quantitative. Reasons for discordance varied, with lack of evidence often cited. Quantitative studies often found that different decisions would be reached depending on the clinical recommendation followed. Nonetheless, quantitative studies also tended not to consider one set of recommendations superior to another. The findings of this review might help clinicians seek guidance more thoughtfully and could inform use of guidelines and AUC for quality improvement and clinical decision support.
Introduction
Physicians strive to provide evidence-based care in accordance with the clinical recommendations that they have received from professional societies, the government, and the clinical literature. The research team believes that one challenge that physicians may face in doing so is determining which recommendations to follow, as different entities have developed divergent Appropriate Use Criteria (AUC) and clinical guidelines. For instance, the team has observed that the American College of Cardiology Foundation (ACCF) and the American College of Radiology (ACR) have both developed AUC for nuclear myocardial perfusion imaging, as this form of imaging is performed by both cardiologists and radiologists.1,2 To assess how often physicians face conflicting clinical recommendations, the team reviewed the literature. Because physicians are increasingly being compensated on the basis of providing quality care, and clinical recommendations are being codified within clinical decision-support systems, lack of uniform definitions of what constitutes quality care may take on new importance. 3
Although variation in clinical recommendations has the potential to impact all specialties, radiology is particularly affected, as a wide variety of specialties utilize radiology, driving multiple professional societies to create discordant clinical recommendations. Furthermore, because the Protecting Access to Medicare Act (PAMA) calls for the use of AUC when ordering advanced diagnostic imaging services and creates a process through which organizations can develop approved AUC for others to follow, there are a growing number of government-sanctioned AUC applicable to radiology. 3 Multiple organizations are being sanctioned to create AUC for the same clinical scenarios. 4
This study reviews the literature on comparisons of clinical recommendations regarding medical imaging. The research team conducted a systematic review of the biomedical literature for investigations that compared the degree of discordance between AUC and clinical guidelines regarding the indications for imaging studies. The primary objectives of this study were to highlight the nature of the differences found in the literature, the reasons for those differences, and how the differences may in some cases impact clinical decision making.
Methods
Search Strategy
On May 23, 2018, PubMed was searched for all potentially relevant articles published in the past 5 years. Articles likely to be about clinical recommendations were identified through the MeSH term practice guidelines as topic (Major Topic) or by the presence in the Title/Abstract of the terms appropriateness or criteria. The presence in the Title/Abstract of any of the following terms was required to identify articles likely to involve a comparison: comparison, compare, contrast, agree, agreement, concordance, disagreement, or discordance. Finally, articles comparing issues related to imaging were identified through the MeSH term cardiac imaging techniques (Major Topic) or by the presence of any of the following terms in the Title/Abstract: imaging, radiology, radiography, X-ray, computed tomography, CT, magnetic resonance imaging, MRI, or ultrasound.
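The three-part search logic described above can be illustrated with a short sketch. The Python below assembles a Boolean query string in which the terms within each concept are joined with OR and the 3 concepts are joined with AND. It is an illustration under stated assumptions, not the exact string submitted by the research team: [MeSH Major Topic] and [Title/Abstract] are standard PubMed field tags, but the helper names (tiab, any_of), the quoting of terms, and the omission of the 5-year date limit are choices made here for clarity.

```python
def tiab(terms):
    """Restrict each free-text term to the Title/Abstract field."""
    return [f'"{term}"[Title/Abstract]' for term in terms]


def any_of(clauses):
    """Join clauses with OR and wrap them in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


# Concept 1: articles likely to be about clinical recommendations.
recommendation = any_of(
    ['"practice guidelines as topic"[MeSH Major Topic]']
    + tiab(["appropriateness", "criteria"])
)

# Concept 2: articles likely to involve a comparison.
comparison = any_of(tiab([
    "comparison", "compare", "contrast", "agree",
    "agreement", "concordance", "disagreement", "discordance",
]))

# Concept 3: articles related to imaging.
imaging = any_of(
    ['"cardiac imaging techniques"[MeSH Major Topic]']
    + tiab([
        "imaging", "radiology", "radiography", "X-ray",
        "computed tomography", "CT", "magnetic resonance imaging",
        "MRI", "ultrasound",
    ])
)

# An article must satisfy all 3 concepts to be returned.
query = " AND ".join([recommendation, comparison, imaging])
print(query)
```

The 5-year publication window is not encoded in this string and would need to be applied separately, for example through a date filter in the search interface.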
Inclusion Criteria
Articles were considered for inclusion if they contained a comparison of clinical recommendations related to indications for imaging. Articles were only considered to relate to comparisons of clinical recommendations if they addressed recommendations endorsed or sponsored by different organizations or government agencies. All comparisons of recommendations addressing the clinical circumstances under which imaging should be used for purposes of treatment planning, diagnostic testing, screening, or prognosis were considered.
Exclusion Criteria
Articles were excluded from the sample if they did not at least partially reflect a US perspective, evidenced by inclusion of at least 1 US-based clinical recommendation in the comparison, at least 1 US-based author, or a US test population. Articles were excluded if they compared a new and an old version of a single guideline. Comparisons of recommendations regarding the specific imaging findings that should guide a clinical decision also were excluded. For example, a study that compared 2 clinical recommendations containing different anatomical definitions of a large pneumothorax was excluded on the grounds that the recommendations being compared did not differ in their stance on whether imaging should be performed.
Classification as Quantitative Versus Qualitative
Studies were categorized as quantitative or qualitative according to whether or not they made a numeric comparison regarding how different clinical recommendations impacted care.
Results
Of the 406 articles returned by the search, 15 met the selection criteria.1,2,5-17 In the articles reviewed, recommendations differed across countries and across organizations with similar missions within a country. In 8 comparisons, the clinical recommendations being compared originated in different countries. In 3 comparisons, the clinical recommendations came from different organizations within 1 country. In 4 comparisons, a combination of organizational and national differences contributed to the existence of multiple clinical recommendations addressing an issue.
The comparisons reported in the selected articles covered a diverse set of clinical situations: screening (4 articles), diagnosis (9 articles, 2 of which also looked at recommendations for post-diagnosis monitoring), and treatment planning (2 articles). The imaging modalities featured in the articles were diverse as well, with several featuring multiple modalities. Computed tomography (CT), radiography, and ultrasonography were the most commonly discussed modalities. Seven articles included quantitative analyses of the potential impact on clinical decision making. A summary of the findings regarding the characteristics of the differences is presented in Table 1. Table 2 provides information on the clinical details of the differences.
Table 1. Summary of Findings From the Review.
Abbreviation: n/a, not applicable.
Table 2. Summary of the Differences Described in the Studies.
Nature of Differences
Several of the articles highlighted the fact that guidelines and AUC can differ in their approach to using imaging to determine whether intervention is appropriate. For instance, differences in approach were seen in a comparison of 2 AUC for total knee replacement, wherein one based its criteria on radiographic changes, whereas the other based its criteria on patient-reported data. 5 Likewise, in a review of different sets of criteria for total hip or knee arthroplasty, there was variation in whether radiographic changes, in addition to symptoms and function, should be part of indication criteria. As all of the studies reviewed were based on level IV evidence (case series evidence, with no control or comparison group), the scientific basis of the recommendations was uniformly poor. Clinical recommendations were based on expert opinion rather than empirical data. 6
AUC also may differ in their granularity. The AUC for myocardial perfusion imaging released by the ACR and the ACCF differ in the number of clinical scenarios they present. A number of scenarios that are described in the ACCF guidelines lack an appropriateness rating under the ACR Appropriateness Criteria.1,2
Guidelines can differ in terms of the parameters that define recommendations for delivery of a health care service. Screening mammography has been particularly controversial, with some organizations recommending annual screening, some recommending biennial screening, and some recommending a combined approach that varies with age. Mammography screening recommendations further vary in their prescribed starting age and stopping age. 7 Starting age and/or frequency of monitoring also varied in other comparisons of screening guidelines.10,12
Disease-centered guidelines may vary in the extent to which they recommend follow-up imaging. The hepatocellular carcinoma guidelines used by the Japanese Society of Hepatology (JSH) differ from the guidelines of the American Association for the Study of Liver Diseases (AASLD) in that the JSH guidelines recommend that super-high-risk patients undergo dynamic CT or magnetic resonance imaging (MRI) every 6 to 12 months, even in the absence of evidence of small hepatocellular carcinomas visible on ultrasound. AASLD guidelines do not mention supplementing ultrasonography with CT or MRI in patients without visible carcinomas. Some of the differences between the two guidelines may have been the result of the incorporation of a more expansive evidence base by the JSH guidelines. The JSH guidelines were written after the AASLD guidelines and included a review of non-English articles as well as a key randomized controlled trial that was published after the AASLD guidelines. 8
Results of Quantitative Analyses
Of the 15 studies included, 8 were qualitative and 7 were quantitative.1,2,5-17 Quantitative studies used a variety of methods and tended to find that the clinical conclusions and/or treatment decisions reached would depend on which clinical recommendation was followed. Nonetheless, quantitative studies also tended not to consider one set of recommendations superior to another.
Some of the studies examined made retrospective comparisons of how clinical guidelines classified care as appropriate or potentially inappropriate. As previously mentioned, 2 studies compared AUC for myocardial perfusion imaging from the ACCF and ACR. The first study used a prior version of the ACCF’s criteria, and the second used the current version.1,2 In both studies, the authors classified each case using both AUC and then calculated a κ statistic to assess the level of agreement. The κ statistic calculated for each study showed agreement to be poor. The first study found that clinically confirmed ischemia was rare among patients rated as inappropriate for myocardial perfusion imaging according to the ACCF AUC, but more common among those rated as inappropriate according to the ACR Appropriateness Criteria. 1 The later study likewise found differences in the rates at which studies considered appropriate by the 2 AUC led to clinically abnormal findings. 2 Retrospective classification, followed by the calculation of a κ statistic, likewise was the methodology followed by the comparison of the AUC for total knee replacement, which also found poor agreement between the AUC. 5
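To make the agreement measure concrete, the following sketch computes a κ statistic for 2 sets of ratings applied to the same cases. It assumes the unweighted Cohen's κ, defined as (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of cases rated identically and p_e is the proportion of agreement expected by chance from each rater's marginal frequencies. The reviewed studies may have used a weighted or otherwise different variant. The function name and the ratings are hypothetical and are not drawn from any of the reviewed studies.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)

    # Observed agreement: share of cases given the same label by both AUC.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: product of the marginal frequencies, summed over labels.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of 10 cases under two AUC (A = appropriate, I = inappropriate).
auc_one = ["A", "A", "A", "A", "A", "I", "I", "I", "I", "I"]
auc_two = ["A", "A", "A", "I", "I", "I", "I", "I", "A", "A"]

print(round(cohen_kappa(auc_one, auc_two), 2))
```

In this invented example the 2 AUC agree on 6 of 10 cases, yet half of all cases would be expected to agree by chance alone, so κ comes out at about 0.2. This illustrates why raw percent agreement can overstate concordance between classification schemes.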
The use of guidelines with differing emphases can cause variation in whether or not patients are diagnosed with disease. One article compared 2 sets of guidelines for the diagnosis of chronic pulmonary aspergillosis (CPA) by retrospectively applying them to a cohort of patients who had previously been diagnosed with aspergillosis. One set of guidelines emphasized the need for imaging in addition to mycological evidence, whereas the other provided more detailed mycological criteria. The authors found that although 83% of patients would have been diagnosed with CPA by at least 1 of the guidelines, only 70% of patients would have been diagnosed with CPA by both guidelines. 13
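The 2 percentages reported for the CPA comparison imply a simple breakdown of the cohort, which the sketch below works through. It relies only on set arithmetic: patients diagnosed by exactly 1 guideline are those diagnosed by at least 1 minus those diagnosed by both, and patients diagnosed by neither are the remainder of the cohort. The function name is hypothetical, and the derived figures are computed here rather than taken from the cited study.

```python
def diagnosis_overlap(at_least_one_pct, both_pct):
    """Split a cohort by how two guidelines classify it, in whole percentages."""
    if not 0 <= both_pct <= at_least_one_pct <= 100:
        raise ValueError("expected 0 <= both <= at_least_one <= 100")
    exactly_one = at_least_one_pct - both_pct  # guidelines disagree
    neither = 100 - at_least_one_pct           # guidelines agree: no diagnosis
    return {
        "both": both_pct,
        "exactly_one": exactly_one,
        "neither": neither,
        "concordant": both_pct + neither,
        "discordant": exactly_one,
    }


# Percentages reported for the CPA comparison.
cpa = diagnosis_overlap(at_least_one_pct=83, both_pct=70)
print(cpa)
```

Under these figures, the 2 guidelines would reach the same conclusion for 87% of the cohort and opposite conclusions for the remaining 13%.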
Two articles illustrated the trade-offs that professional societies make between competing goals. When confirmed diagnoses are available, the sensitivity and specificity of guidelines can be used to compare their accuracy. One study compared the prognostic performance of 5 different guidelines for predicting vesicoureteral reflux in children after first febrile urinary tract infection. When predicting vesicoureteral reflux grades I to V, vesicoureteral reflux grades II to V, and renal scarring, the set of guidelines with the greatest sensitivity had the lowest specificity. 14 Another study used computer modeling to project the implications of different mammography guidelines in terms of metrics such as the total number of examinations, the number needed to screen per death averted, the number of resulting benign biopsies, and the life years gained per benign biopsy. The guideline associated with the greatest number of deaths averted also happened to be the one resulting in the most examinations and benign biopsies. 7
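The trade-off between sensitivity and specificity can be shown with a small worked example. The sketch below computes both measures for 2 hypothetical guidelines applied to the same patients: sensitivity is the share of confirmed cases that a guideline flags, and specificity is the share of unaffected patients that it does not flag. All names and data are invented for illustration and do not correspond to the guidelines or cohorts in the cited studies.

```python
def sensitivity_specificity(flagged, confirmed):
    """Return (sensitivity, specificity) from paired boolean lists.

    flagged[i]   -- True if the guideline flags patient i as likely affected
    confirmed[i] -- True if patient i has a confirmed diagnosis
    """
    if len(flagged) != len(confirmed):
        raise ValueError("inputs must be of equal length")
    pairs = list(zip(flagged, confirmed))
    true_pos = sum(f and c for f, c in pairs)
    false_neg = sum((not f) and c for f, c in pairs)
    true_neg = sum((not f) and (not c) for f, c in pairs)
    false_pos = sum(f and (not c) for f, c in pairs)
    if true_pos + false_neg == 0 or true_neg + false_pos == 0:
        raise ValueError("need at least one affected and one unaffected patient")
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical cohort: 4 patients with confirmed disease, 4 without.
confirmed = [True, True, True, True, False, False, False, False]

# A broad guideline flags almost everyone; a narrow one flags few patients.
broad_guideline = [True, True, True, True, True, True, True, False]
narrow_guideline = [True, True, False, False, False, False, False, False]

print(sensitivity_specificity(broad_guideline, confirmed))
print(sensitivity_specificity(narrow_guideline, confirmed))
```

In this invented cohort, the broad guideline catches every confirmed case but flags most unaffected patients as well, whereas the narrow guideline flags no unaffected patients but misses half of the confirmed cases. Neither is superior without a judgment about which kind of error matters more.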
Although most of the quantitative analyses did not state that one clinical recommendation was definitively more valid than others, one study suggested that one guideline’s methodology was superior to other approaches. The study compared preeclampsia screening methodologies endorsed by the Fetal Medicine Foundation (FMF), the National Institute for Health and Care Excellence (NICE), and the American College of Obstetricians and Gynecologists (ACOG). ACOG and NICE evaluate risk for preeclampsia by using a series of risk factors from maternal medical histories, whereas FMF utilizes an algorithm that combines medical histories with biochemical and biophysical measurements. The study concluded that FMF’s methodology was superior in predicting preeclampsia. This likely was the case because the FMF methodology incorporated a larger set of predictors than did the 2 methodologies to which it was compared. 9
Discussion
The frequent discordance that exists between clinical recommendations is in part a product of how various clinical recommendations are developed. In some cases, as was the situation with the AUC for myocardial perfusion imaging, guidelines were developed by different societies representing 2 different specialties.1,2 In other cases, as was the situation with the guidelines for hepatocellular carcinoma, professional societies representing similar clinical areas, but based in different countries, independently reached differing conclusions. 8 Some of the research examined guidelines from a mix of stakeholders, such as patient advocacy organizations, governmental organizations, and professional societies. One article compared a method for screening for preeclampsia proposed by the FMF, a British charity whose mission is to improve the health of pregnant women and their babies, with the guidelines recommended by NICE, a component of the Department of Health in the United Kingdom, and ACOG, a US-based professional society. 9
When clinical recommendations vary, it is not always possible to say that one set of recommendations is superior to another. Variation may exist for a number of reasons, including a lack of scientific evidence, trade-offs between sensitivity versus specificity, and conceptual differences in how the issue at hand is being examined. A lack of definitive evidence was repeatedly cited as a reason for clinical guideline discordance and was mentioned in 7 of the 15 studies.6,10-12,15-17 In such situations, clinical recommendations are typically based on expert consensus, which can vary across different sets of experts.
Public Policy and Clinical Guideline Discordance
To help inform decisions regarding the selection of AUC for use in clinical decision-support systems, a framework with high interrater reliability has been developed for grading the strength of evidence behind AUC for diagnostic imaging examinations. 18 Given that multiple organizations with overlapping areas of interest have been empowered to develop AUC by PAMA, physicians will increasingly need to thoughtfully select which AUC or guidelines they will follow in particular situations, rather than defaulting to a single option.
More than 4 decades of research has shown that there is regional practice variation. 19 Prior researchers have found that the use of diagnostic imaging services varies according to region and that some regions are more intense utilizers of services.20,21 As many of the Provider-Led Entities approved by PAMA to create diagnostic imaging AUC are academic medical centers situated in different parts of the country, spanning from the University of California, to the University of Texas, to the University of Pennsylvania, it is likely that some degree of regional variation will be embedded in the AUC that are developed. The creation of new AUC by regional Provider-Led Entities is likely to increase the number of overlapping evidence-based AUC available for various domains of diagnostic imaging. 4
Because health plans are required under the Affordable Care Act to provide coverage without co-pay for services receiving a grade “A” or “B” recommendation from the United States Preventive Services Task Force, its clinical recommendations have a disproportionate impact on American health care. 22 This has brought particular attention to the discordance in clinical recommendations regarding mammography, for which professional societies have proposed a number of different recommendations. 7 It appears that one of the issues with all guidelines related to mammography is that there is a continuous increase in cancer detection rates as the age of women screened increases, whereas the guidelines treat the detection rate as if it is discontinuous so that women may be grouped into age bands. 23
Strong differences of opinion on what constitutes appropriate care occur even within clinical guidance organizations, such as Cochrane. One member of the board of Cochrane publicly criticized the findings of a Cochrane review and was subsequently expelled from the organization by a vote. 24 In response to the expulsion, 4 of the 12 members of the Cochrane Governing Board resigned. 24 Just as it can be difficult to achieve consensus across organizations, there can be a lack of consensus within them as well.
The need to codify best practices within clinical decision-support systems along with the increasing role of artificial intelligence in the care process has brought increased attention to the issue of discordance between clinical recommendations. It has been suggested that Watson for Oncology, an artificial intelligence system designed to provide cancer care recommendations using data and guidance from the Memorial Sloan Kettering Cancer Center in New York, contains bias in its recommendations, in that by design, it reflects care that would be delivered at one particular, highly-resourced setting. 25 As additional regional organizations develop AUC, concerns about generalizability may persist.
Limitations
Although this study examined the literature returned by a specific search, it is likely that additional articles describing variation in AUC or guidelines exist. No attempt was made to search the gray literature. Nonetheless, the 15 articles analyzed highlight the extent to which AUC and guideline discordance impacts a broad variety of imaging applications.
Conclusions
Different organizations seeking to address similar clinical issues often develop discordant clinical recommendations. Lack of scientific evidence was frequently cited as the reason in the comparison studies reviewed herein. Quantitative analyses have shown that different approaches to characterizing clinical scenarios, differences in reliance on imaging versus pathological evidence, and different choices between goals, such as sensitivity versus specificity, lead to different clinical decisions. An awareness of these differences and their consequences could help clinicians seek guidance more thoughtfully and could inform the integration of guidelines and AUC into quality improvement programs and clinical decision-support systems.
Policy Implications
Understanding discordance in clinical recommendations is now of particular importance, as multiple organizations are creating potentially conflicting recommendations under the auspices of the Centers for Medicare & Medicaid Services (CMS) as a consequence of PAMA. As PAMA moves forward, it is important that policy makers be conscious of the lack of consensus that exists in some areas of practice and craft policies that take clinical recommendation discordance into consideration. Given that the funding for the National Guideline Clearinghouse has ended, there may be a need for both the public and private sectors to develop new approaches for disseminating clinical recommendations and examining the discordances between recommendations that exist. 26 Furthermore, CMS and privately-funded health plans may wish to invest in research examining how the selection of clinical recommendations used to guide decision making impacts clinical outcomes in contexts wherein multiple recommendations exist. By creating a better understanding of how discordance in clinical recommendations is impacting outcomes, it will be possible to improve the quality of care.
Footnotes
Declaration of Conflicting Interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Drs Powell, Shanser, Deshmukh, and Rao report an employment or consulting relationship with HealthHelp/WNS at the time the study was written. Ms Rogstad and Mr Long report employment by Humana Inc. Drs Powell, Shanser, and Rao and Dr Powell’s wife, Jennifer Powell, report unpaid roles serving the Synergetic Professional Guidelines Institute, a nonprofit Provider-Led Entity (PLE). Dr Powell additionally reports employment by Payer+Provider Syndicate, stock ownership of Berkshire Hathaway, Community Health Systems, CVS Health Corp, HCA Healthcare, Payer+Provider Syndicate, Quorum Health Corp, and Tenet Healthcare Corp. Dr Winchester additionally reports serving on the American College of Cardiology Accreditation Management Board and the American College of Cardiology Solution Set Oversight Committee. Dr Deshmukh additionally reports stock ownership of WNS Holdings, Johnson & Johnson, Merck, Pfizer, Express Scripts, Halyard Health Inc., Cigna, and Procter & Gamble. Dr Rao additionally reports having served as the President of the Radiological Society of North America at the time the study was written.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
