Improving Wisely Using Physician Metrics

Abstract

Unwarranted clinical variation at the physician level is a major barrier to quality improvement, and reducing this variation remains the holy grail of health care. Yet quality improvement efforts have focused largely on measuring quality at the hospital level, not at the physician level. The problem with hospital-level metrics is that they are less relevant to physicians and, as a result, are less likely to translate into changes in physician behavior. In this Commentary, we propose a path to developing and using clinician-endorsed metrics at the physician level so that data sharing can be highly actionable.

One common problem hospital leaders encounter is diffusion of responsibility, which is related to the bystander effect. Outlier clinicians can perceive that the problem is not theirs or simply disagree with the metric or its definition. For example, if a clinician learns that his or her organization has a higher-than-average rate of complications or readmissions, he or she may perceive that the result is related to another clinician’s outlier performance, or that the metric is inadequate or inappropriate for his or her clinical setting. This effect grows as the quality improvement metric aggregates practice patterns or outcomes from more and more physicians. Consider a rude, outlier flight attendant who works for a large airline. Would we expect that flight attendant to change his or her behavior when shown data on how the airline as a whole performs below the industry average on customer satisfaction surveys? Surely feedback specific to the flight attendant, showing his or her individual outlier data benchmarked to peers, would have a far more dramatic and immediate impact on his or her performance. Diffusion of responsibility may explain why pay-for-performance schemes that measure hospital performance have been shown to have a limited impact on quality.¹

Several industries outside of health care have effectively used individual feedback to improve outcomes. In 2009, the utility company Positive Energy/Opower successfully reduced overall household power consumption by decreasing variation among households.² They mailed each household a monthly or quarterly personal feedback report that compared their electricity and natural gas usage to that of similarly sized households in their area—a data feedback intervention that resulted in an overall mean reduction in household energy use. In particular, households at the extremities of the energy use bell curve modified their behavior to more closely match the mean usage of houses within their respective communities. This simple intervention also reduced the total carbon emissions of the participating houses by an equivalent of 14.3 million gallons of gas and saved more than $20 million over the yearlong study.²

Early results from analogous large-scale peer comparison studies in health care have also been promising. In one study of 47 clinical practices, 248 physicians were randomized to receive physician-specific data on inappropriate antibiotic prescribing.³ Among physicians who were notified, antibiotic prescribing rates instantly declined by 5% relative to a control group of those who did not receive a notification. Similarly, at the University of Toronto, implementing a surgeon cost report card yielded an immediate 7% reduction in the cost of operations.⁴ Finally, the Society for Vascular Surgery has also launched a campaign to notify physicians of their vessel patency rates, leading to auto-correction of physician practice patterns around best practices. These successes speak to the power of physician-level outcomes to improve patient care.

Unfortunately, there remains a paucity of physician metrics today. We reviewed the 947 metrics endorsed by the National Quality Forum, considered the “gold standard” for health care measurement in the United States, and found that only 57 (6%) could be applied at the physician level using administrative claims data. An additional 44 (5%) metrics could be used to evaluate physicians if clinical registry data were accessible at the physician level. The American College of Surgeons’ National Surgical Quality Improvement Program and the Society of Thoracic Surgeons National Database collect data on individual surgeon performance; however, an analysis of the data at the surgeon level is only performed for a few specific named operations.

Beginning in 2017, the Medicare Access and Children’s Health Insurance Program Reauthorization Act (MACRA) will allow physicians to select the quality measures that will be used to evaluate the quality of their care.⁵ Ahead of these changes, which will impact payments to most physicians, specialty associations should take the lead in creating physician-endorsed metrics, as developed for Improving Wisely, to address outlier practice patterns. The homegrown aspect of developing metrics of physician performance is critical. In 2015, the Centers for Medicare & Medicaid Services made available in their data the National Provider Identifier, enabling quality efforts to benchmark physicians to other like specialists nationwide.

Through a Robert Wood Johnson Foundation–funded project using national Medicare claims data, Johns Hopkins has partnered with specialty societies to identify physician “waste metrics” that are both meaningful and actionable. This new program is called “Improving Wisely,” modeled after the dissemination success of Choosing Wisely, also funded by the Robert Wood Johnson Foundation. Examples of these new metrics that our team has developed include biopsies per screening colonoscopy by physician, mean blocks (number of resections) per Mohs surgery by physician, and rates of elective laparoscopic versus open colon surgery by physician for standardized presentations. Confidential data sharing reports (Table 1) using national Medicare data will enable each physician to observe his or her performance relative to that of colleagues nationwide.

Table 1. Examples of Robust, Actionable Physician Metrics.

Source	Physician-Specific Metric	Partnering Organization
National Quality Forum	Percentage of patients who died from cancer receiving chemotherapy in the last 14 days of life	American Society of Clinical Oncology
National Quality Forum	Percentage of patients 2 years of age and older with acute otitis externa who were or were not prescribed systemic antimicrobial therapy	Optum
National Quality Forum	Percentage of stress urinary incontinence surgeries for which cystoscopy was used during the surgical procedure to reduce complications	American Urological Association
Improving Wisely	Number of stages per case for Mohs surgery	American College of Mohs Surgery
Improving Wisely	Number of biopsies per screening colonoscopy	American Gastroenterological Association
Improving Wisely	Percentage of elective hysterectomies performed with a laparoscopic approach	American Association of Gynecologic Laparoscopists
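
As a rough illustration of how a metric such as biopsies per screening colonoscopy could be computed per physician and compared with peers, consider the minimal Python sketch below. It is not the Improving Wisely methodology. The record layout, the physician identifiers, the procedure label, and the function names `biopsies_per_colonoscopy` and `peer_percentile` are all assumptions made for this example, and the rows are invented rather than drawn from Medicare data.

```python
from collections import defaultdict

# Hypothetical claim rows: (physician_id, procedure, biopsy_count).
# Identifiers and values are made up for illustration only.
claims = [
    ("NPI-A", "screening_colonoscopy", 0),
    ("NPI-A", "screening_colonoscopy", 1),
    ("NPI-B", "screening_colonoscopy", 3),
    ("NPI-B", "screening_colonoscopy", 5),
    ("NPI-C", "screening_colonoscopy", 1),
    ("NPI-C", "screening_colonoscopy", 1),
]


def biopsies_per_colonoscopy(rows):
    """Return {physician_id: mean biopsies per screening colonoscopy}."""
    totals = defaultdict(lambda: [0, 0])  # physician -> [biopsies, cases]
    for physician, procedure, biopsies in rows:
        if procedure != "screening_colonoscopy":
            continue  # ignore procedures outside the metric definition
        totals[physician][0] += biopsies
        totals[physician][1] += 1
    return {p: b / n for p, (b, n) in totals.items()}


def peer_percentile(rates, physician):
    """Percent of peers (excluding this physician) with a rate at or below theirs."""
    others = [r for p, r in rates.items() if p != physician]
    if not others:
        return None  # no peers to compare against
    at_or_below = sum(1 for r in others if r <= rates[physician])
    return 100.0 * at_or_below / len(others)


rates = biopsies_per_colonoscopy(claims)
for physician in sorted(rates):
    print(physician, rates[physician], peer_percentile(rates, physician))
```

A confidential report built this way would show each physician only his or her own rate and percentile. A real implementation would also need case-mix adjustment and a minimum case volume, both of which this sketch omits.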

We believe that several important principles apply for physician metrics. First, an effective, high-impact metric should be endorsed by practicing clinicians in the specialty, soliciting input from a range of doctors serving diverse patient populations. Second, the metric should be patient-centered, focusing on what it means for a patient’s quality of life and potential disability. The metric should target significant harm or waste among extreme outlier practice patterns, rather than be a way to measure small variations in practices. Third, a physician metric should be feasible to collect while minimizing reporting bias and potential gaming of the metric. Finally, a sound metric should be highly actionable for the physician. Metrics such as mortality, while easy to collect, are far less actionable than measures that provide direct insight into what the individual physician can specifically modify in his or her practice. For example, highly actionable physician metrics that can be benchmarked to peers to identify outliers include the percent of patients with bronchitis treated with antibiotics or the proportion of patients undergoing elective back surgery who did not attempt physical therapy prior to the operation.
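
One way to target extreme outliers rather than small variations is to flag only physicians whose rate lies far from the peer median, and only when they have enough cases for the rate to be stable. The sketch below does this with a modified z-score based on the median absolute deviation. That statistic, the cutoff of 3.5, the 20-case minimum, and the function name `flag_extreme_outliers` are assumptions chosen for illustration, not a method described in this Commentary, and the example rates are invented.

```python
from statistics import median


def flag_extreme_outliers(rates, case_counts, min_cases=20, threshold=3.5):
    """Return sorted IDs of physicians whose rate is far from the peer median.

    Uses the modified z-score 0.6745 * (rate - median) / MAD, which is less
    sensitive to the outliers themselves than a mean and standard deviation.
    Physicians with fewer than `min_cases` cases are not evaluated.
    """
    eligible = {p: r for p, r in rates.items() if case_counts.get(p, 0) >= min_cases}
    if len(eligible) < 3:
        return []  # too few peers for a meaningful benchmark
    med = median(eligible.values())
    mad = median(abs(r - med) for r in eligible.values())
    if mad == 0:
        return []  # no spread among peers, so the score is undefined
    flagged = []
    for physician, rate in eligible.items():
        score = 0.6745 * (rate - med) / mad
        if abs(score) > threshold:
            flagged.append(physician)
    return sorted(flagged)


# Hypothetical share of bronchitis patients treated with antibiotics.
antibiotic_rates = {"A": 0.30, "B": 0.35, "C": 0.32, "D": 0.28, "E": 0.95, "F": 0.90}
case_counts = {"A": 120, "B": 80, "C": 200, "D": 60, "E": 150, "F": 5}

print(flag_extreme_outliers(antibiotic_rates, case_counts))
```

In this invented data set, physician E is flagged, while physician F is skipped despite a similar rate because five cases fall below the assumed volume threshold. Physicians B and D differ slightly from the median and are left alone, which is the intent of focusing on extreme practice patterns.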

The potential impact of meaningful physician metrics to reduce unwarranted clinical variation is great. Physicians and hospitals in the United States spend an estimated $15.4 billion annually and countless hours reporting quality measures.⁶ Yet this endeavor has yielded limited improvements in health care from the patient standpoint. Today, waste continues to be an endemic problem, comprising up to 30% of all medical spending and accounting for an estimated $210 billion in excess spending each year.⁷ Moreover, medical care gone awry has been described as a leading cause of death in the United States.⁸

A 2009 survey of primary care physicians reported that more than 75% of physicians would be interested in learning how their own practice patterns compared with those of their colleagues.⁹ Although underutilized today, individual performance reports on a large scale remain a major opportunity to achieve value-based health care. We propose that quality improvement using physician-specific, physician-endorsed metrics presented in peer comparison reports should aim to provide actionable feedback to physicians. While recent advances in big data and technology now make measurement at the physician level possible, there remains a lack of robust metrics. Physicians and specialty associations can help by creating consensus metrics at the physician level using the principles outlined herein and by submitting metrics they believe to be meaningful to the MACRA program for broader implementation.

Footnotes

Declaration of Conflicting Interests

The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Dr. Makary has served as a consultant to Oliver Wyman’s Health Innovation Center.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Robert Wood Johnson Foundation, Grant #73417.

References

1. Jha AK. Time to get serious about pay for performance. JAMA. 2013;309:347-348.

2. Ayres, Raseman, Shih. Evidence from Two Large Field Experiments That Peer Comparison Feedback Can Reduce Residential Energy Usage. Cambridge, MA: National Bureau of Economic Research; 2009.

3. Meeker, Linder, Fox, et al. Effect of behavioral interventions on inappropriate antibiotic prescribing among primary care practices: a randomized clinical trial. JAMA. 2016;315:562-570.

4. Gunaratne, Cleghorn, Jackson TD. The surgeon cost report card: a novel cost-performance feedback tool. JAMA Surg. 2016;151:79-80.

5. Centers for Medicare & Medicaid Services. CMS quality measure development plan: supporting the transition to the merit-based incentive payment system (MIPS) and alternative payment models (APMs). https://www.cms.gov/Medicare/Quality-Initiatives-Patient-Assessment-Instruments/Value-Based-Programs/MACRA-MIPS-and-APMs/Final-MDP.pdf. Published May 2, 2016. Accessed March 22, 2017.

6. Casalino, Gans, Weber, et al. US physician practices spend more than $15.4 billion annually to report quality measures. Health Aff. 2016;35:401-406.

7. Smith, Saunders, Stuckhardt, McGinnis, eds. Best Care at Lower Cost: The Path to Continuously Learning Healthcare in America. Washington, DC: National Academies Press; 2013.

8. Makary, Daniel. Medical error—the third leading cause of death in the US. BMJ. 2016;353:i2139.

9. Sirovich, Woloshin, Schwartz LM. Too little? Too much? Primary care physicians’ views on US health care. Arch Intern Med. 2011;171:1582-1585.