Variation of fee-for-service specialist direct care work effort with patient overall illness burden

Abstract

Objective

To explore whether a common industry measure of overall patient illness burden, used to assess the total costs of members in a health plan, would be suitable to describe variation in a summary metric of utilization that assesses specialist physician direct patient care services not grouped into clinical episodes, but with exclusion criteria applied to reduce any bias in the data.

Data sources/study setting

Calendar year 2006 administrative data on 153,557 commercial members enrolled in a non-profit single-state statewide Health Maintenance Organization (HMO) and treated by 4356 specialists in 11 specialties. The health plan's global referral process and specialist fee-for-service reimbursement likely make these results applicable to the non-managed care setting, as once a global referral was authorized there was no required intervention by the HMO or referring primary care provider for the majority of any subsequent specialist direct clinical care.

Study design

Specialty-specific correlations and ordinary least-squares regression models to assess variations in specialist direct patient care work effort with patient overall illness burden, after the application of exclusion criteria to reduce potential bias in the data.

Principal findings

Statistically significant positive correlations exist between specialist direct patient care work effort and patient overall illness burden for all studied specialties. Regression models revealed a generally monotonic increasing relationship between illness burden categories and aggregate specialist direct patient care work effort. Almost all regression model differences from the reference category across specialties are statistically significant (P ≤ 0.012). Assessment of additional results demonstrates the relationship has more substantive significance in some specialties and less in others. The most substantive relationships in this study were found in the specialties of orthopaedic surgery, general surgery and interventional cardiology.

Conclusions

For many specialties, specialists do vary physician direct patient care utilization with patient overall illness burden. Accounting for patient overall health status is important to fairly compare specialists of certain specialties on utilization for health plan specialist network management. Additional study is required to evaluate health plan application of this methodology.

Introduction

When compared with peers, specialists often attribute any increased amount of direct patient care they rendered to differences in patient health status, the refrain of ‘my patients are sicker’. While measures of a patient's overall health status exist, such measures are more commonly applied to assessments of the total costs of patients in a population, such as a health plan's membership for actuarial purposes or setting capitation rates for primary care physician (PCP) assigned patient panels.¹ While belief is commonly expressed in the general concept that sicker patients use more health care overall, anecdotally non-specialist physicians and others believe the amount of direct specialty specific patient care physician services (i.e. excluding laboratory, radiology and other ancillary services) provided by specialists actually does not vary with patient overall health status (e.g. gastroenterologists perform endoscopy on every referred patient, cardiologists perform a cardiac catheterization on every referred patient). In other words, the impression is that specialists routinely provide referred patients in a relatively better overall state of health with a generally similar amount of procedural and other specialty specific physician direct care services, as they do referred patients in a relatively worse overall state of health. This perceived lack of appropriate modulation of the amount of direct face-to-face billable care delivered by specialists is considered to be a result of specialist fee-for-service reimbursement arrangements that offer a financial incentive for such behaviour.^2–4 The perception of specialists as medical practitioners whose clinical decision-making in regard to the utilization of their own physician services is driven only by the desire for revenue maximization, would likely not be a view shared by actual specialists.
Additionally, specialists (or any physician) tend to view themselves individually as better performers than their peers when asked anecdotally to compare themselves on clinical utilization or any other metric or issue being evaluated. This is consistent with what appears to be general human nature, as studied by David Dunning at Cornell, in that all individuals tend to view themselves as above average.⁵ This psychological trait is also part of American popular culture as evidenced by Garrison Keillor's⁶ fictional Lake Wobegon: a place where ‘all the women are strong, all the men are good looking, and all the children are above average’. Financial gain is, of course, influential within a medical system that exists in a capitalist society (e.g. the need to enact Medicare physician self-referral legislation, or ‘Stark laws’), but does it explain all specialist clinical behaviour?⁷ On the other hand, not all specialists are above average in their clinical skills and ability to optimize utilization decisions for different clinical scenarios. In regard to the measurement of physician efficiency for direct patient care services, the concerns of specialists on the issue of utilization variation due to patient health status do need to be addressed in order for health plans to advance the implementation of specialist comparative performance measurement processes.^8–10 While comparisons on utilization alone do not consider quality, such economic profiling is still a necessary component of any more comprehensive health care provider assessment process.¹¹ Economic profiling would bring quality comparisons into financial focus in order to determine the overall value a provider offers to a health plan and its members, as cost is not necessarily correlated with quality.¹²

Current approaches to the economic profiling of specialists generally involve episode of treatment groupers.^13–15 Vendors include Ingenix' (Symmetry) ETG™, Medstat's MEGS^® and Cave Consulting Group's Marketbasket System™. All of these products use their proprietary algorithms to group claims data into clinical episodes of care for specific medical conditions (e.g. acute myocardial infarction, congestive heart failure). There exists debate whether to account for patient health status variation (i.e. risk adjustment) when making comparisons across specialists when using episode grouper methodologies.¹⁶ ETG and MEGS offer risk adjustment calculations, while the Cave product relies on the creation of homogeneous specialty-specific episodes in an effort to reduce any effect of patient health status variation.^13,17,18

Episodes constructed by an episode grouper include all costs associated with the care of that specific medical condition only while excluding concurrent, unrelated clinical events.¹³ All costs generally include professional, facility, pharmacy, laboratory, radiology and other ancillary costs related to the defined clinical episode. The goal is to create discrete units to measure all relevant costs during an episode treatment period and then compare providers. Physician attribution rules can stipulate that at least 20% (some suggest 30% as a minimum) of total episode relevant costs cause an episode, in its totality, to be attributed to a given specialist.¹⁹ Attribution rules can result in more than one specialist assigned to the same episode. This may be professed as a barrier to acceptance by providers, as a specialist may dispute why they are assigned an episode when he or she may have only been accountable for 20–30% of the costs under review.²⁰ Additionally, a report's time frame can influence the number of completed episodes available for analysis as completed episodes may be defined over a period of time that extends outside the limits of a typical health plan 12-month reporting period.

Episode grouper methodologies do provide valuable granular information, but are more complex and less transparent, creating possible difficulties in regard to initial provider acceptance.²⁰ While simpler efficiency measures expressed as ratios have been common in health care (e.g. case-mix adjusted totaled hospital surgical patient lengths of stay divided by the total count of surgical admissions, by surgeon), few efficiency measures of this kind have had adequate formal evaluation.²¹ Whereas literature exists regarding newer episode grouper methodologies, there is a dearth of literature specifically about older and less complex efficiency measurements of physicians in particular specialties.²² A less complex and more transparent summary measure has the potential to be more readily accepted by providers as an initial screening step for comparing specialist performance on utilization alone, if the concern of ‘my patients are sicker’ can also be sufficiently addressed. Screening tools, whether for cost (utilization) or quality, are important in order to optimize health plan resource allocation when managing a large specialist provider network. While such a metric cannot be the only measure used to assess a provider, it could serve as an initial evaluation step to determine entry into a series of metrics that would be part of a more complete profiling process. Such a process could then include episode grouper high level data to reinforce the initial screen, followed by granular episode information if a provider required a more in depth review.

The purpose of this study is to explore whether a common industry measure of overall patient illness burden, used to assess the total costs of members in a health plan, would be suitable to describe variations in a summary metric of utilization that only assesses specialist physician direct patient care services that are not grouped into clinical episodes but that do have exclusion criteria applied to reduce any bias in the data. This would support the concept that specialists do in fact modulate their behaviour according to patient overall illness burden in regard to billable services for their direct patient care work effort, as well as the concept of a simple risk-adjusted utilization metric that could have utility as an initial utilization performance screening step that would have a place within a comprehensive specialist performance profiling process.

Methods

Setting

Blue Care Network of Michigan (BCN) is a non-profit statewide Health Maintenance Organization (HMO) and wholly owned subsidiary of Blue Cross Blue Shield of Michigan. BCN had approximately 450,000 total commercial members in 2006. BCN service areas and membership are divided among four geographic regions for administrative purposes. This study used BCN administrative data (i.e. paid claims, member and physician demographics) from calendar year 2006.

During the study period, all specialists were paid by BCN on a fee-for-service basis. While member self-referral to a specialist was not allowed for the vast majority of BCN membership during the study period, only the member's PCP's authorization was required for a referral to a BCN contracted specialist. BCN itself did not review such referrals when sanctioned by the PCP. Except for a limited list of selected procedures that were reviewed when requested by any provider, initial and ongoing care rendered by BCN contracted specialists did not require BCN medical review or prior authorization once the initial referral was made by the member's PCP. Ongoing PCP authorizations were also not required for any care provided during an already approved referral. These ‘global referrals’ could be up to one year in duration before requiring renewal. Only initial referrals to specialists not contracted with BCN were reviewed and required authorization by BCN for payment, except for the BCN Self Referral Option certificate which comprised <1% of total BCN commercial membership as of 1 January 2007 (0.4% as of 1 January 2006 and 0.79% as of 1 January 2007). Once an initial referral to a specialist not contracted with BCN was approved, the global referral process applied as with BCN contracted specialists.

Illness burden metric

Overall member illness burden (i.e. risk) was assessed for each of the studied members based on BCN claims data using DxCG^® software (Verisk Health, Waltham, MA, USA), which uses Diagnostic Cost Group/Hierarchical Condition Category (DCG/HCC) models. DCG/HCC designations are derived from International Classification of Diseases, Ninth Revision (ICD-9) diagnosis codes from paid claims data.²³ The basic components of the DCG/HCC model are Diagnostic Groups, which are used to combine related conditions into Condition Categories, which are then organized into hierarchies. Risk is assessed using only the highest cost category, with cost based on a large national representative commercially insured population sample. Age and sex are also components of the disease groupings and score calculation. This same methodology has been used by the Centers for Medicare and Medicaid Services (CMS) for Medicare HMO capitation payment calculations since 2004 and is the risk adjustment component of Medstat's MEGS^® episode grouper.^18,24

Concurrent models (year 1 data to explain year 1 overall illness burden), as opposed to predictive models (year 1 data to predict year 2 illness burden), are primarily useful for retrospective analyses such as profiling because knowing all the medical conditions treated during a time period is especially germane when assessing the resources required during that same retrospective time period.²⁵ A comparison of different claims based methods of health-risk assessment found that the DxCG^® concurrent methodology had the highest R² value compared with six other concurrent-model methodologies (0.564 with claims truncated at $50,000).²³

An output of DxCG^® software is aggregated diagnostic cost groups (ADCGs) at the member level. Output values are on an interval scale from 1 to 5 and represent five set ranges of adjusted yearly resource use (model expected costs as per the DxCG^® concurrent or predictive models), with 5 being the category representing the greatest illness burden. ADCG categories act to reduce the effect of extreme outliers in the data as well as create a manageable number of illness burden divisions for selected analyses. For BCN, ICD-9 codes from all member inpatient and outpatient claims (excluding laboratory, radiology, and pharmacy) during all of 2006 were used in the DxCG^® commercial population concurrent model to calculate concurrent BCN member ADCG values (1–5) for this study.

Another output of the DxCG^® software is relative risk scores (RRSs). Rather than the ADCG set ranges of model yearly expected costs, RRS represent total model yearly expected costs for a health plan member normalized to a benchmark average, with the benchmark represented by the large national representative commercially insured population sample. The time span of health plan enrollment is not accounted for in a single member's RRS or ADCG values. Therefore, RRS and ADCG values for yearly expected costs for a single member would be overstated for members with less than 12 months of eligibility in a given annual measurement period. When individual member RRS values are rolled up and reported at the level of a larger group (e.g. health plan, PCP patient panel, etc.), the number of months of member health plan eligibility is included as a weighting factor to account for variation in member eligibility during a given 12-month reporting period. RRS values are continuous and can range from 0 to infinity, with a value of 1 being equal to the benchmark average. RRS values would have more utility as a continuous variable in routine health plan report production over categorical ADCG values when constructing a ratio measure of utilization that accounts for overall member illness burden.
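The eligibility weighting described above can be sketched as a member-month weighted mean. This is a minimal illustration under stated assumptions, not the DxCG^® roll-up algorithm itself: the function name `group_rrs`, the weighting formula and the sample values are all hypothetical.

```python
def group_rrs(members):
    """Member-month weighted mean RRS for a group.

    `members` is a list of (rrs, eligible_months) pairs. Weighting by
    eligible months is an assumed reading of the roll-up described in
    the text, not the vendor's documented formula.
    """
    total_months = sum(months for _, months in members)
    if total_months == 0:
        raise ValueError("group has no eligible member months")
    return sum(rrs * months for rrs, months in members) / total_months


# Hypothetical panel: two full-year members and one enrolled for 3 months.
panel = [(0.8, 12), (1.5, 12), (4.0, 3)]
unweighted = sum(rrs for rrs, _ in panel) / len(panel)
print(round(unweighted, 3), round(group_rrs(panel), 3))
```

In this hypothetical panel the short-enrollment member pulls the unweighted mean well above the weighted one, which is the overstatement the weighting is meant to dampen.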

Exclusion criteria overview

Various exclusion criteria were used in an effort to create homogeneous groupings of paid claims data, members and specialists for the specialties included in this study (Table 1). This was done in order to mitigate any bias that may be present in these data so as to obtain as accurate an assessment as possible of the relationship between specialist direct patient care work effort and patient overall health status. These potential confounding elements are the same general issues specialists would be aware of and argue need to be addressed for fair comparisons between specialists (i.e. composition of the peer group and how the peer group and individual physicians are measured). Certain specific member exclusion criteria were planned ahead of the extraction of data from the BCN data warehouse, and were based on a combination of past internal BCN analyses and the face validity of these exclusions to create more homogeneous cohorts. These criteria restricted the study to the BCN commercial population only (i.e. no Medicaid, Medicare or Medicare supplemental), to a specific age range only (ages 18–65) and to members who were continuous health plan enrollees during the 12-month study period. Therefore, for the first two of these three criteria there are no readily available counts of excluded members. The number of members excluded due to not being enrolled with BCN for all 12 months of 2006 (after excluding for age and whether part of a government program) was 42,557. Other excluded member counts will be noted (Table 2) and additional explanation of the exclusion criteria is included below and in the appendix.

Table 1

Specialties and abbreviations

Abbreviation	Specialty
AI	Allergy/Immunology
CI	Cardiology Interventional
DE	Dermatology
GI	Gastroenterology
GS	General Surgery
GY	Gynaecology
OP	Ophthalmology
OR	Orthopaedic Surgery
OY	Otolaryngology
PD	Podiatry
UR	Urology

Table 2

Calendar year 2006 excluded and final analysis count of members, by specialty

Specialty	Members for study*	Unclassified CPT code^†	Non-BCN region^‡	Unclassified CPT code and non-BCN region^§	Total excluded members	Percent excluded members	Final analysis member count
AI	2980	2	2	0	4	0.1%	2976
CI	18,493	4	155	0	159	0.9%	18,334
DE	12,770	0	91	0	91	0.7%	12,679
GI	14,436	0	87	0	87	0.6%	14,349
GS	12,461	58	267	1	326	2.6%	12,135
GY	39,382	12	322	1	335	0.9%	39,047
OP	16,641	2	19	0	21	0.1%	16,620
OR	14,732	7	359	2	368	2.5%	14,364
OY	7372	6	219	0	225	3.1%	7147
PD	7958	0	19	0	19	0.2%	7939
UR	8020	9	44	0	53	0.7%	7967
TOTAL	155,245	100	1584	4	1688	1.1%	153,557

AI, Allergy/Immunology; CI, Cardiology, Interventional; DE, Dermatology; GI, Gastroenterology; GS, General Surgery; GY, Gynaecology; OP, Ophthalmology; OR, Orthopaedic Surgery; OY, Otolaryngology; PD, Podiatry; UR, Urology

*After initial exclusion criteria were applied: commercial members aged 18–65 and continuously enrolled in 2006

^†Excluded members when unclassified CPT code exclusion criterion was met

^‡Excluded members when non-BCN region exclusion criterion was met

^§Excluded members when exclusion criteria noted in this table were both met

Paid claims data

Professional services claims data from individual specialists during calendar year 2006 with 90 days run-out (lag period to allow for submission of claims from providers) were aggregated at the member level and then had additional exclusion criteria applied. Such data reflect only what the specialist directly billed to BCN and thus are known to be 100% attributable to the billing provider. All facility, pharmacy and any other paid claims related to a specialist treated member (but billed by other than the treating specialist) were excluded. Facility claims for a member can be substantially influenced by providers other than a single specialist, creating attribution issues. BCN cannot differentiate using claims data as to who ordered a test or procedure, or who sent the patient to the hospital, only who billed for what was done. Pharmacy claims data are influenced by attribution issues as well as formulary and benefit design factors.²⁶ Additionally, a member may not even have prescription drug coverage. Thus, many aspects of pharmaceutical utilization are outside the control of a treating specialist.

Claim lines are the itemization of billed American Medical Association (AMA) Current Procedural Terminology (CPT) codes as listed on a provider's insurance claim form. Professional claims data were organized at the claim line level by treated member and billing provider. Each claim line data record contained the elements of treated member identifier, billing provider identifier, date of service, billed CPT code, work relative value units (WRVUs) for that CPT code and the amount paid by BCN for that CPT code. WRVU values used were those current for the year 2006. WRVU values measure the amount of direct physician labour ascribed to a given CPT code as determined by the AMA.²⁷ These values are available on an annual basis for purchase from medical claim coding vendors by individual physicians or organizations.²⁸ The claim line data table had exclusion criteria applied line by line in regard to certain defined attributes. While paid amounts were available and used as part of the exclusion criteria to mitigate a potential source of bias, this study did not use paid amounts to assess specialist work effort. Instead WRVU values were utilized to create a uniform metric of labour across all specialists, unbiased by any specialists with individual contractual variations from the standard BCN fee schedule, or payment variations due to a physician's non-BCN contracted status.
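The claim line structure described above can be rolled up to the member level as in the following minimal sketch. The field names, the `specialty_of` lookup and all sample values are hypothetical illustrations, not BCN's actual schema or real WRVU values.

```python
from collections import defaultdict


def total_wrvu_by_member(claim_lines, specialty_of):
    """Sum WRVUs per (member, specialty) across all billing specialists."""
    totals = defaultdict(float)
    for line in claim_lines:
        specialty = specialty_of[line["provider_id"]]
        totals[(line["member_id"], specialty)] += line["wrvu"]
    return dict(totals)


# Hypothetical claim lines; CPT codes and WRVU values are placeholders.
claim_lines = [
    {"member_id": "M1", "provider_id": "P1", "date_of_service": "2006-02-01",
     "cpt": "CPT_A", "wrvu": 1.5, "paid": 80.0},
    {"member_id": "M1", "provider_id": "P2", "date_of_service": "2006-03-15",
     "cpt": "CPT_B", "wrvu": 2.25, "paid": 130.0},
    {"member_id": "M1", "provider_id": "P3", "date_of_service": "2006-05-20",
     "cpt": "CPT_C", "wrvu": 0.5, "paid": 30.0},
    {"member_id": "M2", "provider_id": "P1", "date_of_service": "2006-07-04",
     "cpt": "CPT_A", "wrvu": 1.5, "paid": 95.0},
]
# Hypothetical provider-to-specialty lookup using the Table 1 abbreviations.
specialty_of = {"P1": "OR", "P2": "OR", "P3": "DE"}

totals = total_wrvu_by_member(claim_lines, specialty_of)
print(totals[("M1", "OR")])
```

The two `CPT_A` lines carry different paid amounts but the same WRVU value, which illustrates why summing WRVUs rather than payments gives a labour measure that does not depend on fee schedule differences.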

Analytic approach

Ordinary least squares (OLS) linear regression analysis using dummy variables was used for the initial portion of the analysis. Eleven separate models, one for each specialty, were specified. For all models the dependent variable was continuous and equaled the sum of all WRVU values accumulated by an included study member during 2006 from all the treating specialists of that specialty. The same member, and their associated DxCG^® output, could be present in more than one model if treated by providers in more than one specialty (and met all inclusion criteria) during the 2006 calendar year.

Overall illness burden, as the independent variable, was specified by four categorical member ADCG value dummy variables, with ADCG = 1 as the reference category (referent). Statistical significance from the referent for each of the four categorical coefficients was part of the statistical output along with confidence intervals for each of the four categorical coefficient estimates. The confidence intervals were evaluated for overlap with each other. The presence or absence of overlap was taken to indicate whether the estimated categorical coefficients, irrespective of their significance relative to the referent, were significantly different from each other. As four coefficients were estimated (ADCG 2, 3, 4 and 5), three coefficient confidence interval comparisons were made for any overlap using the output of the models: ADCG 2 to 3, ADCG 3 to 4 and ADCG 4 to 5.
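The model specification and the interval comparison can be illustrated with a minimal sketch. It relies on the fact that an OLS model with an intercept and one dummy per non-referent category reproduces the group means, so each coefficient equals that category's mean minus the referent mean. The function names and data are hypothetical, and the intervals use a normal quantile as a large-sample approximation, which is an assumption rather than the study's documented software output.

```python
from statistics import NormalDist, mean


def dummy_ols(wrvu_by_adcg, referent=1, level=0.95):
    """Return {adcg: (coef, ci_low, ci_high)} for non-referent categories."""
    means = {k: mean(v) for k, v in wrvu_by_adcg.items()}
    n_total = sum(len(v) for v in wrvu_by_adcg.values())
    # Pooled residual variance: SSE / (N - number of estimated parameters).
    sse = sum((y - means[k]) ** 2 for k, v in wrvu_by_adcg.items() for y in v)
    s2 = sse / (n_total - len(wrvu_by_adcg))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation
    n_ref = len(wrvu_by_adcg[referent])
    estimates = {}
    for k in sorted(wrvu_by_adcg):
        if k == referent:
            continue
        coef = means[k] - means[referent]
        se = (s2 * (1 / len(wrvu_by_adcg[k]) + 1 / n_ref)) ** 0.5
        estimates[k] = (coef, coef - z * se, coef + z * se)
    return estimates


def count_adjacent_overlaps(estimates):
    """Count overlapping intervals for ADCG 2 to 3, 3 to 4 and 4 to 5."""
    keys = sorted(estimates)
    overlaps = 0
    for a, b in zip(keys, keys[1:]):
        _, low_a, high_a = estimates[a]
        _, low_b, high_b = estimates[b]
        if low_a <= high_b and low_b <= high_a:
            overlaps += 1
    return overlaps


# Hypothetical member WRVU totals for one specialty, keyed by ADCG.
data = {
    1: [1.0, 2.0, 3.0],
    2: [3.0, 4.0, 5.0],
    3: [6.0, 7.0, 8.0],
    4: [6.5, 7.5, 8.5],
    5: [20.0, 21.0, 22.0],
}
estimates = dummy_ols(data)
print(count_adjacent_overlaps(estimates))
```

With thousands of members per specialty, as in this study, the normal quantile and the t quantile are practically identical; for small samples a t quantile would be the appropriate choice.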

Since RRS, rather than ADCG values, would be utilized as part of a ratio measure in routine health plan report production, correlation coefficients by specialty were also calculated. Spearman's rho was used as a non-parametric method to establish whether a relationship exists between the rank-order assignments of member total WRVU and member RRS values. Spearman's rho results are reported in the same manner as the Pearson's correlation coefficient (r): −1 to 1, with 0 signifying no relationship between two rank-ordered variables. While changes in specific original variable values would alter a Pearson's correlation coefficient, as long as the rank-ordering of that variable was not affected Spearman's rho would not change.
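The rank-based behaviour described above can be shown with a small sketch that computes Spearman's rho as the Pearson correlation of average ranks. The helper names and the five member values are hypothetical and are not taken from the study data.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical member RRS values and total WRVU values.
rrs = [0.4, 0.9, 1.3, 2.5, 6.0]
wrvu = [1.2, 3.0, 2.1, 5.5, 9.0]
# Inflate the largest WRVU value without changing the rank order.
stretched = [1.2, 3.0, 2.1, 5.5, 90.0]

print(round(spearman_rho(rrs, wrvu), 3), round(spearman_rho(rrs, stretched), 3))
print(round(pearson(rrs, wrvu), 3), round(pearson(rrs, stretched), 3))
```

Inflating the largest WRVU value changes Pearson's r but leaves Spearman's rho unchanged, because the rank order of the members is preserved.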

The correlation approach provides an assessment in regard to using DxCG^® output in a simple ratio measure (e.g. sum of a member's specialty-specific WRVU values divided by that member's RRS). The regression approach provides distinct and important information in regard to any existing monotonic patterns between the two variables, along with the magnitude of WRVU differences along the spectrum of overall illness burden categories. Controlling for confounders was done by creating homogeneous data-sets through the application of exclusion criteria to support the use of both parsimonious regression models and simple correlation. Creating homogeneous data-sets in this manner also more closely replicates what would be practical for routine, ongoing report production by a health plan.
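The simple ratio measure mentioned above can be sketched as follows. The function name `risk_adjusted_wrvu`, the handling of non-positive RRS values and the sample figures are assumptions made for illustration only.

```python
def risk_adjusted_wrvu(total_wrvu, rrs):
    """Member's specialty-specific WRVU total divided by the member's RRS."""
    if rrs <= 0:
        # Assumed guard: a ratio is undefined for a non-positive RRS.
        raise ValueError("RRS must be positive to form the ratio")
    return total_wrvu / rrs


# Two hypothetical members with the same raw WRVU total.
healthier = risk_adjusted_wrvu(6.0, 0.5)  # RRS below the benchmark of 1
sicker = risk_adjusted_wrvu(6.0, 2.0)     # RRS above the benchmark of 1
print(healthier, sicker)
```

The same raw work effort yields a higher adjusted value for the healthier member, so a specialist who treats relatively sicker patients is not penalized for the additional WRVUs those patients accrue.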

The exclusion criteria applied to paid claims data can be clearly explained to providers and creates the basis for a more equitable comparison of specialists on utilization. This study then evaluates whether specialist utilization data (after exclusion criteria applied) can be demonstrated to have a sufficient statistical relationship with a common industry measure of patient overall illness burden to conceptually support comparisons using a simple ratio measure that incorporates, and thus accounts for, patient overall illness burden.

Results

The distribution of the dependent variable (aggregate physician WRVU values by member by specialty) is positively skewed. The distributions of member ADCG and RRS values are also positively skewed. Spearman's rho was used for the correlations; it makes no assumptions about linearity, with comparisons based upon the rank order of the correlated variables. ADCG values were modelled as dummy variables and not specified as a single variable with this distribution in the OLS regression equations. Various regression model diagnostics were evaluated, considering these data represent convenience samples for each specialty as opposed to random samples. These samples, while convenient, are large, mitigating the effect of any violations of basic assumptions in this social sciences analysis.^29–31

No issues with collinearity were identified in any of the 11 regression models and residual statistics revealed no outliers. The distribution of residual errors revealed some departure from normality on visual inspection of normal probability plots of the standardized residuals. These departures from normality varied in magnitude by specialty with some minimal and none major. Plots of Studentized deleted residuals by standardized predicted values revealed some moderate heteroscedasticity that varied by specialty.

Table 3 is the count of all studied members and their treating specialists (by specialty and ADCG) in 2006. These are the members included in the regression models after all exclusion criteria were applied as previously described. Total BCN commercial credentialed (available surrogate for contracted status) specialist counts are also included in Table 3. Available administrative data do not provide the contracted status of a given member's specialist on a given date of service. The network status of physicians (credentialing and contracting) is a dynamic process both during and across years. The credentialed specialist counts are reported to delineate the size of the BCN network and demonstrate that, other than for the previously described explicit exclusion criteria, no specialist was excluded from these data.

Table 3

Final count of study members* and treating specialists by ADCG^† by specialty

			Members
Specialty	Total BCN network specialists^‡	Total study treating specialists^§	ADCG 1	ADCG 2	ADCG 3	ADCG 4	ADCG 5	Total study members
AI	122	133	1078	1315	308	215	60	2976
CI	589**	426^††	1116	8272	3434	3407	2105	18,334
DE	203	216	4960	5466	1291	704	258	12,679
GI	272	282	1469	7946	2266	1716	952	14,349
GS	536	595	1794	4873	2316	1911	1241	12,135
GY	878^‡‡	947	15,889	15,380	4824	2456	498	39,047
OP	485	495	4804	7304	2267	1514	731	16,620
OR	497	521	2846	6886	2266	1749	617	14,364
OY	181	221	1631	3527	1009	646	334	7147
PD	262	265	2018	3819	1044	702	356	7939
UR	244	255	1552	3655	1340	907	513	7967
Specialty	Total BCN network specialists^‡	Total study treating specialists^§	ADCG 1 (%)	ADCG 2 (%)	ADCG 3 (%)	ADCG 4 (%)	ADCG 5 (%)	Total study members (%)
AI	122	133	36.2	44.2	10.3	7.2	2.0	100
CI	589**	426^††	6.1	45.1	18.7	18.6	11.5	100
DE	203	216	39.1	43.1	10.2	5.6	2.0	100
GI	272	282	10.2	55.4	15.8	12.0	6.6	100
GS	536	595	14.8	40.2	19.1	15.7	10.2	100
GY	878^‡‡	947	40.7	39.4	12.4	6.3	1.3	100
OP	485	495	28.9	43.9	13.6	9.1	4.4	100
OR	497	521	19.8	47.9	15.8	12.2	4.3	100
OY	181	221	22.8	49.3	14.1	9.0	4.7	100
PD	262	265	25.4	48.1	13.2	8.8	4.5	100
UR	244	255	19.5	45.9	16.8	11.4	6.4	100

*After all exclusion criteria have been applied

^†ADCG: aggregated diagnostic cost group

^‡Specialists who were credentialed with the BCN commercial product during at least some, but not necessarily all, of 2006

^§May be greater than network totals due to members treated by specialists not BCN credentialed on a member's date of service

**Includes all cardiologists as counted by BCN credentialing (interventional distinction not part of the BCN credentialing process)

^††Includes only interventional cardiologists, as defined in this study

^‡‡Includes both OB/GYN and GYN alone providers, as counted by BCN credentialing

Correlations were done, by specialty, and used data from members across all final ADCG counts (Table 3 ‘Total Study Members’ column). Complete output from the regression models is available in the appendix (Tables A1 and A2). Table 4 contains a summary of all results. Included, by specialty, are the average observed WRVU values for specialty-treated members across the five ADCG categories, mean of the average observed WRVU differences for each of the four ADCG intervals, the number (out of a maximum of 3) of confidence interval overlaps for regression model ADCG coefficient estimates and Spearman's rho correlation coefficients. A percentile ranking was determined for each mean of the observed average WRVU change per ADCG interval across the 11 specialties and the entire table then rank ordered by these percentile values. This created three groupings using quartile cut-offs in a box plot style approach. Significance test results for the Spearman's rho correlation coefficients are also included.
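The derived columns of Table 4 can be reproduced from the observed averages with a short sketch. The observed values are copied from Table 4; the percentile rank definition used here (the share of the other specialties with a smaller value) is an assumption that happens to reproduce the table's three groupings and is not necessarily the calculation the authors used.

```python
# Observed average WRVU for ADCG 1..5, by specialty (from Table 4).
observed = {
    "OR": [4.38, 6.02, 8.60, 13.18, 17.25],
    "GS": [3.97, 5.58, 8.39, 10.46, 15.94],
    "CI": [1.49, 2.14, 3.49, 6.91, 12.53],
    "UR": [3.29, 5.14, 7.84, 8.33, 10.59],
    "GI": [3.55, 4.81, 5.31, 6.14, 8.98],
    "GY": [1.95, 3.57, 4.80, 5.58, 6.86],
    "OY": [3.67, 4.41, 5.81, 6.20, 8.48],
    "OP": [2.85, 3.59, 5.32, 5.78, 6.02],
    "PD": [3.13, 3.79, 4.49, 4.94, 5.72],
    "AI": [1.69, 1.99, 2.24, 2.29, 2.50],
    "DE": [2.65, 3.20, 3.15, 3.22, 2.88],
}


def interval_changes(means):
    """Differences between adjacent ADCG categories (four intervals)."""
    return [b - a for a, b in zip(means, means[1:])]


def mean_interval_change(means):
    changes = interval_changes(means)
    return sum(changes) / len(changes)


def is_monotonic_increasing(means):
    return all(change > 0 for change in interval_changes(means))


def percentile_ranks(values):
    """Assumed definition: share of the other entries with a smaller value."""
    n = len(values)
    return {k: sum(other < v for other in values.values()) / (n - 1)
            for k, v in values.items()}


def grouping(rank):
    if rank > 0.75:
        return "Above 75th Percentile"
    if rank < 0.25:
        return "Below 25th Percentile"
    return "25th to 75th Percentile"


changes = {k: mean_interval_change(v) for k, v in observed.items()}
groups = {k: grouping(r) for k, r in percentile_ranks(changes).items()}
non_monotonic = [k for k, v in observed.items() if not is_monotonic_increasing(v)]

for k in sorted(changes, key=changes.get, reverse=True):
    print(k, f"{changes[k]:.2f}", groups[k])
```

Because the four interval changes telescope, the mean change equals the ADCG 5 average minus the ADCG 1 average, divided by four.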

Table 4

Results summary for treated members, by specialty

Specialty	Observed average WRVU* for ADCG^† 1 members	Observed average WRVU for ADCG 2 members	Observed average WRVU for ADCG 3 members	Observed average WRVU for ADCG 4 members	Observed average WRVU for ADCG 5 members	Mean ADCG interval observed average WRVU change	Mean ADCG interval observed average WRVU change percentile rank	Count of overlapping confidence intervals for ADCG 2–5 regression model estimates	Spearman's rho correlation coefficient: RRS^‡ versus WRVU
OR	4.38	6.02	8.60	13.18	17.25	3.22	Above 75th Percentile	0	0.249
GS	3.97	5.58	8.39	10.46	15.94	2.99		0	0.312
CI	1.49	2.14	3.49	6.91	12.53	2.76		0	0.269
UR	3.29	5.14	7.84	8.33	10.59	1.83	25th to 75th Percentile	1	0.210
GI	3.55	4.81	5.31	6.14	8.98	1.36		0	0.213
GY	1.95	3.57	4.80	5.58	6.86	1.23		0	0.261
OY	3.67	4.41	5.81	6.20	8.48	1.20		1	0.156
OP	2.85	3.59	5.32	5.78	6.02	0.79		2	0.076
PD	3.13	3.79	4.49	4.94	5.72	0.65	Below 25th Percentile	2	0.111
AI	1.69	1.99	2.24	2.29	2.50	0.20		3	0.115
DE	2.65	3.20	3.15	3.22	2.88	0.06		3	0.075

*WRVU: physician work relative value units

^†ADCG: aggregated diagnostic cost group

^‡RRS: relative risk score

These results depict a generally monotonic increasing relationship between ADCG and aggregate specialist physician WRVU values per specialty treated member over the 12-month study period in 2006. All differences from ADCG 1 across specialties are statistically significant (P ≤ 0.012), except for dermatology ADCG 5 which is also considerably under-powered at 0.163. There were no ADCG coefficient estimate confidence interval overlaps for orthopaedic surgery, general surgery, interventional cardiology, gastroenterology or gynaecology, indicating that for these specialties the ADCG 2–5 WRVU estimates were not only all significantly greater than ADCG 1 but also significantly different from each other. Urology and otolaryngology each had one overlap, ophthalmology and podiatry had two, while allergy and dermatology each had the maximum possible three overlaps. All the correlation coefficients were statistically significant with P < 0.01.

While higher confidence interval overlap counts could result from low statistical power between ADCG coefficients, overlap counts can be evaluated in the context of the other specialty-specific values in Table 4. For example, dermatology not only has three confidence interval overlaps but also exhibits no monotonic increasing relationship, ranks last with a near-zero value of 0.06 for the mean ADCG category interval WRVU change, and has the lowest Spearman's rho value.
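The summary columns of Table 4 can be cross-checked directly from the observed averages. The sketch below is illustrative only: it assumes that the mean ADCG interval change is the average of the four successive category-to-category differences, which is an assumption about how that column was derived rather than a stated method. The function names are hypothetical, and the inputs are the rounded values printed in the table, so small rounding differences from the published column are expected.

```python
# Observed average WRVU for ADCG 1..5, copied from Table 4 (rounded values).
observed_wrvu = {
    "OR": [4.38, 6.02, 8.60, 13.18, 17.25],
    "GS": [3.97, 5.58, 8.39, 10.46, 15.94],
    "CI": [1.49, 2.14, 3.49, 6.91, 12.53],
    "UR": [3.29, 5.14, 7.84, 8.33, 10.59],
    "GI": [3.55, 4.81, 5.31, 6.14, 8.98],
    "GY": [1.95, 3.57, 4.80, 5.58, 6.86],
    "OY": [3.67, 4.41, 5.81, 6.20, 8.48],
    "OP": [2.85, 3.59, 5.32, 5.78, 6.02],
    "PD": [3.13, 3.79, 4.49, 4.94, 5.72],
    "AI": [1.69, 1.99, 2.24, 2.29, 2.50],
    "DE": [2.65, 3.20, 3.15, 3.22, 2.88],
}

# Published "mean ADCG interval observed average WRVU change" column.
published_mean_change = {
    "OR": 3.22, "GS": 2.99, "CI": 2.76, "UR": 1.83, "GI": 1.36, "GY": 1.23,
    "OY": 1.20, "OP": 0.79, "PD": 0.65, "AI": 0.20, "DE": 0.06,
}


def mean_interval_change(values):
    """Average of the successive differences between adjacent ADCG categories."""
    steps = [later - earlier for earlier, later in zip(values, values[1:])]
    return sum(steps) / len(steps)


def is_monotonic_increasing(values):
    """True when every ADCG category average exceeds the one before it."""
    return all(later > earlier for earlier, later in zip(values, values[1:]))


# Specialties whose observed averages do not rise at every ADCG step.
non_monotonic = [s for s, v in observed_wrvu.items() if not is_monotonic_increasing(v)]

for specialty, values in observed_wrvu.items():
    change = mean_interval_change(values)
    print(f"{specialty}: mean interval change {change:.4f}, "
          f"published {published_mean_change[specialty]:.2f}, "
          f"monotonic {is_monotonic_increasing(values)}")
```

Under this assumption the recomputed values agree with the published column to within rounding, and dermatology is the only specialty whose observed averages are not monotonic increasing.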

Discussion

This study provides evidence that the overall illness burden of patients has a largely monotonic increasing relationship with the aggregate amount of direct physician professional services provided to those same patients when treated by physicians of a given specialty, and that in many specialties providers do appear to modulate their direct patient care clinical practice behaviour as a result. The variation in how robust this relationship is (e.g. high for general surgery and essentially absent for dermatology) may reflect the span of WRVU values for services within the purview of a given specialty, or a smaller clinical impact of overall illness burden on treatment options within a specialty, rather than providers in those specialties systematically disregarding the illness burden of their patients. Thus, overall illness burden is an important factor that should be accounted for when comparing specialists of certain specialties on utilization, e.g. by risk adjustment of specialist WRVU values by patient overall health status. Different initial screening methods may be appropriate for specialties for which this relationship cannot be established (e.g. dermatology). Aggregate direct physician services, measured by WRVU values as described in this study, are a simple and transparent metric for the utilization component. WRVU values also have the benefit of assessing what the specialist can be held directly responsible for. The application of specific exclusion criteria to paid professional claims data can be used to create more homogeneous groupings of physicians and treated members, mitigating potential sources of bias. These criteria, which address office capacity, age extremes, CPT coding issues, fee schedule differences and others, can be clearly articulated to physicians so that providers can understand the efforts made to ensure comparisons are as fair as possible.
CPT codes and WRVU values are both readily and independently available to the provider. In addition, these exclusion criteria could generally be implemented by any health plan. Assessment of overall patient health status in this analysis used DxCG^® software. Although this software is proprietary, and a practicing physician would neither purchase it nor be able to use it without health plan level data, the available literature and its use by CMS and others make this methodology less of an unknown to the provider community.

Sample sizes in this study were large, as reflected by the mostly high observed power values for the variables in the different specialty regression models. The ranking of mean WRVU interval changes and the other reported results illustrate that the relationship of WRVU to overall illness burden, while statistically significant for almost all specialties, has more substantive significance in some specialties and less in others. Dermatology has nearly no relationship. Allergy and podiatry do have a monotonic increasing relationship, but it is less robust than in the other studied specialties. The relationships with the greatest magnitude are for orthopaedic surgery, general surgery and interventional cardiology. The remaining specialties fall between these two ends of the spectrum in this study.

Although this analysis was conducted in a managed care environment, the BCN global referral process and specialist fee-for-service reimbursement likely make these results generalizable to the non-managed care setting, as once the global referral was authorized there was no required intervention by BCN or the PCP for the majority of any subsequent specialist clinical care. However, variations in health plan processes, member benefit design, or unmeasured member and physician characteristics may mean that these results are not generalizable to other populations such as Medicare or Medicaid, to high deductible (e.g. more than $1000 per year) insurance products, or to other specialties. In addition, these relationships may not translate to data derived from health plans with smaller memberships, because of the resulting smaller sample sizes, or from plans using different member illness burden software.

Conclusion

For many specialties, specialists do increase the amount of total physician direct care services rendered to a patient in response to an increased overall health burden of that patient, and do not just provide generally the same amount of physician direct care to every referred individual. Addressing overall patient health status is an important component of evaluating specialists more fairly on utilization in certain specialties when measured by WRVU values, a simple and transparent metric for which the specialist is known to be 100% accountable as described in this study. These results support the concept of adjusting physician WRVU values, which have less potential to be biased after the application of specific data exclusion criteria, by patient overall health status (e.g. sum of member accumulated WRVU values divided by member RRS). Such a risk-adjusted utilization ratio measure, which specialists may perceive as both simple and fair, could be used as an initial screening step for health plan network management to compare specialists on utilization alone, in specialties for which this relationship applies. Specialist acceptance is a necessary step for any profiling process in order to move beyond objections to anecdotal views of specialist-driven overuse expressed by non-specialist individuals, and thus to develop systems to better evaluate the true value (cost and quality) of specialists relative to their peers. Additional study is required to evaluate health plan application of this approach.
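As an illustration of the risk-adjusted utilization ratio suggested above, the following minimal sketch divides a specialist's summed member WRVU values by the summed RRS of the same members. This is one possible reading of "sum of member accumulated WRVU values divided by member RRS"; the function name, the data layout and the member values are hypothetical and are not taken from the study data.

```python
def risk_adjusted_utilization(members):
    """Return total WRVU divided by total RRS for one specialist.

    `members` is a list of (accumulated_wrvu, rrs) pairs, one pair per
    treated member. Summing both quantities before dividing is an
    assumption about how the ratio would be formed.
    """
    total_wrvu = sum(wrvu for wrvu, _ in members)
    total_rrs = sum(rrs for _, rrs in members)
    if total_rrs <= 0:
        raise ValueError("total RRS must be positive")
    return total_wrvu / total_rrs


# Hypothetical specialists: B treats sicker members and bills twice the WRVU.
specialist_a = [(4.0, 0.5), (6.0, 1.5)]   # total WRVU 10.0, total RRS 2.0
specialist_b = [(12.0, 2.0), (8.0, 2.0)]  # total WRVU 20.0, total RRS 4.0

ratio_a = risk_adjusted_utilization(specialist_a)
ratio_b = risk_adjusted_utilization(specialist_b)
print(ratio_a, ratio_b)
```

In this made-up example the two specialists have the same ratio even though specialist B's raw WRVU total is twice that of specialist A, because B's members carry twice the illness burden.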

Footnotes

Acknowledgements

I thank Dean G Smith, PhD, Professor and Senior Associate Dean for Administration at the University of Michigan School of Public Health, for his statistical review and review of the manuscript; and James DiMaria, Manager BCN Medical Informatics, for his support in providing various necessary data extracts. This study was conducted using the resources of Blue Care Network of Michigan while I was a full-time employee of Blue Care Network of Michigan.

References

1. Sevcik, Aub-Jaber, Marek. Understanding approaches to case-mix assessment and case-mix adjustment. J Healthcare Qual. Online September/October 2004: pW5-24–W5-29. See http://www.nahq.org/uploads/files/Understanding_Approaches.pdf (last checked 14 December 2010)

2. Bodenheimer, Berenson, Rudolf. The primary care–specialty income gap: why it matters. Ann Intern Med 2007;146:301–6

3. Carroll. How doctors are paid now, and why it has to change. Managed Care. December 2007. See http://www.managedcaremag.com/archives/0712/0712.docpay.html (last checked 14 December 2010)

4. Miller. Creating Payment Systems to Accelerate Value-Driven Health Care: Issues and Options for Policy Reform. The Commonwealth Fund, September 2007. See http://www.commonwealthfund.org/Content/Publications/Fund-Reports/2007/Sep/Creating-Payment-Systems-to-Accelerate-Value-Driven-Health-Care-Issues-and-Options-for-Policy-Refor.aspx (last checked 14 December 2010)

5. Carswell. Everybody's above average. ScienCentral, 27 December 2005. See http://www.sciencentral.com/articles/view.php3?article_id=218392713 (last checked 14 December 2010)

6. Keillor. A Prairie Home Companion: The News from Lake Wobegon. See http://prairiehome.publicradio.org/about/podcast/ (last checked 14 December 2010)

7. Centers for Medicare and Medicaid Services. Overview: Physician Self-Referral. See https://www.cms.gov/PhysicianSelfReferral/ (last checked 14 December 2010)

8. Milstein, Lee. Comparing physicians on efficiency. N Engl J Med 2007;357:2649–52

9. Ferris, Vogeli, Marder, et al. Trends: physician specialty societies and the development of physician performance measures. Health Aff 2007;26:1712–9

10. Eisenstein, Bethea, Muhlbaier. Surgeons' economic profiles: can we get the ‘Right’ answers? J Med Systems 2005;29:111–24

11. Binder, Rudolph. Commentary: a systematic review of health care efficiency measures. Health Serv Res 2009;44:806–11

12. Coakley. Investigation of Health Care Cost Trends and Cost Drivers Preliminary Report. Boston, MA: Office of the Attorney General, 29 January 2010. See http://www.mass.gov/Cago/docs/healthcare/Investigation_HCCT&CD.pdf (last checked 14 December 2010)

13. Pacific Business Group on Health. Advancing Physician Performance Measurement Using Administrative Data to Assess Physician Quality and Efficiency. September 2005. See http://www.pbgh.org/programs/documents/PBGHP3Report_09-01-05final.pdf (last checked 14 December 2010)

14. Ingenix. Symmetry Episode Treatment Groups: Measuring Health Care with Meaningful Episodes of Care. Ingenix, Inc, 2007. See http://www.ingenix.com/content/attachments/SymmetryETG_WhitePaper.pdf (last checked 14 December 2010)

15. MaCurdy, Kerwin, Gibbs, et al. Evaluating the Functionality of the Symmetry ETG and Medstat MEG Software in Forming Episodes of Care Using Medicare Data. Acumen LLC, August 2008. See http://www.cms.hhs.gov/Reports/downloads/MaCurdy.pdf (last checked 14 December 2010)

16. Thomas. Should episode-based economic profiles be risk adjusted to account for differences in patients' health risks? Health Serv Res 2006;41:581–98

17. Ingenix. Symmetry Episode Risk Groups: A Successful Approach to Health Risk Assessment. Ingenix, Inc, 2008. See http://www.ingenix.com/content/attachments/Symmetry_ERG_7-0_WhitePaper.pdf (last checked 14 December 2010)

18. Thomson Reuters. Medical Episode Grouper - Health Plan. 2009. See http://home.thomsonhealthcare.com/uploadedFiles/docs/PAY-5166_MEG_HealthPlan_03%2009-Electronic.pdf (last checked 14 December 2010)

19. Thomas, Ward. Economic profiling of physician specialists: use of outlier treatment and episode attribution rules. Inquiry 2006;43:271–82

20. American Medical Association. Tiered and Narrow Physician Networks. 2007. See http://www.wsma.org/files/Downloads/PracticeResourceCenter/P4P_AMA_Tiered%20_Narrow_Networks_0606.pdf (last checked 14 December 2010)

21. Hussey, de Vries, Romley, et al. A systematic review of health care efficiency measures. Health Serv Res 2009;44:784–805

22. Consumer-Purchaser Disclosure Project. More Efficient Physicians: A Path to Significant Savings in Health Care, July 2003. See http://healthcaredisclosure.org/links/files/MedicareSavings.pdf (last checked 14 December 2010)

23. Cumming, Knutson, Cameron, et al. A Comparative Analysis of Claims Based Risk Assessment for Commercial Populations. Minneapolis, MN: Society of Actuaries, 2002. See http://www.soa.org/files/pdf/_asset_id=2583046.pdf (last checked 14 December 2010)

24. Pope, Kautter, Ellis, et al. Risk adjustment of medicare capitation payments using the CMS-HCC model. Health Care Financing Rev 2004;25:119–41

25. Ash, Ellis, Pope, et al. Using diagnoses to describe populations and predict costs. Health Care Financing Rev 2000;21:7–28

26. Goldman, Joyce, Zheng. Prescription drug cost sharing: associations with medication and medical utilization and spending and health. J Am Med Assoc 2007;298:61–9

27. American Medical Association. AMA/Specialty Society RVS Update Process. 2010. See http://www.ama-assn.org/ama1/pub/upload/mm/380/rvs-update-booklet.pdf (last checked 14 December 2010)

28. American Medical Association. AMABookstore.com. See https://catalog.ama-assn.org/Catalog/cpt/cpt_home.jsp (last checked 14 December 2010)

29. Achen. Interpreting and Using Regression. Newbury Park, CA: Sage Publications, 1982

30. Fox. Regression Diagnostics. Newbury Park, CA: Sage Publications, 1991

31. Garson. Multiple Regression. See http://faculty.chass.ncsu.edu/garson/PA765/regress.htm (last checked 14 December 2010)