Abstract
A substantial biomedical literature has accumulated around the question of racial differences in the response to treatment with angiotensin converting enzyme (ACE) inhibitors for hypertension and congestive heart failure. African-origin populations are often asserted to be “low renin”, and therefore to have blunted response to agents that interfere with the renin-angiotensin system. Although the US Food and Drug Administration (FDA) rejected a combination of hydralazine and isosorbide dinitrate as a general heart failure treatment in 1997, this literature on Black resistance to ACE-inhibition therapy for heart failure was used to argue for a race-specific approval for this drug. Specifically, a paper by Exner and colleagues published in the New England Journal of Medicine in 2001 reported that ACE-inhibition reduced hospitalization for White patients, but not for Black patients. The Exner et al paper was used to argue for conducting a randomized trial in Black patients only, the successful completion of which led to FDA approval of the hydralazine and isosorbide dinitrate combination (known as BiDil) as a race-specific therapy in 2005. We re-analyze the data in the 2001 Exner et al study, and show that it is not well suited for answering the question of differential response by race. Even so, the published analysis ignored important facets of the data in order to arrive at the stated conclusion of a race-specific response. Black subjects were recruited mostly in a few regions, and were medically distinct from white patients in terms of clinical measures such as hypertension, diabetes and prescription drug history. Overall, Black subjects had a high risk of the outcomes, and the effect of treatment varied widely by clinical center or by region. 
The stated conclusion by Exner et al of a race-specific response to ACE inhibition is therefore suspect, as is the use of this conclusion to support the notion of race-specific therapies, both in the specific case of BiDil and in general.
Introduction
The human body has roughly 60,000 miles of blood vessels, most of them so narrow that blood cells must pass through single-file (Marieb 1995). The heart must effectively push fluid through all those thousands of miles of piping, and the inability to keep up with the demands of this relentless task is the disease known as congestive heart failure. When fluid pressure in the blood vessels is insufficient to perfuse all the tissues of the body, one physiological mechanism that kicks in to try to restore balance is the renin-angiotensin system. The kidneys release renin, which acts on angiotensinogen from the liver, breaking this into angiotensin I, which is then converted into angiotensin II by the action of an enzyme found in the pulmonary circulation and the endothelium of blood vessels. This enzyme is called, appropriately enough, angiotensin converting enzyme, or ACE.
When a person is hypertensive, the problem is an imbalance in the other direction: fluid pressure in the blood vessels is so high that it causes damage to tissues in the vital organs, especially the kidneys, but also to the heart itself, which strains under the excessive workload. Primitive treatment for hypertension began around the time of World War II, with veratrum alkaloids, thiocyanates and catecholamine depletors as the main agents until diuretics were introduced in the 1950s (Moser 1997). Throughout the 1960s scientists sought ways to combat hypertension by trying to stimulate the body's natural response via the renin-angiotensin system, by inhibiting production of either renin or ACE. An effective inhibitor of ACE was finally developed in the 1970s from the study of pit viper venom, and approved by the FDA for use as an antihypertensive agent in 1981 (Bakhle 1980). Since then, ACE-inhibitors have become a mainstay of antihypertensive therapy. By reducing the workload of the heart and restoring neurohormonal balance, they also turn out to be life-saving medications for those with congestive heart failure.
Within a decade after the introduction and widespread adoption of ACE-inhibition as a primary antihypertensive therapy, however, an overwhelming consensus emerged that it was differentially effective by patient race. Specifically, it was “black race” (understood to mean ancestral origin in sub-Saharan Africa) that was taken to be the marker predicting poor ACE-inhibition response. No pharmacologic agent has a uniform effect in a population, except perhaps for an agent that is 100% fatal. It is well known that patients respond differently because of innumerable physiological, genetic, dietary, psychological and cultural factors, everything from obesity to recent consumption of grapefruit juice (Bressler 2006). So why, among all the myriad axes of variability that physicians might record, was race the one factor that gained so much clinical attention in this instance?
To be fair, age was also frequently cited in the literature as a clinical predictor, and it may be argued that it is the central demographic trinity—age, race and sex—that is traditionally used in American medicine to try to succinctly summarize all of the intractably messy sources of social and biologic variability. This habit is reflected in the longstanding practice of identifying patients by these three labels in case reports and clinical presentations (Caldwell and Popenoe 1995), a tradition that continues largely unabated to this day. The finding of “race” as a primary way of slicing the population for observations of differential efficacy could arguably be attributed, therefore, to nothing more elaborate than medical habit. Had the US been a society obsessed with social class or height as primary demographic groupings, medical textbooks might read very differently. In historical fact, however, it was the dichotomy of black race versus non-black race that seems to have been most salient to American physicians, and which therefore became the edifice on which observations of difference were most often built. Because of the overwhelming influence of American medicine on the rest of the world, this quaintly New World preoccupation with skin color quickly became universalized.
Differential efficacy of ACE-inhibition by race
Based on literature that accumulated throughout the 1980's and 1990's, it became widely understood that blacks were resistant to the blood pressure lowering effects of ACE-inhibition. For example, the textbook Pathophysiology of Hypertension in Blacks (1993) notes that ACE-inhibitors are less effective in blacks than whites, unless combined with a diuretic, and less effective than diuretics in blacks (6, p. 275). The Manual of Hypertension (2002) similarly states that African-Americans exhibit less fall of blood pressure with ACE-inhibitors (7, pp. 326–327). Swales' Textbook of Hypertension (1994) goes further, distinguishing African-Americans from African blacks on the assertion that African-Americans may have an inherited genetic defect due to selective mortality during the slave trade (8, pp. 813–814). Regardless, this text also notes that ACE-inhibitors are to be considered less effective in blacks than whites (8, p. 823). The most recent edition of Clinical Hypertension (2005) concurs that blacks respond less well to monotherapy with “renin suppressing drugs” such as ACE-inhibitors (9, p. 281), which is the same message in Hypertension Primer (2003): blacks respond less well to monotherapy with “renin suppressing drugs”, although ACE-inhibition may still be an important treatment for this group, albeit at higher doses (10, p. 265).
This consensus is also reflected in most of the substantial and comprehensive review articles on this topic (Gibbs et al. 1999; Douglas et al. 2003) as well as in official treatment guidelines, including The Seventh Report of the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure (JNC 7). This regularly updated consensus document represents the state of clinical knowledge on hypertension treatment in the United States (Chobanian et al. 2003). Although the report recommends first line treatment with thiazide-type diuretics for all demographic groups, it also notes, under the heading “other special situations,” that “[M]onotherapy with … ACEIs … lowers BP to a somewhat lesser degree in African Americans than Whites.” (14, p. 39). British Hypertension Society Guidelines go beyond race as an interesting aside, and recommend an overtly racialized treatment algorithm. The 2004 recommendations state that “Blacks” and those older than 55 years of age should have first line treatment with calcium-channel blockers or thiazide-type diuretics, whereas “Caucasians” and those under 55 years of age should be treated first with ACE-inhibitors, angiotensin receptor blockers, or β-blockers (Williams et al. 2004).
The consistently proffered explanation common to all of these sources is that hypertensives can be classified into etiologic subtypes that are “high renin” or “low renin”, and that antihypertensive drugs which inhibit the renin-angiotensin system, such as ACE-inhibitors, are logically prone to be less effective among “low-renin” hypertensives. In justifying their race-guided approach, the British Hypertension Society guidelines note that studies measuring plasma renin levels have reported that younger people and “Caucasians” have higher renin levels than do older people or “Blacks” (which they define as people “of African descent”). This argument has at least two potential problems. First of all, the cited studies on plasma renin concentrations also indicate that such measures are not effective at predicting subsequent blood pressure response to treatment (Sagnella 2001). Secondly, even if a statistically significant difference existed between the two racial groups in the mean blood pressure response, it would not necessarily be a rational basis for individual therapeutic decisions, since an individual patient may fall far above or below the mean of their particular population (Kaufman and Cooper 2010).
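The second objection can be made concrete with a small calculation. The sketch below assumes that individual blood pressure responses in each group are roughly normal, and it uses purely hypothetical values for the mean difference and the within-group standard deviation (neither is taken from the cited studies). It estimates how often a randomly chosen patient from the group with the lower mean response would nonetheless respond better than a randomly chosen patient from the group with the higher mean response.

```python
from statistics import NormalDist

# Hypothetical illustration values, NOT taken from any cited study.
mean_diff = 5.0   # assumed difference in mean response between groups (mmHg)
sd_within = 12.0  # assumed within-group standard deviation (mmHg)

# Difference between one random patient from each group:
# D ~ Normal(mean_diff, sqrt(2) * sd_within) if responses are independent.
diff = NormalDist(mu=mean_diff, sigma=(2 * sd_within ** 2) ** 0.5)

# Probability that the patient from the "lower-responding" group
# actually shows the larger response (D < 0).
p_reversed = diff.cdf(0.0)
print(f"P(ordering reversed for a random pair) = {p_reversed:.2f}")
```

Under these assumed values the group label would misorder a substantial share of patient pairs, which is the sense in which a difference in means is a weak guide to individual therapy.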
ACE-inhibition for Congestive Heart Failure
Along with the popular use of ACE-inhibition as an anti-hypertensive, increasing evidence of its efficacy for treatment of congestive heart failure also appeared in the 1990s (The SOLVD Investigators 1992). By the end of the decade it had become a recommended standard of care for heart failure patients with low ejection fraction to be treated with an ACE-inhibitor alongside a variety of other therapies (Consensus recommendations for the management of chronic heart failure, 1999). Nonetheless, the focus, perhaps predictably, soon returned to race as a likely modifier of this effect. A secondary analysis of the second Vasodilator–Heart Failure Trial (V-HeFT II) published in 1999 reported that although ACE-inhibition therapy reduced risk of death for whites, there was no comparable benefit for blacks (Carson et al. 1999). In fact, there was a formal statistical test of effect measure heterogeneity published in this article, which did not reach standard levels of “statistical significance”, but the idea quickly became entrenched nonetheless.
In 1999 Dries and colleagues published a secondary analysis of the Studies of Left Ventricular Dysfunction (SOLVD) data, which combined two trials of the ACE-inhibitor enalapril in contrast with placebo among heart failure patients (Dries et al. 1999). This report highlighted the worse prognosis for black patients in the trials, including a 28% higher overall death rate, but showed ambiguous results for the hypothesis of a race-specific treatment benefit, due primarily to the relatively small number of black participants. This led the authors to try another analytic strategy with the same data in order to make a more convincing case that black patients, as in the case of anti-hypertensive therapy, were resistant to the benefits of ACE-inhibition for treatment of heart failure.
Exner and colleagues published another re-analysis of the SOLVD trials in the New England Journal of Medicine in 2001 (Exner et al. 2001). This new study once again compared enalapril with placebo in black and white heart failure patients and concluded much more definitively that the authors had identified a racial pattern in drug efficacy. They argued that the therapeutic recommendations for ACE-inhibitors among heart failure patients should apply to white patients but not necessarily to black patients, and that, in general, many future treatment recommendations may need to be made according to patients' racial backgrounds. Indeed, the reported finding has been cited widely as a basis for the broader credibility of race-specific medicine (Kaufman 2008), used for example to justify new requirements for race-specific reporting of all clinical trial data (Haga and Venter 2003).
Exner and colleagues devised an analytic strategy to deal with the enormous under-representation of black subjects in the SOLVD data set. They matched each of the black subjects with one to four white subjects on several measured factors (trial, ejection fraction, assigned therapy, sex, and age group), and then discarded the remainder of the white subjects. The idea was to create a subset of the original observations in which blacks and whites were more approximately balanced on these five factors, although the logic of this strategy has been questioned elsewhere (Kaufman 2008). A total of 6,797 subjects met the initial inclusion criteria, 800 of whom were categorized as black. Of the 5,719 whites in the study, 1,196 (21%) were matched with black patients, leaving 79% to be discarded. Most black subjects (72%) found only a single white match, which explains the roughly comparable final sample sizes in the two race groups, and 14 black participants (2%) found no match at all, and were thus discarded as well.
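The general shape of such a one-to-four exact matching procedure can be sketched as follows. This is only an illustration under stated assumptions: the published paper does not give the precise algorithm, so the record layout, the round-robin allocation rule, the random tie-breaking and the toy data are all hypothetical choices rather than the authors' method.

```python
import random
from collections import defaultdict

# Matching factors named in the text; the record layout itself is hypothetical.
KEYS = ("trial", "ef_group", "therapy", "sex", "age_group")

def stratum(record):
    """Tuple of matching-factor values that defines an exact-match stratum."""
    return tuple(record[k] for k in KEYS)

def match_one_to_k(cases, controls, k=4, seed=0):
    """Exact-match each case to at most k controls, without replacement."""
    rng = random.Random(seed)
    pool = defaultdict(list)
    for c in controls:
        pool[stratum(c)].append(c["id"])
    for ids in pool.values():
        rng.shuffle(ids)  # arbitrary tie-breaking among eligible controls

    by_stratum = defaultdict(list)
    for c in cases:
        by_stratum[stratum(c)].append(c["id"])

    matched = defaultdict(list)
    for key, case_ids in by_stratum.items():
        available = pool.get(key, [])
        # Round-robin: every case gets a first match before any gets a second.
        for _ in range(k):
            for cid in case_ids:
                if available:
                    matched[cid].append(available.pop())
    unmatched = [c["id"] for c in cases if c["id"] not in matched]
    return dict(matched), unmatched

def rec(pid, *values):
    """Build a toy participant record (hypothetical layout)."""
    return dict(zip(("id",) + KEYS, (pid,) + values))

# Toy data: two black participants share a stratum with five whites,
# one black participant has no eligible white match at all.
s1 = ("treatment", "low", "enalapril", "M", "50-59")
s2 = ("prevention", "high", "placebo", "F", "60-69")
s3 = ("treatment", "low", "placebo", "F", "40-49")
cases = [rec("B1", *s1), rec("B2", *s1), rec("B3", *s2)]
controls = [rec(f"W{i}", *s1) for i in range(1, 6)] + [rec("W6", *s3)]

matched, unmatched = match_one_to_k(cases, controls)
used = {w for ws in matched.values() for w in ws}
discarded = [c["id"] for c in controls if c["id"] not in used]
print(matched, unmatched, discarded)
```

In the toy data, unmatched cases and unused controls simply drop out of the analysis set, which mirrors how both the 14 unmatched black participants and the large majority of white participants were discarded.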
Assigned drugs were administered by individual physicians in 23 different clinical centers, and although study subjects were randomly assigned to receive an initial dose of 5 mg of enalapril or placebo daily, each physician had the freedom (and obligation) to adjust the dosage up to 20 mg daily, as necessary to control the patient's disease. Because of this ethically mandated dynamic treatment regime, physicians obviously could not be blind to treatment group, nor to race as a factor in making such decisions (The SOLVD Investigators 1990). Furthermore, although the analysis of enalapril versus placebo is unconfounded by virtue of the randomized design (at least at baseline), no such logic applies to any comparison between blacks and whites, which was the focus of the new re-analysis (Kaufman 2008). The authors therefore adjusted statistically for additional measured covariates, including medical care factors such as other prescriptions and social factors such as self-report of recent financial distress and level of attained education. The two primary outcome measures were deaths from any cause and hospitalization for heart failure, for both of which blacks had substantially higher risk. In the case of heart failure hospitalization, for example, the event occurred for 238 matched black participants (30%) and 226 matched white participants (19%), which is a 60% excess risk for blacks.
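The crude excess risk quoted above can be checked with a few lines of arithmetic. The sketch assumes that the matched denominators are 786 black participants (800 minus the 14 who found no match) and 1,196 white participants; these denominators are inferred from the counts reported earlier rather than stated directly alongside the event counts.

```python
# Event counts as reported in the text.
events_black, events_white = 238, 226

# Assumed denominators, inferred from the matching description:
# 800 black subjects minus 14 unmatched, and 1,196 matched whites.
n_black = 800 - 14
n_white = 1196

risk_black = events_black / n_black
risk_white = events_white / n_white
risk_ratio = risk_black / risk_white
excess = risk_ratio - 1.0

print(f"risk (black) = {risk_black:.1%}, risk (white) = {risk_white:.1%}")
print(f"crude risk ratio = {risk_ratio:.2f}, excess risk = {excess:.0%}")
```

This is a crude, unadjusted contrast of cumulative risks; it takes no account of follow-up time or of any covariate.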
Examination of the matched data revealed substantial covariate differences between the black and white patients at baseline: black patients were younger, had higher blood pressure, more recent financial distress, lower levels of education and lower levels of concurrent medication. There is little doubt, therefore, that they differed in numerous unmeasured ways as well, making the reported analysis confounded to an unknown extent (Kaufman et al. 1997). Nonetheless, despite the profound differences between black and white patients, even after matching, in risk of the outcome among the placebo group, comorbidities, social factors and prognostic indicators, the authors focused on the observation that there was an adjusted 49% reduction in the risk of heart failure hospitalization for the treated group of matched white patients, whereas among the black patients the adjusted risk reduction was only 14%, and not significantly different from the null at the usual p<0.05 criterion. It was this finding which was used to support the general conclusion: “the overall population of black patients with heart failure may be underserved by current therapeutic recommendations….[I]t seems appropriate to consider current therapeutic recommendations as applying to white patients but not necessarily to black patients.”
BiDil: the back story
In order to put the Exner et al findings in context, one needs to know that Jay Cohn, the senior author on the paper, appears to have had an undeclared conflict of interest. A combination of two generic vasodilators (hydralazine and isosorbide dinitrate) was tested as an alternate heart failure treatment in the Vasodilator Heart Failure Trial (V-HeFT I) in 1986 (Cohn et al. 1986). Cohn obtained a patent for this combined treatment, now called “BiDil”, and licensed the patent to a pharmaceutical firm called Medco, which then conducted additional studies in preparation for submitting the therapy for FDA approval (Sankar and Kahn 2005). But the FDA rejected the application on the rather technical objection that the V-HeFT trials were not designed adequately to meet FDA requirements for demonstration of clinical safety and efficacy. The patent thus reverted to Cohn, who then sifted through the old V-HeFT data for some way to revive a commercial angle for BiDil. In 1999 he published a paper asserting a stronger effect of BiDil in black patients, but this finding was tempered by the very limited statistical power associated with a small number of black participants in the old data (Carson et al. 1999). Nonetheless, he then issued a new license for the drug to the pharmaceutical firm NitroMed and applied for a new patent to cover a use of BiDil specifically in black patients (Kahn 2004). To make the case that BiDil was needed as a new treatment especially for blacks, it was first necessary to show that the primary existing therapy (i.e. ACE-inhibition) was not effective for this group. The authors were not subtle about their use of the publication to support development of BiDil as an alternate therapy.
In fact, the two-line conclusion sentence of the published abstract in the New England Journal of Medicine reads as follows: “Enalapril therapy is associated with a significant reduction in the risk of hospitalization for heart failure among white patients with left ventricular dysfunction, but not among similar black patients. This finding underscores the need for additional research on the efficacy of therapies for heart failure in black patients.” Thus, ACE-inhibition, already conveniently racialized in so much of the literature on its antihypertensive effects, needed to “take the fall” for Cohn's new race-specific BiDil patent, and the Exner et al article delivered the goods: within 6 weeks of the publication of the New England Journal of Medicine article, NitroMed issued a press release announcing $31.4 million in private financing to push BiDil through FDA approval, a goal that was eventually reached in June of 2005 (Kahn 2004).
Problem SOLVD
Like most clinical trials, the SOLVD study made severe restrictions on who was permitted to be enrolled, which aimed to increase internal validity at the expense of generalizability. For example, in addition to primary criteria based on age and ejection fraction, SOLVD participants also had to be free of an additional 26 listed conditions, including pregnancy, liver disease, myocardial infarction, pericarditis, uncontrolled hypertension, unstable angina, and cancer. Patients were also screened to assess compliance, and those who did not take at least 80% of an initial dose over the first 2 weeks of observation were excluded from the rest of the study (The SOLVD Investigators 1990). Because many of the 23 clinical sites did not include substantial numbers of black patients (for example, one center was located in Belgium), confounding by center is an immediate concern for any racial comparison based on these data that would presume to make a broad statement about “blacks” and “whites” in general. That is, recruited heart failure patients in Brussels, Belgium and Halifax, Nova Scotia may differ from those in Chicago, Illinois and Birmingham, Alabama in factors other than (or in addition to) race.
While trials must necessarily emphasize internal over external validity, for practical and ethical reasons, the comparison of blacks and whites in the Exner et al article tosses away all the original benefits of the randomization that would offset the poor generalizability. In a randomized trial, even though the population is highly selected, one can at least support a valid statistical inference for the causal effect of randomized treatment in that study population (Greenland 1990). But in the secondary analysis of the SOLVD data by Exner and colleagues, the analysis conditions on a variable that was not randomly assigned: race. A statistical comparison between blacks and whites therefore retains none of the inferential power of the original comparison between treatment and placebo groups, and yet has none of the generalizability of a representative sample. The only hope for internal validity would be to have measured enough covariates so that confounding could be minimized, but this seems doubtful, especially since hospitalization for heart failure is an outcome that is so sensitive to issues of patient compliance, knowledge, ability, resources and context. The analysis of this outcome is further compromised by the fact that the decision to hospitalize is not made by people blinded to the treatment of interest in this analysis (race), and that the outcome may vary by practice habits in different regions where proportions of black versus white patients may be very different.
An exploratory re-analysis
We obtained the public release version of the SOLVD data set in order to replicate the published analysis and investigate how sensitive it is to modeling decisions and assumptions. Analysis of the data received an expedited approval from the University of North Carolina at Chapel Hill Public Health Institutional Review Board (Study Number 07-0452). Exact replication of the published matching analysis was not possible because precise details of the matching algorithm employed were not made available in the published documentation. Nonetheless, all study variables were identified, and the published results were approximately reproduced (to the second decimal place) using a matched analysis that followed all procedures described by the authors and made educated guesses for the specific details omitted. Analyses were conducted in Stata Statistical Software Version 10 (Stata Corp, College Station, Texas).
The final modeled analysis reported in the Exner et al article was a Cox proportional hazards model for the effect of race (black or white) on the two predefined outcomes (death from any cause or hospitalization for heart failure), the effect of treatment assignment to enalapril (ACE-inhibition) or placebo, and the product-term interaction between race and treatment. In addition to matching on the trial, ejection fraction, assigned therapy, sex, and age group, the authors also adjusted in the model for the following covariates: age (to control for residual variation in age within the matched age categories), New York Heart Association functional class, educational level, presence or absence of financial distress in the preceding 12 months, cause of left ventricular dysfunction, presence or absence of a history of hypertension, presence or absence of a history of diabetes, and use or nonuse of beta-blockers at baseline. For the outcome of all-cause mortality, the findings were completely null. Exner et al. reported estimated hazard ratios from this final adjusted regression model of 0.85 (95% confidence interval 0.64–1.14) for matched black patients and 0.92 (95% confidence interval 0.72–1.17) for matched white patients. There is therefore no evidence of heterogeneity of the effect of ACE-inhibition by race on this outcome, a finding that was successfully replicated in several subsequent studies (Dries et al. 2002; Shekelle et al. 2003). Indeed, the reanalysis by Dries et al, which included only the SOLVD prevention trial, also identified no apparent heterogeneity by race in the effect of enalapril on development of heart failure symptoms, or on several combined endpoints (Dries et al. 2002).
In the case of hospitalization for heart failure, however, the results published by Exner et al were more consistent with the authors' interest in justifying the need for a new therapy for black patients. Estimated hazard ratios from the adjusted regression models were 0.86 (95% confidence interval 0.64–1.16) for matched black patients, and 0.51 (95% confidence interval 0.37–0.70) for matched white patients. The product interaction term between assigned treatment and race for the adjusted regression model had a p-value of 0.005, which rejects (at the traditional significance level of 0.05) the null hypothesis of equality of the two adjusted hazard ratio estimates across race groups. This analysis relies on many statistical assumptions. For example, the hazard ratio is assumed to be constant over time, there are assumed to be no important unmeasured confounders, non-compliance with assigned therapy is assumed to be non-differential by race, measurement error of the covariates is assumed to be non-differential by race, the model is assumed to be correctly specified in form (e.g. covariates enter multiplicatively and are linear in the log hazard, etc). All of these assumptions are standard, and would therefore be typical of articles in The New England Journal of Medicine or other comparable biomedical journals.
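For readers less familiar with product-term models, the structure of this analysis can be written out explicitly. The notation below is an assumption introduced for illustration (R = 1 for black patients, T = 1 for assignment to enalapril, X the vector of adjustment covariates); the published paper does not necessarily use this coding. If the two race-specific hazard ratios are read off a single model of this form, the interaction coefficient is implied by their ratio.

```latex
h(t \mid R, T, X) = h_0(t)\,
  \exp\!\left(\beta_R R + \beta_T T + \beta_{RT}\,R\,T + \gamma^{\top} X\right)

\mathrm{HR}_{\text{white}} = e^{\beta_T} = 0.51,
\qquad
\mathrm{HR}_{\text{black}} = e^{\beta_T + \beta_{RT}} = 0.86

e^{\beta_{RT}} = \frac{0.86}{0.51} \approx 1.69,
\qquad
\beta_{RT} = \ln(0.86) - \ln(0.51) \approx 0.52
```

The null hypothesis tested by the interaction term is that β_RT = 0, that is, that the two hazard ratios are equal.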
Although the published results were easily replicated from the raw data, several interesting aspects of the analysis immediately came to light. The first was that the matching strategy was largely unhelpful, in the sense that the regression adjustment for all modeled variables using the entire data set recovers almost exactly the same effect estimates. In fact, if matching had any important impact at all, it was a deleterious impact. The adjusted hazard ratio for hospitalization for all white patients was 0.56 (95% confidence interval 0.48–0.64). This is (ln(0.51) − ln(0.56))/ln(0.51) = 14% larger than in the matched analysis, but the standard error of the adjusted ln(hazard ratio) is 0.07, compared to 0.16 in the matched model. Therefore, the mean square error (MSE) in the matched analysis is (0.16)² = 0.026, compared with MSE = (0.07)² + (0.05)² = 0.007 in the model with regression adjustment instead of matching. This means that the reported analysis is needlessly less precise, owing to throwing away approximately 80% of the data in which a match could not be found. To be fair, the matching strategy could potentially have been advantageous if the regression model were horribly misspecified (Ho et al. 2007), but the authors could have checked this and observed that their analytic strategy in this instance was pointless.
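The arithmetic in this comparison can be reproduced directly. The sketch below uses only the numbers quoted in the paragraph above; the 0.05 bias term is taken as given from the text, which does not spell out how it was derived, and the matched estimate is treated as unbiased, as the text implicitly does.

```python
import math

# Values quoted in the text.
hr_matched, hr_full = 0.51, 0.56   # adjusted hazard ratios for white patients
se_matched, se_full = 0.16, 0.07   # standard errors of ln(hazard ratio)
bias_full = 0.05                   # bias term used in the text (derivation not given)

# Relative difference between the two estimates on the log scale.
rel_diff = (math.log(hr_matched) - math.log(hr_full)) / math.log(hr_matched)

# Mean square error = variance + bias^2; matched estimate assumed unbiased.
mse_matched = se_matched ** 2
mse_full = se_full ** 2 + bias_full ** 2

print(f"relative difference in ln(HR): {rel_diff:.0%}")
print(f"MSE matched = {mse_matched:.3f}, MSE full-data = {mse_full:.3f}")
print(f"ratio of MSEs = {mse_matched / mse_full:.1f}")
```

On these figures the matched analysis has a mean square error several times that of the full-data regression, which is the basis for describing the matching as deleterious.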
The next surprising result from the re-analysis is that the effective sample size in the fully adjusted regression models doesn't come close to the full sample size as reported by the authors in the published paper. Missing values on some of the covariates appear to force the loss of about 30% of the observations, from a complete sample size of 6,516 (5,717 whites and 799 blacks) to a sample size of 4,564 in the fully adjusted model (i.e. a loss of 1,952 observations). Inspection of each of the variables revealed that it was the two social variables, education and financial hardship (Tables 1 and 2), that accounted for almost all of the missing observations. It is astonishing to refer back to the original article and observe how meticulously the authors obscured this information in the regression output tables, making it impossible for the reader to discern that the analysis sample sizes were not the same as those reported in the descriptive tables that appeared earlier.
Table 1. Education by race in the SOLVD data set
Table 2. Financial distress by race in the SOLVD data set
Not only is this magnitude of missing data disastrous for the precision of the effect estimates, but it is also quite troubling because of the possibility of selection bias if missingness is not random with respect to important predictors of the outcome. Remarkably, however, these two variables turn out to not be very important in the fit of the final model anyway. Fitting the same regression model without adjustment for education and financial distress leads to adjusted hazard ratios of 0.58 (95% confidence interval 0.51–0.66) for whites and 0.94 (95% confidence interval 0.73–1.22) for blacks. The treatment by race interaction term has approximately the same magnitude as well, but is now even more “significant” due to the increased power accrued by not tossing out an additional third of the observations.
The authors might understandably have preferred to report adjustment for some measure of social class, especially because this is a sensibly important confounder for an outcome like hospitalization, affected as it is by individual knowledge and behavior. But if it was important, it really needed to be measured on all the individuals in the study. While it does seem odd, in substantive terms, that social variables would not account for much confounding in this setting, the explanation may be that these are measured so crudely that they have little relevance. For example, both a millionaire stockbroker and a homeless man may declare that they experienced financial hardship in the last 12 months, but this does not make them equivalent in social class.
The next source of grave concern in the published analysis is that the authors chose to completely ignore the one variable with the greatest potential to confound the results, even though it was measured on all participants: clinical center. Because of different case mixes, different practice patterns, and different social contexts, it is likely that assigned treatment would be heterogeneous by clinical center, and racial composition would also differ dramatically by center. This would therefore be the most important variable to control in any multi-center study, and yet it appears nowhere in the analysis. The public access data does not include the identities of the 23 clinical centers, only anonymous numeric identifiers (presumably because of the small numbers of individuals in some centers and the resulting risk of deductive disclosure). Nonetheless, we can observe that there were from 108 to 549 patients studied in each center, with proportions black ranging from less than 1% for two of the largest centers to over 77% in another.
The centers are too small to generate stable stratum-specific hazard ratio estimates (the number of hospitalizations within centers ranges from 18 to 111), but can be grouped into 12 geographic regions, with total sample sizes ranging from 247 to 911. Each of these regions is comprised of anywhere from 1 to 4 of the original 23 clinical centers, although once again, the specific identities of the regions are suppressed in the public access data. Regions are also highly correlated with race: 3 regions have less than 1% black participants, and one region has over 77%. The numbers are sufficient, however, to adjust for region in the models, or to examine region-specific effects. When the analysis is stratified by region rather than race, with adjustment for all other covariates in the published paper except for education and financial hardship, the resulting hazard ratio estimates for the effect of enalapril versus placebo vary from 0.32 (95% confidence interval 0.17–0.59) in region 10 (442 whites and 16 blacks) to an adjusted hazard ratio of 1.07 (95% confidence interval 0.62–1.82) in region 4 (138 whites and 109 blacks). This is a much more extreme pattern than was observed for race in the original publication. This result may be confounded by race, but there is no way to separate the effects statistically within strata, since there are essentially no black participants in several regions. The authors could have just as easily written their paper focusing on the heterogeneity in treatment effect by region instead of by race. We can only assume that this was not an attractive analytic choice because the senior author held a patent for a race-specific application of an alternative medication, rather than a region-specific application.
Fitting race-stratified models with a region-by-treatment interaction instead of a race-by-treatment interaction produces an adjusted hazard ratio for whites of 0.52 (95% confidence interval 0.34–0.80), and for blacks of 0.53 (95% confidence interval 0.26–1.08). The imprecision of the latter estimate is an inevitable consequence of attempting to adjust across regions that are so racially imbalanced, reinforcing the fact that the variables are hopelessly confounded in these data. Nonetheless, this analysis is no less reasonable than the published analysis, and yet generates estimated effects for blacks and whites that are almost identical. Our purpose is not to claim that our estimates are necessarily more correct, but merely to show that an analyst could easily and validly come to a conclusion that is opposite to the published result. In the situation where region and race are so completely confounded, an analysis that completely ignores region seems at the very least incomplete, if not irresponsible or even unethical.
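The contrast between regional and racial heterogeneity can be illustrated with a rough calculation from the reported intervals alone. The sketch below assumes that each 95% confidence interval is a Wald interval, symmetric on the log scale, and that the two estimates being compared are independent; the function names are our own for illustration. Because regions 4 and 10 were selected as the extremes among 12 regions, the regional statistic is descriptive only and should not be read as a formal test.

```python
import math

Z95 = 1.959964  # two-sided 95% quantile of the standard normal


def log_hr_and_se(hr, lower, upper):
    """Recover log(HR) and an approximate standard error from a 95% CI.

    Assumes a Wald interval that is symmetric on the log scale.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return math.log(hr), se


def contrast_z(a, b):
    """z statistic for the difference of two independent log hazard ratios."""
    log_a, se_a = log_hr_and_se(*a)
    log_b, se_b = log_hr_and_se(*b)
    return (log_a - log_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Reported estimates: (hazard ratio, lower 95% limit, upper 95% limit)
region_4, region_10 = (1.07, 0.62, 1.82), (0.32, 0.17, 0.59)
black, white = (0.53, 0.26, 1.08), (0.52, 0.34, 0.80)

z_region = contrast_z(region_4, region_10)
z_race = contrast_z(black, white)
print(f"region 4 vs region 10: z = {z_region:.2f}")
print(f"black vs white:        z = {z_race:.2f}")
```

Under these assumptions the regional contrast is several standard errors wide, while the race contrast from the region-interaction models is close to zero.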
An attempt at a clearer picture
A popular statistical technique for assessing balance and for reducing dimensionality is the propensity score, developed by Rosenbaum and Rubin (Rosenbaum and Rubin 1983). These authors proved that one can predict treatment in a first stage model, and then use the predicted probability of treatment as the sole covariate in further modeling of the treatment effect. Moreover, substantial imbalance of measured covariates within propensity score strata can be used to invalidate a causal interpretation for an estimated effect parameter. We used this technique in the analysis of the SOLVD data set because it allows for a highly parameterized first stage model of covariates on treatment, and then a relatively transparent contrast of treatment groups in the second stage of the model. In this case, our binary "treatment" is race (black = 1), and so the first stage of this model predicts the probability that a SOLVD participant is black, as a function of all available clinical variables. We specified this model using the following variables: diabetes status, ejection fraction, history of hypertension, history of stroke, sex, age, reported alcohol consumption, New York Heart Association functional class (I–IV), SOLVD trial (prevention or treatment), beta-blocker use, diuretic use, systolic blood pressure, diastolic blood pressure, history of myocardial infarction and clinical center (1–23).
Fitting this as a standard logistic regression model, we were able to almost completely differentiate between black and white subjects on the basis of these clinical factors alone (education and financial distress were not used because of the large number of missing observations). Figure 1 shows that the distribution of propensity scores (the predicted probability that a given SOLVD participant is black) is almost completely non-overlapping between the actual black and white sub-populations.
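As a minimal sketch of such a first-stage model, the following code fits a logistic regression of group membership on covariates and returns the predicted probabilities that serve as propensity scores. It is not the code used for the analysis reported here: the data are synthetic, two hypothetical covariates (`hypertension`, `sbp_z`) stand in for the full list of variables given above, and the fitting routine is a plain gradient ascent written out for transparency rather than a production estimator.

```python
import math
import random
import statistics


def predict(beta, x):
    """Predicted probability of group = 1 for covariate vector x."""
    eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    eta = max(-30.0, min(30.0, eta))  # guard against overflow in exp
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logistic(rows, labels, steps=1000, rate=0.5):
    """Fit by batch gradient ascent on the log-likelihood.

    Returns [intercept, slope_1, ..., slope_p].
    """
    n, p = len(rows), len(rows[0])
    beta = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for x, y in zip(rows, labels):
            resid = y - predict(beta, x)
            grad[0] += resid
            for j, v in enumerate(x):
                grad[j + 1] += resid * v
        beta = [b + rate * g / n for b, g in zip(beta, grad)]
    return beta


# Synthetic illustration only: none of these values come from SOLVD.
random.seed(1)
rows, labels = [], []
for _ in range(300):
    hypertension = 1.0 if random.random() < 0.4 else 0.0
    sbp_z = random.gauss(0.0, 1.0)  # standardised systolic blood pressure
    true_p = predict([-2.0, 1.5, 1.0], [hypertension, sbp_z])
    rows.append([hypertension, sbp_z])
    labels.append(1 if random.random() < true_p else 0)

beta = fit_logistic(rows, labels)
scores = [predict(beta, x) for x in rows]  # propensity scores
median_by_group = {
    g: statistics.median(s for s, y in zip(scores, labels) if y == g)
    for g in (0, 1)
}
print(median_by_group)
```

Comparing the distribution of `scores` between the two groups, for example through the group medians, is the kind of overlap check that the propensity score makes possible.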

Figure 1. Propensity score distributions by race.
The median predicted probability for whites is 0.03, whereas the median value for blacks is 0.42. The conclusion is that information on race is almost perfectly predicted by the medical covariates alone, making it essentially impossible to identify a race-specific effect, and making it clinically useless to attempt to do so (since a clinician could identify almost the exact same subsets of individuals by using clinical variables instead of race). As a quantitative summary of this overlap of information, the area under the ROC curve for this prediction is 0.88. This suggests that any attempt to “adjust” for these measured covariates in order to identify the unique effect of race would be poorly supported in this data set.
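The area under the ROC curve has a direct interpretation in this setting: it is the probability that a randomly chosen black participant has a higher propensity score than a randomly chosen white participant. A minimal sketch of that calculation follows; the function name and the toy scores are illustrative assumptions, not SOLVD values.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison.

    Counts the share of (group 1, group 0) pairs in which the group 1
    member has the higher score; tied pairs count one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both groups must be present")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Toy values for illustration only
toy_scores = [0.1, 0.2, 0.3, 0.4]
toy_labels = [0, 1, 0, 1]
print(roc_auc(toy_scores, toy_labels))  # 3 of 4 pairs correctly ordered: 0.75
```

The pairwise loop is quadratic in sample size, which is acceptable for a sketch; a rank-based implementation would be preferable for large data sets.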
While we can't hope to isolate an independent effect of race, we can use this data reduction strategy to depict the effect pattern and specifically how it varies across background variables. Figure 2 shows a series of stratified Cox proportional hazards models based on calipers of propensity score with width=0.2. Each model is fit within the defined window, estimates for blacks and whites are obtained, and then the window shifts up by 0.01.

Figure 2. Effect of enalapril therapy on first hospitalization for heart failure. Race-specific Cox models with a single predictor variable within a caliper of 0.20 of the propensity score.
The lowest propensity score window is centered at a value of 0.10 (since it is 0.20 units wide and cannot extend lower than zero), and as shown in the box plot at the bottom of the graph, this is already at the 75th percentile of the distribution of white subjects. There are very few white subjects with propensity scores higher than 0.2, and these are the ones that would presumably have been matched to black patients in the published Exner et al analysis.
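The windowing scheme described above can be sketched as follows. The window width (0.20), step (0.01) and lowest center (0.10) are taken from the text; the upper limit of 1.0, the function names, and the `fit` callback are assumptions made for illustration. In the actual analysis the callback would fit race-specific Cox models with treatment as the single predictor; here a simple count stands in for it.

```python
def caliper_windows(width=0.20, step=0.01, low=0.0, high=1.0):
    """Return (lower, center, upper) for each window sliding from low to high."""
    n_windows = int(round((high - low - width) / step)) + 1
    windows = []
    for i in range(n_windows):
        lower = low + i * step
        # Round to suppress floating-point drift in the window bounds
        windows.append((
            round(lower, 10),
            round(lower + width / 2, 10),
            round(lower + width, 10),
        ))
    return windows


def sliding_estimates(subjects, fit, **window_args):
    """Apply `fit` to the records that fall inside each window.

    subjects: list of (propensity_score, record) pairs
    fit:      callable taking a list of records and returning an estimate
    Returns a list of (center, n_in_window, estimate) tuples.
    """
    results = []
    for lower, center, upper in caliper_windows(**window_args):
        inside = [rec for score, rec in subjects if lower <= score <= upper]
        results.append((center, len(inside), fit(inside)))
    return results


# Toy subjects for illustration only; `len` stands in for a model fit
toy = [(0.03, "w1"), (0.12, "w2"), (0.42, "b1"), (0.55, "b2")]
counts = sliding_estimates(toy, fit=len)
print(counts[0])  # first window, centered at 0.10
```

Because adjacent windows overlap almost entirely, the resulting estimates are strongly correlated from one window to the next, so the plotted curve is best read as a smoothed description rather than as a series of independent tests.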
This graph reinforces the previous point that the data set does not easily support a direct comparison of black and white patients since they were drawn from largely non-overlapping regional and medical sub-populations. Black patients had more severe disease, more comorbidity, and had a higher baseline probability of heart failure hospitalization. There is indeed a range of propensity scores in which blacks have a lower estimated effect of ACE-inhibition therapy (roughly propensity score 0.25–0.50), but there is also a range where the effects do not appear heterogeneous (e.g. below 0.25) or where blacks actually have a differentially superior effect of treatment (e.g. above 0.60). The matched data set constructed by Exner et al included only the ∼20% of the original SOLVD data that falls roughly in the middle of this plot. Thus, the published conclusions appear to follow more from the poorer medical status of blacks at recruitment than from any intrinsically poorer response of blacks to treatment. This is especially remarkable given that the authors described their results as pertaining to “the overall population of black patients with heart failure.”
Moreover, the propensity score analysis reveals the continued imbalance by clinical center (Fig. 3). In the lowest range of propensity scores (<0.3), for example, only a tiny fraction of the study participants are from region 7, and yet this region contributes more than half of all observations in the range of propensity scores above 0.5. In this upper range of propensity scores, however, regions 3 and 8 are exceedingly rare, and regions 10 and 11 are completely absent. Recall that only clinical variables like comorbidity and prescription drug history are used in these models, and so it is clear that the patients are quite heterogeneous by region in factors that could be considered by a physician without ever resorting to race as a crude surrogate.
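The tabulation behind such a comparison is simple, and a minimal sketch is given here. The cut points of 0.3 and 0.5 follow the ranges mentioned in the text, but the exact binning used for the figure, the function name, and the toy records are assumptions for illustration only.

```python
import bisect
from collections import Counter, defaultdict


def region_shares(records, cut_points=(0.3, 0.5)):
    """Share of participants from each region within propensity score bands.

    records: iterable of (propensity_score, region) pairs
    Band 0 holds scores below the first cut point, band 1 the scores from
    the first cut point up to (not including) the second, and so on.
    """
    counts = defaultdict(Counter)
    for score, region in records:
        counts[bisect.bisect_right(cut_points, score)][region] += 1
    return {
        band: {region: n / sum(c.values()) for region, n in c.items()}
        for band, c in sorted(counts.items())
    }


# Toy records for illustration only: (propensity score, region identifier)
toy_records = [
    (0.05, 10), (0.10, 10), (0.20, 3), (0.25, 7),
    (0.40, 4),
    (0.60, 7), (0.70, 7), (0.80, 4),
]
shares = region_shares(toy_records)
print(shares[0])  # regional composition below 0.3
print(shares[2])  # regional composition at 0.5 and above
```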

Figure 3. Proportion of study participants from each region by propensity score range.
Conclusions
The re-analysis of the SOLVD study described here suggests that it was a poor data resource for addressing the question of interest to the authors of the Exner et al. paper. Social covariates were crudely defined and had large proportions missing, the latter fact being assiduously hidden from view in the published study. Black subjects were recruited mostly in a few regions, and were medically distinct from white patients in terms of clinical measures such as hypertension, diabetes and prescription drug history. Overall, black subjects had a high risk of the outcomes, especially in the case of first hospitalization for heart failure, and the effect of treatment varied widely by clinical center or by region. The fact that the authors of this study chose to privilege race as the focus of their analysis may derive more from their material and ideological motivations than from any objective feature of the data, since the data set lends itself just as readily to conclusions that do not support the notion of any important racial disparity in ACE-inhibition efficacy. It does seem from the data that there are subgroups who received greater or lesser benefit from the study drug, but it takes a leap of faith (or perhaps some other more worldly incentive) to organize these differences around race, rather than around any of a number of other possible explanations. While we cannot know the exact line of thinking that generated the original paper in its published form, the background story of BiDil's development as an alternate therapy for black heart failure patients, and the apparent conflict of interest this implies, seem an important consideration in properly interpreting the stated results.
ACE-inhibition remains a primary treatment for hypertension and for heart failure in all racial sub-populations. Nonetheless, the analysis by Exner et al is highly cited as an example to demonstrate that treatments may need to be tailored to racial groups. A closer look at these data, however, reveals a much more ambiguous picture. The clear facts are that black participants in the SOLVD trial were sicker, and that the sicker patients were less likely to be kept out of the hospital by being assigned the study drug. In a society in which blacks make up a disproportionate part of the sickest patient population, this adverse profile can be made to appear to be a function of race in some intrinsic sense. We view it as an unfortunate and embarrassing episode in the annals of medical research that this interpretation was used to help motivate the marketing of a niche drug for black patients. We expect that future generations of scientists will likely look back on such episodes with the same curious disdain with which we now muse over discredited medical practices such as blood-letting and Mesmerism.
Footnotes
Acknowledgment
This work was supported in part by a Robert Wood Johnson Foundation Investigator Award in Health Policy Research. The views expressed imply no endorsement by the Robert Wood Johnson Foundation.
