Abstract
Users of logistic regression models often need to describe the overall predictive strength, or effect size, of the model’s predictors. Analogs of R² have been developed, but none of these measures is interpretable on the same scale as the effects of individual predictors. Furthermore, R² analogs are not invariant to the base rate (the overall proportion of successes), which makes it difficult to compare effect sizes across data sets. The authors propose a measure of overall effect size, the overall odds ratio, that is interpretable on the same scale as the effects of individual predictors and is invariant to the base rate. They explore the properties of the overall odds ratio and illustrate its use through an example. They also provide interpretive guidance and illustrate how statistical software can be used to compute the proposed measure.
Introduction
In this article, we propose an overall effect size measure for multiple logistic regression (MLOGR) models. We first discuss common measures of overall effect size: classical R² applied to multiple linear regression (MLR) and R² analogs applied to other generalized linear models (GLMs). We then discuss how the population variance of the model’s multiple linear predictor (MLP) represents overall effect size for GLMs, and how, for MLOGR, the population overall odds ratio is a simple function of the variance of the MLP. Next, we show how to estimate the variance of the MLP (and the overall odds ratio) using the sample variance of the MLP. Because this variance estimator (and the overall odds ratio estimator) is generally biased, we propose a method to correct this overestimation. Then, through a simulation study, we explore the properties of the overall odds ratio estimator. We present an example in which the overall odds ratio is used to measure the overall effect size of models for college retention. We provide practical guidance for researchers who wish to use the proposed measure, including effect size interpretation and a SAS macro. Finally, we summarize our findings and discuss directions for further research.
Measures of Overall Effect Size: Classical R² and R² Analogs
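Under the fixed effects MLR model, E(Y) = η, where $\eta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$ is the multiple linear predictor (MLP).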
MLOGR is used across many disciplines to relate a set of predictor variables, $(x_1, x_2, \ldots, x_p)$, to a dichotomous outcome, Y. Under the fixed effects MLOGR model with a 0,1 coding for Y, $E(Y) = \{1 + \exp(-\eta)\}^{-1}$, where $\eta = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p$.
Users of MLOGR have the same need as users of MLR to describe the overall effect size of models with multiple predictors. To meet this need, R² analogs that share some properties with classical R² have been developed. Cox and Snell (1989) described a generalization of R² for any linear model as a function of the likelihood of the intercept-only model, the sample size n, and the likelihood of the fitted model. This R² analog, which Menard (2000) referred to as the “unadjusted geometric mean square improvement” and called
Menard (2000) examined five of the R² analogs applicable to MLOGR models, including
Menard (2002) suggested that
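For a concrete reference point, the Python sketch below computes two of the R² analogs discussed in this section from model log likelihoods. The first is the Cox and Snell (1989) analog, R²_CS = 1 − (L_0/L_M)^{2/n}. Here L_0 is the likelihood of the intercept-only model and L_M is the likelihood of the fitted model. The second is the log likelihood ratio R², which we assume takes Menard’s (2000) form R²_L = 1 − ln(L_M)/ln(L_0). The function names are our own hypothetical choices, and the log likelihoods would come from whatever software fit the model. The intercept-only log likelihood is determined entirely by n and the base rate, which is one way to see why both analogs depend on the base rate.

```python
import math
from typing import Sequence

def cox_snell_r2(ll_null: float, ll_model: float, n: int) -> float:
    """Cox-Snell R^2 = 1 - (L0 / LM)^(2/n), computed on the log scale for stability."""
    return 1.0 - math.exp((2.0 / n) * (ll_null - ll_model))

def likelihood_ratio_r2(ll_null: float, ll_model: float) -> float:
    """Log likelihood ratio R^2_L = 1 - ln(LM) / ln(L0) (assumed Menard form)."""
    return 1.0 - ll_model / ll_null

def null_log_likelihood(y: Sequence[int]) -> float:
    """Log likelihood of the intercept-only logistic model for 0/1 outcomes.

    Its fitted probability is the base rate, so y must contain both 0s and 1s.
    """
    n = len(y)
    successes = sum(y)
    base_rate = successes / n
    return successes * math.log(base_rate) + (n - successes) * math.log(1.0 - base_rate)
```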
The Population Variance of the Multiple Linear Predictor
Classical R² is defined as the estimated proportion of variation in the outcome that is explained by the model’s predictor variables. Let ρ² represent the proportion of variation in the population’s outcomes that is explained by the MLR model. Hence,
GLMs are applied to outcome variables whose distribution is in the exponential family, taking the form $\exp\{(y\theta - b(\theta))/a(\phi) + c(y, \phi)\}$ (McCullagh & Nelder, 1989, p. 28). Given this functional form, the mean of the outcome variable Y is $b'(\theta)$ and the variance is $b''(\theta)a(\phi)$ (McCullagh & Nelder, 1989, p. 29). Hence, $a(\phi)$ is the component of the variance that is independent of the mean and the quantity
In Equation 1 and for the remainder of this article, we assume that the predictor variables are standardized. Similar to ρ, $\mathrm{var}^{1/2}(\eta)$ can be interpreted as an overall regression coefficient for GLMs. For example, in MLOGR each regression parameter represents the change in the log odds of success for each standard deviation increase in the corresponding predictor x. Likewise, for any constant k, $\mathrm{var}^{1/2}(\eta) = \ln\{p_1(1 - p_1)^{-1}\} - \ln\{p_0(1 - p_0)^{-1}\}$, where $p_1 = \{1 + \exp(-\eta_1)\}^{-1}$, $p_0 = \{1 + \exp(-\eta_0)\}^{-1}$, $\eta_1 = k + \mathrm{var}^{1/2}(\eta)$, and $\eta_0 = k$. Therefore, $\mathrm{var}^{1/2}(\eta)$ represents the change in the log odds of success for each standard deviation increase in the MLP, η. In general, regression coefficients from MLOGR are difficult to interpret because they represent changes in the log odds of success. The preferred logistic regression effect size measure for individual predictors seems to vary across disciplines. Marginal effects and elasticities (Greene, 2002) are often used in economics and other disciplines to express the effect sizes of individual predictors. More commonly, in education, psychology, and many other areas, it is recommended that the odds ratios associated with individual predictors, given by exp(β), be reported (Peng, Lee, & Ingersoll, 2002). Accordingly, when reporting overall effect size, it makes sense to report an overall odds ratio. For MLOGR, $\exp\{\mathrm{var}^{1/2}(\eta)\}$ represents the population overall odds ratio (OOR) because it is the odds ratio associated with each standard deviation increase in η. Because the odds ratio is commonly used and well understood as an effect size measure, the OOR leads to a more intuitive understanding of effect size than R² analogs do. In the section that follows, we introduce the estimator for the population variance of the MLP (PVMLP), var(η): the sample variance of the MLP (SVMLP).
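The log-odds identity above can be checked numerically. The following Python sketch assumes a hypothetical value var(η) = 1.2, chosen only for illustration. It moves the MLP from η₀ = k to η₁ = k + var^{1/2}(η) for several arbitrary values of k and shows two things. First, the change in log odds always equals var^{1/2}(η). Second, the corresponding odds ratio always equals exp{var^{1/2}(η)}, the population OOR, whatever the starting point k. The helper names are our own.

```python
import math

def inv_logit(eta: float) -> float:
    """Probability of success, {1 + exp(-eta)}^(-1), under the MLOGR model."""
    return 1.0 / (1.0 + math.exp(-eta))

def log_odds(p: float) -> float:
    """ln{p (1 - p)^(-1)}."""
    return math.log(p / (1.0 - p))

def overall_odds_ratio(var_eta: float) -> float:
    """Population OOR, exp{var^(1/2)(eta)}."""
    return math.exp(math.sqrt(var_eta))

var_eta = 1.2                # hypothetical PVMLP, for illustration only
sd_eta = math.sqrt(var_eta)

log_odds_changes = []
odds_ratios = []
for k in (-2.0, 0.0, 2.0):   # arbitrary starting values eta_0 = k
    p0 = inv_logit(k)
    p1 = inv_logit(k + sd_eta)   # eta_1 = k + var^(1/2)(eta)
    log_odds_changes.append(log_odds(p1) - log_odds(p0))
    odds_ratios.append((p1 / (1.0 - p1)) / (p0 / (1.0 - p0)))
```

Every entry of `log_odds_changes` equals `sd_eta`, and every entry of `odds_ratios` equals `overall_odds_ratio(var_eta)`. This is why the OOR does not depend on where on the logit scale the comparison starts.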
The Sample Variance of the Multiple Linear Predictor
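An estimator for var(η) (the PVMLP) can be obtained by substituting sample estimates for the population parameters in Equation 1 as follows:
$$\widehat{\mathrm{var}}(\eta) = \sum_{k=1}^{p}\sum_{l=1}^{p} \hat{\beta}_k \hat{\beta}_l r_{kl}, \qquad (2)$$
where $\hat{\beta}_k$ is the estimated regression coefficient for the standardized predictor $x_k$ and $r_{kl}$ is the correlation between predictors $x_k$ and $x_l$.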
Classical R² is a special case of the SVMLP; it is given by Equation 2 when y is scaled to have a variance of one. Analogous to R,
For the single predictor logistic regression model,
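The claim that classical R² is a special case of the SVMLP can be checked with a short Python sketch on simulated data. The data-generating values here are hypothetical and chosen only for illustration. The predictors and the outcome are standardized (divisor n), and an OLS model with an intercept is fit. Three quantities are then compared: the SVMLP from Equation 2, the sample variance of the fitted linear predictor, and R² = 1 − SSE/SST. All three agree up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 3
# Hypothetical correlated predictors and continuous outcome, for illustration only.
mixing = np.array([[1.0, 0.4, 0.2],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 1.0]])
X = rng.normal(size=(n, p)) @ mixing
y = X @ np.array([0.5, -0.3, 0.2]) + rng.normal(size=n)

def standardize(a: np.ndarray) -> np.ndarray:
    """Center and scale to unit variance (divisor n)."""
    return (a - a.mean(axis=0)) / a.std(axis=0)

Z = standardize(X)
y_std = standardize(y)          # y scaled to have variance one

design = np.column_stack([np.ones(n), Z])
coef, *_ = np.linalg.lstsq(design, y_std, rcond=None)
beta_hat = coef[1:]             # standardized regression coefficients

R = np.corrcoef(Z, rowvar=False)        # predictor correlations r_kl
svmlp = beta_hat @ R @ beta_hat         # Equation 2
fitted = design @ coef
svmlp_direct = np.var(fitted)           # sample variance of the fitted MLP
r_squared = 1.0 - np.sum((y_std - fitted) ** 2) / np.sum(y_std ** 2)
```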
Bias Correction for the SVMLP
Generally, the expected value of
In this expression, p represents the number of regression coefficients, not including the intercept. The overestimation problem has also been documented for R² analogs, and Liao and McGee (2003) proposed adjustment methods for
To quantify the SVMLP’s bias, we first consider the expected value of the SVMLP for the special case of uncorrelated predictor variables ($r_{kl} = 0$ for all $k \neq l$):
Using Cordeiro and McCullagh’s (1991) bias approximation, we have
Now, because we have assumed that the predictor variables are uncorrelated, we make the simplifying assumption that
This bias estimator gives rise to a method of adjusting the SVMLP to correct for bias:
In developing our bias estimator, we assumed that the predictor variables were uncorrelated. Therefore, when estimating the bias, the columns of the design matrix X (which should have p columns, one corresponding to each predictor variable) should first be centered and then orthogonalized, so that the pairwise correlations of the predictors are forced to zero. Centering matters because orthogonal columns are uncorrelated only when each column has mean zero. Orthogonalization can be done via Gram-Schmidt orthogonalization (most matrix algebra textbooks describe this procedure; see, for example, Horn & Johnson, 1990, pp. 15–16). The orthogonalization applies a linear transformation to the predictor variables such that the pairwise correlations of the transformed variables are 0. Next, standardize the transformed predictor variables. Then fit the MLOGR model using the orthogonalized and standardized predictor variables and obtain estimated regression coefficients
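The orthogonalization step can be sketched in Python as follows. This is a minimal version that assumes linearly independent predictor columns. It centers the columns, applies modified Gram-Schmidt without normalization, and then standardizes. The function name is hypothetical and is not part of the authors’ SAS macro. The transformed columns are mutually uncorrelated and span the same space as the original predictors (together with the intercept).

```python
import numpy as np

def orthogonalize_and_standardize(X: np.ndarray) -> np.ndarray:
    """Center the p predictor columns, orthogonalize them with modified
    Gram-Schmidt, and standardize the result (divisor n).

    Assumes the columns of X are linearly independent.
    """
    Q = np.asarray(X, dtype=float)
    Q = Q - Q.mean(axis=0)          # centered + orthogonal => uncorrelated
    p = Q.shape[1]
    for j in range(p):
        for i in range(j):
            # Remove from column j its projection on the (already orthogonal) column i.
            Q[:, j] -= (Q[:, i] @ Q[:, j]) / (Q[:, i] @ Q[:, i]) * Q[:, i]
    return (Q - Q.mean(axis=0)) / Q.std(axis=0)
```

`np.linalg.qr` on the centered matrix would give an equivalent orthogonal basis (up to column signs). The explicit loop is shown only to mirror the Gram-Schmidt description.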
Compute the unadjusted SVMLP as
When the number of predictors is large relative to n, the bias estimate can exceed the unadjusted SVMLP, so that the adjusted SVMLP is less than zero. In this case, the adjusted SVMLP should be set to zero (corresponding to an OOR of 1), suggesting no overall effect of the predictors.
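The last two steps reduce to simple arithmetic once the model has been fit to the orthogonalized, standardized predictors. In that case r_kl = 0 for k ≠ l, so Equation 2 reduces to the sum of the squared slope estimates. The fitted MLP, and hence the SVMLP, is unchanged by the transformation because the transformed predictors span the same column space. The Python sketch below assumes that the bias estimate has already been computed separately and treats it as a plain input. The bias formula itself is not reproduced here, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

def unadjusted_svmlp_uncorrelated(beta_hat: Sequence[float]) -> float:
    """Equation 2 when r_kk = 1 and r_kl = 0 for k != l: sum of squared slopes."""
    return sum(b * b for b in beta_hat)

def adjusted_svmlp_and_oor(unadjusted: float, bias_estimate: float) -> Tuple[float, float]:
    """Subtract a (separately computed) bias estimate, truncate at zero,
    and convert the adjusted SVMLP to an adjusted OOR."""
    adjusted = max(unadjusted - bias_estimate, 0.0)   # negative values set to zero
    return adjusted, math.exp(math.sqrt(adjusted))    # OOR of 1 means no overall effect
```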
Our adjustment method is designed to nullify the bias of the SVMLP. This adjustment will not necessarily eliminate the bias of effect size measures that are functions of the SVMLP. For example,
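A tiny numerical illustration shows why an unbiased SVMLP need not give an unbiased OOR. The OOR is the nonlinear transformation exp(√v) of the SVMLP. In the hypothetical example below, two equally likely SVMLP values, 1 and 3, average exactly to a true value of 2. The average of their OORs nevertheless differs from the OOR of their average (Jensen’s inequality). The values are invented for illustration only.

```python
import math
from statistics import mean

def oor_from_svmlp(v: float) -> float:
    """OOR as a function of the (adjusted or unadjusted) SVMLP."""
    return math.exp(math.sqrt(v))

estimates = [1.0, 3.0]                  # hypothetical SVMLP values, unbiased for 2.0
true_var = mean(estimates)              # 2.0
oor_of_mean = oor_from_svmlp(true_var)  # exp(sqrt(2)), about 4.11
mean_of_oor = mean(oor_from_svmlp(v) for v in estimates)  # about 4.19
```

The direction of the gap depends on where the values fall: exp(√v) is concave for v < 1 and convex for v > 1. So removing the SVMLP’s bias leaves an OOR bias of uncertain sign and size.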
Behavior of Overall Odds Ratio Estimator
Through a simulation study, we examined the behavior of the OOR estimators (adjusted and unadjusted) for MLOGR models. For notational simplicity, we label the OOR parameter and statistic as θ and θ̂, where $\theta = \exp\{\mathrm{var}^{1/2}(\eta)\}$ and
For each simulation condition, we monitored statistics across 1,000 simulated data sets including the overall base rate (y̿) and the mean and median of the unadjusted OOR estimates (
In addition to studying the behavior of the OOR estimators, we also studied their relationship to the preferred R² analog, the “log likelihood ratio” R² (
Table 2 reports the simulation results examining the behavior of the adjusted and unadjusted OOR estimators and the relationship of the unadjusted OOR estimator with the R² analogs. We see that the unadjusted OOR estimator is severely biased when the sample size is small. The bias is most severe for n = 50 (compare
The OOR estimators are especially imprecise with small sample sizes, many predictors, and large effect sizes. In condition 13, the population OOR is 14.65, indicating that the odds of success increase by a factor of 14.65 for each standard deviation increase in the MLP. The sample size is just 200, so we see considerable bias in the unadjusted OOR estimator. The mean of the adjusted OOR is 15.49 and the adjusted RMSE is 13.46, suggesting that the adjusted OOR has little bias but is imprecise when the sample size is small.
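Readers who want to explore this small-sample behavior can use the Python sketch below. It is a minimal Monte Carlo in the spirit of the study, not a reproduction of the Table 2 design. The predictors are assumed to be multivariate normal with a common pairwise correlation, and the MLOGR model is fit by maximum likelihood using Newton-Raphson (IRLS). The unadjusted OOR is computed as exp of the standard deviation of the fitted MLP. The sample size, intercept, slopes, correlation, and number of replications are all parameters we chose, and the function names are hypothetical.

```python
import numpy as np

def fit_logistic_irls(X, y, max_iter=100, tol=1e-10):
    """ML logistic regression with an intercept, fit by Newton-Raphson (IRLS)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(design.shape[1])
    for _ in range(max_iter):
        prob = 1.0 / (1.0 + np.exp(-(design @ beta)))
        weights = prob * (1.0 - prob)
        gradient = design.T @ (y - prob)
        information = design.T @ (design * weights[:, None])
        step = np.linalg.solve(information, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def unadjusted_oor(X, y):
    """Standardize predictors, fit MLOGR, return exp(SD of the fitted MLP)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    slopes = fit_logistic_irls(Z, y)[1:]
    svmlp = np.var(Z @ slopes)      # equals Equation 2 for standardized predictors
    return float(np.exp(np.sqrt(svmlp)))

def simulate_oor(n, beta0, slopes, corr, reps, seed=0):
    """Population OOR and `reps` unadjusted OOR estimates (equicorrelated normal predictors)."""
    rng = np.random.default_rng(seed)
    slopes = np.asarray(slopes, dtype=float)
    p = slopes.size
    cov = np.full((p, p), corr) + (1.0 - corr) * np.eye(p)
    true_oor = float(np.exp(np.sqrt(slopes @ cov @ slopes)))
    estimates = []
    for _ in range(reps):
        X = rng.multivariate_normal(np.zeros(p), cov, size=n)
        prob = 1.0 / (1.0 + np.exp(-(beta0 + X @ slopes)))
        y = rng.binomial(1, prob)
        estimates.append(unadjusted_oor(X, y))
    return true_oor, np.array(estimates)
```

For example, `true_oor, est = simulate_oor(200, 0.0, [np.log(2.0)] * 5, 0.5, reps=1000)` followed by `est.mean() - true_oor` gives a Monte Carlo estimate of the unadjusted estimator’s bias under that hypothetical condition. With very small samples and large effects, complete or quasi-complete separation can occur. Maximum likelihood estimates then do not exist, and this sketch does not detect or handle such replications.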
The median of the adjusted OOR estimator tends to be smaller than the population OOR when the sample size is small and the effect size is large (see
The adjusted and unadjusted OOR estimators tend to be more accurate (smaller RMSE) as the base rate approaches .5. Conditions 11, 14, and 15 are the same except for the intercept β0, which takes the values −2, 0, and 2. The RMSE of the unadjusted estimator is smallest for β0 = 0 (.41 compared with .73 and .74), which corresponds to a base rate of approximately .5. Similarly, the adjusted RMSE is smallest for β0 = 0 (.18 compared with .30 and .31).
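The link between β0 and the base rate can be made explicit. Write η = β0 + u, where u is the slope part of the MLP, and assume u is normal with mean 0 and variance var(η), as when the predictors are multivariate normal. The expected base rate is then E[{1 + exp(−η)}⁻¹], which the Python sketch below approximates by Gauss-Hermite quadrature. The variance value in the usage is hypothetical because the slopes for conditions 11, 14, and 15 are not reproduced here. By symmetry, β0 = 0 gives a base rate of exactly .5, and β0 = ±2 give base rates that sum to 1. β0 enters the base rate but not var(η), and therefore not the OOR.

```python
import numpy as np

def base_rate(beta0: float, var_eta: float, n_nodes: int = 60) -> float:
    """Expected proportion of successes when eta = beta0 + u, u ~ N(0, var_eta).

    Uses Gauss-Hermite quadrature: E[f(X)] ~ sum(w_i f(mu + sqrt(2) sigma x_i)) / sqrt(pi).
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    eta = beta0 + np.sqrt(2.0 * var_eta) * nodes
    return float(np.sum(weights / (1.0 + np.exp(-eta))) / np.sqrt(np.pi))
```

For instance, `base_rate(-2.0, 1.5)`, `base_rate(0.0, 1.5)`, and `base_rate(2.0, 1.5)` give the base rates implied by a hypothetical var(η) = 1.5 at the three intercepts used in those conditions.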
From Table 2, we see that the unadjusted OOR estimator has a strong linear relationship with the R² analogs within each simulation condition. The exception is condition 13, where the sample size is small and the true overall effect size is extremely large. In this case, the estimated OOR values are likely to vary wildly (see the large RMSE and adjusted RMSE), while the R² analogs are constrained to the [0, 1] interval. Typically, the correlations of the OOR estimates with the R² analogs are greater than .9. The correlations with the OOR estimator are consistently higher for
While the OOR is invariant to the base rate, our simulations confirm Menard’s (2000) finding that
Example: Measuring the Incremental Validity of Psychosocial Factors in Predicting College Retention
Robbins, Allen, Casillas, Peterson, and Le (2006) studied the degree to which measures of psychosocial factors predicted first-year college outcomes. They conducted a large-scale study where 48 institutions and 14,464 incoming college students were sampled. The goal of the study was to measure the incremental predictive validity of the Student Readiness Inventory (SRI), an assessment given to students that measures certain psychosocial factors that may be related to success in college. Robbins et al. measured the incremental predictive validity of the SRI by modeling college retention with SRI scores after first adjusting for postsecondary institution, high school grade point average (GPA), and standardized achievement test scores. The SRI consists of 10 individual scales, each measuring a psychosocial factor possibly related to success in college: Academic Discipline, Academic Self-Confidence, Commitment to College, Communication Skills, Emotional Control, Goal Striving, General Determination, Study Skills, Social Activity, and Social Connection.
A series of MLOGR models was fit that measured the incremental validity of the SRI for predicting college retention. The final model in the series contained certain SRI scale scores, standardized achievement test scores, high school GPA, and postsecondary institution as predictors. Each prior model was nested within the final model so that the increase in the model’s overall effect size could be assessed for each group of predictors. In Table 3, we present overall effect size estimates for the nested MLOGR models. The sample represented in Table 3 includes students in the SRI national validity study who enrolled at 4-year institutions (see Robbins et al., 2006, for details). The sample sizes for the models were 6,817 for retention after the first semester and 7,554 for retention after the first year. In Table 3, we present the estimated OOR (unadjusted and adjusted),
For retention after the first semester, the base rate (overall proportion of students who dropped out) was .10; for retention after the first year, the base rate was .27. Although the adjusted OOR is nearly the same for retention after the first semester and retention after the first year (2.13 and 2.09, respectively), we see that the
For retention after the first year, the unadjusted OOR estimate for the final model was 2.14. Therefore, the odds of a student returning after the first year are estimated to be 2.14 times higher for each one standard deviation increase in the model’s MLP. The model used 29 predictor variables, not including the intercept: 23 for institution and 1 for each of the six remaining predictors. There was therefore concern about an inflated OOR estimate. However, because of the large sample size (n = 7,554), the adjusted OOR is not drastically different from the unadjusted OOR. The model’s overall effect size improves with the addition of SRI scores: both the unadjusted and adjusted OOR increase by 0.13.
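The interpretation of an OOR of 2.14 can be made concrete with a little arithmetic. The Python sketch below does two things. First, it backs out the implied SVMLP, {ln(2.14)}² ≈ 0.579. Second, it shows how a one standard deviation increase in the MLP changes a predicted probability. The starting probability of .73 is a hypothetical value for an individual student, chosen only for illustration; it is not a result from the example data. The function name is also hypothetical.

```python
import math

def shift_probability(p0: float, oor: float, sd_steps: float = 1.0) -> float:
    """Predicted probability after moving sd_steps standard deviations along the MLP."""
    odds = p0 / (1.0 - p0) * oor ** sd_steps   # odds multiply by the OOR per SD
    return odds / (1.0 + odds)

oor_final = 2.14                           # unadjusted OOR, final first-year model
implied_svmlp = math.log(oor_final) ** 2   # about 0.579
p_start = 0.73                             # hypothetical predicted probability of returning
p_next = shift_probability(p_start, oor_final)  # about 0.853
```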
For MLOGR, the OOR estimate can be compared with the standardized odds ratios of the individual predictors. In Table 4, we see that the odds of retention after the first year increase with increases in ACT Composite Score, high school GPA, and the SRI Academic Discipline, Commitment to College, and Social Connection scales. The odds of retention decrease with increases in Social Activity. The predictor institution is a nominal variable with 24 categories (the sample includes twenty-four 4-year institutions). It is cumbersome to inspect the individual odds ratios associated with each category (institution) relative to a reference category. However, the OOR estimate allows us to see the overall impact of using institution as a predictor variable. From Table 3, the OOR estimate was 1.76 when institution was the only effect in the model. This suggests that institutional differences in retention rates account for much of the overall effect size of the final model.
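When a nominal variable is the only predictor, the fitted model has one parameter per category. The maximum likelihood fitted MLP for each student is therefore the logit of that student’s institutional retention rate. The SVMLP is then the variance of those logits across students, and the OOR summarizes all category effects in one number. The Python sketch below computes this using a divisor of n (consistent with standardizing with divisor n). The group sizes and rates in the usage are hypothetical, not the study data, and the function name is hypothetical.

```python
import numpy as np

def oor_nominal_only(group_sizes, success_rates):
    """OOR for an MLOGR model whose only predictor is a nominal variable.

    Each subject's fitted MLP is the logit of its group's observed rate,
    so rates must lie strictly between 0 and 1.
    """
    sizes = np.asarray(group_sizes, dtype=float)
    rates = np.asarray(success_rates, dtype=float)
    logits = np.log(rates / (1.0 - rates))
    center = np.average(logits, weights=sizes)
    svmlp = np.average((logits - center) ** 2, weights=sizes)  # variance over subjects
    return float(np.exp(np.sqrt(svmlp)))
```

For example, two hypothetical institutions of 100 students each, with retention rates of .6 and .8, give `oor_nominal_only([100, 100], [0.6, 0.8])` of about 1.63.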
Practical Guidelines for Using the Overall Odds Ratio
Qualitative Effect Size Interpretations
What value of OOR represents a “small,” “medium,” or “large” overall effect size? Cohen (1988) suggested qualitative descriptors of effect sizes that are commonly used in the behavioral sciences. For example, for r (Pearson’s correlation), a value of .10 is labeled small, .30 is medium, and .50 is large (Cohen, 1988, pp. 79–81). Rosenthal (1996) and Valentine and Cooper (2003) discussed the dangers of using these qualitative descriptors of effect size. Both sources noted that large effect sizes are common in the physical sciences but rare in the behavioral sciences, so universal criteria for effect sizes are misleading. Rosenthal wrote, “Qualitative descriptors, to be useful, should be grounded in the context of social science where associations, in a strict mathematical sense, tend to be weak” (p. 42). Furthermore, Rosenthal suggested that qualitative interpretation “should be tempered and adjusted by sub-field and context” (p. 43). Following Rosenthal’s advice, we should compare the adjusted OOR estimates obtained in our MLOGR example for college retention (2.13 and 2.09) with OOR values obtained in previous empirical studies of college retention. Of course, this is not yet possible because the OOR has not been used previously.
Even with the aforementioned limitations of effect size benchmarks, we would like to have some idea of what represents a “small,” “medium,” or “large” OOR. In an attempt to establish these benchmarks, we first related the standardized odds ratio to ρ (the correlation coefficient). Then, by extension, we related the OOR to R². Our approach, described in detail below, was to find the odds ratio that results when a continuous variable Y has correlation ρ with a predictor X and is transformed to a dichotomous variable Y* at some cutoff point c.
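First, we assume that random variables X and Y have a bivariate normal distribution, with means of zero and standard deviations of one. Then, let $Y^* = 1$ if $Y \ge c$ and $Y^* = 0$ otherwise. Considering $Y^*$ as the dichotomous outcome variable and X as the predictor variable, the standardized odds ratio (contrasting X = 1 to X = 0) is $\{p_1/(1 - p_1)\}/\{p_0/(1 - p_0)\}$, where $p_x = P(Y^* = 1 \mid X = x) = 1 - \Phi\{(c - \rho x)/(1 - \rho^2)^{1/2}\}$ and Φ denotes the standard normal cumulative distribution function.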
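This dichotomization can be computed directly, since Y given X = x is normal with mean ρx and variance 1 − ρ². The Python sketch below uses the standard library’s `statistics.NormalDist` for Φ. It requires |ρ| < 1, and the function name is hypothetical. The same ρ yields different odds ratios at different cutoffs c, which is why benchmarks derived this way depend on the base rate.

```python
from statistics import NormalDist

def dichotomized_odds_ratio(rho: float, c: float) -> float:
    """Odds ratio of Y* = 1{Y >= c} for X = 1 versus X = 0, where (X, Y) is
    standard bivariate normal with correlation rho (|rho| < 1)."""
    phi = NormalDist()                  # standard normal distribution
    sd = (1.0 - rho ** 2) ** 0.5        # SD of Y given X

    def p(x: float) -> float:
        # P(Y >= c | X = x), since Y | X = x ~ N(rho * x, 1 - rho^2)
        return 1.0 - phi.cdf((c - rho * x) / sd)

    p1, p0 = p(1.0), p(0.0)
    return (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))
```

For example, comparing `dichotomized_odds_ratio(0.3, 0.0)` (a base rate of .5) with `dichotomized_odds_ratio(0.3, 1.28)` (a base rate of about .10) shows the base-rate dependence for a fixed ρ.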
It should be stressed that different methods of relating the OOR to established effect size measures such as r or d may lead to different benchmarks. Because of this complexity and the fact that our proposed benchmarks depend on the base rate, we urge users to avoid using qualitative effect size interpretations for the OOR. Instead, we recommend only comparing OOR values in the context of the field of study. Further work is needed to establish OOR effect size benchmarks that are invariant to the base rate and related in a meaningful way to existing benchmarks.
Calculating the Adjusted and Unadjusted Overall Odds Ratio Using Statistical Software
The SAS LOGISTIC procedure for SAS version 9.1 (SAS, 2004b) does not report
Calculation of the adjusted SVMLP and adjusted OOR is not as straightforward because we must implement the bias-correcting steps described earlier. In Part 2 of the appendix, we provide a SAS macro that calculates the adjusted versions of the SVMLP and OOR as well as the unadjusted versions and
Discussion
R² analogs for MLOGR are not interpretable on the same scale as the effects of individual predictors. For example, users of MLOGR cannot convert an R² analog into an odds ratio in the same way that users of MLR can convert classical R² into a standardized regression coefficient. Users of MLOGR can use the OOR to measure the overall predictive strength, or effect size, of the model’s predictors. Similar to R applied to MLR, the OOR is interpretable on the same scale as individual effects.
When applied to MLOGR, R² analogs depend on the base rate, so meaningful comparisons of model effect size cannot be made if the base rates differ. On the other hand, because Equation 1 does not involve β0, the OOR is invariant to the base rate. Among the R² analogs,
The overall effect size of a model’s predictors cannot be assessed by simply inspecting the estimated regression coefficients. As seen in Equation 1, the PVMLP is a function of regression coefficients but also depends on the inter-correlations of the predictor variables. If the regression coefficients are all positive (or all negative), the overall effect size will be greater if the predictors are positively correlated. This is clearly seen when we inspect the values in Table 2. For example, if we have five uncorrelated predictor variables, each with an odds ratio of 2, the OOR is 4.71 (condition 12). However, if the five predictor variables all have pairwise correlations of .5, the OOR is 14.65 (condition 13). Because of this complexity, simple inspection of estimated regression coefficients is not a good approach for assessing a model’s overall effect size.
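The two Table 2 values quoted above follow directly from Equation 1. With standardized predictors, var(η) = Σ_k Σ_l β_k β_l r_kl and OOR = exp{var^{1/2}(η)}. The Python sketch below assumes equicorrelated predictors (a common pairwise correlation), as in the example, and reproduces 4.71 for uncorrelated predictors and 14.65 for pairwise correlations of .5. The function name is hypothetical.

```python
import numpy as np

def population_oor(odds_ratios, corr):
    """OOR = exp{var^(1/2)(eta)}, with var(eta) = sum_k sum_l beta_k beta_l r_kl
    (Equation 1), for standardized predictors sharing one pairwise correlation."""
    beta = np.log(np.asarray(odds_ratios, dtype=float))
    p = beta.size
    R = np.full((p, p), corr) + (1.0 - corr) * np.eye(p)   # equicorrelation matrix
    var_eta = beta @ R @ beta
    return float(np.exp(np.sqrt(var_eta)))

print(round(population_oor([2] * 5, 0.0), 2))   # 4.71  (condition 12)
print(round(population_oor([2] * 5, 0.5), 2))   # 14.65 (condition 13)
```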
The SVMLP does not allow for direct comparisons of models when the outcomes have different probability distributions. For example, it would not be meaningful to compare the SVMLP from MLOGR with the SVMLP from MLR (classical R²) because the two measures are on different scales. R² analogs are generally on the same zero-to-one scale as classical R², so users may be tempted to compare models’ effect sizes even when the outcomes have different probability distributions. This comparison is not valid, however, because the variance of some nonnormal outcomes (e.g., Bernoulli) is tied directly to the mean, and the meaning of “proportion of variance explained” depends on the probability distribution of the outcome.
Generally, the SVMLP is biased and may need correction. Although eliminating the bias of the SVMLP does not necessarily eliminate the bias of the OOR estimator (which is a function of the SVMLP), our simulation study showed that the bias is reduced considerably. Interestingly, the bias of the OOR estimator increases with overall effect size. The bias is also more severe when the sample size is small relative to the number of predictors. Our proposed method for adjusting the SVMLP eliminates most of the bias of the OOR estimator. Other methods for adjusting the OOR estimator are certainly available. For example, Sapra (2002) demonstrated how the jackknife procedure (Efron, 1982) can be applied to estimate regression coefficients for the probit model; the same general approach could be used to derive a bias-corrected OOR estimate. Future work might identify a bias estimator with better properties than our proposed estimator.
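As an illustration of that alternative, which we have not evaluated for the OOR, a generic jackknife bias correction can be sketched in Python. The correction is n·θ̂ − (n − 1)·(mean of the leave-one-out estimates). To apply it to the OOR, `estimator` would refit the MLOGR model on each leave-one-out data set (rows of predictors and outcome) and return the unadjusted OOR, which requires n model fits. The usage below instead checks the mechanics on the plug-in variance, where the jackknife is known to recover the usual n − 1 sample variance exactly. The function name is hypothetical.

```python
from typing import Callable

import numpy as np

def jackknife_bias_corrected(estimator: Callable[[np.ndarray], float], data: np.ndarray) -> float:
    """Jackknife bias-corrected estimate: n * theta_hat - (n - 1) * mean(leave-one-out estimates).

    Rows of `data` are treated as the independent cases.
    """
    n = len(data)
    theta_hat = estimator(data)
    leave_one_out = np.array([estimator(np.delete(data, i, axis=0)) for i in range(n)])
    return float(n * theta_hat - (n - 1) * leave_one_out.mean())

x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
corrected = jackknife_bias_corrected(np.var, x)   # equals np.var(x, ddof=1) = 16.5
```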
When the sample size is large relative to the number of predictors, there is little difference in the adjusted and unadjusted SVMLP, so that adjustment is not necessary.
In this work, we have restricted our study to special cases of MLOGR. Further work is needed to understand how the SVMLP could be used as the basis for measuring overall effect size for other GLMs. More work is also needed to develop a method for approximating standard errors and confidence intervals for the SVMLP and OOR so that users know the precision of the estimate of the target parameter. Further research could also explore whether the SVMLP could be extended to mixed-effect models. So far, we have only considered fixed-effect models, where the MLP has the form
In conclusion, we suggest that users of MLOGR report the estimated OOR to describe the model’s overall effect size in terms of the odds ratio, which is a meaningful and generally accepted effect size metric for dichotomous outcomes. The OOR should supplement
Footnotes
Appendix
Figures and Tables
Acknowledgements
The authors would like to thank Nancy Petersen, Justine Radunzel, and Richard Sawyer of ACT, Inc. for their reviews and suggested improvements of the article. Also, we thank Steve Robbins of ACT, Inc. for the college retention data used in the example.
