The Causal Structure of Suppressor Variables

Abstract

Suppression effects in multiple linear regression are one of the most elusive phenomena in the educational and psychological measurement literature. The question is, How can including a variable, which is completely unrelated to the criterion variable, in regression models significantly increase the predictive power of the regression models? In this article, we view suppression from a causal perspective and uncover the causal structure of suppressor variables. Using causal discovery algorithms, we show that classical suppressors defined by Horst (1941) are generated from causal structures which reveal the equivalence between suppressors and instrumental variables. Although the educational and psychological measurement literature has long recommended that researchers include suppressors in regression models, the causal inference literature has recently recommended that researchers exclude instrumental variables. The conflicting views between the two disciplines can be resolved by considering the different purposes of statistical models, prediction and causal explanation.

Keywords

suppression, suppressor, instrumental variable, causal discovery, statistical model

Multiple linear regression has been the most popular statistical model for investigating the relationship between a set of predictors (independent variables) and a criterion variable (dependent variable). Despite its long history and popularity, however, some phenomena in the use of regression analysis still remain unclear. For example, the topic of suppression is one that many applied researchers and students find difficult to understand. The question is how a variable, which is completely unrelated to the criterion variable, can be beneficial as an auxiliary regressor in regression models. Horst (1941) referred to this variable as a suppressor variable and showed that the inclusion of suppressors in multiple regression models strengthens the relationship between predictors¹ and the criterion variable and thus increases the predictive power of the regression model. Such variables are called suppressors because, according to Horst’s (1941) explanation, they suppress some criterion-irrelevant variation in predictors and thus strengthen the relationship between the predictors and the criterion.

Since the earlier literature investigated the algebraic definitions and features of suppression effects (Conger, 1974; Darlington, 1968; Lubin, 1957; Tzelgov & Stern, 1978; Velicer, 1978), recent literature has focused on whether suppression occurs in contexts other than multiple regression, for example, path analysis (Maassen & Bakker, 2001) and logistic regression (Lynn, 2003); how suppression effects are related to other statistical phenomena, for example, Simpson’s or Lord’s paradoxes (Tu, Gunnell, & Gilthorpe, 2008) and mediation and confounding (MacKinnon, Krull, & Lockwood, 2000); and how to detect suppressor variables (Pandey & Elliott, 2010; Shieh, 2006). However, no studies have yet investigated suppression effects from a causal perspective. Although mediation, confounding, and suppression may be viewed as statistically equivalent in that a relationship between two variables changes by conditioning on a third variable (MacKinnon et al., 2000), the underlying causal structures of each phenomenon are distinct. Indeed, the causal structures for mediation and confounding are already clarified in the causal inference literature and help researchers to intuitively understand the difference between the two phenomena (e.g., Greenland, Pearl, & Robins, 1999). No corresponding rationale has been provided regarding suppression. This is interesting because the very first published paper on suppression, even before Horst (1941), clearly mentioned causal reasoning in defining suppressor variables. Mendershausen (1939) wrote,

[a] clearing variate [i.e., suppressor] in the strict sense is a useful determining variate without causal connection with the dependent variate; its role in the set consists of clearing another determining (observational) variate of the effect of a disturbing basis variate. (p. 99, emphasis added)

The purpose of this article is to provide an intuitive causal framework for formalizing suppression by uncovering the distinct underlying causal structure for suppressor variables. We translate the abstract algebraic expressions for suppression from the earlier literature into natural causal languages. Interestingly, our findings reveal suppressors’ ambivalent role in scientific research. Although the educational and psychological literature has long emphasized the benefit of suppressor variables in predicting the criterion variable (Conger, 1974; Darlington, 1968; Horst, 1941; Lubin, 1957; Mendershausen, 1939; Pandey & Elliott, 2010; Shieh, 2006), we argue that suppressors can be viewed as instrumental variables—variables that the causal inference literature has recently warned against including in regression models (Pearl, 2010, 2011; Steiner & Kim, 2016). Particularly, we show that Horst’s (1941, 1966) classical suppressors are indeed equivalent to instrumental variables. In order to resolve the conflict, this article clarifies two different purposes of using statistical models: making predictions and developing causal explanations (Hernán & Robins, 2017; Kerlinger, 1964; Pearl & Mackenzie, 2018; Pedhazur, 1982; Shmueli, 2010).

This article is organized as follows. We first introduce the terminology and rules used throughout this article. After this, we present how we discover the causal structures of suppressor variables. Then, based on this found structure, we explain suppression in multiple linear regression, especially focusing on changes of regression coefficients and R². Following this, we discuss the structural equivalence between Horst’s (1941) suppressors and instrumental variables and show how causal inference researchers have used instrumental variables for making causal explanations. In the Illustration section, we use real data sets and discover underlying causal structures behind self-esteem and antisocial behavior. Finally, we discuss the theoretical and practical implications of our findings.

Terminology, Symbols, and Rules

Causal structure and statistical independence

A causal structure is a directed acyclic graph in which each variable’s causal relation to the other variable is represented by arrows such that cause → effect. As examples, in Figure 1, we present the three basic structures describing causal relations among three variables A, B, and C. In Figure 1a, A causally affects B through C such that C plays a mediating role. In Figure 1b, both A and B are caused by a common cause C. Finally, in Figure 1c, both A and B causally affect a common outcome C. More complex causal structures can be represented by combining these three basic structures.

Figure 1.

Basic causal structures consisting of three variables. (a) C plays a mediating role between A and B. (b) C is a common cause of A and B. (c) C is a common outcome of A and B.

Causal structures encode statistical (in)dependences among variables (Pearl, 1988). For example, in the two causal structures in Figures 1a and 1b, the variables A and B are not independent of each other (i.e., A and B are marginally associated), and we symbolize this as $A \not∐ B$. In contrast, A and B are marginally independent of each other in Figure 1c, and we symbolize this as $A ∐ B$. However, if the variable C is conditioned on, the independence pattern is reversed: Conditional on C, A and B are statistically independent of each other in Figures 1a and 1b, $A ∐ B | C$, but not independent of each other in Figure 1c, $A \not∐ B | C$.

d-separation

d-separation is a formal but intuitively understandable rule to extract (conditional) (in)dependence statements from causal structures (Pearl, 2009). A key concept for d-separation is a path in the causal structure. A path is a sequence of connected variables that appear only once regardless of the arrows’ directions. A path can be unconditionally or conditionally blocked or unblocked (i.e., open) by middle variables in the path. In Figure 1a, we find a sole path between A and B: A → C → B. This path is unconditionally open and transmits the association between A and B. This also holds in Figure 1b where the sole path is A ← C → B. However, when a common outcome exists in a path, as in Figure 1c, the path is unconditionally blocked at the common outcome and does not transmit any association. Such common outcomes are often referred to as colliders (Elwert & Winship, 2014). In Figure 1c, the variable C is a collider, and thus, the path A → C ← B is unconditionally blocked at the node C. This path, however, becomes open if the collider variable is conditioned on. That is, conditional on the collider C, the path A → C ← B becomes open and the association is transmitted via the path.² If a non-collider variable is conditioned on, the path is blocked at that variable. As the variable C is not a collider in Figures 1a and 1b, conditional on C, each path is blocked at the node C.

According to d-separation, for a pair of variables, if every path between the pair is blocked by a set of conditioning middle variables (including an empty set), the pair is d-separated by those variables, and then the pair is statistically independent of each other ($∐$) conditional on the conditioning variables; otherwise, they are d-connected (by the set of conditioning variables) and are then associated ($\not∐$) conditional on the variables. For more formal explanations of d-separation, see Pearl (2009) or Pearl, Glymour, and Jewell (2016).

Wright’s path-tracing rules

Throughout this article, we restrict our discussion to linear models. Also, for simplicity but without loss of generality, we assume that every variable is normalized to have zero means and unit variances. In standardized linear models, the quantity of the linear relation between a pair of variables can be easily computed by path-tracing rules (Wright, 1921). Given a causal structure, the covariance or correlation—which are identical with standardized variables—between any pair of variables is determined by the sum of products of structural path coefficients along all open paths between the pair (Pearl, 2013). In Figure 1, the Greek letters above the arrows indicate such structural path coefficients. In Figures 1a and 1b, the correlation coefficient between A and B is simply $r_{A B} = α \times β$ because the paths A → C → B and A ← C → B are unconditionally open. However, the correlation coefficient between A and B in the different causal structure in Figure 1c is zero, $r_{A B} = 0$ , because the sole path A → C ← B is unconditionally blocked at the collider node C. See Pearl (2013) for details.
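The d-separation and path-tracing rules above can be checked numerically. The following is a minimal sketch and not part of the original analysis: it builds the correlation matrix implied by each structure in Figure 1 and reports the marginal correlation between A and B together with their partial correlation given C. The helper names `implied_corr` and `partial_corr`, the values α = 0.5 and β = 0.6, and the assignment of α to the arrow touching A and β to the arrow touching B are illustrative assumptions.

```python
import numpy as np

def implied_corr(names, edges):
    """Correlation matrix implied by a standardized linear causal structure.

    edges: list of (cause, effect, path_coefficient) tuples.
    Error variances are solved for so that every variable has unit variance.
    """
    idx = {name: i for i, name in enumerate(names)}
    p = len(names)
    B = np.zeros((p, p))
    for cause, effect, coef in edges:
        B[idx[effect], idx[cause]] = coef
    total = np.linalg.inv(np.eye(p) - B)              # total effects of error terms
    err_var = np.linalg.solve(total**2, np.ones(p))   # forces Var(variable) = 1
    return total @ np.diag(err_var) @ total.T, idx

def partial_corr(R, idx, a, b, c):
    """Partial correlation between a and b given c."""
    r_ab, r_ac, r_bc = R[idx[a], idx[b]], R[idx[a], idx[c]], R[idx[b], idx[c]]
    return (r_ab - r_ac * r_bc) / np.sqrt((1 - r_ac**2) * (1 - r_bc**2))

alpha, beta = 0.5, 0.6  # assumed illustrative path coefficients
structures = {
    "1a mediator":     [("A", "C", alpha), ("C", "B", beta)],
    "1b common cause": [("C", "A", alpha), ("C", "B", beta)],
    "1c collider":     [("A", "C", alpha), ("B", "C", beta)],
}

results = {}
for label, edges in structures.items():
    R, idx = implied_corr(["A", "B", "C"], edges)
    marginal = R[idx["A"], idx["B"]]
    conditional = partial_corr(R, idx, "A", "B", "C")
    results[label] = (marginal, conditional)
    print(f"{label:16s} r_AB = {marginal:+.3f}   r_AB|C = {conditional:+.3f}")
```

Under these assumed values, the marginal correlation equals αβ = 0.3 in the mediator and common-cause structures and vanishes in the collider structure, whereas conditioning on C reverses the pattern.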

Discovering Causal Structures of Suppressor Variables

Causal Structures of Classical Suppressors

Horst (1966) provided an example of what is now known as “classical” suppression in which the correlation between the suppressor variable and the criterion is zero. He found that even though there is almost zero correlation between pilots’ verbal ability and their navigational skill, if verbal ability (which is positively correlated with technical ability³) is included in the regression model, the predictive power of the model (i.e., R²) substantially increases. He claimed that pilots’ verbal ability plays a role as a suppressor in predicting navigational skill with technical ability. That is, verbal ability suppresses some irrelevant variation in the predictor (i.e., technical ability) and thus strengthens the relationship between the predictor and the criterion (i.e., navigational skill). Let X, S, and Y denote the predictor, suppressor, and criterion variables, respectively. Then, the key elements of the example can be expressed with the following three (in)dependence statements:

$X \not∐ Y$ (X is predictive of Y),

$X \not∐ S$ (X and S are correlated with each other), and

$S ∐ Y$ (the correlation coefficient between S and Y is zero).

One can infer the underlying causal structure behind the observed (in)dependence statements. Such inference procedures are known as causal discovery algorithms (see Pearl, 2009, or Spirtes, Glymour, & Scheines, 2000, for details). Our goal is to find a causal structure which is compatible with all three (in)dependence statements. Throughout this article, we assume that the predictor X and the suppressor S precede the criterion variable Y in time. This time ordering is typical in many prediction tasks. Using this background knowledge, we efficiently restrict the causal structure space we have to explore.

The first dependence statement $X \not∐ Y$ (i.e., technical ability is predictive of navigational skill) can emerge from one of the three causal structures in Figure 2. The causal structure in Figure 2a represents that the predictor X and criterion Y are unconditionally associated via the causal relation between them: X causes Y. Note that we rule out the reversed causal structure Y → X since we assumed that X precedes Y in time (i.e., the future never causes the past). The causal structure in Figure 2b represents another scenario where X and Y are unconditionally associated via an unknown third variable U (indicated by a hollow node). The variable U represents all types of unmeasured confounding variables that causally affect both X and Y. In the causal inference literature, such variables are frequently referred to as confounders (with respect to the treatment X and outcome Y). Then, despite the absence of the causal relationship between X and Y, they can be unconditionally associated. Finally, the first dependence $X \not∐ Y$ can be due to both of the two previous structures, causal and confounding structures, as depicted in Figure 2c.⁴

Figure 2.

Causal structures creating unconditional dependence between X and Y. (a) X causally affects Y. (b) X and Y are causally affected by an unknown—indicated by a hollow node—common cause U. (c) X and Y are associated via both causal and noncausal relations.

Let us consider the second dependence statement $X \not∐ S$ (i.e., technical ability is correlated with verbal ability). Based on the first causal structure in Figure 2a, the possible causal structures embracing this additional dependence are presented in Figure 3.⁵ In Figures 3a and 3b, the association between S and X is due to the direct causal relations between them, but in Figure 3c, the association is due to a confounding structure via U. As in Figure 2c, it is possible to combine causal and confounding structures to explain $X \not∐ S$, but as this does not affect our main findings, we do not consider such complex cases here. Refer to the Appendix for such cases.

Figure 3.

Causal structures encoding two dependence statements $X \not∐ Y$ and $X \not∐ S$, based on Figure 2a. (a) X causally affects S. (b) S causally affects X. (c) S and X are causally affected by an unknown variable.

Finally, we now consider the third independence statement between S and Y, $S ∐ Y$ (i.e., verbal ability is not correlated with navigational skill). Although all three causal structures in Figure 3 are compatible with the first ($X \not∐ Y$) and second ($X \not∐ S$) dependence statements, none of them can explain the third independence statement $S ∐ Y$. This is because, in all three structures, S is d-connected with Y. Therefore, we conclude that all the proposed causal structures in Figure 3, which are based on Figure 2a, are invalid. They cannot explain the third independence statement. This incompatibility also holds even if one tries to explain the second dependence statement ($X \not∐ S$) based on the causal structure in Figure 2c. The presence of the path X → Y necessarily creates the unconditional association between S and Y, which is not compatible with the third independence statement $S ∐ Y$.

Therefore, we conjecture that the predictor X and the outcome Y are associated only via confounding, as in Figure 2b. Taking this as the basic structure, the possible causal structures embracing the second dependence $X \not∐ S$ are presented in Figure 4. Among the three candidates, only the two structures in Figures 4b and 4c are able to explain the third independence statement, $S ∐ Y$. This is because, in Figure 4a, S is d-connected with Y via the path S ← X ← U → Y. As the causal structures in Figures 4b and 4c are compatible with all three (in)dependence statements among X, S, and Y, we conclude that they are valid causal structures of the classical suppressors defined by Horst (1941, 1966).⁶ A formal causal discovery algorithm finds the equivalent structure. The inductive causation (IC) algorithm (Verma & Pearl, 1990) finds the structure S → X ← Y. Since X ← Y is not possible, we replace it with X ← U → Y. Then, the resulting causal structure is identical to Figure 4b. Note also that replacing S → X with S ← K → X, where K denotes an unknown confounder different from U, results in the causal structure identical to Figure 4c. See the Appendix for complex cases where the causal and confounding relationships between S and X are combined.

Figure 4.

Causal structures encoding two dependence statements $X \not∐ Y$ and $X \not∐ S$, based on Figure 2b. (a) X causally affects S. (b) S causally affects X. (c) S and X are causally affected by an unknown variable K. The causal structures in (b) and (c) explain all three (in)dependence statements in Horst’s (1966) example.
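The elimination argument above can be mirrored numerically. The sketch below is an illustration only, not the IC algorithm: it computes the implied correlations for the six candidate structures in Figures 3 and 4 and keeps those compatible with the three (in)dependence statements. Setting every path coefficient to 0.5, the tolerance `tol`, and the helper `implied_corr` are assumptions made for this example; a single set of coefficients cannot prove d-separation, and the check presumes that no associations cancel exactly.

```python
import numpy as np

def implied_corr(names, edges):
    """Correlation matrix implied by a standardized linear causal structure."""
    idx = {name: i for i, name in enumerate(names)}
    p = len(names)
    B = np.zeros((p, p))
    for cause, effect, coef in edges:
        B[idx[effect], idx[cause]] = coef
    total = np.linalg.inv(np.eye(p) - B)
    err_var = np.linalg.solve(total**2, np.ones(p))  # unit variances
    return total @ np.diag(err_var) @ total.T, idx

c = 0.5  # assumed common value for every path coefficient
candidates = {
    "3a": [("X", "Y", c), ("X", "S", c)],
    "3b": [("X", "Y", c), ("S", "X", c)],
    "3c": [("X", "Y", c), ("U", "S", c), ("U", "X", c)],
    "4a": [("U", "X", c), ("U", "Y", c), ("X", "S", c)],
    "4b": [("U", "X", c), ("U", "Y", c), ("S", "X", c)],
    "4c": [("U", "X", c), ("U", "Y", c), ("K", "S", c), ("K", "X", c)],
}

tol = 1e-9
compatible = []
for label, edges in candidates.items():
    R, idx = implied_corr(["U", "K", "S", "X", "Y"], edges)
    r_xy = R[idx["X"], idx["Y"]]
    r_xs = R[idx["X"], idx["S"]]
    r_sy = R[idx["S"], idx["Y"]]
    # Horst's pattern: X-Y dependent, X-S dependent, S-Y independent
    if abs(r_xy) > tol and abs(r_xs) > tol and abs(r_sy) < tol:
        compatible.append(label)
    print(f"Figure {label}: r_XY={r_xy:.3f}  r_XS={r_xs:.3f}  r_SY={r_sy:.3f}")

print("Compatible with all three statements:", compatible)
```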

Suppression in Linear Regression Models

The causal structures we discovered allow us to better understand suppression in linear regression models. For ease of exposition, hereafter, we consider that Figure 4b represents the true causal structure behind Horst’s (1941, 1966) classical suppression, but all the main arguments also hold with Figure 4c. We can write the linear structural causal model corresponding to the causal structure in Figure 4b as

\begin{aligned} U &= ∊_{U}, \\ S &= ∊_{S}, \\ X &= γ S + α U + ∊_{X}, \\ Y &= β U + ∊_{Y}, \end{aligned} \tag{1}

where $∊_{U}$ , $∊_{S}$ , $∊_{X}$ , and $∊_{Y}$ are mutually independent random error terms reflecting exogenous idiosyncratic variation of the corresponding variables. The structural path coefficient γ represents the causal effect of S on X (S → X), and the structural path coefficients α and β represent the causal effects of U on X (U → X) and Y (U → Y), respectively.

Using the data generating model in Equation 1, we now replicate Horst’s (1966) correlation and regression findings in an algebraic manner. First, he found a positive correlation between technical ability (predictor X) and navigational skill (criterion Y). Given Equation 1, the population correlation coefficient between X and Y, $r_{X Y}$ , can be expressed with the structural path coefficients:

\begin{aligned} r_{X Y} &= Cov (γ S + α U + ∊_{X}, β U + ∊_{Y}) \\ &= γβ Cov (S, U) + αβ Var (U) \\ &= αβ, \end{aligned} \tag{2}

because $Cov (S, U) = 0$ and $Var (U) = 1$ . Note that the idiosyncratic error terms $∊_{X}$ and $∊_{Y}$ are independent of S and U: $Cov (S, ∊_{Y}) = Cov (U, ∊_{Y}) = Cov (U, ∊_{X}) = Cov (∊_{X}, ∊_{Y}) = 0$ . More easily, one can directly apply the path-tracing rules, resulting in the product of the two path coefficients α and β of the open path X ← U → Y in Figure 4b. Since this correlation was positive in Horst’s (1966) example, we have $αβ > 0$ . Second, he found that verbal ability (suppressor S) and technical ability (predictor X) are positively correlated. That is,

\begin{aligned} r_{S X} &= Cov (∊_{S}, γ S + α U + ∊_{X}) \\ &= γ Cov (∊_{S}, S) \\ &= γ, \end{aligned} \tag{3}

because $Cov (∊_{S}, S) = Cov (∊_{S}, ∊_{S}) = Var (S) = 1$. Therefore, we have $γ > 0$. Again, this also can be easily obtained by the path-tracing rules (the only open path is S → X in Figure 4b).

A noticeable finding was the zero correlation between verbal ability (suppressor S) and navigational skill (criterion Y). Given the structural causal model in Equation 1, the population correlation coefficient between S and Y is

\begin{aligned} r_{S Y} &= Cov (∊_{S}, β U + ∊_{Y}) \\ &= β Cov (∊_{S}, U) + Cov (∊_{S}, ∊_{Y}) \\ &= 0, \end{aligned} \tag{4}

because $Cov (∊_{S}, ∊_{U}) = Cov (∊_{S}, ∊_{Y}) = 0$. Again, this is clear from d-separation and the path-tracing rules because the sole path between S and Y, S → X ← U → Y, is blocked at X in Figure 4b (since X is a collider in the path).

Nonetheless, Horst (1966) found that the partial regression coefficient of verbal ability (suppressor) in the regression of navigational skill (criterion) on technical ability (predictor) and verbal ability was negative, even though verbal ability is uncorrelated with navigational skill and positively correlated with technical ability. The population standardized partial regression coefficient of S after controlling for X is given by⁷

\begin{aligned} b_{Y S | X} &= \frac{r_{S Y} - r_{X Y} r_{S X}}{1 - r_{S X}^{2}} \\ &= \frac{- γαβ}{1 - γ^{2}} . \end{aligned} \tag{5}

It is obvious that the partial regression coefficient is negative because we already know $αβ > 0$ and $γ > 0$ , and since the standardized structural path coefficient γ is $- 1 < γ < 1$ , the denominator of Equation 5 is always positive, $1 - γ^{2} > 0$ . Thus, given the causal structure in Figure 4b, Horst’s (1966) findings are replicated.
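The derivations in Equations 2 through 5 can also be checked by simulation. The following is a minimal sketch under assumed values (α = 0.5, β = 0.6, γ = 0.4, normally distributed errors, a fixed random seed, and 200,000 observations); none of these choices come from Horst's data. It generates data from the structural model for Figure 4b and estimates the correlations and the partial regression coefficients by ordinary least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
alpha, beta, gamma = 0.5, 0.6, 0.4  # assumed illustrative path coefficients

# Structural model for Figure 4b; error scales keep every variance at 1.
U = rng.standard_normal(n)
S = rng.standard_normal(n)
X = gamma * S + alpha * U + np.sqrt(1 - gamma**2 - alpha**2) * rng.standard_normal(n)
Y = beta * U + np.sqrt(1 - beta**2) * rng.standard_normal(n)

# Sample correlations (rows of the input are the variables X, S, Y).
r = np.corrcoef([X, S, Y])
r_xs, r_xy, r_sy = r[0, 1], r[0, 2], r[1, 2]

# Regression of Y on X and S (with intercept).
design = np.column_stack([np.ones(n), X, S])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
b_x, b_s = coef[1], coef[2]

print(f"r_XY = {r_xy:.3f}  (theory {alpha * beta:.3f})")
print(f"r_SX = {r_xs:.3f}  (theory {gamma:.3f})")
print(f"r_SY = {r_sy:.3f}  (theory 0.000)")
print(f"b_YS|X = {b_s:.3f}  (theory {-gamma * alpha * beta / (1 - gamma**2):.3f})")
```

The estimates should match the population values only up to sampling error, so small discrepancies in the third decimal place are expected.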

Suppression shows itself in two strengthening patterns once the suppressor variable is controlled for. First, the relationship between the predictor X and the criterion Y should be strengthened by adding the suppressor S into the regression model. The regression coefficient of X of regressing Y on X (without the suppressor S) is

\begin{aligned} b_{Y X} &= \frac{Cov (X, Y)}{Var (X)} \\ &= r_{X Y} \\ &= αβ. \end{aligned} \tag{6}

The partial regression coefficient of X of regressing Y on X and S is

\begin{aligned} b_{Y X | S} &= \frac{r_{Y X} - r_{Y S} r_{S X}}{1 - r_{S X}^{2}} \\ &= \frac{αβ}{1 - γ^{2}} . \end{aligned} \tag{7}

Thus, we see that the inclusion of the suppressor S always amplifies the original relationship between the predictor X and the criterion Y. The classical suppression can be expressed by the inequality $| \frac{αβ}{1 - γ^{2}} | > | αβ |$ .

Second, because of the amplification of regression coefficients, suppressor variables substantially increase the predictive power of the regression model. The R² of the regression model of Y on X (without S) is

\begin{aligned} R_{Y X}^{2} &= r_{X Y}^{2} \\ &= {(αβ)}^{2} . \end{aligned} \tag{8}

If the suppressor S is added to the regression model, the R² of the regression model becomes

\begin{aligned} R_{Y X | S}^{2} &= \frac{r_{X Y}^{2} + r_{S Y}^{2} - 2 r_{X Y} r_{S Y} r_{S X}}{1 - r_{S X}^{2}} \\ &= \frac{{(αβ)}^{2}}{1 - γ^{2}} . \end{aligned} \tag{9}

Thus, we see that the original R² (without S) is again amplified as a result of adding the suppressor S. A comparison of Equations 8 and 9—as well as of Equations 6 and 7—shows that suppression becomes stronger as the structural path coefficient $| γ |$ increases.
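As a worked numerical illustration (the values are assumed purely for exposition and do not come from Horst’s data), take $αβ = 0.3$ and $γ = 0.4$, so that $1 - γ^{2} = 0.84$:

\begin{aligned} b_{Y X} &= 0.3, & b_{Y X | S} &= \frac{0.3}{0.84} \approx 0.357, \\ R_{Y X}^{2} &= 0.09, & R_{Y X | S}^{2} &= \frac{0.09}{0.84} \approx 0.107 . \end{aligned}

Raising $γ$ to 0.8 with the same $αβ$ gives $1 - γ^{2} = 0.36$, so both quantities are amplified by a factor of $1 / 0.36 \approx 2.78$: $b_{Y X | S} \approx 0.833$ and $R_{Y X | S}^{2} = 0.25$.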

Causal Structures of Negative Suppressors

After Horst (1941) provided the rationale of classical suppressors, Lubin (1957) asked, “Can a variable act as a suppressor even if it has positive [instead of zero] validity?” (p. 291). Here, the term “validity” indicates the correlation with criterion. Obviously, the causal structure of classical suppressors in Figure 4b (and Figure 4c) does not allow a positive (or more generally, any nonzero) correlation between S and Y. However, Lubin (1957) as well as Darlington (1968) argued that suppression still occurs even when S is positively associated with Y. They referred to such suppressors as negative suppressors.

However, besides the two dependence statements $X \not∐ Y$ and $X \not∐ S$, if we also allow a correlation between S and Y, that is, $S \not∐ Y$, it is generally impossible to find the underlying causal structures because we cannot make any useful restrictions that help us to narrow down the number of possible causal structures behind the data. The causal discovery reasoning we previously used is not applicable to the case of the three dependence statements. Instead, we shall explore the causal structures of negative suppression by directly extending the causal structures of Horst’s classical suppression that we already found. Taking the causal structure of classical suppression in Figure 4b as a basis, in Figure 5, we present three structures that easily allow negative suppression effects. Here, we assume that $αβ > 0$ and $γ > 0$, as in Horst’s (1966) example, and additionally that $τ > 0$. First, in Figure 5a, we add the causal link X → Y, represented by $τ$. Using the path-tracing rules, we then obtain $r_{X Y} = τ + αβ > 0$, $r_{S X} = γ > 0$, and $r_{S Y} = γτ > 0$. Note that the third correlation is positive, instead of zero, in accordance with Lubin’s (1957) conception of negative suppressors. This positive association between S and Y is, however, reversed if X is conditioned on. The partial regression coefficient of S of regressing Y on S and X is given by

\begin{aligned} b_{Y S | X} &= \frac{r_{S Y} - r_{X Y} r_{S X}}{1 - r_{S X}^{2}} \\ &= \frac{- γαβ}{1 - γ^{2}} < 0. \end{aligned} \tag{10}

Figure 5.

Causal structures allowing nonzero correlation between S and Y, based on Figure 4b. (a) X causally affects Y. (b) S causally affects Y. (c) X and S causally affect Y. The causal structure in (a) always produces suppression regardless of parameter values.

So, we replicated Lubin’s (1957) setting for negative suppression.

We now investigate whether suppression still occurs with the variable S, which is now positively correlated with Y in Figure 5a. First, the regression coefficient of X of regressing Y on X (without S) is

\begin{aligned} b_{Y X} &= \frac{Cov (X, Y)}{Var (X)} \\ &= τ + αβ . \end{aligned} \tag{11}

If suppression occurs, the relationship between X and Y in Equation 11 should be amplified after controlling for S. The partial regression coefficient of X, controlling for S, is given by

\begin{aligned} b_{Y X | S} &= \frac{r_{Y X} - r_{Y S} r_{S X}}{1 - r_{S X}^{2}} \\ &= τ + \frac{αβ}{1 - γ^{2}} . \end{aligned} \tag{12}

Thus, suppression (i.e., amplifying the association) still occurs because $| τ + \frac{αβ}{1 - γ^{2}} | > | τ + αβ |$ . Note that this inequality is indeed identical to the inequality for Horst’s (1941) classical suppression.

The same is true for the R². Without the suppressor S, the R² of the regression model (i.e., regressing Y on X) is

\begin{aligned} R_{Y X}^{2} &= r_{X Y}^{2} \\ &= {(τ + αβ)}^{2} \\ &= τ^{2} + 2 αβτ + {(αβ)}^{2} . \end{aligned} \tag{13}

If the suppressor S is added to the regression model, the R² is

\begin{aligned} R_{Y X | S}^{2} &= \frac{r_{X Y}^{2} + r_{S Y}^{2} - 2 r_{X Y} r_{S Y} r_{S X}}{1 - r_{S X}^{2}} \\ &= τ^{2} + 2 αβτ + \frac{{(αβ)}^{2}}{1 - γ^{2}} . \end{aligned} \tag{14}

Since $τ^{2} + 2 αβτ + \frac{{(αβ)}^{2}}{1 - γ^{2}} > τ^{2} + 2 αβτ + {(αβ)}^{2}$ , the R² is still amplified due to the negative suppressor S. Note that, again, this inequality is identical to the R² inequality for the classical suppression. Thus, we see that although adding X → Y to Figure 4b allows a nonzero (positive in Lubin’s original setting) correlation between the suppressor S and the criterion Y, which violates the condition of being a classical suppressor, the mechanism of suppression effects does not change, and suppression always occurs regardless of parameter values in causal structures.
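The closed-form results for Figure 5a can be verified from the correlations alone. The sketch below is an illustration under assumed values (α = 0.5, β = 0.4, γ = 0.4, τ = 0.3): it plugs the path-tracing correlations into the normal equations for standardized variables and compares the solution with the expressions derived above for the partial coefficients and R².

```python
import numpy as np

alpha, beta, gamma, tau = 0.5, 0.4, 0.4, 0.3  # assumed illustrative values

# Correlations implied by Figure 5a via the path-tracing rules.
r_xy = tau + alpha * beta
r_sx = gamma
r_sy = gamma * tau

# Regression of Y on X alone.
b_yx = r_xy
r2_x = r_xy**2

# Regression of Y on X and S: solve the standardized normal equations.
R_pred = np.array([[1.0, r_sx], [r_sx, 1.0]])
r_crit = np.array([r_xy, r_sy])
b_yx_s, b_ys_x = np.linalg.solve(R_pred, r_crit)
r2_xs = float(np.array([b_yx_s, b_ys_x]) @ r_crit)

amp = alpha * beta / (1 - gamma**2)
print(f"r_SY = {r_sy:.3f} (positive), b_YS|X = {b_ys_x:.4f} (negative)")
print(f"b_YX = {b_yx:.4f},  b_YX|S = {b_yx_s:.4f},  tau + ab/(1-g^2) = {tau + amp:.4f}")
print(f"R2 without S = {r2_x:.4f},  R2 with S = {r2_xs:.4f}")
```

With these values the suppressor has a positive correlation with the criterion but a negative partial coefficient, and both the coefficient of X and R² increase when S is added.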

However, this does not hold in Figures 5b and 5c. In Figure 5b, we have $r_{X Y} = αβ + γδ$ , $r_{S X} = γ$ , and $r_{S Y} = δ$ . Using the bivariate correlations, we derive the regression coefficient of X for regressing Y on X as

b_{Y X} = αβ + γδ, \tag{15}

and the partial regression coefficient for X, controlling for S, as

\begin{aligned} b_{Y X | S} &= \frac{r_{Y X} - r_{Y S} r_{S X}}{1 - r_{S X}^{2}} \\ &= \frac{αβ}{1 - γ^{2}} . \end{aligned} \tag{16}

Now, the inequality is no longer obvious. Depending on $γδ$ , both $| b_{Y X} | \geq | b_{Y X | S} |$ and $| b_{Y X} | < | b_{Y X | S} |$ are possible. They may even have the opposite signs. Thus, we see that the mechanism here is different from that which is behind the classical suppression defined by Horst (1941). The same is true for the R². We have

\begin{aligned} R_{Y X}^{2} &= r_{X Y}^{2} \\ &= {(αβ + γδ)}^{2} \\ &= {(αβ)}^{2} + 2 αβγδ + {(γδ)}^{2}, \end{aligned} \tag{17}

and

\begin{aligned} R_{Y X | S}^{2} &= \frac{r_{X Y}^{2} + r_{S Y}^{2} - 2 r_{X Y} r_{S Y} r_{S X}}{1 - r_{S X}^{2}} \\ &= δ^{2} + \frac{{(αβ)}^{2}}{1 - γ^{2}} . \end{aligned} \tag{18}

Thus, the mechanism of the increase of R² after including S into the regression model is no longer only due to the amplification (i.e., dividing by $1 - γ^{2}$ ), as was in the classical suppression. The causal structure in Figure 5c has the same issue. In short, although one may observe a suppression effect from the structures in Figures 5b and 5c (i.e., $| b_{Y X} | < | b_{Y X | S} |$ and $R_{Y X}^{2} < R_{Y X | S}^{2}$ ), the effect’s mechanism is completely different from that of the classical suppression resulting from a pure amplification. More precisely, from the structures in Figures 5b and 5c, one may not see the negative suppression effect with certain parameter values. In contrast, from the structure in Figure 5a, the negative suppression effect always occurs regardless of parameter values and the suppression mechanism is identical to the mechanism of classical suppression.⁸
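The dependence on parameter values in Figure 5b can be made concrete with a few numbers. The following sketch evaluates the formulas above for three assumed values of δ (0.3, −0.3, and −0.6), holding α = 0.5, β = 0.4, and γ = 0.4 fixed; the function name `fig5b` and all parameter values are hypothetical choices for illustration.

```python
def fig5b(alpha, beta, gamma, delta):
    """Coefficients and R^2 for Figure 5b, computed from the implied correlations."""
    r_xy = alpha * beta + gamma * delta
    r_sx = gamma
    r_sy = delta
    b_yx = r_xy                                        # Y regressed on X only
    b_yx_s = (r_xy - r_sy * r_sx) / (1 - r_sx**2)      # X coefficient, S controlled
    r2_x = r_xy**2
    r2_xs = (r_xy**2 + r_sy**2 - 2 * r_xy * r_sy * r_sx) / (1 - r_sx**2)
    return b_yx, b_yx_s, r2_x, r2_xs

outcomes = {}
for delta in (0.3, -0.3, -0.6):
    b_yx, b_yx_s, r2_x, r2_xs = fig5b(0.5, 0.4, 0.4, delta)
    outcomes[delta] = (b_yx, b_yx_s, r2_x, r2_xs)
    amplified = abs(b_yx_s) > abs(b_yx)
    print(f"delta={delta:+.1f}: b_YX={b_yx:+.3f}  b_YX|S={b_yx_s:+.3f}  "
          f"amplified={amplified}  R2: {r2_x:.3f} -> {r2_xs:.3f}")
```

Under these assumptions, the coefficient of X shrinks after adding S when δ = 0.3, grows when δ = −0.3, and changes sign when δ = −0.6, while the partial coefficient itself stays at αβ / (1 − γ²) in every case.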

Instrumental Variables in the Causal Inference Literature

Equivalence Between Suppressors and Instrumental Variables

We now switch our concern from prediction (i.e., predicting pilots’ navigational skill using their technical and verbal abilities) to causal inference (i.e., whether technical ability causally affects navigational skill). Many causal inference researchers would probably recognize an instrumental variable in the causal structures in Figures 4b, 4c, and 5a. An instrumental variable, or simply instrument, is a variable that is related to the treatment but is unrelated to the outcome, except via the treatment. Formally, given a causal structure, a variable S is an instrument with respect to the causal effect of the treatment X on the outcome Y if the following two conditions are met (Pearl, 2009; also see Brito & Pearl, 2002):⁹

S is d-connected with X in the causal structure;

S is d-separated from Y in the modified structure where the arrow X → Y is deleted.

In Figures 4b, 4c, and 5a, if one considers the predictor X as a treatment variable and the criterion Y as an outcome variable, the suppressor S satisfies the two conditions; therefore, it serves as an instrument. Hence, classical suppressors depicted in Figures 4b and 4c and negative suppressors depicted in Figure 5a are structurally equivalent to instrumental variables.

Despite this equivalence, however, the reasons why instrumental variables are special in the context of causal inference are completely different from the reasons given in support of the use of suppressors in the educational and psychological measurement literature. Suppressors help to amplify the R² of the regression model which results in a better prediction of the criterion variable. For example, in Horst’s (1966) example, if an evaluator knows a pilot’s verbal ability (suppressor) as well as the pilot’s technical ability (predictor), this evaluator can predict the pilot’s navigational skill (criterion) more accurately. Similarly, educational and psychological researchers have been interested in predicting social behaviors with personality or vice versa. However, this passive prediction is not a major concern in causal inference. Causal inference researchers investigate how a variable (called a treatment) causally affects the other variable (called an outcome). This is not a prediction but a causal explanation.

Identifying Causal Effects With Instrumental Variables

Note that Figure 5a represents a case where the relationship between the treatment X and outcome Y is confounded by the unknown variable U. Then, the causal effect of X on Y cannot be directly obtained from a simple regression analysis of Y on X because of the unmeasured confounding due to U (backdoor criterion; Pearl, 2009). The resulting effect estimate is always biased. In fact, we already witnessed this. The regression coefficient of X in the simple regression model of Y on X in Equation 11 was $b_{Y X} = τ + αβ$ , which differs from the true causal effect $τ$ . The term $αβ$ represents the confounding bias due to the unmeasured U.

In this case, researchers may use instrumental variables in a very special way to make a causal explanation of the effect of X on Y. A standard approach is two-stage least squares (2SLS), which, as its name suggests, consists of two stages. In the first stage, a researcher regresses the treatment X on the instrument S and obtains the predicted value from the regression model. The regression model is expressed as

\begin{matrix} \hat{X} = a + b_{X S} S, \\ = γ S, \end{matrix}

because $a = 0$ and $b_{X S} = r_{X S} = γ$ in Figure 5a (note that S and X are standardized). In the second stage, the researcher regresses the outcome Y on the predicted value $\hat{X}$ . The regression coefficient is

\begin{matrix} b_{Y \hat{X}} = \frac{Cov (\hat{X}, Y)}{Var (\hat{X})}, \\ = \frac{Cov (γ S, τ X + β U + ∊_{Y})}{Var (γ S)}, \\ = \frac{γ^{2} τ}{γ^{2}} = τ . \end{matrix}

Therefore, even in the presence of unmeasured confounding due to U, one can correctly identify the causal effect of X on Y using the instrument S. Because of the linearity, causal identification using instruments is simple here. For nonparametric identification using instruments, see Angrist, Imbens, and Rubin (1996) or Steiner, Kim, Hall, and Su (2017).
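The two stages can be traced in a small simulation. The sketch below is a minimal illustration, not the author's analysis: it generates standardized data under an assumed linear version of Figure 5a (with $γ$ , $α$ , $β$ , and $τ$ taken to be the S→X, U→X, U→Y, and X→Y paths), then carries out the two regressions by hand. All numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
gamma, alpha, beta, tau = 0.6, 0.5, 0.4, 0.3  # hypothetical path coefficients

# Assumed data-generating model: S -> X <- U -> Y and X -> Y (U unmeasured).
s = rng.standard_normal(n)
u = rng.standard_normal(n)
x = gamma * s + alpha * u + np.sqrt(1 - gamma**2 - alpha**2) * rng.standard_normal(n)
y = (tau * x + beta * u
     + np.sqrt(1 - tau**2 - beta**2 - 2 * tau * alpha * beta) * rng.standard_normal(n))

# Stage 1: regress the treatment X on the instrument S and keep the fitted values.
b_xs, a = np.polyfit(s, x, 1)  # returns [slope, intercept]
x_hat = a + b_xs * s

# Stage 2: regress the outcome Y on the fitted values from stage 1.
b_y_xhat, _ = np.polyfit(x_hat, y, 1)

print(f"stage 1 slope b_XS:   {b_xs:.3f} (gamma = {gamma})")
print(f"stage 2 slope (2SLS): {b_y_xhat:.3f} (tau = {tau})")
```

Running the two stages manually recovers the point estimate, but the standard errors from an ordinary second-stage regression are not valid for 2SLS; dedicated instrumental variable routines correct for this.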

Bias Amplification Due to Instrumental Variables

Note that the 2SLS above is a very special way to use instrumental variables. A group of causal inference researchers recently started looking at the consequences of simply controlling for instrumental variables instead of using them in 2SLS (Middleton, Scott, Diakow, & Hill, 2016; Myers et al., 2011; Pearl, 2010, 2011; Steiner & Kim, 2016; Wooldridge, 2009). They have argued that conditioning on instrumental variables is harmful because doing so amplifies any remaining hidden bias. Pearl (2010) referred to this phenomenon as bias amplification. Indeed, we have already witnessed this. Given the causal structure in Figure 5a, the partial regression coefficient for X when regressing Y on both X and S in Equation 12 represents the causal effect estimate of X on Y after controlling for S. It was given by $b_{Y X | S} = τ + \frac{αβ}{1 - γ^{2}}$ . Thus, the original confounding bias due to U, $αβ$ , is amplified by the factor $1 / (1 - γ^{2})$ , which exceeds 1 whenever $γ \neq 0$ (again, note that $- 1 < γ < 1$ ). That is, conditioning on the instrumental variable S only amplifies the bias in the effect estimate of X on Y, and the amplification grows as the instrument becomes stronger. Therefore, one should not control for instrumental variables. It is interesting that the very same amplification phenomenon has been considered beneficial in the educational and psychological literature on suppression. Depending on whether a statistical model’s purpose is prediction or causal explanation, one may prefer to include suppressors or avoid instruments.
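The amplification factor can also be illustrated by simulation. The following sketch again assumes a linear, standardized version of Figure 5a with hypothetical path values ( $γ$ for S→X, $α$ for U→X, $β$ for U→Y, $τ$ for X→Y) and compares the coefficient of X with and without S in the regression.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
gamma, alpha, beta, tau = 0.6, 0.5, 0.4, 0.3  # hypothetical path coefficients

# Assumed data-generating model: S -> X <- U -> Y and X -> Y (U unmeasured).
s = rng.standard_normal(n)
u = rng.standard_normal(n)
x = gamma * s + alpha * u + np.sqrt(1 - gamma**2 - alpha**2) * rng.standard_normal(n)
y = (tau * x + beta * u
     + np.sqrt(1 - tau**2 - beta**2 - 2 * tau * alpha * beta) * rng.standard_normal(n))

# Simple regression of Y on X (S omitted).
b_yx = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Multiple regression of Y on X and S (S controlled for).
design = np.column_stack([np.ones(n), x, s])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
b_yx_given_s = coef[1]

bias_without_s = alpha * beta                 # original confounding bias
bias_with_s = alpha * beta / (1 - gamma**2)   # amplified bias

print(f"b_YX   = {b_yx:.3f} (expected {tau + bias_without_s:.3f})")
print(f"b_YX|S = {b_yx_given_s:.3f} (expected {tau + bias_with_s:.3f})")
```

With these assumed values the bias grows from $.20$ to $.20 / .64 ≈ .31$ once S enters the model, even though S itself has no direct effect on Y.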

Illustration

We illustrate our findings using a real data set. Paulhus, Robins, Trzesniewski, and Tracy (2004) provided real examples of suppression in personality research. They investigated the relationship between antisocial behavior and two types of self-esteem, genuine self-esteem and narcissistic self-esteem. Analyzing three independent samples with multiple regression, they argued that both types of self-esteem play a role as a (mutual) suppressor in predicting antisocial behavior. We apply causal discovery algorithms to discover the underlying causal structures behind their data sets. For this illustration, we use TETRAD, a free software program developed by Clark Glymour, Richard Scheines, Peter Spirtes, and Joseph Ramsey.¹⁰ TETRAD provides more than 30 different causal search algorithms, including the PC algorithm (Spirtes & Glymour, 1991), which is a refined version of the IC algorithm we mentioned earlier. Out of those search algorithms, we use the FCI algorithm, which allows causal discovery with hidden variables. In Table 1, we present the correlation matrices from Paulhus et al.’s (2004) three samples.¹¹ Assuming linear models with a normal probability distribution, TETRAD tests the conditional independences between variables and explores underlying causal structures behind these empirical data.¹²

Table 1.

Bivariate Correlation Among Two Types of Self-Esteem and Antisocial Behavior in Paulhus et al. (2004)

	Sample 1 (n = 4,057)			Sample 2 (n = 301)			Sample 3 (n = 232)
	Nar	Self	Anti	Nar	Self	Anti	Nar	Self	Anti
Nar	1			1			1
Self	.32	1		.44	1		.50	1
Anti	.21	−.27	1	.33	.02	1	.45	.12	1

Note. Nar = narcissistic self-esteem; Self = genuine self-esteem; Anti = antisocial behavior.

The causal search results are presented in Figure 6. We found different causal structures across the samples. From Sample 1, the structure depicted in Figure 6a is found. TETRAD uses the edge type, Ao−oB, to indicate that (1) A causes B, or (2) B causes A, or (3) a hidden variable causes both A and B. Thus, Sample 1 gives us no useful information about the underlying causal structure. This is because all variables are correlated with each other. In Table 1, Sample 1 shows a substantially stronger, and statistically significant, correlation between genuine self-esteem and antisocial behavior ( $ρ = - .27$ , p < .05) than the other two samples (for Sample 2, $ρ = .02$ , p > .05; for Sample 3, $ρ = .12$ , p > .05). Thus, Sample 1 contains no independence structure. As we discussed in the section about negative suppression, too many structures are compatible with the three dependence statements.
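The independence judgments reported above can be approximated directly from Table 1. The sketch below applies a Fisher z test of zero correlation to the genuine self-esteem and antisocial behavior correlation in each sample. It is an assumption of this sketch, not a statement from the article, that a two-sided Fisher z test at the .05 level mirrors the test TETRAD applied; the function name is hypothetical.

```python
import math


def fisher_z_pvalue(r, n):
    """Two-sided p value for H0: rho = 0, based on the Fisher z transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# (correlation between Self and Anti, sample size), taken from Table 1.
samples = {
    "Sample 1": (-0.27, 4057),
    "Sample 2": (0.02, 301),
    "Sample 3": (0.12, 232),
}

p_values = {name: fisher_z_pvalue(r, n) for name, (r, n) in samples.items()}

for name, p in p_values.items():
    verdict = "dependence" if p < 0.05 else "independence not rejected"
    print(f"{name}: p = {p:.3g} ({verdict})")
```

Only Sample 1 rejects independence between genuine self-esteem and antisocial behavior under this test, which agrees with the pattern of p values reported in the text.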

Figure 6.

Discovered causal structures from Paulhus et al.’s (2004) self-esteem data sets. (a) Discovered structure from their Sample 1. (b) Discovered structure from their Samples 2 and 3. (c) Modified structure with hidden variables from the graph (b). Nar = narcissistic self-esteem; Self = genuine self-esteem; Anti = antisocial behavior; K and U = two hidden variables. The edge type, Ao−oB, indicates that (1) A causes B, or (2) B causes A, or (3) a hidden variable causes both A and B. The edge type, Ao→B, indicates that (1) A causes B or (2) a hidden variable causes both A and B.

However, both Sample 2 and Sample 3 reveal an informative causal structure, depicted in Figure 6b. The edge type, Ao→B, indicates that (1) A causes B or (2) a hidden variable causes both A and B, which rules out the possibility of B causing A. We find that there are no causal relationships between either type of self-esteem and antisocial behavior in Sample 2 and Sample 3. Relying on the time ordering, one may further restrict the possible causal structures. First, as we have assumed, the regressors, the two types of self-esteem, temporally precede the criterion, antisocial behavior. Second, the two types of self-esteem do not causally affect each other (probably because they are measured at almost the same time). Then, we can infer that the two types of self-esteem are associated because of an unknown hidden variable K, and that narcissistic self-esteem and antisocial behavior are associated because of another unknown hidden variable U, as in Figure 6c.

Note that the structure in Figure 6c is equivalent to the structure in Figure 4c, which is the causal structure of Horst’s (1941, 1966) classical suppression. In Paulhus et al.’s (2004) Samples 2 and 3, genuine self-esteem (“self”) plays the role of both a classical suppressor and an instrumental variable. Although genuine self-esteem itself has no causal effect on antisocial behavior, including it in the regression model of antisocial behavior on both types of self-esteem will suppress some irrelevant variation in narcissistic self-esteem and thus will strengthen the relationship between narcissistic self-esteem and antisocial behavior. This is desirable in terms of making predictions (the R² change by including genuine self-esteem was, for Sample 2, $Δ R^{2} = .13$ ; for Sample 3, $Δ R^{2} = .20$ ; see Paulhus et al.’s Table 2). At the same time, however, including genuine self-esteem will amplify the confounding bias between narcissistic self-esteem and antisocial behavior; doing so is therefore undesirable in terms of making causal inferences. Although the regression coefficient for narcissistic self-esteem in the regression of antisocial behavior on both genuine and narcissistic self-esteem is significantly positive (for Sample 2, $β = .40$ , p < .05; for Sample 3, $β = .52$ , p < .05; see Paulhus et al.’s Table 2), from the causal structures in Figures 6b and 6c, we see that the true causal effect of narcissistic self-esteem on antisocial behavior must be zero. Therefore, intervening on students’ narcissistic self-esteem (e.g., by developing an educational program) does not affect their antisocial behavior despite the positive partial regression coefficients for narcissistic self-esteem.
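The standardized partial regression coefficients quoted above can be recovered from the correlations in Table 1 alone. The sketch below solves the normal equations for two standardized predictors; the function name is hypothetical, and the calculation assumes the reported coefficients are ordinary standardized coefficients from a regression of antisocial behavior on both types of self-esteem.

```python
import numpy as np


def standardized_betas(r_nar_self, r_nar_anti, r_self_anti):
    """Standardized coefficients for regressing Anti on Nar and Self,
    obtained by solving R_xx @ beta = r_xy."""
    r_xx = np.array([[1.0, r_nar_self],
                     [r_nar_self, 1.0]])
    r_xy = np.array([r_nar_anti, r_self_anti])
    return np.linalg.solve(r_xx, r_xy)


# (r_Nar,Self, r_Nar,Anti, r_Self,Anti) from Table 1.
table_1 = {
    "Sample 2": (0.44, 0.33, 0.02),
    "Sample 3": (0.50, 0.45, 0.12),
}

betas = {name: standardized_betas(*r) for name, r in table_1.items()}

for name, (b_nar, b_self) in betas.items():
    print(f"{name}: beta_Nar = {b_nar:.2f}, beta_Self = {b_self:.2f}")
```

In both samples the coefficient for narcissistic self-esteem exceeds its bivariate correlation with antisocial behavior, while genuine self-esteem receives a negative weight despite its near-zero bivariate correlation, which is the classical suppression pattern.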

Discussion

Over the past 70 years, educational and psychological researchers have viewed suppression effects within a purely statistical framework. Although they have succeeded in clarifying the algebraic features of suppression, its interpretation remains unclear in the literature. For example, Darlington (1968) wrote:

The relations possible among sets of variables are so complex that when a variable with a positive correlation with the criterion variable receives a negative weight in a regression equation, it is generally very difficult or impossible to determine, from the content of the variables, whether the negative weight is “unreasonable.” (p. 179, emphases added)

Similar remarks about the difficulty of interpretation can also be found in the recent literature on suppression (e.g., Valentine, DuBois, & Cooper, 2004). Using recent advances in causal inference (i.e., structural causal models, causal discovery algorithms), we viewed suppression from a causal perspective and uncovered the underlying causal structures of suppressor variables. It turns out that, indeed, it is possible to determine whether a partial regression coefficient is “reasonable” or not from researchers’ subject-matter theory. In the illustration, we discovered the causal structure of self-esteem and antisocial behavior, reflecting such theory, and showed how genuine self-esteem can be negatively associated with antisocial behavior conditional on narcissistic self-esteem despite its zero correlation with antisocial behavior.

Importantly, our causal structures of suppressors in Figures 4b, 4c, and 5a reveal that they are indeed equivalent to instrumental variables. Therefore, suppression and bias amplification are also identical phenomena. Despite the long discussion on suppression and instrumental variables, no literature has yet discovered this structural equivalence. In the causal inference literature, bias amplification has been considered a danger because it amplifies the original confounding bias, which is not desirable from a causal inference perspective (Middleton et al., 2016; Pearl, 2010, 2011; Steiner & Kim, 2016). However, one may interpret this amplifying phenomenon as an enhancement of the relationship between the predictor and the criterion, which can be desirable for better predicting the criterion variable. This has been the standard interpretation of suppression in the educational and psychological measurement literature (e.g., Horst, 1941, 1966; Lubin, 1957; Maassen & Bakker, 2001; MacKinnon et al., 2000; Pandey & Elliott, 2010; Shieh, 2006). The two disciplines have focused on different purposes of statistical models and thus have interpreted the same phenomenon from completely opposite points of view.

Our findings have implications for variable selection in regression, propensity score, or missing data imputation models. In addition to utilizing conventional variable selection methods like stepwise selection, researchers have recently started looking at various techniques to select variables, such as random forests (Genuer, Poggi, & Tuleau-Malot, 2010), neural networks (Keller, Kim, & Steiner, 2015), or tests of conditional independence (de Luna, Waernbaum, & Richardson, 2011; VanderWeele & Shpitser, 2011). Importantly, however, “[t]he criteria for choosing variables differ markedly in [causal] explanatory versus predictive contexts” (Shmueli, 2010, p. 297; see also Hernán & Robins, 2017). Thus, Genuer, Poggi, and Tuleau-Malot (2010) proposed separate variable selection procedures using random forests for causal inference and for prediction (see also Shortreed & Ertefaie, 2017, using lasso). Suppressors and instruments serve as a nice example of the importance of clarifying such purposes for variable selection. If one’s goal is to make an accurate prediction, suppressors should be included because doing so amplifies the power of the models to predict the criterion variable. In contrast, if one’s goal is to derive a valid causal explanation, instruments should be excluded because including them amplifies any remaining confounding bias in causal effect estimates.

Acknowledgments

The author thanks Nick Brown, Felix Elwert, Peter Steiner, Jee-Seon Kim, and Eunjin Seo for their helpful comments.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Yongnam Kim

References

Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of causal effects using instrumental variables. Journal of the American Statistical Association, 91, 444–455.

Brito & Pearl (2002). Generalized instrumental variables. In Darwiche & Friedman (Eds.), Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence (pp. 85–93). San Francisco, CA: Morgan Kaufmann.

Cohen, West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.

Conger, A. J. (1974). A revised definition for suppressor variables: A guide to their identification and interpretation. Educational and Psychological Measurement, 34, 35–46.

Darlington, R. B. (1968). Multiple regression in psychological research and practice. Psychological Bulletin, 69, 161.

De Luna, Waernbaum, & Richardson, T. S. (2011). Covariate selection for the nonparametric estimation of an average treatment effect. Biometrika, 98, 861–875.

Ding & Miratrix, L. W. (2015). To adjust or not to adjust? Sensitivity analysis of M-bias and butterfly-bias. Journal of Causal Inference, 3, 41–57.

Elwert & Winship (2014). Endogenous selection bias: The problem of conditioning on a collider variable. Annual Review of Sociology, 40, 31–53.

Genuer, Poggi, J. M., & Tuleau-Malot (2010). Variable selection using random forests. Pattern Recognition Letters, 31, 2225–2236.

Greenland, Pearl, & Robins, J. M. (1999). Causal diagrams for epidemiologic research. Epidemiology, 10, 37–48.

Hernán, M. A., & Robins, J. M. (2017). Causal inference (Part II). Boca Raton, FL: Chapman & Hall/CRC. Retrieved from https://cdn1.sph.harvard.edu/wp-content/uploads/sites/1268/2017/03/hernanrobins_v2.17.17.pdf

Horst (1941). The role of predictor variables which are independent of the criterion. Social Science Research Bulletin, 48, 431–436.

Horst (1966). Psychological measurement and prediction. Belmont, CA: Wadsworth.

Keller, Kim, J. S., & Steiner, P. M. (2015). Neural networks for propensity score estimation: Simulation results and recommendations. In van der Ark, L. A., Bolt, D. M., Chow, S.-M., Douglas, J. A., & Wang, W.-C. (Eds.), Quantitative psychology research (pp. 279–291). New York, NY: Springer.

Kerlinger, F. N. (1964). Foundations of behavioral research. New York, NY: Holt, Rinehart & Winston.

Lubin (1957). Some formulae for use with suppressor variables. Educational and Psychological Measurement, 17, 286–296.

Lynn, H. S. (2003). Suppression and confounding in action. The American Statistician, 57, 58–61.

Maassen, G. H., & Bakker, A. B. (2001). Suppressor variables in path models: Definitions and interpretations. Sociological Methods & Research, 30, 241–270.

MacKinnon, D. P., Krull, J. L., & Lockwood, C. M. (2000). Equivalence of the mediation, confounding and suppression effect. Prevention Science, 1, 173–181.

McFatter, R. M. (1979). The use of structural equation models in interpreting regression equations including suppressor and enhancer variables. Applied Psychological Measurement, 3, 123–135.

McNemar (1949). Psychological statistics. New York, NY: Wiley.

Mendershausen (1939). Clearing variates in confluence analysis. Journal of the American Statistical Association, 34, 93–105.

Middleton, J. A., Scott, M. A., Diakow, & Hill, J. L. (2016). Bias amplification and bias unmasking. Political Analysis, 24, 307–323.

Myers, J. A., Rassen, J. A., Gagne, J. J., Huybrechts, K. F., Schneeweiss, Rothman, K. J., … Glynn, R. J. (2011). Effects of adjusting for instrumental variables on bias and precision of effect estimates. American Journal of Epidemiology, 174, 1213–1222.

Pandey & Elliott (2010). Suppressor variables in social work research: Ways to identify in multiple regression models. Journal of the Society for Social Work and Research, 1, 28–40.

Paulhus, D. L., Robins, R. W., Trzesniewski, K. H., & Tracy, J. L. (2004). Two replicable suppressor situations in personality research. Multivariate Behavioral Research, 39, 303–328.

Pearl (1988). Probabilistic reasoning in intelligent systems. Palo Alto, CA: Kaufman.

Pearl (2009). Causality: Models, reasoning, and inference (2nd ed.). New York, NY: Cambridge University Press.

Pearl (2010). On a class of bias-amplifying variables that endanger effect estimates. In Grunwald & Spirtes (Eds.), Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (pp. 425–432). Corvallis, OR: AUAI Press.

Pearl (2011). Understanding bias amplification [Invited commentary]. American Journal of Epidemiology, 174, 1223–1227.

Pearl (2013). Linear models: A useful “microscope” for causal analysis. Journal of Causal Inference, 1, 155–170.

Pearl, Glymour, & Jewell, N. P. (2016). Causal inference in statistics: A primer. New York, NY: Wiley.

Pearl & Mackenzie (2018). The book of why: The new science of cause and effect. New York, NY: Basic Books.

Pedhazur, E. J. (1982). Multiple regression in behavioral research: Explanation and prediction (2nd ed.). Orlando, FL: Harcourt.

Shieh (2006). Suppression situations in multiple linear regression. Educational and Psychological Measurement, 66, 435–447.

Shmueli (2010). To explain or to predict? Statistical Science, 25, 289–310.

Shortreed, S. M., & Ertefaie (2017). Outcome-adaptive lasso: Variable selection for causal inference. Biometrics, 73, 1111–1122.

Spirtes & Glymour (1991). An algorithm for fast recovery of sparse causal graphs. Social Science Computer Review, 9, 62–72.

Spirtes, Glymour, C. N., & Scheines (2000). Causation, prediction, and search (2nd ed.). New York, NY: MIT Press.

Steiner, P. M., & Kim (2016). The mechanics of omitted variable bias: Bias amplification and cancellation of offsetting biases. Journal of Causal Inference, 4(2). doi:10.1515/jci-2016-0009

Steiner, P. M., Kim, Hall, C. E., & Su (2017). Graphical models for quasi-experimental designs. Sociological Methods & Research, 46, 155–188.

Y. K., Gunnell, & Gilthorpe, M. S. (2008). Simpson’s Paradox, Lord’s Paradox, and Suppression Effects are the same phenomenon–the reversal paradox. Emerging Themes in Epidemiology, 5, 2.

Tzelgov & Stern (1978). Relationships between variables in three variable linear regression and the concept of suppressor. Educational and Psychological Measurement, 38, 325–335.

Valentine, J. C., DuBois, D. L., & Cooper (2004). The relation between self-beliefs and academic achievement: A meta-analytic review. Educational Psychologist, 39, 111–133.

VanderWeele, T. J., & Shpitser (2011). A new criterion for confounder selection. Biometrics, 67, 1406–1413.

Velicer, W. F. (1978). Suppressor variables and the semipartial correlation coefficient. Educational and Psychological Measurement, 38, 953–958.

Verma & Pearl (1990). Equivalence and synthesis of causal models. In Bonissone, P. P., Henrion, Kanal, L. N., & Lemmer, J. F. (Eds.), Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (pp. 220–227). New York, NY: Elsevier Science.

Wooldridge, J. M. (2009). Should instrumental variables be used as matching variables? Lansing: Michigan State University. Retrieved from http://econ.msu.edu/faculty/wooldridge/docs/treat1r6.pdf

Wright (1921). Correlation and causation. Journal of Agricultural Research, 20, 557–585.