Abstract

Theorizing in strategic organization often leads researchers to state hypotheses involving mediating effects, in which an antecedent variable, X, affects an outcome, Y, through the intermediate channel of a mediator, M (Whetten, 1989). Sometimes the entire effect of X on Y is thought to operate through M (i.e. complete mediation), and sometimes X is thought to affect Y directly and also indirectly through M (i.e. partial mediation). When empirical researchers set out to test such mediating hypotheses, the vast majority of tests use one of two standard methodologies introduced in the mid-1980s. The first is the ‘causal steps’ approach of Baron and Kenny (1986), extended in Kenny et al. (1998) and further discussed in MacKinnon et al. (2002) and Shrout and Bolger (2002). The second is the structural equation modeling (SEM) approach, advocated in James and Brett (1984) and further discussed in James et al. (2006).
While the two standard methods and their variants feature an appealing simplicity that has sustained their popularity for decades, they both require a stringent assumption that is unlikely to hold in most contexts of interest in strategic organization research. The result has been a proliferation of studies reporting problematic tests of mediating effects.
The good news is that better, instrument-based mediation tests such as two-stage least squares (2SLS) and three-stage least squares (3SLS) that do not require the stringent assumptions underlying the standard methods are available and have been well known for decades. The bad news is that they are rarely used, despite some clear and compelling articulations of their merits (e.g. Shaver, 2005). Some have recently argued that research in strategic organization would benefit from adopting these alternative methods given that they do not require the stringent assumptions of the standard methods (DeVaro, 2011; Shaver, 2005; Wood et al., 2008). The key point is not that evidence of mediating effects is more or less likely to be uncovered using the alternative methods versus the standard methods but rather that the standard methods are more likely to support incorrect conclusions regarding mediation.
This essay offers a program for mediation remediation in the field of strategic organization and complements earlier papers that emphasized the importance of endogeneity and of correct interpretation of regressions in strategic organization research (e.g. Bascle, 2008; Hamilton and Nickerson, 2003; Yip and Tsang, 2007). It also provides a useful addition to the ‘hallmarks of high-quality empirical research’ identified in Oxley et al. (2010).
But the call to replace the standard methods with instrument-based ones is not new. Given that the most important arguments for instrument-based methods have been made cogently before – without appreciably shifting the literature away from the standard methods – it is fair to ask what value there is in yet another orator ascending the soapbox to critique the standard methods. My answer is to focus this essay on what I see as the main reasons why scholars resist moving beyond the standard methods, in hopes that an understanding of the roots of that resistance may facilitate fruitful change.
A closer look at why remediation is needed
The two standard approaches and their variants might be criticized on a number of grounds, but arguably their greatest shortcoming, and the one on which I focus in this essay, is that they require an assumption (usually implicit) that the error terms in the ‘Y’ regression and the ‘M’ regression are uncorrelated, i.e. ρ = 0.
This assumption is usually untenable in strategic organization. Typically, Y and M are closely related theoretically – in fact M is believed to determine Y – so in most cases some of the factors determining M will have an independent effect on Y. As noted in Hamilton and Nickerson (2003), in strategic organization research M is often chosen strategically by the firm, with the aim of increasing organizational performance, Y. Researchers frequently include many of the same control variables in the Y and M equations, reflecting the fact that both outcomes are expected to share common observed determinants. But the same argument can be made for unobserved determinants. And if there is overlap in the unobserved determinants, then it is likely that ρ ≠ 0. Furthermore, regression error terms reflect our ignorance and the fact that we cannot measure all of the relevant determinants of the outcomes. If we do not know what these unobserved determinants are, how can we declare with confidence that the two sets of unobserved measures are uncorrelated?
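To fix ideas, the partial mediation model can be written as two linear equations. The notation below (the coefficients a and b, the errors u and ε, and the omission of control variables) is an illustrative assumption rather than notation taken from the cited papers:

```latex
\begin{aligned}
M &= a_0 + a_1 X + u, \\
Y &= b_0 + b_1 X + b_2 M + \varepsilon, \qquad \rho = \operatorname{Corr}(u, \varepsilon), \\
\operatorname{Cov}(M, \varepsilon) &= a_1 \operatorname{Cov}(X, \varepsilon) + \operatorname{Cov}(u, \varepsilon)
  = \rho \, \sigma_u \sigma_\varepsilon \quad \text{when } X \text{ is exogenous.}
\end{aligned}
```

Complete mediation is the special case b_1 = 0. The last line shows that, in this sketch, the mediator is correlated with the Y equation's error exactly when ρ ≠ 0.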
Consider two recent examples of mediation tests that use the standard approaches. Hypothesis 9 of Crossland and Hambrick (2011) states that managerial discretion, M, mediates the relationship between national institutions, X, and the CEO’s effect on firm performance, Y. The authors considered six national institutions, using standard methods to conclude that the impact of employer flexibility was not mediated by discretion, that the impact of uncertainty tolerance was completely mediated, and that the impacts of the remaining four institutions (individualism, cultural looseness, ownership dispersion and legal origin) were partially mediated. The implicit assumption ρ = 0 can be criticized here, given that some of the same unobserved factors influencing the degree to which a CEO’s actions impact firm performance are likely also to influence the discretion the CEO wields. One example of such a (potentially time-varying) unobserved factor is CEO motivation and effort.
Similarly, in a study of domestic versus cross-border mergers and acquisitions, Boeh (2011) formulated a complete mediation hypothesis based on the following elements. The outcome variable, Y, is the difference in contracting costs between cross-border and domestic mergers and acquisitions. The X variable is an indicator measuring whether a deal is cross-border or domestic. The mediator, M, is the use of mechanisms to improve information (e.g. increasing the amount of a priori information, producing new information, or reducing the information asymmetry about an offer). The mediation hypothesis (H6) states that the use of mechanisms to improve information is sufficient to explain differences in contracting costs between cross-border and domestic mergers and acquisitions. That is, a deal being cross-border should have no direct effect on costs. Using the Baron–Kenny method, the author found support for H6: solely controlling for the use of cash and tender offers (the measure of M) is sufficient to explain differences in contracting costs between cross-border and domestic mergers and acquisitions. The cross-border indicator is statistically insignificant and adds no extra explanatory power. It is plausible that unobserved characteristics that influence the organization’s use of various mechanisms to improve information may be correlated with the factors that determine the cost difference between domestic and cross-border mergers and acquisitions, implying ρ ≠ 0. Thus, in this study and in Crossland and Hambrick (2011) the assumption ρ = 0 might be questioned.
Assuming ρ = 0 might be unreasonable in most contexts in strategic organization, but what harm does it cause? The problem is that the two standard methods and their variants are based on OLS regressions of the M and Y equations, and for OLS to yield consistent estimates of a regression’s parameters, there must be zero correlation between the regression error term and each of the right-hand side variables. That condition is violated in the Y equation whenever ρ ≠ 0, because M, which appears on the right-hand side, is then correlated with the Y equation’s error term (Greene, 2005). The resulting biases can be severe, as illustrated in simulation analyses by Shaver (2005) and DeVaro (2011). While we cannot determine the extent to which the conclusions of Crossland and Hambrick (2011) or Boeh (2011) about mediation may have been distorted by such bias, these examples make a case for improved methods that do not require assuming ρ = 0.
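The bias can be illustrated with a small simulation. The sketch below is not a reproduction of the simulations in Shaver (2005) or DeVaro (2011); the coefficient values, unit error variances, and function names are hypothetical choices made for illustration. Under these assumed values the OLS estimate of the coefficient of M converges to b2 + ρ rather than b2, and the estimate of the coefficient of X converges to b1 − ρ.

```python
import numpy as np

def simulate(n, rho, seed=0):
    """Draw (X, M, Y) from a partial mediation model with Corr(u, e) = rho."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    # Errors of the M and Y equations: unit variances, correlation rho.
    cov = np.array([[1.0, rho], [rho, 1.0]])
    u, e = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    m = 1.0 * x + u              # M equation (assumed a1 = 1)
    y = 0.5 * x + 1.0 * m + e    # Y equation (assumed b1 = 0.5, b2 = 1)
    return x, m, y

def ols(y, *regressors):
    """OLS coefficients of y on a constant and the given regressors."""
    design = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Compare the Y-equation estimates with and without correlated errors.
for rho in (0.0, 0.6):
    x, m, y = simulate(200_000, rho)
    _, b1_hat, b2_hat = ols(y, x, m)
    print(f"rho={rho:.1f}  b1_hat={b1_hat:.3f}  b2_hat={b2_hat:.3f}")
```

With ρ = 0 the estimates should sit close to the assumed values of 0.5 and 1. With ρ = 0.6 the direct effect is understated and the mediated effect overstated, so a test based on OLS could point toward complete mediation when the assumed truth is partial mediation.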
Like Crossland and Hambrick (2011) and Boeh (2011), in the majority of cases empirical researchers use the standard methods or their variants, as documented in a comprehensive study by Wood et al. (2008), which reviewed over 400 articles published over a quarter century in Administrative Science Quarterly, Academy of Management Journal, Personnel Psychology, Journal of Applied Psychology and Organizational Behavior and Human Decision Processes. My own review of publications in Strategic Management Journal, Organization Science and Journal of Management during the last three years reveals that the literature continues to be dominated by the standard methods (see Table 1 for a list of examples).
Table 1. Recent mediation studies
Note: The table contains papers that report empirical tests of mediating effects published between 2009 and 2011 in Strategic Management Journal, Organization Science and Journal of Management. Studies that used a remediation technique (e.g. 2SLS or 3SLS) are underlined and in boldface. The other studies used standard methods (i.e. Baron–Kenny, SEM, or their variants).
The impact of Shaver (2005) on mediation research
I found only 21 published articles that cite Shaver (2005), and two-thirds of these do not actually implement the methods he recommends. Three are general essays on the subject of mediation, as opposed to empirical studies (Mathieu et al., 2008; Miller et al., 2007; Wood et al., 2008); Westphal and Graebner (2010) list the paper in their references but do not cite it in the text; Estrin et al. (2009) and Schwens et al. (2011) cite the paper in contexts unrelated to mediation tests; Hessels et al. (2010) cite the paper but do not require its methods given the nature of their data; Heavey et al. (2009) cite the paper and say they do not implement its methods because they could not find suitable instruments; Boeh (2011) and Vodosek (2007) use standard methods, though they take steps to try to show that Shaver’s recommendations are not necessary in their analyses; Ziggers and Tjemkes (2010) and Bing et al. (2007) cite the paper but do not implement its recommendations; Leiponen and Helfat (2011) mention that their results using standard methods are only suggestive due to the potential for correlated errors to induce bias, citing Shaver (2005); Holcomb et al. (2009) use standard methods, reporting in a footnote that in response to a referee request they used an instrumental variable method to address the concerns in Shaver (2005), finding that the alternative method did not change their results substantively; however, the authors do not mention what instrumental variable(s) they used in this robustness check. That leaves only seven papers that report statistical results from methods that account for error correlation (Dushnitsky and Shapira, 2010; Gomez and Maicas, 2011; Simsek et al., 2007; Soda et al., 2008; Stern and Westphal, 2010; Zaheer and Soda, 2009; Zhang et al., 2009).
What does mediation remediation look like?
Model specification
The first step of any mediation analysis involves choosing between the partial and complete mediation models. James et al. (2006) recommend that the researcher first determine on theoretical grounds, before consulting the data, which model is appropriate. If theory does not offer sufficient basis for deciding, James et al. (2006) recommend first testing for complete mediation as the natural default, arguing that this is consistent with the basic tenets of the philosophy of science, because the complete model is more parsimonious – i.e. involves fewer parameters – than the partial model. However, they acknowledge the argument by Baron and Kenny that the partial model is the one most often applicable in psychology, so that ‘if one is a betting person’ one should start with the partial mediation model.
Identification of the empirical model can be achieved either by assuming ρ = 0 (as in the case of the standard methods) or by introducing additional information in the form of instruments (as in the case of 2SLS and 3SLS). These identification strategies are needed only in the partial mediation model, since the complete mediation model is already identified even in the absence of instruments or an assumption of ρ = 0. This would seem to make the advice in James et al. (2006) – to assume complete mediation as the default model – even more appealing, since in the complete mediation model the challenging problem of model identification can be avoided. However, organizational theory will rarely offer a compelling justification a priori for complete mediation. It is one thing to argue on the basis of theory that X affects Y through M. It is another to argue that M is the only avenue through which X affects Y. There may be a direct channel of influence from X to Y that has not occurred to the researcher. Or there may be additional mediators that have not occurred to the researcher and that are therefore not measured and included in the model, and this could cause X to have a direct effect in a regression of Y on M and X. The data should be given a chance to reject partial mediation, even when complete mediation is anticipated. If complete mediation turns out to be correct and the researcher started with a more general model, the only cost is that a more general model was estimated than was ultimately deemed necessary by the data. In contrast, if the true model is partial mediation and the researcher – because theory offers insufficient guidance as to which model applies – begins with a complete mediation model, the result could be disastrous, since a stringent and incorrect constraint is forced on the data. The partial mediation model is therefore the natural default.
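The identification point can be made concrete with a short derivation. The notation (coefficients b, error ε, no control variables) is assumed here for illustration. Under complete mediation X is excluded from the Y equation, so X itself supplies the moment condition that pins down the coefficient of M:

```latex
\begin{aligned}
\text{Complete mediation:}\quad
  & Y = b_0 + b_2 M + \varepsilon, \quad \operatorname{Cov}(X, \varepsilon) = 0
  \;\Longrightarrow\;
  b_2 = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Cov}(X, M)}
  \quad \text{if } \operatorname{Cov}(X, M) \neq 0, \\
\text{Partial mediation:}\quad
  & Y = b_0 + b_1 X + b_2 M + \varepsilon
  \;\Longrightarrow\;
  \operatorname{Cov}(X, Y) = b_1 \operatorname{Var}(X) + b_2 \operatorname{Cov}(X, M).
\end{aligned}
```

In the partial model the same moment condition contains two unknowns, b_1 and b_2, which is why additional information is needed, either the restriction ρ = 0 or an instrument.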
Model estimation
In a cogent critique of the standard methods, Shaver (2005) recommended the method of two-stage least squares (2SLS) as the preferred approach for handling mediation. In similar spirit, DeVaro (2011) recommended three-stage least squares (3SLS). Both approaches are instrument-based methods, and neither one requires the problematic assumption ρ = 0. In contrast to the standard methods that achieve identification via the assumption that ρ = 0, these two alternative methods achieve identification via instrumental variables. What is needed is at least one exogenous instrumental variable, Q, that influences M directly but that influences Y only indirectly through M, i.e. M completely mediates the effect of Q on Y. The instrumental variable(s), or instrument(s), must satisfy two conditions:
Q must be correlated with M given the model’s other exogenous variables.
Q cannot be correlated with the Y equation error term, i.e. it cannot suffer from the same problem as M, which is correlated with the Y equation’s error.
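The two stages of 2SLS can be sketched in a few lines. The following is a minimal illustration on simulated data, assuming a single valid instrument Q, a single exogenous X, and hypothetical coefficient values. It is not the implementation used in any of the cited studies, and it returns point estimates only.

```python
import numpy as np

def simulate(n, rho=0.6, d=1.0, seed=0):
    """Hypothetical data in which Q shifts M but enters Y only through M."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    q = rng.normal(size=n)       # instrument, independent of both errors
    cov = np.array([[1.0, rho], [rho, 1.0]])
    u, e = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    m = 1.0 * x + d * q + u      # M equation: d is the coefficient of Q
    y = 0.5 * x + 1.0 * m + e    # Y equation (assumed b1 = 0.5, b2 = 1)
    return x, q, m, y

def lstsq(design, target):
    """Least-squares coefficients of target on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

def two_stage_least_squares(y, x, m, q):
    """Return the 2SLS estimates (b0, b1, b2) of the Y equation."""
    const = np.ones_like(y)
    z = np.column_stack([const, x, q])          # all exogenous variables
    m_hat = z @ lstsq(z, m)                     # stage 1: fitted mediator
    w_hat = np.column_stack([const, x, m_hat])
    return lstsq(w_hat, y)                      # stage 2: Y on X and fitted M

x, q, m, y = simulate(200_000)
b0_hat, b1_hat, b2_hat = two_stage_least_squares(y, x, m, q)
print(f"2SLS: b1_hat={b1_hat:.3f}  b2_hat={b2_hat:.3f}")
```

Because the fitted mediator depends only on X and Q, it is uncorrelated with the Y equation's error even though ρ = 0.6 in the simulated data, so the estimates should land near the assumed values of 0.5 and 1.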
The main difference between 2SLS and 3SLS is that the latter is a systems-based method that uses information from both the M and Y equations simultaneously, whereas with 2SLS the two equations are estimated separately. Because 2SLS ignores some information by focusing only on the Y equation, it is less efficient than 3SLS and can be expected to yield larger standard errors. Both 2SLS and 3SLS achieve identification the same way, by using instruments.
Given valid instruments, implementation of either 2SLS or 3SLS is easy with standard software packages such as Stata, involving only a single line of code. Table 2 can then be used to test the mediation hypotheses.
Table 2. Mediation inferences following estimation via instrument-based methods
Note: If d, the coefficient of Q in the M equation, is estimated with low precision, this suggests that the instrumental variable, Q, is weak, calling into question the identification strategy.
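Inference about mediation rests on statistical tests of the Y-equation coefficients, so standard errors are needed alongside the point estimates. The sketch below computes conventional (homoskedastic) 2SLS standard errors on simulated data. It does not reproduce the decision rules of Table 2, and the data-generating values and function names are hypothetical. One detail worth noting is that the residuals are formed with the observed M, not the fitted M from the first stage.

```python
import numpy as np

def simulate(n, b1, rho=0.6, seed=0):
    """Hypothetical data: Q shifts M only; b1 is the direct effect of X on Y."""
    rng = np.random.default_rng(seed)
    x, q = rng.normal(size=(2, n))
    cov = np.array([[1.0, rho], [rho, 1.0]])
    u, e = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    m = 1.0 * x + 1.0 * q + u
    y = b1 * x + 1.0 * m + e
    return y, x, m, q

def tsls_with_se(y, x, m, q):
    """2SLS estimates of (b0, b1, b2) and their conventional standard errors."""
    n = y.size
    const = np.ones(n)
    z = np.column_stack([const, x, q])       # instruments
    w = np.column_stack([const, x, m])       # Y-equation regressors
    first, *_ = np.linalg.lstsq(z, w, rcond=None)
    w_hat = z @ first                        # projection of regressors on Z
    coef, *_ = np.linalg.lstsq(w_hat, y, rcond=None)
    resid = y - w @ coef                     # residuals use the observed M
    sigma2 = resid @ resid / (n - w.shape[1])
    cov = sigma2 * np.linalg.inv(w_hat.T @ w_hat)
    return coef, np.sqrt(np.diag(cov))

# t-statistics for the direct effect (X) and the mediated channel (M).
for label, b1 in (("partial", 0.5), ("complete", 0.0)):
    coef, se = tsls_with_se(*simulate(50_000, b1))
    t_x, t_m = coef[1] / se[1], coef[2] / se[2]
    print(f"{label}: t(X)={t_x:.1f}  t(M)={t_m:.1f}")
```

In the simulated partial mediation case both t-statistics should be large. In the complete mediation case only the coefficient of M should be clearly distinguishable from zero, which is the pattern under which excluding X from the Y equation is supported by the data.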
A number of methods other than 2SLS and 3SLS could also be used to address the problem of correlated errors, and the various options available to the researcher are summarized in Table 3. LIML and FIML are limited information and full information maximum likelihood methods, with the former analogous to 2SLS in that it estimates the equations separately and the latter analogous to 3SLS in that it estimates both equations in the system simultaneously. In contrast to 2SLS and 3SLS, LIML and FIML require an assumption of normally distributed regression errors. The generalized method of moments (GMM) does not require a distributional assumption, can be implemented on the equations either separately or as a system, and is more efficient than 3SLS when the errors are heteroskedastic, though it is not as easily implemented as either 2SLS or 3SLS. None of the options in Table 3 imposes ρ = 0. Of the methods in Table 3, my sense is that the relative simplicity of 2SLS and 3SLS makes them the preferred candidates to eventually eclipse the standard methods in the strategic organization literature. The Appendix offers some advice for choosing between 2SLS and 3SLS.
Table 3. Alternative approaches for mediation remediation
Why aren’t we using these instrument-based methods already?
Table 1 lists 34 published mediation tests in Strategic Management Journal, Organization Science and Journal of Management during the last three years. The authors of these studies had access to Shaver (2005), which made a clear and compelling case several years earlier for instrument-based methods over standard methods, yet most used the standard methods anyway. Given this fact, it is worth taking a step back and thinking about the potential reasons for the field’s reluctance to embrace superior tests of mediating effects. If we can better understand the obstacles to adopting superior methods, we might have a greater chance of overcoming them.
I see two main obstacles that prevent instrument-based methods from becoming the new industry standard and that allow the standard methods to sustain themselves.
First, what makes instrument-based methods more challenging than the standard methods is conceiving of, and credibly measuring, appropriate instruments. This is often difficult, and it is hard to offer general advice regarding instrument selection, given that the choice heavily depends on the specifics of the problem at hand (see Greene [2005] for a methodological discussion of this issue and Shaver [2005] for a more applied one). The fact that it is harder to devise and make the case for credible instruments than it is to (implicitly) assume ρ = 0 is, I believe, one of the reasons for researchers’ reluctance to embrace instrument-based approaches.
An informal email survey of the authors of all papers in Table 1 supports this conjecture. A number of respondents cited the difficulty of finding good (or any) instruments when asked an open-ended question about the reasons for scholars’ resistance to adopting instrument-based methods. Furthermore, some authors in published work forthrightly acknowledge their inability to find good instruments given limitations of their data and cite this as the reason for not using instrument-based methods (e.g. Heavey et al., 2009; Leiponen and Helfat, 2011).
Second, reviewers, referees and editors – the gatekeepers – of the leading journals currently permit identification via the ρ = 0 assumption, without requiring researchers to justify or even mention this assumption. Given that authors already have numerous items to address in each round of revision, they have no incentive to address issues not signaled as being important.
The email survey corroborated this point. One respondent wrote, ‘Established scholars might not be willing to worry about it, unless forced to by reviewers. A key role here is played by the editors of journals. It is their responsibility to demand the issue to be taken up on a regular basis by the reviewers that work on their manuscripts.’ Another wrote, ‘the paper was already in R&R, and I wasn’t going to add something the reviewers didn’t ask for (they asked for plenty)’. The same respondent also wrote, ‘If reviewers don’t know a method, they can be suspicious of it. I’ve been asked by reviewers on other papers to provide less “state of the art” methods as primary analysis to show that the results are robust to more “standard” specifications.’ Another wrote, ‘I think researchers suspect that most methods are flawed, but are guided more by the review process to determine what’s acceptable.’ To explain his/her reasons for using a standard method, another respondent wrote, ‘I don’t have to justify it to naive reviewers, and it won’t be grounds for rejecting the paper.’ Another wrote, ‘If an author doesn’t include a newer method, a reviewer might ask for it, but probably won’t reject based on not having it.’ Another wrote, ‘if the reviewers don’t ask for it, I wouldn’t do it’.
Recommendations for researchers and editors
It is not obvious how to break free from this longstanding pattern of inappropriate estimation of mediation effects. The main arguments in favor of improved methods have been known and understood for decades, and when they have been raised in clear and compelling fashion in leading journals they have failed to break this unfortunate equilibrium (e.g. Shaver, 2005). Given that Strategic Organization was founded to ‘provide an international, interdisciplinary forum designed to improve our understanding of the interrelated dynamics of strategic and organizational processes and outcomes’ and that it has published ‘big picture’ articles about changing the norms of research in the field (e.g. Hamilton and Nickerson, 2003; Mezias and Regnier, 2007; Oxley et al., 2010), this seems an appropriate forum for another call to the researchers and gatekeepers in the field. In the spirit of this call, I offer five recommendations:
Recommendation 1: Assume partial mediation, not complete mediation, as the default. The partial mediation model nests the complete mediation model as the special case for which the coefficient of X in the Y equation is zero. This means one can specify a partial mediation model from the outset (suitably identified) and allow the data to determine, via a statistical test, whether complete mediation applies. A conclusion of complete mediation will rarely if ever be compelling unless the researcher can show empirically that excluding X from the Y equation is justified by the data. Thus, it is inadvisable to begin with a complete mediation model, even if it is anticipated theoretically that complete mediation applies.
Recommendation 2: Make instrument-based methods the new industry standard. Eschew the standard methods in favor of instrument-based ones such as 2SLS and 3SLS to estimate partial mediation models (that nest complete mediation as a special case), or use one of the other options in Table 3. Include an explicit discussion carefully justifying the choice of instrument(s). Finally, infer mediating effects using Table 2.
Recommendation 3: Choose between 2SLS and 3SLS following the Appendix guidelines. Either 2SLS or 3SLS (along with the other methods in Table 3) is appropriate for achieving mediation remediation. Which of those two instrument-based methods is preferable depends on the context, and the Appendix briefly describes the key tradeoffs.
Recommendation 4: If you must use standard methods . . . If researchers fail to find viable instruments and choose to avoid all remediation methods in Table 3 in favor of either of the classic methods or their variants, they should at least include an explicit discussion justifying why the identifying ρ = 0 assumption is defensible in the study.
Recommendation 5: Editors should create bold new policies to encourage remediation. If all researchers were to accept and follow the previous four recommendations, the current equilibrium would be broken and mediation remediation would finally be achieved. I believe this is unlikely to happen. If it were going to happen, it probably would have happened already. The recommendations have been made cogently before in various contexts and have largely fallen on deaf ears. Whatever incentives researchers have had for adopting them have apparently been insufficient. Looking forward, what is needed is bold action by the editors of a leading journal, by introducing a policy of requiring the authors of all published mediation tests to carefully and explicitly defend the underlying identifying assumptions. Such an editorial policy would not require researchers to abandon the standard methods. Rather, users of the standard methods would be required to include explicit discussion of why ρ = 0 is a reasonable and defensible assumption in their study. This would put the debate squarely where it needs to be: between the researchers and gatekeepers. It would shift the relative costs to researchers of using standard methods vs alternative ones, and require that editors and reviewers learn more about these methods in order to correctly evaluate their application. This repositioning of the issue might, over time, erode the current equilibrium, leading to mediation remediation.
Appendix: 2SLS vs 3SLS?
In this appendix I briefly discuss the relative strengths and weaknesses of 2SLS and 3SLS; see Belsley (1988) for further discussion of this topic. As discussed in DeVaro (2011), 3SLS offers greater asymptotic efficiency (i.e. potentially lower standard errors) than 2SLS, and this can be important given that inferences about partial or complete mediation must be made on the basis of statistical tests (see Table 2). However, the efficiency advantage of 3SLS over 2SLS is based on asymptotic results, and in finite samples this advantage could dissipate or even reverse, given that the finite-sample variation of the estimated covariance matrix is transmitted throughout the system in 3SLS. Furthermore, efficiency gains from 3SLS require the researcher to identify more than one instrument per endogenous mediator; if there is only one Q in the M equation per M in the Y equation then the system is exactly identified, in which case 2SLS and 3SLS are equivalent. Finally, misspecification of the M equation in 3SLS (or FIML) can cross-contaminate the Y equation (which is usually of primary interest), whereas 2SLS (or LIML) is largely immune to this problem.
In some cases the theory implies a prediction on the sign of ρ. In such cases, unlike 2SLS or the standard methods, 3SLS can be used to test this theoretical hypothesis since ρ is actually estimated. For example, in the partial mediation models of DeVaro (2006a, 2006b) that consider strategic promotion tournaments, the logic of tournament theory implies ρ < 0.
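A point estimate of ρ can be obtained from the residuals of the two equations, which is the step 3SLS uses to build its estimate of the error covariance matrix. The sketch below is a simplified illustration on simulated data with an assumed true value of ρ = −0.5. The function names and data-generating values are hypothetical, and a formal test of the sign of ρ would also require a standard error, which this sketch does not compute.

```python
import numpy as np

def simulate(n, rho, seed=0):
    """Hypothetical partial mediation data with Corr(u, e) = rho."""
    rng = np.random.default_rng(seed)
    x, q = rng.normal(size=(2, n))
    cov = np.array([[1.0, rho], [rho, 1.0]])
    u, e = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    m = 1.0 * x + 1.0 * q + u
    y = 0.5 * x + 1.0 * m + e
    return y, x, m, q

def estimate_rho(y, x, m, q):
    """Correlation between M-equation and Y-equation residuals."""
    const = np.ones_like(y)
    z = np.column_stack([const, x, q])
    # M equation: every regressor is exogenous, so OLS is appropriate.
    a, *_ = np.linalg.lstsq(z, m, rcond=None)
    u_hat = m - z @ a
    # Y equation: 2SLS, replacing M by its first-stage fitted value.
    w_hat = np.column_stack([const, x, z @ a])
    b, *_ = np.linalg.lstsq(w_hat, y, rcond=None)
    # Structural residuals are computed with the observed M.
    e_hat = y - np.column_stack([const, x, m]) @ b
    return np.corrcoef(u_hat, e_hat)[0, 1]

rho_hat = estimate_rho(*simulate(100_000, rho=-0.5))
print(f"estimated rho = {rho_hat:.3f}")
```

The estimate should fall close to the assumed value of −0.5. By contrast, residuals from an OLS regression of Y on X and M are uncorrelated with the M-equation residuals by construction, so they carry no information about ρ.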
Some cautions apply to both 2SLS and 3SLS. For example, one should pay attention to the coefficient of Q in the M equation, since if this parameter is estimated with low precision the quality of the instrument, and hence the entire exercise, is called into question. If the parameter is statistically insignificant, the data give no assurance that Q belongs in the M equation, and if Q does not belong there the Y equation is not identified. As a further cautionary note, even with good instruments, instrumental variable estimators like 2SLS and 3SLS that have good asymptotic properties can be problematic in finite samples (Nelson and Startz, 1990).
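Checking the precision of the coefficient of Q amounts to inspecting the first-stage regression. The sketch below computes the t-statistic of that coefficient under conventional OLS standard errors, comparing a strong and a weak instrument on simulated data. The values of d, the sample size, and the function names are hypothetical. With a single instrument, the first-stage F statistic for excluding Q equals the square of this t-statistic.

```python
import numpy as np

def simulate_first_stage(n, d, seed=0):
    """Hypothetical M equation: M = X + d*Q + u."""
    rng = np.random.default_rng(seed)
    x, q, u = rng.normal(size=(3, n))
    m = 1.0 * x + d * q + u
    return m, x, q

def first_stage_t(m, x, q):
    """t-statistic of the coefficient of Q in an OLS regression of M on X and Q."""
    n = m.size
    z = np.column_stack([np.ones(n), x, q])
    coef, *_ = np.linalg.lstsq(z, m, rcond=None)
    resid = m - z @ coef
    sigma2 = resid @ resid / (n - z.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(z.T @ z)))
    return coef[2] / se[2]

for label, d in (("strong", 1.0), ("weak", 0.02)):
    t = first_stage_t(*simulate_first_stage(2_000, d))
    print(f"{label} instrument: t = {t:.2f}, F = {t * t:.2f}")
```

A small t-statistic in the weak case is the warning sign described above: the instrument explains too little of M for the second stage to be trusted.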
Acknowledgements
I thank numerous colleagues and former students for helpful comments, and I acknowledge with particular thanks detailed feedback from Dev Jennings and the rest of the SO! editorial team, and from three anonymous referees on an earlier version. I also gratefully acknowledge the thoughtful and swift responses of many authors cited in this essay to an informal email survey.
