Abstract
Numerous recommendations for improving the dependability of research findings in psychological science have emerged in recent years. Unfortunately, this glut of recommendations makes it difficult for individuals within and across the various constituencies that comprise psychological science to advance uniform solutions for improving research dependability. Moreover, several dependability recommendations propose practices that address the problem of false-positive findings but can in turn increase the problem of false-negative findings, and vice versa. It is proposed that the solution to this conundrum is to develop a concise and concrete list of core research dependability recommendations that address problems of false-positive and false-negative findings in a noncontradicting manner. Examination of the current research dependability literature and earlier literatures dealing with the problem of false findings is used to arrive at a set of three core research dependability recommendations: (a) conduct and promote direct replication studies; (b) ensure data sharing when requested by fellow scientists; and (c) adopt and promote a truth-seeking mindset during the research endeavor.
A recent confluence of factors, including high-profile cases of research fraud by psychologists, questions regarding the dependability and replicability of scientific results, and concerns about the use of questionable research practices, has contributed to the emergence of a new movement to improve the scientific integrity of psychological science (Asendorpf et al., 2013; Funder et al., 2014; Ioannidis, 2005; John, Loewenstein, & Prelec, 2012; Ledgerwood, 2014a, 2014b; Pashler & Wagenmakers, 2012; Simmons, Nelson, & Simonsohn, 2011; Spellman, 2012). The primary goals of this movement, which I will refer to as the research dependability movement, are to illuminate problems in contemporary psychological science that undermine the integrity of empirical findings and to present potential solutions for how to address these problems. The movement appears to be gaining momentum as indicated by changes in the editorial policies of various prominent journals that appear designed to promote greater dependability in published findings (for several examples, see editorial policies at Journal of Personality and Social Psychology, Journal of Research in Personality, and Psychological Science).
Increasing support for the movement has spawned numerous commentaries that seek to address the dependability problem (e.g., Ledgerwood, 2014a, 2014b, and accompanying special section articles; Makel, 2014, and accompanying special section articles; Pashler & Wagenmakers, 2012, and accompanying special section articles; Spellman, 2012, and accompanying special section articles). Although several attempts have been made to integrate solutions into comprehensive lists (e.g., Asendorpf et al., 2013; Cumming, 2014; Funder et al., 2014), the number of proposed solutions and recommendations is growing quite large. A sampling of such recommendations based on a casual reading of various research dependability commentaries is provided in Table 1.
Table 1. A Sampling of Recommendations for Improving Research Dependability [table not reproduced]
Unfortunately, this glut of recommendations for improving research dependability has created several problems. First, the high number of recommendations is overwhelming for those hoping to implement meaningful changes in their own research activities or training programs. Second, the high number of recommendations makes it challenging to coordinate meaningful responses within and across constituencies that make up psychological science (i.e., researchers, journal editors and reviewers, administrators of research support institutions, and educators). Together, these problems are fairly obvious in their potential impact: too many recommendations make it difficult to address research dependability challenges in a manner that is uniform among individuals within and across different constituencies.
However, a third less obvious problem exists with the numerous research dependability recommendations. Specifically, many are based on concerns that arise from two earlier movements in psychological methodology and statistics that rest on somewhat contradictory assumptions. When these recommendations are uncritically blended under the research dependability umbrella, they may promote conflicting behaviors that undermine the very goals the dependability movement seeks to achieve.
Given these problems, the purpose of the present commentary is to arrive at a concise list of core recommendations to improve the integrity of psychological science. This process was based on three guiding principles. First, and most importantly, core recommendations should be internally consistent in that they should lead to nonconflicting outcomes in addressing research dependability issues. Second, the number of recommendations should be small and their applicability broad enough to be plausibly considered in a uniform manner by individuals both within and across different constituencies. Third, although the recommendations may require intense effort and additional development to effectively implement in a concrete manner, recommendations should be elaborated enough so one can readily see how their intersection can mutually address the major goals of psychological science. One should be able to grasp the basic underlying rationale for each of the recommendations and be able to use them as broad principles to guide ongoing research activity, research training, review of research, or allocation of research resources.
A Tale of Two Psychologies
Before attempting to isolate a concise set of core research dependability recommendations, it is important to consider the issue of false findings in psychology. One can view the research dependability movement as a reaction to recent events and attention to problematic research practices (Funder et al., 2014; Ledgerwood, 2014a; Pashler & Wagenmakers, 2012; Spellman, 2012). However, it also is possible to view the dependability movement as an integration (intentionally or unintentionally) of concerns about (a) research transparency with (b) two earlier, and in some ways contradicting, literatures: one documenting the problem of false-positive findings in psychology and one documenting the problem of false-negative findings in psychology.
False-Positive Psychology
The major premise of writings on false-positive psychology is that many reported empirical findings that appear to support a hypothesis are in fact instances of false-positive results. The issue of false-positive results is almost always discussed in statistical terms (typically statistically significant rejection of a null hypothesis): false positives occur when researchers interpret instances of Type I error as confirmations of theoretical hypotheses (Murayama, Pekrun, & Fiedler, 2014; Simmons et al., 2011).
Simmons et al. (2011) provided an influential contemporary examination of some potential contributing factors that may lead to false-positive findings. In one study, they randomly assigned participants to listen to either a control song (“Kalimba”) or a children's song (“Hot Potato”). Participants then responded to items asking how old they subjectively felt (a 5-point Likert scale ranging from very young to very old) and the age of their father in years. An analysis of covariance controlling for father's age produced a significant difference in reports of subjective age by music listening condition such that those listening to the children's song indicated feeling younger than did those listening to the control song. In a second, conceptual replication study, they found the same effect on a measure of participants' age in years after controlling for father's age in years, this time after listening to the same control song but a different “youthful” song (“When I'm Sixty-Four”). In a clever twist, the authors revealed how they produced apparently consistent results across the two studies through the use of flexible data analytic and reporting strategies (i.e., use of “researcher degrees of freedom”).
Finally, in a third study based on sampling from a simulated population, Simmons et al. (2011) reported the rate of statistical significance after randomly drawing 15,000 samples from a population with a true difference between conditions equal to 0. They found that increasing the number of flexible data analytic strategies increased the rates of Type I error (for p < .05, rates ranged from 9.5% to 60.7% depending on the type and number of flexible data analytic strategies conducted).
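The inflation of Type I error under analytic flexibility can be illustrated with a small simulation. The sketch below is not Simmons et al.'s procedure. It assumes two dependent variables correlated at r = .5, 20 observations per condition, a true difference of zero, and a z test with known unit variance (chosen only so that the Python standard library suffices); all function names are hypothetical. It compares the false-positive rate when only the first dependent variable is tested with the rate when a finding is reported if the first variable, the second variable, or their average reaches p < .05.

```python
import random
from statistics import NormalDist, fmean


def z_test_p(group_a, group_b, sd):
    """Two-sided p-value for a difference in means, treating sd as known."""
    n = len(group_a)  # both groups are assumed to have the same size
    se = sd * (2 / n) ** 0.5
    z = (fmean(group_a) - fmean(group_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def draw_group(rng, n, r):
    """Draw n participants with two standard-normal DVs correlated at r."""
    dv1, dv2 = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = r * x + (1 - r * r) ** 0.5 * rng.gauss(0, 1)
        dv1.append(x)
        dv2.append(y)
    return dv1, dv2


def false_positive_rates(n_sims=4000, n=20, r=0.5, alpha=0.05, seed=1):
    """Return (rate testing DV1 only, rate reporting any of three tests)."""
    rng = random.Random(seed)
    # Variance of the average of two unit-variance DVs correlated at r.
    sd_avg = ((1 + r) / 2) ** 0.5
    single = flexible = 0
    for _ in range(n_sims):
        # The null hypothesis is true: both conditions share one population.
        a1, a2 = draw_group(rng, n, r)
        b1, b2 = draw_group(rng, n, r)
        a_avg = [(u + v) / 2 for u, v in zip(a1, a2)]
        b_avg = [(u + v) / 2 for u, v in zip(b1, b2)]
        p_values = [
            z_test_p(a1, b1, 1.0),
            z_test_p(a2, b2, 1.0),
            z_test_p(a_avg, b_avg, sd_avg),
        ]
        single += p_values[0] < alpha       # one prespecified test
        flexible += min(p_values) < alpha   # report whichever test "works"
    return single / n_sims, flexible / n_sims


if __name__ == "__main__":
    single_rate, flexible_rate = false_positive_rates()
    print(f"DV1 only: {single_rate:.3f}; any of three tests: {flexible_rate:.3f}")
```

Because the flexible strategy counts every sample in which the first test is significant plus any in which one of the other two is, its false-positive rate can never fall below that of the single prespecified test.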
Although Simmons et al.'s (2011) findings are provocative, the seriousness of their implications for psychological science rests on two assumptions. The first assumption is that the type of replication Simmons et al. performed in their first two studies is typical of most replication studies performed in the literature (conceptual replication involving different operationalizations of both the independent and dependent variables). The second assumption is that the null hypothesis is often true. With regard to the first assumption, it is not clear that such flexibility is typical in instances where a priori hypotheses dictate specific patterns of data across more than two conditions (Murayama et al., 2014). Also, as noted by Simmons et al., this problem is mitigated when direct replications are conducted rather than conceptual replications. As for the second assumption, it may be that the null hypothesis is more often false than true. Indeed, the assumption that the null hypothesis is usually (and perhaps always) false is a major premise of influential commentaries concerning false-negative findings.
False-Negative Psychology
The major premise of writings on false-negative psychology is that many unreported findings in psychology that appear not to support a theoretical hypothesis are in fact instances of false-negative results. In statistical terms, what researchers often interpret as inability to confirm a theoretical hypothesis (typically failure to statistically reject a null hypothesis) are actually instances of Type II error (Cohen, 1962; Hunter, 1997; Sedlmeier & Gigerenzer, 1989). The empirical evidence for this claim is most clearly seen in meta-analyses documenting the relatively low sample size and degree of statistical power routinely found in published psychology studies (Cohen, 1962; Marszalek, Barber, Kohlhart, & Holmes, 2011; Sedlmeier & Gigerenzer, 1989; Wetzels et al., 2011).
One point made in prominent false-negative psychology commentaries is that the null hypothesis is virtually guaranteed to be false (for a fairly elaborate discussion of this point, see Cohen, 1994, and Meehl, 1990). Given this state of affairs, the major concern with psychology research is inadequate statistical power to (a) reject null hypotheses that are in reality false, and (b) estimate the magnitude and direction of true effects (Gelman & Carlin, 2014). Thus, interval estimation is often touted as an ideal data analytic approach in the false-negative psychology literature because confidence intervals summarize a range of reasonable estimates of the population parameter given the sample data (Cohen, 1994). The range is considered reasonable because the long-run chance of capturing the parameter value in similarly constructed confidence intervals based on random sampling from the same population is equal to the level of confidence (Cumming, 2014, offers a nice example of this statistical phenomenon).
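The long-run interpretation of a confidence interval can be checked by simulation. The following is a minimal sketch under stated assumptions: a normally distributed population with a known standard deviation, a hypothetical true mean of 0.3, and samples of 30. None of these values comes from the literature cited above, and the function name is illustrative. The sketch repeatedly samples from the population, builds a 95% interval around each sample mean, and counts how often the interval captures the true mean.

```python
import random
from statistics import NormalDist, fmean


def coverage(n_sims=5000, n=30, true_mean=0.3, sd=1.0, level=0.95, seed=7):
    """Proportion of known-sd confidence intervals that contain true_mean."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z_crit * sd / n ** 0.5
    hits = 0
    for _ in range(n_sims):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        m = fmean(sample)
        hits += (m - half_width) <= true_mean <= (m + half_width)
    return hits / n_sims


if __name__ == "__main__":
    print(f"Observed coverage: {coverage():.3f}")
```

Under these assumptions the observed proportion should sit close to the nominal level of confidence, which is the sense in which the interval offers a "reasonable" range of estimates.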
No doubt interval estimation yields more descriptive information about the potential magnitude of effect, but it does not allow for unambiguous determination of whether effects are theoretically meaningful (Morey, Rouder, Verhagen, & Wagenmakers, 2014; Prentice & Miller, 1992). Instead, one must rely on the introduction of thresholds established by experts who determine the range of effects that should be considered important or unimportant (Cohen, 1992; Cumming, 2014; Fidler & Cumming, 2014; Greenwald, 1975). But as others remind us, sometimes just the occurrence of an effect, regardless of its magnitude, may be theoretically or practically important (Abelson, 1985; Mook, 1983; Prentice & Miller, 1992; Rosenthal & Rubin, 1979; Yeaton & Sechrest, 1981). If any size effect has the potential to be meaningful, then one encounters a conundrum. It seems to suggest that all we can hope to achieve in psychology is a kind of descriptive summary of the various effect sizes that accompany the myriad conditions in which mental processes and behaviors are measured (for a similar point see Morey et al., 2014). The search for organizing principles governing mental processes and behavior becomes fruitless.
The debate over the value of interval estimation is not the only manifestation of a clash between false-positive and false-negative assumptions. As mentioned earlier, several prominent research dependability recommendations forwarded to address one type of false finding may in turn promote the other type. Examples include recommendations to adjust for alpha inflation when making multiple comparisons (reduces Type I error, increases Type II error); to collect additional data if a marginal trend is observed, but p > .05 (reduces Type II error, but increases Type I error); and to rely on expert opinion to determine the meaningfulness of an effect or an estimated range of interval values (likely to reduce Type II error, but increase Type I error if the expert believes in an effect; likely to reduce Type I error, but increase Type II error if the expert does not believe in the effect).
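The first of these trade-offs can be made concrete with a short calculation. The sketch below assumes k independent tests, a Bonferroni-style adjustment (alpha divided by k), and a two-sided z test on a standardized mean difference d with equal group sizes; the function names and the example values (d = 0.5, 20 per group, 5 comparisons) are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def familywise_error(alpha, k):
    """Chance of at least one Type I error across k independent tests."""
    return 1 - (1 - alpha) ** k


def power_two_sided_z(d, n_per_group, alpha):
    """Power of a two-sided, two-sample z test for standardized difference d."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * (n_per_group / 2) ** 0.5  # expected value of the z statistic
    upper_tail = 1 - STD_NORMAL.cdf(z_crit - shift)
    lower_tail = STD_NORMAL.cdf(-z_crit - shift)
    return upper_tail + lower_tail


if __name__ == "__main__":
    alpha, k = 0.05, 5
    adjusted = alpha / k
    # Without adjustment the familywise rate is roughly 0.226.
    print("Familywise error, unadjusted:", familywise_error(alpha, k))
    print("Familywise error, adjusted:  ", familywise_error(adjusted, k))
    print("Power per test, unadjusted:  ", power_two_sided_z(0.5, 20, alpha))
    print("Power per test, adjusted:    ", power_two_sided_z(0.5, 20, adjusted))
```

The adjustment pulls the familywise Type I error rate back below .05, but the stricter per-test threshold lowers power for each comparison, which is the increase in Type II error noted above.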
Fortunately, it may be possible to resolve the apparent contradiction in assumptions that constitute the false-positive and false-negative psychology viewpoints by considering a broader conceptualization of what psychological science seeks to accomplish. Specifically, a major goal of psychological science is to determine the general organizing principles that give rise to and guide mental processes and behavior. These organizing principles may be conceptualized in terms of causal links among abstract theoretical constructs (Cronbach & Meehl, 1955). In psychological science inferences about the causal links among the theoretical constructs are made based on the observed associations among operational measures. In other words, one seeks to understand the links among constructs at a theoretical-conceptual level using observations at an operational-measurement level. As will be seen, one can understand a false finding as a failure to bridge the two levels of analysis. This conceptualization can then be used to identify core recommendations for research practice that will reinforce rather than weaken this bridging between levels.
Guiding Principles for Selecting Core Recommendations
Guiding Principle 1: Core Recommendations Must Address False-Positive and False-Negative Findings in an Internally Consistent Manner
As noted earlier, one challenge to integrating false-positive and false-negative psychology viewpoints is that they fundamentally espouse certain contradictory assumptions. The false-positive viewpoint argues that association among variables is uncommon, yet researchers are producing findings that falsely reject null hypotheses. Alternatively, the false-negative viewpoint argues that association among variables is common, yet researchers are producing findings that fail to reject the null hypothesis. At the operational-measurement level of analysis the false-negative viewpoint may well be correct given the problem of various forms of “crud” inherent in psychological measurement (e.g., method biases, lack of one-to-one correspondence between conceptual variables and operational variables; Cohen, 1994; Meehl, 1990). The presence of this crud factor can lead to “significant” association among different measurements, particularly with large study samples. There may be methodological techniques and designs available to some experimentalists working in well-controlled laboratory settings that can substantially mitigate some forms of measurement crud. Yet, across the broad spectrum of psychology, this problem seems generally unavoidable.
Although the issue of ubiquitous measurement association at the operational-measurement level of analysis seems more consistent with the false-negative viewpoint, it is not at all clear that excessive association exists among conceptual-theoretical constructs or that the concept of a statistical null hypothesis even applies at the conceptual-theoretical level. Indeed, at the conceptual-theoretical level constructs either possess causal associations among one another or they don't. Only when operational measures are used as indirect indices of constructs and are then subjected to statistical data analysis does the concept of a null hypothesis have meaning. In other words, the concept of a null hypothesis applies when considering parameters of populations made up of measurements, which are then used to infer directional associations (and potentially association magnitudes) among theoretical constructs. Moreover, if there are indeed general organizing principles of mind and behavior, then the existence of such organizing principles necessitates a relatively infrequent likelihood of association among psychological constructs at the conceptual-theoretical level even if the frequency of association among psychological measures is relatively high in comparison. In terms of Cronbach and Meehl's (1955) concept of a nomological network, what I am suggesting is that the existence of general organizing principles assumes that the relations among theoretical constructs are relatively sparse even though the relations of theoretical constructs to observables, and thus the relations among observables, may be fairly numerous.
The major implication of this reasoning is that it is incorrect to cast concerns about false-positive psychology and false-negative psychology in terms of whether a statistical null hypothesis is likely to be true or untrue (for similar warnings, see Fiedler, Kutzner, & Krueger, 2012, and Ioannidis, 2012). Instead, it is better to cast such concerns in terms of whether empirical associations (or lack thereof) among measures accurately reflect the reality of associations (or organizing principles) among constructs. A false-positive finding thus occurs when an empirical association produced by a study is interpreted as meaningful even though no true association exists among the theoretical constructs the study seeks to model. A false-negative finding occurs when an empirical association produced by a study is interpreted as not meaningful even though a true association does exist among the theoretical constructs the study seeks to model. In other words, false findings are best conceptualized as a mismatch between a data pattern and the true construct pattern the data seek to model. But if only data patterns are available for inspection, then how does one determine whether there is a mismatch between the data pattern and the true construct pattern? The solution is to determine whether the data pattern can be replicated. If the data pattern replicates, one may infer linkage among the modeled theoretical constructs (i.e., one may infer the presence of an organizing principle; Schmidt, 2009).
Habitual replication of research thus constitutes a major solution to address concerns about false findings. The value of replication is that it generates a comparable set of observations that speaks to the relative consistency in data regardless of how one chooses to express the pattern descriptively or statistically so long as a systematic approach to pattern expression is used across replication studies. Indeed, replication is widely advocated in both the false-positive psychology and false-negative psychology literatures and in the newer research dependability literature (Asendorpf et al., 2013; Bakker, Van Dijk, & Wicherts, 2012; Braver, Thoemmes, & Rosenthal, 2014; Cohen, 1994; Cumming, 2014; Frank & Saxe, 2012; Ioannidis, 2012; Kerr, 1998; Koole & Lakens, 2012; Makel, 2014; Murayama et al., 2014; Nosek, Spies, & Motyl, 2012; Open Science Collaboration, 2012; Pashler & Harris, 2012; Schmidt, 2009; Simmons et al., 2011; Stanley & Spence, 2014). The proposed advantage of replication is that consistency in data patterns can be used to rule out false findings depending on the nature of the consistency. Specifically, consistency in lack of association (effect magnitudes dispersed around zero) tends to rule out concerns about false-negative findings, whereas consistency in association direction (effect magnitudes dispersed around values located above or below zero) tends to rule out concerns about false-positive findings.
Guiding Principle 2: Core Recommendations Should be Addressable by Multiple Constituencies
Examination of the recent dependability literature reveals a plethora of recommendations directed specifically at practices of individual researchers, some of which also may be directed at individuals making up other constituencies that comprise psychological science (see Table 1). Of these, at least three have been advocated, explicitly or implicitly, as addressable by multiple constituencies (Asendorpf et al., 2013; Funder et al., 2014). These recommendations include (a) conduct replications; (b) share data; and broadly speaking, (c) promote a culture or research mindset of “getting it right.”
In this set of recommendations, replication immediately stands out: not only is it widely considered useful as a means for addressing false findings, but it is a recommendation that could be targeted by individuals across constituencies. Thus, replication makes a promising candidate for retention in a concise, core list of research dependability recommendations. Like replication, the recommendation to share data also recurs quite frequently in research dependability commentaries (Asendorpf et al., 2013; Cumming, 2014; Funder et al., 2014; Giner-Sorolla, 2012; Nosek et al., 2012; Simonsohn, 2013; Wicherts, Borsboom, Kats, & Molenaar, 2006), although there is debate about the degree to which unsolicited data sharing should be enforced or required. What is clear, however, is that promoting data sharing when appropriate could be a guiding principle implemented across constituencies. “Getting it right” as a research mindset is less commonly mentioned, but is noted as something addressable across constituencies in two comprehensive lists of dependability recommendations (Asendorpf et al., 2013; Funder et al., 2014). All three of these recommendations, or some variant of them, are potential contenders that could be promoted consistently by individuals within and across constituencies, and they need not generate contradicting implications for addressing false psychological findings if elaborated properly.
Guiding Principle 3: Core Recommendations Should be Amenable to Clear Elaboration
Replication, sharing data, and advancing a research mindset of “getting it right” appear promising core research dependability recommendations assuming they can be elaborated clearly enough to guide attempts at more concrete implementation within and across different constituencies. However, even in recent commentaries pushing dependability in psychological research, these recommendations often are discussed in an overly general manner. Replication can refer to relatively exact, direct forms of replication or to highly conceptual forms of replication. Data sharing can range from agreeing to share data if they are requested by others to public posting of all data whether the data are published or not. Getting it right seems to involve adopting and teaching all the other dependability-in-research recommendations. But which of the myriad recommendations should be advocated to get it right? One must be cautious of the aforementioned problems of conflict among recommendations and recommendation overload. Despite limitations in the clarity of these three recommendations as they have been discussed previously in the literature, they are amenable to clearer elaboration. Thus, what follows is discussion of these three recommendations in a manner designed to promote (a) clarity of the rationale for each recommendation and (b) clarity regarding what the recommendation does and does not entail. Ideally, this discussion would provide clearer direction in guiding attempts to implement the recommendations in ongoing research practice, training, and support.
Elaboration of the Core Dependability Recommendations
Core Recommendation 1: Conduct and Promote Direct Replication
As noted previously, replication is important because it provides information about whether general organizing principles exist among conceptual-theoretical constructs. Information is obtained from replication studies by demonstrating either consistency or inconsistency in a data pattern assuming the same approach to creating and analyzing the data pattern is used in both studies. However, when different approaches to creating and analyzing data patterns are used, inconsistency in findings across studies immediately results in questions about the possible presence of false-positive and false-negative findings. For example, suppose two replication studies are conducted using different methods, different data analytic approaches, or both. If the results of one study suggest an effect relatively close to zero, whereas the results of the other study suggest an effect relatively far from zero, then one must entertain the possibility that the first study has produced a false-negative finding, the second study has produced a false-positive finding, or both. Alternatively, if both studies generate effects relatively far from zero but in opposite directions, then one must entertain the possibility that both studies have produced false-positive findings.
However, even in the event both studies generate consistent findings, one must entertain the possibility that the consistency results from false findings perpetuated by flexible data collection or analysis that creates an appearance of consistency. Simmons et al. (2011) demonstrated this problem with false findings by using (a) data collected from actual populations with effects of unknown value and (b) simulated data obtained when the effect in the population was truly zero. Interestingly, it may be that the actual population effects at the operational level in Simmons et al.'s real studies were nonzero, but the authors clearly showed how apparent consistency in data patterns could be increased via use of flexible methodology and reporting of findings. Moreover, there is no reason to believe that overly flexible research practices prove problematic only in inflating apparent consistency in nonzero effects across replications. One could easily use such practices to promote the apparent consistency in effect sizes dispersed around zero in situations where one is motivated to suppress undesired but recurring data trends.
A replication continuum
The key in using replication studies to address questions of research dependability is to ensure that consistency in findings cannot be easily explained away by differences in methodological or data analytic practices that are confounded with the different studies. There are three ways to address this: (a) ensure the same methodological and data analytic practices are used in each replication study, (b) conduct many replication studies so that aggregation overcomes the noise produced by random error and use of different methods and analyses, or (c) a combination of both. Recent simulation work on the effect of sampling error and measurement error on replication and how one empirically demonstrates consistency (or lack thereof) across replication studies (Braver, Thoemmes, & Rosenthal, 2014; Stanley & Spence, 2014) suggests that the third approach would most efficiently and effectively undermine the impact of false findings. In contrast, the second approach absent the first approach would likely be highly inefficient because empirically ascertaining consistency across replication studies requires overcoming differences across studies produced by measurement error and sampling error, let alone differences in methodological and analytic practices. To empirically ascertain consistency across replication studies when this third form of systematic error is present requires acquiring an even higher number of replication studies.
The implication so far is that if one wishes to combat false findings, then one should rely on direct replication rather than conceptual replication (for a discussion of both replication types see Makel & Plucker, 2014; Schmidt, 2009). A direct replication is an attempt to repeat a study in as exact a manner as possible. One seeks to conduct the replication study by sampling from the same participant population, and by using the same operations, the same setting, and the same data analytic approach as the first study. A variation of direct replication is extension replication (called paradigmatic research by Nosek et al., 2012, a follow-up study by Schmidt, 2009, or a replication and extension by Reiter-Palmon & Tinio, 2014). With extension replication, one nests a direct replication in a larger design that includes more levels of the same operations or includes one or more new operations that may moderate or further inform understanding of the direct replication findings. In extension replication studies the sampled participant population, setting, and a subset of operations are usually unchanged. Conceptual replication is when one seeks to conduct a second study in which one systematically varies the population from which one is sampling, the operations, the setting, the data analytic approach, or a combination thereof, while still seeking to model the same conceptual-theoretical constructs and their relations as those modeled in the first study.
Generally speaking, one can imagine a replication continuum ranging from an anchor of “exact” on one side to an anchor of “maximally divergent” in setting, population, operations, and analysis on the other side. Direct replications would exist near the exact side of the continuum, followed by extension replications, followed by conceptual replications, which would span the rest of the continuum up to the maximally divergent anchor. Assuming no outright fraudulence, consistent but false findings are more likely to occur in the process of replication when one moves farther away from the “exact” side of the replication continuum toward the maximally divergent side because one can intentionally or unintentionally stack the deck in favor of showing consistency via researcher degrees of freedom (Simmons et al., 2011). In other words, the more conceptual the replication, the less dependable findings are likely to become across replicated studies. Given the need to examine multiple replication studies to empirically ascertain the degree of consistency in results (Stanley & Spence, 2014), this principle implies that a set of direct replication studies will tend to have higher dependability than will a set of conceptual replication studies. Thus, the set of direct replications will better combat the problem of false findings than will the set of conceptual replications.
In arguing that direct replications will tend to result in higher dependability of findings than will conceptual replications, it is important to acknowledge that this is a relative claim. Even in cases where researchers seek to produce direct replications, there are a number of factors that can give rise to instances where findings are not repeated even if one of the sets of findings is true. Besides fraudulence, the two most obvious factors are (a) shoddy or inadequate implementation of research procedures, which undermines the validity of the procedures for one or more of the replication studies, and (b) inability to perfectly duplicate all procedural factors across studies due to subtle differences in participants, researchers, contextual factors, and the like that are difficult to identify (i.e., inability to duplicate sampling decisions; see Brown et al., 2014). The challenges of direct replication, even when one seeks to duplicate research as exactly as possible, have long been known (Smith, 1970), and a sizable degree of variability in direct replication effects can be linked to differences in the laboratory conducting them (McShane & Böckenholt, 2014). In other words, direct replication is an imperfect tool for assuring dependability of findings, but it is less imperfect than conceptual replication.
The claim that direct replication produces more dependable findings across replicated studies than does conceptual replication seems contrary to conventional wisdom that conceptual replication is preferable to direct replication (Dijksterhuis, 2014; Neulip & Crandall, 1990, 1993a, 1993b; Stroebe & Strack, 2014). However, most arguments advocating conceptual replication over direct replication are attempting to promote the advancement or refinement of theoretical understanding (see Dijksterhuis, 2014; Murayama et al., 2014; Stroebe & Strack, 2014). The argument is that successful conceptual replication demonstrates a hypothesis (and by extension the theory from which it derives) is able to make successful predictions even when one alters the sampled population, setting, operations, or data analytic approach. Such an outcome not only suggests the presence of an organizing principle, but also the quality of the constructs linked by the organizing principle (their theoretical meanings). Of course this argument assumes that the consistency across the replicated findings is not an artifact of data acquisition or data analytic approaches that differ among studies. The advantage of direct replication is that regardless of how flexible or creative one is in data acquisition or analysis, the approach is highly similar across replication studies. This duplication ensures that any false finding based on using a flexible approach is unlikely to be repeated multiple times.
Consider nested replication
Does this mean conceptual replication should be abandoned in favor of direct replication? No, absolutely not. Conceptual replication is essential for the theoretical advancement of psychological science (Dijksterhuis, 2014; Murayama et al., 2014; Stroebe & Strack, 2014), but only if dependability in findings via direct replication is first established (Cesario, 2014; Simons, 2014). Interestingly, in instances where one is able to conduct multiple studies for inclusion in a research report, one approach that can produce confidence in both dependability of findings and theoretical generalizability is to employ nested replication. In nested replication, one combines sets of direct replications that differ conceptually from one another. For example, in contrast to conducting four conceptual replication studies, one could conduct a direct replication of Study 1 as Study 2. In the event that consistent results across the two studies occur, one could then conduct a conceptual replication as Study 3 and then a direct replication of Study 3 as Study 4. The conceptual replication of nested, direct replications provides information both about dependability of findings (any consistency between Study 1 and Study 2 and between Study 3 and Study 4 offers evidence of this) and theoretical generalizability of findings (any consistency across the two nests of findings offers evidence of this). Moreover, in attempts to ascertain consistency in replications empirically (Stanley & Spence, 2014), the nests could in turn be coded and used to distinguish variability across studies attributable to differences in methodological and analytic practices from variability attributable to sampling and measurement error.
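The idea of coding studies by nest can be sketched in a few lines of Python. This is only an illustrative, descriptive split of variability and not a formal statistical model; the effect sizes, the nest labels, and the function name `summarize_nests` are hypothetical and are not taken from any of the cited studies.

```python
from statistics import mean

# Hypothetical effect estimates for a four-study nested replication design:
# Studies 1-2 form nest "A" (direct replications of each other),
# Studies 3-4 form nest "B" (a conceptual replication, directly replicated).
studies = [
    {"study": 1, "nest": "A", "effect": 0.42},
    {"study": 2, "nest": "A", "effect": 0.38},
    {"study": 3, "nest": "B", "effect": 0.35},
    {"study": 4, "nest": "B", "effect": 0.31},
]


def summarize_nests(studies):
    """Split the sum of squared deviations of effects into a within-nest
    part and a between-nest part."""
    by_nest = {}
    for s in studies:
        by_nest.setdefault(s["nest"], []).append(s["effect"])

    nest_means = {nest: mean(vals) for nest, vals in by_nest.items()}
    grand_mean = mean(s["effect"] for s in studies)

    # Spread among direct replications inside each nest.
    within_ss = sum(
        (effect - nest_means[nest]) ** 2
        for nest, vals in by_nest.items()
        for effect in vals
    )
    # Spread of the nest means around the grand mean.
    between_ss = sum(
        len(vals) * (nest_means[nest] - grand_mean) ** 2
        for nest, vals in by_nest.items()
    )
    return nest_means, within_ss, between_ss


nest_means, within_ss, between_ss = summarize_nests(studies)
total_ss = sum(
    (s["effect"] - mean(x["effect"] for x in studies)) ** 2 for s in studies
)
print(nest_means, within_ss, between_ss, total_ss)
```

Under these assumptions, the within-nest component reflects disagreement between direct replications (sampling and measurement error), whereas the between-nest component reflects differences that accompany the change in methods or operations. With only two studies per nest these quantities are very rough, so the sketch is meant to show the bookkeeping rather than to support inference.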
Unlike a nested replication approach, use of only conceptual replication raises more serious questions about whether any consistency in findings is an artifact of differential data acquisition or data analysis. Moreover, inconsistency in findings between conceptual replication nests, but consistency in studies within conceptual replication nests, suggests the presence of a moderator or need to elaborate theory, whereas inconsistency between conceptual replications of individual studies is more ambiguous. This type of inconsistency may be the result of a moderator or need for more elaborate theory, or it may be the result of false findings in one or more of the studies. Thus, Asendorpf et al. (2013) are correct in their claim that replication can enhance theoretical generalizability; but this will only be true if the process of replication involves systematic use of direct replication to ensure dependability in research findings across subsets of individual studies.
Additional advantages of direct replication
Several other potential virtues of direct replication deserve mention as well. First, direct replications may be amenable to aggregated analyses in which data from replicated studies are combined to perform higher power statistical tests or to construct more precise interval estimates. Examples of approaches to aggregating data analysis include meta-analysis, a proposed remedy for dealing with false findings often cited in the dependability literature (Braver et al., 2014; Cumming, 2014; Funder et al., 2014; Stanley & Spence, 2014), as well as hierarchical linear modeling and structural equation modeling in which raw data from multiple replication studies could be combined but coded by individual study. Unlike meta-analyses, a raw data aggregation approach may be more useful to those conducting research in which patterns of findings are not readily reducible to simple correlations or differences in a dependent variable based on comparison of two conditions (e.g., contrasting of means across more than two conditions, examining interactions, examining statistical mediation). Consequently, approaches to aggregate analyses that combine raw data deserve more discussion as means for evaluating replication of more complex data patterns.
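As a minimal sketch of why aggregation yields more precise interval estimates, the following Python code pools effect estimates from several direct replications using inverse-variance (fixed-effect) weighting. The choice of a fixed-effect model, the function name `pool_fixed_effect`, and all numbers are assumptions made for illustration; they are not drawn from the cited meta-analytic sources, and a random-effects model may be more appropriate when laboratories differ.

```python
import math


def pool_fixed_effect(estimates, std_errors, z=1.96):
    """Inverse-variance (fixed-effect) pooling of per-study estimates.

    Returns the pooled estimate, its standard error, and an approximate
    95% interval (normal approximation).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * est for w, est in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    interval = (pooled - z * pooled_se, pooled + z * pooled_se)
    return pooled, pooled_se, interval


# Hypothetical standardized effects from four direct replications,
# each with the same (hypothetical) standard error.
estimates = [0.30, 0.10, 0.25, 0.15]
std_errors = [0.15, 0.15, 0.15, 0.15]

pooled, pooled_se, interval = pool_fixed_effect(estimates, std_errors)
print(pooled, pooled_se, interval)
```

With equal standard errors the pooled estimate is simply the mean of the four estimates, and its standard error is half that of any single study, which is the sense in which aggregated direct replications provide higher power and narrower intervals.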
A second advantage of direct replications is that they can protect against fraudulent findings (Schmidt, 2009), particularly when different research groups conduct direct replication studies of each other's research. Stroebe and Strack (2014) make a compelling argument that direct replication is unlikely to prove useful in detection of fraudulent research. However, even if a fraudulent study remains unknown or undetected, its impact on the literature would be lessened when aggregated with nonfraudulent direct replication studies conducted by honest researchers.
A third virtue of direct replications is that unlike conceptual replications they narrow the field of possible explanations when there is a failure to repeat findings. Generally speaking, the more exact the direct replication is to an earlier study, the fewer methodological or theoretical explanations exist for a failure to repeat findings. In the event no plausible methodological difference for the inconsistent results can be ascertained, one may more quickly abandon a line of research for more promising avenues. In the event plausible theoretical or methodological differences for the inconsistent results can be ascertained, they will tend to be relatively low in number and so can be evaluated in follow-up studies more quickly. Inconsistent results across direct replications can thus promote more efficient decision making in the planning of future research than can inconsistent results across conceptual replications.
How to promote direct replication
Overall then, Core Recommendation 1 is an appeal to conduct and promote direct replication. If possible, researchers should (a) seek to conduct direct replications of their own work; (b) seek to conduct direct replications of others’ work; and (c) write thorough Methods sections, create well labeled data sets, and maintain sharable records of research materials and procedural scripts to assist the direct replication attempts of one's own work by others. For example, Methods sections should consist of the following, much of which seems absent in many descriptions of methods found in even the more comprehensive contemporary research reports: detailed procedures, basic participant characteristics, basic researcher characteristics (e.g., researcher demographics, level of training, description of researcher blocking procedures), span in time of data collection, and description of research location (for a list of additional forms of participant, researcher, and procedural information that deserve mention see Brown et al., 2014; and Klein et al., 2012).
With regard to other constituencies, journal editors and reviewers should facilitate publication of direct replication attempts, either in the form of direct replication study sets within single papers or direct replications of previously published work. Editors and reviewers should request detailed methods sections and procedural scripts of all published studies, which should be no problem given the ability to post electronic supplementary material. Ideally, a brief-report section devoted to direct replications should be included as a regular content portion in each issue of most journals. Granting agencies should fund direct replications of studies as part of an initial grant if financially feasible, and should also develop funding mechanisms to promote direct replications of previously funded studies by research groups other than those originally funded. The attainment of publications or grant funding should be a basis for promotion regardless of whether it involves novel research or direct replication research.
Undoubtedly, the ease of producing direct replication studies will be determined in large part by the cost in time, money, and other resources necessary to conduct any given type of research (Russ, 2014). In turn, these factors will vary by the type of research one hopes to replicate (e.g., longitudinal vs. single session experiments, running participants in groups vs. running participants individually, awarding participants with research credit vs. paying participants, etc.), and so standards may need to be determined as a function of psychological subdiscipline. For example, constituencies specializing in promoting developmental and clinical research may choose to focus more on direct replication across research groups, whereas constituencies specializing in promoting social, cognitive, and neuroscience laboratory research may choose to focus more on direct replications conducted by the same research group. Nonetheless, mechanisms for facilitating and rewarding direct replication, either within research groups or across research groups, should be implemented within institutions supporting or funding all branches of psychology.
Core Recommendation 2: Share Data if Requested
As noted in Table 1, recommendations to share data occur across the different constituency categories. The degree of the recommendation varies from being open to sharing data, to publicly promising to share data, to requiring all data be made publicly available. Although one might opt for the strongest form of the recommendation because it would maximize data transparency, doing so may come at costs that are not well defined presently (Ioannidis, 2012). For example, Cumming (2014) argues that all data should be made publicly available, yet fails to consider the psychological and practical implications for researchers and the ethical concerns that might arise by doing so. What happens if the findings permit triangulation of individual participant identities based on uncommon or unique response patterns? When participants are told their responses will be confidential or anonymous, I suspect most don't assume their responses will be readily available for anyone to view even if their responses are deidentified. How does this impact willingness to participate or researchers’ ethical obligation to keep participants’ individual data as private as possible?
Potential problems with strong data sharing recommendations
Ironically, strong data transparency recommendations (public posting of all data) may restrict researchers from overly flexible data analytic practices that promote false findings at the publication or data dissemination phase, but there is no guarantee that such practices would be kept in check by others who subsequently access the data. Should researchers who went through the effort and cost to collect their data have to operate in a more restricted manner in how they analyze and report their original findings than others who can acquire an effort-free data set? What about nonexperts? Should they be granted full access to the data even though they may have little understanding of the research context that gave rise to the data or lack the training to competently interpret them? Do researchers have a responsibility to vet their own work before dissemination of the data? If a scientist questions the dependability of his or her work after acquiring and analyzing the data should the scientist still release the data? Proponents of strong transparency recommendations may overestimate the degree to which mistakes, blunders, and incompetence undermine data quality, especially when it involves work with unseasoned researchers in training. In some instances, maybe even in many instances, research is just bad. One encounters poor quality research and hopes it never sees the light of day. Indeed, there may be an ethical obligation to make sure it doesn't.
Simonsohn (2013) in particular extols the advantages of public posting of all data, of which one advantage is that it permits scrutiny of data patterns that may lead to detection of fraudulent research. Although most scientists would agree that fraudulent findings are damaging to the field, it is a bit unsettling to think that someone is snooping through your data with the assumption they may be fraudulent. This seems to promote an implicit sense that scientists are untrustworthy. Moreover, required public posting also does not guarantee the originator of the work the opportunity to discuss it with interested parties who may be unprepared to fully appreciate the scope or technical aspects of the research.
How to promote reasonable data sharing
These are some of the issues that need to be considered carefully before attempting to select from among the different data sharing recommendations offered in the dependability literature. There is no question that refusal to share data with fellow scientists undermines the spirit of science and undermines identification of unintentional or intentional misreporting of data (Sakaluk, Williams, & Biernat, 2014). Nonetheless, it appears that obtaining access to the data of fellow scientists for the purpose of confirming findings can be fairly challenging (Wicherts et al., 2006). Yet, one wonders whether unfettered access to data may undermine the spirit of science in other ways. Psychological science is difficult work and scientists deserve some ownership and personal representation of the fruits of their labor. For these reasons it may be prudent to address the issue of data transparency in a more moderate manner than a full public release approach. Specifically, researchers should commit to share data and materials when requested by others if doing so does not overly violate participants’ confidentiality. Ideally, any request to share should be directed in a manner that permits the originator of the data the opportunity to provide contextual background about the research process, speak to any concerns about the dependability of the data, and ascertain the reasons for the request. Journal editors, as well as administrators of institutions that support and fund research, should require a public commitment by authors to share data and materials if requested (a checkbox of some sort during the submission or grant application phase). When requested to share data, failure to honor the commitment in a timely manner should be reported to journals. If investigation reveals unreasonable unwillingness to share, then it should be grounds for retraction of published findings by journal editors.
A commitment to share data and materials when requested to do so in a reasonable fashion would promote conducting direct replication studies (Core Recommendation 1), and would improve research transparency without leading to false accusations of scientific impropriety (for a possible example of such a situation see Francis, 2012 and responses by Galak & Meyvis, 2012 and Simonsohn, 2012). The data sharing policy adopted by the Society for Personality and Social Psychology (Funder et al., 2014) is an example of policy that if widely adopted would go a long way toward meeting this recommendation.
Core Recommendation 3: Promote a Research Mindset of “Seek the Truth”
One recommendation found within most of the constituency categories identified in Table 1 is to promote a mindset of “getting it right.” Its elaboration in the dependability literature tends to be vague, but essentially boils down to “follow, promote, and teach all the other recommendations designed to address research dependability” (for discussions of getting it right, see Asendorpf et al., 2013, and Funder et al., 2014). The major problem with this approach is that many of the recommendations have been inadequately vetted to determine whether they would address the issue of false findings in an internally consistent manner.
For example, a recommendation high on most research dependability lists is to avoid flexibility in data analysis. In the context of null hypothesis statistical testing (NHST), flexibility in data analysis is often bemoaned because it can promote false positive findings by generating high rates of Type I errors. These errors are then exploited strategically to support a hypothesis (Simonsohn, 2012). However, inflexibility in data analysis can promote false negative findings as well (Fiedler et al., 2012). A good example of this is when one staunchly applies a post hoc correction to alpha, which, although it reduces the likelihood of Type I error, does so at the expense of increasing the likelihood of Type II error. Yet, if a researcher more flexibly conducts both corrected and uncorrected analyses on his or her data, the researcher may form a truer sense of what the data demonstrate. Some patterns may be less stable than others depending on the data analysis strategy adopted, whereas others may be fairly stable regardless of analysis approach. On the one hand, this is useful information when one seeks a true understanding of the data (as will be discussed in more detail below). On the other hand, this is also useful information when one seeks to confirm favored hypotheses and suppress nonfavored hypotheses. The real problem is not flexibility in data analysis. The real problem is in the researcher's motivation when data analysis occurs.
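The trade-off between Type I and Type II error under a correction to alpha can be made concrete with a small calculation. The sketch below assumes a two-sided z-test, five independent tests, a Bonferroni-style correction (alpha divided by the number of tests), and a true effect lying 2.5 standard errors from zero; all of these are hypothetical choices for illustration, and the function names are not from any cited source.

```python
from statistics import NormalDist


def familywise_error(alpha, n_tests):
    """Probability of at least one Type I error across n independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


def two_sided_power(effect_in_se_units, alpha):
    """Power of a two-sided z-test when the true effect lies
    `effect_in_se_units` standard errors away from zero."""
    nd = NormalDist()
    critical = nd.inv_cdf(1 - alpha / 2)
    upper_tail = 1 - nd.cdf(critical - effect_in_se_units)
    lower_tail = nd.cdf(-critical - effect_in_se_units)
    return upper_tail + lower_tail


N_TESTS = 5             # hypothetical number of comparisons
ALPHA = 0.05            # uncorrected per-test alpha
CORRECTED_ALPHA = ALPHA / N_TESTS
TRUE_EFFECT = 2.5       # hypothetical true effect, in standard-error units

for label, alpha in [("uncorrected", ALPHA), ("corrected", CORRECTED_ALPHA)]:
    type_1 = familywise_error(alpha, N_TESTS)
    type_2 = 1 - two_sided_power(TRUE_EFFECT, alpha)
    print(f"{label}: familywise Type I = {type_1:.3f}, Type II = {type_2:.3f}")
```

Under these assumptions the corrected analysis holds the familywise Type I error rate near .05 but misses the true effect more often, which is the pattern a researcher would see by running both analyses side by side.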
Seek the truth
Instead of an approach to “getting it right” that focuses on teaching and supporting “correct” data generation and analysis techniques, the approach advocated here focuses on teaching and supporting a “correct” orientation to the process of generating and analyzing data. This approach has a simple mantra: seek the truth. To seek the truth one must begin the research enterprise with a clear sense of the ultimate purpose for collecting and analyzing data in science. This ultimate purpose is to develop a true understanding of natural phenomena. The ultimate purpose of science is not to test, derive, confirm, or disconfirm theory, nor is it to estimate population parameters. These are means to an end. They are instrumental steps on the way to arriving at a true understanding of natural phenomena, which in psychological science is a true understanding of mental processes and behavior.
The problem of effort justification in research
A major challenge in navigating the research endeavor is that search for truth can become lost in search for justification of the effort expended to conduct psychological research. Successful execution of the research endeavor involves the following, albeit simplified sequence of tasks: develop the ideas that stimulate the research, design research methodology and materials, submit to and obtain permission from ethics review boards to conduct the research, train research personnel, collect data, analyze the data, and then determine whether the fruits of one's labor actually result in increased understanding. The expenditure of effort is tremendous and its magnitude grows larger as the research endeavor continues to unfold. In the event that all this effort results in findings that are in some way equivocal (frequently guaranteed given the problem of random error), inconsistency between the perceived degree of effort and the perceived benefit of one's effort occurs. Essentially, psychological research is routinely an exercise in the production of cognitive dissonance (Festinger, 1957).
Based on this viewpoint regarding the psychological nature of the research endeavor, it is not surprising so many “questionable research practices” tend to occur during the later stages of research (John, Loewenstein, & Prelec, 2012; Simmons et al., 2011). Such practices may stem from a strong need to justify the intense time and resources the individual researcher has devoted to successfully move from beginning to end of the research process. A number of factors are expected to increase need to justify research effort. For example, the greater the equivocalness in the findings and the greater the expenditure of resources (material or behavioral) necessary to conduct the research, the greater will be the need to justify effort. Presumably those higher in a research hierarchy, who have expended personal time and resources, as well as that of others lower in the hierarchy, will be the most prone to experience a need to justify research effort.
Justification of effort is not the only impediment to seeking the truth. For most scientists research outcomes are necessary to (a) achieve performance standards necessary for career advancement, (b) establish reputation and prestige, or (c) acquire financial rewards. When one or more of these goals becomes a primary driving force for conducting research, truth seeking also may become imperiled. Moreover, when these goals are sought in what some argue is a current climate of extreme incentive and pressure to accumulate publications (McBee & Matthews, 2014) in journals that maintain unreasonable publication standards (Maner, 2014; Nosek, Spies, & Motyl, 2012), the barriers to truth seeking seem imposing indeed.
Admittedly, justification of effort may be less potent in its influence on truth seeking compared with some of these other pressures. However, justification of research effort as a source of problematic research practices may be more pervasive, harder to identify in others, and even harder to identify in oneself. It can occur within the most successful and prestigious scholars as well as the novice student of psychology who is conducting a study for the first time. Even if the field eliminated all the incentive structures that promote the modern publish-produce-prestige culture in academic psychology, justification of research effort would persist in undermining truth seeking because research will continue to require high levels of researcher effort. Moreover, eliminating such incentive structures is likely only possible by actions of individuals making up journal/reviewer and research support constituencies. Thus, addressing such incentive structures would not make for a compelling core recommendation as it is likely not addressable by individual researchers and educators. But finding ways to combat the justification of research effort is more addressable by multiple constituencies and would likely serve to reduce the influence of such incentive structures by strengthening the potency of truth seeking motivation.
How to promote truth seeking over justification of research effort
One solution to the problem of research effort justification is to reflect on whether one has markedly deviated from the ultimate goal of truth seeking by evaluating the quality of the research process upon its completion. Unfortunately, this reflection approach would be easy prey to motivated reasoning in service of effort justification processes in a manner similar to what likely occurs in cases where desire to achieve performance standards, prestige, or financial rewards is high (Nosek et al., 2012). Instead, the approach to dealing with effort justification advocated here is to explicitly consider (a) how best to begin the research endeavor in a way that will promote the generation of truthful findings at its end, and (b) how best to minimize and evaluate the potential departure from truth seeking. A consideration of the first point quickly leads to ensuring that conditions underlying Core Recommendations 1 and 2 are met—if requested, the findings should be available for inspection by other scientists and should be acquired in a manner that would be methodologically repeatable by one's self and other scientists. Thus, the issues of replicability and data sharing begin as the groundwork for what's to come rather than afterthoughts on completion of the research endeavor.
The second point is tackled in an ongoing manner by asking research questions designed to bolster truth seeking during the research process. These questions are entertained at every stage of the research endeavor. One must be cognizant that at any point truth seeking may be undermined by need to justify research effort, possibly without one's awareness. To illustrate this approach, consider the following potential conceptualization of the research endeavor: (a) examine existing work and literature; (b) develop or identify theoretical hypotheses; (c) develop research design and predictions; (d) train research personnel; (e) collect data; (f) analyze data; (g) disseminate findings. A nonexhaustive list of specific questions that bolster truth seeking based on these seven stages can be found in Table 2.
Table 2. Examples of Questions That Bolster Truth Seeking at Different Stages of Research
Implications of a truth seeking approach to the research endeavor
Essentially, a truth seeking-in-stages orientation to “getting it right,” as opposed to a best practice approach, moves the individual's primary focus of concern away from behavior to motivation. The primary concern shifts from identifying what are correct and incorrect research behaviors to identifying which actions will best facilitate achieving the ultimate goal of generating a true understanding of the phenomenon. From a truth seeking-in-stages orientation, what constitutes a correct or incorrect research behavior is context dependent. At any given stage of the research endeavor, what is correct or incorrect research behavior is dictated by whether the behavior will enhance or undermine the dependability and trustworthiness of one's results and their interpretation. Implicit in this approach to truth seeking is the idea that one must work to undermine a need to justify research. A truth seeking mindset also is consistent with the first two core recommendations and the assumptions underlying them (e.g., one should seek to protect against both false positives and false negatives, one should seek to replicate whenever possible, one should be willing to share data with other experts). Interestingly, however, a truth seeking-in-stages orientation leads to some counterintuitive implications that ironically question the sanctity of some best practice behaviors identified in previous research dependability commentaries. Two of these implications are discussed below.
Hypotheses can undermine truth seeking
Many argue that the ideal way to conduct research is to employ a hypothetico-deductive approach (Kerr, 1998; Lewin, 1997; Murayama et al., 2014). Indeed, it is probably safe to say that most psychological researchers would argue that alternative approaches are to be avoided. However, the hypothetico-deductive model of scientific research is double-edged: nowhere is the problem of justification of research effort more problematic than when one seeks to test theory. In addition to the myriad steps necessary to conduct research, testing theory (or theoretically derived hypotheses) requires an extra level of expenditure of effort at the idea development stage of research. One must read about theory, understand it conceptually, consider ways to evaluate it, and finalize an empirical plan to measure the variables captured within the theory in a fairly strict, but valid manner. Routinely, deriving hypotheses involves establishing how one's own approach to testing or applying theory is novel and unique and how the theory is different from “rival” or alternative theories. Not infrequently, establishing hypotheses is indicated by public disclosure of one's particular theoretical viewpoint and hypotheses in the form of lab or class presentations, research colloquia, and master's and doctoral proposal defenses. By the time one is ready to begin data collection, he or she likely has acquired a personal stake in the veracity of the hypotheses of interest.
Not only is more effort expended on the idea generation side of the research endeavor, but an intense hypothetico-deductive perspective creates greater peril during data analysis. In the event that clear understanding is not readily achieved after initial data analysis, one must grapple with the realization that one's cherished hypothesis or theory may be wrong or imperfect. As intense dissonance ensues, one begins to seek interpretations of the data that will create a perception of unequivocal understanding and meaning so as to justify the time and resources expended, even if that necessitates turning a blind eye to some patterns in the data and overly attending to others. The final result is intentional or unintentional analysis and interpretation of data that ultimately serves the researcher's psychological need to justify effort rather than seek truth.
As can be seen with some of the questions under Stage b (Determine theoretical hypotheses) in Table 2, a truth seeking-in-stages orientation to research can lead to a fairly intense evaluation of hypotheses and alternative hypotheses. This evaluation involves more than just selecting hypotheses to test, but also involves more fundamental questions regarding the theoretical strength of hypotheses and whether one should even entertain hypotheses. For example, it may be important to distinguish between conjectural hypotheses (hypotheses based on personal opinions or hunches) versus theoretical hypotheses (hypotheses derived from theoretical models or principles). Identification of which type of hypothesis one is working with may assist in protecting against effort justification. Unlike theoretical hypotheses, conjectural hypotheses are not grounded in a perspective external to the researcher, which in turn may lead to greater ownership of or self-involvement with the hypothesis. Consequently, more intense justification of effort may occur in the event results are ambiguous or disconfirm the conjecture. In contrast, theoretical hypotheses, which often arise from a perspective external to the researcher, may lead to less intense justification of effort.
Of course, one's interests, background, or previous research successes working with a hypothesis may produce overcommitment to a hypothesis, so ensuring one works solely with theoretical hypotheses does not necessarily protect one from justification of research effort. Moreover, sometimes theoretical hypotheses, although directionally bounded, may not be exact in their prediction. For example, one's theoretical hypothesis may lead to a prediction of an interaction pattern in which a directional effect on a dependent variable produced by one predictor variable is increased as a function of a second predictor variable. However, as shown in Figure 1, the precise form of this contrasting interaction may be unclear: it may result from a symmetric increase of the effect about the baseline effect, or asymmetric increases of the effect above or below the baseline effect. Note that although the directional aspect of the interaction may be derived from a fairly strong theoretical hypothesis (it is directionally bounded), the specific form of the interaction is inexact. This inexact aspect of the prediction invites conjecture. Thus, when working with a directionally bounded, but inexact hypothesis, more intense justification of effort may result when the data are consistent with the directional aspect of the hypothesis, but are inconsistent with the conjectural aspect of the hypothesis formed by the researcher.

Figure 1. Potential patterns of a directionally bound but inexact theoretical hypothesis.
The issues regarding the potential undermining of truth seeking by adopting a strong hypothetico-deductive perspective lead to the question of whether exploratory research should be more strongly encouraged in psychological science. The short answer is “yes.” But as one reviewer of this article (David Funder) noted, it may be more important to consider ways of asking questions that are interesting regardless of the answer. This is not necessarily easy to do, but here is an attempt to provide three examples of such questions: Are psychopathic tendencies related to degree of religiosity? Are people capable of altruism? If they were absolutely certain they could get away with it, what types of actions would people wish to engage in? Answers to these questions would be interesting (and, I think, valuable) for advancing theoretical understanding in the domains of personality, motivation, and morality, regardless of the answers, or direction of associations uncovered. The questions are specific enough to offer direction for developing research procedures and materials, yet do not promote ownership of a specific prediction or hypothesis that may waylay truth seeking. Rozin (2009) provides additional insight into how one might go about formulating research questions that do not fall easily within the traditional hypothetico-deductive mold.
Note that in encouraging exploratory research, I am not advocating a devaluing of hypothesis-driven research. A truth seeking approach to research would argue that either may be acceptable depending on the context. Moreover, it may be better to consider exploratory versus hypothetico-deductive not as types of overarching research approaches, but as perspectives one may adopt at various stages of any research endeavor. A hypothetico-deductive perspective may assist a researcher in clarifying the exact nature of an association during later stages of the research endeavor even if he or she made no initial hypotheses about the direction of effect during earlier stages. Likewise, an exploratory perspective may assist the researcher in uncovering alternative explanations for an association he or she initially hypothesized. In both instances, moving between perspectives facilitates truth seeking.
Overall then, a truth seeking-in-stages orientation leads to the counterintuitive conclusion that hypotheses do not uniformly promote research dependability and may even undermine it. Generally speaking, one should be more wary of directionally bounded but imprecise theoretical hypotheses than of precise, directionally bounded theoretical hypotheses. Moreover, there seems little benefit to adopting purely conjectural hypotheses. When one identifies a hypothesis as conjectural, the best approach is to abandon a hypothetico-deductive approach to the research endeavor altogether and instead embrace one that is exploratory. Regardless of whether the research problem is initially cast as exploratory or hypothesis driven, researchers should be open to moving between hypothetico-deductive and exploratory perspectives during the research endeavor so as to better promote truth seeking. Finally, it may be valuable to develop interesting questions that would generate insight about psychological phenomena regardless of the outcome of the findings.
Flexible data analyses can promote truth seeking
The evils of flexible data analysis have received much attention in the dependability literature, mostly because such flexibility is perceived as promoting false positive findings (Simmons et al., 2011). A truth seeking-in-stages orientation, which seeks to protect against justification of research effort as one means to assure research dependability, leads to acknowledgment that flexible data analyses also may promote false negative findings when one seeks to suppress potentially meaningful but alternative effects in order to produce less equivocal results. One simply examines the statistical outcomes of different approaches and then selects the one that will create the best story during the dissemination stage of the research endeavor.
However, generation of different statistical outcomes to examine their consistency also can be quite useful in determining contradictions and complexity in one's data. This is worthwhile information to consider when one's ultimate goal is to acquire a truthful understanding of the phenomena in question. For example, in the event two NHST data analytic approaches are technically acceptable, learning that the effect is “significant” with one approach and “not significant” with another approach illuminates potential limits to the strength of one's conclusions, which the truth seeker is interested in identifying. In contrast, if different analytic approaches lead to the same conclusion, then one gains more confidence in the strength of one's conclusions. Not only does flexibility in data analysis lead one to consider different approaches within an overarching statistical framework (e.g., use of alpha-unadjusted comparisons vs. alpha-corrected comparisons), but it also may lead one to consider different approaches across statistical frameworks (NHST vs. interval estimation). Again, the truth seeker grows more cautious when conclusions differ as a function of statistical frameworks than when conclusions are similar across statistical frameworks. Indeed, when conclusions of an individual study are consistent both within and across statistical frameworks it represents a kind of “statistical replicability” that may signal dependability in one's data. That being said, it would be incorrect to assume that statistical replication within individual studies produces equivalent information about research dependability compared to direct replication across studies. The latter is always more valuable than the former in addressing research dependability, because it permits identification of organizing principles (or lack thereof).
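As one possible illustration of the within-framework comparison described above (alpha-unadjusted vs. alpha-corrected comparisons), the following minimal Python sketch flags comparisons whose “significant” verdict changes once a correction is applied. It is a sketch under stated assumptions, not a procedure proposed in this article: the function name `compare_conclusions`, the choice of a Bonferroni correction, and the p-values are all hypothetical and were invented purely for illustration.

```python
def compare_conclusions(p_values, alpha=0.05):
    """For each comparison, report whether the 'significant' verdict is the
    same with and without a Bonferroni correction (an assumed choice of
    alpha-corrected approach; other corrections could be substituted)."""
    m = len(p_values)
    corrected_alpha = alpha / m  # Bonferroni: alpha divided by number of comparisons
    report = {}
    for name, p in p_values.items():
        unadjusted = p < alpha
        corrected = p < corrected_alpha
        report[name] = {
            "unadjusted": unadjusted,
            "corrected": corrected,
            # Agreement across approaches is the within-study consistency of interest
            "consistent": unadjusted == corrected,
        }
    return report


# Hypothetical p-values, invented for illustration only (not real data)
example = {"A vs B": 0.004, "A vs C": 0.030, "B vs C": 0.400}
report = compare_conclusions(example)
for name, row in report.items():
    print(name, row)
```

In this invented example, the "A vs C" comparison is the one that would call for caution: it is “significant” under the unadjusted approach but not under the corrected one, whereas the other two comparisons lead to the same conclusion either way. Under a truth seeking orientation, both verdicts would be reported rather than only the more favorable one.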
How to promote a truth seeking mindset
Promoting a truth seeking mindset is the most challenging of the three core recommendations to implement, because it ultimately begins with an individual's commitment to guard against oneself. Moreover, successful truth seeking in stages requires development and practice of a challenging skill set, which may not be easily communicated to others. Although question lists like the one found in Table 2 may prove useful, learning a truth seeking-in-stages orientation to research necessitates that one grapple with these types of questions in the context of active research engagement. Teaching such an orientation to future researchers would need to involve collaboration between students and teacher-scholars who provide instruction on the reflection process to inform and shape the ongoing research endeavor. Indeed, prioritizing the role of teaching and training as a central facet of one's research program may be one of the most effective means to promote truth seeking—it ensures justification of research effort no matter how equivocal the findings because, at the very least, the effort results in educational and training benefit to students.
Editors may stimulate truth seeking in a number of ways. First, they could designate more journal space for integrative debate, in which opposing author viewpoints on a theoretical, methodological, or empirical issue are presented, and an attempt is made to integrate the viewpoints by a third neutral party who approaches the debate from a truth seeking perspective.² Examples of such debates in journals could also serve as models for stimulating debate in research training contexts, which would provide those in training opportunities to practice truth seeking and theoretical reasoning. Second, editors could encourage reviewers to ask truth seeking questions during the review process. Similar to the research endeavor, virtually all the questions in Table 2 can be restated to apply to the reviewing endeavor. Truth seeking during manuscript review is important because reviewers often are selected for their expertise in the topic addressed by the manuscript. Thus, selection will tend to result in reviewers who have some stake in whether the manuscript is accepted for publication. However, asking truth seeking questions may help reviewers gain psychological distance from the research findings, which in turn would promote objectivity and constructiveness in the review process. Truth seeking during the review process may orient reviewers toward identification of additional analyses that may clarify the strength of the findings. Reviewers could encourage editors to request direct replication in instances where the resources of the author would permit him or her to do so in a reasonable time frame. Some of the evaluative guidelines for reviewers listed by Maner (2014) also are of a form that would promote truth seeking.
Given the aforementioned challenges to developing a strong truth seeking-in-stages skill set and the unintentional quality of research effort justification, reviewers would need to be cautious of interpreting apparent failures at truth seeking by authors as intentional.
Ensuring adequate consideration of truth seeking questions should be used by research support institutions as a means for evaluating allocation of resources and funds to research projects. It is more important and fiscally responsible to fund research that would ensure truthful understanding of natural phenomena or application of such understanding than to fund research with high potential to produce false findings even though it is novel, exciting, or “clinically relevant” (see also Makel & Plucker, 2014). Ultimately, in addition to helping decide whether a given research endeavor is worthy of resource support or publication, reviewers, editors, and administrators of research support institutions should see themselves as active facilitators of scientists whose research practices would most likely generate truthful findings.
Summary of Core Research Dependability Recommendations
In a number of respects the recommendations offered here are consistent with viewpoints that have already been expressed in the literature. However, the present viewpoint seeks to advance research dependability by identifying a concise list of core recommendations that addresses the problems of false positive findings and false negative findings in an internally consistent manner and could be uniformly followed by individuals within and across the constituencies that constitute psychological science. In advancing this viewpoint, an attempt has been made to more clearly elaborate what the recommendations espouse and the underlying rationales for making them. In some instances, descriptions of strategies for implementation of the recommendations have been made in a fairly concrete fashion. In other instances strategies for implementation have remained fairly vague, mostly because it still remains unclear what constitutes the best concrete approach for implementation. Ultimately, the goal is to narrow attention to a few clear objectives that can focus the research dependability movement in a more efficacious manner. How these objectives are implemented concretely within constituencies requires additional input and answers to questions (e.g., What policies does a journal adopt to foster direct replication? How does a graduate training program foster truth seeking in future scholars?).
In the attempt to better focus attention on key dependability recommendations, many readers will note that I have avoided recommendations based on statistical solutions to the research dependability problem. Indeed, this is one characteristic of the present viewpoint that is at odds with much of the dependability movement, which to date has generated a host of such solutions. Thus, false findings are not conceptualized in statistical terms. Instead, false findings are conceptualized as a disconnect between patterns among constructs at the theoretical-conceptual level and patterns of variables at the operational-measurement level of analysis. This conceptualization leads to a reconciliation of apparently contradictory statistical assumptions that distinguish the false positive and false negative psychology literatures (e.g., the “null” is more likely to be true at the conceptual level of analysis, but relatively less likely to be true at the operational level of analysis) and assumes an agnostic orientation with regard to debate about which statistical frameworks are best for advancing psychological science. Regardless of which statistical framework one adopts, the three core recommendations would be expected to enhance the dependability of findings.
Of the three core recommendations advocated here, commitment to data sharing and direct replication are most similar to recommendations offered elsewhere in the literature. The data-sharing recommendation (public commitment to share data if requested by experts) is fairly moderate compared with stronger viewpoints voiced in the literature. Some may wonder whether stronger data-sharing requirements would promote advancement of quantitative modeling, meta-analytic research, or understanding of large data sets. But suspicion of others seems to underlie the original impetus for most strong sharing recommendations, and I wonder whether this doesn't undermine the spirit and excitement of science while creating even more expenditure of effort in the research endeavor. In contrast, the present viewpoint on replication is on the strong side: for assuring research dependability, multiple direct replication studies, followed by extension replication studies, followed by conceptual replication studies are most valuable. Moreover, although conceptual replication studies (if successful) may promote theoretical generalization, their sole use does not promote research dependability in an efficient manner. If possible, use of nested replication (conceptual replication based on direct replication nests) would promote both research dependability and generalization simultaneously. Clearly, the commitment to direct replication advocated here would require fairly intense change on the part of all constituencies involved in psychological science.
The third core recommendation (promote a mindset of truth seeking) deviates most from previous recommendations found in the research dependability literature. Although similar in tone to the viewpoint of Nosek et al. (2012), the present recommendation is a call to switch from a focus on what constitutes correct scientific research behavior to a focus on what constitutes the correct scientific research orientation and motivation (seek a truthful understanding of natural phenomena). Not only does this recommendation call for affirmation of the ultimate goal of truth seeking at the beginning of the research endeavor, but it also calls for an ongoing affirmation of this ultimate goal at each stage of the research endeavor. The emphasis on addressing researcher motivation, as opposed to researcher behavior, in turn leads to implications that challenge the sanctity of several prominent research best practices noted in the existing dependability literature. Specifically, depending on the research context, reliance on a priori hypotheses can undermine truth seeking and flexible data analysis can promote truth seeking. Thus, a research orientation guided by the appropriate ultimate goal of truth seeking may allow for practices traditionally viewed as unfavorable within the research dependability movement.
Of the three core recommendations, a call to foster truth seeking admittedly poses the most challenges for concrete implementation. One reason for this is that truth seeking, as it is cast here (to ensure that external incentives and, most importantly, a need to justify research effort do not undermine truth seeking motivation), has received the least amount of attention in previous research dependability discussions. As such, I encourage others to examine the issue of truth seeking in more depth and to consider how we may effectively encourage it as individuals and as a science.
Conclusion
Ultimately, there seems little doubt that serious challenges exist for ensuring psychological science is empirically dependable. Fortunately, many have persevered to identify potential solutions to address these challenges. What is lacking is a concise set of core solutions that various constituencies could adopt in tandem to achieve greater research dependability. The approach documented here is one attempt to provide such a set in a manner that (a) clearly identifies the underlying nature of false findings independent of statistical framework, and (b) proposes a concise set of ways to address the problem of false findings in an internally consistent and mutually reinforcing manner. In all likelihood additional work on this front will be necessary, but perhaps consideration of the perspective espoused here will prove useful in the movement toward a more dependable psychological science.
Footnotes
1
It is worth noting that although consistency in a replicated data pattern may be suggestive of an organizing principle linking two theoretical constructs, the replicated data pattern does not necessarily ensure correct identification of (a) the principle's causal nature, or (b) the character of the constructs upon which the principle operates. However, before one can hope to address questions regarding the nature of the principle and the constructs it links, one needs to have confidence in their existence.
2
I thank one anonymous reviewer for pointing out the importance of stimulating debate in the context of fostering better replication research. The reviewer's comments in turn led me to consider the value of debate for fostering truth seeking more generally.
