Abstract
Critical thinking is praised as the hallmark of cognitive development and a raison d’être for higher education. In this review, we analyzed critical thinking interventions in higher education with regard to participant characteristics; the focus, length, and duration of interventions; and the specific measurement and data-analytic treatment. Forty-five studies published between 2010 and 2023 were reviewed. One key finding was that most researchers’ definitions of critical thinking centered on how it manifests (procedural) rather than on its inherent nature (ontological). Further, the measures used were frequently misaligned with researchers’ definitions. Moreover, a lack of methodological detail precluded thorough analyses of intervention characteristics vis-à-vis reported outcomes and raised questions about the causal conclusions reached. Among the implications discussed are the need for conceptual refinement, greater internal consistency between conceptualization and measurement, more prudent data-analytic approaches, and richer descriptions of the implemented intervention, study participants, and context.
Keywords
“The real problem of intellectual education is the transformation of more or less casual curiosity and sporadic suggestion into attitudes of alert, cautious, and thorough inquiry.” (Dewey, 1933, p. 181)
Critical thinking is praised as a hallmark of cognitive development and raison d’être for higher education (Scheffler, 1973). Indeed, one would be hard-pressed to find higher education institutions that do not aim to promote students’ critical thinking (Davies, 2015). Moreover, individuals’ ability to reason critically is regarded as essential for democratic societies (Dewey, 1916; Weinstein, 1991), a safeguard against misleading or false information that populates public discourse (McGrew & Chinoy, 2022), and key to academic and professional success (Khosravani et al., 2005; Williams & Worth, 2001). Cross-sectional and longitudinal studies have reported increases in students’ thinking abilities and shifts in motivation to engage in critical thought during college (Huber & Kuncel, 2016; Pascarella et al., 1996). Further, meta-analytic syntheses provide evidence for the instructional viability of promoting critical thinking through educational interventions (Abrami et al., 2015). Such empirical data align well with the aims of higher education to promote students’ reflective, reasoned, and rational thinking (Abrami et al., 2008; Dewey, 1916).
Nonetheless, there is scant systematic documentation of the conditions or causal mechanisms to which improvements in students’ critical thinking can be attributed. Despite efforts to investigate the differential impact of diverse critical thinking interventions (e.g., Abrami et al., 2015), earlier syntheses did not fully establish the conditions under which the gains from critical thinking interventions were most meaningful. Further, the extant reviews did not consider the conceptual and methodological decisions reported in the reviewed studies and gave limited consideration to the characteristics of participants in those interventions. Consequently, much remains to be learned about the nature of critical thinking interventions and the effects of intervention characteristics on outcomes.
Thus, in this systematic review, we set out to analyze interventions in higher education to identify how the characteristics of interventions and participants relate to documented improvements, or lack thereof, in critical thinking. To evaluate the effectiveness of interventions, we consider the clarity of definitions and the appropriateness of measures, as well as data-analytic approaches. Although our original intention for this foray into the literature was to conduct a hybrid review that would combine a systematic review of study characteristics with a meta-analytic synthesis of quantitative results, we soon found that the studies lacked the data-analytic details needed to aggregate results across them. For that reason, we decided to systematically review and analyze the characteristics of interventions that used quantitative data-analytic approaches, summarizing their quantitative findings and attempting to position those findings in relation to intervention characteristics (Alexander, 2020). Because it combines a qualitative and a quantitative synthesis of intervention studies, the present review may be considered a systematic-narrative hybrid review (see Turnbull et al., 2023).
To frame the current review, we first consider important aspects of critical thinking discussed in the extant literature. Specifically, we survey how critical thinking has been conceptualized and measured in empirical research, discussing its salient characteristics. We summarize recommendations for promoting critical thinking, especially in the context of undergraduate education, and provide an overview of specific interventions (Abrami et al., 2015; Tiruneh et al., 2014). Following this introduction, we identify lingering issues within the extant theoretical and empirical critical thinking literature that motivated the specific research objectives of the current review.
The Disputed Nature of Critical Thinking
Despite persistent interest, there is little agreement about the inherent nature of critical thinking (Halonen, 1995). Within the literature, critical thinking has typically been described in terms of cognitive processes presumed to indicate the mental engagement referred to as thinking critically. As is true for other constructs that suffer from conceptual vagueness (Schoute et al., 2022), researchers seem more invested in the outcomes critical thinking may yield than in its core nature (Alexander, 2018b). Yet, without an understanding of the nature of critical thinking, efforts to craft valid assessments or devise effective interventions remain on precarious footing (Alexander, 2023b).
Clues to the essence of critical thinking can be found within philosophical and psychological writings. For example, the Theaetetus, one of the Socratic Dialogues (Cooper, 2002), depicts Socrates posing probing questions to a precocious and self-assured young man to help him recognize fallacies in his thinking about knowledge. More than two millennia later, the American pragmatists James (1890) and Dewey (1910, 1925) also drew on philosophy as a wellspring for improved thinking and reasoning. Dewey (1910), for example, argued that meaningful education must move beyond rote memorization to reflective thought, which he defined as
active, persistent, and careful consideration of any belief or supposed form of knowledge in the light of the grounds that support it, and the further conclusions to which it tends . . . it is a conscious and voluntary effort to establish belief upon a firm basis of reasons. (p. 6)
Only reflective thought, Dewey reasoned, is truly educative in nature, because it teaches individuals to critically evaluate claims. While many researchers acknowledge reflection as foundational to critical thinking, they nonetheless focus on capturing or improving other processes in their work.
Disproportionate Focus on Manifestations of Critical Thinking
Key figures in the critical thinking movement, such as Ennis (1987) and Facione (1990a), refer to the writings of James and Dewey when framing their conceptions of this valued form of thinking. Like James and Dewey, Ennis and Facione attempted to capture the essence of critical thinking but ultimately focused on its manifestations. For example, in “A Definition of Critical Thinking,” Ennis (1964, p. 599) proposed that the “root notion” of critical thinking is the “correct assessing of statements” based on criteria. While there is merit in judging whether claims are valid and evidence-based, such determinations do not capture the essence of critical thinking.
The conceptualization of critical thinking in terms of demonstrable outcomes of internal mental processes was echoed in the highly cited Delphi Report (Facione, 1990a). The consensus view on critical thinking reached by the participating philosophers and educators was as follows:
We understand critical thinking to be purposeful, self-regulatory judgment which results in interpretation, analysis, evaluation, and inference, as well as explanation of the evidential, conceptual, methodological, criteriological, or contextual considerations upon which that judgment is based. (p. 3)
Beyond this characterization of critical thinking as “cognitive skills” (p. 3), the Delphi Report did not explicate how many or which “skills” are required for thoughts to achieve critical status. The report is not unique in that regard, however. Much of the theoretical and empirical writing offers little clarity as to when individuals’ mental processing crosses the threshold into the realm of critical thinking (Bailin & Siegel, 2003; Murphy, Ogata, & Schoute, 2023). Does simple reflection suffice, or does the demonstration of a single mental process associated with critical thinking, such as analysis or synthesis? In effect, it becomes a struggle to ascertain the sine qua non of critical thinking or, conversely, what qualifies a thought as uncritical (Alexander, 2023b; Bailin & Siegel, 2003; Kleemola et al., 2022; Murphy, Ogata, & Schoute, 2023). For that reason, calls have been made for a conceptual respecification of critical thinking as critical-analytic thinking (Alexander, 2014, 2023a; Murphy, Ogata, & Schoute, 2023), in which justifications for responses are a necessary condition for thinking to qualify as critical (Murphy et al., 2014):
critical analytic thinking [is the] effortful, cognitive processing through which an individual or group of individuals comes to an examined understanding of something known or believed. This examination is characterized by a systematic evaluation of the object of thought and claims, reasons, and evidence forwarded about that object. (p. 563)
Given the range of characterizations of critical thinking in the literature, we analyze how critical thinking was defined in the reviewed interventions to assess the degree to which the focus lies on its manifestations or on its conceptual nature. Following that analysis, we discuss the potential of reframing research in terms of critical-analytic thinking in the Implications for Research section.
General Versus Specific Nature of Critical Thinking
In searching for a consensus on critical thinking, authors of the Delphi Report were “articulating an ideal” of the processes “a generally educated college lower division level critical thinker should be able to do” (Facione, 1990a, p. 6). This statement ultimately casts critical thinking in generic terms. However, there is no consensus within the broader philosophical, psychological, or educational literature as to whether critical thinking is general or specific in nature (Bailin, 2002). Even the authors of the Delphi Report opened the door to ambiguity, stating that “while CT skills themselves transcend specific subjects or disciplines, exercising them successfully in certain contexts demands domain-specific knowledge” (p. 10). Still, this influential document espoused a mostly generalist perspective (Abrami et al., 2015) shared by members of the critical thinking movement (Ennis, 1989; Paul, 1989). Their contention was that even though critical thinking may look different across situations or domains, it is a general capacity that can be learned in one context, abstracted, and then applied across various tasks and domains (Huber & Kuncel, 2016; McMillan, 1987).
Equally strong counterarguments have been voiced by those holding specifist views (Bailin et al., 1999b; E. Glaser, 1941). Notably, McPeck (1981) argued against the generalist view, given that thinking is always about some idea considered in some context and for some purpose. Similarly, Bailin (2002, p. 367) contended that “the debate over generalizability has proven particularly intractable.” Bailin’s rationale was that the “skills account” unfolds as a variety of mental processes or procedural moves. However, casting critical thinking as generic processes or procedural moves is problematic because such procedures or operations can be executed uncritically (Bailin, 2002; Bailin & Siegel, 2003).
Another perspective on the general or specific nature of critical thinking sets aside this seemingly false and certainly unproductive dichotomy. From this vantage point, critical thinking relies on certain foundational mental abilities that are instantiated within a specific context or for a specific purpose. Further, how and how well these mental processes are enacted depends on the motivations and individual differences that learners bring into the problem space (Alexander, 2023b; P. A. Facione et al., 1996; Murphy, Ogata, & Schoute, 2023). Also, whether individuals activate certain processes and procedures associated with critical thinking will depend on whether they perceive a given context or problem as analogous to those encountered previously, that is, whether they see the potential for transfer and are able to act on it (Alexander & DRLRL, 2012; Gentner et al., 2003). In this review, we expressly examine whether researchers forwarded domain-specific or domain-general conceptions of critical thinking and assess whether those conceptions were aligned with their measures of critical thinking. This analysis first requires a consideration of the ways in which critical thinking is operationalized in extant studies.
Varied Operationalizations of Critical Thinking
When researchers want to capture changes in students’ critical thinking abilities as a function of explicit training or life experiences, they must operationalize their conceptualizations (Williams, 1999). In effect, they must decide which behaviors are hypothesized to reflect that latent ability and systematically assign meaning (e.g., numbers, codes) to those behaviors. As was true for the conceptualization of critical thinking, there is no consensus within the literature on how best to assess manifestations of this valued form of thought. There is no shortage of approaches to measuring critical thinking, from popular standardized measures to researcher-developed questionnaires, surveys, and tasks (Abrami et al., 2015; Tiruneh et al., 2014).
The popularity of standardized measures can be understood for several reasons. First, measures such as the California Critical Thinking Skills Test (CCTST; P. A. Facione & Facione, 1994b), the California Critical Thinking Disposition Inventory (CCTDI; P. A. Facione & Facione, 1992), and the Cornell Critical Thinking Test (Ennis et al., 1985) were developed by researchers from the critical thinking movement. Second, the psychometric properties of these measures suggest that they are ready-made tools that support valid interpretations, while their norm-referenced properties allow for comparisons across experimental conditions and educational contexts (Geisinger, 2012; R. Glaser, 1963). Apart from using standardized measures, researchers may develop tools such as analytic or holistic rubrics to evaluate critical thinking in written responses or essays (e.g., Braun et al., 2020; Saxton et al., 2012) or may adapt an existing measure to align more closely with their aims.
Potential Disconnects in Operationalizing Critical Thinking
Whether researchers select standardized measures or develop or adapt their own tools, questions remain about the correspondence between the conceptualization and operationalization of critical thinking. These questions, which speak to issues of validity and reliability, concern whether definitions and measures reflect the nature of critical thinking or capture only manifestations that are treated as evidence of that nature. A potential disconnect can also arise from the mode of assessment employed and whether the resulting data can be treated as trustworthy or credible. Even the choice of highly respected standardized measures with desirable psychometric properties does not automatically ensure that the conceptualization grounding the study is well represented (Messick, 1986).
For example, one disconnect that raises ontological concerns occurs when critical thinking is expressed solely as distinct processes (e.g., analysis, interpretation, and evaluation) assessed by separate subtests or subscales. This is particularly evident for the often-used CCTST (P. A. Facione & Facione, 1994b), where critical thinking ability is determined by individuals’ performance on separate subscales rather than overall. In such a partitioned view, it is unclear which processes collectively constitute critical thought and whether improvement on one subscale indicates meaningful gains in critical thinking. Moreover, for researchers who take a more metacognitive perspective (e.g., Kuhn, 1999), such a process-oriented operationalization is too limited.
Further misalignments can arise between researchers’ views about the domain-generality and transferability of critical thinking and the measures they choose. Researchers may focus on enhancing critical thinking in a specific context but employ a general measure to capture that ability. It is important to note, however, that standardized measures are not, per se, context-free or abstract in the way that fluid reasoning measures are (Alexander, 2012; Raven, 1941). Instead, the content and context are supplied by the standardized measure, and researchers assume, often implicitly, that critical thinking transfers from the source domain to the target context of the measure. This notion of transferability, however, remains a contested assumption (Ennis, 1989; McPeck, 1981). Further, this assumption goes untested if researchers do not employ domain-specific measures in conjunction with generic measures to corroborate their findings. It may be that a “good” domain-specific thinker, such as an accomplished nurse (Papathanasiou et al., 2014), might fail to demonstrate that ability on a standardized measure.
Beyond measurement content, how data are collected also warrants consideration. Self-report measures, in particular, introduce additional sources of disconnect. Although largely confined to assessments of critical thinking dispositions (e.g., CCTDI; P. A. Facione & Facione, 1992), such measures expressly require students to report their perceived critical thinking abilities. While some researchers regard such self-reports as acceptable proxies for actual performance (Bowen, 1977; Pascarella & Terenzini, 1991), calibration research suggests that students are poor at estimating their own performance (Dunning et al., 2003). Additionally, researchers who seek to improve students’ critical thinking, rather than to change their tendency toward critical thought, should focus on demonstrated abilities rather than dispositions. Researchers should also provide rationales for their chosen measures vis-à-vis the constructs they target. In sum, findings from these interventions must be judged in relation to the modes of assessment employed. In the present study, we evaluate the alignment between studies’ guiding conceptualizations and their ensuing operationalizations of critical thinking and assess researchers’ rationales for the measures used, to better understand how changes in critical thinking were detected.
Lack of Consensus on How to Improve Critical Thinking
Although there is marked variability in conceptualizations and operationalizations of critical thinking, there appears to be a general consensus that this ability is malleable and can be improved under the right conditions (Abrami et al., 2015; Dewey, 1933; Huber & Kuncel, 2016). For example, a large portion of the Delphi Report (P. A. Facione, 1990a) is dedicated to fruitful teaching practices. Likewise, Ennis (1989) described ways in which critical thinking may be taught, bypassing questions of whether it can be changed and focusing instead on how this valued ability can be promoted. Yet, beyond the shared assumption that it can be fostered, views on how critical thought is best promoted vary markedly along a domain-general (e.g., Ennis, 1989; P. A. Facione, 1990a) to domain-specific continuum (e.g., McPeck, 1981).
Given the generalist view that critical thinking transcends any specific domain, the debate on how critical thinking can be fostered has largely centered on promoting transferability. Ennis (1989), for one, contended that critical thinking can be promoted in the absence of a specific content domain (general approach) or embedded within a specific domain, either with the targeted skills made explicit to the learner (infusion) or without (immersion). In fact, several prior reviews of critical thinking (Abrami et al., 2008, 2015; Tiruneh et al., 2014) were framed by Ennis’s (1989) three intervention types. As those reviews suggest, Ennis’s framework can be a helpful starting point for considering the degree to which critical thinking is explicitly targeted in an intervention or more implicitly embedded in the educational experience. For specifists, in contrast, the goal of interventions is to promote critical thinking within a domain or context, given its purportedly highly contextual nature (McPeck, 1981).
In light of the aforementioned perspectives on fostering critical thinking, we wanted to document the characteristics of interventions, including the degree of instructional explicitness and overall scope. We were aware that the authors of prior reviews lamented that such details were often not provided (Abrami et al., 2008, 2015), leaving it unclear why researchers expect a certain educational condition to be effective in promoting critical thinking. Most notably, McMillan’s (1987) review concluded that “What is lacking in the research is . . . a clear theoretical description of the nature of an experience that should enhance critical thinking” (p. 3). If this remains the case, then the studies we review will not bring sufficient clarity on how critical thinking can be improved. Yet, we were hopeful that this updated examination of critical thinking interventions would supply the details lacking in those prior reviews.
Undergraduate Education
Whatever conceptual differences exist in the literature, there is a general perception that critical thinking manifests differently at different stages of life (Byrnes & Dunbar, 2014). This perception is fueled, in part, by documented changes in students’ ability to think critically as a result of their neurocognitive development, increased knowledge and experience, and repeated opportunities to exercise critical thinking. Specifically, efforts to improve critical thinking at the primary (O’Reilly et al., 2022) and secondary levels (Abrami et al., 2015) have been undertaken with some success. However, younger students may not possess the requisite neurocognitive abilities, such as metacognition or working memory capacity, to a level sufficient for them to routinely engage in this mode of thinking (Byrnes & Dunbar, 2014).
In contrast, students in higher education are generally more cognitively mature than students at earlier stages of formal education (Pascarella, 2005), although not universally so (Kuhn, 1999). Moreover, undergraduates are likely to have larger repositories of prior knowledge upon which to draw when thinking critically (Alexander, 2004; McCarthy & McNamara, 2021; McPeck, 1981). Additionally, the very structure and curricula of higher education institutions place greater demands on college students to carry out complex cognitive tasks with fewer instructional supports, thereby increasing metacognitive and self-regulatory demands (Alexander, 2004; Byrnes & Dunbar, 2014). From a more pragmatic standpoint, we focus on interventions in undergraduate education because it is the last stage of formal education for many students in the United States and internationally (National Center for Education Statistics [NCES], 2023; Organization for Economic Cooperation and Development [OECD], 2023) and therefore a final opportunity to carry out critical thinking interventions. Thus, although some take issue with the burden that falls on educators to rectify or negate the impediments to critical thinking that students face due to a plethora of societal issues (Robinson, 2014), there is reason to assume that higher education is fertile ground for fostering students’ willingness and ability to think critically. In the present review, we include only intervention studies conducted with undergraduate students, to avoid conflating developmental differences with intervention effects.
Yet, to learn more about which interventions work for whom, a question prior reviews did not fully elucidate (e.g., Abrami et al., 2015; Tiruneh et al., 2014), express attention must be directed toward learner characteristics that may explain intra- or inter-individual differences in critical thinking performance. Factors such as cognitive maturity and prior knowledge (Alexander, 2018a), especially knowledge directly relevant to the domain under study (cf. McPeck, 1981), should not merely be assumed but empirically tested (Sanz de Acedo Lizarraga et al., 2012). Further, executive function (Shanmugan & Satterthwaite, 2016), a class of cognitive processes implicated in monitoring and regulating behavior and thought (Diamond, 2013), should be considered. Relatedly, because critical thinking is sometimes depicted as an inherently metacognitive act (Flavell, 1985; Kuhn, 1999), metacognitive ability or awareness needs to be considered as a factor (Ku & Ho, 2010; Magno, 2010; Winne, 1996). What is more, alongside students’ predisposition or internal motivation to think critically (P. A. Facione et al., 1996), it seems important to consider students’ interest in the domain or topic within which critical thinking should take place (Alexander, 1997), or the perceived relevance (Schoute et al., 2022), task value (Eccles & Wigfield, 2002), or other motivational appeal of that subject (Wentzel & Miele, 2016). Equally important is consideration of students’ gender, race, and culture, and the intersections of these identities (Danvers, 2018; DeCuir-Gunby & Schutz, 2014; F. López, 2022). For that reason, we set out to document the details the reviewed studies provide regarding these factors in an attempt to explain differential intervention results.
Documented Critical Thinking Outcomes
Based on the general findings of prior systematic and meta-analytic reviews (Puig et al., 2019; Tiruneh et al., 2014), one may conclude that critical thinking interventions within higher education have frequently resulted in significant improvements, particularly in medical and nursing education (Chan, 2018; Kong et al., 2014) or in research methods courses (Stark, 2012). Thus, there seems to be reason for optimism that critical thinking can be improved through educational interventions, and systematic reviews report increases that support the “instructional viability hypothesis” (Abrami et al., 2008, 2015, p. 8). However, those extant reviews have afforded few insights into why some interventions are successful while others fail to improve critical thinking. In their meta-analytic review, Abrami et al. (2015) considered theoretically and practically relevant moderators that could shed light on the variation in effectiveness between interventions. They explored the predictive power of differences in the intervention approach employed (Ennis, 1989), the type of outcome measure administered, the academic field in which the instruction took place, and the treatment duration. While Abrami et al. could ascertain the overall effect sizes for certain factors, they could not meaningfully distinguish interventions that yielded significant outcomes from those that did not.
Much the same was true for Tiruneh et al.’s (2014) systematic review of interventions. They considered the instructional approach employed (Ennis, 1989), the degree of explicitness of critical thinking instruction, the characteristics of students and instructors, and the kind of operationalization employed. One key finding was that the instructional approach could not meaningfully explain differences in intervention results. However, these researchers noted that many studies provided insufficient theoretical and methodological details about the nature and implementation of the intervention and the characteristics of students and instructors.
Crucially, Tiruneh et al. (2014) excluded studies that employed self-reports or measures that only tested domain-specific critical thinking, without providing a strong theoretical rationale for that exclusion, which limited the comprehensiveness of their portrayal of interventions in higher education. Further, they did not consider the academic domains in which the interventions were situated as a factor, leaving unanswered how the academic domain may have influenced the interventions per se or their effectiveness. Lastly, these researchers did not discuss the domain-generality versus domain-specificity of measures vis-à-vis the outcomes of interventions.
Further, the existing reviews (Abrami et al., 2008, 2015; Tiruneh et al., 2014) largely took the reported findings from included studies at face value. In their meta-analytic review of critical thinking interventions, Abrami et al. (2008, p. 1111) attempted to differentiate the impact of interventions based on their “pedagogical grounding,” but they did not consider the quality of the theoretical justification for why individual researchers expected their interventions to improve critical thinking as a factor. Therefore, in line with McMillan’s (1987) criticisms of the critical thinking literature, we saw the need to examine the causal hypotheses incorporated in studies and the subsequent design of these interventions.
Present Study
Given the substantial number of theoretical considerations identified as relevant to efforts to promote critical thinking, we argue that a systematic review of recent studies that aim to promote critical thinking in higher education is warranted. Specifically, we interrogate the literature to answer the following critical question:
How can critical thinking be successfully promoted within the undergraduate education context, and how are instructional features of the intervention study and characteristics of the participating students related to improvements in critical thinking or lack thereof?
Given the study- and participant-related factors that we identified as potentially consequential for the outcomes of the interventions or the interpretation of these outcomes, we set out to answer our critical question by means of the following subquestions:
RQ1: How do researchers engaged in critical thinking interventions in higher education define critical thinking?
RQ2: How do researchers measure critical thinking? How do measurements align with their definitions?
RQ3: What were the characteristics of the interventions that were carried out and of the students participating in those interventions?
RQ4: What were the outcomes of these interventions? How are those outcomes related to the characteristics of the measurement and interventions?
Method
To answer the questions guiding this systematic review, we retrieved studies that expressly set out to promote critical thinking in undergraduates by means of an educational intervention that employed an experimental or quasi-experimental design. Although there is value in qualitative evaluations of interventions, we were primarily interested in interventions that captured critical thinking by means of quantitative measures. We compiled the pool of relevant studies in four steps by systematically applying our criteria: (a) identifying potentially relevant literature through keyword searches; (b) screening the titles and abstracts of the identified sources; (c) reviewing the full texts against the inclusion criteria; and (d) assembling the final pool of studies to be systematically analyzed.
Search Parameters
To assemble a representative sample of critical thinking intervention studies, we conducted a search using relevant databases. Specifically, we searched through the APA PsycArticles, APA PsycInfo, Education Source, and the Psychology and Behavioral Sciences Collection databases on EBSCO. To find relevant research articles that report on critical thinking interventions, we set the following search parameters using Boolean terms: (critical thinking or critical thinking skills or critical reasoning) AND (undergraduate students or college students or university students) AND (experimental study or experimental research or experimental design or quasi experimental).
We restricted the search to titles and abstracts. Further, we selected 2010 as the starting point for the current review, extending the review by Tiruneh et al. (2014), which included studies from 1995 to 2012. Thus, we included indexed articles published between January 2010 and August 2023. The database search returned 158 peer-reviewed articles; after duplicates were removed, the initial pool consisted of 145 articles (Figure 1).
Figure 1. PRISMA flow chart.
Selection and Exclusion Criteria
Given the critical questions that motivated this systematic review of critical thinking interventions, we specified a number of inclusion and exclusion criteria to compile a pool of pertinent literature. In addition to the parameters set during the search (experimental or quasi-experimental critical thinking studies on the undergraduate population published in English), we articulated criteria to apply during the abstract screening and the full-text read of the search results. Specifically, we considered studies pertinent if they (a) implemented an educational intervention designed to directly or indirectly improve critical thinking. Direct interventions provided overt critical thinking instruction, whereas indirect interventions sought to improve critical thinking via some other strategy or activity believed to bolster critical thought. Further, included studies were required to (b) target the undergraduate student population in the intervention and (c) have a dedicated critical thinking outcome measure or subscale. We purposefully included intervention studies conducted across geographical locations to allow diverse populations to be represented (Henrich et al., 2010).
At the same time, we specified exclusion criteria to ensure that only pertinent studies would be retained. We decided to exclude studies that (a) were not published in English; (b) targeted atypical or nonundergraduate populations; (c) were nonempirical, such as theoretical contributions, teacher practitioner pieces, or reviews; (d) reported qualitative results in the absence of quantitative results; or (e) lacked an intervention component or critical thinking outcome measure.
Title and Abstract Screening
Before engaging in a full-text review of all articles in the initial pool, we applied our inclusion and exclusion criteria to the articles’ titles and abstracts. We imported the search results from EBSCO into Rayyan (Ouzzani et al., 2016), which provided a systematic means to screen abstracts and log determinations of whether articles merited a full-text review. Guided by the aforementioned criteria, we excluded 59 articles based on the title and abstract alone (see Figure 1). Most eliminations occurred because studies lacked an intervention component. Some of these studies mentioned critical thinking in the title or abstract, describing it as a desirable classroom skill that serves as a means to an end rather than proposing or testing a mechanism to augment critical thinking abilities. Such studies did not include an intervention but instead invoked the importance of critical thought for argumentation writing (e.g., Fahim & Hashtroodi, 2012) or nursing competency (e.g., Cariñanos-Ayala et al., 2021). A small number of studies were unrelated to critical thinking research (e.g., Nussbaum et al., 2019), conducted research on nonundergraduate populations (e.g., Walker & Kettler, 2020), presented a measure validation or a literature review (e.g., Zhang & Gao, 2022), or were unavailable in English (e.g., Morales, 2014). Not all records provided sufficient information (e.g., on research design) to decide on inclusion or exclusion based on the title and abstract alone. In those cases, the documents were retained for a full read.
Full-Text Read
To determine whether the articles met our inclusion criteria, we conducted a full-text review of the initial pool of 86 articles. Following this full read-through, 41 studies (48%) were excluded. A substantial number of those excluded studies were rejected because they did not expressly focus on critical thinking (n = 20). That is, studies rejected for this reason lacked a substantive research question pertaining to critical thinking and were, for example, only tangentially related to it (e.g., Menekse et al., 2022).
Other studies were rejected because they lacked a dedicated measure of critical thinking (n = 11), for example, because critical thinking was only evaluated qualitatively (e.g., Goldsworthy et al., 2022) or assessed by only a single question (e.g., Ginosyan et al., 2020). Finally, we excluded a smaller number of studies because they either did not include an undergraduate population (n = 3) or were unavailable in English (n = 4). Of the remaining excluded articles (n = 3), one focused on the validation of a critical thinking measure rather than its use, while the others were nonexperimental in nature. The final pool consisted of 45 articles.
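The selection flow reported in this section can be cross-checked arithmetically. The short Python script below recomputes each stage from the counts given in the text (158 retrieved, 145 after deduplication, 59 excluded at screening, and the five full-text exclusion reasons). The variable names and dictionary labels are our own illustrative choices, not categories taken from Rayyan or the PRISMA template.

```python
# Counts as reported in the Method section (see Figure 1).
retrieved = 158
after_deduplication = 145
excluded_at_screening = 59
full_text_exclusions = {
    "no express focus on critical thinking": 20,
    "no dedicated critical thinking measure": 11,
    "no undergraduate population": 3,
    "not available in English": 4,
    "measure validation or nonexperimental": 3,
}
final_pool = 45

# Derive each stage of the flow from the stage before it.
duplicates = retrieved - after_deduplication
full_text_reviewed = after_deduplication - excluded_at_screening
excluded_at_full_text = sum(full_text_exclusions.values())

# The reported numbers should be internally consistent.
assert full_text_reviewed == 86
assert excluded_at_full_text == 41
assert full_text_reviewed - excluded_at_full_text == final_pool

print(duplicates, full_text_reviewed, excluded_at_full_text)  # 13 86 41
print(f"{excluded_at_full_text / full_text_reviewed:.0%}")     # 48%
```

The script confirms that the exclusion reasons sum to the 41 full-text exclusions and that 41 of 86 rounds to the reported 48%.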
Coding Procedures
Given the research questions guiding this review, we aimed to organize relevant aspects of the included studies in one table (Table 1). To populate this table, we drew some information directly from the articles, while other information could only be extracted through inference. For the latter, coding was guided by a rubric we crafted for this study (Table A, Online Supplementary Material). Inter-rater reliability, assessed on 20% of the included studies, was good overall (κ = .75), and disagreements were resolved through discussion.
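The rubric-based coding was checked with an agreement statistic reported as κ. The text does not specify which kappa variant was used. The sketch below assumes Cohen’s kappa for two raters who each assign one nominal code per study, κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each rater’s marginal code frequencies. The codes in the usage lines are hypothetical, not data from this review.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters assigning one nominal code per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the identical code by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: products of the raters' marginal counts, summed over codes.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[code] * counts_b[code] for code in counts_a) / n**2
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical definition codes (EX, IC, IM) for eight studies.
rater_a = ["EX", "EX", "IC", "IM", "EX", "IM", "EX", "IC"]
rater_b = ["EX", "IC", "IC", "IM", "EX", "IM", "EX", "EX"]
print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.6
```

Here the raters agree on six of eight studies (p_o = .75), but chance agreement is .375, so κ is only .60. This correction for chance is why κ is typically lower than raw percentage agreement.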
Table 1. Selected Characteristics of Reviewed Intervention Studies
Note. ¹Symbols listed indicate presence or absence of statistical differences (+ = positive, − = negative, 0 = no differences) as reported by authors. F = female; N/A = not available; EX = explicit; IM = implicit (inferred from the measure); NS = not significant; S = significant. ²Alignment between concept and operationalization was coded as aligned (A) or as confounding critical thinking performance and disposition (PD), demonstrated and perceived performance (PP), or domain generality and specificity of definition and measure (GS). PA = particular; PR = programmatic; EM = embedded; EV = evident; E = experimental; QE = quasi-experimental; L = longitudinal. For studies with multiple measures, the measures and their codes are listed below each other. Occasionally, a measure received multiple codes because multiple misalignments were found (e.g., Rababa & Masha’al, 2020).
First, we recorded publication information, such as the names of the authors, the year of publication, and the country where the study was conducted. Second, we documented information related to the participants, such as sample size, percentage of female students, and academic major, when reported. When these data were not provided, we recorded the code N/A. We coded the academic domains within which the interventions took place into the self-evident categories of “biology,” “business,” “computer science,” “engineering,” “nursing,” “physics,” and “psychology/education.” A few studies concerned “English” communication, which we coded as such. We also used the code “humanities” for domains concerned with researching human experiences (Wierzbicka, 2011) and “other” for the remaining contexts. We further sought to code other participant characteristics, such as prior knowledge or motivation, but could not do so because the articles lacked the relevant information.
Third, using the procedure employed by Murphy and Alexander (2000) and Dinsmore et al. (2008), we recorded whether authors forwarded explicit (EX) or implicit (I) definitions of critical thinking (see Table A). Definitions were considered explicit if the authors clearly stated a guiding concept or directly quoted a definition from earlier works. For example, Jones et al. (2023) stated that “critical thinking is defined as ‘a reasonable, reflective thinking that is focused on deciding what to believe and do’ (Ennis, 1987, p. 81).” When no explicit definition was stated, we distinguished two types of implicit definitions. For a conceptual definition (IC), we gleaned the authors’ intended definition from language used within the theoretical framing, for example, when authors referred to critical thinking as “reflective.” Alternatively, we used the code IM when the authors’ overarching conceptualization of critical thinking could be deduced only from the measures they used (e.g., CCTST; P. A. Facione, 1990a). Salient examples of each category are discussed in the Results and Discussion section.
Fourth, we logged details pertaining to the measurement of critical thinking and whether those measures were standardized or normed (S) or nonstandardized (NS). For standardized measures, we noted the name of the measure. For nonstandardized measures, we indicated whether researchers analyzed an essay or other written assignment (W), possibly using a rubric, or whether they employed a test (T) or a survey or questionnaire (Q). For studies that used multiple measures, multiple codes were logged. Next, we coded the alignment (A) or misalignment between the provided definition of critical thinking and the measure used. Here, we distinguished three types of misalignment: (a) capturing perceived rather than demonstrated performance (PP), as when researchers used questionnaires asking students to rate their critical thinking rather than demonstrate it; (b) confounding critical thinking performance and disposition (PD), for example, defining critical thinking in terms of mental processes but electing to measure students’ motivation to think critically; and (c) mismatching domain generality and specificity between concept and measure (GS), as when a researcher conceived of critical thinking as domain-specific but measured it in a domain-general fashion.
Fifth, we logged and coded the characteristics of the intervention, noting its dosage, approach, and focus. For dosage, we noted information pertaining to the duration (span of time) and intensity (e.g., number of classes) of the intervention as reported in the studies. For approach, we simplified Ennis’s (1989) taxonomic framework and distinguished interventions that were more evident (EV) from those that were more embedded (EM). That is, evident interventions entailed explanation, demonstration, or practice of how to think critically, whereas embedded interventions infused critical thinking into the curriculum without overtly signaling or demonstrating that ability to students. For focus, we coded whether interventions were particular (PA) or programmatic (PR), that is, whether the researchers implemented specific techniques or processes (e.g., concept maps) or rather an entire program (e.g., problem-based learning [PBL]) or a larger set of elements (e.g., group discussions, student-led presentations, and knowledge maps).
Sixth, we charted information related to the research design employed in studies. We categorized designs based on the authors’ reported designs, and we corroborated this by consulting the method sections. To determine whether a study was experimental (E), quasi-experimental (QE), longitudinal (L), or mixed-methods (M), we combined information reported on the number of groups (e.g., single group), the assignment to groups (i.e., random or nonrandom), if any, and the measurement moments of the critical thinking dependent variable measure (e.g., pretest-posttest).
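The design coding above combines three pieces of information: number of groups, assignment to groups, and measurement moments. The Python sketch below makes one plausible version of that decision rule explicit, under stated assumptions. It codes a study as experimental only when two or more groups were randomly assigned, codes every other reported design as quasi-experimental, and returns “undetermined” when needed details are missing. The field names and the rule itself are hypothetical. The longitudinal (L) and mixed-methods (M) codes are not modeled. Edge cases, such as a study whose measurement timing undermined random assignment, still required reading the method section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DesignInfo:
    """Hypothetical record of design details extracted from a method section."""
    n_groups: Optional[int]            # None if the article does not report it
    random_assignment: Optional[bool]  # None if assignment is not reported
    has_pretest: bool
    has_posttest: bool


def classify_design(info: DesignInfo) -> str:
    """Assumed rule: random assignment to 2+ groups -> 'E'; other reported
    designs -> 'QE'; missing details -> 'undetermined'."""
    if info.n_groups is None:
        return "undetermined"
    if info.n_groups >= 2:
        if info.random_assignment is None:
            return "undetermined"
        return "E" if info.random_assignment else "QE"
    return "QE"  # a single group cannot be randomly assigned to conditions


def qe_variant(info: DesignInfo) -> str:
    """Describe comparison-group and measurement-moment features of a design."""
    groups = "comparison group" if (info.n_groups or 0) >= 2 else "single group"
    if info.has_pretest and info.has_posttest:
        timing = "pretest-posttest"
    elif info.has_posttest:
        timing = "posttest-only"
    else:
        timing = "other timing"
    return f"{groups}, {timing}"


# Hypothetical studies, not coded data from this review.
print(classify_design(DesignInfo(2, True, True, True)))    # E
print(classify_design(DesignInfo(2, False, False, True)))  # QE
print(qe_variant(DesignInfo(1, None, False, True)))        # single group, posttest-only
```

Keeping “undetermined” as an explicit outcome mirrors the situation in which articles did not report enough detail to assign a design code.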
Last, we coded the key outcomes that the studies presented—that is, the outcomes of the intervention as assessed by a critical thinking measure. Although our primary interest was the effect of the intervention manifest as a difference between treatment and comparison groups, not all studies had a comparison group. Moreover, the vast majority of included studies did not report effect sizes or statistical evidence that could be used to compute them. Instead, we took the authors’ reporting of outcomes in the results or discussion sections at face value and coded them as positive (+), neutral (0), or negative (-). For example, O’Flaherty and Costabile (2020) did not conduct any significance tests but deduced the positive effects of their intervention by descriptively comparing percentages of critical thinking behaviors. Further, we indicated the reported outcomes for every measure identified in a study. For example, Muehlenkamp et al. (2015) employed three distinguishable measures of critical thinking, and three reported outcomes were listed in Table 1.
Results and Discussion
Descriptive Trends
Before critically analyzing the retained intervention studies to answer our research questions, we first provide an overview of several trends that emerged in the selected characteristics of included studies charted in Table 1.
Research Publications by Year
One pattern of interest that emerged in this review concerned the frequency of published intervention studies. Specifically, in Figure 2, we tabulated the number of critical thinking intervention studies published between 2010 and 2022 in three-year intervals. The resulting graphic reveals a general, albeit nonlinear, upward trend in critical thinking intervention studies. Thus, while critical thinking in higher education has been a focus within the educational and psychological literature for more than 80 years, attempts to improve this ability appear to be on the rise.
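Tabulating studies in three-year intervals amounts to a simple binning rule. The Python sketch below assumes intervals anchored at 2010 (2010–2012, 2013–2015, and so on). The exact interval boundaries used for Figure 2 are not stated, so this anchoring is an assumption, and the publication years in the usage line are hypothetical. Under this anchoring, 2022 opens a new interval that the review window covers only partially, which is worth keeping in mind when reading the final bar of such a chart.

```python
from collections import Counter
from typing import Dict, Iterable

START_YEAR = 2010  # assumed first year of the review window
BIN_WIDTH = 3      # years per interval


def period_label(year: int, start: int = START_YEAR, width: int = BIN_WIDTH) -> str:
    """Map a publication year to its interval, e.g. 2014 -> '2013-2015'."""
    if year < start:
        raise ValueError(f"{year} precedes the review window starting in {start}")
    first = start + width * ((year - start) // width)
    return f"{first}-{first + width - 1}"


def studies_per_period(years: Iterable[int]) -> Dict[str, int]:
    """Count studies per interval, ordered chronologically."""
    counts = Counter(period_label(y) for y in years)
    return dict(sorted(counts.items()))


# Hypothetical publication years, not the reviewed studies.
print(studies_per_period([2010, 2012, 2014, 2016, 2016, 2020, 2021, 2022]))
# {'2010-2012': 2, '2013-2015': 1, '2016-2018': 2, '2019-2021': 2, '2022-2024': 1}
```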
Figure 2. Intervention studies per three-year period since 2010.
Interventions in Diverse Countries
We also found that critical thinking interventions took place in diverse countries. In fact, the majority of reviewed studies were conducted outside Europe, the United States, and Australia, and Asia-Pacific countries (e.g., Taiwan, China, Indonesia) were well represented (n = 18, 40%). A smaller number of studies took place in the Middle East (e.g., Jordan, Turkey, Iran; n = 10, 22%) and sub-Saharan Africa (i.e., Ethiopia; n = 2, 4%), regions whose populations are often underrepresented in psychological research (Henrich et al., 2010).
Research Across Educational Domains
Further, these interventions were carried out in 12 different domains, ranging from engineering to gender studies. Domains such as biology, computer science, and engineering were underrepresented, and other STEM domains, such as chemistry or mathematics, were absent from this review. Perhaps these STEM domains, which often deal with well-structured problems, were underrepresented because knowledge claims in them tend to be objectively verifiable, unlike in the STEM-adjacent domain of nursing, where decision-making is probabilistic and based on best evidence. The same contrast may explain why nursing dominated the literature we reviewed (n = 18, 40%; Table 2), although this dominance may also be attributable to seminal research by N. C. Facione et al. (1994) that was situated in that domain.
Table 2. Studies Conducted by Educational Domain
Student Characteristics
In the studies we reviewed, varying levels of information on the participants were reported. Although the majority of studies described participants’ gender, a substantial minority (n = 14, 31%) failed to include even that basic information. For those studies that did provide a gender breakdown, the majority of participants were female (~73%; Table 1). This pattern may be related to the frequency of nursing and psychology classes in which the critical thinking interventions were carried out. A very similar pattern occurred for the reporting of participants’ ages: age was documented in 58% of the studies, while the remaining 42% made no mention of it. Although we can presume from our inclusion criteria that most participants were of typical college age (approximately 18 to 22), one study included older adults enrolled in education on a part-time basis (Mage = 36.7; Marquès Puig et al., 2022). Regrettably, participants’ race or ethnicity was rarely recorded. Generally, scant details were provided regarding the students participating in the interventions.
Research Designs Employed
While we set out to examine both experimental and quasi-experimental intervention studies, we found that the vast majority of included studies were quasi-experimental (QE; n = 39, 87%). In comparison, only a few studies were experimental (E; n = 4, 8%). This imbalance may reflect the fact that studies were carried out in educational settings, which complicates random assignment to treatment conditions. Eight QE interventions lacked a comparison group (18%; e.g., T. Liu et al., 2021), one QE lacked a pretest but had a comparison group (Muehlenkamp et al., 2015), and another used a single-group posttest-only design (Marquès Puig et al., 2022). We were unable to determine the research design for two studies: Chen (2021) did not explicitly mention or report pertinent details, while Bilik et al. (2020) timed the measurement between randomly assigned groups such that the study effectively became an unspecified QE.
RQ1: Defining Critical Thinking
From prior reviews, we recognized that there was often little consensus on the meaning of core constructs, such as engagement, motivation, or self-regulated learning (Dinsmore et al., 2008; Murphy & Alexander, 2000). For that reason, we coded whether authors forwarded an explicit or implicit conceptualization of critical thinking with the expectation that explicit and consistent definitions of critical thinking would be rare. We were surprised to find that most studies we reviewed offered an explicit definition of critical thinking (n = 27, 60%). For one, O’Flaherty and Costabile (2020, p. 1) explicitly defined critical thinking as “the development and evaluation of arguments.” Also, Yu et al. (2013, p. 574) stated that critical thinking “consists of the clarification, simplification, organization, and rationalization of ideas.” Notably, studies that expressly defined critical thinking often framed this construct as a set of mental processes, like López et al. (2020, p. 1), who stated that critical thinking entails making “useful and self-regulatory judgments through analysis, interpretation, evaluation, and inferential reasoning.”
Central to most of the explicit definitions were characterizations of critical thinking as involving analysis, evaluation, and, to a lesser extent, reflection. Further, many explicit definitions underlined the purposeful or intentional nature of critical thinking, highlighting the need for interventions to solicit such nonspontaneous thinking. Thus, in many ways, definitions of critical thinking in the reviewed studies echoed the American Philosophical Association consensus definition established by P. A. Facione (1990a). Definitional consistency between studies is desirable but often absent in social science research (Gonzalez et al., 2021). Nonetheless, the use of similar terms does not per se indicate conceptual agreement and, when used in applied research, may even create an illusion of clarity and agreement (i.e., jingle-jangle fallacies; see Kelley, 1927). Moreover, the definitions in reviewed studies often overlapped only in part, touching on one or a few of the facets of critical thinking specified in the Delphi report.
For those studies that provided no explicit definition of critical thinking (n = 18, 40%), we attempted to extract the authors’ covert definitions from the theoretical framing. In two articles, words or phrases in the theoretical overview alluded to the authors’ implicit definition. For example, Marquès Puig et al. (2022, p. 214) hinted at critical thinking as “self-regulated . . . strategies” of “applying prior knowledge to new situations” and the “analysis and evaluation of information” in a thoughtful manner. The remaining 10 studies did not offer sufficient textual cues for us to infer their guiding conception. In those cases, only the measure hinted at an implicit definition.
For example, McLean and Miller’s (2010) course assignments suggested that critical thinking was the ability to find logical flaws in psychology research reports and to evaluate passages with problems, statements, inferences, and assumptions through the interpretation and evaluation of arguments. Similarly, Elçiçek’s (2022) focus on computational thinking skills and problem-solving ability using IT sheds some light on the concept of critical thinking guiding that investigation. We also determined that a substantial portion of the definitions were variations of the conceptions forwarded in the writings of leading voices in the critical thinking movement, such as P. A. Facione (1990a, 1990b) and Ennis (1964). What was not apparent was whether these definitions translated into measures that reflected those conceptions. Therefore, we next investigated the congruency between conceptions and operationalizations of critical thinking.
RQ2: Measuring Critical Thinking
Across the studies we reviewed, 55 measures were used to assess critical thinking (see Table 1). In our analysis, we distinguished between standardized and nonstandardized measures. We coded a measure as standardized (n = 15, 27%) when its construct validity was psychometrically established and its scores were norm-referenced (R. Glaser, 1963). We labeled measures without normed properties as nonstandardized (n = 40, 73%); most of these were researcher-developed instruments that specifically matched the aims of the intervention. Overall, we found little variation among the standardized measures used. Specifically, the CCTST was administered three times (P. A. Facione & Facione, 1994b), while the related measure for critical thinking dispositions, the CCTDI (P. A. Facione & Facione, 1992), was given seven times, including three uses of its Chinese translation (CCTDI-CV; Peng et al., 2004). Carter and Welch (2016) administered the domain-specific Health Sciences Reasoning Test (HSRT; P. A. Facione et al., 2010).
The aforementioned reliance on standardized measures in this literature seems promising. However, there is reason to question whether the results yielded by those standardized measures were interpreted in valid ways (Messick, 1986). For example, favorable content and criterion validity were reported for the CCTT Level X (Ennis et al., 1985), although its five-factor structure of induction, deduction, observation, credibility, and assumptions could not be reproduced (Leach et al., 2020). For the CCTST, psychometric data were scarce. In fact, most studies refer back to a single validation study published by the developer of the measure (P. A. Facione, 1990b). One validation study found that the CCTST “had neither sufficient psychometric properties to assess individual abilities nor sufficient stability reliability” (Bondy et al., 2001, p. 309). Thus, simply using a standardized measure of critical thinking does not ensure construct validity.
The majority of the measures administered in the studies we reviewed were nonstandardized, and surveys and questionnaires were the most often used type (n = 23). Some of these surveys were crafted by the researchers, while others were adapted or adopted from earlier works. For example, Chiu et al. (2023) developed a 10-item Likert-scale questionnaire requiring students to self-report their critical thinking during art design. Akins et al. (2019) adopted the University of Florida Critical Thinking Inventory developed by Lamm and Irani (2011) to assess students’ critical thinking “style.”
A smaller number of studies (n = 7, 16%) based their conclusions about critical thinking on written reflections or essays that students wrote as part of the intervention. For instance, Orique and McCarthy (2015) had nursing students submit four written analyses of cases and scored those analyses using P. A. Facione and Facione’s (1994a) Holistic Critical Thinking Scoring Rubric. Two other studies (Muhlisin et al., 2016; Wale & Bishaw, 2020) used a variation on that rubric to measure students’ critical thinking. O’Flaherty and Costabile (2020) inferred growth in the ability to “apply standards of accuracy” and “transform knowledge” from students’ grade improvement after resubmitting a written patient case analysis. However, the authors provided no scoring rubric, leaving it unclear how grades reflected critical thought. Particularly in light of the lack of agreement among researchers and educators on what constitutes critical thinking, the absence of an explicit scoring system for that mode of thinking is troublesome.
Five studies (11%) used an existing, nonstandardized test to capture critical thought. For example, McLean and Miller (2010) used the Assessment of Thinking Skills (Wesp & Montgomery, 1998) to assess students’ ability to identify and describe logical flaws in psychology research reports. Lee et al. (2013) used the multiple-choice Critical Thinking Scale by Cheng et al. (1996) to gauge students’ ability to make inferences, recognize assumptions, deduce, interpret, and evaluate arguments. Further, we found that most studies in this review relied exclusively on self-report to assess critical thinking (n = 24, 53%). This outcome seems attributable to a focus in this literature on critical thinking dispositions (P. A. Facione, 1990a; P. A. Facione & Facione, 1992), that is, participants’ tendency to engage in critical thinking. As with many motivational or attributional constructs (Fulmer & Frijters, 2009), dispositions are routinely measured by means of self-reports. Many self-report measures of affective traits or states have dubious construct validity, and few studies in our review reported on the psychometric properties of their self-report measures. Fortunately, some studies (n = 9, 20%) used more than one measure of critical thinking, which allows researchers to corroborate findings. For example, Naber and Wyatt (2014) used both the CCTST and the CCTDI to assess the effect of their reflective writing intervention in nursing.
Conceptualization–Operationalization Misalignment
As noted, operationalizing complex constructs like critical thinking is a challenging undertaking. For instance, disconnections can occur between the targeted constructs and measures that researchers choose or develop, resulting in conceptualization-operationalization misalignment. We could not evaluate the conceptual-operational alignment for 11 studies (24%) because sufficient details on the measures or an explicit definition of critical thinking were lacking. The forms of misalignment we discuss pertain to domain generality versus domain specificity, performance versus dispositions, and perceived versus demonstrated performance.
Domain-Generality Versus Domain-Specificity
To evaluate the concept-measure alignment, we determined whether the definition forwarded in a study was more domain-general or domain-specific and compared this to the nature of the measure used. Given the disagreement on the generality or specificity of critical thinking, we expected that most studies would conceptualize critical thinking as a generic ability. Importantly, we found that five studies (11%) employed domain-general measures of critical thinking, especially standardized ones, even though their guiding definitions and interventions were domain-specific. For example, Rababa and Masha’al (2020) defined critical thinking as including “assessment and nursing diagnosis, planning, nursing interventions, and evaluation” (p. 1), but they administered the generic Critical Thinking Self-Assessment Scale (Nair et al., 2017).
The work of two groups of researchers (Bellaera et al., 2016; Tiruneh et al., 2016) deserves mention for their express focus on the domain-generality and specificity of critical thinking. These groups used both a general and a specific measure to assess the impact of their interventions, hypothesizing that fostering critical thinking within a domain was more likely to be successful than expecting transfer to a domain-general measure. Initially, these two studies were considered misaligned because their theoretical introductions provided only an explicit domain-general definition of critical thinking, but no domain-specific definition. Yet, upon closer scrutiny, we found that they provided a description of critical thinking in their respective domains that could be considered explicit. Specifically, Bellaera et al. (2016) used a researcher-developed domain-specific measure to tap into students’ “recognition of inferences, assumptions, interpretations, and evaluation of arguments” while evaluating claims rooted in sociohistorical theory (p. 269). They paired that measure with the domain-general WGCTA (Watson & Glaser, 1980). In turn, Tiruneh et al. (2016) used the domain-general HCTA (Halpern, 2010) together with the specific Critical Thinking in Electricity and Magnetism test (CTEM; De Cock et al., 2015) that was designed to test students’ “ability to draw valid inferences, analyze arguments, solve problems, make predictions, and analyze probabilities and assumptions with respect to thinking tasks that are specific to a freshman physics course” (p. 486). In keeping with their hypotheses, both interventions resulted in greater improvement for the domain-specific measures of critical thinking than for the domain-general measures. Both the commendable multimeasure approach and their express testing of hypotheses regarding the transfer of critical thinking meaningfully added to the extant body of intervention research.
Critical Thinking Performance Versus Dispositions
Given the frequent use of disposition-like measures in the reviewed studies, we anticipated a further disconnect between critical thinking performance and dispositions. Worryingly, we found that 14 studies (31%) used critical thinking disposition or motivation measures to assess interventions that targeted students’ ability to think critically. For example, Akins et al. (2019) invited agriculture students to engage in a series of case studies on agricultural issues to promote their critical thinking capacity. However, the researchers administered the University of Florida Critical Thinking Inventory (Lamm & Irani, 2011) to assess students’ “critical thinking style” (p. 97). These researchers also frequently confounded style and ability in their discussion section. Similarly, Ma and Zhou (2022) set out to improve students’ critical thinking in nursing using a case-based curriculum but used the CCTDI (P. A. Facione & Facione, 1992) to assess students’ dispositions. Nevertheless, in their discussion section, these authors claimed that their intervention improved students’ ability to think critically.
Although there is evidence to suggest that critical thinking dispositions and ability are correlated (Colucciello, 1997), these two constructs are theoretically distinct (Alexander, 2023b; P. A. Facione, 1990a; Murphy, Ogata & Schoute, 2023), empirically distinguishable (Taube, 1997), and not simply interchangeable. Substituting a dispositional measure for a performance or ability measure precludes researchers from making causal claims regarding the success of interventions targeting the ability to think critically.
Reliance on Perceptions Versus Performance
Equally disconcerting was the finding that several studies (n = 6, 13%) relied solely on students’ perceptions of the effectiveness of interventions. Despite arguments to the contrary (Bowen, 1977; Pascarella & Terenzini, 1991), there is ample reason to believe that students are poorly calibrated, that is, unable to accurately report their performance on cognitively demanding tasks (Dunning et al., 2003), such as those involving critical thinking. Thus, such self-reports are at best distal proxies of students’ actual ability to think critically. Moreover, self-reported abilities are susceptible to social desirability effects (Bråten, 2016), rendering them problematic indicators of improved ability. For example, Rababa and Masha’al (2020) cited the questionable psychometric properties of most available tests as grounds for opting instead to use the self-report Critical Thinking Self-Assessment Scale (CTSAS; Nair et al., 2017). Muehlenkamp et al. (2015) employed an even more distal proxy of performance by asking students to report the extent to which they perceived that the course context required them to think critically. Relying on measures of perceived ability rather than on performance indicators produces serious disconnects between the type of thinking targeted and the construct actually measured.
RQ3: Characteristics of Critical Thinking Interventions
Central to this review was the mapping of features of interventions associated with improvements in students’ critical thinking. We expected that focus, approach, duration, and intensity would bear directly on the effectiveness of those interventions.
Intervention Focus
Our approach to specifying focus was to distinguish interventions that were more programmatic or holistic from those that were more particular or specific. We found that the majority of studies (n = 26, 58%) employed a programmatic intervention, whereas 18 studies (40%) targeted a few particular elements linked to critical thought; the remaining study could not be classified. Programmatic interventions typically took the form of an entire course curriculum or program consisting of multiple elements hypothesized to promote students’ critical thinking. For instance, seven studies leveraged a problem-based learning (PBL) approach, often in comparison to lecture-based instruction. Gholami et al. (2016), for example, conducted an intervention in a critical care course for nursing students that used lecture-based instruction for the first half of the semester and PBL for the second half. In the PBL portion, students were expected to self-regulate their learning, participate in group discussions, and use peer evaluation and reflection under the supervision of the course instructor. T. Liu et al. (2021) used PBL in conjunction with case-based and team-based learning, as well as mind mapping. In this intervention, many elements could potentially help students think more critically, although no comparison group was included to isolate their effects. A greater issue within these programmatic intervention studies was the absence of compelling justifications for the specific elements included or of explanations for how those elements were expected to improve critical thinking (Loyens et al., 2023; McMillan, 1987).
Other programmatic studies likewise exposed students to various educational elements. For example, Tiruneh et al. (2016) compared a traditional, teacher-centered introductory course on electricity and magnetism with a more student-centered version that focused on activation, demonstration, application, integration, and problem-centeredness. Muhlisin et al. (2016), in turn, combined reading assignments with collaborative mind-mapping in a biology course and compared this approach against a “conventional” biology class. The authors argued that mind-mapping allowed students to reason, test assumptions, weigh alternatives, and draw conclusions about relations between concepts, although they did not specify how these processes, alone or in combination, would bolster critical thinking. In a further example, Koç et al. (2021) had students in a gender studies course collaborate on analytic questions regarding gender-related issues and engage in ongoing discussion, hypothesizing that those elements would improve participants’ critical thinking compared to nonparticipants.
There was marked diversity among the studies we identified that focused on particular processes or elements (n = 18). For example, Bellaera et al. (2016) compared two political science classes in which students in the treatment condition were prompted with higher-order questions during reading, while students in the comparison condition read without question prompts. Naber and Wyatt (2014) introduced a reflective writing intervention in one of two nursing course sections, in which students were to document observations and experiences, interpret, draw inferences, reflect on preexisting assumptions, and weigh consequences for their patients and themselves. Huang et al. (2022) compared two flipped-classroom business management classes. In the treatment group, researchers introduced business simulation games that required students to collaborate on a business plan and evaluate the ensuing results of the simulated business. The authors reasoned that those games would simulate actual business managers’ decision-making, although they offered no strong justification for this assumption or for why such simulation would improve critical thought. Lee et al. (2013) implemented a concept map treatment to supplement the traditional lectures and case study assignments used in the control class, hypothesizing that crafting concept maps would require students to examine their existing knowledge and think in more critical and complex ways. D. Liu and Zhang (2022) and Wang and Liao (2014) similarly relied on concept maps in their studies. The differing foci across the studies in our pool reflect the great variety of interventions that researchers design or elect to use, although, again, the proposed causal mechanisms were seldom sufficiently justified. Importantly, in our analysis of intervention outcomes (RQ4), we consider whether the heterogeneity of the reviewed studies can explain variation in the reported effects.
Intervention Approach
A second relevant dimension of the intervention characteristics is the intervention approach, for which we distinguished interventions that appeared to make explicit or evident to students what critical thinking looks like from interventions in which critical thinking was embedded. Two studies (4%) could not be categorized on this dimension. In several studies (n = 11, 24%), researchers used modeling or demonstrations to show students how to think critically. For example, Orhan (2023) incorporated into an existing course online, flipped, and in-person modules that explicitly taught critical thinking skills. Through reflections, discussions, group work, and quizzes, students expressly practiced thinking critically. Similarly, McLean and Miller (2010) directly taught psychology students to think critically about potentially false scientific claims by discussing and practicing principles of psychological science.
For the majority of the 45 studies in this review (n = 32, 71%), no explicit demonstration or communication of critical thinking occurred. Rather, students participated in an intervention but were left to infer what that experience had to do with critical thinking. For example, Ma and Zhou (2022) compared two nursing classes that employed case studies. In the control condition, traditional intact case studies were used, whereas in the treatment condition, the cases were unfolded in parts to deepen reflection, analysis, and evaluation of case elements. These researchers did not explain whether or how this unfolding-case approach was explicitly linked to critical thinking for the participants. Similarly, D. Liu and Zhang (2022) compared a traditional computer science database course with a treatment class in which students communicated via an app and were aided by the instructor in answering and posing questions. In no instance, however, were the elements of the intervention expressly tied to improved critical thinking.
Duration and Intensity
We documented considerable variability in the duration of interventions (Table 1). The briefest interventions were only one or two sessions long (O’Flaherty & Costabile, 2020; Orique & McCarthy, 2015), whereas the longest lasted three academic years (Elçiçek, 2022). Yet, beyond general statements about the number of weeks their interventions lasted, researchers were far less forthcoming about how many hours per week those interventions entailed or what precisely occurred during those time spans (e.g., Huang et al., 2022). Amid this variability, we found that 19 studies (42%) instituted interventions that spanned almost an entire semester (i.e., 10–16 weeks). For example, the intervention by Koç et al. (2021) consisted of biweekly class sessions over 14 weeks, while Snyder and Wiles’s (2015) intervention consisted of 115-minute weekly sessions over 13 weeks.
Although the effects of intervention duration and intensity were hard to quantify, the span and number of sessions in these studies suggest that ample time was typically taken to intervene. Further similarity can be found among the eight studies (18%) that spanned about half of a typical 16-week semester, lasting 7 to 8 weeks. The relative brevity of some interventions seemed to reflect practical constraints, such as course demands and the time allotted by instructors, given that these interventions were embedded in specific courses.
RQ4: Outcomes of Critical Thinking Interventions
The final question guiding this review addressed our overarching goal of ascertaining the characteristics of interventions in higher education associated with significant improvements in participants’ critical thinking, which required integrating data across articles. Our first step in this synthesis was to record researchers’ conclusions about the effectiveness of their interventions uncritically; that is, we first accepted researchers’ claims of intervention effectiveness at face value. Next, we critically analyzed those reported findings based on the nature and quality of the measures implemented, statistical factors relevant to the reported results, and the overall study designs that would permit or preclude conclusions of effectiveness.
Researchers’ Reported Outcomes
At first glance, the findings from this systematic review seem to corroborate the positive effects of critical thinking interventions reported in earlier reviews (Abrami et al., 2008, 2015; Tiruneh et al., 2014). Indeed, most studies in our review (n = 37, 82%) reported at least one positive effect. Further, positive outcomes were found for the majority of measures used (n = 41, 78%; Table 1). For example, T. Liu et al. (2021) reported significant increases in nursing students’ critical thinking disposition from pretest to posttest within their single-group study. Similarly, Orhan (2023) reported that explicit critical thinking instruction was more fruitful than regular university education. In sharp contrast, only a few studies (n = 8, 18%) reported no differences in critical thinking. For instance, Snyder and Wiles (2015) reported no significant improvements in demonstrated critical thinking performance among students in a peer-leader treatment group compared to a comparison group. Just one study (3%; Carter & Welch, 2016) reported negative outcomes from an unfolding case-based intervention in nursing. However, it is imperative to look beyond these general findings to understand what these outcomes actually demonstrate.
Reported Outcomes vis-à-vis Intervention Characteristics
Upon closer scrutiny, it becomes evident that some of the effects reported by authors cannot be taken at face value. For one, we applied the code “positive” liberally when summarizing the findings of the reviewed studies. That is, we coded findings as “positive” whenever the authors documented any increase in critical thinking as positive, even if this improvement occurred in both treatment and comparison groups or in only one of several components of a critical thinking measure. In what follows, we therefore describe several reasons why caution is warranted in judging these interventions effective.
Nature of Measures Used
One factor that must be considered in contextualizing the overwhelmingly positive intervention effects is the type of measures used to gauge improvement. We anticipated that the earlier categorization of measures as standardized or nonstandardized would be helpful. Yet, no clear differences between these general categories emerged. When we examined other features of the measures (see Table B, Online Supplementary Material), however, relevant patterns emerged. For instance, when researchers administered a test requiring participants to demonstrate critical thinking, a positive intervention effect was reported 65% of the time (11 of 17). In contrast, when data were based on participants’ perceptions of their critical thinking ability or dispositions following the intervention, the rate of positive outcomes rose to 80% (24 of 30). Given the shortcomings of self-report measures (Dunning et al., 2003), claims regarding the effectiveness of critical thinking interventions must be interpreted with caution.
A further consideration regarding the operationalization and measurement of critical thinking is whether the measure used is domain-general or domain-specific. Because domain-general measures are divorced from the educational context in which critical thinking is promoted, gains on such measures require some degree of transfer, although, as discussed, the debate surrounding transfer remains inconclusive. Yet, we found that the success ratios for domain-general (74%; Table C, Online Supplementary Material) and domain-specific (84%) measures were comparable, suggesting that measures whose content is close to the educational context do not yield markedly different outcomes from more distal measures that are relatively unrelated to the intervention context. Importantly, many self-reports of critical thinking ability or disposition are counted among both kinds of measures. Even when focusing only on tests of critical thinking, however, the success ratios for domain-general (64%) and domain-specific (67%) measures remain similar (Table D, Online Supplementary Material).
Of course, this comparison across studies does not support strong conclusions regarding the effect of a measure’s generality on the assessed results. Fortunately, the two studies that incorporated both kinds of measures to test the generality of critical thinking offer some insight. The previously discussed study by Tiruneh et al. (2016) merits mention because it reported differential gains on a general and a specific measure: physics students in the treatment condition gained more than the comparison group on the domain-specific but not on the domain-general measure. This finding is in line with theory (McPeck, 1981) and empirical findings (Murphy, Ogata, & Schoute, 2023) suggesting that students may perform significantly better on domain-specific measures of critical thinking because the relevant skills or abilities do not have to transfer. However, our review can neither confirm nor disconfirm the assumption that critical thinking transfers across contexts.
Statistical Considerations
In scrutinizing the study results, we also identified issues related to the appropriateness of the conclusions that researchers reached based on their data and analytical approach. For example, participants in M. López et al.’s (2020) study reported their perceived critical thinking performance following an intervention, but the researchers applied no statistical test to those perceptions. Rather, pretest and posttest scores were presented descriptively, and increases in critical thought were inferred from those descriptive data. Gholami et al. (2016) administered the eight subscales of the CCTST to assess the effectiveness of their intervention. However, they analyzed the resulting data with eight separate t-tests without adjusting for multiple comparisons, thereby inflating the probability of obtaining at least one spurious significant effect (Benjamini & Hochberg, 1995). Moreover, even setting aside the problem of familywise error, these researchers concluded that their intervention was effective when only two of the eight subscales showed a significant change from pretest to posttest. This raises the question of how many subscales on a measure need to demonstrate a positive outcome for an intervention to be deemed effective and for students to have demonstrated greater critical thinking.
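To make the familywise-error problem concrete, the sketch below computes the probability of at least one false positive across m tests at α = .05 under the simplifying assumption that the tests are independent (subscales of one instrument are correlated, so the true figure will differ), and applies the Benjamini–Hochberg (1995) step-up procedure cited above. The eight p-values are hypothetical, chosen only so that two fall below .05; they are not Gholami et al.’s (2016) results, and the function names are our own.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one false positive) across m tests, ASSUMING independent tests."""
    return 1 - (1 - alpha) ** m


def benjamini_hochberg(p_values: list[float], q: float = 0.05) -> list[bool]:
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate at q.

    Returns one boolean per input p-value: True if that null hypothesis is rejected.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k (1-based) with p_(k) <= (k / m) * q; 0 means no rejections.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis whose rank is at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Eight unadjusted tests at alpha = .05 (independence assumed):
# roughly a 1-in-3 chance of at least one spurious "significant" subscale.
print(round(familywise_error_rate(0.05, 8), 3))  # 0.337

# Hypothetical subscale p-values (not from any reviewed study); two are below .05 unadjusted.
hypothetical_p = [0.03, 0.04, 0.20, 0.45, 0.12, 0.60, 0.08, 0.33]
print(benjamini_hochberg(hypothetical_p))  # all False: neither survives adjustment
```

With these illustrative values, two nominally significant subscales do not survive adjustment. Note that Benjamini–Hochberg controls the false discovery rate rather than the familywise error rate; Bonferroni or Holm corrections are the stricter options when familywise control is the goal.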
Research Designs Employed
To our surprise, few of the studies we analyzed discussed how their research design may have limited the internal validity of their inferences. Specifically, researchers conducting intervention studies cannot attribute improvements in students’ critical thinking to the educational experience they introduce, above and beyond more conventional experiences, unless certain conditions are met. To draw valid causal conclusions, researchers must therefore ensure that alternative explanations for their findings are accounted for (Steiner et al., 2023).
As a case in point, quasi-experimental studies dominated this review (n = 39, 87%; Table E, Online Supplementary Material), of which 29 employed a nonequivalent comparison group design. We therefore sought to determine whether researchers expressly reported on the equivalence of the groups or statistically adjusted for any nonequivalence. We found that 18 of those 29 studies (62%) discussed the equivalence of groups prior to the intervention, most often in terms of student characteristics, especially critical thinking pretest scores. In 6 of those 18 studies (33%), researchers used statistical procedures to adjust posttest performance for the group differences they identified. Interestingly, in 6 additional studies of the 29 nonequivalent group designs (21%), researchers performed statistical adjustments, such as analysis of covariance, without any evidence that groups differed prior to the intervention. Without equivalence testing on variables such as motivation, prior knowledge, and metacognitive competence, assuming equivalence between nonrandomly assigned groups poses a serious threat to the validity of the causal inference that the intervention alone brought about improvements in critical thinking. On the other hand, statistically adjusting for pretest scores may, in some cases, lead to misguided interpretations (Pedhazur, 1997); such adjustments must be informed by theory and compared with nonadjusted analyses (Steiner et al., 2023).
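The pretest adjustment discussed above can be illustrated with classic analysis-of-covariance logic: the raw posttest difference between groups is corrected by the pooled within-group slope of posttest on pretest, multiplied by the pretest difference. This is a minimal sketch that assumes a linear pretest-posttest relation with equal slopes in both groups; the scores are hypothetical, constructed so that the treatment group starts ahead, and the function names are our own. Reporting both estimates makes the consequences of adjustment visible.

```python
from statistics import fmean


def pooled_within_slope(groups: list[tuple[list[float], list[float]]]) -> float:
    """Pooled within-group OLS slope of posttest on pretest (the ANCOVA slope)."""
    sxy = sxx = 0.0
    for pre, post in groups:
        mx, my = fmean(pre), fmean(post)
        sxy += sum((x - mx) * (y - my) for x, y in zip(pre, post))
        sxx += sum((x - mx) ** 2 for x in pre)
    return sxy / sxx


def posttest_differences(treat, control) -> tuple[float, float]:
    """Return (unadjusted, pretest-adjusted) posttest mean differences, treatment minus control.

    Each argument is a (pretest_scores, posttest_scores) pair for one group.
    ASSUMES equal regression slopes in both groups.
    """
    (pre_t, post_t), (pre_c, post_c) = treat, control
    unadjusted = fmean(post_t) - fmean(post_c)
    b_w = pooled_within_slope([treat, control])
    # Remove the part of the posttest gap that the pretest gap predicts.
    adjusted = unadjusted - b_w * (fmean(pre_t) - fmean(pre_c))
    return unadjusted, adjusted


# Hypothetical nonequivalent groups: the treatment group starts 4 points higher.
treatment = ([12, 14, 16, 18], [15, 17, 19, 21])
comparison = ([8, 10, 12, 14], [10, 12, 14, 16])
print(posttest_differences(treatment, comparison))  # (5.0, 1.0)
```

In this constructed case, most of the raw 5-point advantage reflects the initial gap. With other data, adjustment can itself mislead (for example, when slopes differ across groups or the pretest contains measurement error), which is why adjusted and unadjusted estimates are best reported side by side.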
Worryingly, 10 of the 39 quasi-experimental studies (22%) used even weaker designs: eight lacked a comparison group (18%), one relied solely on a posttest (3%), and one (3%) had neither a comparison group nor a pretest. In these studies, the validity of causal claims is under even greater threat. Any differences or growth in critical thinking they reported cannot unambiguously be ascribed to the intervention, because it is unclear how students’ critical thinking would have developed in its absence. Although quasi-experimental studies have unquestionable potential to evaluate educational interventions (Hodis & Hancock, 2016), ascribing the observed effects to an intervention without advanced statistical modeling and without ruling out alternative explanations for the findings may be inappropriate (Steiner et al., 2023).
In addition to the quasi-experimental studies, there were four experiments (9%) in which participants were randomly assigned to treatment or comparison groups, such that equivalence between conditions could be assumed and the resulting effects could largely be taken at face value. For the remaining two studies (4%), the research design was indeterminable, leaving their findings in doubt.
Substantive Intervention Characteristics
Besides data-analytic considerations, we examined the reported outcomes vis-à-vis the substantive characteristics of the reviewed interventions, such as focus, approach, duration, and intensity. As it pertains to the foci of the interventions, no substantial differences in success rates were identified for particular versus programmatic interventions. Specifically, for the 18 interventions that only leveraged a select number of instructional features (e.g., concept maps or reflection questions; Bellaera et al., 2016) to promote critical thought, 14 (78%) yielded positive effects (Table F, Online Supplementary Material). Similarly, for the 26 studies that immersed students in a programmatic intervention (i.e., using a host of instructional elements; e.g., Gholami et al., 2016), 22 (85%) reported positive effects. Of note is that interventions as dissimilar as Bellaera et al. (2016) and Gholami et al. (2016) yielded similar results. Thus, studies in which students participated in programmatic interventions reported similar success rates to studies that maintained a narrower focus.
Based on the literature on intervention approaches (e.g., Marin & Halpern, 2011), we expected that studies with a more evident or explicit structure would record more instances of success than those opting for a more embedded or implicit structure. Contrary to our expectations, however, studies employing more evident, explicit signaling of critical thinking (e.g., Orhan, 2023) and those relying on embedded, implicit conveyance of critical thinking (e.g., Ma & Zhou, 2022) showed roughly equivalent levels of intervention success (Table G, Online Supplementary Material). More precisely, 9 of the 11 studies (82%) that treated critical thinking skills more transparently produced positive effects, as did 26 of the 32 studies (81%) that took a more implicit or embedded approach to promoting critical thinking.
At the outset of this study, we also planned to analyze the reported effects of interventions in relation to their duration and intensity. While one might hypothesize that longer or more intensive interventions would yield more positive effects, the limited information researchers provided on how long they intervened and how many hours each session encompassed scuttled this planned analysis. Furthermore, the overwhelming preponderance of positive results offered little opportunity to identify which characteristics were most effective among the heterogeneous pool of interventions. In what follows, we present the implications of what we learned from scrutinizing this cross-section of the intervention literature on critical thinking, combining insights from these factors and considerations.
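The success ratios reported in this section are simple proportions, and with categories this small a single study moves them noticeably. The sketch below recomputes them from the counts given in the text (11 of 17 vs. 24 of 30; 14 of 18 vs. 22 of 26; 9 of 11 vs. 26 of 32) and reports the gap in percentage points. It is descriptive only and implies no significance test; the category labels and function names are our own.

```python
# (positive, total) counts as reported in the text for each pair of categories.
COMPARISONS = {
    "performance test vs. perception-based": ((11, 17), (24, 30)),
    "particular vs. programmatic focus": ((14, 18), (22, 26)),
    "explicit vs. embedded approach": ((9, 11), (26, 32)),
}


def success_ratio(positive: int, total: int) -> float:
    """Percentage of cases in a category reported as showing a positive effect."""
    return 100 * positive / total


def summarize(comparisons: dict) -> list[tuple[str, int, int, float]]:
    """Rounded ratios for both categories and their gap in percentage points."""
    rows = []
    for label, (first, second) in comparisons.items():
        r1, r2 = success_ratio(*first), success_ratio(*second)
        rows.append((label, round(r1), round(r2), round(r2 - r1, 1)))
    return rows


for row in summarize(COMPARISONS):
    print(row)

# In the smallest category (n = 11), one study shifts the ratio by about 9 points.
print(round(success_ratio(1, 11), 1))  # 9.1
```

Seen this way, the 1-point gap between explicit and embedded approaches is smaller than the swing a single reclassified study would produce, which supports reading these comparisons as roughly equivalent rather than as evidence for either approach.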
Conclusions
Critical thinking is a form of mental engagement that is highly esteemed for its promotion of human growth and development (Davies, 2015; Dewey, 1913, 1933; Scheffler, 1973), effective decision-making, and contributions to an informed citizenry (McGrew & Chinoy, 2022; Murphy, Ogata, & Schoute, 2023). We undertook this review to examine interventions developed to promote critical thinking among students in higher education. The overarching goal was to depict the lay of the land in terms of what critical thinking interventions entailed and to what extent they proved successful. More specifically, we set out to explore (a) how researchers conceptualized the construct of critical thinking; (b) how they subsequently operationalized that construct; and (c) how researchers’ conceptualization and operationalization aligned. Further, we sought to document (d) the characteristics of the identified interventions and (e) the characteristics of the participants in those interventions. Lastly, we sought to (f) position the reported outcomes in relation to the guiding definitions, intervention features, measures employed, and characteristics of participants. As this review progressed, we found ourselves surprised and somewhat dismayed by the lack of clarity about, explicit articulation of, or justification for the very elements we set out to investigate. Those concerns extended to the core concepts, the measures used or created, the details of the interventions, the demographics of the participants, and the presumed benefits of the interventions. In what follows, we present several salient conclusions that extend from these areas of concern.
Theoretical Transparency
To interpret the goals and results of a critical thinking intervention appropriately, transparency about the constructs targeted in the intervention is essential. Although the explication of core constructs is regarded as foundational to empirical investigations (Williams, 1999), researchers have repeatedly found that vague or tacit definitions dominate psychological inquiry (Dinsmore et al., 2008; McMillan, 1987; Schoute et al., 2022). In contrast to that general trend, a positive attribute of the research we analyzed was that a majority of studies incorporated explicit definitions of critical thinking. Further, definitions proffered by key scholars during the critical thinking movement dominated (Ennis, 1962; P. A. Facione, 1990a).
Despite this positive characterization, there are two conclusions we want to highlight about the orientation toward critical thinking captured in this review. The first, to which we alluded previously, is that the explication of critical thinking that dominated in this review is process-rich but ontologically weak. In other words, researchers conducting these interventions focused almost exclusively on various manifestations of critical thinking, such as interpretation, analysis, and evaluation, and rarely on the core nature of such thinking that would give rise to those manifestations. By analogy, this is akin to confusing symptoms with the disease or ailment that those symptoms may indicate. This issue is of particular relevance to both the conceptualization and the operationalization of critical thinking. Because critical thinking is not directly observable, it is vital to define and validate the relation between observable cognitive processes and the underlying construct of critical thinking. We amplify this point later when we consider the implications of this finding.
The second issue we want to raise pertains to the conceptual or definitional coherence of the definitions of critical thinking documented in this review. On the one hand, the strong reliance on the definition of critical thinking articulated in the Delphi Report may be seen as a strength of the ongoing work in this domain. In effect, this level of coherence for what many researchers described as a complex phenomenon projected an air of conceptual authority, seemingly erasing any doubt about what it means to think critically (P. A. Facione, 1990a). On the other hand, such a consolidated perspective on a construct as expansive and elusive as critical thinking may have unintentionally closed the door to deep and essential discussions of its core nature (Alexander, 2016; Greene et al., 2016; Murphy, Ogata, & Schoute, 2023). In essence, that core nature would reflect what Dewey (1910, p. 6) described as “active, persistent, and careful consideration of any belief or supposed form of knowledge in the light of the grounds that support it, and the further conclusions to which it tends.”
Despite the rich literature that emerged during and after the critical thinking movement (e.g., Ennis, 1962, 1964; P. A. Facione, 1990a, b; McPeck, 1981), we found that theoretical contributions were elusive when it came to addressing the inherent nature of the construct (Alexander, 2023a; Lombardi, 2023; Murphy, Ogata, & Schoute, 2023). Given this dearth of theoretical advances toward the ontology of critical thought, it is perhaps unsurprising that the empirical studies reviewed did not push the envelope concerning the nature of critical thinking.
Of course, not every contribution to the literature seeks or needs to elucidate the nature of a construct; some instead set out to test hypotheses regarding the relation between various skills, such as critical thinking and general educational attainment (Fong et al., 2017). Yet, even for the pragmatic purpose of measuring or improving students’ thinking, uncritically adopting a popular definition that expresses critical thinking solely in terms of cognitive processes is problematic. As discussed, none of those individual processes, nor any known synergy or combination of them, per se constitutes critical thought (Alexander, 2023a; Bailin et al., 1999b). Encompassing a wide array of cognitive processes in one definition may thus create an illusion of conceptual coherence when individual studies in fact operate under varying interpretations of that definition. Consequently, the fact that the majority of reviewed interventions did not explicate the nature of the construct of central interest undermines confidence that these interventions improved critical thinking.
Methodological and Measurement Considerations
In the results, we provided overviews of several key points regarding the characteristics and usage of critical thinking measures. Specifically, we found that there was (a) a substantial number of studies that strongly relied on self-report measures, often by using well-known standardized measures of critical thinkinglike the CCTST (P. A. Facione, 1990a, 1990b); (b) inclusion of disposition measures that were more about critical thinkers than critical thinking; (c) limited psychometric data on any instruments used; and (d) evident misalignments in the domain generality or domain specificity of study components, such as a domain-general definition and measure of critical thinking combined within a domain-specific intervention.
Not only was the domain-generality or domain-specificity of the measures used in the reviewed studies a serious concern, but so were the interpretations or conclusions that researchers reached based on those measures. For example, there were multiple instances when researchers reported positive effects for their interventions when there were significant increases on only one of the scales comprising a measure. There were also multiple cases where an intervention was deemed significant solely on the basis of participants’ self-reported improvement in critical thinking. As with conceptualizations of critical thinking, such unresolved methodological issues bring into question the quality of the data collected. In effect, any conclusions the researchers reached based on those data must be viewed cautiously.
Furthermore, the characterizations of the definitions and measurements of critical thinking seemed to indicate that most researchers whose works we reviewed were operating under the assumption that critical thinking is inherently a domain-general ability. They also assumed that this general ability to think critically would interface effectively with a domain-specific intervention, such as nurses conducting case analyses, even when there was no explicit cueing of critical thinking processes. Reciprocally, those researchers assumed that the domain-specific intervention would translate into higher posttest performance on a general critical thinking measure. This is particularly striking as the debate about the transferability of critical thinking remains unresolved (Bailin et al., 1999a; Kuhn, 1999; McPeck, 1981).
Further, as indicated by our research questions, we set out not only to investigate what kinds of interventions promote critical thinking but also to explore for whom those interventions were helpful. Unfortunately, we were forced to abandon this exploration, as the reviewed articles provided scant details on the students participating in the interventions. Thus, we were largely unable to situate and interrogate the reported intervention effects in relation to the context and conditions under which critical thinking was targeted. This question remains unanswered within this literature.
Concerns about the research designs and data-analytic methods used in the reviewed studies must also be voiced. As noted, the majority of studies we analyzed employed quasi-experimental designs (QEDs) with a pretest and posttest. Although a well-executed QED can afford significant insights into the causal mechanisms of interventions, such designs require careful consideration of threats to internal validity (Grosz et al., 2020; Steiner et al., 2023). In the case of critical thinking interventions, researchers would have to anticipate theoretically and assess empirically the factors that may offer alternative explanations for their findings, such as initial group differences in relevant prior knowledge, metacognitive abilities, or relevant experiences. Among the reviewed studies, variables of this nature were rarely considered, and consequently, alternative explanations were not addressed. Instead, most studies relied on simple mean comparisons to support causal claims. Moreover, the vast majority of the manuscripts offered no indication of the magnitude or meaningfulness (i.e., effect size) of the reported effects, nor did they report sufficiently detailed statistics to allow effect sizes to be derived through secondary data analysis.
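As an illustration of the secondary computation that fuller reporting would permit, the minimal Python sketch below derives a standardized mean difference (Cohen’s d with a pooled standard deviation) from group means, standard deviations, and sample sizes, and then applies the common approximate small-sample correction to obtain Hedges’ g. The function names and the example values are hypothetical and are not drawn from any reviewed study. Note that a posttest-only d of this kind does not adjust for the initial group differences that threaten the internal validity of a QED; it shows only what minimal reporting makes possible.

```python
import math


def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference between a treatment and a comparison
    group, scaled by the pooled standard deviation of the two groups."""
    df = n_t + n_c - 2
    pooled_var = ((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df
    return (mean_t - mean_c) / math.sqrt(pooled_var)


def hedges_g(d, n_t, n_c):
    """Apply the usual approximate small-sample correction J = 1 - 3 / (4*df - 1)."""
    df = n_t + n_c - 2
    return d * (1 - 3 / (4 * df - 1))


if __name__ == "__main__":
    # Hypothetical posttest summary statistics (not from any reviewed study).
    d = cohens_d(mean_t=24.0, sd_t=4.0, n_t=30, mean_c=22.0, sd_c=4.0, n_c=30)
    g = hedges_g(d, n_t=30, n_c=30)
    print(f"d = {d:.3f}, g = {g:.3f}")  # prints "d = 0.500, g = 0.494"
```

Reporting only group means, standard deviations, and sample sizes would thus be enough for readers to gauge the magnitude of a reported effect, even when the authors themselves do not report one.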
Given these shortcomings, we must, therefore, conclude that, in many instances, gains or changes in critical thinking documented within the reviewed studies cannot irrefutably be ascribed to the interventions conducted. Further, the practical significance of reported outcomes remains indeterminate. It is important to recognize that these measurement and methodological shortcomings result from researchers’ decisions and are amenable to change.
Intervention Characteristics
One key goal of this review was to characterize critical thinking interventions in higher education. Upon closer scrutiny, certain approaches proved more common than others, including student-centered curricula (e.g., Muehlenkamp et al., 2015), case studies (e.g., Akins et al., 2019), and flipped classrooms (e.g., D. Liu & Zhang, 2022). Yet, beyond these general distinctions, the majority of the studies did not report the precise nature, content, and procedures of the interventions. Moreover, even those studies that included richer information about the interventions often failed to describe the specific components or processes within the intervention that were expected to improve critical thinking. This lack of detail proved problematic in two significant ways. First, without specificity about what precisely was done in these interventions and why, we were unable to analyze these studies to the degree we intended. Second, and even more important for the outcomes of this review, the causal claims forwarded in the majority of these empirical investigations were insufficiently supported.
Limitations and Implications for Research and Practice
Limitations
While the present investigation was delimited by the specific research questions and the inclusion and exclusion criteria we applied, our ability to address the goals we set out for this review was also hampered by features of this body of literature. As we have characterized in our conclusions, the overall lack of theoretical and methodological information in the articles we reviewed left us unable to identify which study features (participant, measure, or intervention characteristics) causally resulted in improved critical thinking in higher education. This limitation has far-reaching implications for researchers’ ability to understand how critical thinking manifests differently when the characteristics of students or the conditions and context of the intervention vary. These critical shortcomings signal avenues that need to be pursued in future research.
Implications for Research
Based on the conclusions just delineated, we set forth several recommendations for future research into critical thinking interventions, highlighting areas for conceptual refinement and methodological restructuring.
Conceptual Refinement: Ontology of Critical Thinking
Foundational to intervention research should be the careful consideration of the very nature of critical thinking as well as the crucial elements that mark the presence of this valued form of thought (Bailin et al., 1999a; Williams, 1999). Beyond the explicitness and consistency of definitions forwarded in the studies we reviewed, researchers must consider the very nature of those definitions. Currently, the definitions identified in this review largely reflect a cognitive-processing-oriented perspective on critical thinking (e.g., P. A. Facione, 1990a), leaving contentious points regarding the nature of critical thinking unresolved (Alexander, 2023a; Thayer-Bacon, 2001a, 2001b).
Critical-Analytic Thinking
Defining critical thinking is contentious with regard to which factors or processes ultimately determine whether thinking can rightfully be judged critical or uncritical (Alexander, 2023b; Murphy, Ogata, & Schoute, 2023). Because cognitive processes may be executed critically or uncritically (Bailin et al., 1999a, 1999b), casting critical thinking in terms of an unspecified configuration of mental manipulations does not contribute to construct validity or measurement accuracy. Rather, we argue that valued thinking should more appropriately be reconceptualized as critical-analytic thinking, which renders justification integral to, and a necessary condition of, thinking critically (Alexander, 2014; Murphy et al., 2014). What has become clear in the present review is that justification was not central to the definitions of critical thinking that guided individual studies, and that most measures, particularly multiple-choice measures, did not treat justification as necessary for a response to count as a manifestation of critical thinking.
Following such a reconceptualization, critical thinking should be researched as more than the acts of analyzing, synthesizing, and evaluating: the answer a student reaches after engaging in those mental processes should also be supported by appropriate evidence, that is, justified (Dewey, 1933). Both guiding definitions and measures should reflect that requirement. Conceptually, a pertinent definition of critical thinking should guide research within a domain. Operationally, it would be crucial for students to externalize that examination and the evidence for their final position (Alexander, 2014, 2023b). By extension, as we have discussed in detail elsewhere (Murphy, Ogata, & Schoute, 2023), the grounds upon which something merits a judgment of “valued” thought demand serious reconsideration beyond the normative, post-positivist standards that have dominated critical thinking research (Biesta & Stams, 2001).
Critical Thinking and Critical Thinkers
Moreover, there is an urgent need to distinguish the act of critical thinking from the critical thinker. Many typifications of critical thinking hinge on ideal behaviors or mental processing exhibited routinely by a specific person, regardless of the context or problem. This typification is the basis for the critical thinking dispositions that are woven into predominant conceptions of critical thinking (P. A. Facione, 1990a). Specifically, Facione and colleagues (P. A. Facione et al., 1996) positioned dispositions as consistent motivation or as traitlike characteristics of thinkers. If dispositions are conceptualized as such, it seems unlikely that relatively short-term interventions would produce significant or enduring changes in participants. Thus, positioning students along a continuum of likelihood to engage in critical thinking is unlikely to benefit intervention research intended to bring about durable changes in either the process or the person (Murphy, Ogata, & Schoute, 2023). However, if there are reasons to assume that critical thinking dispositions are malleable and important to critical thinking skills (e.g., P. A. Facione, 2000), then the critical thinking literature would benefit from empirical studies that adopt or develop disposition measures that go beyond self-report to provide more objective evidence of development or change in critical thinking dispositions. What is more, given that dispositions are cast as a means to an end (engaging more often in more rigorous critical thought), disposition measures could be paired with measures of demonstrated critical thinking to elucidate the causal relation between critical thinking skills and dispositions, for which the extant empirical research provides mostly correlational evidence (e.g., Y. C. Yang & Chou, 2008).
In our view, critical thinking interventions would benefit from a focus on malleable attributes such as relevant prior knowledge (Alexander & Schoute, 2022), motivation for engagement (e.g., personal interest; Dewey, 1913), or the cognitive or metacognitive capacities that underlie such valued thought (Kuhn, 1999). Further, critical thinking would seem to require epistemic competence, which is the ability to understand what kind of evidence and justification are sufficient and necessary for a claim to be duly substantiated. An awareness of how such evidence and justification may vary by domain and context would be another key to epistemic competence (Alexander & DRLRL, 2012). Fostering such competence would simultaneously be an entry point to enact the call to liberate critical thinking (see Murphy, Ogata, & Schoute, 2023) by reevaluating the normative constraints and historically biased criteria by which critical thinking is commonly judged (Biesta & Stams, 2001; Marshall, 2001).
Critical Thinking Across Contexts
A third ontological consideration relates to the domain specificity and transferability of critical thinking. As with influential critical thinking theorists (e.g., Ennis, 1989), most researchers conducting the interventions we reviewed seemingly operated under the assumption that critical thinking would transfer spontaneously from their domain-specific interventions to performance on domain-general measures or in different contexts (e.g., Rababa & Masha’al, 2020). Yet, conflicting evidence from this review (e.g., Bellaera et al., 2016; Tiruneh et al., 2016) as well as external sources (e.g., van Peppen et al., 2022) suggests that the “transferability of critical thinking” debate is far from settled (Alexander, 2023b; Murphy, Ogata, & Schoute, 2023). Thus, researchers should expressly focus on the conditions under which, and the students for whom, critical thinking can be leveraged across domains and contexts. Particular attention should be directed toward the role of prior knowledge and experiences (Alexander & Schoute, 2022) and toward relational reasoning, an ability foundational to recognizing meaningful associations across tasks or contexts (Alexander & DRLRL, 2012).
Methodological Restructuring: Toward Seamless Assessment
Across the reviewed articles, shortcomings were identified pertaining to the definitions and measures of critical thinking and the interventions that targeted such thinking. What these shortcomings indicated was a need for researchers to strive toward seamless assessment (Young et al., 1997), where concepts, interventions, and measurements are appropriately aligned. Such seamless assessment begins with clear, defensible definitions that can subsequently be operationalized such that the measurement logically extends from and indicates the type of thinking the researcher values. Then, the interventions that are implemented in educational settings should expressly target students’ critical thinking as it is defined and measured.
Valid Measurement
For researchers to devise measures that validly assess critical thinking, deliberation on its ontology cannot be bypassed. Yet, regardless of the validity of the definitions researchers use, they need to delineate observable indicators of critical thinking that accurately reflect their definitions (Williams, 1999). Importantly, both test-makers and researchers who use existing measures would need to remember that a measure is not inherently valid but that validity is a process of interpretation (Messick, 1986). Four guidelines we proffer can aid in crafting more valid measurements of critical thinking. First, researchers should focus on the characteristics of the thinking and not the thinkers or their motivations for or dispositions toward critical thinking (Alexander, 2023b). Second, researchers should not rely solely on self-report measures but identify ways to assess demonstrable critical thinking performance (Kleemola et al., 2022; Shavelson et al., 2019).
Third, researchers should consider alternatives to multiple-choice formats in which only “correct” answers serve as markers of critical thought. Instead, they should formulate measures that require students to display their thinking and justify their responses (Lombardi, 2023). Analyzing such justifications may require a qualitative or mixed-methods approach to validly capture the process of critical thinking as reasoned decision-making, in which students’ reasons for their claims or answers are externalized. Fourth, when reporting the findings of intervention studies, it is essential that researchers consider not only statistical significance but also practical significance (Bakker et al., 2019; Cohen, 1994).
Justifiable Causal Mechanisms
Although few specific recommendations for promoting critical thinking resulted from this review due to shortcomings in the literature, such interventions remain worthwhile given the importance of critical thinking to human functioning (Weinstein, 1991). Thus, one implication we would forward is that a strong theoretical rationale must be offered for how the components of any intervention should hypothetically result in improved critical thinking. For example, if researchers view critical thinking as deliberative and reflective thinking rather than nonintuitive and nonreactive, then their interventions should explicitly target deliberation and reflection. Nonetheless, researchers must still be prepared to weigh alternative explanations for their findings by acknowledging the presence of any extraneous factors that may affect the outcomes (Steiner et al., 2023).
Setting and Student Characteristics
Guided by the concept of seamless assessment, we would like to draw attention to the settings in which interventions are conducted and, consequently, the populations that receive them. For instance, the vast majority of interventions took place within nursing education. Thus, there is a need to diversify where critical thinking interventions take place, not only within higher education but also at other levels of education. We also found that the interventions were of relatively short duration, mostly spanning less than one academic semester. Consequently, researchers should invest more in long-term interventions that may advance our understanding of the developmental nature of critical thinking (Kuhn, 1999). Lastly, representatively diverse populations of students should be central to interventions (F. López, 2022). Embracing such diverse populations will enrich our understanding of how critical thinking manifests across identities and intersections of identities (Danvers, 2018) and will allow researchers to craft interventions that are appropriate for such populations (e.g., Larson et al., 2020). Considerations of identity and intersections of identities in the context of critical thinking interventions may be best served by qualitative or mixed-methods approaches, requiring researchers to “embrace additional research methods beyond traditional quantitative approaches” (DeCuir-Gunby & Schutz, 2014).
Closing Thoughts
It is perhaps surprising that we concluded that the state of critical thinking intervention research is worrisome on multiple fronts, particularly given the relative optimism reported in earlier reviews (Abrami et al., 2008, 2015; Tiruneh et al., 2014). However, it is crucial to repeat that these worries are not entirely novel. In fact, a number of issues we identified in this analysis are similar to those that McMillan (1987) identified over 35 years ago. In many ways, this body of literature seems to have matured little since its rise during the critical thinking movement (Paul, 1997), with few advances in theory (Alexander, 2023b), reliance on many of the same measures (Murphy, Ogata, & Schoute, 2023), limited uptake of calls for advanced statistical modeling across education research (Steiner et al., 2023), and unsettled debates about critical thinking’s nature and transferability (Ennis, 1989; van Peppen et al., 2022).
Yet, we remain hopeful that this review and the recommendations we forward for future research may illuminate pathways to needed theoretical, methodological, and data-analytic advancements that undergird effective interventions. As we have argued, certain actionable points to improve critical thinking interventions rely on a deep understanding of the meaning, not just the meaningfulness, of critical thinking. Such deep understanding inevitably requires researchers to delve into the roots of what became known as critical thinking, lest this valuable form of thinking become divorced from its philosophical and psychological ancestry (Murphy, 2003; Murphy, Alexander, & Ogata, 2023; Murphy, Ogata, & Schoute, 2023). It is only by understanding critical thinking’s storied past through “alert, cautious, and thorough inquiry” (Dewey, 1933, p. 181) that the education research community can hope to devise measures that capture its true essence and craft interventions that appropriately and meaningfully promote individuals’ ability to think critically.
Supplemental Material
sj-docx-1-rer-10.3102_00346543251352539 – Supplemental material for A Critical Analysis of Critical Thinking Interventions in Higher Education
Supplemental material, sj-docx-1-rer-10.3102_00346543251352539 for A Critical Analysis of Critical Thinking Interventions in Higher Education by Eric C. Schoute and Patricia A. Alexander in Review of Educational Research
Authors
ERIC C. SCHOUTE is an alumnus from the Department of Human Development and Quantitative Methodology at the University of Maryland, College Park, MD, USA; e-mail:
PATRICIA A. ALEXANDER is a Distinguished University Professor and the Jean Mullan Professor of Literacy in the Department of Human Development and Quantitative Methodology at the University of Maryland, College Park, MD, USA; e-mail:
References
