Abstract
Memory research has evolved along two distinct traditions: well-controlled laboratory experiments emphasizing precision and tractability and naturalistic-memory experiments emphasizing generalization to real-world contexts. Although both have yielded important insights, we do not yet have a generalized theory of memory consistently interpreted across laboratory and naturalistic paradigms. By analyzing the strengths and limitations of the two traditions, I propose that formal modeling is the key to creating this theoretical link. A formal theory, instantiated in precise computational models that are developed over decades of laboratory-based experiments, needs naturalistic experiments to test its generalizability and reveal its limitations. Naturalistic experiments, in turn, better connect with existing laboratory paradigms when their results are explained by the same theoretical model. To achieve this, I propose a step-by-step procedure in which naturalistic settings are considered as all possible scenarios that could be realized in the real world, with laboratory settings forming a smaller subset that we have understood well. Our goal as memory researchers is to incrementally expand the scope of existing laboratory studies, theories, and models to account for increasingly naturalistic scenarios, ultimately achieving a generalized theory of memory. Together, the proposed framework no longer views laboratory versus naturalistic approaches as a trade-off to navigate, given their different priorities and methodologies, but considers them both essential in working toward the same goal.
Introduction
Two distinct traditions have shaped the study of memory: One focuses on memory in well-controlled laboratory experiments (Ebbinghaus, 1885) while the other focuses on memory in naturalistic contexts (Neisser, 1982). The former tradition dates back to Ebbinghaus (1885), who pioneered the method to study memory under highly controlled conditions using lists of nonsense syllables. The 20 years following Ebbinghaus’s monograph saw a proliferation of laboratories that began to characterize how people remember under various laboratory conditions, developing fundamental memory paradigms that remain standard in the laboratory today (Kahana, 2012). The rigor of the laboratory approach has generated rich empirical findings on memory and laid a solid foundation for later theoretical developments of memory (Anderson, 1996; Atkinson & Shiffrin, 1968; Tulving & Thomson, 1973).
Despite the success of early laboratory approaches in producing a rich body of theories, it remained unclear whether these theories provide meaningful insight into natural-memory behavior (Neisser, 1982). A new wave of memory research emerged that emphasized practical aspects of memory and examined memory not just for lists of simple items but also for facts, stories, and personally experienced events (Gruneberg et al., 1988; Neisser & Winograd, 1988). However, as Neisser (1988) cautioned, researchers in these early naturalistic-memory efforts were “primarily interested in the phenomena themselves” and “came to it without strong theoretical commitments” (Neisser, 1988, p. 2). Neisser argued that it is not enough to simply denounce the old laboratory methods and call for more ecologically oriented alternatives; we also need to answer the question of whether the findings discovered in naturalistic settings are already adequately explained by theories developed from laboratory experiments or whether they represent entirely new concepts. A conversation about integrating naturalistic and laboratory approaches began at the second Emory Cognition Project Conference in 1985, leaving participants with the optimism that “a new psychology of memory is beginning to take shape – one that will eventually yield theoretically consistent interpretations of both laboratory paradigms and naturalistic phenomena. We do not have that psychology yet, but we are moving” (Neisser, 1988, p. 3).
Forty years after this optimism, have we achieved the new psychology with consistent interpretations across both laboratory paradigms and naturalistic phenomena? There is no doubt that both traditions have continued to flourish and produce important results. Well-controlled laboratory experiments remain the primary approach for refining theoretical models of memory (Kahana, 2012). Yet these theories are rarely tested against more naturalistic and realistic scenarios, which means missed opportunities to identify their gaps. Meanwhile, we can now capture components of naturalistic memory with unprecedented richness: Participants can go about their daily lives wearing a camera (Cabeza et al., 2004; Jeunehomme & D’Argembeau, 2020; St. Jacques et al., 2008), perceive and recall events from movies or stories (Baldassano et al., 2018; Zacks et al., 2001), and navigate virtual-reality spaces before recalling elements of their experiences (Herweg et al., 2020; J. F. Miller et al., 2013). Although these more recent studies of naturalistic memory provide important theoretical insights, they do not offer the same precision in explaining and predicting behavior as theories developed in well-controlled laboratory environments. Furthermore, it remains underexplored how existing theories based on controlled experiments should be revised to accommodate these naturalistic empirical findings. The current perspective aims to initiate a dialogue and propose steps toward a theoretical unification of the two traditions, particularly emphasizing the role of formal modeling in this integrative process.
A Perceived Tension Between the Two Traditions
Why is there a divide between laboratory and naturalistic approaches? One reason is a perceived tension between their priorities. The laboratory approach offers experimental control: only a small set of variables is involved, and experimenters can manipulate specific independent variables while holding others fixed. The laboratory approach strongly emphasizes the internal validity of obtained knowledge (Fig. 1a), which refers to “the approximate validity with which we infer that a relationship between two variables is causal” (Cook et al., 1979, p. 37). Yet the very condition that enables internal validity through strict experimental control is often thought to be at odds with external validity (Dhami et al., 2004), which refers to how much the discovered relationship generalizes across “different types of persons, settings, and times” (Cook et al., 1979, p. 37). The artificiality and simplicity of the traditional experimental designs required to secure internal validity make it difficult to draw inferences from the experimental setting to the real world, which is our ultimate interest (Dhami et al., 2004).

Fig. 1. Laboratory and naturalistic approaches as complementary routes to generalizability. Traditionally viewed as a trade-off (a), laboratory approaches emphasize internal validity (the degree to which causal relationships can be accurately identified) and naturalistic approaches emphasize external validity (the ability to generalize findings across different contexts and situations). Here (b), I provide arguments considering both traditions essential for generalizability. Naturalistic approaches study memory closer to the environments to which we want to generalize our theories (green arrow). Experimental control in laboratory approaches facilitates the discovery of fundamental rules that apply across contexts (yellow arrow) and supports the development of cumulative, theoretically grounded knowledge via formal modeling (red arrow).
Naturalistic studies, although lacking the same level of control and precise measurement that laboratory studies provide, offer greater external validity by studying psychological processes in the environments to which we want to generalize our theory (Fig. 1a; Brunswik, 1947, 1952, 1955b). For example, in studies of autobiographical memory, people’s natural-memory behavior is probed as they remember the events of their lives (McDermott et al., 2009). While the complexity of autobiographical-memory studies hinders our ability to isolate and characterize the full range of variables involved, such studies better capture how memory functions in natural contexts. Relatedly, Yarkoni (2022) discussed “the generalization crisis” in traditional laboratory experiments and explored the consequences of failing to consider stimuli and situation variability during statistical inference. Yarkoni (2022, p. 12) emphasized that “If authors intend for their conclusions to hold independently of variation in uninteresting factors, and to generalize to broad classes of situations, there is no good substitute for studies whose designs make a serious effort to respect and capture the complexity of real-world phenomena.” Therefore, depending on the research question, it seems that one must often navigate a trade-off or find a middle ground between experimental control and generalizability (Hasson & Honey, 2012; Yarkoni, 2022), where strengthening one would mean compromising the other. This tension may explain the emergence of two distinct research traditions in human-memory studies, each prioritizing one approach over the other, given their emphasis on internal versus external validity. The next section of this article focuses on addressing this conceptual tension between experimental control and generalizability, arguing that researchers need not trade off one for the other, but that the two can work synergistically toward the same goal.
A Shared Goal of Generalizability
To move toward integrating laboratory and naturalistic approaches, we could view experimental control and naturalistic settings not as mutually exclusive choices but as complementary for achieving a shared scientific goal. Despite their distinct methodologies, both laboratory and naturalistic approaches share the same ultimate goal: developing a generalizable theoretical understanding of memory. Although the artificiality of traditional laboratory experiments is often associated with a lack of external validity or generalizability, their fundamental goal is also to uncover general rules that apply beyond specific experimental settings. Consider Ebbinghaus’s pioneering memory experiments: His use of random syllables in controlled settings was not meant to understand memory for nonsense syllables per se, but to uncover universal principles of memory and forgetting that would apply across contexts, including real-world situations. If the goal is to understand human memory in its most natural and generalized environments, why did cognitive psychologists start with these lab-based settings to uncover generalized theories? The answer lies in recognizing that while lab-based settings may seem distant from real-world environments, their precision and tractability enable generalizability in other important ways. These additional factors of generalizability, which I will discuss in the following section, suggest that the two approaches are complementary rather than opposing strategies to advance our understanding of memory.
Naturalistic approaches study memory closer to the environments to which we want to generalize our theory
Brunswik (1955b) has famously argued that controlling selected variables in traditional laboratory experiments (referred to as systematic design) destroys the naturally existing texture of the environment to which an organism has adapted, limiting the generalizability of findings. Those highly constrained situations are convenient for the experimenter but atypical for the individual. As an alternative, Brunswik (1947, 1952, 1955b) proposed the representative design, in which stimuli in an experiment are sampled in a way that is representative of the organism’s natural environment in terms of their number, values, distributions, and intercorrelations. For example, one way of achieving representative design is through random sampling. In Brunswik (1944), a participant went about their daily routine over a 4-week period in various outdoor and indoor situations. At random intervals, the participant was asked to give perceptual estimates of whichever object they happened to be looking at in that moment. This process was repeated multiple times to collect samples of objects representative of real-world situations. Compared with well-controlled laboratory experiments, the representative design better reflects the actual environment to which researchers aim to extend their findings.
Limitations
Although naturalistic settings are commonly associated with external validity, relying solely on these settings does not guarantee generalizability. Although the representative design aims to sample adequately to capture the natural complexity of the real-world environment, it is a “formidable task in practice” (Brunswik, 1955a, p. 239). As a result, most naturalistic studies instead focus on identifying formal properties of a naturalistic environment and reconstructing them in laboratory settings, an approach known as formal situational sampling (Hammond, 1966). For example, recent naturalistic studies of memory employ movie stimuli, narratives, and social interaction to recreate elements of naturalistic settings that are absent in traditional laboratory experiments (Baldassano et al., 2018; Zacks et al., 2001; Zadbood et al., 2017). While these studies each reveal important aspects of naturalistic environments, they cannot, by themselves, adequately sample and represent the full variability inherent in real-world contexts. Therefore, just as findings from laboratory settings with limited variability in populations, stimuli, or contexts are questioned regarding their broader generalizability, we must also consider the same limitation in any single naturalistic environment because of the large variability from one naturalistic environment to another. Ultimately, a strong theoretical framework is needed to integrate results across diverse naturalistic experiments. A science of naturalistic memory can succeed only if it has “precise techniques for translating observations into a formal language such that the operations of invariant mechanisms can be shown obviously” (Banaji & Crowder, 1989, p. 1188).
Experimental control facilitates the discovery of fundamental rules that apply across contexts
It has been argued that issues of internal validity are chronologically and epistemically antecedent to issues of external validity: asking whether a result applies beyond the experimental circumstances is only meaningful once we have established its validity within those circumstances (Guala, 2003). Experimental control can contribute to a better chance of generalization by discovering a real (causal) relationship between different variables from the original environment. Causality speaks more directly to the fundamental mechanisms researchers seek to uncover and is therefore more likely to remain valid in new contexts; in other words, establishing internal validity is crucial for establishing external validity (Fig. 1b, yellow arrow). In contrast, correlation often reflects spurious patterns that occur only under one specific environment. The latter can be an issue for naturalistic environments, given the large number of variables involved, many of which may not be precisely measured or controlled. This is not to say that internal validity is impossible to establish in naturalistic studies, as there are machine-learning approaches dedicated to learning low-dimensional causally related variables from high-dimensional data (Schölkopf et al., 2021). In fact, the link between causality and generalization is so strong—causal mechanisms tend to persist across different contexts—that methods have been developed using invariance as a property to identify causality (Arjovsky et al., 2019; Heinze-Deml et al., 2018; Peters et al., 2016). For example, if various data sets and environments are characterized by the same relationships, they are likely to be causal relationships (Arjovsky et al., 2019). However, these data-driven approaches require access to large amounts of data collected from multiple environments, which may be beyond the feasibility of current naturalistic studies of memory in psychology.
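The invariance idea in the preceding paragraph can be illustrated with a toy simulation. The sketch below is not an implementation of any of the cited methods; it is a simplified demonstration, under assumed linear relationships, that a causal predictor keeps the same regression slope across environments while a non-causal correlate does not. The environment names, the variables `cause`, `outcome`, and `spurious`, and all numeric values are hypothetical.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y regressed on x (with an intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate_environment(n, spurious_strength, rng):
    """Toy environment (assumed structure, for illustration only).

    `cause` drives `outcome` through the same mechanism in every environment,
    whereas `spurious` merely tracks `outcome` with a strength that differs
    from one environment to the next.
    """
    cause, outcome, spurious = [], [], []
    for _ in range(n):
        c = rng.gauss(0.0, 1.0)
        y = 2.0 * c + rng.gauss(0.0, 0.5)                # invariant mechanism
        z = spurious_strength * y + rng.gauss(0.0, 0.5)  # environment-specific
        cause.append(c)
        outcome.append(y)
        spurious.append(z)
    return cause, outcome, spurious


def spread(slopes):
    """Range of slope estimates across environments."""
    return max(slopes.values()) - min(slopes.values())


rng = random.Random(0)
# Hypothetical environments differing only in how strongly `spurious` tracks `outcome`.
environments = {"word_list": 0.2, "movie": 1.0, "daily_life": 3.0}

causal_slopes, spurious_slopes = {}, {}
for name, strength in environments.items():
    cause, outcome, spurious = simulate_environment(5000, strength, rng)
    causal_slopes[name] = ols_slope(cause, outcome)
    spurious_slopes[name] = ols_slope(spurious, outcome)

print("causal slopes:  ", causal_slopes, "spread:", spread(causal_slopes))
print("spurious slopes:", spurious_slopes, "spread:", spread(spurious_slopes))
```

In this toy setting, the slope for `cause` stays close to its true value in every environment, whereas the slope for `spurious` changes substantially. Stability of this kind is the signature that invariance-based methods exploit, although real applications need many more environments and far richer data than this sketch assumes.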
Limitations
Just discovering empirical relationships with internal validity is not enough; a mature science requires an understanding of what explains the discovered relationships. Newell (1973) cautioned that experimental psychology as a science had primarily dealt with phenomena: Upon discovering a new phenomenon, we explored all possible variations of variables affecting this phenomenon, but what was missing was theoretical unification of these results; that is, explanations of how different variables affect given phenomena and how various phenomena relate to each other (Cummins, 2000; Fried, 2020; Newell, 1973; van Rooij, 2019). Without these explanations, simply examining the generalization of individual empirical findings across different situations and contexts does not add up to a generalizable theoretical understanding of memory.
Formal modeling contributes to a generalized theory of memory by unifying findings across experiments
Formal modeling addresses limitations in both traditional laboratory-based and naturalistic empirical studies. As Newell (1973) argued, to unify disparate empirical findings, we need to construct complete and precise models capable of simulating a range of experiments. Without such models, research can become discovery-oriented, with hypotheses that are loosely linked with existing theories and that may or may not be supported by empirical data (Oberauer & Lewandowsky, 2019). We cannot have strong confidence in these hypotheses until they have been replicated with large sample sizes and across similar and different contexts. In contrast, theory-testing research generates experimental hypotheses that are strongly motivated by theoretical models; these hypotheses must hold if the theory is correct, and if they are not supported by empirical evidence, the theory must be modified (Oberauer & Lewandowsky, 2019). Regarding generalization, discovery-oriented research alone provides little confidence that conclusions will extend to new contexts unless explicitly tested in those situations. However, when combined with theory-testing research that precisely formulates core assumptions into formal cognitive models, empirical effects can accumulate (Jamieson & Pexman, 2020). Each new piece of evidence either supports the current model or contributes to its revision. When a theory has accumulated sufficient empirical evidence, its deductive power becomes so robust that we can not only generate new hypotheses for further testing but also confidently predict its generalization to new contexts without explicitly testing every situation. Such a theory-testing approach should be our method of choice if the goal is to develop a generalizable theory of memory that applies across laboratory and naturalistic settings. Focusing on validating formal theoretical models rather than isolated phenomena allows us to build a theoretical framework with strong generalizability beyond the specific conditions under which it was initially tested.
Past work in formal modeling has contributed to building a generalizable theory of memory. While early verbal theories introduced important theoretical concepts in memory (Carr, 1931; Ebbinghaus, 1885; Galton, 1883), formal models implemented as computational simulations and mathematical equations have proved useful for formulating precise, testable predictions (Estes, 1955; Howard & Kahana, 2002b; Oberauer & Kliegl, 2006; Raaijmakers & Shiffrin, 1980). To build models of memory, researchers examine a set of existing empirical effects for a memory task and develop a minimal set of model assumptions that can explain these effects, usually by specifying the memory representations and processes underlying the stages of memory encoding, storage, and retrieval. A theoretical model then needs to go through stages of refinement in which its predictions are tested in new situations. Although memory models are typically developed to account for a range of empirical patterns for a single memory task, recent modeling efforts have increasingly focused on uncovering generalizable rules that apply across different tasks or contexts. For example, while the context maintenance and retrieval (CMR) model was initially developed to understand free recall of lists (Howard & Kahana, 2002a; Polyn et al., 2009b), it has been generalized to account for behavioral patterns in serial recall tasks (Logan & Cox, 2021; Lohnas, 2025), free-association tasks (Richie et al., 2023), and collaborative free-recall tasks (Angne et al., 2024), as well as a broader range of memory behavior involving memory consolidation (Z. Zhou et al., 2024), reward (Rouhani et al., 2020), and decision-making (C. Y. Zhou et al., 2025). Exemplar-based models (Medin & Schaffer, 1978; Nosofsky, 1986), originally developed for categorization tasks, can account for old/new item-recognition tasks and explain relations between classification and recognition (Nosofsky et al., 2011). A resource-limited theory of memory encoding, as implemented in a mathematical model, can account for word-frequency effects across item recognition, associative recognition, cued recall, and free recall (Popov & Reder, 2020). While separate Bayesian models of memory have explained category effects during memory reconstruction (e.g., single category by Huttenlocher et al., 2000, or hierarchical categories by Hemmer & Steyvers, 2009), Xu et al. (2025) unified them into a single framework.
Limitations
Historically, formal models of memory have primarily relied on well-controlled laboratory experiments, such as list-learning paradigms, to study the encoding and recall of information (Kahana, 2012). The close control over both the selection and timing of stimuli and procedures, along with the precision of their measurements, is critical for making computational modeling tractable. Additionally, the task variables in well-controlled experiments are relatively few and well specified, allowing for direct comparison of experimental results obtained in different laboratories, a prerequisite for integrating empirical results in a single theoretical model. For instance, the serial position effect in free-recall paradigms (how the recall probability of an item differs as a function of its study position in a list) has been observed across numerous laboratories, enabling researchers to develop and refine explanations in a theoretical model about what affects the shape of the serial position curve (Howard & Kahana, 1999; Ma et al., 2024; Tan & Ward, 2000; Watkins et al., 1989; Zhang et al., 2023). The development of computational models for naturalistic memory is still at an early stage (see examples in Franklin et al., 2020; Lu et al., 2024; Michelmann et al., 2023) because it is challenging to precisely measure the full range of variables in complex naturalistic tasks and to navigate the large number of possible model alternatives. Although simplified laboratory experiments have provided convenience and tractability in building theoretical models of memory, without extending the models to a wider range of scenarios identified in naturalistic studies of memory, we miss opportunities to identify further gaps in the theory.
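As a concrete illustration of why such laboratory measures are easy to compare across studies, the serial position curve mentioned above can be computed in a few lines. The following is a minimal sketch, assuming each trial is coded as a list of the 1-based study positions of the items that were recalled; the function name and the example data are hypothetical and are not taken from any of the cited studies.

```python
def serial_position_curve(recalls, list_length):
    """Probability of recall at each study position.

    recalls: list of trials; each trial is a list of 1-based study positions
             of the items recalled on that trial (recall order is ignored).
    list_length: number of items studied on every trial.
    """
    if not recalls:
        raise ValueError("at least one trial is required")
    counts = [0] * list_length
    for trial in recalls:
        # Use a set so that a repeated recall of the same item counts once.
        for position in set(trial):
            if 1 <= position <= list_length:
                counts[position - 1] += 1
    return [count / len(recalls) for count in counts]


# Hypothetical data: four trials with five-item lists.
example_trials = [[5, 4, 1], [5, 1, 2], [4, 5], [1, 5, 3]]
curve = serial_position_curve(example_trials, list_length=5)
print(curve)  # [0.75, 0.25, 0.25, 0.5, 1.0]
```

In this made-up example, recall is highest for the first and last positions, the primacy and recency pattern that the serial position curve is used to describe. Because the computation depends only on study position and recall outcome, the same summary can be obtained from any laboratory that records those two variables.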
To summarize, both naturalistic and laboratory approaches aim to build a theory of memory that generalizes across different contexts and environments. Naturalistic approaches study memory directly in the environments to which we want to generalize our theories (Fig. 1b, green arrow), though reliably extracting rules, modeling, and integrating results from these complex settings remains challenging. Although the laboratory approach is traditionally associated with internal validity, it contributes to external validity by uncovering causal relationships that are likely to hold across contexts (Fig. 1b, yellow arrow). It also supports a strong tradition of formal modeling to integrate empirical findings (Fig. 1b, red arrow). However, without directly testing these theoretical models across more naturalistic scenarios, we cannot determine with certainty whether they can successfully account for empirical results in both controlled and naturalistic contexts. The rest of the article outlines the framework and concrete steps for building a unified theory by combining the strengths from both traditions, using formal modeling as the bridge.
An Integrative View Necessary for Theoretical Unification
To facilitate a path toward theoretical integration through formal modeling, it is useful to reconceptualize the relationship between naturalistic and laboratory-based approaches. Existing research may emphasize theoretical developments of either laboratory or naturalistic studies of memory. At one extreme is the optimistic view (Fig. 2a), according to which laboratory settings are considered an ideal abstraction of naturalistic settings. As Tulving (1983) put it, “Words to the memory researcher are what fruit flies are to the geneticist: a convenient medium through which the phenomena and processes of interest can be explored and elucidated . . . Words are of no more intrinsic interest to the student of memory than Drosophila are to a scientist probing the mechanisms of heredity . . .” (p. 146). Although recalling random word lists may seem removed from naturalistic memory scenarios, words have served as ideal abstractions of meaningful information units in memory. According to this perspective, theories or models developed under laboratory settings should theoretically apply well to other contexts or situations, directly assuming external validity from internal validity alone. However, the limitation of the optimistic view is that we disregard the unique contributions of naturalistic settings, potentially missing opportunities to identify further gaps in the theory. At the opposite extreme is the pessimistic view (Fig. 2b), according to which laboratory settings are considered unrepresentative of what takes place in real life: “Conclusions drawn from controlled experimental designs with a limited number of variables may not be valid in real-life behavior” (Shamay-Tsoory & Mendelsohn, 2019, p. 844). Thus, one should focus efforts on investigating memory within naturalistic settings. 
The consequence of the pessimistic view is that we disregard the role of laboratory settings in contributing to external validity and miss opportunities to connect new findings with existing theories. In addition to the optimistic and pessimistic views, an intermediate view (Fig. 2c) offers a middle ground, acknowledging that laboratory settings have aspects that share features with naturalistic settings, as well as aspects that are artificial and stand orthogonal to naturalistic settings.

Fig. 2. Toward a theoretical unification of laboratory and naturalistic approaches. The optimistic view (a) focuses efforts on investigating memory within laboratory settings, seeing them as an ideal abstraction of naturalistic environments. The pessimistic view (b) focuses efforts on investigating memory within naturalistic settings and sees laboratory settings as unrepresentative of what takes place in real life. The intermediate view (c) acknowledges that laboratory settings have aspects that are in common with naturalistic settings as well as aspects that are artificial and stand orthogonal to naturalistic settings. I propose the integrative view (d), which considers naturalistic settings as containing all possible scenarios that could be realized in the real world, with laboratory settings forming a smaller subset that we have understood well. Our goal as memory researchers is to gradually expand the scope of laboratory studies, theories, and models to account for the full range of naturalistic-memory behavior, ultimately achieving a generalized theory of memory. Under the integrative view, we can adopt established formal modeling refinement approaches (i.e., iteratively testing predictions in new situations), using components identified in naturalistic settings to incrementally guide the direction of this refinement process (e).
While the intermediate view helps reconcile the extremes of the optimistic and pessimistic views, I propose an integrative view that can more directly facilitate a research program that unifies laboratory and naturalistic studies of memory (Fig. 2d). Under an integrative view, naturalistic settings refer to all possible scenarios that could be realized in the real world, whereas laboratory settings represent a smaller subset that we have studied and understood well, presumably because of their relative simplicity. Despite their artificial appearance, laboratory conditions remain valid slices of our reality. Our goal as memory researchers, regardless of whether we are from the laboratory or naturalistic traditions, is to gradually expand the scope of laboratory studies and theories to eventually account for the full range of naturalistic-memory behavior, thus achieving a generalized theoretical understanding of memory. This view encourages researchers from the laboratory tradition to see naturalistic-memory studies as opportunities to test the generalizability of their theories, and it encourages researchers from the naturalistic tradition to tie their findings closely to existing theories. The integrative view has several important features that I will highlight and discuss below.
It is challenging to draw a rigid line between naturalistic and artificial
What, precisely, is a “naturalistic setting”? The integrative view considers naturalistic settings to be all possible scenarios that could be realized in the real world, including aspects of the laboratory settings that may appear artificial. Unlike the pessimistic and intermediate views, the integrative view avoids rigidly defining what is and is not naturalistic, a line that is difficult to draw (Winograd, 1988). Some definitions are too abstract to provide concrete criteria for judgment: For example, artificial situations are “those that are specifically designed for research” and naturalistic situations are “the target situations to be understood by research” (Hoc, 2001, p. 282). Other approaches have considered a framework in which stimuli, tasks, and behavior can be evaluated on a continuum of simplicity and complexity, where laboratory experiments have a “reductionistic” tendency to simplify the complexity of the real world (Kingstone et al., 2008; Shamay-Tsoory & Mendelsohn, 2019; Sonkusare et al., 2019). However, complexity is also subjective and context-dependent. As Holleman et al. (2020) noted, complexity has often been expressed in strict mathematical terms in the physical sciences (Gell-Mann, 1995), but psychologists have used the term loosely, either by describing something’s size, dimension, or variety or by referring to things that are not yet understood well, as in “the brain is too complex for us to understand” (Edmonds, 1995, p. 4). Furthermore, the definition of what is naturalistic versus artificial can change with our knowledge of the world. While virtual-reality paradigms are now accepted as realistically simulating naturalistic and real-world experiences, a few decades ago, before the technology became familiar, the first laboratory participants using virtual-reality headsets would not have found the experience to be immersive or to resemble their own everyday experiences.
What is considered artificial also reveals important aspects of naturalistic behavior
To gain a full understanding of human memory, we must not dismiss certain experimental paradigms simply because they appear artificial. A complete theory of memory should be able to account for human behavior across all environments, including artificial ones involving tasks like passively viewing information, memorizing lists of random words, or performing simple key presses. In fact, by intentionally removing naturalistic elements from an experiment, researchers can often better characterize the fundamental cognitive constraints that are important in real-world behavior. For example, Hick’s law, a principle with wide real-world applications in interface design (Proctor & Schneider, 2018), was first derived under extremely artificial conditions in which 1 participant made over 8,000 button presses (Hick, 1952). Similarly, our understanding of working memory capacity has been built from studies using discrete items like digits, letters, and words (Cowan, 2001; G. A. Miller, 1956).
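Hick’s law, mentioned above, is commonly written as a logarithmic relation between the number of equally likely alternatives and mean choice reaction time. The sketch below uses the textbook form RT = a + b · log2(n + 1); the intercept `a`, the slope `b`, and the numeric values are illustrative assumptions and are not estimates from Hick (1952).

```python
import math


def hick_reaction_time(n_choices, slope, intercept=0.0):
    """Predicted mean choice reaction time under Hick's law.

    Uses the common form RT = intercept + slope * log2(n_choices + 1).
    `slope` (seconds per bit) and `intercept` (seconds) are free parameters
    that would normally be fitted to data; the values used below are
    hypothetical.
    """
    if n_choices < 1:
        raise ValueError("n_choices must be at least 1")
    return intercept + slope * math.log2(n_choices + 1)


# Illustrative predictions with a hypothetical slope of 0.15 s per bit.
for n in (1, 3, 7):
    print(n, round(hick_reaction_time(n, slope=0.15), 3))
```

Under this form, doubling the quantity n + 1 adds a constant increment to reaction time rather than doubling it, which is the kind of simple regularity that is easiest to establish under tightly controlled conditions.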
A dynamic boundary encourages collaboration between the two research traditions
Most importantly, setting a fixed boundary between what is naturalistic and what is artificial often promotes separation rather than integration between research traditions. For example, many contemporary articles on naturalistic memory start by highlighting elements missing from traditional laboratory-based experiments. These elements should be seen as opportunities to extend theory previously developed in laboratory-based experiments (under the integrative view), rather than as justification to move away from these laboratory-based paradigms (under an intermediate view or a pessimistic view). The integrative perspective encourages collaboration between the two research traditions by recognizing that the boundary between laboratory and naturalistic settings is not fixed but dynamic. What is considered new, naturalistic, and understudied by laboratory studies today could become a staple in laboratory settings tomorrow. For this transition to happen, collaborative efforts from both research traditions are essential: identifying important components from naturalistic settings and systematically expanding existing memory theories (particularly those formulated in formal models) to incorporate these new phenomena. For example, while early memory work focused on the memory of random materials, Bartlett (1932) introduced more naturalistic approaches for studying memory by examining the role of prior knowledge, or schema, in story recall. Though revolutionary at that time in introducing aspects of naturalistic memory overlooked by traditional laboratory methods, the concept of schema has been effectively assimilated into traditional laboratory approaches since then, with a large body of research these days investigating the role of schematic knowledge on episodic memory both empirically (Popov et al., 2019; Tompary & Thompson-Schill, 2021; Tse et al., 2007) and computationally (Hemmer & Steyvers, 2009; Huttenlocher et al., 2000; Zhang, 2022). 
Thus, the integrative view reduces the artificiality of laboratory experiments over time, not by removing nonnaturalistic elements, but by progressively expanding its coverage to incorporate more elements important in real-world settings.
Concrete Steps Toward Theoretical Integration Through Formal Modeling
Under the integrative view, I propose a step-by-step procedure to achieve theoretically consistent interpretations across both naturalistic and laboratory memory studies through formal modeling (illustrated in Fig. 2e). This procedure adopts the established approach of refining formal models by iteratively testing their predictions in new situations, and it uses components identified in naturalistic settings to guide the direction of this refinement.
First, we begin by identifying key components within naturalistic settings that current memory theories or models may not adequately address. Second, we incrementally increase the complexity of controlled laboratory experiments, maintaining experimental control while including these newly identified components of naturalistic settings, making it tractable for formal modeling to be applied. Finally, we extend existing theoretical models previously developed under well-controlled laboratory experiments, with minimal adjustments, to account for the data emerging from these more complex experiments. Successful generalization of the existing theory in this extension provides strong support for the theory, whereas any failure to generalize pinpoints specific areas where theoretical revision is needed. This step-by-step procedure provides a tight link between existing theory and newly added naturalistic components, making it clear what model mechanisms generalize and what additional mechanisms are needed. When this procedure is applied iteratively, one can gradually expand the scope of the existing laboratory studies, theories, and models to account for increasingly naturalistic scenarios (see the integrative view in Fig. 2d), ultimately achieving a generalized theory of memory. Several key characteristics of the proposed framework will be further discussed below.
Identifying components from naturalistic studies that may challenge existing theory
Researchers from laboratory-based traditions typically test their computational models in contexts similar to those that initially informed their theories. It would be a more productive practice, however, to intentionally seek scenarios that might challenge or “break” existing theories. This is where naturalistic-memory research proves valuable; researchers using naturalistic approaches actively identify aspects of memory that are underexplored in traditional laboratory paradigms, which have the potential to reveal limitations of existing memory theories. Many aspects of naturalistic memory remain understudied in controlled laboratory settings, which poses challenges for theoretical unification between laboratory and naturalistic approaches. First, everyday memory experiences often involve continuous interactions with environmental cues and other individuals. Yet the majority of controlled laboratory experiments have participants complete memory tasks in isolation. Recent cognitive research has begun to identify how an individual’s memory is altered by external aids (Cornell et al., 2024; Martin et al., 2022; Niforatos et al., 2017; Sparrow et al., 2011) and collaborative settings (Rajaram & Pereira-Pasarin, 2010; Weldon et al., 2000). For example, using a smartphone to replay rich cues from daily life can enhance recall of past events (Martin et al., 2022). When people can look up information online, they better remember where to access the information instead of the information itself (the “Google effect”; Sparrow et al., 2011). When individuals collaborate during recall, it can lead to forgetting and increased memory errors (Rajaram & Pereira-Pasarin, 2010). Second, controlled laboratory experiments primarily rely on simplified stimuli, such as word lists, whereas real-world information is rich and continuous, as seen in movies and narratives (Lee et al., 2020). 
Studies using these naturalistic stimuli, such as having participants recall an episode of BBC’s “Sherlock”, reveal how people segment continuous experiences into discrete events and how these events exhibit a nested hierarchical structure in the brain (Baldassano et al., 2017; Zacks et al., 2001). It remains a challenge to build a theoretical model that can capture memory recall across both simplified discrete stimuli and complex continuous narratives. Third, traditional laboratory memory experiments explicitly probe participants’ memories, providing clear instructions on when to encode and retrieve episodic memories. In real-world contexts, individuals have agency over what and when they learn (Shamay-Tsoory & Mendelsohn, 2019) and can choose whether to retrieve something based on its necessity (Lu et al., 2022). Recent research has examined scenarios in which participants took self-guided museum tours while wearing a camera, learning information at their own pace and by their own choices (St. Jacques & Schacter, 2013). Fourth, real-world memory can operate on much longer time scales than those typically studied in laboratory settings. For instance, over the course of 50 years, Bahrick (1984) traced the long-term forgetting function of Spanish words people studied at various times of their lives and found that the rate of forgetting dropped to zero after 5 years of learning. Finally, we should investigate not just how people recall the past, but also what they use the past for (Neisser, 1982). A growing number of studies show that people draw on their past experiences to guide decision-making (Hornsby & Love, 2022; Zhao et al., 2022; C. Y. Zhou et al., 2025), construct future plans (Mattar & Daw, 2018; Ólafsdóttir et al., 2018), and summarize information (Angne et al., 2025).
An incremental approach to generalizing existing theories to increasingly naturalistic settings
The components identified from naturalistic memory settings present both a challenge and an opportunity to validate and revise existing theories developed from traditional laboratory approaches. Formal models of memory have proven useful in providing theoretical unification of various empirical effects observed in traditional laboratory experiments. A critical step toward bridging laboratory and naturalistic studies of memory is to extend these existing formal models to capture the additional components identified through naturalistic approaches. One may wonder why we do not start directly with a model of naturalistic settings. While there may be instances in which building entirely new models becomes necessary to capture complex, naturalistic memory tasks, this should occur only after thoroughly exploring and exhausting possibilities with existing models to ensure that the phenomena are not already adequately explained by current theories. We should prioritize an approach that incrementally extends our current theoretical frameworks, not only because it is simpler than building entirely new models or concepts but also because these theories, which have successfully unified results from various laboratory experiments, are expected to generalize to, and withstand testing in, more complex environments. Any failure to generalize in these richer environments would not only highlight the significance of newly identified naturalistic components but also provide crucial constraints for refining the existing theory of memory.
I will highlight several examples from the recent memory literature to demonstrate the proposed step-by-step approach under the integrative view. These studies bring aspects of naturalistic memory incrementally into laboratory settings and subject them to formal modeling to provide unification with existing theories (as illustrated in Figs. 2d and 2e).
Real-world environmental statistics
Information in real-world environments, such as news articles, email, and tweets, tends to reappear with statistical regularity, often following a power-law function in which recent items are more likely to reappear (Anderson & Schooler, 1991; Anderson et al., 2023). Traditional memory experiments, however, typically deviate from these natural statistics, presenting stimuli randomly or at equal spacing. An influential theory, the rational analysis of memory, proposes that the statistical patterns of the natural environment have shaped human memory to produce the classic forgetting curve observed in these controlled lab experiments (Anderson, 1990). It remains unknown whether this rational principle holds when memory experiments themselves use stimuli that follow naturalistic statistical patterns. A study by Anderson et al. (2026) directly tested this by creating a more naturalistic continuous-recognition experiment. The presentation order of word stimuli in the recognition experiment was matched to the order of tweets from a real-world Twitter data set, thus embedding natural environmental statistics into a controlled laboratory task. This naturalistic condition was compared with two other conditions using typical laboratory-based environmental statistics: randomly sampled stimuli and equally spaced stimuli. Crucially, Anderson et al. (2026) showed that a single computational model, using only the history of each stimulus’s appearance, could accurately predict memory performance across all three conditions. These findings support the theoretical claim that human memory rationally adapts to the statistical structure of its environment, regardless of whether that structure is naturalistic or artificially controlled.
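To make concrete what a model "using only the history of each stimulus's appearance" could look like, the following is a minimal sketch. It assumes the ACT-R-style base-level activation equation (the log of summed power-law-decayed traces) and a logistic response rule; the function names, the decay value of 0.5, and the threshold and noise parameters are illustrative assumptions, not the model or parameters reported in the study described above.

```python
import math

def base_level_activation(presentation_times, now, decay=0.5):
    """Log of summed power-law-decayed traces, one trace per past presentation.

    ASSUMPTION: this is the ACT-R-style base-level equation, used here only
    as an illustration of a history-based memory model.
    """
    lags = [now - t for t in presentation_times if t < now]
    if not lags:
        return float("-inf")  # item has never appeared before
    return math.log(sum(lag ** -decay for lag in lags))

def p_recognize(activation, threshold=0.0, noise=0.5):
    """Hypothetical logistic mapping from activation to P(respond 'old')."""
    if activation == float("-inf"):
        return 0.0
    return 1.0 / (1.0 + math.exp(-(activation - threshold) / noise))

# The same function applies to any presentation schedule (tweet-matched,
# random, or equally spaced), because it reads only the item's own history.
recent = base_level_activation([9], now=10)       # seen once, 1 step ago
distant = base_level_activation([1], now=10)      # seen once, 9 steps ago
repeated = base_level_activation([1, 9], now=10)  # seen twice
```

Under these assumptions, an item seen more recently or more often receives higher activation, which is the qualitative pattern that a power-law environment would reward.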
Rich and complex stimuli
Formal models of memory have primarily been developed using highly simplified perceptual stimuli, such as words or abstract shapes. Although these models are useful in unifying a range of empirical results and revealing mechanisms underlying encoding and recall, it remains a question whether they generalize to rich and high-dimensional real-world material. To address this, Meagher and Nosofsky (2023) applied a model of recognition and categorization, which has been successfully used to predict old/new recognition for simplified perceptual stimuli, to an experiment in which the stimuli consisted of a set of high-dimensional rock images. Their key hypothesis was that the fundamental cognitive mechanisms for recognition judgments are the same for both simplified and naturalistic stimuli: An item is judged as old or new based on its summed similarity to all individual items stored in memory. The only difference lies in the complexity of the stimulus representations on which these similarities are calculated. Confirming this, Meagher and Nosofsky (2023) embedded rock images in a high-dimensional feature space using multidimensional scaling and demonstrated that a hybrid-similarity exemplar model accounted well for recognition behavior in their experiment, similar to the way the same model explained behavior for simplified perceptual stimuli, like color patches (Nosofsky & Zaki, 2003). Compared with machine-learning approaches that can directly map complex images to recognition behavior (Bylinskii et al., 2022), extending an existing cognitive model of recognition incrementally to incorporate new stimuli contributes to a unified theory that supports recognition performance for both simplified and real-world, high-dimensional stimuli.
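The summed-similarity decision rule described above can be sketched in a few lines. This is a simplified illustration rather than the hybrid-similarity model itself: it assumes an exponential similarity gradient over Euclidean distance and a ratio response rule, and the parameter names `c` (sensitivity) and `k` (criterion) as well as the toy coordinates are assumptions made for the example.

```python
import math

def similarity(x, y, c=1.0):
    """Similarity decays exponentially with distance in the feature space."""
    return math.exp(-c * math.dist(x, y))

def familiarity(probe, exemplars, c=1.0):
    """Summed similarity of the probe to every stored exemplar."""
    return sum(similarity(probe, e, c) for e in exemplars)

def p_old(probe, exemplars, c=1.0, k=1.0):
    """Probability of an 'old' response; k is a hypothetical criterion."""
    f = familiarity(probe, exemplars, c)
    return f / (f + k)

# The decision rule is identical whether items are points in a
# low-dimensional space (e.g., color patches) or a high-dimensional one
# (e.g., scaled rock images); only the coordinates change.
studied = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
target_prob = p_old((0.0, 0.0), studied)  # a studied item
lure_prob = p_old((5.0, 5.0), studied)    # a dissimilar new item
```

The design point is that the stimulus representation is an input to the model rather than part of it, so moving to richer stimuli changes the coordinates while leaving the recognition mechanism untouched.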
Interaction with environmental cues
In daily life, our memory constantly interacts with information in the environment, such as notes or Google Calendar. Yet our current understanding of human memory is predominantly based on controlled laboratory experiments in which participants engage in memory tasks without any external reference or help. To better reflect these naturalistic scenarios in laboratory-based settings, Cornell et al. (2024) conducted a modified free-recall experiment in which participants tried to recall as many items as possible from a studied list. In a typical free-recall experiment, the recall period ends after a fixed amount of time; when participants in this study could not remember any more items, however, they pressed a button to receive an external cue as a reminder. To account for participants’ memory behavior after receiving the reminder, Cornell et al. (2024) extended a computational model of memory search that had previously successfully explained free recall behavior without interaction with external cues (Howard & Kahana, 2002a; Polyn et al., 2009b). The key additional model assumption that links the memory-search process with or without reminders is whether the next recall is driven by the context of the reminder or by the context of the preceding recall, with all other mechanisms and parameters of the model shared between the two scenarios. Using parameters fitted from a standard free-recall experiment without reminders, the model accurately predicted memory behavior in the reminder condition over a new group of participants. The model could also distinguish in real time, based on preregistered model parameters, which reminders are the most effective to deliver in order to improve memory. These findings provide a unified understanding of the cognitive mechanisms that drive memory search with or without interaction with external cues. 
Extending an existing model to capture the increasingly naturalistic scenario with reminders also increases the real-world applicability of the theory.
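The single added assumption, that the next recall is cued either by the reminder's context or by the preceding recall's context, can be illustrated with a stripped-down context-drift sketch. The drift rule follows the standard temporal-context form (a unit-length blend of old context and new input); everything else, including one-hot item inputs, the deterministic argmax retrieval, and the function names, is a simplifying assumption and omits most of the published model.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def drift(context, c_in, beta):
    """Blend old context with a new input; rho keeps the result unit length."""
    d = dot(context, c_in)
    rho = math.sqrt(1.0 + beta ** 2 * (d ** 2 - 1.0)) - beta * d
    return [rho * c + beta * x for c, x in zip(context, c_in)]

def study(n_items, beta=0.6):
    """Return the context state bound to each item of an n-item list."""
    context = [0.0] * n_items + [1.0]  # last unit stands for pre-list context
    bound = []
    for i in range(n_items):
        item_input = [1.0 if j == i else 0.0 for j in range(n_items + 1)]
        context = drift(context, item_input, beta)
        bound.append(context)
    return bound

def next_item(bound, last_recall, recalled, reminder=None):
    """Pick the not-yet-recalled item whose study context best matches the cue.

    The only difference between conditions is the source of the cue:
    the reminder's context if one was delivered, else the last recall's.
    """
    cue = bound[reminder if reminder is not None else last_recall]
    candidates = [j for j in range(len(bound)) if j not in recalled]
    return max(candidates, key=lambda j: dot(bound[j], cue))

bound = study(6)
without_reminder = next_item(bound, last_recall=0, recalled={0})
with_reminder = next_item(bound, last_recall=0, recalled={0, 5}, reminder=5)
```

In this toy version, recall proceeds to a temporal neighbor of whichever item supplied the cue, so a reminder drawn from a distant part of the list redirects search there while the retrieval mechanism and the parameter `beta` stay the same.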
Sequential decision-making
Research on how people search their memories has largely been examined under controlled laboratory conditions, where memories are directly queried by experimenters. In real-world contexts, however, memory does not function in isolation but supports goal-directed behavior in other cognitive tasks. To better understand how memory guides sequential decision-making, Hornsby and Love (2022) analyzed a large data set of online grocery purchases and examined how people decide what item to buy next based on options generated from their long-term memory. This consumer-choice data set offers increased naturalism by examining memory’s role in real-world decision-making while maintaining tractability for computational modeling, as the human responses are lists of discrete grocery items, similar to responses in typical list-learning paradigms. Past modeling work has characterized how people search for information in their episodic memory (e.g., recalling items from a previously studied grocery list; Raaijmakers & Shiffrin, 1980) versus their semantic memory (e.g., recalling all grocery items that fall into the “vegetable” category; Abbott et al., 2015). Building on this, Hornsby and Love (2022) developed a computational model that predicts these sequential choices by proposing a two-stage process. First, people search their memories for available options using a combination of cues from episodic and semantic memory. Second, they examine the relevance of each retrieved option against their internal goals. Extending previous models of memory search to capture sequential choices provides a common framework for understanding memory search in the lab and for seeing how these basic mechanisms shape decision-making in real-world, goal-directed tasks.
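The two-stage process described above can be sketched as follows. This is only a schematic of the verbal description, not the published model: the linear blending weight `w`, the cue-strength dictionaries, the cutoff `n_options`, and the grocery values are all hypothetical.

```python
def retrieve_options(episodic, semantic, w=0.5, n_options=3):
    """Stage 1: rank items by a weighted blend of episodic and semantic cues."""
    items = set(episodic) | set(semantic)
    score = {
        i: w * episodic.get(i, 0.0) + (1 - w) * semantic.get(i, 0.0)
        for i in items
    }
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(items, key=lambda i: (-score[i], i))[:n_options]

def choose(options, goal_relevance):
    """Stage 2: pick the retrieved option most relevant to the current goal."""
    return max(options, key=lambda i: goal_relevance.get(i, 0.0))

# Hypothetical cue strengths for a shopper who has just added coffee.
episodic = {"milk": 0.9, "bread": 0.7, "cereal": 0.2, "candles": 0.05}
semantic = {"milk": 0.8, "cereal": 0.6, "bread": 0.3, "candles": 0.0}
goal_relevance = {"cereal": 0.9, "milk": 0.2, "candles": 1.0}

options = retrieve_options(episodic, semantic)
choice = choose(options, goal_relevance)
```

Separating the stages makes one consequence explicit: an item that is highly relevant to the goal (here, candles) cannot be chosen if memory search never retrieves it, which is how memory-search mechanisms can constrain downstream decisions.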
Formal modeling provides a strong link between naturalistic and laboratory studies
Both laboratory and naturalistic approaches aim to develop a generalized theory of memory. Laboratory approaches emphasize internal validity, often assuming their results generalize across more complex scenarios without explicitly testing this assumption. Naturalistic approaches address this gap by examining memory processes in naturalistic settings, often drawing conclusions about whether what we know from traditional laboratory paradigms generalizes to naturalistic paradigms (Griffiths et al., 2016), or whether the two kinds of paradigms reflect fundamentally different processes (Roediger & McDermott, 2013). Given the complexity of studying behavior in naturalistic environments, conclusions like these are challenging to reach without explicitly formulating them into precise computational models. While similar results across settings reasonably suggest shared underlying mechanisms (Griffiths et al., 2016), different behavioral outcomes do not necessarily imply fundamentally distinct cognitive processes. For example, Roediger and McDermott (2013) suggested that memory for laboratory events may be fundamentally different from memory for events of one’s life, and that they may even be considered “two types of memory.” Their arguments rest on two observations. First, behavioral evidence demonstrates a dissociation: Individuals with highly superior autobiographical memory (HSAM) are superior in autobiographical memory, but average in remembering laboratory events (Patihis et al., 2013); mnemonists or memory athletes demonstrate excellent performance in laboratory-like memory tasks, such as encoding and retrieving a long list of items, but do not have abilities like HSAM individuals in autobiographical memory (Maguire et al., 2003). 
Second, neuroimaging evidence from meta-analyses reveals that different brain networks are involved during a laboratory memory task than when people are asked to remember their life events (McDermott et al., 2009). While compelling, this reasoning potentially conflates behavioral and neural differences with differences in underlying cognitive mechanisms. Without knowing what underlying cognitive mechanisms correspond to these behavioral or neural differences, we cannot conclusively determine whether laboratory and naturalistic paradigms engage truly distinct memory processes or simply reflect different manifestations of shared underlying systems.
Extending an existing model of memory from laboratory settings to incrementally more naturalistic settings (as illustrated in Figs. 2d and 2e) can aid in this reasoning, because we can clearly distinguish the mechanisms that are carried over and generalized from the lab-based settings and the mechanisms that are extensions to account for the new results in naturalistic settings, after which we can evaluate whether the necessary adjustments to the model reflect fundamental differences between the laboratory and the naturalistic paradigms. An example of this approach comes from recent theoretical modeling work on collaborative memory (Angne et al., 2024). In real-world social contexts, people frequently recall information in groups rather than in isolation, which is the typical setup in traditional laboratory studies. We might want to know whether our memory-search process functions fundamentally differently during social interactions compared with recalling information alone. Empirical evidence suggesting potential differences includes the counterintuitive collaborative-inhibition effect, in which groups recall less information collectively than the same number of individuals recall separately (Kelley et al., 2012; Rajaram & Pereira-Pasarin, 2010; Weldon et al., 2000). Although several verbal theories have been proposed to explain collaborative inhibition (Basden et al., 1997; Hyman et al., 2013), these explanations have not been formally connected to existing theories of memory developed under laboratory conditions where there is no collaboration. This theoretical gap leaves open the question of whether collaborative and individual recall engage fundamentally different cognitive mechanisms or reflect variations of the same underlying processes.
To address this question, Angne et al. (2024) connected both literatures under the same theoretical framework by extending a model of individual recall, the CMR model, to capture collaborative-recall processes. The CMR model has successfully explained various individual-recall patterns (Howard & Kahana, 2002a; Polyn et al., 2009a) by theorizing how items become associated with different states in a context space and are subsequently retrieved from this space. Critically, with minimal model adjustments, it was shown that the same fundamental processes in the model govern how people search their memory individually and collaboratively. In both cases, each new recall is influenced by the context of the previous recalls (e.g., after recalling “apple,” one is more likely to recall contextually similar items like “banana”). The key difference is that in individual recall, one’s retrieval process is driven solely by the context of one’s own previous recalls, whereas in collaborative recall, retrieval is additionally influenced by the context of others’ recalls through the same context-updating process. By applying model parameters that were fitted to the individual-recall condition, the extended model successfully predicted collaborative-recall behavior (Angne et al., 2024): As recall unfolds, minds (contexts) within a collaborative group become more aligned or synchronized with each other, and thus individuals miss opportunities to recall unique information that others may not have considered, giving rise to the collaborative-inhibition effect. This modeling approach not only provides an intuitive explanation of the empirical results observed in collaborative recall but also unifies these findings under the same theoretical framework that explains individual-recall behavior. 
This unification supports the important role of context as a shared mechanism across individuals and group settings and offers precision in linking cognitive processes underlying laboratory versus naturalistic paradigms.
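The claim that contexts within a collaborative group become more aligned can be illustrated with a small simulation. It is a sketch under strong simplifying assumptions rather than the published model: two people start in orthogonal private contexts, every recalled item is a one-hot input, the recall sequences are fixed in advance instead of being generated by the model, and the drift rate `beta` of 0.5 is arbitrary.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def drift(context, c_in, beta):
    """Blend old context with a new input; rho keeps the result unit length."""
    d = dot(context, c_in)
    rho = math.sqrt(1.0 + beta ** 2 * (d ** 2 - 1.0)) - beta * d
    return [rho * c + beta * x for c, x in zip(context, c_in)]

def final_alignment(recalls_a, recalls_b, n_items, beta=0.5, collaborative=True):
    """Cosine similarity between two people's contexts after recall."""
    size = n_items + 2                      # items plus two private units
    ctx_a = [0.0] * n_items + [1.0, 0.0]    # person A's private start state
    ctx_b = [0.0] * n_items + [0.0, 1.0]    # person B's private start state

    def item(i):
        return [1.0 if j == i else 0.0 for j in range(size)]

    for a, b in zip(recalls_a, recalls_b):
        ctx_a = drift(ctx_a, item(a), beta)      # A's own recall
        ctx_b = drift(ctx_b, item(b), beta)      # B's own recall
        if collaborative:
            ctx_a = drift(ctx_a, item(b), beta)  # A hears B's recall
            ctx_b = drift(ctx_b, item(a), beta)  # B hears A's recall
    return dot(ctx_a, ctx_b)  # both vectors are unit length, so this is cosine

alone = final_alignment([0, 1], [2, 3], n_items=4, collaborative=False)
together = final_alignment([0, 1], [2, 3], n_items=4, collaborative=True)
```

Because the collaborative condition reuses the same `drift` function and differs only in which recalls feed into it, the sketch mirrors the idea that group recall extends, rather than replaces, the individual-recall mechanism. Higher alignment means the two people's contexts cue overlapping items, which is the proposed route to collaborative inhibition.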
Connection to Related Approaches
Verbal theories versus formal modeling
Advocating for formal modeling in this article is not intended to dismiss or replace the development of verbal theories. In fact, verbal theories often precede and guide the formulation of formal models. The primary advantage of modeling lies in its ability to add precision to an initially verbally formulated theory and better connect it with hypotheses and data. The idea that formal models can serve such a bridging function is not new. Computational modeling forces researchers to explicitly document their assumptions and remove ambiguity, a process that helps “safely remove a theory from the brain of its author” (Guest & Martin, 2021, p. 2). Computational modeling also makes clear which kinds of experimental data would validate or invalidate a given theory, however intuitively compelling it is (Hintzman, 1991). Despite the many advantages of modeling, it is also worth acknowledging that not all areas of memory research have carried a long tradition of formal modeling, and it is possible to formulate verbal theories that serve similar purposes. Researchers can always strive to articulate their theories more precisely. A formal theory does not have to be expressed in fully computational and mathematical terms; it can exist in various levels of abstraction (Oberauer & Lewandowsky, 2019). For example, adding precision to a theory can mean implementing the theory as a computer program that simulates detailed learning behavior (Pavlik & Anderson, 2008); it can also mean creating a diagram between several variables to make their causal relationships more explicit (Glymour, 2003).
Opportunity to integrate with neuroscience approaches
In recent years, neuroscience has played an increasingly important role in studying naturalistic memory. More naturalistic paradigms can push the brain through a wider range of states, allowing researchers to identify aspects of brain function and organization that could not be observed before (Lee et al., 2020). For example, neuroimaging studies of naturalistic memory have uncovered how the brain segments continuous experiences into events (Baldassano et al., 2017; Ben-Yakov & Henson, 2018), represents narrative information (Lerner et al., 2011; Nguyen et al., 2019), and supports communication from one person to another (Nozawa et al., 2016; Zadbood et al., 2017). While this growing body of research has reflected our excitement about how naturalistic-memory paradigms can yield new insights into the brain (Jääskeläinen et al., 2021), less emphasis has been placed on how these neural findings facilitate an integrated understanding of memory in both laboratory-based and naturalistic settings. A fruitful future direction would be for neuroimaging studies to simultaneously characterize both shared and distinct brain mechanisms across laboratory-based and naturalistic paradigms and, when brain activations differ, to systematically interpret whether these differences reflect methodological choices, such as test format or richness of sensory inputs, or fundamentally distinct memory processes. Furthermore, neuroscience and formal modeling are not competing approaches; they can be integrated to serve the same goal. In the emerging field of model-based cognitive neuroscience (Turner et al., 2017), neural data can help validate a model’s mechanisms in ways that behavioral data alone cannot, while a model can guide the interpretation of neural differences by pinpointing underlying cognitive processes. Although the examples of formal modeling in this article are primarily based on behavioral data, one could use neuroimaging and formal modeling concurrently. 
For instance, in the example of collaborative recall (Angne et al., 2024), the model predicts that mental contexts within a group become more aligned. Integrating this model with neuroimaging in the future might reveal that the widely observed brain synchronization during social communication (Nozawa et al., 2016; Zadbood et al., 2017) could correspond to these synchronized mental contexts.
Alternative approaches in building generalized models
This article shares a similar goal with the concurrent work by Carvalho and Lampinen (2025): to build a generalized theory of cognition by modeling both simple lab experiments and complex naturalistic behaviors. However, we take fundamentally different, and complementary, paths to get there. Inspired by approaches and practices in machine learning, Carvalho and Lampinen (2025) proposed building complex task-performing neural-network models that work simultaneously in many tasks, trained over as wide a variety of naturalistic settings as possible. This is in contrast to the present work, which starts with existing theories developed from simple, well-controlled laboratory experiments and carefully extends them, adding one naturalistic element at a time. Although the incremental approach may appear less ambitious, it provides a tight interpretable link to existing theory at every step of increased naturalism, making it clear what model mechanisms generalize and what additional adjustments are needed. The complex models built by Carvalho and Lampinen (2025) also capture both simple and naturalistic behaviors, but the kind of theories that could be derived and reduced from such complex models would look very different from ones that are built in a bottom-up, incremental manner. Ultimately, there is no single right answer for what makes a good theory. The solution cognitive science seeks could be similar to those in machine learning, driven by explanations reduced from complex, task-performing models that predict a range of behavior. Alternatively, it could resemble solutions in physics, where a small set of core principles, developed in well-controlled laboratory settings, are incrementally refined and tested over more naturalistic, real-world settings. This article argues for the latter path, believing that our goal is to “understand how many apparently diverse empirical phenomena can arise from a small set of basic principles” (Hintzman, 1991, p. 52).
Conclusion
Our goal as memory researchers, regardless of tradition, is to develop a theory of memory that generalizes across different contexts and situations. Although naturalistic environments are rich in the representativeness of stimuli, experimental variables, and situations, an exhaustive test of a phenomenon across more and more complex and diverse settings does not automatically lead to a generalized theory. Similarly, although well-controlled laboratory experiments allow us to establish causal empirical relationships that are likely to be fundamental, these relationships are not commensurate unless we commit to developing and testing formal theories that can accumulate knowledge across these experiments. In many ways, the challenges identified in the current work reflect the broader “theory crisis” in cognitive psychology (Borsboom et al., 2021; Jamieson & Pexman, 2020; Oberauer & Lewandowsky, 2019), where coherent psychological principles are needed to tie empirical records together.
Beyond emphasizing the role of formal theory construction, the current work proposes concrete steps to link laboratory and naturalistic traditions via formal modeling. Naturalistic studies of memory are valuable in identifying components that challenge and guide further development of existing memory theories, thus evaluating their generalizability; formal models are used to incrementally extend existing theories to account for novel findings in naturalistic studies, thereby establishing explicit connections between these findings and existing theoretical frameworks. This incremental approach makes it clear which mechanisms generalize and what adjustments are needed, answering the question of whether the findings discovered in naturalistic settings are already adequately explained by existing theory or whether they represent entirely new concepts (Neisser, 1988). Critical to the collaboration of the two traditions is to reconceptualize well-controlled laboratory experiments not as artificial or opposed to the goal of naturalistic experiments, but as intermediate stages in expanding our theoretical understanding to incorporate increasingly naturalistic scenarios.
In conclusion, this article has proposed a framework to unify laboratory and naturalistic approaches in memory research, where naturalistic data serve to constrain and build a more generalized theory of memory, and formal modeling better connects naturalistic findings with existing theories. Beyond its theoretical contributions, unifying laboratory and naturalistic approaches will also contribute to our ability to develop a theory-driven way to improve memory in the future. The current disconnect between theoretical work and practical applications has arisen, in part, from the tradition of developing memory theories and models under highly controlled laboratory environments. Bridging our knowledge from well-controlled laboratory environments to naturalistic ones simultaneously bridges the gap between our theories and the situations in which they are most relevant to be applied.
Acknowledgements
I would like to thank Vencislav Popov, Christopher Baldassano, Gregory Cox, Hongmi Lee, Richard Shiffrin, Jacob Feldman, Pernille Hemmer, and Karin Stromswold for helpful discussions, as well as anonymous reviewers for their valuable feedback.
Transparency
Action Editor: Zhicheng Lin
Editor: Arturo E. Hernandez
