Abstract
Objective:
The aim of this manuscript is to provide a review of contemporary research and applications on dynamic decision making (DDM).
Background:
Since early DDM studies, there has been little systematic progress in understanding decision making in complex, dynamic systems. Our review contributes to a better understanding of decision-making processes in dynamic tasks.
Method:
We discuss new research directions in DDM, divided into experimental and theoretical/computational approaches, to highlight the value of simplification in the study of complex decision processes. We focus on problems involving control tasks and search-and-choice tasks. In computational modeling, we discuss recent developments in instance-based learning and reinforcement learning that advance the modeling of dynamic decision processes.
Results:
Results from DDM research reflect a trend to scale down the complexity of DDM tasks to facilitate the study of the process of decision making. Recent research focuses on the dynamic complexity emerging from the interactions of actions and outcomes over time even in simple dynamic tasks.
Conclusion:
The study of DDM in theory and practice continues to be a priority area of research. New research directions can help the human factors community to understand the effects of experience, knowledge, and adaptation processes in DDM tasks. Research challenges remain to be addressed, and the recent perspectives discussed here can help advance a systematic DDM research program.
Application:
Classical domains, such as automated pilot systems, fighting fires, and medical emergencies, continue to be central applications of basic DDM research, but new domains, such as cybersecurity, climate change, and forensic science, are emerging as other important applications.
Introduction
Dynamic decision-making (DDM) research traditionally involves the study of complex tasks that are represented in computer simulations, often called microworlds (Brehmer & Dorner, 1993; Gonzalez, Vanyukov, & Martin, 2005; Gray, 2002). Such tasks include commanding a group of firefighters in an unknown environment (e.g., Brehmer & Allard, 1991; Omodei & Wearing, 1995), determining the procedures to follow in emergency situations (e.g., Joslyn & Hunt, 1998), and managing scarce resources under time constraints and workload (Gonzalez, 2004, 2005), among others. From this research, we have learned that making decisions in complex dynamic tasks is very challenging for humans. For example, people do not always improve their decisions with practice in a task (Brehmer, 1980), and their performance may remain suboptimal even with full and immediate feedback, unlimited time, and high performance incentives (Diehl & Sterman, 1995; Sterman, 1994). People are generally poor at handling systems with long feedback delays (Brehmer, 1992; Sterman, 1989), and they have difficulty learning in situations involving environmental constraints, such as workload and time pressure (Gonzalez, 2004, 2005; Kerstholt & Raaijmakers, 1997). Unfortunately, highlighting suboptimal performance and the poor strategies humans use in these tasks does not reveal how people actually make decisions or which basic processes are involved, and this knowledge is needed to improve decision making (Gonzalez, Lerch, & Lebiere, 2003; Hotaling, Fakhari, & Busemeyer, 2015).
Identifying the boundaries of decision making in complex, dynamic tasks is only a motivation toward understanding how these difficulties emerge. A revelation of years of research with DDM tasks that are structurally complex (i.e., they consist of a large number of alternatives, high time constraints, and high uncertainty) and tasks that are structurally simple (i.e., they have few alternatives, no time constraints, and little uncertainty) is that dynamics and complexity of human behavior exist even in tasks that appear simple. Simple tasks can have dynamic complexity, which emerges from the relationship between choices and their effects over time, from the sequential nature of these interdependencies, and from the various lags between actions and their effect on the environment (Sterman, 1989; Gonzalez, 2017). Furthermore, dynamically complex tasks that are structurally simple are very common in many daily life situations. For instance, a diabetic patient may have difficulty controlling the different speeds and delays of choices he or she makes even when the alternatives are limited (e.g., taking insulin, taking some sugar by mouth or drink; Brunstein, Gonzalez, & Kanter, 2010); a witness viewing a police lineup may be influenced by the order in which the pictures are presented and by the time spent observing each individual picture, even when there are limited options in the lineup (Wells & Olson, 2003); and searching for the best partner is influenced by whom we meet and spend time with, before we make a marriage decision (Todd, 1997). The study of dynamic complexity helps in understanding basic decision-making processes, and it is perhaps the most important source of difficulty in DDM.
The simplification of dynamic tasks has occurred not only in laboratory studies but also in theoretical developments. Interests have emerged in using computational representations of cognitive processes to elucidate the cognitive mechanisms by which people make decisions in dynamically complex tasks (Busemeyer, 2002; Busemeyer & Pleskac, 2009; Busemeyer & Townsend, 1993; Dienes & Fahey, 1995; Gibson, Fichman, & Plaut, 1997; Gonzalez et al., 2003). Although existing models differ in many aspects, they highlight common processes in DDM: learning from exploration, experiential-based decisions, sequential search for information and search through alternatives, and feedback processing and delays. Recent and future research should make connections between these processes and the behavior in dynamically complex tasks.
In what follows, we provide a brief review of laboratory studies and theoretical developments focusing on those that contribute explanations and formalization of decision-making processes in dynamic environments. We start with a definition of DDM, present a synthesis of recent experimental results, review major theoretical advancements and computational approaches, and end with discussion of applications and trends in DDM research.
DDM: A Continuum of Dynamics and Complexity
Early definitions of DDM rely on a distinction between static and dynamic decisions (Edwards, 1954, 1961, 1962; Rapoport, 1975; Toda, 1962). Static decisions are characterized by a single choice and are often conceptualized as linear processes: one observes explicit alternatives and makes a decision but cannot learn from the consequences of that decision (Gonzalez, 2012, 2013; Rapoport, 1975). Alternatives in typical static decisions are often described by probabilities and likelihoods. A choice between an alternative that gives $3 for sure and one that gives $4 with probability 0.8 and $0 otherwise is an example of a static decision.
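As a worked illustration of the static example above, the two alternatives can be compared by expected value. Expected value is used here only as a familiar benchmark; it is an assumption of this sketch and not a claim about how people actually choose.

```latex
\begin{align*}
EV(\text{sure})  &= 1.0 \times \$3 = \$3.00 \\
EV(\text{risky}) &= 0.8 \times \$4 + 0.2 \times \$0 = \$3.20
\end{align*}
```

Everything needed for this comparison is stated up front, and nothing about the outcome of the choice feeds back into a later decision, which is what makes the problem static.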
Dynamic decisions, in contrast, involve a sequence of choices made in an environment that can change exogenously or as a function of previous choices and where decisions are sequentially linked to each other through their effects so that an action at a specific time directly or indirectly influences future actions (Brehmer, 1992; Busemeyer, 2002; Edwards, 1962; Gonzalez, 2017). Consider our previous example on searching for the best partner. Whether or not we continue to see a person affects our chances to meet a better/worse candidate.
However, DDM exists as a continuum of dynamics and complexity. As described in Edwards’ (1962) taxonomy of DDM, dynamic environments may involve various degrees of change, where alternatives may vary independently from external events or endogenously (as a result of decisions made previously). Dynamic environments vary in their inclusion of delayed feedback, interlinked actions and their effects over time, and time dependence, where the value of actions is determined by when an action is taken. The accumulation of these characteristics makes an environment dynamic and complex to different degrees. Although not all DDM tasks involve all of these characteristics, every dynamic decision task must involve a series of choices taken over time to achieve some overall goal. Conceptually, dynamic decision making is a closed learning loop in which decisions are informed by the results of previous choices and their outcomes (Gonzalez, 2017; Gonzalez et al., 2003).
Results of Behavioral Studies
Experimental research that helps in understanding basic decision processes can be classified into two groups: system control and search and choice. The system control approach presents DDM as a task aimed at maintaining a system in “balance” over time by reducing the gap between the state of the system and a target state. The search-and-choice approach presents DDM as a sequential task in which the goal is to maximize the total utility (e.g., score, reward, money) over the long run after a sequence of choices.
System Control
The control approach of DDM has its origins in research conducted in industrial, engineering, and managerial situations (Forrester, 1961; Pew & Baron, 1978; Rapoport, 1975; Sterman, 1989; Wickens & Kramer, 1985). A common task is that of a manager controlling the level of inventory to a target or within acceptable ranges in a dynamic stock management task (Sterman, 1989). This task can be accomplished by altering the inflows (e.g., fulfilled orders from suppliers, which increase the stock) and outflows (e.g., sales to customers, which decrease the stock) to counteract the environmental disturbances that push the stock away from its desired value (e.g., delays from suppliers and irregular demands). Control tasks are common at the societal, organizational, and individual levels (Cronin, Gonzalez, & Sterman, 2009): humanity struggling to stabilize the concentration of CO2 in the earth’s atmosphere, decision makers making production and sales decisions to maintain an optimal inventory, and a diabetic patient making decisions about the consumption of sugar and use of insulin to maintain an optimal level of sugar in the blood.
Early research involved tasks with high structural and dynamic complexity (Sterman, 1989; Gonzalez et al., 2003). For example, the Water Purification Plant task involved a large set of alternatives, high time constraints, high uncertainty, and large dynamic complexity given the interrelationship of decisions over time (Gonzalez, 2004, 2005). These tasks have shown that it is very difficult for humans to reach and maintain optimal control of a dynamic system, even after extended practice (Diehl & Sterman, 1995; Gonzalez, 2005; Paich & Sterman, 1993; Sterman, 1989, 1994). These difficulties arise from limited cognitive capacity to respond to delayed feedback (Diehl & Sterman, 1995; Sterman, 1994) and the tendency to rely on context-specific knowledge (Gonzalez, 2004, 2005; Gonzalez et al., 2003). Increased feedback delays between decisions and corresponding outcomes negatively affect long-term performance in dynamic control tasks (Einhorn & Hogarth, 1978; Gonzalez, 2005; Kleinmuntz, 1985; Sterman, 1989). Some research has concluded that people do not learn to control dynamic systems because they misperceive the feedback (Sterman, 1989), whereas others suggest that outcome feedback may be insufficient and that other levels of feedback (e.g., process feedback, or an explanation of how the outcome emerged) are needed for people to learn to control a dynamic task (Gonzalez, 2005; Kluger & DeNisi, 1996; Lerch & Harter, 2001). Research also suggests that extended practice is often required for improved decision making (Gonzalez et al., 2003; Kerstholt & Raaijmakers, 1997; Martin, Gonzalez, & Lebiere, 2004), but it is clear that practice alone does not necessarily lead to better decisions (Brehmer, 1980; Gonzalez, 2004). Clearly, more research is needed to shed light on these processes.
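One hypothetical way in which a feedback delay can degrade control is sketched below. The task, the function name `run_stock_control`, and all numbers are assumptions made for illustration and are not taken from the cited studies. A decision maker fills a stock toward a target, orders arrive after a fixed delay, and there is no outflow. A controller that reacts only to the visible gap keeps ordering while earlier orders are still in transit and overshoots; a controller that also counts orders in transit does not.

```python
def run_stock_control(target, delay, steps, account_for_pipeline):
    """Simulate ordering toward a target stock when deliveries arrive `delay` steps late.

    Assumes delay >= 1 and no outflow (illustrative simplification).
    """
    stock = 0
    pipeline = [0] * delay  # orders placed but not yet delivered
    history = []
    for _ in range(steps):
        stock += pipeline.pop(0)          # the oldest order in transit arrives
        gap = target - stock              # visible discrepancy from the target
        if account_for_pipeline:
            gap -= sum(pipeline)          # subtract what is already on its way
        pipeline.append(max(gap, 0))      # orders cannot be negative
        history.append(stock)
    return history


ignoring = run_stock_control(target=10, delay=2, steps=6, account_for_pipeline=False)
accounting = run_stock_control(target=10, delay=2, steps=6, account_for_pipeline=True)
print(ignoring)    # stock ends at twice the target
print(accounting)  # stock settles at the target
```

The only difference between the two runs is whether the delayed consequences of earlier decisions are taken into account, which is the kind of information that delayed outcome feedback makes hard to use.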
To contribute to a better understanding of decision-making processes, recent research has reduced control tasks to their fundamental elements (one stock, one inflow, and one outflow) and asked for judgments about the relations between these elements over time (Cronin et al., 2009; Cronin & Gonzalez, 2007; Gonzalez & Dutt, 2011a; Gonzalez & Wong, 2012; Sterman, 2002; Sweeney & Sterman, 2000). Interestingly, researchers have found that even in these simplified problems, most people, often with high levels of education, perform poorly (Cronin et al., 2009). A general difficulty, termed “stock-flow failure” (SF failure), seems to stem from the erroneous human tendency to expect that the result (i.e., stock behavior) should follow the same trend as the cause (i.e., flow behavior) (Cronin et al., 2009). For example, when students were asked to estimate the trend of CO2 emissions over time needed to control an increasing accumulation of CO2 in the atmosphere, they responded by drawing an increasing trend of emissions similar in shape to the CO2 accumulation over time (in reality, emissions would need to decrease to the level of absorptions in order to control the CO2 accumulation; Dutt & Gonzalez, 2009).
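The stock-flow relation behind this error can be made concrete with a minimal sketch. The units and numbers below are hypothetical and chosen only for illustration. A stock accumulates the difference between its inflow and its outflow at each step, so it keeps rising for as long as the inflow exceeds the outflow, even while the inflow itself is falling.

```python
def simulate_stock(initial_stock, inflows, outflows):
    """Accumulate a stock step by step: next stock = stock + inflow - outflow."""
    stock = [initial_stock]
    for inflow, outflow in zip(inflows, outflows):
        stock.append(stock[-1] + inflow - outflow)
    return stock


# Hypothetical units: emissions fall steadily until they equal absorption
emissions = [10, 9, 8, 7, 6, 5, 5, 5]
absorption = [5] * len(emissions)
co2 = simulate_stock(100, emissions, absorption)
print(co2)  # rises while emissions exceed absorption, then levels off
```

In this sketch the stock stabilizes only once emissions have dropped to the level of absorption, although the emissions curve has been declining from the first step. Expecting the stock to fall as soon as the inflow falls is the pattern described as SF failure.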
Recent efforts present cognitive explanations of SF failure, suggesting that effective control of a dynamic system depends on the human ability to observe similarities among experienced patterns of behavior (Fischer & Gonzalez, 2016; Gonzalez & Wong, 2012) and on an intentional effort to attend to and be aware of information (Weinhardt, Hendijani, Harman, Steel, & Gonzalez, 2015).
Search and Choice
This view of DDM originates from the work of Edwards (1962), the research paradigms that followed (Hogarth, 1981; Rapoport, 1975), and early work characterized by the use of simulated “microworlds” (Toda, 1962). Under this tradition, research investigated the effects of real-world characteristics of decisions, such as time constraints, feedback delays, and cognitive workload, and how people deal with such environmental constraints (Gonzalez, 2004; Kerstholt, 1994; Omodei & Wearing, 1995). In this sense, DDM search-and-choice research shares some connections with naturalistic decision making (NDM; Klein, 1989; Lipshitz, Klein, Orasanu, & Salas, 2001): Both lines of research have focused on the effects of knowledge, experience, and intuition in decision making; they have investigated the effects of context and properties of a decision environment; and they have also investigated collective behaviors rather than individual behavior alone. However, a discussion of the connections between NDM and DDM is beyond the scope of this manuscript (for earlier and recent related discussions, see Gonzalez et al., 2003; Gonzalez & Meyer, 2016; Gonzalez, Meyer, Klein, Yates, & Roth, 2013).
Recent developments in the field of behavioral decision research provide new insights and opportunities for advancing our understanding of DDM processes (Gonzalez & Meyer, 2016). For example, the study of repeated and sequential decisions in the absence of explicit information is now a growing area referred to as decisions from experience (Barron & Erev, 2003; Hertwig, Barron, Weber, & Erev, 2004), as opposed to description-based decisions, in which task information is presented explicitly to participants. Again, the simplicity of new research paradigms is a major factor in this research advancement. Paradigms have emerged to study the process of search for information over time (Gureckis & Love, 2009b; Lee, 2006) and sequential choice from experience (Barron & Erev, 2003; Erev & Barron, 2005; Hertwig et al., 2004).
In sequential search, a decision is made at each stage of time whether to stop or to continue analyzing new options (as in deciding whether to purchase a house or to hire an applicant for a job; Lee, 2006). This research shows that people employ various strategies, such as searching for new options until the value of new options exceeds the advantages of the current option, or stopping the search when the current option meets the desirable characteristics (Lee, 2006; Todd, 1997). The costs and benefits of obtaining more information (Ravenzwaaij, Moore, Lee, & Newell, 2014) and the order of the cues considered are crucial to find the optimal stopping-search rule (Lee & Newell, 2011; Ravenzwaaij et al., 2014).
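The two kinds of stopping strategy described above can be sketched as simple rules. Both functions, their names, and the example values are assumptions made for illustration; they are not the rules tested in the cited studies. `satisficing_stop` accepts the first option that meets an aspiration level. `cutoff_stop` is a loose reading of the comparison-based strategy: it observes an initial set of options without choosing and then accepts the first option that beats everything seen so far. In both rules, options that were passed over cannot be recalled, and the last option must be taken if nothing was accepted earlier.

```python
def satisficing_stop(values, aspiration):
    """Return the index of the first option that meets the aspiration level."""
    for i, value in enumerate(values):
        if value >= aspiration:
            return i
    return len(values) - 1  # nothing qualified: forced to take the last option


def cutoff_stop(values, n_observe):
    """Observe n_observe options, then accept the first one better than all seen."""
    best_seen = max(values[:n_observe]) if n_observe > 0 else float("-inf")
    for i in range(n_observe, len(values)):
        if values[i] > best_seen:
            return i
    return len(values) - 1  # nothing beat the observed set: take the last option


# Hypothetical sequence of option values (e.g., scores of job applicants)
offers = [52, 61, 48, 75, 90, 66]
chosen_by_aspiration = satisficing_stop(offers, aspiration=70)
chosen_by_cutoff = cutoff_stop(offers, n_observe=4)
print(offers[chosen_by_aspiration], offers[chosen_by_cutoff])
```

With these values the aspiration rule stops early and settles for a good option, whereas the cutoff rule searches longer and finds a better one. The trade-off between the cost of further search and the chance of a better option is what makes the choice of stopping rule consequential.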
Sequential choice paradigms often involve two alternatives, each representing unknown outcomes. People make repeated decisions and observe feedback regarding the outcomes. The sequential process of exploration is studied in sampling paradigms, where participants first explore the available options before they make a single consequential choice. The process of how people adapt their choices to changing environments is studied in repeated consequential choice paradigms, in which each selection contributes to earnings and feedback is provided after each choice (Gonzalez & Dutt, 2011b). This line of research identifies at least three factors that influence human exploration processes. First, it has been found that people engage in very limited exploration before making a choice (e.g., Hertwig & Pleskac, 2008, 2010), but people search longer when they encounter a prospect of losses and when they experience variable rather than consistent environments (Mehlhorn, Ben-Asher, Dutt, & Gonzalez, 2014; Lejarraga, Hertwig, & Gonzalez, 2012). Second, people often fail to maximize payoffs and instead match their response probabilities to the payoff probabilities (Erev & Barron, 2005; Lejarraga, Dutt, & Gonzalez, 2012; Shanks, Tunney, & McCarthy, 2002). Third, people learn to adapt to changing outcomes and probability distributions, but adaptation can be slow, and it depends on cognitive parameters of the information experienced, such as the recency and primacy of the relative outcomes from different alternatives (Cheyette, Konstantinidis, Harman, & Gonzalez, 2016; Lejarraga, Lejarraga, & Gonzalez, 2014; Rakow & Miler, 2009).
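A short worked sketch shows why probability matching falls short of maximizing. It assumes a hypothetical repeated binary prediction in which one outcome occurs with probability 0.7 on every trial; the value and the function name are chosen for illustration and are not taken from the cited studies.

```python
def expected_accuracy(p_event, p_choose):
    """Expected proportion of correct predictions.

    p_event:  probability that the more frequent outcome occurs on a trial
    p_choose: probability that the decision maker predicts that outcome
    """
    return p_event * p_choose + (1 - p_event) * (1 - p_choose)


p = 0.7
matching = expected_accuracy(p, p_choose=p)      # choose in proportion to p
maximizing = expected_accuracy(p, p_choose=1.0)  # always choose the likelier outcome
print(round(matching, 2), round(maximizing, 2))
```

Under these assumptions, matching is correct on roughly 58% of trials, whereas always predicting the more frequent outcome is correct on roughly 70%.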
Theoretical Advancements and Computational Approaches
Simon (1955) and Edwards (1961) highlighted the importance of learning processes in DDM, with Edwards (1961, p. 489) suggesting that DDM and learning are closely related: “The distinction between dynamic decision processes and learning is one of emphasis, not content.” The learning process in DDM was formalized in instance-based learning theory (IBLT; Gonzalez et al., 2003). IBLT proposed that decisions in dynamic tasks are made by retrieving experiences from past similar situations and applying decisions that worked well in the past. IBLT’s most important contribution is the description of the learning process and memory mechanisms by which experiences may be built, retrieved, evaluated, and reinforced during an interaction with a dynamic environment. IBLT used the memory mechanisms proposed by a well-known cognitive architecture, ACT-R (Adaptive Control of Thought–Rational; Anderson & Lebiere, 1998), and their mathematical formulations are currently used in the implementation of computational models based on IBLT (i.e., IBL models) in tasks of various degrees of complexity, including control and choice tasks (Fu & Gonzalez, 2006; Gonzalez, 2013; Gonzalez & Lebiere, 2005; Martin et al., 2004).
A simple IBL model has emerged recently as a general approach for search-and-choice processes, whereby the rewards are learned from experience in binary-choice tasks (Gonzalez & Dutt, 2011b; Lejarraga, Dutt, et al., 2012). In this model, a simulated human (i.e., an agent) facing a choice between two options at time t would choose the option that provides the best value from experience. This concept of best value is derived from functions of memory defined in ACT-R, including the frequency of experienced events, recency, and similarity and variability of those experiences. This model captures dynamic human behavior in a large variety of sequential decision-making tasks (Cheyette et al., 2016; Glöckner, Hilbig, Henninger, & Fiedler, 2016; Gonzalez & Dutt, 2011b; Hertwig, 2015; Lejarraga et al., 2014; Lejarraga, Dutt, et al., 2012).
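The following is a minimal sketch of how such a model might be implemented; it is not the authors' code. It assumes one common formulation of the ACT-R memory equations: activation grows with the frequency and recency of an experienced outcome, noise is added to activation, and the value of an option is a blend of its experienced outcomes weighted by their retrieval probabilities. The class name, the parameter values, the optimistic default instance, and the payoff environment (the sure $3 versus $4 with probability 0.8) are all illustrative assumptions. Similarity between situations is omitted.

```python
import math
import random


class IBLAgent:
    """Minimal instance-based learning agent for repeated choice (illustrative only)."""

    def __init__(self, options, decay=0.5, noise=0.25, default_utility=5.0, seed=0):
        self.options = list(options)
        self.decay = decay               # assumed memory decay parameter
        self.noise = noise               # assumed activation noise parameter
        self.tau = noise * math.sqrt(2)  # assumed temperature for retrieval
        self.rng = random.Random(seed)
        self.t = 1                       # current trial
        # memory[option][outcome] -> trials on which that outcome was observed;
        # each option starts with one optimistic instance to encourage exploration
        self.memory = {o: {default_utility: [0]} for o in self.options}

    def _activation(self, timestamps):
        # Frequent and recent observations yield higher activation
        base = math.log(sum((self.t - ts) ** -self.decay for ts in timestamps))
        g = min(max(self.rng.random(), 1e-12), 1 - 1e-12)
        return base + self.noise * math.log((1 - g) / g)

    def blended_value(self, option):
        outcomes = list(self.memory[option])
        acts = [self._activation(ts) for ts in self.memory[option].values()]
        top = max(acts)  # subtract the maximum for numerical stability
        weights = [math.exp((a - top) / self.tau) for a in acts]
        total = sum(weights)
        return sum(w / total * x for w, x in zip(weights, outcomes))

    def choose(self):
        # Pick the option with the highest blended value from experience
        return max(self.options, key=self.blended_value)

    def learn(self, option, outcome):
        # Store the experienced outcome as an instance, then advance time
        self.memory[option].setdefault(outcome, []).append(self.t)
        self.t += 1


def payoff(option, rng):
    """Hypothetical environment: $3 for sure versus $4 with probability 0.8."""
    if option == "safe":
        return 3.0
    return 4.0 if rng.random() < 0.8 else 0.0


env_rng = random.Random(1)
agent = IBLAgent(["safe", "risky"])
choices = []
for _ in range(100):
    choice = agent.choose()
    agent.learn(choice, payoff(choice, env_rng))
    choices.append(choice)
print(choices.count("risky") / len(choices))
```

Because the agent never sees the payoff probabilities, its preferences depend entirely on the outcomes it happens to experience and on how recently it experienced them. This dependence on experience is what allows such a model to produce dynamic choice patterns in place of a fixed preference.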
Another very common approach to model learning in DDM tasks is reinforcement learning (RL; Sutton & Barto, 1998). In a typical RL problem, an agent tries to find an association between an observed outcome and the earlier actions using either its memory or environmental cues. An agent takes an action at each state (e.g., selecting an option in a binary-choice task), and the environment delivers a reward or punishment based on the action-state pair and changes the current state of the agent. Importantly, as in IBL models, an RL agent tries to estimate the dynamics of the environment by experiencing it. An agent learns how good or bad each action is, based on the reward received. These characteristics of the environment might be probabilistic or deterministic and can change dynamically over time (Busemeyer & Bruza, 2012; Busemeyer & Pleskac, 2009). The goal of RL is to maximize future rewards (estimating the value of each action based on the current reward and what could be expected in the future). A particular type of RL algorithm, called model-based RL, is able to produce accurate accounts of human behavior in DDM (e.g., in a navigation task), suggesting that people update their model of the environment after encountering changes to find the shortest path to the goal (Simon & Daw, 2011). Similar demonstrations of the ability of RL models to account for human sequential learning decisions can be found in Gureckis and Love (2009a, 2009b).
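The core idea of valuing an action by its current reward plus what can be expected afterward can be sketched with tabular Q-learning. Note that Q-learning is a model-free algorithm, used here for brevity; it is not the model-based variant discussed above. The corridor task, the function name, and all parameter values are assumptions made for illustration. The agent starts at the left end of a five-state corridor, can step left or right, and is rewarded only on reaching the rightmost state.

```python
import random


def q_learning_corridor(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                        epsilon=0.2, seed=0):
    """Learn action values in a corridor with a reward at the rightmost state."""
    rng = random.Random(seed)
    actions = (-1, +1)  # step left or step right
    goal = n_states - 1
    q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    for _ in range(episodes):
        state = 0
        while state != goal:
            if rng.random() < epsilon:
                action = rng.choice(actions)  # occasional exploration
            else:
                best = max(q[(state, a)] for a in actions)
                # break ties at random so the untrained agent does not get stuck
                action = rng.choice([a for a in actions if q[(state, a)] == best])
            next_state = min(max(state + action, 0), goal)
            reward = 1.0 if next_state == goal else 0.0
            future = 0.0 if next_state == goal else max(
                q[(next_state, a)] for a in actions)
            # move the estimate toward current reward plus discounted future value
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q


q = q_learning_corridor()
policy = {s: max((-1, +1), key=lambda a: q[(s, a)]) for s in range(4)}
print(policy)  # the learned greedy action in each non-goal state
```

Although reward is delivered only at the goal, its value propagates backward through the update rule, so after training the agent prefers stepping right even in states far from the reward.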
Applications, Trends, and Conclusions
DDM is a growing field (Fischer, Holt, & Funke, 2015), with applications for the design of decision support tools and training interventions in many domains. The use of microworlds and dynamic simulations continues to contribute insights in traditionally complex, dynamic tasks, including automated pilot systems (Jarmasz, 2006), firefighting (Barber & Smit, 2014), and medical domains (Jones et al., 2006); and new applications have emerged, including cyberdefense and cybersecurity (Ben-Asher & Gonzalez, 2015; Dutt, Ahn, & Gonzalez, 2013; Proctor & Chen, 2015), climate change (Dutt & Gonzalez, 2009, 2012; Moxnes & Saysel, 2009), and forensic science (Brewer & Wells, 2006; Dror & Cole, 2010).
But perhaps the most common application of DDM research findings and theories is in the development of training principles that apply across domains. Training recommendations follow from the results of laboratory experiments using complex DDM tasks and from cognitive models. Among others, they suggest the following: (a) allow individuals to learn at a slow pace to help them adapt successfully to greater time constraints, (b) train individuals with a diverse set of experiences in order to increase the possibilities of effective adaptation to novel situations, and (c) use reflection on an expert’s performance during training to reinforce instances of high quality, instead of reflection on self-performance or outcome feedback (see a detailed description of these applications in Gonzalez, 2012).
Although research in complex, dynamic tasks will continue to inform the boundaries of human behavior, scaled-down laboratory tasks (for instance, navigation in the real world is simplified to a computer game with a virtual room; Simon & Daw, 2011) have multiple benefits. This research trend in the use of scaled-down laboratory tasks has helped and will continue to contribute to an understanding of the processes emergent from dynamically complex tasks. By studying simple tasks, we can focus on the study of human decisions that depend on the relationships between choices and their effects over time. Search-and-choice paradigms reveal essential processes of exploration among alternatives, decisions to stop search, search for information, and learning dynamics that will help in building computational models of DDM. Simplification is a tendency in theoretical and computational modeling efforts as well. A recent IBL model built to predict performance in individual repeated binary-choice tasks (Gonzalez & Dutt, 2011b) has been applied to a variety of aspects of search-and-choice tasks at the individual and team levels (Gonzalez, Dutt, & Lejarraga, 2011; Lejarraga, Dutt, et al., 2012), and RL, a simple representation of adaptive processes, is also showing its utility in explaining human behavior in many DDM tasks.
In the future, we expect increased interest in the study of sequential search processes, human adaptation to changing environments, and dynamic control tasks. We also expect increased interest in systematically expanding the simple experimental paradigms and modeling approaches. Expansions of current research will address the challenges of naturalistic environments, such as design of modern intelligent systems and collaborative systems that can dynamically interpret and adapt to changing situations, learn to make decisions from experience, and act appropriately under adversarial situations in distributed environments.
Key Points
Dynamic decision making (DDM) research is familiar to the human factors community, but it is often studied with complex simulations representing highly complex tasks, which limits findings to the identification of human behavior problems in these tasks.
We present recent advancements in DDM that focus on understanding cognitive processes involved in dynamic tasks. A main contributor to the advancement of research in DDM is simplifying environments while maintaining dynamic complexity.
Acknowledgements
Cleotilde Gonzalez was supported by the National Science Foundation Award No. 1530479 and the Army Research Laboratory under Cooperative Agreement No. W911NF-13-2-0045 (ARL Cyber Security CRA). The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory or the U.S. government. The U.S. government is authorized to reproduce and distribute reprints for government purposes notwithstanding any copyright notation hereon.
Cleotilde Gonzalez received her PhD in information systems from Texas Tech University, her postdoc from Carnegie Mellon University, and is a research professor at the Social and Decision Sciences Department and founding director of the Dynamic Decision Making Laboratory at Carnegie Mellon University. She is an associate editor of Cognitive Science, Journal of Cognitive Engineering and Decision Making, and Decision. She is a Fellow of the Human Factors and Ergonomics Society.
Pegah Fakhari received her BSc in electrical engineering from Tehran University, her MSc in applied statistics from Indiana University, and is a PhD candidate (double major in cognitive psychology and neuroscience) at Indiana University, Bloomington. Her work is on experimental and computational models of human learning and decision making.
Jerome Busemeyer received his PhD in mathematical psychology from the University of South Carolina, his postdoc from the University of Illinois, and is Provost Professor in psychological and brain sciences, cognitive science, and statistics at Indiana University, Bloomington. He is the founding chief editor of Decision and associate editor of Psychological Review and Topics in Cognitive Science. He is a Fellow of the Society of Experimental Psychologists.
