Abstract
Background.
Wargaming has a long history as a tool for understanding the complexity of conflict. Although wargames have shown their relevance across topics and time, the immersive nature of wargames and the guild-like communities that surround them have often resisted the social scientific advances that occurred alongside the evolution of warfare. However, recent work raises new possibilities for integrating wargaming practices and social scientific methods.
Purpose.
Develop the experimental wargaming method and practice. Prioritizing the focus on iteration, control, and generalizability within experimental design can provide new opportunities for wargames to answer broader questions about decision-making,
Method.
The International Crisis Wargame developed in 2018 demonstrates the viability of experimental wargaming, and models the process of theorizing, designing, developing, and executing these wargames. It also identifies what makes games more or less experimental and details how experimental design influenced choices in the game.
Conclusion.
Experimental wargames are a promising new tool for both the
Background
Wargames have evolved to answer questions across historical context, wars, and technological changes (Caffrey, 2019; Kriz, 2017; Smith, 2010; Wintjes, 2015; Leonard, 2019). Can the social scientific advances that occurred alongside the evolution of wargaming inform game design (Bartels, 2018, pp. 37–39; Barzashka, 2019; Compton, 2019; Judge, 2020)? Wargames are often characterized as subjective (Perla, 2016; McGrady, 2019). This represents a strong belief among many in the wargaming community that the best wargames privilege realism, external validity, and inductive processes over more positivist and internally valid methodologies proffered by modern social science research (Bartels, 2017; Rubel, 2006). However, recent work on the integration of experiments within wargaming suggests wargames can utilize social scientific methods, and prioritizing iteration, control, and generalizability within experimental design can provide new opportunities for wargames (Jensen & Banks, 2018; Lin-Greenberg, 2018; Reddie et al., 2018; Schneider, 2017; Schneider et al., 2017).
Purpose
Although much of the academic work that integrates experimental design into wargaming is new, experimental applications within wargames are not novel (Bloomfield & Whaley, 1963; Mandel, 1977). Military gaming used experimental methods in the past, though more recent experiments are focused on the process of validating operational concepts, not a method for testing policies or exploring crises (McCue, 2003; Fong, 2006). Similarly, social scientists have used games for decades to understand behavioral phenomena (Johnson et al., 2006). Despite the interest in experimental wargames, there is little work to help us understand what makes games experimental, when to use experimental versus observational games, and how researchers consider internal and external validity in experimental game design (Lin-Greenberg et al., 2020).
To begin a more rigorous application of experiments within wargaming, this article introduces an explicitly experimental wargame design, the International Crisis Wargame (ICWG), and uses it to illustrate the design choices that researchers must make when utilizing experiments within wargames. ICWG features treatments and controls of a two-scenario, one-sided simulation designed to test hypotheses about the use and impact of cyber exploits and vulnerability on nuclear use. Although the game can answer a substantive question about cyber and crisis decision-making, it is shown here as an example of how game designers can use experimental design to inform game development regardless of the substantive question.
This article begins by defining experimental wargames and then breaking down their design process. It then introduces the ICWG as a design case study. Finally, it concludes with a discussion of the future of experimental wargaming as a methodology and extensions of ICWG.
Experimentation and Wargaming
Social scientists have renewed interest in experimental wargaming (Reddie et al., 2018; Lin-Greenberg, 2020; Schneider, 2017). So, what makes a wargame experimental? And why would researchers or wargame designers use experimental design when developing wargames?
Morton and Williams (2010) define experiments as “when a researcher intervenes in the data generating process by purposely manipulating elements” (p. 44). Experiments can occur in labs, through surveys, or in the field, but fundamentally they attempt to control confounding factors in order to narrow in on causal mechanisms (or, why x leads to y). Researchers using experimental design make decisions about sample, iteration, and scenarios in order to limit these confounding observable and non-observable factors, thus maximizing internal validity (McDermott, 2002). Experimental design, therefore, privileges internal validity, or the “ability to draw confident causal conclusions from the research,” (Schram, 2005) over external validity, or “the approximate truth of the inference or knowledge claim for observations beyond the target population studied” (Morton & Williams, 2011, pp. 274–275). As Rose McDermott (2011) explains, “internal design comes first.…Without first establishing internal validity, it remains unclear what process should be explored in the real world.…External validity follows, as replications across time and populations seek to delineate the extent to which these conclusions can generalize” (p. 28).
To understand the distinction between internal and external validity and its importance for game design, imagine a scenario where a six-sided die is thrown by 100 people in a room; an even number is rolled half of the time. Now imagine an experimenter tests this phenomenon by creating a lab-version of the same room and die, but draws players from a different population than the true population. If significant differences in how these players throw affect the outcome, then this design might not be externally valid. But overall, the die should have similar probability outcomes even with the different sample. Imagine the same group that threw the original die replicates the same throws in the same original room, but this time the die is seven-sided. They will get invalid results because the construct of the experiment is incorrect. Therefore, when trying to understand causal mechanisms, internal validity must be achieved, or questions of external validity become moot.
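The arithmetic behind the die analogy can be checked with a short sketch. It assumes a fair die with faces numbered 1 through n, which the text implies but does not state; the function name `p_even` is hypothetical.

```python
from fractions import Fraction


def p_even(sides):
    """Exact probability that a fair die with faces 1..sides shows an even number."""
    evens = sum(1 for face in range(1, sides + 1) if face % 2 == 0)
    return Fraction(evens, sides)


print(p_even(6))  # 1/2
print(p_even(7))  # 3/7
```

With the seven-sided die, the same players in the same room can no longer reproduce the one-half result, whatever the sample: the instrument itself has changed what is being measured.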
What then makes a game more or less experimental? And must games hold every factor constant to be designated “experimental”? “Experimental” is not a binary characteristic for wargames; it is based on choices made to prioritize internal versus external validity. The most controlled can resemble conventional lab experiments with an emphasis on randomizing samples, control and treatment groups, iterations conducted in a controlled environment, and a high abstraction of game elements not related to the manipulations (Falk & Heckman, 2009). These highly experimental wargames may privilege internal validity over more detailed scenarios, non-random expert samples, and iterations over time or at different locations.
However, less rigidly controlled games may look more like artefactual field experiments with non-standard subject pools or framed field experiments with non-standard subject pools leveraging their unique skills and knowledge – similar to a true field experiment (Harrison & List, 2004). These games, for example, may hold the scenario or treatments constant, but vary players, locations, or situational context (Morton & Williams, 2010, pp. 301–306; Gerber & Green, 2012, pp. 9–11). This kind of trade-off substitutes confounding variable control for generalizability and sample heterogeneity.
Therefore, when wargamers seek to use experimental design, they must weigh trade-offs between control and immersion, the role of sample in design, and the challenges of iteration and generalizability (Gerber & Green, 2012, pp. 13–25). The big difference between experimental and observational methods is that experimental design makes these choices deliberate and uses deductive reasoning to manipulate game design, centralizing causal mechanisms for research questions; it tests hypotheses. In contrast, observational games may use an inductive method to generate hypotheses that are not evaluated for their validity within the game (Bartels, 2019; Morton & Williams, 2010, pp. 42–46). Wargamers testing why an outcome occurs and in what situations will find experimentally designed wargames useful whereas observational games are better suited to understand the universe of possibilities for a given question.
International Crisis Wargame
To demonstrate some of the choices made in experimental wargame design, below we detail an experimental game, The International Crisis Wargame (ICWG). ICWG uses explicitly experimental choices to test hypotheses about cyber and nuclear stability derived from current cyber literature. Developed in December 2017, it was first executed with senior officials at a Department of Energy-sponsored event in May 2018. Since then, ICWG has over one hundred iterations in a dozen locations (including virtual) and over five hundred players across the globe.
Variables, Hypotheses, and Experimental Intervention
Game design began with a research puzzle: How do cyber operations affect nuclear stability? Our research question led to the first step in experimental design (Figure 1), designating a clear independent variable and dependent variable. Cyber operations were our independent variable, which we qualified even more narrowly as cyber exploits into and vulnerabilities within the Nuclear Command, Control, and Communications (NC3) system (Office of the Deputy Assistant Secretary of Defense for Nuclear Matters, 2020). Our dependent variable was nuclear use, which we defined as the employment of nuclear weapons without specifying yield or distinguishing between electromagnetic pulse or nuclear strikes.
Figure 1. Phases of experimental wargame design.
After identifying our variables, we generated hypotheses. The existing literature had no empirical evidence about how our independent variable influenced nuclear use (thus necessitating games), but it did feature hypotheses about how cyber operations might affect stability (Acton, 2018; Borghard & Lonergan, 2017, 2019; Cimbala, 1999; Gartzke, 2013; Gartzke & Lindsay, 2017; Lindsay, 2015; Nye, 2017; Slayton, 2017; Stoutland & Pitts-Kiefer, 2018; Unal & Lewis, 2018). Based on the literature, we identified four hypotheses about nuclear use.
During a crisis, cyber operations against NC3:
1. Create windows of vulnerability that create incentives for preemption and deliberate escalation.
2. Create insecurity that leads to nuclear alert or predelegation of weapons platforms that increase the chance of accidental or inadvertent nuclear use.
3. Create beliefs about mutual vulnerability, shoring up deterrence and ultimately leading to restraint.
4. Play little role in decisions to use or not use nuclear weapons.
These hypotheses required answers to two questions: how would players behave if an adversary potentially targeted their NC3 during a crisis? And how would players behave if they had the ability to target their adversary’s NC3? Mechanically, this would require two different experimental treatments in the wargame: a cyber vulnerability in the players’ NC3 that an adversary could exploit, and possession of offensive exploits or access to an adversary’s NC3 that players could use. The two treatments generated four experimental conditions to which player groups were randomly assigned to control for bias (Table 1).
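The crossing of two treatments into four conditions, and the random assignment of groups to them, can be sketched as follows. This is a minimal illustration and not the procedure the game used: the names `TREATMENTS`, `CONDITIONS`, and `assign_conditions` are hypothetical, and dealing groups out in a balanced round-robin is an added assumption, since the text says only that assignment was random. The condition with neither treatment is presumably the control.

```python
import random
from itertools import product

# The two treatments described in the text; each is either present or absent.
TREATMENTS = ("exploit", "vulnerability")

# Crossing them yields the four experimental conditions (neither, one, the other, both).
CONDITIONS = [dict(zip(TREATMENTS, flags)) for flags in product((False, True), repeat=2)]


def assign_conditions(group_ids, seed=None):
    """Randomly assign each group to one condition, keeping cell sizes balanced."""
    rng = random.Random(seed)
    shuffled = list(group_ids)
    rng.shuffle(shuffled)
    # Deal the shuffled groups round-robin so cell sizes differ by at most one.
    return {gid: CONDITIONS[i % len(CONDITIONS)] for i, gid in enumerate(shuffled)}


# Eight hypothetical groups; a fixed seed makes the assignment reproducible.
assignment = assign_conditions([f"group_{n}" for n in range(1, 9)], seed=2018)
for gid, condition in sorted(assignment.items()):
    print(gid, condition)
```

Recording the seed alongside the assignment lets an analyst later confirm that no facilitator judgment entered into which group received which handout.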
The hypotheses and experimental treatments shaped the wargame scenario. The experiment requires two states with nuclear capabilities and sufficiently digitized NC3 systems to be impacted by cyber operations. Furthermore, these treatments assume the cyber operations occur in the context of a crisis or conflict where nuclear capabilities may be relevant. Players then respond to the crisis with provided options, including nuclear and cyber operations options as well as other credible alternatives.
Table 1. Treatment groups based on Access/Exploit and Vulnerability.
Game Design
ICWG prioritized internal validity and control but also sought to iterate over time with a large and heterogeneous sample to create generalizable findings. This decision led to a simple game design with an abstract scenario that could be executed many times in different locations with limited logistical support. Consequently, ICWG is a one-sided, one move, two scenario wargame. The two scenarios represent a low-intensity (Scenario One) and high-intensity (Scenario Two) version of the same crisis. The scenarios are sequential, but actions taken in Scenario One do not impact Scenario Two. Although this simple design may have sacrificed the player immersion provided by a more complex game, it eliminated the need to develop a complex adjudication model and allowed for cross-game comparison without significant variation. Fortunately, post-game surveys suggested players were heavily invested in game outcomes—even with the simple design. This game design supported a narrow research question about immediate responses to threats versus a more complicated question on escalation dynamics.
Players and Groups
Players were primarily recruited from a non-standard subject pool with elite decision-making experience (government, private sector, or military) or nuclear or cyber expertise. In recruiting a heterogeneous expert sample, we aimed to replicate a cabinet of policymakers with diverse expertise, similar to a head of state’s inner circle. As the game was iterated, the sample became more heterogeneous, allowing for a post-hoc examination of how expertise might matter to game outcomes.
Players are assigned randomly to groups of 4-6 players, remaining with that group for both scenarios. The game uses groups (vice individuals) to simulate the most representative decision-making for national security crises (Gvosdev, 2017). This introduces an important intervening variable in our research design: group dynamics. Accounting for this intervening variable in data collection and post-hoc analysis is therefore important to design.
To encourage engagement and broaden thinking, players have roles within a notional national security cabinet for an undisclosed system of governance. Players get a brief description of the roles (Online Appendix A) and then pick roles they keep throughout the wargame. The roles conferred no special information, privileges, or powers aside from those the players collectively gave them. The roles serve a primarily dramaturgical and organizational purpose, increasing the game’s immersive quality and providing players a way to organize their involvement in the game.
Scenario
The scenarios introduce a crisis between two states: Our State, whose cabinet the players represent, and Other State, a notional adversarial state. The two scenarios intentionally create threatening crises, raising the specter of nuclear weapons use (especially in Scenario 2)—forcing players to consider nuclear dynamics. If the crisis never reached the threshold where nuclear weapons were potentially relevant, the experimental treatment would be irrelevant.
Our State and Other State are highly abstract, their names selected to discourage easy analogy or caricature, but emphasizing their allegiances. The two countries are geographic neighbors (Online Appendix A) and have a contentious history, particularly over the Gray Region, a semi-autonomous area of Our State. Gray Region’s citizenry is equally divided between Our State and Other State, further raising tensions. The two states are diplomatically, economically, and militarily symmetrical—a choice made to control for the interesting but confounding variable of power asymmetries. This includes cyber and nuclear capabilities, except for the NC3 access or vulnerability provided by the treatment conditions.
Crisis
The distinct low-intensity and high-intensity scenarios produce increasing threat, pressure to escalate, and encourage difficult deliberations about using nuclear capabilities or, to groups with the relevant treatment assignment, their special cyber capability/vulnerability. The crisis in Scenario One begins when Other State seizes control of Gray Region during the protests and subsequent violent riots commemorating the anniversary of Gray Region’s semi-autonomous status (Online Appendix A). Players are led to believe the protests and following intervention were premeditated. This scenario was designed to raise questions about territorial integrity and risks of potential escalation. Players are supplied Other State’s objectives and nuclear red lines, providing control over the degree of uncertainty regarding Other State intentions.
The crisis in Scenario Two is an escalated version of the first scenario. Other State has launched an invasion of Our State, taking land along the border, with reports of looting and killing of civilians. Other State issues fiery rhetoric claiming they would use nuclear weapons to protect their territorial acquisitions, reinforcing that claim by preparing nuclear weapons for potential use. Players are also presented with the uncertain prospect that Other State may continue to seize territory. These pronouncements strongly imply that the survival of Our State is at risk and include many aspects that are believed to provoke nuclear escalation. This scenario creates a strong emotional response through unprovoked aggression, hostile rhetoric, civilian casualties, and uncertainty.
Structure and Execution
To maximize control between iterated games, the execution of the wargame is extremely regimented. The game adheres to a strict schedule that takes approximately three hours (Online Appendix B). The wargame is executed by a facilitator who reads prompts, distributes necessary materials, and collects player materials at specific times directed by the facilitator script (Online Appendix B). Each scenario has an initial briefing, information distribution, player decision-making, and survey phase.
Players are provided an initial, verbal wargame briefing reminding players of experimental protocol and their rights as participants, instructing wargame play, and describing Scenario One and its crisis. Because of the experimental nature of the game, facilitators cannot provide any additional information outside the script and predetermined player materials.
Introducing the experimental treatment effectively and unobtrusively was critical to the experiment and game design. Players are given a two-page Additional Intelligence handout and instructed to review with the other materials. Each page of the handout updates the group on their access/exploit or vulnerability status (Online Appendix A); this is the experimental treatment. Players must receive and internalize this information or risk invalidating the game. Conversely, if the process were too obvious it would risk unduly influencing player behavior. Introducing the experimental treatment alongside other information while still allocating time for review was deemed an appropriate compromise.
In Scenario One, the players pick their roles and receive the Additional Intelligence, a Player Briefing handout, and the map. The Player Briefing reinforces the information from the initial brief and provides some more detail on the crisis. Players are told each Player Briefing handout is identical, except the labeled role. Once players have reviewed the information, they address the crisis by completing the Response Plan as a group and then individually take surveys.
Scenario Two progresses the same, beginning with a briefing to the players about the new crisis; all other information remains the same. The changes briefed to the players are distributed as the Scenario Two Update for reference. The players are told that the two scenarios are separate events, with the actions taken in Scenario One not affecting Scenario Two. Finally, players complete a new Response Plan for Scenario Two and then individually take the same survey again, but with additional questions about group dynamics.
Data Capture
The ICWG has three mechanisms for data capture: Response Plans, player surveys, and facilitator notes (Online Appendix B). Groups respond to the crisis by developing a course of action and filling out the Response Plan (Figure 2). Players are told the Response Plan is not a long-term planning document, but their immediate response to the crisis. The Response Plans are the primary data capture mechanism created to measure the experimental treatment effects.
Figure 2. Response Plan with indicated sections.
In the first section, players are asked to describe Our State’s course of action. The instructions specify that actions can be as simple as “we will do X to accomplish Y,” or more complex. It is intended as a narrative explanation of the plan. In section two, players indicate the actions that comprise their plan. This section is the primary data collection mechanism of the Response Plan. While the narrative section provides context, this section requires players to indicate what actions their plan explicitly entails. This structured section encourages quantitative cross-game comparison, unlike the free-text narrative which provides qualitative context.
The response actions represent an array of behaviors available to Our State. The categories are broad, with the intent to allow players to identify all major tools they plan to use in response to the crisis and standardizing the tools available across games. These options were generated by examining common player actions in other strategic level political-military games.
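One way to see why a structured action section supports cross-game comparison is to encode each plan as a fixed set of binary indicators. The sketch below is illustrative only: the category names in `ACTION_CATEGORIES` are hypothetical placeholders rather than the options printed on the actual Response Plan, and `encode_response_plan` is an assumed helper.

```python
# Hypothetical action categories; the real options appear on the Response Plan itself.
ACTION_CATEGORIES = ("diplomatic", "economic", "conventional_military", "cyber", "nuclear")


def encode_response_plan(checked_actions):
    """Turn the set of actions a group checked into one 0/1 indicator per category."""
    unknown = set(checked_actions) - set(ACTION_CATEGORIES)
    if unknown:
        # Rejecting unlisted actions keeps every game coded against the same menu.
        raise ValueError(f"Unrecognized action categories: {sorted(unknown)}")
    return {action: int(action in checked_actions) for action in ACTION_CATEGORIES}


plan = encode_response_plan({"diplomatic", "cyber"})
print(plan)
```

Because every plan yields the same keys in the same order, plans from different games can be stacked into one table, with the nuclear indicator serving as the dependent variable.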
The third and fourth sections address the desired end state and objectives. If the first section is the narrative, the end state is the desired conclusion to the story. In the last section, players rank, from most to least significant, their motivating objectives, and exclude any irrelevant objectives, of their Response Plan from a list of options. Similar to section two, the ranked objectives allow for quantitative comparison across games.
The player surveys capture demographic information and individual perceptions of the wargame (all tied to participant ID, not attributable names). Although the Response Plan is developed collectively by the group, individual players may have different perceptions of the crisis or beliefs about the best course of action. The surveys are intended to capture those perceptions and beliefs. Additionally, the survey attempts to capture how group dynamics may have influenced the completion of the Response Plans.
Facilitator notes, while interesting and potentially insightful, provide limited analytic utility. Without extensive facilitator training and standardized behavior coding, facilitator notes serve to call attention to interesting behaviors and to identify problems, such as groups that suffered player attrition.
Data Analysis
To illustrate data analysis of an experimentally designed game, we will walk through a three-step process. First, we need to measure treatment effects on our dependent variable: nuclear use. There are a few ways to do this. The researcher could perform an analysis of variance, which tests for a statistically significant difference between groups. This tells the researcher two things: is there a significant difference between the control and treatments? And are the treatments significantly distinct from each other? If the answer to the first question is yes, then cyber (whether exploit or vulnerability) matters. The answer to the second indicates which treatment conditions were more influential in nuclear use. Alternatively, a researcher could conduct a t-test to compare treatments to control groups. Statistically significant t-tests (p < .05) suggest a treatment effect. Cohen’s d can then provide a measure of the magnitude of the effect.
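The comparison of a treatment group to the control can be sketched with the standard library alone. The outcome lists below are made-up placeholders for illustration, not ICWG results, and the function names are hypothetical. The sketch computes Welch's t statistic (a t-test variant that does not assume equal variances) and Cohen's d with a pooled standard deviation; converting the t statistic to a p-value requires the t distribution, which would normally come from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(treated, control):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / sqrt(pooled_var)


def welch_t(treated, control):
    """Welch's t statistic for the difference in means between two groups."""
    se = sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    return (mean(treated) - mean(control)) / se


# Placeholder outcomes, one per group: 1 = nuclear use, 0 = no nuclear use.
control = [0, 0, 1, 0, 0, 1, 0, 0]
treated = [1, 0, 1, 1, 0, 1, 0, 1]

print("difference in proportions:", mean(treated) - mean(control))
print("Welch t statistic:", round(welch_t(treated, control), 3))
print("Cohen's d:", round(cohens_d(treated, control), 3))
```

Note that the unit of observation here is the group, since each group produces one Response Plan per scenario.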
To better understand how treatments affected the probability of nuclear use, we might turn to statistical regression analysis. Because our dependent variable is binary, we can use either a logit or probit regression to examine the treatments’ statistical effect on nuclear use. Further, regression analysis allows examination of the role of confounding factors in explaining our dependent variable. Variables like game location, expertise, and age can be approximated and accounted for within these models.
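One possible logit specification for this design is written out below. The notation is an assumption introduced for illustration: $Y_g$ indicates nuclear use by group $g$, $E_g$ and $V_g$ are indicators for the exploit and vulnerability treatments, and $\mathbf{x}_g$ collects group-level covariates such as location or expertise. Including the interaction term $E_g V_g$ is a modeling choice, not something the text prescribes.

```latex
\Pr\left(Y_g = 1 \mid E_g, V_g, \mathbf{x}_g\right)
  = \frac{1}{1 + \exp\left[-\left(\beta_0 + \beta_1 E_g + \beta_2 V_g
      + \beta_3 E_g V_g + \boldsymbol{\gamma}^{\top}\mathbf{x}_g\right)\right]}
```

Under this form, $\beta_1$ and $\beta_2$ are the changes in the log-odds of nuclear use associated with each treatment alone, and $\beta_3$ captures any additional shift when a group holds both an exploit and a vulnerability.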
These methods test the role of our treatment on the outcome, but they do not explain why groups used nuclear weapons or not. For insight into the motivations behind these outcomes, we turn to surveys. Survey data provides quantitative data about why players chose to use nuclear weapons and what role cyber vulnerabilities and exploits have on nuclear use (this data can be presented by showing percentages, means, or standard deviations, but may also be tested against dependent variables using statistical regressions). Further, surveys ask when individual beliefs deviate from the group, providing both a quantitative and qualitative measurement of individual responses that differ from group outcomes.
Finally, qualitative data includes free-form survey responses, descriptions of the crisis Response Plan, and facilitator notes. This qualitative data can be coded for themes (for example, hedging strategies vs. first-mover strategies; diplomacy vs. conventional warfare; hierarchical groups vs. collaborative groups). Analysts can use qualitative evidence to determine whether the data collected in the outcome and survey measurements represent the players’ intent. As such, they are a useful validity check against strictly quantitative measures.
Game analysts should be careful when working with data. They must be aware of the unit of analysis when compiling data: the crisis response plan (or game outcome), the group, and the individual. For example, individual motivations captured in the surveys can only explain game outcomes when aggregated within their respective groups. Further, if analysts only look at individual survey data, they may miss important group dynamics that mediate individual preferences to create substantively different game outcomes.
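The unit-of-analysis point can be illustrated by rolling individual survey responses up to the group level before comparing them with group outcomes. Everything in the sketch below is hypothetical: the survey item (support for nuclear use on a 1 to 5 scale), the participant and group identifiers, and the outcomes are placeholders, not ICWG data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey rows: (participant_id, group_id, support for nuclear use, 1-5).
survey_rows = [
    ("p01", "g1", 2), ("p02", "g1", 4), ("p03", "g1", 3), ("p04", "g1", 3),
    ("p05", "g2", 5), ("p06", "g2", 4), ("p07", "g2", 5), ("p08", "g2", 2),
]

# Hypothetical group-level outcomes taken from the Response Plans.
group_outcomes = {"g1": False, "g2": True}


def aggregate_by_group(rows):
    """Summarize individual responses within each group."""
    by_group = defaultdict(list)
    for _participant_id, group_id, score in rows:
        by_group[group_id].append(score)
    return {
        gid: {"n": len(scores), "mean": mean(scores), "spread": max(scores) - min(scores)}
        for gid, scores in by_group.items()
    }


group_level = aggregate_by_group(survey_rows)
for gid, summary in group_level.items():
    print(gid, summary, "nuclear_use =", group_outcomes[gid])
```

Keeping a within-group spread next to the mean is one simple way to flag groups where a dissenting individual may have been overridden, which a pooled individual-level analysis would hide.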
Future of Experimental Wargaming
The influence of experimental design can bear great promise for analytical wargaming. The focus on controls, treatments, and causal mechanisms limits bias and helps the game tease out why something occurs, not just what the possibilities might be. Although experimental design prioritizes internal validity, whether a game is experimental is not binary. Designers can make choices about scenarios, moves, iteration, and sample that increase external validity but decrease the overall control that the designer has over confounding variables. When these decisions are made deliberately, they can be accounted for and even sometimes mitigated in post-hoc data analysis, especially with thoughtful data collection designed to identify intra-game bias.
The ICWG design represents an experimentally designed game and illustrates how researchers must make trade-offs about internal and external validity to answer their research question. Here, we opted for control and replicability over complexity and player immersion. This allowed game reproduction over a larger sample and across time, thus creating generalizability while not sacrificing too much internal validity. Some decisions (for instance, about groups vs. individuals, no adjudication, or hypothetical country names) reflect assumptions made by the game design team about how these options would influence the value of the overall research. But ICWG is just a baseline model. Wargamers or researchers interested in international organizations, mid-level action officer decision-making, or specific regions can modify scenario details, players, and roles to better answer their research questions. The use of multiple moves, adversaries, and adjudication procedures can be added to this base model to examine multiple move dynamics and the role of player immersion in outcomes. Further wargaming research needs to be conducted to examine how these choices affect both the outcomes of games and the overall validity of these outcomes to real-world behaviors.
Supplemental material
Supplemental material for this article is available online.
Supplemental Material for Wargaming as a Methodology: The International Crisis Wargame and Experimental Wargaming by Benjamin Schechter, Jacquelyn Schneider, Rachael Shaffer, in Simulation & Gaming
Footnotes
Authors’ Note
The views presented here are the authors’ and do not necessarily represent those of the U.S. Naval War College, the U.S. Navy, or the Department of Defense.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.