Abstract
This article introduces and demonstrates a new methodology for searching for emergent behavior in simulation-based analysis as rare, extreme events using adaptive techniques enabled by recent advances in Bayesian machine learning (ML). The new methodology, Low-cost Adaptive exploratioN to Track down Extreme, Rare events using Numerical optimization (LANTERN), supports analysis activities in defense planning that iteratively build up the understanding of new technology and concept alternatives in complex military scenarios. Central to this process is emergent behavior: hard-to-predict but highly important behaviors that present problems or opportunities. These unexpected behaviors can be generated in complex military scenarios and are crucial to decision-making, but are often difficult to find and work with due to the expense of existing approaches for working with high-fidelity military simulation. To address this challenge, LANTERN is formulated to accelerate the discovery of emergent behavior as rare, extreme events by combining human expert understanding with new artificial intelligence (AI)-driven adaptive experimentation techniques in iterative analysis. A demonstration of the methodology is presented using military agent-based simulation scenarios developed in the Advanced Framework for Simulation, Integration, and Modeling (AFSIM). The demonstration highlights how analysis can focus directly on searching for emergent behavior and shows substantial improvements over brute-force Monte Carlo approaches.
Ongoing defense planning efforts focusing on modernization and capability development planning are using military simulation to iteratively explore new technology and concept alternatives and decide on investments that need to be made now. This includes force design activities in the United States Air Force Warfighting Integration Capability (AFWIC) that are organized to iteratively assess, develop, and evaluate alternative ways of operating in potential future military scenarios. 1 Constructive simulation is used as a core approach during the early analysis iterations to explore potential future technologies and concepts in different future military scenarios. This allows highlighting gaps in knowledge, focusing analysis on updating understanding, and perturbing the scenarios until acceptably confident with investment decisions.
In simulation-based analysis during capability design, while some simulation results may be well understood, the nonlinear and chaotic dynamics of military scenarios mean that others can be surprisingly extreme and unexpected. These emergent behaviors provide important information about what the military operations can look like and where investments may be most effective. Searching for these results is often a central goal in analysis to then be able to address problematic behavior or exploit potential opportunities. 2 Existing methods for searching for important behaviors often rely on Monte Carlo Simulation (MCS) or Design of Experiments (DoE) exploration techniques, but these approaches need to be enhanced to be feasible in cases with significant nonlinear and stochastic behavior combined with limited computational budgets. To address this problem, this work presents a methodology focused on artificial intelligence (AI)-driven adaptive analysis to accelerate finding important events.
1. Introduction
As military concepts and technologies are considered in “a campaign of learning” in the United States Department of Defense, military simulation is used to represent and analyze the dynamics in complex military scenarios. 3 For example, alternative technologies and concepts for operating effectively in Air Base Air Defense (ABAD) scenarios are considered in the study by Vick et al. 4 with recommendations for simulation activities. While simulation is often used successfully to support decision-making, as decisions continue to require working quickly with “complex and extremely challenging problems to simulate” in capability design, 2 there is simultaneously a need for the pace of analysis to be accelerated as technology and threats rapidly evolve—to “Accelerate Change or Lose.” 5
A central part of the simulation-based analysis activities completed during capability design involves supporting processes to iteratively assess the current and future options for operating in future military scenarios. 1 This involves searching for problematic scenarios and looking for promising alternatives that address the problems. During these steps, finding surprising, extreme, emergent behavior provides substantial important knowledge as critical changes in the dynamics are connected to vulnerabilities and opportunities in the military operations.
To work with emergent behavior, it is important to note that there are disagreements and contradictions in the existing research about the fundamental understanding of emergent behavior. This requires a definition of emergent behavior to be specified for the current context of military modeling and simulation (M&S). As will be discussed below, a definition of emergent behavior as rare, extreme events is synthesized for this case. Note also that this definition provides a way to make discovery practical, and acts as an inconclusive indicator of important emergent behavior. This builds off of the existing literature from systems engineering and adds a distinction using core ideas from the nonlinear systems and chaos field. With this definition, this article presents a new methodology, Low-cost Adaptive exploratioN to Track down Extreme, Rare events using Numerical optimization (LANTERN), which addresses the key challenge of searching for and finding emergent behavior in simulation-based analysis.
The new methodology for searching for rare, extreme events leverages recent advancements in AI techniques for adaptive search and experimentation using Bayesian machine learning (ML). These recent advances have been part of the increasing integration of AI-driven adaptive approaches to support scientific search and discovery. 6 For example, adaptive techniques leveraging different kinds of ML have been used for finding new superconductors 7 and new biological and chemical compounds. 8
The rest of this article is organized as follows. First, background on military M&S approaches in capability design is provided to set the context for working with emergent behavior. Then, a brief review of the relevant literature on emergent behavior in capability design and military M&S is included. This review allows a specific definition of emergent behavior to be synthesized. After that, the new LANTERN methodology for searching for emergent behavior as rare, extreme events is introduced as a set of specific steps, and a case study using military scenarios simulated with the Advanced Framework for Simulation, Integration, and Modeling (AFSIM) is presented. In the case study, new techniques for AI-driven adaptive search find rare, multi-modal extrema at approximately 3% of the cost of MCS and find 140% more rare, stochastic extrema in the given budget. Finally, a discussion of the results is provided, followed by a summary and ideas for future work.
2. Important behavior in military modeling, simulation, and analysis
The system of interest in military capability design is the operational system-of-systems (SoS) where systems such as people, vehicles, weapons, sensors, communications, and facilities interact in the operational environment and agents work to complete tasks in cooperative and adversarial concepts of operations. For example, the system could be the set of fighter aircraft—including airframes, sensors, weapons, subsystems, pilots, communications, tactics, and decision-making options—on both sides in an air-to-air combat scenario, where each side acts to find and shoot down the aircraft on the other side. Military M&S looks at what this system may, or can, look like in a future military scenario with special focus on dynamics leading to vulnerabilities or opportunities.1,9 The associated capability gaps and solutions are central in defense planning activities and investment decision-making.1,2,10
Specifically, a model—“a physical, mathematical, or otherwise logical representation”—of a military scenario implemented over time in constructive simulation involves nonlinear and stochastic interactions between entities, the result of which is a key focus in early capability design.9,11 Constructive here means the simulation focuses on simulated humans interacting with simulated systems to quickly explore many alternatives and scenarios. Typically, the simulations used in early capability design focus on engagement and mission scenarios based on decision-maker investment perspectives. 2
2.1. Nonlinear and chaotic dynamics in military simulation
The focus in this article is on agent-based simulation (ABS, also known as agent-based modeling—ABM) based on the current interest in the US Air Force,12,13 and the direct implementation of nonlinear interactions between autonomous agents that can lead to important, unexpected behaviors. The ABS approach focuses on individual entities—agents—which can autonomously perceive, assess, decide, and act. 14 This agent-based instantiation of system entities results in chains of dynamic interactions between agents, 15 which allow “unanticipated behaviors to emerge.” 14 This mirrors the nonlinear and chaotic dynamics that can be produced in the real military scenarios being considered and allows the use of techniques to identify, explore, and manage the emergent behavior. 15 ABS also is a critical part of analysis efforts focused on autonomous systems based on their natural description using the paradigm. An illustration of the ABS approach is given in Figure 1 (synthesized using definitions from Herrera et al., 16 Wooldridge, 17 and Russell and Norvig 18 ). The figure emphasizes the independent perception, decision-making, and action of the agents as well as the importance of chains of interactions that generate the resulting dynamics in the military scenario. In addition, the complexity in terms of nonlinear and stochastic dynamics in ABS opens degrees of freedom that can result in nonstationary, nonconvex, and non-Lipschitz continuous functional behavior, and which can additionally include non-Gaussian (potentially heavy-tailed) noise.

Illustration of the agent-based simulation (ABS) approach to describing and simulating military operations. The central focus on agents that independently perceive, assess, decide, and act is emphasized. The diagram is synthesized using definitions from the studies by Herrera et al., 16 Wooldridge, 17 and Russell and Norvig. 18 The pyramid represents different levels of scope and resolution available, such as focus on mission-level compared to campaign-level operations.
Recent examples using ABS for analysis include the exploration of impacts of technologies and concepts with autonomous robot swarms, 19 exploration of air combat tactics with fighter aircraft and air-to-air missile designs, 20 looking at small unmanned aircraft in future operations, 21 and searching for emergent behavior using DoE and statistical debugging techniques in Ballistic Missile Defense System modeling. 22 These approaches successfully allow estimating measures of effectiveness and support decision-making about alternative choices. However, the existing DoE exploration techniques need to be enhanced to allow emergent behavior to be effectively explored when strong nonlinear and chaotic dynamics are generated. This is especially true with expensive, high-fidelity simulation, where running very large numbers of Monte Carlo samples is not feasible and the exact locations and cases where emergent behavior is generated are not known a priori. The core challenges of nonlinear and chaotic dynamics that generate emergent behavior are considered in the following section.
3. Emergent behavior
The key type of behavior supporting analysis to build up understanding of military scenarios in capability design was identified above as emergent behavior—as critical, unexpected changes in outcomes of interest. This importance is based on the close connection to the need to avoid vulnerabilities and gaps that result in catastrophic failure of military operations, as well as from the need to exploit opportunities for increased effectiveness and affordability in operations. These vulnerabilities and opportunities exist based on the nonlinear and chaotic dynamics encountered in military scenarios.
However, there exist significant disagreements and contradictions in the literature on emergent behavior, 23 including “deep contradictions as to what [fundamentally] constitutes emergence.” 24 This creates a challenge for developing methods to work with emergent behavior, and requires a review of the literature focused on the context of simulation-based analysis in capability design, as well as a specific, synthesized definition that supports the goal of accelerating effective analysis. The central parts of a review are included here, and additional detail and discussion is available in the study by Braafladt. 25 The additional background in the study by Braafladt 25 includes discussion focused on cases with military simulation that use alternative definitions of emergent behavior.
In early capability design, research activities include work in the SoS, Complex Systems Engineering, and M&S fields focused on complex military operational systems as embedded in the Systems Engineering discipline. In addition, the focus of simulation on the dynamic behavior of systems means the Dynamical Systems Theory and Controls fields as well as the sub-discipline of Nonlinear Systems and Chaos are important based on the core characteristic of complexity in military operational systems.
3.1. Complex systems engineering
Starting with Systems Engineering, fundamental ideas work with emergent behavior as “properties of the whole systems not attributable to individual parts,” 26 and importantly, this idea is further specified in Complex Systems Engineering to focus on limitations of prediction given a current understanding about a system. 24 Working with emergent behavior is often understood in terms of exploring possible system behaviors with a changing state of knowledge about a system over time. Approaches in Complex Systems Engineering focus on the need to iteratively work to manage emergent behavior that is discovered in certain contexts and with certain simulation and real perspectives.27,28
3.2. Modeling and simulation
Working with emergent behavior is heavily focused on M&S as ways to “detect, manage, and control emergence in the virtual realm before challenges produce risks in the real world.” 29 The discovery of these not-yet-known, important behaviors is closely related to the nonlinear and stochastic interactions and agent-based decision-making encoded in simulation of complex systems. 29 These dynamics can often propagate to unexpected outcomes and events. 28 M&S is also central to analysis in capability design where data are not available and iterative assess-develop-evaluate processes (as in AFWIC) 1 are used to explore military scenarios and investment alternatives.
3.3. Dynamical systems theory
Dynamical systems theory and control theory focus on simulation and control of systems represented using mathematical, analytical models based on the “application of basic physical laws to system components and the interconnection of [those] components.” 30 For military simulation, while rooted eventually in physics, the nonlinear and chaotic dynamics involved are driven by “information exchange” where the equations of motion have not been “defined, specified, quantified, and documented,” making direct control of emergent behavior “beyond the scope of modern/classical control theory.” 27 However, mathematical approaches for working with nonlinear and stochastic outcome responses allow the potential for analysis techniques to better target important behaviors. Specifically, nonlinearity and chaos—as observed in military operations and in simulation of those operations—also result in cases which are inherently unpredictable and extremely difficult to solve analytically as dynamical systems.
3.4. Nonlinear systems and chaos
For complex systems, nonlinearity leads to extremely difficult analytical solutions, and chaos to inherent unpredictability. 31 For military simulation, this is fundamentally part of the nonlinear and stochastic interactions between cognitive systems with move, countermove dynamics (“whenever parts of a system interfere, or cooperate, or compete, there are nonlinear interactions going on”). 31 With these challenges in nonlinear systems and chaos, research has worked to characterize different resultant behavior patterns such as growth, oscillations, limit-cycles, and collapse to name a few, and to connect these patterns to causal dynamics. 32 For this work, the focus is on extreme events—“abrupt changes in the state of the system” 33 that are related to catastrophic problems or important opportunities in military scenarios.
The abruptness of the events has been part of recent work on the mechanisms underpinning extreme events, which looks at rare transitions into extreme domains 33 and events “in the tails of a probability distribution function.” 34 These attributes of rareness are fundamental to the unexpected nature of emergent behavior. Because the events are difficult to predict, the same attributes also motivate working with the events using numerical optimization. 33
3.5. Definition of emergent behavior
Considering the literature on emergent behavior from systems engineering, M&S, and nonlinear systems and chaos, “rare,” “extreme” events are the critical quantitative, mathematical behavior associated with emergent behavior based on the abrupt, significant, difficult-to-predict occurrence of the events, and the associated nonlinear and chaotic dynamic processes that generate the events. Rareness is due to either or both of two quantitative characteristics. The first is localization in the variable space, where the probability of randomly choosing specific inputs that activate dynamics resulting in extrema can be very low. The second is probability in the stochastic space, where the probability of perturbing the dynamics into certain instabilities leading to extrema can be very low. For emergent behavior, rareness is also combined with extremeness, which is the quantitative characteristic of variables taking values larger (maxima) or smaller (minima) than others. These can be local (in comparison to other points in the same local domain) or global (in comparison to all points in the entire domain). Extremeness is necessary for the specified definition of emergent behavior based on the importance of the relative value of metrics encountered in analysis that are typically desired to be maximized or minimized. Rare, extreme events combine the perspectives on emergent behavior from complex systems engineering and dynamical systems theory and form the core of the meaning of emergent behavior in this work. This is illustrated in Figure 2. Note also that the complexity in ABS resulting in emergent behavior is closely related to the ability to generate non-stationary, non-convex, non-Lipschitz continuous, and non-Gaussian functional behavior due to nonlinear and stochastic dynamics in ABS. In addition, numerical optimization is prompted as an initial approach for working with rare, extreme events, as in the study by Farazmand and Sapsis. 33

Illustration of definition of emergent behavior as rare, extreme events. Localized extremeness is illustrated as a small, extreme domain in a contour plot in the top left image. Stochastic extremeness is illustrated as tail events in a probability distribution function in the lower left image. Note that the nonlinear and stochastic complexity of ABS dynamics is illustrated using part of Figure 1 on the right. This results in possible non-stationary, non-convex, non-Lipschitz continuous, non-Gaussian functional behavior with military simulation.
Note that events that are rare or extreme are often both rare and extreme, as both are characteristics expected to sharply contrast with the general response behavior. However, while rare events can also be generated at non-extreme values (e.g., in between peaks of a multi-modal distribution), the focus in this work is on rare events with extreme values. In addition, extreme events—as the largest or smallest values observed—can be non-rare (e.g., if most values of an output with a range of 0 to 1 take the value of 1, and the other values are very near, but just below 1). However, in practice, extremeness typically implies rareness. This convention is continued in this work and both are taken as necessary conditions which together are sufficient to define emergent behavior.
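The two notions of rareness described above can be made concrete with a small numerical sketch. The example below is illustrative only: the response function, the 1% shock probability, and the threshold are assumptions chosen for this sketch and are not taken from the simulations in this work. It estimates localized rareness as the fraction of the input domain whose noise-free response exceeds a threshold, and stochastic rareness as the fraction of replications at a fixed input that exceed the same threshold.

```python
import math
import random


def mean_response(x):
    """Hypothetical noise-free response on [0, 1] with a narrow bump near x = 0.9."""
    return 5.0 * math.exp(-((x - 0.9) / 0.01) ** 2)


def noisy_response(x, rng):
    """Hypothetical stochastic response: small Gaussian noise plus a rare large shock."""
    shock = 5.0 if rng.random() < 0.01 else 0.0  # assumed 1% instability rate
    return mean_response(x) + rng.gauss(0.0, 0.1) + shock


def localized_rareness(threshold, n=20000, seed=0):
    """Estimate the probability that a uniformly random input gives an extreme mean response."""
    rng = random.Random(seed)
    hits = sum(mean_response(rng.random()) > threshold for _ in range(n))
    return hits / n


def stochastic_rareness(x, threshold, n=20000, seed=1):
    """Estimate the probability that a replication at a fixed input x is extreme."""
    rng = random.Random(seed)
    hits = sum(noisy_response(x, rng) > threshold for _ in range(n))
    return hits / n


if __name__ == "__main__":
    threshold = 2.5  # assumed cut-off defining "extreme" for this sketch
    print("localized rareness :", localized_rareness(threshold))
    print("stochastic rareness:", stochastic_rareness(0.2, threshold))
```

In this toy case both estimates are small, which matches the sense in which an event can be rare in the variable space, in the stochastic space, or in both.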
Furthermore, the use of rare, extreme events as defining indicators of emergent behavior reduces the direct emphasis on causality leading from interactions of the system parts to new behaviors. This work takes an analytical viewpoint focused on working with rare, extreme events as practical, but inconclusive, indicators of important underlying interactions and dynamics. This work enables follow-on future approaches for managing emergent behavior that consider the discovered events in further causal dynamical analysis to make insights that drive potential changes to exploit or manage the events.
A final distinction is that rare, extreme events as emergent behavior are distinct from the common usage of the related, but different, idea of outliers. Outliers are “data points that appear to be separate” 35 and are also rare, but are often considered to be due to errors (for example, errors in a simulation) and are deliberately removed. Such errors may be present in practice, but the result of discovering them is correction of the simulation rather than further analysis to manage the dynamics associated with the events as in capability design. Therefore, an assumption is made that the rare, extreme events observed in this work are just those connected to important dynamics, and that those which are connected to errors in simulation and assumptions are corrected through some other process not directly considered in this work.
4. Problem formulation
Simulation-based analysis in capability design involves assessing current capabilities, developing alternatives, and evaluating the alternatives as in the AFWIC assess-develop-evaluate process. As military scenarios are iteratively considered with different views and simulation fidelity, the complexity of the dynamics makes it likely that there will exist unexpected important new behaviors or changes in important extreme behaviors. Finding and understanding these rare, extreme events is a primary goal for building up knowledge of the scenarios and capability concepts and enables investment decision-making that works to manage the important behaviors.
Putting aside the assumption that an expert already knows about specific emergent behaviors or where to search for them, the traditional approach to finding important unknown behaviors is MCS. MCS involves unbiased random or uniform search approaches that converge on rare behaviors at an infinite limit using random numbers. 36 MCS techniques are already used frequently with military simulation, but in the different context of averaging stochastic behavior to work with deterministic analysis methods (e.g., as in Morgan et al. 37 or Gordon 38 ). MCS techniques are used less often to search for important behaviors because of the expense of the very large number of simulation runs needed to find rare events (e.g., as in Chetcuti, 39 Gordon, 38 or Diallo et al. 22 ). Note that while expensive, MCS or related DoE techniques are often preferred, where feasible, when working with military simulation due to their robustness to unknown structural characteristics of the response. If ill-suited structured techniques are chosen, they can perform worse than MCS.
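The cost of finding a rare region by unbiased sampling can be estimated with simple arithmetic. If a rare region occupies a fraction p of a uniformly sampled input domain, the probability that n independent samples contain at least one hit is 1 - (1 - p)^n. The sketch below works through this under assumed values; the corner-square geometry and the value of p are illustrative assumptions and do not reproduce the analytical example in this work.

```python
import math
import random


def prob_at_least_one_hit(p, n):
    """Probability that n independent uniform samples hit a region of probability p."""
    return 1.0 - (1.0 - p) ** n


def samples_needed(p, confidence=0.95):
    """Smallest n for which the hit probability reaches the requested confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


def simulate_first_hit(p, rng, max_samples=10**6):
    """Draw uniform 2-D points until one lands in a corner square of area p.

    The corner square is a hypothetical stand-in for a localized rare region.
    Returns the number of samples used, or None if max_samples is exhausted.
    """
    side = math.sqrt(p)
    for n in range(1, max_samples + 1):
        x, y = rng.random(), rng.random()
        if x < side and y < side:
            return n
    return None


if __name__ == "__main__":
    p = 0.001  # assumed: the rare region covers 0.1% of the input domain
    print("samples for 95% chance of one hit:", samples_needed(p))
    rng = random.Random(0)
    trials = [simulate_first_hit(p, rng) for _ in range(200)]
    print("mean samples to first hit:", sum(trials) / len(trials))
```

The expected number of samples to the first hit is 1/p, so the budget grows in inverse proportion to the size of the rare region. With expensive, high-fidelity simulation this quickly becomes impractical.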
The expense of brute-force MCS is illustrated in an analytical example in Figure 3, where the corner function (given in Equation (1)) is used to generate a response with rare, extreme behavior in the lower left corner of the two-dimensional input domain. This behavior is easy to predict mathematically, but requires many MCS evaluations to discover if no knowledge of the function is assumed. Each panel in Figure 3 shows a set of

Example search for rare behavior with MCS using an analytical problem. Each panel shows
State-of-the-art techniques are exploring numerical optimization to find extreme events even with expensive simulation by directing sampling to areas of the input domain that are promising based on iteratively observed outcomes. These approaches are constrained by the problem characteristics of complexity discussed above as well as by the lack of derivative observations given the black-box mathematical and computational nature of military simulation. Under these constraints, the most popular techniques use derivative-free optimization (DFO), reinforcement learning (RL), and Bayesian optimization (BO) approaches. This ongoing research is also part of a broader effort that seeks to leverage AI methods to find important behaviors and events (e.g., discoveries in physical and scientific fields). 6 Applying traditional AI methods runs into specific challenges in military capability design. The traditional methods build off of large, physically validated datasets, but in capability design, data need to be created for each specific use case. In addition, in capability design, the fidelity of the dynamics included in military simulation is often pieced together for a specific case rather than built by leveraging a selected case from established physical fidelity models.
The search and discovery process considered in this work for finding rare, extreme events has the objective of maximizing metrics of interest in scenarios simulated using ABS (for a metric
The existing research looking at military simulation includes using DFO techniques such as genetic algorithms to search for important input settings 38 and training neural networks for policies controlling aircraft in air combat. 41 These approaches can be very successful in finding local extreme events, but often still require many evaluations (10,000+) or the use of simplified training simulations. Similarly, the use of RL techniques has been very successful with simplified simulation, 42 and in initial approaches with higher-fidelity simulation. However, the RL approaches often require extensive resources approaching MCS (e.g., 500,000+ in Pope et al. 43 ). RL also has a narrower focus on learning policies for specific sequential decision-making actions, which is a subset of the search problems in capability design. Adaptive experimentation techniques such as BO have been applied to ABS similar to military simulation cases including those by Antunes et al., 44 Pedrielli et al., 45 Deskevich et al., 46 and LinQuest, 47 but these approaches focus on initial search and do not consider rareness of extreme behaviors. Overall, the explicit focus on working more efficiently with black-box, expensive simulation, together with recent advances in the AI-driven Active Learning and Adaptive Sampling fields that are evolving in parallel, makes these approaches especially promising for extension to searching for emergent behavior.
BO is based on iterations of ML training and inferences to approximate the knowledge about a simulation response and to direct subsequent evaluations of the simulation. This loop is illustrated in Figure 4. The process iteratively trains and uses a probabilistic surrogate model (often a Gaussian Process (GP)) as well as a heuristic acquisition function that encodes a specific notion of value for taking the next sample at a given input location. See studies by Frazier 48 and Braafladt 25 for additional discussion. Importantly, the existing BO approaches rely on assumptions of relatively smooth behavior and stationary noise in the simulation response—characteristics that do not typically hold for military simulation. To be effective in capability design, especially in the case of rare, extreme events, this means that BO techniques need to be enhanced. Forthcoming publications associated with work by Braafladt 25 present results of research enhancing existing techniques for these challenges. Given that these new, enhanced techniques are available, this article focuses on methodologically using the new and existing adaptive techniques effectively to search for emergent behavior with military simulation in capability design and presents a case study demonstration.

Illustration of Bayesian optimization (BO) loop. A probabilistic surrogate model is iteratively used and updated to guide search using an acquisition function heuristic that defines value across the response.
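The loop in Figure 4 can be sketched in a few dozen lines. The example below is a minimal, standard-library-only illustration of a conventional BO loop, assuming a zero-mean GP with a Matern 5/2 kernel, a fixed length scale, a noise-free response, and an expected improvement (EI) acquisition function evaluated on a fixed candidate grid. The objective `toy_simulation` is a hypothetical stand-in for an expensive simulation. This is not the enhanced technique referred to in this work, and it relies on exactly the smoothness assumptions that are noted above as not typically holding for military simulation.

```python
import math


def matern52(a, b, length=0.1):
    """Matern 5/2 kernel with unit signal variance (length scale is an assumption)."""
    r = math.sqrt(5.0) * abs(a - b) / length
    return (1.0 + r + r * r / 3.0) * math.exp(-r)


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            # Guard the diagonal against tiny negative round-off.
            low[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / low[j][j]
    return low


def forward_sub(low, b):
    """Solve low * y = b."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    return y


def backward_sub(low, y):
    """Solve low^T * x = y."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_posterior(xs, ys, candidates, jitter=1e-6):
    """Posterior mean and standard deviation of the surrogate at each candidate."""
    n = len(xs)
    K = [[matern52(xs[i], xs[j]) + (jitter if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    low = cholesky(K)
    alpha = backward_sub(low, forward_sub(low, ys))
    out = []
    for c in candidates:
        k_star = [matern52(c, x) for x in xs]
        mu = sum(k * a for k, a in zip(k_star, alpha))
        v = forward_sub(low, k_star)
        var = max(1.0 - sum(vi * vi for vi in v), 0.0)
        out.append((mu, math.sqrt(var)))
    return out


def expected_improvement(mu, sigma, best):
    """Closed-form EI for maximization."""
    if sigma < 1e-12:
        return 0.0
    z = (mu - best) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mu - best) * cdf + sigma * pdf


def toy_simulation(x):
    """Hypothetical deterministic response standing in for an expensive simulation."""
    return math.exp(-((x - 0.7) / 0.1) ** 2) + 0.3 * math.exp(-((x - 0.2) / 0.15) ** 2)


def bayes_opt(f, n_init=5, n_iter=30, n_candidates=101):
    """Fit the surrogate, maximize EI on a grid, evaluate, and repeat."""
    xs = [i / (n_init - 1) for i in range(n_init)]
    ys = [f(x) for x in xs]
    candidates = [i / (n_candidates - 1) for i in range(n_candidates)]
    for _ in range(n_iter):
        post = gp_posterior(xs, ys, candidates)
        best = max(ys)
        scores = [expected_improvement(mu, s, best) for mu, s in post]
        x_next = candidates[scores.index(max(scores))]
        xs.append(x_next)
        ys.append(f(x_next))
    i_best = ys.index(max(ys))
    return xs[i_best], ys[i_best], xs


if __name__ == "__main__":
    best_x, best_y, visited = bayes_opt(toy_simulation)
    print("best input:", best_x, "best response:", best_y)
```

The candidate grid keeps the sketch short. In practice the acquisition function is usually maximized with a numerical optimizer, and the kernel hyperparameters are fitted to the data rather than fixed.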
5. LANTERN methodology
The success of adaptive techniques with military simulation in finding rare, extreme events more efficiently than MCS prompts a need for a new methodology to put the techniques into practice effectively. This is embedded in the context of working to manage the emergent behavior that is discovered. Specifically, the new LANTERN methodology is introduced. LANTERN is designed as an approach for operating on a simulation and analysis space to explicitly direct analysis toward finding types of rare, extreme events.
LANTERN parallels AI-driven discovery approaches in other fields that combine human understanding of context with ML techniques for pattern recognition and adaptive optimization heuristics in analysis (see, for example, Pogue et al., 7 Wang et al., 6 or Urbina et al. 8 ). Military capability design sets unique requirements for LANTERN to work solely with new simulation-based data because of the difficulty of transferring data between cases. An overview of the LANTERN methodology is given in Figure 5. The steps in LANTERN include setting up the analysis, initial exploration, iterative application of adaptive search vignettes for important behavior, and human-expert-centered analysis of knowns and unknowns.

Overview of the Low-cost Adaptive exploratioN to Track down Extreme, Rare events using Numerical optimization (LANTERN) methodology. The process begins at Step 1 and continues through Step 6 with iterative analysis using high-fidelity military simulation at each step.
5.1. Step 1: set up analysis case
The first step in LANTERN brings in context from the outer goal of managing emergent behavior. This involves translating the analysis scenario to adaptive search problems. This step requires the specification of a military scenario and a simulation representation of the scenario. In addition, variables to vary in the simulation are needed. These variables can represent integrating new technologies or tactics (as by Mavris and Sudol 15 ) or uncertainties and operational conditions (as by Leftwich et al. 2 ).
5.2. Step 2: initial response data
This step involves collecting initial data on the simulation response with traditional design space exploration methods. Some examples include the studies by Foster and Petty, 19 Connors et al., 20 Sanchez, 49 and Morgan et al. 37 These approaches use DoE techniques to sample across the variable combinations at an acceptable computational cost. Typical techniques include space-filling and full-factorial designs with extensions using surrogate models to explore interpolations and sensitivities.15,49 These techniques enable initial understanding about known knowns and known unknowns with notional bounds and allow human analysts to make hypotheses about the response (current extrema, ranges, and changes in the outcomes). The stochastic character of military simulation requires additional steps, including allocating budget for replications to average. This enables deterministic techniques for exploration. Recently, additional exploration efforts have begun to include stochastic response characteristics directly (Sanchez 49 and Morgan et al. 37 ).
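As an illustration of this step, the sketch below builds a small space-filling design with Latin hypercube sampling and runs a fixed number of replications at each design point. The simulation function `toy_sim`, the design size, and the replication count are assumptions made for this example. Alongside the replication mean, the sketch also records the spread and the largest replication, since averaging alone can hide the tail events that are of interest here.

```python
import random
import statistics


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims with one point per stratum in each dimension."""
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_samples for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


def replicate(sim, x, n_reps, rng):
    """Run n_reps stochastic replications at x; return (mean, standard deviation, maximum)."""
    outcomes = [sim(x, rng) for _ in range(n_reps)]
    return statistics.mean(outcomes), statistics.stdev(outcomes), max(outcomes)


def toy_sim(x, rng):
    """Hypothetical stochastic simulation response for a two-dimensional input."""
    return x[0] * x[1] + rng.gauss(0.0, 0.1)


if __name__ == "__main__":
    rng = random.Random(0)
    design = latin_hypercube(n_samples=10, n_dims=2, rng=rng)
    for point in design:
        mean, spread, largest = replicate(toy_sim, point, n_reps=20, rng=rng)
        print(f"x=({point[0]:.2f}, {point[1]:.2f})  mean={mean:.3f}  "
              f"sd={spread:.3f}  max={largest:.3f}")
```

The total budget for this step is the design size multiplied by the number of replications, so both are chosen together against the available computational cost.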
Important missing pieces of information to consider include empty areas where localized nonlinear response behaviors could be present as well as stochastic space that has not been sampled. Specifying this exactly in practice is exceptionally difficult and closely connected to understanding the causal dynamics. An initial approach involves assuming that these may exist, and attempting to iteratively target vignettes to find them with the budget available. In the demonstration in the following section, a vignette involves selecting a single primary outcome and input of interest at a time. While LANTERN is designed to be flexible for analysts to choose the number of input and output dimensions in the cases considered, the initial demonstrations are with one dimension each to focus on the overall methodology. Additional future research is also needed to explore efficient BO and DFO approaches for finding rare, extreme events as encountered with military simulation.
5.3. Step 3: initial search
While the exact search approaches to use will be driven by the specific case, an initial starting point involves ease-of-implementation and understanding for initial steps focused on supporting analysis of knowns and unknowns. This starting point prompts techniques for optimization that search for non-rare extrema. From the BO field, two of the most common are the expected improvement (EI) 48 and Trust Region Bayesian Optimization (TuRBO) 50 approaches. These techniques typically use a GP surrogate model of the response as a way of working with uncertainty about the functional and stochastic behavior of response as in core BO approaches. These techniques have been successful in many applications and have been empirically shown to be effective at searching for smooth, local extrema. 50
The applicability of EI and TuRBO for relatively smooth, local extrema is due to the combination of a traditional Matern kernel GP approach (e.g., as by Eriksson et al. 50), which is highly effective at modeling relatively smooth, continuous functions, with acquisition functions focused on exploiting observed monotonic behavior. The EI approach chooses samples at points with the highest expected improvement, based on a combination of the predicted mean compared to the current most extreme point and the predictive uncertainty of the GP at that point.
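For reference, the widely used closed form of EI for a maximization problem with a Gaussian prediction can be sketched as follows. This is the textbook expression rather than a reproduction of the equation used in the referenced studies, and the candidate means, standard deviations, and best observed value are hypothetical numbers.

```python
import math


def expected_improvement(mu, sigma, best):
    """Closed-form EI for maximization, given a Gaussian prediction N(mu, sigma^2)."""
    if sigma <= 0.0:
        # With no predictive uncertainty, the improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best) * cdf + sigma * pdf


# Hypothetical candidates: (predicted mean, predicted standard deviation).
candidates = {"near_best": (0.95, 0.01), "uncertain": (0.80, 0.30)}
best_observed = 1.0
scores = {
    name: expected_improvement(mu, sigma, best_observed)
    for name, (mu, sigma) in candidates.items()
}
```

In this example the candidate with the lower predicted mean but larger uncertainty receives the higher score, which shows how EI balances the predicted mean against predictive uncertainty.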
The TuRBO acquisition function instead uses a Thompson Sampling acquisition approach where function realizations are drawn from the GP and the explicit most extreme point is identified. This is repeated multiple times creating a sampled bias toward the extreme points predicted by the GP distribution. In addition, TuRBO manages a trust region based on an infinity norm, increasing the trust region lengths when successful in improving on the extreme, and decreasing the lengths when failing to improve. 50 In practice, this technique was found to be highly effective at following local monotonic behavior and converging on a local extreme. See the study by Braafladt 25 for further detailed discussion.
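The trust region management described above can be sketched as a small state machine. This is a simplified illustration rather than the reference TuRBO implementation: the class name `TrustRegion`, the initial length, the length limits, and the success and failure tolerances are all assumed values, and the input domain is assumed to be scaled to the unit hypercube.

```python
from dataclasses import dataclass


@dataclass
class TrustRegion:
    """Axis-aligned (infinity-norm) trust region on a unit hypercube; defaults are assumptions."""
    length: float = 0.8
    length_min: float = 0.5 ** 7
    length_max: float = 1.6
    success_tolerance: int = 3
    failure_tolerance: int = 5
    successes: int = 0
    failures: int = 0

    def update(self, improved):
        """Track consecutive successes or failures and resize the region accordingly."""
        if improved:
            self.successes += 1
            self.failures = 0
        else:
            self.failures += 1
            self.successes = 0
        if self.successes >= self.success_tolerance:
            self.length = min(2.0 * self.length, self.length_max)
            self.successes = 0
        elif self.failures >= self.failure_tolerance:
            self.length = self.length / 2.0
            self.failures = 0

    def bounds(self, center):
        """Box of side `length` around the current best point, clipped to [0, 1]."""
        half = self.length / 2.0
        return [(max(0.0, c - half), min(1.0, c + half)) for c in center]

    @property
    def collapsed(self):
        """True once the region has shrunk below the minimum length."""
        return self.length < self.length_min


region = TrustRegion()
for improved in [True, True, True]:
    region.update(improved)
expanded_length = region.length
for improved in [False] * 5:
    region.update(improved)
shrunk_length = region.length
box = region.bounds(center=(0.9, 0.5))
```

Candidate points would then be drawn only inside `box`, which is what keeps the search local and helps it follow monotonic behavior toward a nearby extreme.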
For LANTERN, when given no prior information, a limited budget, and the time-cost of implementation, using an initial search technique such as EI or TuRBO provides a way to confirm existing non-rare extrema with some chance of finding localized or stochastic extrema. This chance is expected to be lower as the complexity encountered increases because of the dependence of EI and TuRBO on Gaussian noise (and of EI performance on Lipschitz continuity). The results gained in this step refine the knowledge considered in Step 2 and allow updated hypotheses about what is missing but desired about the response.
5.4. Step 4: rare, localized extrema
Next, when there are hypotheses about potential highly localized behavior or portions of the input space that warrant additional exploration, new techniques effective at adaptively finding rare, localized extrema can be used in search vignettes. These vignettes focus in depth on important slices of the response and can search for counter-examples which contrast with the existing knowledge about the response. For example, see the search vignette box in Figure 5. One specific option is to use the Taylor Expansion-Based Adaptive Design (TEAD) technique from the study by Mo et al., 51 which biases space-filling exploration toward areas of high nonlinearity where local Taylor Expansion models differ significantly from a global GP model. 51 This technique is shown in the study by Braafladt 25 to more efficiently find a highly localized extreme by allocating more search value to exploration and areas of high observed nonlinearity, making it an important resource for LANTERN. This approach uses the same GP modeling approach as described for the EI and TuRBO BO techniques, but uses an acquisition function as in Equation (3). 51
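The idea behind a TEAD-style acquisition can be sketched in one dimension as follows. This is a simplified illustration of the concept described above and not a reproduction of the acquisition function of Mo et al.: the function name `tead_like_scores`, the distance-based weighting, the normalization, and the toy surrogate `bump` (a hypothetical stand-in for a fitted GP mean) are all assumptions.

```python
import math


def tead_like_scores(candidates, samples, surrogate, gradient, domain_length=1.0):
    """Score 1-D candidates by distance to the nearest sample plus a weighted Taylor residual."""
    distances, residuals = [], []
    for c in candidates:
        nearest = min(samples, key=lambda x: abs(c - x))
        # First-order Taylor prediction at c, expanded around the nearest existing sample.
        taylor = surrogate(nearest) + gradient(nearest) * (c - nearest)
        distances.append(abs(c - nearest))
        residuals.append(abs(surrogate(c) - taylor))
    d_max = max(distances) or 1.0
    r_max = max(residuals) or 1.0
    scores = []
    for d, r in zip(distances, residuals):
        # Assumed weighting: the nonlinearity term counts more for candidates close to samples.
        weight = 1.0 - d / domain_length
        scores.append(d / d_max + weight * r / r_max)
    return scores


def bump(x):
    """Hypothetical surrogate mean with one narrow peak near x = 0.7."""
    return math.exp(-((x - 0.7) / 0.05) ** 2)


def bump_gradient(x):
    """Analytic derivative of the hypothetical surrogate."""
    return bump(x) * (-2.0 * (x - 0.7) / 0.05 ** 2)


samples = [0.0, 0.25, 0.5, 0.75, 1.0]
candidates = [0.125, 0.375, 0.625, 0.875]
scores = tead_like_scores(candidates, samples, bump, bump_gradient)
next_point = candidates[scores.index(max(scores))]
```

All four candidates are equally far from existing samples, so the exploration term is identical and the selection is decided by where the local linear model disagrees most with the surrogate, which is next to the narrow peak.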
Alternatively, the Partitioned bayesIan OptimizatioN for multi-Extrema Emergent behavioR (PIONEER) technique developed by Braafladt 25 demonstrates significant sample efficiency improvement for finding multi-modal extrema. PIONEER uses geometric input space partitioning and penalization of discovered extrema to more efficiently guide search toward unexplored parts of the input domain, and converges on extrema when monotonically increasing data points are observed in a local area using internal local searches with TuRBO. See the study by Braafladt 25 for additional detail, and a future in-depth publication is forthcoming.
Overall, tailored search vignettes focused on specific characteristics of localized rareness allow important slices of the response to be considered in depth. This has the goal of providing updated information about whether (and where) these extrema exist in combination with analyst expertise. The new application of TEAD and the new PIONEER BO technique provide ways to complete the search more efficiently than searching randomly with MCS. For this step in LANTERN, newly identified extreme events can be added to a set of important events to consider further in capability design.
5.5. Step 5: rare, stochastic extrema
In addition to extreme events in the expected value of the simulation response, surprisingly extreme stochastic realizations can also be highly important as they showcase potential vulnerabilities or opportunities. Depending on the dynamics involved, scenarios with these rare, stochastic events can be modified to reinforce or neutralize the associated dynamics and extreme response. The LANTERN approach includes a step that uses adaptive approaches for finding these rare, stochastic extrema. This type of adaptive search approach is new, as existing methods typically focus only on expected values, use high-density MCS, or seek to minimize variability. Variability Bayesian Optimization (VarBO), a new technique developed by the current authors in the study by Braafladt 25 and the subject of a forthcoming detailed publication, focuses adaptive sampling on areas of localized high variability to specifically find rare, extreme stochastic events. This involves comparing the GP-predicted variability with a local empirical statistic of variability, with the goal of focusing on areas where these diverge. This step of LANTERN focuses on specific, scoped search vignettes trying to find areas of important high variability, and newly identified events can once again be added to the set to consider further in capability design.
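The comparison between model-predicted and locally observed variability can be illustrated with a minimal sketch. This is not the VarBO algorithm, whose details are given in the referenced study; the function `variability_gap`, the window size, the constant predicted standard deviation (standing in for a GP with homoscedastic noise), and the observation values are all hypothetical.

```python
import statistics


def variability_gap(candidate, xs, ys, predicted_std, window=0.1):
    """Local empirical standard deviation near `candidate` minus the model-predicted value."""
    local = [y for x, y in zip(xs, ys) if abs(x - candidate) <= window]
    if len(local) < 2:
        # Not enough nearby observations to estimate local variability.
        return 0.0
    return statistics.stdev(local) - predicted_std(candidate)


# Hypothetical observations: one extreme realization appears near x = 0.2.
observations = [
    (0.10, 1.0), (0.15, 1.1), (0.20, 4.0), (0.22, 0.9),
    (0.50, 1.0), (0.55, 1.05), (0.60, 0.95),
    (0.85, 1.0), (0.90, 1.02),
]
xs = [x for x, _ in observations]
ys = [y for _, y in observations]


def constant_std(x):
    """Assumed model prediction: the same noise level everywhere."""
    return 0.1


candidates = [0.17, 0.55, 0.88]
gaps = [variability_gap(c, xs, ys, constant_std) for c in candidates]
next_point = candidates[gaps.index(max(gaps))]
```

The candidate near the extreme realization has the largest gap between observed and predicted variability, so a search driven by this kind of statistic would concentrate further samples there.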
5.6. Step 6: evaluate search results
After the AI-driven adaptive search vignettes looking for extrema are completed, the LANTERN methodology iterates, returning to previous steps based on the new information uncovered. This part of the methodology is human-centered and involves re-evaluating the known knowns, known unknowns, and potential for unknown unknowns. Importantly, the adaptive search techniques are driven by heuristics targeting different types of dynamics and behavior. Although these heuristics are not mathematically convergent in the way that MCS is, which limits the guarantees on the results, the adaptive techniques have been shown empirically 50 to be more effective than MCS in certain cases. This trade-off requires the human-guided approach used in LANTERN and is best suited to cases where high-density MCS is not feasible. Finally, when important events are found during the iterative steps in LANTERN, further analysis exploring those events is required to address or exploit them in capability design.
6. LANTERN analysis case study
A demonstration of LANTERN to search for rare, extreme events more quickly than with MCS is described next using simulations developed using the US Air Force tool AFSIM. 12 Additional details on the simulation implementation are available in the study by Braafladt. 25
6.1. Step 1: set up analysis case
Two example scenarios are used to demonstrate the LANTERN methodology. First, a two-dimensional Suppression of Enemy Air Defenses (SEAD) scenario is used to demonstrate the step focusing on rare, localized, extreme events, and second, a four versus four air combat scenario is used to demonstrate the step focusing on rare, stochastic, extreme events. The process for creating the models and simulations that start an analysis case is part of capability design, but this is not the focus of LANTERN in this paper, and so a description alone is given.
The two-dimensional (2D) SEAD scenario is adapted from the study by Zhang et al. 42 and involves aircraft choosing a path through air defenses to arrive at a target while the air defenses search for the aircraft and try to shoot it down if it is within their range. An overview of the scenario is shown below in Figure 6. The choice of path for the aircraft is parameterized by a turn along the indicated line, and the scenario is designed to result in rare, localized extremes in terms of aircraft survivability.

Figure 6. Illustration of the 2D SEAD scenario as implemented in AFSIM. The aircraft fly along the flight lines to try to reach the target. The circles show the areas where the air defenses will fire surface-to-air missiles (SAMs) to try to shoot down the aircraft. The input variable is the point along the indicated line chosen as the turn point in the flight path.
Next, the four versus four air combat scenario is based on an engagement between fighter aircraft. The scenario starts with a beyond visual range (BVR) phase with agent behaviors designed using the description from the study by Soleyman et al., 41 followed by a within visual range (WVR) phase with agent behaviors designed based on the studies by Pope et al. 43 and Stillion. 52 The initial configuration of the scenario is shown from a top-down view in AFSIM in Figure 7, and the BVR and WVR phases are highlighted in Figure 8. The input variable chosen for the demonstration is the fraction of expected range for the medium range missile (MRM) fired by the red fighters. This is the distance at which the red fighter will launch an MRM at a target as a fraction of the maximum expected range of the missile. Based on a characterization of the simulation in the study by Braafladt, 25 the average missile miss distance was chosen as an output that can be used to demonstrate occurrences of rare, stochastic extrema.

Figure 7. Illustration of the starting configuration for the four versus four air-to-air engagement. The aircraft shapes are the agents that move and act in the scenario. One side is on the top and one on the bottom.

Figure 8. Air-to-air engagement phases included in the simulation: Beyond Visual Range (BVR) and Within Visual Range (WVR). In addition to the aircraft agents, missile agents are created and act dynamically in the simulation when launched.
6.2. Step 2: initial response data
Analysis of the scenarios specified in LANTERN Step 1 first involves initial exploration of the simulation. This typically involves the use of DoE techniques to consider the response at multiple input combinations (e.g., as in Christensen and Salmon, 21 Connors et al., 20 or Diallo et al. 22). The goal of this step is to get an initial idea about what the response looks like across the input space. After generating initial data, LANTERN requires analysts to select search vignettes to consider in more detail. The default first step is LANTERN Step 3, adaptive search for non-rare extreme events. When important, new extreme events are found, an exit through LANTERN Step 6 is recommended in order to return to outside methods in capability design and consider the dynamics associated with the events.
Analysis with LANTERN leverages a combination of human analyst expertise and AI techniques. This requires humans to generate, test, and revise hypotheses using the techniques available in LANTERN. These hypotheses can take the form of expectations about the simulation behavior to test or of what-if questions to consider. For example, in the 2D SEAD problem case, a guiding hypothesis could be “If fighter-bombers fly through air defenses to a target with no knowledge of the defenses, certain paths will be much more survivable based on vulnerabilities in the air defenses.” The orange points in Figure 9 provide a traditional search using an MCS approach (which is also similar to a space-filling DoE technique), but do not find the important peaks in the response.

Figure 9. Example search in the 2D SEAD problem using the TuRBO BO technique. The input turn point location is the horizontal axis and the output survivability of the blue aircraft is the vertical axis based on 60 replications. The red line (mean) and shaded area (1% to 99% percentiles) are the true response based on 300,000 MCS samples. The orange points are 10 initial random samples and the green points are 120 adaptive samples using TuRBO (each with 80 simulation replications). The blue line is the GP surrogate model mean and the blue shaded area is two standard deviations. Note the convergence to the local peak on the right, but lack of exploration to the other peaks. See online version for color.
6.3. Step 3: initial search
The starting point for adaptive search in LANTERN focuses on ease-of-implementation and understanding. This extends the data from Step 2 with convergence to local optima, which in turn form a baseline for hypotheses about extreme behavior in the simulation. An example of this initial search using the popular TuRBO technique is shown in Figure 9. Each dot in the figure is a specific input setting along the parameterized line from Figure 6 (0.0 at one end of the white line and 1.0 at the other), with the output from an average across 80 replications on the vertical axis. The orange points are 10 initial random samples and the green points are 120 adaptive samples. These points are compared with the 1% to 99% percentiles from high-density MCS (300,000 simulations with 60 replications per input–output realization) shaded in red, and the blue shaded area shows the two standard deviation area from the GP model used in TuRBO. The initial search is successful in converging on an extreme event, but does not provide additional information about the other peaks that are known to exist based on having the truth data available. In practice, the red truth area will not be available, prompting a need to search for other extrema in the following LANTERN steps.
For the human expert in the analysis loop, this step builds off of the knowledge that an initial random search did not result in finding extreme events of interest. Therefore, a revised hypothesis could be stated as “There is structure in the problem that can be followed to an important, extreme, ‘thread-the-needle’ path through the air defenses.” In this case, as in Figure 9, an initial peak is found.
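A short calculation illustrates why a small random sample can be expected to miss narrow peaks. If a peak occupies a fraction w of a one-dimensional input range, then n independent uniform samples all miss it with probability (1 - w)^n. The peak width fraction of 0.02 used below is a hypothetical value chosen for illustration and is not measured from the SEAD scenario.

```python
import math


def miss_probability(peak_width_fraction, n_samples):
    """Probability that n independent uniform samples all fall outside the peak."""
    return (1.0 - peak_width_fraction) ** n_samples


def samples_needed(peak_width_fraction, target_hit_probability):
    """Smallest n for which at least one sample lands in the peak with the target probability."""
    return math.ceil(
        math.log(1.0 - target_hit_probability) / math.log(1.0 - peak_width_fraction)
    )


assumed_width = 0.02  # hypothetical fraction of the input range occupied by one peak
miss_with_ten = miss_probability(assumed_width, n_samples=10)
n_for_95_percent = samples_needed(assumed_width, target_hit_probability=0.95)
```

Under this assumed width, 10 random samples miss the peak about 82% of the time, and roughly 150 uniform samples are needed for a 95% chance of a single hit, before any replications are counted. This motivates following structure in the response instead of sampling blindly.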
6.4. Step 4: rare, localized extrema
Next, hypotheses about multi-modal behavior or a desire to find counter-examples prompt adaptive search vignettes specifically looking for rare, localized extrema. In the SEAD scenario illustrated in Figure 6, the new PIONEER adaptive technique from the study by Braafladt 25 is compared with MCS for finding rare, localized extrema. With the PIONEER adaptive technique, a fourfold improvement was observed in success at finding all multi-modal peaks compared to MCS. 25 The improvement can be observed in an example search shown below in Figure 10, where the “truth” is plotted in red (based on 300,000 MCS simulations) and the BO GP ML model confidence area (two standard deviations) is plotted in blue. The initial 10 random samples are shown as orange dots and the 120 MCS (top plot) or adaptive PIONEER samples (bottom plot) are shown as green dots. The black lines on the lower plot show partitions of the input space made by the PIONEER algorithm (which is described in Braafladt 25 and will be further detailed in a forthcoming publication).

Figure 10. Comparison of locations of adaptive samples using PIONEER and random MCS samples on the SEAD problem described above. The red line shows the mean of the MCS-based truth observed with 300,000 simulations, and the shaded red area shows the 1% to 99% percentile area. In the lower plot, the blue line shows the mean of the GP posterior and the shaded blue area shows the two standard deviation area of the posterior. The orange points in both plots are the 10 initial random samples, and the 120 green points are adaptive PIONEER samples (for the lower plot) and MCS samples (for the upper plot). The PIONEER plot shows the partitions in the input space with black lines. Note the effective concentration of samples around the peaks in the PIONEER approach compared to the MCS approach. See online version for color.
In Figure 10, the PIONEER technique can be seen concentrating samples at the peaks while the MCS technique spends many samples in unpromising areas. In addition, PIONEER adaptively continues to all three peaks, improving on TuRBO in Figure 9, which converges well only to a single peak, and would require restarts or constraints to find the other peaks consistently. Overall, the adaptive PIONEER technique finds 88% of the possible peaks for 3% of the cost of the MCS truth (across eight replications, with the results of one replication plotted in Figure 10) in an experiment further detailed in the study by Braafladt. 25
For this example step, the human expert interaction in LANTERN involves specifying and testing a further hypothesis along the lines of “There are additional localized extreme events which provide paths through the air defenses with higher average survivability.” In this case, this would prompt the multi-modal techniques from LANTERN and identify alternative options as in Figure 10. In comparison to relying only on a denser MCS run, the techniques from LANTERN provide information about heuristic convergence of BO techniques inside each of the partitioned areas.
6.5. Step 5: rare, stochastic extrema
A new approach for adaptive search that focuses on rare, stochastic extrema is introduced with LANTERN. When given hypotheses about important rare stochastic dynamics or a desire to find stochastic counter-examples, the new VarBO technique enables more efficiently finding these types of events compared to MCS. This is demonstrated in the four versus four air combat scenario described above. A comparison between the VarBO adaptive technique and a traditional MCS approach is shown in Figure 11. In the figure, the MCS-based “truth” is shown in red with the shaded area being from the 1% to 99% percentiles of 40,000 simulations (80 uniformly distributed input values with 50 realizations of 10 averaged replications). Note the two areas of localized, skewed, heavy-tailed noise (one around 0.17 and one around 0.5 in the input). The 250 samples chosen by the search techniques are shown in green, and 10 initial random samples in orange. Each sample output is taken as the average of 10 simulation replications. The blue area is the confidence region (two standard deviations) of the GP ML model used with the VarBO technique. Comparing the two approaches in Figure 11, the MCS approach does have a few samples in the area of the left rare, stochastic events, but the VarBO technique concentrates significantly more samples in the area and reaches a higher observed extreme value. Note that some initial exploration samples, including an extreme realization, are required in the area of the stochastic extreme dynamics for the VarBO technique to latch on. This means that given the sample budget allocated in this experiment, the stochastic extreme peak in the center is not identified, but it is expected to be identified if additional sample budget is used. Overall, for nine repeated searches in Braafladt, 25 the VarBO technique required 58% fewer simulations to find rare, stochastic extrema and found 140% more rare, stochastic extrema than MCS. Discovering these extreme events rapidly and more efficiently enables follow-on techniques that can investigate them in detail.

Figure 11. Comparison of locations of samples using VarBO and MCS to search for rare, stochastic events in the four versus four scenario described above. The red line is the mean MCS-truth based on 40,000 simulations with the red shaded areas showing the 1% to 99% percentiles (80 uniform grid input points with 50 realizations, each an average of 10 replications). The blue area is the GP ML model used with VarBO, where the blue line is the GP mean and the shaded area is the two standard deviation area. The orange points are the 10 initial random samples, and the green points are the 250 search samples from MCS on the upper figure and VarBO on the lower figure. Note that the VarBO technique concentrates significantly more samples near the left-most stochastic extreme peak, but that both techniques miss the right stochastic peak. See online version for color.
The human-centered process in this step involves what-if hypotheses about the existence of events leading to much less effective missile performance in the air combat scenario. For example, a hypothesis could be “Certain red MRM launch ranges will be more important and missiles will be closer to hitting on average.” or “Certain dynamics will result in extreme missile miss distances, and if these dynamics can be isolated and examined, may prove important to exploit for choosing missile launch tactics.” With the recommended technique from LANTERN, Figure 11 shows that many more examples with extreme missile miss distances are found at certain points in the tactical input, and these examples can be used in follow-on analysis to attempt to define better tactics.
6.6. Step 6: evaluate search results
The last step in LANTERN involves evaluating the search results and then either switching to another step in LANTERN or returning to other methods in capability design. Those methods explore the causal behavior leading to the newly discovered extrema in order to develop ways to mitigate or exploit the dynamics. The default step is to return to Step 2 and update analyst-centered hypothesizing about knowns and unknowns in the current dataset. Analysts can then define new search vignettes based around postulated impacts of dynamics in the scenario and systematically test the resulting hypotheses in the other steps of LANTERN.
Using the example hypothesis from the previous section, human-centered analysis following from this step could involve collecting the identified points with extreme missile miss distances and comparing them to the default case in further analysis focused on causality in the dynamics. This could allow the definition of better rules-based tactics that cause the adversary to have extreme missile misses.
7. Conclusion
The results presented in this article demonstrate the new LANTERN methodology for AI-driven adaptive analysis of emergent behavior in capability design. Past approaches have used MCS to search for important events with brute-force, pseudo-random algorithms, but these approaches are infeasible with the expensive, high-fidelity simulation used in capability design. Finding emergent behavior with LANTERN is instead enabled by specifying a definition of emergent behavior as rare, extreme events in military simulation and by leveraging new techniques for AI-driven adaptive experimentation to specifically target the nonlinear and stochastic responses associated with rare, not-yet-known, important events.
LANTERN follows a set of steps focused on different search vignettes that guide analysts through ways to search for specific types of potential response behavior. To improve search for rare, localized, and stochastic events, the new PIONEER and VarBO BO techniques were developed and compared to MCS in this work. These techniques provide ways to find rare, extreme events at a fraction of the simulation cost of MCS. For example, in experiments with simulation in AFSIM, PIONEER finds 88% of multi-modal, localized extreme events at 3% of the simulation cost of MCS and VarBO finds 140% more rare, stochastic events than MCS for the same cost. Also important to the success of these new techniques is Bayesian ML. GPs are one of the core components of the techniques and fulfill a requirement to systematically adapt a probabilistic understanding of what is known and unknown about the response of the objective.
The new LANTERN methodology provides analysts with a structured approach that supports making and testing hypotheses about important behaviors that may exist in the military scenarios being simulated in capability design. This new approach opens up the possibility of finding these important events deliberately instead of by chance. Future work is needed to handle additional challenges in searching for rare, extreme events as simulation complexity and analysis needs scale up. This includes exploring multi-dimensional ABS scenarios and BO techniques for working efficiently with multi-dimensional inputs and outputs. In addition, the ability to find surprising, rare, extreme events creates both the ability and the need to develop methods for working with these events after they are found. This includes finding ways to mitigate or exploit the associated dynamics as part of accelerated military capability design.
Footnotes
Acknowledgements
The authors would like to acknowledge the support and advice provided in prior thesis research by Dr. Nicholas Hanlon, Dr. Mark Whorton, and Dr. Daniel Schrage.
Declaration of conflicting interests
The views expressed in this article are solely those of the authors. The authors declare that there is no conflict of interest.
Funding
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
