Abstract
Computer simulations have revolutionized the analysis of military scenarios. As computing power has advanced, simulations can now incorporate intricate tactical-level engagements. However, accurately representing actors’ decisions at this level poses new challenges for developing and validating these simulations. In this context, this paper presents the methodologies and lessons learned from a study conducted to assess the application of agent-based modeling and simulation (ABMS) in analyzing beyond visual range (BVR) air combat scenarios, focusing on the influence of agent behavior on the outcomes. The proposed approach integrates real pilots into a face validation phase to examine symmetric and asymmetric engagements. The results underscore the significance of agent behaviors for the outcomes, for example, showing how specific behaviors are capable of mitigating the advantages of superior weaponry. Furthermore, the research explores the dynamics of aircraft acting in pairs, demonstrating the potential to evaluate tactics and the impact of numerical advantage. Ultimately, the results enhance the simulations’ credibility and confirm their plausibility, in line with the face validation methodology. This powerful phase bolsters subsequent steps in the overall validation process. In addition, the findings show how specific configurations of the agents, including tactical coordination, can significantly affect the simulation outcomes and validity.
Keywords
1. Introduction
The analysis of military scenarios supported by computer simulation is not new and has been a challenging study since the advent of modern computers. In the 1950s, the US Army was already seeking to integrate computer solutions into wargames to automate the calculation of abstract models, which defined the evolution of some scenarios. With the increase in computational power and advances in modeling techniques, simulations can now represent the environment, systems, and military equipment in high resolution, allowing the analysis of the most diverse scenarios. 1 As military simulations progress, there is a growing demand for sophisticated modeling methods that precisely depict the intricacies of modern warfare. This includes the portrayal of human behavior in operating these systems, which continues to present a significant challenge in constructing credible simulations. 2 Furthermore, despite combat being fundamentally a human endeavor, the recent incorporation of intricate autonomous systems 3 has prompted numerous studies seeking the development of advanced models capable of precisely representing and analyzing such scenarios.
In this context, various military studies have attributed relevance to the term agent-based modeling and simulation (ABMS) due to its utilization in diverse scenarios to represent complex behaviors effectively. Although there is no consensus on the exact definition of ABMS,4,5 an accepted view is that agents are autonomous entities that perceive the conditions of the environment, such as the state of the simulated situation and other agents, making decisions that influence the evolution of the scenario. 4 Furthermore, the increase in the utilization of agent-based simulations is not exclusive to military analyses. Macal 4 identified at least 17 other different areas, including economics, tourism, geographic research, psychology, and epidemiology, demonstrating the flexibility that ABMS has acquired. Military applications have employed ABMS to analyze numerous aspects of warfare scenarios, from communications to operator performance. This technique has gained acceptance as a viable and even indispensable approach for addressing the intricacies of the modern battlefield.1,6–8 Representing the behaviors of these agents, whether human or non-human, introduces new challenges that extend beyond the development of the models. The validation and verification processes also necessitate a reassessment of previously established understandings, as they grapple with these novel complexities.2,9–11 Given the significant role of agents in the outcomes, developing assessment mechanisms to measure simulation validity, considering the influence of their behaviors, is critical to allow effective utilization in combat simulations.2,4,5,9 Despite the significance of these aspects, they have been largely overlooked in the majority of publications within this field, resulting in the persistence of numerous challenges.9,11
Therefore, studies that examine different scenarios are essential to increase confidence in these methodologies and consolidate the understanding about the real possibilities and limitations of the ABMS to evaluate military scenarios. To address this, this paper introduces methodologies and lessons learned from a study conducted to assess the application of ABMS in the analysis of beyond visual range (BVR) air combat scenarios. These scenarios are typical cases in which the simulated agents require complex behaviors to represent human decisions or unmanned systems. The main contributions of this study are as follows:
Propose methods to help the face validation of models representing a high-resolution air combat simulated scenario with practical results and findings.
Exemplify how certain main parameters of the agents in a BVR combat scenario, including tactical coordination, can significantly affect the simulation results.
Introduce an alternative solution to model the BVR air combat decision-making process using a Unified Behavior Framework (UBF).
2. Background and related work
2.1. ABMS in military simulations
ABMS has gained significant traction in military simulations due to its ability to represent complex systems, human behavior, and autonomous entities in a more realistic and flexible manner. 4 The representation of human and autonomous behaviors in military simulations is essential to accurately model and analyze the interactions between various entities and the evolving dynamics of modern warfare. ABMS offers a suitable framework for capturing these complex behaviors and interactions by allowing the modeling of individual agents, each with unique attributes and decision-making processes, and allowing the simulation of emergent phenomena arising from agent interactions.4,6,8 Recent studies demonstrate applications of ABMS to simulate various scenarios. Lee et al. 12 applied agent-based simulations to assess the impact of failures in communication links in network-centric warfare. Agents were also used to evaluate the effects of unmanned aircraft systems in different battlefield scenarios.6,7,13 Representing several elements of a joint force as agents, such as aircraft, anti-aircraft, radar, and vessels, Andrew Au et al. 8 consider the outcomes of the simulations to predict the advantages or disadvantages of using different resources and equipment. McCourt et al. 14 introduced an approach to understanding network-enabled operations using agent-based simulations. Connors et al. 15 presented a method to evaluate a new type of air combat weapon using ABMS. De Lima Filho 16 proposed a methodology that uses agent-based war game simulations to optimize tactical unmanned aerial vehicle (UAV) formations for different scenarios. This type of simulation also found application in training a decision tree designed to support decision-making by human pilots in air combat missions. 17 Other publications have focused on developing models of the agent behaviors themselves,18–21 representing fundamental studies to consolidate the use of this type of simulation.
The majority of these studies concentrate on simulations at the engagement level, wherein ABMS provides valuable insights into diverse facets of military operations. This, in turn, aids in the advancement of more efficient tactics and enhances the decision-making process. Specifically, for BVR combat scenarios, the ABMS has been shown to be a powerful tool and has been applied to support analysis,16,22 training,10,23 and the development of new technologies.15,17,24
2.2. The beyond visual range air combat simulations
BVR combat consists of an engagement between two opposing teams aiming to hinder the enemy’s advancement through the utilization of long-range missiles (over 50 km). 24 While close combat is highly dynamic and intense, depending on the pilot’s instinctive reactions, BVR combat is more cadenced and dependent on adequate tactical planning. 25 This planning is carried out based on several factors related to the systems involved, such as aircraft performance, radar range, missile range, missile–aircraft communication capabilities, and missile detection capabilities. These characteristics and more recent questions regarding autonomous combat aircraft have motivated several studies, either in the search for improvements in models and agents to simulate air combat scenarios, as in Toubman, 10 Huang et al., 19 Toubman, 20 Yao et al., 23 Floyd et al., 24 Yang et al., 25 and Ernest et al., 26 or in using these simulations for specific analysis.8,15–17 Most BVR combat simulation studies have focused on scenarios at the engagement level, with each aircraft treated individually. Although achieving reasonable simulations at this level can be challenging, it is crucial to invest efforts in building more realistic models that represent next-generation aircraft and related technologies, given the significant impact of combat aircraft on the battlefield, including the decision-making process involved in operating these advanced aircraft. 2 Therefore, it is essential to adequately evaluate the implemented agent behaviors to grant meaningful analysis using simulations; otherwise, the performance of that behavior will restrict the validity of the results.9,20
For example, Connors et al. 15 introduced a methodology for using agent-based simulations to evaluate new aerial combat weapons. The suggested approach aims to identify the key performance parameters that distinguish new air-to-air missiles to support the acquisition process. The study conducted tactical-level simulations to assess scenarios that involved high-resolution models of aircraft and weapons. However, to appraise a new weapon type, in addition to the weapon simulation model, the agent needs to know how to apply it adequately, or the simulation will not reflect reality. Consequently, even though the method delineated is suitable and statistically robust, 15 for a practical application, behavior validation would be critical to show that the agent can take advantage of the new characteristics of the weapon. This example reinforces the need to keep in mind that the agent plays a crucial role in the simulation, as well as in many phases of the verification and validation processes. 27
2.3. Verification and validation of systems and behaviors
The verification and validation processes of the models are essential to ensure the accuracy of simulation in the desired scenario. Verification is the phase in which the implementation of the models is evaluated, looking for possible errors in the programming or running platform, with conceptual models as the primary reference. 27 Validation, in turn, starts from an already verified model and seeks to adjust the performance to ensure that it satisfactorily represents the desired real-world system or behavior. Although the validation process can have significant differences depending on the problem and the models, some essential characteristics must guide high-level decisions regarding the validation process.27,28
There is no definitive validation considering that the model cannot exactly represent the real-world system.
A properly validated model can provide guidance for decisions similar to those that would be made if the utilization of the real system were possible.
The development of a simulation model should be tailored to specific goals. The model’s validity hinges on these defined objectives.
The complexity of the models directly correlates with the difficulty of the validation process; as models become more complex, the validation process also becomes more demanding.
Another relevant definition is the type of validation that affects the methods used to conduct the validation process and the consolidation of the models. An accepted concept divides the validity into two dimensions. The first dimension refers to the nature of the test methods and can be subdivided into two instances: face validity after conducting a face validation or empirical validity after empirical validation. Face validity relies on theoretical models, system experts, or stakeholder views, also known as plausibility checking. Empirical validity relies on statistical analysis and tests to compare the results with real-world data. The second dimension refers to the model element addressed and has two instances: behavioral and structural. Behavioral validity refers to the input-output outcomes of the overall model, and structural validity refers to the internal structure of the model. 27 The types of validity required in the application of the simulations depend on the simulation objectives. Although technical and data limitations can also be decisive, it is possible to define recommended techniques according to simulation outcome objectives, as presented in Table 1.
Table 1. Recommended type of validity for the simulation validation according to outcome objectives.
Source: Klügl. 27
For general military scenarios, the guides and frameworks available in the literature propose methods and techniques for adequately conducting the validation process.29–31 However, the lack of real-world data is a frequent limitation. The costs to carry out sufficient field tests, the demand for authoritative threat data, and uncertain future scenarios often require relaxing some validation type requisites. Considering ABMS, the challenges become even greater, adding critical factors:
It is hard to reproduce the agent decision-making process precisely.
Highly-dynamic models create chaotic behaviors (no simple steady states for comparison with reality).
Data to compare models at the individual agent level are often unavailable.
These models can have too many degrees of freedom (DoFs), enabling the fit to any data through machine learning or optimization techniques.
Therefore, the validation of ABMS applied to military scenario analysis is a complex task. Gaps in the data and intrinsic characteristics of the agent behaviors make the traditional validation techniques unsatisfactory. In the dimension of the nature of the tests, face validity can become a crucial part of the process, even surpassing the empirical validity that becomes unfeasible in many contexts. In the dimension of the model elements, the recommendation is to build the agents as simply as possible and pursue the best explainability of the behavior structures, always looking to facilitate the integration of the domain expert in the process.2,9,27,32
Regarding air combat simulations, although there are an increasing number of studies looking to represent the agents with high resolution, there is a lack of detailed references that outline the methodologies applied and the findings obtained during the verification and validation processes. Poropudas and Virtanen 18 proposed a game-theoretic approach for the validation of a discrete-event air combat simulation by analyzing many parameters related to the decision-making model representing engagement actions, while in the work of Toubman10,20 and Lee, 31 domain experts (fighter pilots) played a crucial role in granting a more realistic scenario to evaluate computer-generated forces, combating them in virtual simulators. However, most of the literature on air combat simulations8,15–17,23,25 only presents brief or no comments about their validation process, leaving a gap in discussions about these questions.
3. Methodology
The methodologies and lessons learned presented in this paper are from studies planned to assess the feasibility of using ABMS to analyze BVR combat scenarios to support the sizing and evaluation of air defense capabilities. The results presented here primarily refer to the face validation phase when domain experts evaluated the outcomes of already functional and integrated models, applying their technical knowledge and practical experiences.
3.1. Simulation platform
The base of the simulation experiments was an in-house developed computational suite, the Aerospace Simulation Environment or ASA, an acronym derived from its Portuguese name, Ambiente de Simulação Aeroespacial. ASA is a flexible solution for the simulation and analysis of military scenarios. 33 Although some commercial solutions facilitate the construction of simulations in the defense area, such as FLAMES, VR-Forces, VBS4, and others, they can often limit flexibility and scalability for academic research and advanced studies. 34
In this context, the ASA uses as its simulation engine an open-source platform that integrates several resources to optimize the development of models with a focus on military applications. This solution is the Mixed Reality Simulation Platform (MIXR), a project that originated in the US Department of Defense and has been available to the public since 2006, with the original name of OPENEAAGLES. 35 The platform came from a private framework known as EAAGLES. It successfully found applications in various solutions, including virtual simulators for fighters and bombing aircraft, an air defense system simulation, and the management of futuristic armed conflicts. The available resources in the framework are libraries and classes that optimize the development of simulation solutions. The open-source version does not have a user-friendly interface, and the learning curve for creating MIXR-based applications is relatively long. However, after a good understanding of the platform structure, it is possible to develop solutions that are as complex and varied as necessary, quickly scaling the number of models and applications owing to its modular characteristics. Another relevant factor is the computational performance, which is greatly favored by native C++ development, with features such as parallel processing and distributed simulation incorporated into the framework. By adding ASA resources, the potential of the MIXR can be expanded, creating a flexible and scalable solution for the development and analysis of military simulations.
3.2. Models and agent behavior
This study utilized data and models derived from existing literature and the native models of MIXR, incorporating enhancements tailored to meet the specific contextual requirements guided by insights from experts such as fighter pilots and engineers. The main objective was to build generic models (not necessarily representing specific real-world systems) but with sufficient details and resolution to create a realistic simulation scenario. The simulations were initially built as deterministic to facilitate the initial evaluation; however, randomization factors were gradually added as necessary to increase realism.
3.2.1. The fighter aircraft
A model that aims to simulate an F-16, available in the MIXR through JSBSim, 36 was initially adopted to represent a fighter aircraft. The model works with 6 DoFs at 100 Hz and includes a flight control system, aerodynamics, and propulsion simulation. However, as will be elucidated in the analysis of the first-phase results, some unexpected asymmetric conditions motivated the switch to a less complex model, also standard in MIXR, which seeks to represent an F-22 performance. This model is also based on JSBSim and has fewer aerodynamic tables in its structure than the F-16 model. It reduces the complexity but is still sufficient to represent the main mechanical and aerodynamic characteristics of an aircraft that influence the BVR combat. Other elements, such as autopilot, communication radio, radar warning systems, and self-defense systems, were incorporated. These elements were not the focus of the evaluations and were kept identical for all simulations.
3.2.2. Airborne radar
The capability of airborne radar is a determining factor for the outcome of engagements in the context of BVR combat, being the primary element to confirm the opponent’s location. An aircraft capable of identifying its opponent in advance through its radar can predict enemy behaviors and better plan its engagement. The model adopted was based on the classes provided by the MIXR and sought to represent a system capable of detecting an enemy aircraft of the F-16 class at approximately 150 km. Even after adopting the aircraft model that aims to represent the performance of the F-22, there were no changes in the airborne radar or in the aircraft response to received emissions, preserving the previous detection range.
3.2.3. Air-to-air missile
The simulation of the air-to-air missile used the base model available in MIXR. This model allows the adjustment of several parameters at a high level, thereby significantly changing its performance. The references to evaluate the missile performance were two metrics frequently adopted in this context: the Maximum Range (MR)—the maximum fire distance to have some effectiveness, and the No Escape Zone (NEZ)—the region where any fire has a high probability of success. 37 The determination of these factors is essential for BVR combat because this information guides the firing moment during flight.
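As an illustration of how the two metrics can guide the firing moment, the following minimal sketch classifies the current distance to the target relative to the MR and NEZ. The function name, the zone labels, and the numeric values in the usage note are hypothetical and are not taken from the models used in this study.

```python
def classify_firing_zone(distance_km: float, mr_km: float, nez_km: float) -> str:
    """Classify a target distance relative to the missile envelope.

    mr_km  : Maximum Range (some effectiveness is expected up to this distance)
    nez_km : No Escape Zone (high probability of success inside this distance)
    All names and labels here are illustrative assumptions.
    """
    if nez_km > mr_km:
        raise ValueError("The NEZ cannot extend beyond the MR")
    if distance_km <= nez_km:
        return "inside_nez"
    if distance_km <= mr_km:
        return "inside_mr"
    return "out_of_range"
```

For example, with a hypothetical missile with an MR of 80 km and an NEZ of 30 km, a target at 50 km falls in the "inside_mr" zone, where a shot has some effectiveness but can still be defeated.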
This study determined the MR and NEZ by running simulations with different initialization parameters considering head-to-head engagements. The findings showed a linear correlation between the MR/NEZ and missile flight time (one of the native MIXR missile parameters). The tests allowed the building of two missiles, defined according to their MR and NEZ, under specific conditions:
3.2.4. The agent behavior model
To simulate BVR combat at the tactical level, it is necessary to implement agents that reproduce the engagement decisions, whether made by a human pilot or not. Although the systems interact with each other (for example, a radar can identify another aircraft automatically), the engagement decisions drive the flow of aerial combat. In ABMS, there are multiple ways to model these agents. Despite the availability of some consolidated tools, such as NetLogo, AnyLogic, Repast, and Simio, it is common in defense studies to find unique alternative solutions that offer greater control, flexibility, and performance. 5 In this context, this work adopted the UBF, a flexible solution successfully used in several applications, 38 including the modeling of air combat engagement behavior, 39 which integrates natively into the MIXR framework.
The agent behavior applied in this study is an adaptation of a model implemented by the ASA project development team. The structure foresees the use of basic components of the UBF, with Arbiters and Behaviors managing the creation of Actions in such a way as to subdivide the decisions by specific scopes. The definitions of the basic components of the UBF can be found in the work of Roberson. 38 For example, there is a specific Arbiter to generate radar operation Actions that can be modified and optimized independently from the rest of the structure, allowing a better understanding of the influence of this factor on the agent’s final performance. The specific details of the implementation of these Arbiters and Behaviors are beyond the scope of this work. However, an overview of their characteristics is interesting to understand better the common challenges faced in the quest to represent engagement decisions in BVR combat.
The implemented structure contains six Arbiter units in parallel, integrated by a general Arbiter, defined as the Master Arbiter. Second-level Arbiters act on a specific system or decision process to facilitate the implementation and understanding of each structural component. Thus, the development of the model, both conception and implementation, can be carried out in a modular way, taking advantage of this native characteristic of the UBF. The main objective was to create an agent realistic enough to demonstrate the methodology and simple enough to allow a clear interpretation of the effects when modifying the parameters of its components. In this way, the Arbiters act as Tactics Manager, Weapons Manager, Datalink Manager, Maneuvers Manager, Radar Operation Manager, and Target Allocation Manager, representing six areas considered decisive for the performance of a BVR combat mission. Different Behaviors evaluate the simulation state for each area and feed the Managers with suggested Actions. The diagram in Figure 1 represents the created structure, which also highlights the Finite State Machine (FSM Tactics) that feeds the Tactics Manager. The FSM facilitates the representation and implementation of transitions between high-level tactical decisions in a BVR air combat scenario. The solution also has the automatic coordination of two or more aircraft to simulate, even if partially, synergistic effects that enhance the group offensive and defensive capabilities. In general, the solution consisted of determining whether the more offensive aircraft had priority to engage with the target. By contrast, an aircraft with less offensive capability would be in a support condition. Determining the level of offensiveness and defensiveness followed the proposal presented in the work of Macedo 22 according to the state of the aircraft and the enemies.

Figure 1. UBF diagram of the behavior used in the experiments.
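The layered structure described above can be sketched in a few lines. This is only an illustrative sketch, not the ASA or MIXR implementation: it assumes that each Behavior suggests an Action with a numeric vote and that each second-level Arbiter uses a winner-takes-all rule, which is one of several possible arbitration strategies. All class, function, and key names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    name: str
    vote: float  # how strongly the Behavior recommends this Action


# A Behavior maps the simulation state to a suggested Action (or None).
Behavior = Callable[[dict], Optional[Action]]


class Arbiter:
    """Second-level Arbiter: picks the highest-voted suggestion (assumed rule)."""

    def __init__(self, behaviors: List[Behavior]):
        self.behaviors = behaviors

    def decide(self, state: dict) -> Optional[Action]:
        suggestions = [a for b in self.behaviors if (a := b(state)) is not None]
        return max(suggestions, key=lambda a: a.vote, default=None)


class MasterArbiter:
    """Integrates the decisions of the area-specific Arbiters."""

    def __init__(self, managers: Dict[str, Arbiter]):
        self.managers = managers

    def decide(self, state: dict) -> Dict[str, Action]:
        decisions = {}
        for area, arbiter in self.managers.items():
            action = arbiter.decide(state)
            if action is not None:
                decisions[area] = action
        return decisions


# Hypothetical Behaviors for two of the six areas.
def fire_when_in_range(state: dict) -> Optional[Action]:
    if state["target_distance_km"] <= state["shot_distance_km"]:
        return Action("fire_missile", vote=1.0)
    return None


def hold_fire(state: dict) -> Optional[Action]:
    return Action("hold_fire", vote=0.1)


def keep_radar_on_target(state: dict) -> Optional[Action]:
    return Action("track_target", vote=0.5)


master = MasterArbiter({
    "weapons": Arbiter([fire_when_in_range, hold_fire]),
    "radar": Arbiter([keep_radar_on_target]),
})
```

Because each Manager only sees its own Behaviors, one area (for example, radar operation) can be modified or tuned without touching the others, which is the modularity the structure aims for.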
3.3. Validation methodology
The primary objective of the methodology is to support the face validation of the simulation, assuming that the conceptual development and implementation of the models were already verified. Considering the recommendations presented in Table 1, face validation alone is only enough to support illustration or to increase the understanding of the scenario. In a real application of the methodology, additional steps based on real data could achieve statistical validation, allowing the results to support decisions up to the level of strategic advice, which is not the goal of this study.
The proposal is to start simulating the entire scenario using already-integrated models, only reviewing the components individually in case of inconsistencies. The identification of inconsistencies during the validation steps is expected and would lead to revisions in the conception and verification of models as an iterative process. 27 The empirical validation of systems such as aircraft, radar, weapons, and related systems does not directly affect the demonstration of the methodology because the focus is on the consistency of the agent behaviors. However, the models should at least be plausible with respect to reality to produce meaningful outputs for domain expert analysis. The adoption of the MIXR and ASA solutions supported the assumption of model plausibility, and an evaluation with domain experts was carried out through a structured walkthrough 40 to validate the plausibility of the agents’ behaviors. This step increased the domain experts’ understanding of the technologies used to represent the decision-making processes in the simulation.
The process of conducting face validation had seven steps that gradually increased in complexity, allowing technicians and domain experts to identify possible inconsistencies cumulatively. Table 2 provides a comprehensive listing of the configuration, objectives, and motivation for each step.
Table 2. Sequential verification and validation methods for the BVR combat simulation.
In all phases, the proposed test aims to be the starting point for analysis; the identification of inconsistencies can lead to additional configurations. The evaluation of the scenarios combined techniques presented by Sargent 40 with the face validation looking to facilitate the understanding and feedback from the domain experts:
The first reference for the process was defining primary parameters sufficiently relevant that changing them would make it possible to generate agents with significantly different characteristics of aggressiveness and efficiency. The view of domain experts was the central aspect when choosing these parameters, independent of the implemented models. The confirmation that the parameters selected by the domain experts could generate diverse behaviors would be the first validation of the models. Sensitivity analysis of the implemented behaviors could statistically indicate the most significant parameters. However, in this way, the result would be biased by the model in execution, which is exactly the validation target.
Furthermore, the implemented agents have more than 100 parameters that can affect the simulations. Although an effective design of experiment (DoE) could facilitate a more comprehensive sensitivity analysis, even for such complex simulation models, 41 the focus during the face validation phase was primarily on the domain experts’ input. This approach was intended to enhance their trust in the models, focusing on the most familiar parameters. This could also provide the developers valuable insights about what is most relevant in real-world scenarios. In this context, the parameters defined together with domain experts were as follows:
Shot distance (
Vulnerability threshold limit to initial defense (
Vulnerability threshold limit to last chance defense (
As an expected result, for example, defining that a pilot fire at a shorter distance (lower
Metrics for the behavior evaluation.
The initial phase involved the analysis of the simulation symmetry. The conceptual models expect that two aircraft approaching in symmetrical conditions, adopting the same behavior, will present the same performance in the engagements. Therefore, the proposal was to run simulations in different situations but always resulting in symmetrical engagements, looking for convergence in the performance of each team after several repetitions. The main objective of these tests was to perform the first analysis of the models and behaviors working together; therefore, implementation errors and other inconsistencies could appear. These conditions also make these tests a complementary part of the verification process. The basic configuration was two aircraft initialized 220 km apart, flying to a common central reference point between them: the blue aircraft from North to South and the red aircraft from South to North. This configuration was also the base reference for the other phases. Engagement should occur in the central region, with both aircraft in the same conditions for the combat. In this test, the implemented behaviors did not have random factors, always acting in the same way for both aircraft. Finally, the initialization positions received the addition of random uniformly distributed values limited to a square area of 400 m on each side to generate variable engagements.
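The initialization described above can be sketched as follows. The sketch assumes a flat local frame in meters with the central reference point at the origin, and assumes that the 400 m square is centered on each nominal start position (so each coordinate is perturbed by at most 200 m). The function and constant names are hypothetical.

```python
import random

SEPARATION_M = 220_000.0  # nominal distance between the two aircraft
SQUARE_SIDE_M = 400.0     # side of the square that bounds the random offset


def initial_positions(rng: random.Random) -> dict:
    """Return (east, north) start positions for the blue and red aircraft.

    Blue starts north of the reference point and flies south; red starts
    south and flies north. Offsets are uniform within the square (assumed
    to be centered on the nominal position).
    """
    half = SQUARE_SIDE_M / 2.0

    def jitter() -> float:
        return rng.uniform(-half, half)

    blue = (jitter(), SEPARATION_M / 2.0 + jitter())
    red = (jitter(), -SEPARATION_M / 2.0 + jitter())
    return {"blue": blue, "red": red}


# One seeded generator per replication keeps each run reproducible.
positions = initial_positions(random.Random(42))
```

Seeding the generator per replication makes it possible to replay a specific engagement when a domain expert flags an unexpected outcome.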
For Phases 2–7, the definition of seven different agents involved the active selection of specific parameters
In Phase 2, domain experts indicated the level (High, Medium, or Low) they expected that each agent would fit with the defined metrics (M1, M2, M3, and M4), considering symmetrical engagements (same agents for each side). To compare the simulation results with expectations, the level indicated by the domain experts received numeric values, and an additional metric called consistency factor (Cf) provided a global view of each agent:
The agents under assessment served as the benchmark for all subsequent tests. From Phases 3 to 5, the air combat scenarios are still between two aircraft, but gradually add asymmetric conditions: asymmetric agents in Phase 3, asymmetric weapons in Phase 4, and both asymmetric agents and weapons in Phase 5. Phases 6 and 7 aim to evaluate the implemented tactics, considering the blue team acting in pairs, with both aircraft using the same agent profile in Phase 6 and different profiles in Phase 7.
Another critical factor to consider in the validation and acceptance process of constructive simulations is the determination of the number of executions required to achieve satisfactory results. This factor receives special attention when dealing with ABMS, 11 considering that the stochastic characteristics of this type of simulation often necessitate many replications to perform a comprehensive analysis. The exact number of replications needed can vary depending on several factors, including the complexity of the simulation, the degree of stochasticity, the precision required for the results, and the computational resources available. Standard techniques typically analyze the results based on means, standard deviation, and variance. However, in ABMS, the distributions of the outputs are usually unknown a priori, necessitating careful application of techniques common in other simulation types. 11 An alternative is the analysis of the stability of variance to define whether the results are entering a stability phase.11,41,42 The reference for the stability analysis is computed as follows:
cv = σ / µ,
where σ and µ represent the standard deviation and the mean of the sample, respectively.
The technique consists of running the simulations with an increasing number of repetitions and measuring the value of cv for each case. As the number of repetitions increases, cv decreases and gradually tends to stability. A stopping criterion ε is defined, and if the reduction in cv between one sample size and the next is smaller than ε, the smaller sample size becomes the minimum number of executions. As an example, take the reference numbers of simulation executions n = {10, 50, 500, 1000, 5000, 10,000} and a stopping criterion ε = 0.02. If the change in cv when n increases from 1000 to 5000 runs is lower than 0.02, then 1000 runs is the minimum number required. The definition of ε depends on the analyst and can affect the use of the technique, leading to a minimum number of executions that is greater than necessary. Despite this uncertainty, the technique proves to be a good solution if there is some flexibility in using computational resources, that is, if the cost of running additional simulations is not critical. 41
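The stopping rule can be sketched in a few lines. This is only an illustration of the procedure described above, not the code used in the study: the function names are hypothetical, and the cv values in the example are invented so that the rule stops at 1000 runs, as in the worked example.

```python
import statistics


def coefficient_of_variation(sample):
    """cv = sigma / mu for one batch of simulation outputs.

    Uses the sample standard deviation; the mean must be non-zero.
    """
    return statistics.stdev(sample) / statistics.mean(sample)


def minimum_runs(cv_by_n, epsilon):
    """Return the smallest n whose cv changes by less than epsilon
    when moving to the next larger n, or None if no step qualifies.

    cv_by_n: list of (n, cv) pairs sorted by increasing n.
    """
    for (n_prev, cv_prev), (_, cv_next) in zip(cv_by_n, cv_by_n[1:]):
        if abs(cv_prev - cv_next) < epsilon:
            return n_prev
    return None


# Hypothetical cv values, for illustration only.
cv_values = [(10, 0.60), (50, 0.45), (500, 0.33),
             (1000, 0.28), (5000, 0.27), (10_000, 0.268)]
n_min = minimum_runs(cv_values, epsilon=0.02)
```

With these invented values, the first step whose change falls below 0.02 is the one from 1000 to 5000 runs, so `n_min` is 1000. A smaller ε would push the result toward larger batches, which is the analyst-dependent effect noted above.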
4. Results and discussions
The steps in Table 1 guided the face validation of the simulation; however, inconsistencies and uncertainties led to further examination as the process progressed. Simplified solutions to consolidate the analysis were always preferred, aiming to facilitate understanding by domain experts, who should contribute during all phases. Although some of the results and discussions in this work are related to the adopted models and solutions, they demonstrate how the methodology helped identify inconsistencies and facilitate corrections and the validation process.
4.1. BVR combat 1 versus 1
These experiments analyzed the consistency of the models in BVR combat scenarios between two isolated aircraft. Although this represents the most basic situation in this context, the number of systems and decisions involved is complex.
4.1.1. Initial validation through symmetry analysis—Phase 1
The initial results demonstrated few variations; all aircraft remained alive or destroyed simultaneously, depending on their behavior. However,

Results of the initial symmetrical evaluation with different approximation radials.
In this phase, confirming the expectations,28,29,40 the relevance of an adequate visualization tool became evident: it made the analysis much more intuitive and efficient, for example, by showing the direction of the radar beam frame by frame at critical moments of the engagements. This detailed analysis identified many engagements decided by minor differences in the relative angles between the aircraft near the missile firing moment, generated by a slight difference in headings of less than one degree. The problem was that the differences were not symmetrical for the aircraft. In this case, the aircraft coming from one direction always exhibited a slightly larger aspect angle. After multiple investigations, it was determined that several small contributing factors influenced the results. For example, changes in the Earth representation model were sufficient to affect the results. An explanation of these effects is beyond the scope of this study. However, they drew attention to the level of sensitivity that high-resolution models can generate, which also raises the question: Does an engagement decided by a difference of less than one degree in aspect angle represent a real-world situation?
The answer appears to be no. However, a relevant factor is that the models and behaviors are exactly the same for both sides, allowing tiny details to define the results. In the real world, the represented elements would not be as precise as the computer models. In particular, for the agent behaviors, real decisions would never rely on data as exact as those available to the implemented algorithms. These results reinforce the need for stochastic factors to represent combat behavior, avoiding exact threshold limits at decision points. 27 Thus, after this observation, random factors were added to the agent decisions by inserting minor errors at critical points. For example, after calculating the reference distance to fire the weapon, the value was shifted upward or downward by sampling from a normal distribution centered on the original value. Parameters were selected such that the errors generated did not exceed 2.5% in 95% of the cases. Although these values were arbitrarily selected, they could be adjusted based on real-world data representing, for example, the expected error as a function of the uncertainty in the sensor readings, enabling additional helpful analysis.
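A minimal sketch of such a perturbation follows. It assumes that the 2.5% bound is read as the two-sided 95% interval of a normal distribution, which gives σ = 0.025 × nominal / 1.96; the function name and the 40 km nominal distance are hypothetical and serve only as an illustration.

```python
import random

MAX_REL_ERROR = 0.025  # errors should stay within 2.5% ... (from the text)
Z_95 = 1.96            # ... in 95% of the cases (two-sided normal quantile)


def perturb(nominal, rng):
    """Sample a decision threshold from a normal distribution centered
    on the nominal value, scaled so that about 95% of the samples fall
    within +/- 2.5% of it."""
    sigma = MAX_REL_ERROR * nominal / Z_95
    return rng.gauss(nominal, sigma)


rng = random.Random(0)
nominal_range_km = 40.0  # hypothetical firing distance, for illustration only
samples = [perturb(nominal_range_km, rng) for _ in range(20_000)]

# Empirical check of the 95% property on the generated samples.
limit = MAX_REL_ERROR * nominal_range_km
fraction_within = sum(abs(s - nominal_range_km) <= limit for s in samples) / len(samples)
```

Replacing `MAX_REL_ERROR` with a value derived from measured sensor uncertainty would be one way to ground the perturbation in real-world data, as suggested above.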
After implementing the corrections and using stochastic factors in the behaviors, the results improved and became symmetrical in most cases. However, inconsistencies emerged under certain conditions, as shown in Figure 3. The graph represents the accumulated result of 2000 executions in a face-to-face scenario. The solid lines are the results for the aircraft in the Northern Hemisphere (45° North latitude), and the dashed lines are the results for the same simulations, but with the aircraft in the Southern Hemisphere (45° South latitude). On average, the blue aircraft had 40% of victories in the Southern Hemisphere, whereas the red aircraft had 34%. In the Northern Hemisphere, the situation reversed, resulting in 41% of victories for the red aircraft and 34% for the blue aircraft.

Asymmetry in the results as a function of the variation only in the initial latitude of the aircraft (F-16 model). (Batch 1: 45° N/Batch 2: 45° S).
The results led to a new investigation, culminating in the identification of an asymmetry in the aircraft (the standard F-16 model of the MIXR/JSBSIM with an extra autopilot control). The performance differed when starting a turn, depending on the side (the aircraft was more responsive when starting left-hand turns). In short, asymmetric moments in aircraft rotation generated unexpected responses in the control loop, causing a difference in the time to complete turns to each side. After analyzing some alternative corrections, a review of the models indicated that the aircraft model was more complex than necessary for the desired analysis and that replacing it would be more efficient. The substitution was straightforward owing to the flexibility of MIXR, and the model that seeks to represent the F-22 performance was selected. Repetition of the tests with the new aircraft model confirmed this suspicion, and the results became symmetrical, independent of approaching radials and initialization latitudes.
The findings demonstrate how the methodology of symmetric tests is essential to identify inconsistencies and better understand the models. The results also raised questions regarding the models’ realism. At times, the models appeared to overextend beyond the scope of the intended analysis of this work, making the understanding unnecessarily complex at some points.
4.1.2. Symmetric behavior engagements—Phase 2
Phase 1 did not assess whether the agent behaviors adequately represented the expected decisions; this became the focus of Phases 2–7. Because these phases would demand many more cases, the coefficient of variability was applied to reduce the number of executions. With no reference for setting the ε stopping criterion, the decision was to analyze the evolution of the coefficient graphically. The results presented in Figure 4 indicated a point of diminishing returns after 200 runs owing to the significant reduction in the variation of the coefficient. An ε = 0.02 would result in 200 runs as the minimum number, which became the reference. Table 4 presents the configuration of the agents, domain experts’ expectations, metrics obtained from batch simulations, and Cf.

Coefficient of variability analysis to estimate the necessary simulation repetitions.
Results from initial evaluation of the behavior consistency analysis.
Considering the Cf results, the behaviors of the evaluated agents had an average compliance of 85.7% with the experts’ expectations, which indicates that the implemented models qualitatively performed as expected. Among the results, Agents 3 and 4 exhibited the most significant differences. In the case of Agent 3, Cf was 75%, with high lethality (M3R = 0.77) and low evasiveness (M4R = 0.23), whereas the expectations of domain experts were medium for both. For Agent 4, Cf was 63%, with high offensiveness (M1R = 0.71) and low defensiveness (M2R = 0.29) contrasting with the expected medium values. Similarly, the medium lethality (M3R = 0.49) contrasted with the expected low level. These differences do not necessarily confirm the inconsistency of the models, considering that the analysis by domain experts is subjective, and the discretization of the reference ranges can play a decisive role. The objective of the methodology is to provide an overall view of the behaviors and of whether the changes in the parameters generate the expected outcomes. The metrics should condense the results and draw attention to points that need further evaluation. For example, with a visual review of the simulations, domain experts can analyze the reasons for inconsistencies or confirm whether the models agree with reality despite early expectations.
In this phase, the metrics indicate the characteristics of each behavior under symmetrical conditions. These values are relevant to obtaining the relative reference characteristics among agents; however, there is no guarantee that the same would occur against different enemies. This first comparison was essential to simplify the evaluation by the domain experts, bringing a greater understanding of the behavior and models, and helping to fix specific aspects in a more profound analysis.
4.1.3. Asymmetric behavior engagements—Phase 3
In this phase, engagements were between the different agents previously defined in Table 4. With asymmetric agents, the simulations approach more realistic scenarios than in the symmetric case, but still in a limited context to facilitate the contribution of experts. Table 5 presents the average results of 200 repetitions. There were no cases with Agents 6 and 7 because all simulations using them ended without shots from either side owing to their highly conservative characteristics.
Results from BVR 1 versus 1 with asymmetric behaviors.
The results led to several discussions regarding the models and behaviors. For example, Cases 4, 7, 9, and 10 drew attention because no aircraft was hit. A common factor identified in these cases is Agent 5, which is highly conservative. After a visual verification of the engagements, the reason became evident: Agent 5 performed its initial defense well in advance and could not resume any offensive action due to the enemy aircraft’s continuous pressure. These cases were not necessarily inconsistencies; they occurred because the agent lacked a defined high-level objective, so defending itself became the priority. For example, in a mission context, the objective of defending a contact line should force reengagement. Another interesting result, although already expected, was that the offensive and defensive levels found in symmetrical engagements do not necessarily correspond to the results of asymmetrical engagements. The levels from symmetric scenarios are an average profile of the agents, representing how much each agent tends to be exposed to risk; however, in asymmetric engagements, the results change with the enemy profile.
In Cases 1–3, the blue aircraft adopts the most aggressive behavior, setting all main parameters to 1.0, whereas the red aircraft varies between Agents 2–4. In Case 1, blue with Agent 1, and red with Agent 2, blue wins 77% against 19% for red. In Case 2, with red using the less aggressive Agent 3, the win rate decreases to 56% for blue and 16% for red, reducing the difference. Finally, Agent 4 reversed the score with the red side winning 63% of the time and the blue side only 6%. These differences demonstrate how the result obtained by a behavior profile is dependent on the opponent, which is an expected outcome.
4.1.4. Asymmetric weapon scenarios—Phase 4
For the tests in Phase 4, the blue aircraft carried the longer-range missile B. The main expected result was that the superior missile would be sufficient to give the blue aircraft a great advantage. The results in Table 6 demonstrate that the longer-range missile was decisive for the blue aircraft to outperform the red aircraft by a large margin in Cases 1–4. The worst result for blue was in Case 3, with 84% of victories against 10% for red. In Cases 5–7, the very defensive profile of the red agents ended with almost no missile firing, except for 5% of hits in Case 5. Additional analyses by visualizing cases that drew attention confirmed the consistency of behaviors and demonstrated that the agents could use better weapons adequately.
Results from BVR 1 versus 1 with asymmetric weapons.
Blue: Missile B (MR 65 km/NEZ 41 km). Red: Missile A (MR 54 km/NEZ 37 km).
4.1.5. Asymmetric weapon and behaviors scenarios—Phase 5
In Phase 5, the tests maintain distinct missiles and add asymmetry in agent behaviors. The question of interest is how much a change in the behavior of red could reduce its disadvantages. After all, in a real situation, if the red aircraft knew the disadvantages of its weapon, it would hardly behave in the same way as the enemy, adopting, in general, more defensive behavior. The tests evaluated 20 cases with all possible confrontations among Agents 1–5, with missile B always in the blue aircraft. The results confirm that the red aircraft, in most cases, was able to reduce its disadvantages when using more conservative behaviors (agents with a higher index). As shown in Table 7, some exceptions drew attention when the blue team used Agent 3. Comparing Cases 9 and 10, although red has a more conservative behavior in Case 10, its performance worsened, with blue wins increasing from 84% to 91%. However, in Case 11, red with an even more conservative agent, Agent 4, reduced the blue wins and won in 23% of the simulations. These results confirm that specific behaviors could reduce the difference even though a better missile represents a great advantage.
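The count of 20 cases is consistent with pairing every blue agent with every different red agent among Agents 1–5 (5 × 4 ordered pairs). The sketch below enumerates them under an assumed ordering, blue agent first and then red agent, which is not stated in the text but agrees with the description of Cases 9–11 (blue with Agent 3, red with Agent 4 in Case 11).

```python
from itertools import permutations

AGENTS = [1, 2, 3, 4, 5]

# Each case is a (blue_agent, red_agent) pair with different agents.
# Assumption: cases are numbered with the blue agent varying slowest.
cases = {number: pair
         for number, pair in enumerate(permutations(AGENTS, 2), start=1)}

for number in (9, 10, 11):
    blue_agent, red_agent = cases[number]
    print(f"Case {number}: blue Agent {blue_agent} vs red Agent {red_agent}")
```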
Results from BVR 1 versus 1 with asymmetric weapons and behaviors.
4.2. BVR combat 2 versus 1—Phases 6 and 7
After a good understanding of individual combats, Phases 6 and 7 explored aircraft acting in pairs. The main objective of this scenario was to demonstrate the possibility of evaluating the tactics. The aircraft acting in groups should guarantee significant superiority owing to the numerical advantage. The complexity can increase significantly in these scenarios, considering that the aircraft interactions can be diverse. Two simplified tactics were adopted to demonstrate the influence on the results. The tactical behavior of the implemented agent seeks to represent a basic concept of engagement in pairs: mutual support. 43 Essentially, the aircraft should coordinate to ensure that the numerical advantage enhances individual offensive and defensive capabilities. 20 In this context, three possibilities were considered.
The first part, identified as Phase 6, compared simulation results for different tactics with all aircraft using the same agent. The initial condition was the symmetrical positioning already adopted, with the second blue aircraft added 500 m laterally and 100 m vertically from the first. The tests consisted of 15 different scenarios with 300 replications each, a number recalculated using the coefficient of variability. The first observation was whether the numerical advantage represented a significant gain for the blue team, which was evident in the number of blue victories, far above that of the red team. Considering all the simulations, the blue team won 42% of the engagements compared with only 1% for the red team. The highest winning percentage for the red team was 9%, with all aircraft using Agent 4 and the blue team using the mutual support tactic at 6 h. The M1 and M2 values also drew attention. While the blue team had an average M1 of 0.43, the red team had 0.33. For M2, the blue team had an average of 0.82 against 0.57 for the red team. These results confirmed that the numerical advantage enhanced, as expected, the offensive and defensive capabilities of the blue team. However, the lethality (M3) and evasiveness (M4) values also drew attention. They resulted in practically equal average values of around 0.50 for both teams, with the red team doing much better in some cases. For example, using Agent 4, while the blue team got 0.27 for M3 and 0.38 for M4, the red team got 0.63 and 0.73, respectively, indicating that the blue team’s higher number of victories could be simply due to the numerical advantage, not to effective coordination. In the raw data, in three cases (all with Agent 4), the blue team had more losses than the red team despite the higher number of victories.
In these cases, after visual analysis, it was observed that, in several runs, after the initial defense, the red aircraft successfully fired against the blue aircraft. The results in Table 8 consolidate the observations with average results for each tactic.
Average results for each tactic from 15 cases of BVR 2 versus 1 using the same agent for all aircraft.
Analyzing the total averages of wins, the coordination between aircraft did not necessarily result in an advantage. Indeed, the average of blue wins with the 6 h positioning tactic was worse than with the individual tactic. On other metrics, it is possible to see some advantages, such as a significant reduction in the use of missiles for both tactics, with a drop of 0.41 for the 6 h tactic and 0.22 for 8/4 h, which demonstrates that, at least, the target allocation was effective. With tactical coordination, the aircraft do not fire simultaneously. However, a general evaluation demonstrated that the tactics model using the same aggressiveness for all the agents was ineffective. The understanding was that, with the same behavior for both aircraft, the blue team could not apply the central concept of mutual support tactics. The expectation is that each aircraft of the pair must exploit its offensive or defensive capacity to the maximum at every moment of combat. 20
One solution was to add flexibility to the behaviors depending on the engagement state. Essentially, the most threatened aircraft of the pair should adopt more conservative behavior, while the other aircraft would wait for a safer situation to act more offensively. Two behaviors were selected to serve as this profile change for the blue team, while the red aircraft would maintain a fixed profile. With this possibility, a new test phase verified whether the difference would result in a more significant advantage for the blue team. Even considering only the first five agents and the three tactics used in the BVR 2 versus 1 tests, 150 scenarios would be needed to evaluate all possibilities. For this study, these scenarios were reduced to 15 by choosing a single pair of behaviors for the blue team, with Agent 1 for the most offensive aircraft and Agent 5 for the most defensive, the extremes in each characteristic. In this context, the results confirmed the expectation that changing the profiles in different situations, aligned with the tactic, could enhance offensive and defensive capabilities.
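One reading that reproduces both counts is sketched below: 10 unordered pairs of different profiles for the blue team, 5 fixed profiles for the red aircraft, and 3 tactics give 150 scenarios, and fixing the blue pair to Agents 1 and 5 leaves 15. The text does not spell out this enumeration, so it is an assumption; the tactic labels follow the names used in the discussion of Phase 6.

```python
from itertools import combinations, product

AGENTS = [1, 2, 3, 4, 5]
TACTICS = ["individual", "6 h", "4/8 h"]  # labels taken from the text

# Assumption: the blue pair is an unordered choice of two different profiles,
# since the roles (offensive/defensive) are assigned during the engagement.
blue_pairs = list(combinations(AGENTS, 2))

# Every scenario is (blue_pair, red_agent, tactic).
all_scenarios = list(product(blue_pairs, AGENTS, TACTICS))

# Reduction adopted in the study: only Agents 1 and 5 for the blue team.
reduced_scenarios = [s for s in all_scenarios if s[0] == (1, 5)]

print(len(blue_pairs), len(all_scenarios), len(reduced_scenarios))
```

At 300 replications per scenario, as used in Phase 6, the full set would require 45,000 runs against 4,500 for the reduced set.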
As shown in Table 9, the exchange of behaviors significantly increased the efficiency of the blue team. For example, in terms of winning percentage, the 6 h positioning tactic gained 0.55 percentage points over the uncoordinated teams, and the 4/8 h tactic gained 0.32. Furthermore, M3 doubled to 1.0 for the blue team, indicating that all the missiles fired hit the enemies. These results reinforce the importance of careful analysis of coordination among agents. The tactics in this work are simplified solutions but sufficient to demonstrate the relevance of this aspect. Figure 5 shows snapshots of critical moments of the simulations using different agents in the blue team tactics.
Results from BVR 2 versus 1 using the different Agents 1 and 5 for the blue team.

Visualization of engagements with different behaviors in the BLUE team tactics. In “A,” BLUE uses Agent 3 for both, and in “B,” Agents 1 and 5.
5. Conclusion
This study proposes methods and shares findings from case studies that evaluate the use of ABMS in the analysis of BVR air combat scenarios. The results demonstrated that relatively simple techniques, such as symmetric and asymmetric engagement tests, can effectively help to validate and consolidate the models. Some findings, such as the detection of asymmetry in the F-16 aircraft model and the potential to decrease the disadvantage against a more powerful weapon, showed the ability of these methods to grasp the potential and limitations of the models. The high relevance of face validation when using ABMS 27 and building models to analyze military scenarios 32 justified this study. The characteristics of the face validation process and the applied tools are often directly related to the problem domain. Therefore, the lessons learned from this study are valuable for supporting the development of air combat simulations. However, the process, method of presenting the results, and proposed analysis sequence are applicable to other simulated scenarios and can provide useful insights to facilitate the understanding and integration of experts in the validation process.
The decision to create a limited number of agent profiles with different aggressiveness and defensiveness, instead of a complete exploration of the parameters, facilitated evaluations by domain experts. The selection of the three primary parameters allowed the experts to have a clear view of what each agent represents, increasing the explainability of the models. The variability in the performances demonstrated in Phase 1 confirmed the adequacy of
The stochastic elements at critical moments in the agent decisions proved effective in filtering out inconsistent results caused by the problem of the boundary parameters. These parameters could also be an option to represent the different performances of the agents, thereby expanding or reducing their margin of error.
Finally, this study also showed that the MIXR platform and its models allow the production of realistic behaviors to simulate BVR combat. The modularity of the UBF also facilitated the organization of the decision-making process in areas typically recognized by pilots, making its general structure more understandable to experts.
Footnotes
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Council for Scientific and Technological Development—CNPq 307691/2020-9; the Research and Projects Financing (FINEP—The Brazilian Innovation Agency) 2824/20; and the Brazilian Air Force Postgraduate Program in Operational Applications (PPGAO).
