An agent-based model to simulate and analyse behaviour under noisy and deceptive information

Abstract

This paper presents an agent-based model for analysing behaviour produced under noisy and deceptive information conditions. A simple yet powerful simulation environment is developed in which adaptive agents act and adapt to varying levels of quality in the information that they sense about their environment. The simulation environment consists of two types of agents moving in a bounded two-dimensional continuous plane: a neuro-evolutionary learning agent that adapts its manoeuvring strategies to escape a pre-programmed deceptive agent; and a pre-programmed agent, whose goal is to capture the adaptive agent, that acts on noisy information about the adaptive agent’s manoeuvres that it senses from the environment. The pre-programmed agent is also able to produce deceptive actions to confuse the adaptive agent. Behaviour is represented in terms of the manoeuvring strategies that the agents adopt in response to environmental changes. A behaviour analysis methodology is developed to compare agent actions under different information conditions; it elicits interesting relationships between behaviour and the studied information conditions. The framework is easily extendable to analyse human behaviour in similar environments by replacing the adaptive agent with an interactive human–machine interface.

Keywords

Behaviour analysis; agent-based modelling; simulation; noise; deception; neuro-evolution

1 Introduction

The focus of our work is to understand behaviour in an adversarial environment, known as adversarial behaviour. Understanding adversarial behaviour is motivated by the goal to improve the decision-making process especially in competitive real-world situations. The efforts of developing and evaluating the strategies against the emerging threats from an adversarial environment can be widely found in the research related to adversarial learning (Dalvi, Domingos, Mausam, Sanghai, & Verma, 2004; Lowd & Meek, 2005; Barreno, Nelson, Sears, Joseph, & Tygar, 2006; Nelson & Joseph, 2006; Newsome, Karp, & Song, 2006; Veloso & Meira Jr., 2006; Jorgensen, Zhou, & Inge, 2008; Barreno et al., 2008; Nelson et al., 2009; Barreno, Nelson, Joseph, & Tygar, 2010), and also in military operations (Yang, Abbass, & Sarker, 2006; Abbass, 2009; Lauder, 2009; Abbass, Bender, & Whitbread, 2010a, 2010b). In such environments, the state of preparedness can be further improved or optimised if we are able to predict the actions taken by the opponents. However, the state of preparedness is no longer limited to anticipating the preparation efforts against adversarial threats, but also how the efforts affect the environment, the opponents or allies in the environment, and how they, or any action taken on them, affect our objectives in return (Abbass, Bender, Gaidow, & Whitbread, 2011). In this case, the study of adversarial behaviour allows us to anticipate the actions of the opponents, learn about their preferences, and organise our strategies against possible threats and competition from the opponents.

Behaviour, in natural organisms, systems or artificial agents, is commonly associated with their actions or decisions. The type of behaviour or change in behaviour is known to affect actions or decisions (Bargh, Chen, & Burrows, 1996). Conversely, the actions or decisions taken by entities reflect on their behaviour (Vallacher & Wegner, 1987). Actions or decisions are, in turn, taken to achieve certain goals or objectives (Maes, 1990). Furthermore, goals are usually not set in isolation but are a function of the environment in which agents are embedded, which includes both the other entities in the system and the physical environment itself. In a competitive world the goals may be set in the context of opponents who may have conflicting or even hostile goals. Common examples of such environments include predator–prey systems in nature (Lima, 2002), human decision-making in unknown environments, information warfare (Chaturvedi, Gupta, Mehta, & Yue, 2000) and adversarial learning (Riley & Veloso, 2000).

Many factors affect behaviour, including the quality of information available to agents about their environment when they make their decisions or take actions to achieve goals. Information (or lack of information, also known as uncertainty) can have a major impact on goals and objectives and hence on behaviour, especially in the types of hostile environments referred to above. The effect of information should therefore be a key design consideration when studying behaviour, especially in simulation-based studies of artificial behaviour. The key motivation of our work is thus to study adaptive behaviour under varying levels of information conditions. Specifically, we look at two dimensions of information conditions and their effects on behaviour: noise in the sensory input, and the introduction of purposeful deviation in the output actions, referred to as deception. Our hypothesis is that noise in information and deception in actions are two interdependent concepts. To this effect, we present an artificial agent-based simulation system in this work, in order to elicit the effect of such interdependent information conditions on behaviour, where behaviour is defined as the set of action sequences (strategies) taken by the agents to manoeuvre in the environment to achieve their goals.

The key ingredients of our simulation system are two artificial entities or agents: one with pre-defined action strategies, and one that learns to adapt to the environment using neuro-evolutionary learning controllers. The agents have the goals of capturing and evading each other, respectively, in a two-dimensional continuous bounded environment. The variation in the quality of information is simulated by controlling a) the level of noise, in terms of frequency and accuracy of information sensed by the pre-programmed agent; and b) the level of deception, in terms of frequency and magnitude of deviation from the desired action taken by the pre-programmed agent.

The neural controller in the adaptive agent is evolved under each of the experimental configurations resulting from different combinations of information-quality conditions. The movements of the evolved adaptive agents are then recorded across a number of simulation runs. The effect of varying the two information conditions on agent behaviour is then analysed across different configurations. The methodology used to analyse agent behaviour is another significant contribution of this work. Several features are extracted from the recorded action sequences and compiled in logical ways in order to see the effect of information conditions on behaviour. The methodology is generic and can be applied to compare behaviour in other environments.

Several researchers in the past have demonstrated the use of neuro-evolutionary controllers to generate complex action sequences, or behaviour, by artificial agents including those acting in hostile environments (Beer & Gallagher, 1992; Gomez & Miikkulainen, 1997). However, few studies look at the effect of noise on agent actions (Floreano & Mondada, 1994; Harvey et al., 2005). To the best of our knowledge, no study has looked at the effect of purposeful deviation (deceptive actions) from goals on agent behaviour. Our experimental setup allows us to study agent behaviour under the effects of both noise and deception.

The rest of this paper is organised as follows. Section 2 provides a detailed description of our proposed agent-based simulation environment, and the agent models used to study the agent behaviour under variable information conditions. Section 3 describes the experimental setup, detailing different configurations as a result of combining different information conditions. Section 4 provides details for the methodology used to analyse the outcomes from experiments. Discussion of the experimental results through the analysis methodology follows in Section 5. Finally, Section 6 presents conclusions and future work.

2 Simulation environment

In this section, we provide a complete description of the proposed agent-based simulation environment. The simulation environment can be considered a kind of game, where two agents with competing goals autonomously move in a two-dimensional world. One simulation run, or one game, consists of a fixed number of time steps, where both agents make equi-distant moves according to their strategies at each time step. A simulation run may also terminate earlier, if the adaptive agent is intercepted by the pre-programmed agent.

2.1 Environment representation

The environment essentially consists of a bounded two-dimensional continuous square plane, in which two agents move with assigned goals, as illustrated in Figure 1. The two agents (described in the sections below) can roam freely in this space without crossing the square boundaries. The simulation runs in discrete time, measured in terms of the number of time steps. The position of an agent in the environment at each time step is determined by its current x- and y-coordinates, where both x- and y-coordinates are bounded in the range $[0, 1]$ . At each time step, agents can move a fixed Euclidean distance ξ in any direction, represented by an angle within a range $[- π, π]$ . The travel angle of the pre-programmed agent is denoted $θ_{b}$ (b for blue) and the travel angle for the adaptive agent is denoted $θ_{r}$ (r for red). The agents generate these angles based on the strategies they adopt during the simulation. Both types of agent can move the same distance during one time step, so their manoeuvring capabilities are matched. At the start of each simulation run, agent positions are initialised randomly within the given bounds of the environment. However, a minimum separation is enforced between the agents’ initial positions, to ensure adequate learning time is provided for the adaptive agent. This separation is measured in multiples of ξ and is given by $D_{init} = m ξ$ , where m is set arbitrarily (200 in all experiments reported in this paper).

Figure 1.

Schematic representation of the initial setup of the pre-programmed and adaptive agents in the game environment.

During the simulation, both agents move in the environment with opposing goals. The pre-programmed agent’s goal is to capture the adaptive agent, which adapts its strategies to escape being captured by the other agent. Both agents move at the same fixed speed, λ = 10ξ/step, so that they make equi-distant moves at each time step according to their strategies. The adaptive agent is considered to be captured by the pre-programmed agent if the distance between the two agents falls below a minimum separation distance $D_{\min}$ . This threshold distance is also measured in multiples of ξ, and is given by $D_{\min} = c ξ$ , where c is chosen arbitrarily to ensure a capture event is triggered when the two agents overlap visually on the GUI (20 in all experiments reported in this paper). Note that the choice of this parameter has no impact on our experimental setup, and is merely an artefact of the visual representation that we chose to display the agents.

Each simulation run terminates when the adaptive agent is captured or when a fixed number of time steps S has elapsed, whichever occurs first.
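The environment rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the numeric value of ξ (`XI`), the clamping used to keep agents inside the square, and the rejection sampling used for the initial placement are all assumptions, and the function names are hypothetical. The step length is left as a parameter so that either ξ or a multiple of it can be used.

```python
import math
import random

XI = 0.001        # assumed numeric value of xi; the paper does not state one
M_INIT = 200      # D_init = m * xi, with m = 200 as in the paper
C_CAPTURE = 20    # D_min = c * xi, with c = 20 as in the paper


def clamp(value, lo=0.0, hi=1.0):
    """Keep a coordinate inside the bounded range [lo, hi]."""
    return max(lo, min(hi, value))


def distance(p, q):
    """Euclidean distance between two (x, y) positions."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def move(pos, angle, step=XI):
    """Move `step` along `angle` (radians); clamping at the walls is an assumption."""
    x, y = pos
    return (clamp(x + step * math.cos(angle)),
            clamp(y + step * math.sin(angle)))


def random_start(rng, d_init=M_INIT * XI):
    """Rejection-sample two start positions that are at least d_init apart."""
    while True:
        blue = (rng.random(), rng.random())
        red = (rng.random(), rng.random())
        if distance(blue, red) >= d_init:
            return blue, red


def is_captured(blue, red, d_min=C_CAPTURE * XI):
    """Capture event: the separation falls below D_min."""
    return distance(blue, red) < d_min
```

A run would then loop for at most S steps, calling `move` for each agent and stopping early as soon as `is_captured` returns true.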

2.2 The pre-programmed agent representation

The goal of the pre-programmed agent is to capture the adaptive agent as quickly as possible. It follows a fixed strategy to produce its actions (direction of movement in terms of $θ_{b}$ ) and achieve this goal. The first component of the fixed strategy is an attraction rule: the agent always produces a travel angle in the direction of the adaptive agent’s perceived position. The perception of the pre-programmed agent is based on the information it senses from the environment about the current position of the other agent. This perception is affected by two factors: the frequency and the accuracy of the information received through its sensors about the adaptive agent’s position. These two factors control the simulated quality of information. The frequency of information received by the agent about the position of the other agent is controlled by the parameter $N_{I}$ , and the accuracy of information is controlled by adding a uniformly distributed amount of noise, denoted by ${\hat{α}}^{(t)}$ , to the actual position of the other agent.

The second component of the fixed strategy allows the pre-programmed agent to deviate from its policy-based actions in a controlled fashion. That is, the pre-programmed agent produces deceptive actions to confuse the other agent. This is controlled by two parameters: $N_{D}$ , which controls the frequency of producing a deceptive action; and $ζ^{(t)}$ , which controls the magnitude of deviation from the angle produced by the fixed attraction rule and is chosen uniformly at random over a defined deception range.

The pseudo-code of the strategy used by the pre-programmed agent is shown in Algorithm 1. The terms used in the algorithm are described below:

${\hat{P}}_{b}^{(t)}$ : the actual position of the pre-programmed agent at time t;

${\hat{P}}_{r}^{(t)}$ : the actual position of the adaptive agent at time t;

${\hat{P}}_{ir}^{(t)}$ : the position of the adaptive agent as perceived by the pre-programmed agent at time t;

${\hat{α}}^{(t)}$ : the relative change in x- and y-coordinates due to the level of noise added to ${\hat{P}}_{r}^{(t)}$ , expressed as ${\hat{α}}^{(t)} = (Δ x, Δ y)$ ;

Counter: simulation clock.

Algorithm 1 Update intel and position
while Counter is not equal to S do
    Update intel:
    if (Counter modulo $N_{I}$ ) = 0 then
        ${\hat{P}}_{ir}^{(t)} \leftarrow {\hat{P}}_{r}^{(t)} + {\hat{α}}^{(t)}$
    else {(Counter modulo $N_{I}$ ) ≠ 0}
        ${\hat{P}}_{ir}^{(t)} \leftarrow {\hat{P}}_{ir}^{(t - 1)}$
    end if
    Update position:
    $θ_{c}^{(t)} = {\hat{P}}_{ir}^{(t)} - {\hat{P}}_{b}^{(t)}$
    if (Counter modulo $N_{D}$ ) = 0 then
        $θ_{b}^{(t)} = θ_{c}^{(t)}$
    else {(Counter modulo $N_{D}$ ) ≠ 0}
        $θ_{b}^{(t)} = θ_{c}^{(t)} + ζ^{(t)}$
    end if
    Increase Counter by 1
end while
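One iteration of Algorithm 1 might look as follows in Python. This is a hedged sketch rather than the authors' code: the function name and argument names are hypothetical, the angle $θ_{c}$ is taken to be the direction of the vector from the pre-programmed agent to the perceived position (computed with `atan2`), the noise bound is assumed to be expressed in the same units as the coordinates, and the deception branch is transcribed exactly as printed in the listing.

```python
import math
import random


def preprogrammed_step(counter, blue_pos, red_pos, perceived_prev,
                       n_i, n_d, noise_max, deception_max, rng):
    """One pass through the loop body of Algorithm 1.

    Returns (perceived position of the adaptive agent, travel angle theta_b).
    """
    # Update intel: refresh the perceived position every n_i steps,
    # adding uniform noise to both coordinates; otherwise keep the old one.
    if counter % n_i == 0:
        dx = rng.uniform(0.0, noise_max)
        dy = rng.uniform(0.0, noise_max)
        perceived = (red_pos[0] + dx, red_pos[1] + dy)
    else:
        perceived = perceived_prev

    # Attraction rule: head towards the perceived position (assumed reading
    # of the position difference in the listing).
    theta_c = math.atan2(perceived[1] - blue_pos[1],
                         perceived[0] - blue_pos[0])

    # Deception branch, following the condition as printed in Algorithm 1.
    if counter % n_d == 0:
        theta_b = theta_c
    else:
        theta_b = theta_c + rng.uniform(-deception_max, deception_max)
    return perceived, theta_b
```

Passing the random generator in explicitly keeps repeated runs reproducible for a given seed.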

2.3 The adaptive agent representation

To closely match these design requirements, the adaptive agent in the proposed framework is designed based on the neuro-evolutionary model (Nolfi, Parisi, & Elman, 1994). Using this model, the adaptive agent is represented by a population of back-propagating neural networks.

A genetic algorithm (GA) is used to evolve this population of neural networks, whereby individuals (neural networks) reproduce selectively based on their performance in generating the desired behaviour. Other evolutionary techniques may well perform better than GAs. However, the main focus of our work is to understand adversarial behaviour through simulation rather than to optimise performance, so we do not compare different evolutionary techniques. Instead, we use an evolutionary technique that is commonly used for behavioural simulation, as shown in Nolfi and Floreano (2000).

The goal of the adaptive agent in our simulation environment is to escape capture by the pre-programmed agent. Rationally, this can be achieved by always producing an action using a repulsive strategy, opposite to the pre-programmed agent’s attraction strategy. However, in our setup the adaptive agent learns a strategy through a neuro-evolutionary process that always produces an angle, $Δ θ_{mr}$ , relative to the actual repulsion angle, $θ_{(- ar)}$ . This serves two purposes: firstly, it incorporates the instinctive ability in the agent to always move away from danger and improves the learning time; and secondly, it allows the adaptive agent to learn a strategy which does not follow the bounded rationality principle, shown to be inadequate in explaining human behaviour (Simon, 1955). The final travel angle for the adaptive agent at each time step, $θ_{mr}$ , is thus the sum of two angles (i.e. $θ_{mr} = θ_{(- ar)} + Δ θ_{mr}$ ), as illustrated in Figure 2. It is important to note that the adaptive agent learns this strategy (i.e. learning the deviant angle) without any other preferential fitness bias (e.g. one biased strategy could be to maximise the distance between two agents). In other words, the agent is free to learn any manoeuvring strategy within the deviation constraint, described above, without being penalised for taking any specific actions.

Figure 2.

Generation of travel angle for the adaptive agent.
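The composition $θ_{mr} = θ_{(- ar)} + Δ θ_{mr}$ can be illustrated with a short Python sketch. It rests on two assumptions that the text does not spell out: the repulsion angle is taken to be the direction pointing from the pre-programmed agent towards the adaptive agent (that is, directly away from the pursuer), and the sum is wrapped back into the travel-angle range, here the half-open interval [-π, π). The function names are hypothetical.

```python
import math


def wrap_angle(theta):
    """Wrap an angle in radians into the half-open interval [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def repulsion_angle(red_pos, blue_pos):
    """Direction that points directly away from the pre-programmed agent."""
    return math.atan2(red_pos[1] - blue_pos[1], red_pos[0] - blue_pos[0])


def travel_angle(red_pos, blue_pos, delta_theta_mr):
    """Final travel angle: repulsion angle plus the learned deviation."""
    return wrap_angle(repulsion_angle(red_pos, blue_pos) + delta_theta_mr)
```

With a deviation of zero the agent flees in a straight line from its pursuer; any non-zero deviation produced by the network bends that escape path.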

In the subsections below we provide further description of the agent design.

2.3.1 Network architecture

A fixed and identical architecture is used to represent individual neural networks in the evolutionary framework. Each neural network consists of a multi-layer perceptron of sigmoid units with a back-propagation algorithm. The learning rate for the network, η, is set to 0.2, and the momentum rate, γ, is set to 0. Each connection weight of the neural network ranges in the interval $[- 1, 1]$ . In this study, we do not concentrate on parameter tuning for GAs or neural networks because system optimisation is not our main concern.

To further elaborate, the networks consist of IN input, HN hidden and ON output neurons. The architecture is depicted in Figure 3, where IW and LW refer to the connection weight matrices of HN×IN and ON×HN dimensions respectively between the input-hidden and hidden-output layers; and ${\hat{ρ}}^{1}$ with $HN \times 1$ dimensions and ${\hat{ρ}}^{2}$ with $ON \times 1$ dimensions refer to the vectors of bias units between the input-hidden and hidden-output layers.

Figure 3.

Architecture of the neural network used by the adaptive agent.

In the experiments, the values of IN, HN and ON are 5, 7 and 2 neurons respectively. The numbers of input and output neurons follow directly from the numbers of input and output variables respectively. The number of hidden neurons, on the other hand, is chosen arbitrarily, on the basis that a small network requires less processing time. A GA is then used to evolve the connection weights of the networks. Neural networks with evolved initial connection weights have been found to learn faster and better than those with random initial connection weights (Nolfi & Floreano, 2000).
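As an illustration of the 5-7-2 architecture, the following Python sketch computes a forward pass through a sigmoid multi-layer perceptron. It is a minimal sketch under stated assumptions: each bias vector is sized to match the layer it feeds (7 entries for the hidden layer, 2 for the output layer), the output units are sigmoid like the hidden ones, and the helper names are hypothetical. Back-propagation is not shown.

```python
import math
import random

IN, HN, ON = 5, 7, 2  # layer sizes reported in the paper


def sigmoid(z):
    """Logistic activation."""
    return 1.0 / (1.0 + math.exp(-z))


def random_weights(rows, cols, rng):
    """Weight matrix with entries drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def forward(x, iw, rho1, lw, rho2):
    """Forward pass: x has IN entries, iw is HN x IN, lw is ON x HN.

    Returns the hidden activations and the output activations.
    """
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(iw, rho1)]
    output = [sigmoid(sum(w * h for w, h in zip(row, hidden)) + b)
              for row, b in zip(lw, rho2)]
    return hidden, output
```

Because sigmoid outputs lie in (0, 1), a real controller would still need to rescale them to the angle ranges used by the agents; that mapping is not specified in the text and is left out here.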

Each neural network in the population is evaluated iteratively for a number of simulation runs. At each time step in a simulation run, an individual neural network senses $β^{(t)}$ , $d_{opp}^{(t)}$ , $Δ β^{(t)}$ , $Δ d_{opp}^{(t)}$ and $d_{wall}^{(t)}$ as its inputs, and produces the outputs $Δ θ_{mr}^{(t)}$ and $θ_{pb}^{(t)}$ . The parameters for input and output are explained below:

Input variables:

– $β^{(t)}$ : the relative angle between the pre-programmed and adaptive agents at time t;

– $d_{opp}^{(t)}$ : the relative distance between the pre-programmed and adaptive agents at time t;

– $Δ β^{(t)}$ : the relative change in β at time t;

– $Δ d_{opp}^{(t)}$ : the relative change in $d_{opp}$ at time t;

– $d_{wall}^{(t)}$ : the relative distance to the boundary of the simulation environment that the adaptive agent faces at time t;

– ${\hat{α}}^{(t)}$ : the level of noise in the information received by the pre-programmed agent.

Decision variables:

– $Δ θ_{mr}^{(t)}$ : the deviation angle predicted by the neural networks for the adaptive agent;

– $θ_{pb}^{(t)}$ : the pre-programmed agent’s travel angle.

At this point, the planned travel angle of the adaptive agent, $θ_{r}^{(t)}$ , is executed and the agent moves to a new location. At the same time, the pre-programmed agent also moves to a new location based on its own strategy. The actual travel angle of the pre-programmed agent, $θ_{b}^{(t)}$ , is compared with the angle predicted by the adaptive agent, $θ_{pb}^{(t)}$ , and the difference is used to adjust the connection weights between network layers through back-propagation. Using the network as the control system, the adaptive agent moves in the environment until the game is terminated. The fitness (see Section 2.3.3) of each neural network is evaluated iteratively based on its performance (i.e. how successful it was in escaping from being caught by the pre-programmed agent) averaged over a number of repeated simulation runs, $N_{G}$ (see Section 2.1 for the description of a simulation run). In all experiments reported in this paper $N_{G}$ is set to 10. Once all neural networks in the population are evaluated, a GA is applied to create the next generation of neural networks.

2.3.2 Genetic algorithm

Given that the architecture of the neural network is fixed, $IW$ , $LW$ , ${\hat{ρ}}^{1}$ and ${\hat{ρ}}^{2}$ are mapped into a vector of weights which represents a chromosome. Figure 4 depicts the representation of a chromosome used in the evolutionary process. Here ${\hat{W}}_{IW}$ refers to a vector of connection weights and bias units between the input-hidden layers, mapped as ${\hat{W}}_{IW} = \{ {\hat{w}}_{1}^{1}, {\hat{w}}_{2}^{1}, \dots, {\hat{w}}_{hn}^{1}, \dots, {\hat{w}}_{HN}^{1} \}$ , as depicted in Figure 5. Similarly, ${\hat{W}}_{LW}$ refers to a vector of connection weights and bias units between the hidden-output layers, mapped as ${\hat{W}}_{LW} = \{ {\hat{w}}_{1}^{2}, {\hat{w}}_{2}^{2}, \dots, {\hat{w}}_{on}^{2}, \dots, {\hat{w}}_{ON}^{2} \}$ , as depicted in Figure 6. The parameters in, hn and on with ranges $1 \leq in \leq IN$ , $1 \leq hn \leq HN$ , $1 \leq on \leq ON$ refer to the input, hidden and output layers’ neurons.

Figure 4.

Weights for the fixed architecture neural network in chromosome representation.

Figure 5.

Mapping of the connection weights and bias units for both input-hidden layers.

Figure 6.

Mapping of the connection weights and bias units for both hidden-output layers.
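The mapping between network parameters and a chromosome can be sketched as a pair of flatten and unflatten helpers. The gene ordering used here (for each hidden neuron its incoming weights followed by its bias, then the same for each output neuron) is an assumption about the layout in Figures 4 to 6, and the function names are hypothetical. With IN = 5, HN = 7 and ON = 2 the chromosome has 7 × (5 + 1) + 2 × (7 + 1) = 58 genes.

```python
IN, HN, ON = 5, 7, 2  # layer sizes reported in the paper


def encode(iw, rho1, lw, rho2):
    """Flatten weight matrices and bias vectors into one chromosome (list)."""
    genes = []
    for row, bias in zip(iw, rho1):      # input-hidden block
        genes.extend(row)
        genes.append(bias)
    for row, bias in zip(lw, rho2):      # hidden-output block
        genes.extend(row)
        genes.append(bias)
    return genes


def decode(genes, n_in=IN, n_hidden=HN, n_out=ON):
    """Rebuild (iw, rho1, lw, rho2) from a chromosome produced by encode."""
    iw, rho1, lw, rho2 = [], [], [], []
    k = 0
    for _ in range(n_hidden):
        iw.append(list(genes[k:k + n_in]))
        rho1.append(genes[k + n_in])
        k += n_in + 1
    for _ in range(n_out):
        lw.append(list(genes[k:k + n_hidden]))
        rho2.append(genes[k + n_hidden])
        k += n_hidden + 1
    return iw, rho1, lw, rho2
```

Keeping each neuron's weights and bias adjacent means that a one-point crossover tends to exchange whole neurons rather than scattering their parameters.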

The GA begins with a population of fixed size, $pop = 100$ , of randomly generated individuals, each yielding a different set of connection weights for a neural network. As mentioned, the network architecture and relevant learning parameters are fixed and identical for all individuals. All networks in the population are then evaluated for $N_{G} = 10$ simulation runs. The agent locations are initialised randomly in each simulation run, with the given separation distance constraints (see Section 2.1 for the description of a simulation run). Once all individuals are evaluated, a new population is created through the elitism and selective reproduction processes. First, the fittest 10% of individuals (neural networks) from the population are copied to the next-generation population. Next, new offspring are created iteratively and added to the new population. In each iteration two parents are selected without replacement using the binary tournament selection method. An offspring is then generated using one-point crossover and uniform mutation. In mutation, each gene in the weight-vector chromosome is perturbed with a probability given by the mutation rate, with the new value drawn uniformly at random from the weight range (i.e. [−1,1]). The process is repeated until the maximum population size is reached. The evaluation procedure explained above is then repeated for each individual neural network in the new population for $N_{G}$ simulation runs. The whole evolutionary cycle is repeated for a number of generations, $gen = 200$ . Finally, the best individual is chosen from the final population for the behavioural analysis, described later in the paper.
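The generational step described above can be sketched as follows. This is an illustrative sketch only: the mutation rate is not given in the text, so the default of 0.05 is an assumption, as are the function names. As a simplification, each binary tournament draws two distinct individuals, but the two tournament winners are not forced to differ.

```python
import random


def tournament(population, fitness, rng):
    """Binary tournament: the fitter of two distinct random individuals wins."""
    a, b = rng.sample(range(len(population)), 2)
    return population[a] if fitness[a] >= fitness[b] else population[b]


def next_generation(population, fitness, rng,
                    elite_frac=0.10, mutation_rate=0.05):
    """Build a new population of the same size from an evaluated one."""
    size = len(population)
    ranked = sorted(range(size), key=lambda i: fitness[i], reverse=True)
    n_elite = max(1, round(elite_frac * size))

    # Elitism: copy the fittest individuals unchanged.
    new_pop = [list(population[i]) for i in ranked[:n_elite]]

    while len(new_pop) < size:
        p1 = tournament(population, fitness, rng)
        p2 = tournament(population, fitness, rng)
        # One-point crossover at a random cut inside the chromosome.
        cut = rng.randrange(1, len(p1))
        child = p1[:cut] + p2[cut:]
        # Uniform mutation: resample a gene from [-1, 1] with small probability.
        child = [rng.uniform(-1.0, 1.0) if rng.random() < mutation_rate else g
                 for g in child]
        new_pop.append(child)
    return new_pop
```

Calling `next_generation` once per generation, with fitness values obtained from $N_{G}$ simulation runs per individual, reproduces the overall cycle.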

2.3.3 Fitness function

The goal of the adaptive agent is to avoid being captured for as long as possible. In other words, the longer an agent can survive, the fitter it is in the evolutionary setup. Obviously, the best agent (or neural network) setup is the one that can survive through an entire simulation run without being captured.

The fitness function is determined by taking into consideration how humans would develop their behaviour in a partially unknown and unpredictable environment, which is through the interaction between innate knowledge and learning. Instead of focusing on how an agent should perform a task, such as maximising the separation distance, the fitness function is evaluated based on whether the agent succeeds in performing the task or not. As the fitness function does not explicitly show how to perform the task, the agent is able to develop some useful behaviour rather than learning a particular behaviour derived in advance by the experimenter. In other words, we would like the fitness function to be loosely defined so that the design of the fitness function will be behaviour-implicit (Nolfi & Floreano, 2000), whereby the function is rated based on the behavioural outcome of an evolutionary network and relies on a few variables and constraints only. The fitness function is therefore simply proportional to the average number of time steps an agent survived over a number of simulation runs. The fitness function is defined as follows:

$$F = \frac{1}{N_{G}} \sum_{g = 1}^{N_{G}} h_{g} \qquad (1)$$

$$h_{g} = \begin{cases} 1 & \text{if } d_{opp} > D_{\min} \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$

where $h_{g}$ refers to the frequency of cases where the adaptive agent meets the constraints $(d_{opp} > D_{\min})$ in the $g^{th}$ simulation run, with $1 \leq g \leq N_{G}$ . The frequency of meeting the constraint is calculated as F, and is accumulated for each step. Hence there is an explicit fitness pressure to survive longer in the environment.
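Read literally, Equations (1) and (2) average a survival indicator over the $N_{G}$ runs. The sketch below implements that literal reading and nothing more; evaluating $d_{opp}$ at the end of each run is an assumption, since the text also describes the constraint as being accumulated over time steps, and the function names are hypothetical.

```python
def survival_indicator(d_opp, d_min):
    """h_g from Eq. (2): 1 while the agents are separated by more than D_min."""
    return 1 if d_opp > d_min else 0


def fitness(final_distances, d_min):
    """F from Eq. (1): mean of h_g over the N_G simulation runs.

    final_distances holds one separation distance per run (assumed to be
    the distance at the moment the run ended).
    """
    n_g = len(final_distances)
    return sum(survival_indicator(d, d_min) for d in final_distances) / n_g
```

For example, an agent that is still free at the end of 5 out of 10 runs would score 0.5 under this reading.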

3 Experimental setup

The primary objective of this research is to study behaviour under varying conditions of quality of information. This is achieved by monitoring the manoeuvring strategies (behaviour) of an adaptive agent, evolved through a neuro-evolutionary process, in a simulated environment against a pre-programmed agent that follows a fixed rule-based strategy and operates under variable information-quality conditions. As explained in Section 2.2, there are two dimensions across which the information quality is varied: the accuracy and frequency of location information available to the pre-programmed agent about the adaptive agent, and the range of deceptive actions that the pre-programmed agent may produce based on its perception of the adaptive agent’s position in the environment. Parameters $N_{I}$ and ${\hat{α}}^{(t)}$ control the frequency and noise, while $N_{D}$ and $ζ^{(t)}$ control the deception generated by the pre-programmed agent.

Tables 1 and 2 show the different combinations of these parameters that we used in our experiments to simulate varying levels of information quality and deception respectively.

Table 1.

The combinations of $N_{I}$ and ${\hat{α}}^{(t)}$ , given that the deception effort from the pre-programmed agent is fixed.

Combination	$N_{I}$	${\hat{α}}^{(t)}$
1	1	0
2	1	$U (0, 20)$
3	10	0
4	10	$U (0, 20)$

Table 2.

The combinations of $N_{D}$ and $ζ^{(t)}$ , given that the information received about the adaptive agent’s intelligence is fixed.

Combination	$N_{D}$	$ζ^{(t)}$
1	1	0
2	5	$U (- 15^{\circ}, 15^{\circ})$
3	5	$U (- 30^{\circ}, 30^{\circ})$
4	10	$U (- 15^{\circ}, 15^{\circ})$
5	10	$U (- 30^{\circ}, 30^{\circ})$

The value $N_{I} = 1$ means that the pre-programmed agent receives information about the adaptive agent at each step of the simulation. In contrast, $N_{I} = 10$ indicates that the information is available to the agent only at every $10^{th}$ step. The value $10$ is used as the maximum for $N_{I}$ because, in the scenario where the adaptive agent and the pre-programmed agent move towards each other, as illustrated in Figure 7, the game terminates at the $10^{th}$ step. A uniform distribution, $U (0, 20)$ , is used to generate ${\hat{α}}^{(t)}$ , which represents the noise in sensing the position of the adaptive agent. The noise is added by generating two random numbers in the given distribution range and adding them to the actual x- and y-coordinates. The range of $20$ is used to satisfy the $D_{\min}$ constraint required to trigger a capture event (see Section 2.1).

Figure 7.

Selection of maximum values for $N_{I}$ .

Similarly, the first combination in Table 2 represents the scenario where the pre-programmed agent always moves in the direction it expects the other agent to be, ${\hat{P}}_{ir}^{(t)}$ . The other combinations represent scenarios in which the pre-programmed agent deviates from its expected trajectory once after a number of time steps given by $N_{D}$ . The deviation from the trajectory depends on the value of $ζ^{(t)}$ , which is drawn from a uniform distribution. Two ranges of deception angle are selected in our setup, as given in Table 2. As illustrated in Figure 8, small ranges of deviation angles are selected to avoid side-tracking the pre-programmed agent from its intended goal.

Figure 8.

Selection of maximum values of the uniform distribution for the generation of $ζ^{(t)}$ .

Based on the four combinations given in Table 1 and the five combinations given in Table 2, 20 different experimental configurations are obtained, each representing different conditions of information quality. Since the neuro-evolutionary model for the adaptive agent is a stochastic model, experiments were repeated with 30 independent random seeds for each of the experimental configurations discussed above.
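The 20 configurations are simply the Cartesian product of the rows of Table 1 and Table 2, which can be enumerated as below. The dictionary keys and function name are hypothetical, and storing the deception half-width in radians is a convenience chosen for this sketch rather than something stated in the paper.

```python
import itertools
import math

# Rows of Table 1: (N_I, upper bound of the uniform noise U(0, .)).
NOISE_LEVELS = [(1, 0.0), (1, 20.0), (10, 0.0), (10, 20.0)]

# Rows of Table 2: (N_D, half-width in degrees of the deception range).
DECEPTION_LEVELS = [(1, 0.0), (5, 15.0), (5, 30.0), (10, 15.0), (10, 30.0)]

N_SEEDS = 30  # independent random seeds per configuration


def build_configurations():
    """Cross every noise setting with every deception setting."""
    configs = []
    for (n_i, noise_max), (n_d, dec_deg) in itertools.product(
            NOISE_LEVELS, DECEPTION_LEVELS):
        configs.append({
            "n_i": n_i,
            "noise_max": noise_max,
            "n_d": n_d,
            "deception_max_rad": math.radians(dec_deg),
        })
    return configs
```

Running every configuration with each of the 30 seeds gives 20 × 30 = 600 evolutionary runs in total.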

4 Behaviour analysis methodology

One of the key motivations of this work is to describe and simulate behaviour in silico. Research on behavioural decision theory (Einhorn & Hogarth, 1981) has shown the conditional nature of human decision-making and its dependence on the constraints under which the decisions are made. Decisions that might sound optimal under some given circumstances may not be optimal in a different environment and time. This implies that changes in the environment affect decision-making, or in other words, behaviour, and are reflected in the actions taken. In our simulation environment, constraints are represented by the environmental boundaries and the limited actions that the agents can perform in the environment. The variation in the quality of information under which agents decide their moves, on the other hand, represents changes in the environmental conditions under which decisions are made. The agent behaviour, specifically the adaptive agent behaviour, is then derived from the manoeuvring strategies or sequence of actions that the agent takes in the simulated environment in response to a pre-programmed agent operating under varying information-quality conditions. While a co-evolutionary setup could easily be adopted, behaviour analysis under fixed conditions allows us to simplify the analysis and focus on our main research question, namely the effect of different information conditions on adaptive behaviour.

Since the primary focus of this research is to study adaptive agent behaviour under different information-quality conditions, the analysis methodology should be able to compare behaviours under these different environmental conditions to show similarities or differences in the behaviour. This obviously requires defining features and measures against which the behaviour can be compared. In the following sections, we discuss the analysis techniques used to elicit the effect of different information conditions on adaptive agent behaviour. The analysis techniques are based on the features extracted from the action sequences (or behaviour) of the agents recorded during the simulation runs.

4.1 Capture-time analysis

The most obvious feature to analyse is the capture time, which refers to the number of time steps taken by the pre-programmed agent to capture the adaptive agent. This is analysed by plotting the best and average capture times (of 30 independent runs) over the number of generations for each of the experimental configurations discussed in Section 3. In particular, five graphs are plotted, each showing the results for an experimental setup with one of the five deception levels given in Table 2; and the capture times are plotted for four noise-level combinations given in Table 1. That is, each graph reflects the results of a setup where the deception level is fixed and the noise level is varied. What we expect to see is a trend in the adaptive agent’s learning behaviour over time, and variation of this trend across varying combinations of the informational conditions discussed above.

4.2 Trajectory visualisation

In trajectory visualisation, agents’ action sequences (that is, their actual movements) are plotted in the two-dimensional simulation environment. Due to limited space, trajectories from a single simulation run in the last generation, hand-picked for the interest of their evolved patterns, are plotted for each of the 20 experimental configurations. As the saying goes, a picture is worth a thousand words: these figures reveal the different movement patterns produced by agent actions.

4.3 Action sequence analysis: action frequency distribution

Due to the cognitive load of a visual analysis, which would require an analyst to view thousands of trajectories and make sense of them, other means are needed to analyse action sequences. The intent of frequency analysis of actions is to capture the essence of the type of strategy adopted by the adaptive agents overall, relative to the strategies used by the pre-programmed agent, across a number of simulation runs. This is done by conducting a statistical analysis of the relative-heading-change feature, recorded as the difference between the two agents’ travel angles at each time step. This approach can neatly summarise the manoeuvring strategies adopted by the agents. For instance, a predominantly zero heading-change strategy (strictly, a single peak on the frequency diagram) might indicate a straight-line motion, a two-spiked frequency pattern might indicate a zig-zag motion and a uniformly distributed heading change might indicate a circular motion. Other strategies will sit somewhere in between these extreme monotonous strategies.

For each of the 20 experimental configuration setups discussed in Section 3, the best individual (neural network) in the final population evolved over 200 generations is selected to perform the frequency diagram analysis. The best evolved adaptive agents play another $N_{G} = 10$ simulated runs or games, during which their manoeuvres are recorded for analysis. The relative angle between both agents, denoted $ϑ^{t}$, and its change, $Δ ϑ^{t}$, are measured at each time step $t$ after the movements of both agents are updated simultaneously. Given that each simulation run terminates at a different time $T$, we will have a vector, $\hat{ϑ}$, representing a sequence of $ϑ^{t}$, with $\hat{ϑ} = \{ϑ^{1}, ϑ^{2}, \dots, ϑ^{t}, \dots, ϑ^{T}\}$. At the same time, we can obtain a sequence of changes in $\hat{ϑ}$; this vector is denoted by $Δ \hat{ϑ} = \{Δ ϑ^{2}, Δ ϑ^{3}, \dots, Δ ϑ^{t}, \dots, Δ ϑ^{T}\}$. Here $Δ ϑ^{t}$ refers to $ϑ^{t} - ϑ^{t - 1}$, as illustrated in Figure 9. To perform statistical analysis, each single sequence of $ϑ$ and $Δ ϑ$ can be viewed as a sample. Finally, the frequency distributions are plotted across all samples and a central-tendency-based curve-fitting approach is used to summarise the overall pattern.
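The construction of $\hat{ϑ}$ and $Δ \hat{ϑ}$ can be illustrated with a minimal sketch. It assumes that each agent's travel angle is available per time step in radians and that angles are wrapped into $[-π, π)$; the wrapping rule, the function names and the bin count are illustrative assumptions, not details taken from the simulation itself.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def relative_headings(adaptive, preprogrammed):
    """theta^t: adaptive agent's travel angle minus the pre-programmed agent's."""
    return [wrap_angle(r - b) for r, b in zip(adaptive, preprogrammed)]


def heading_changes(theta):
    """Delta theta^t = theta^t - theta^(t-1), defined for t = 2..T."""
    return [wrap_angle(theta[t] - theta[t - 1]) for t in range(1, len(theta))]


def relative_frequencies(values, n_bins=36):
    """Relative frequency of values over equal-width bins covering [-pi, pi)."""
    width = 2 * math.pi / n_bins
    counts = [0] * n_bins
    for v in values:
        index = min(int((v + math.pi) / width), n_bins - 1)
        counts[index] += 1
    total = len(values)
    return [c / total for c in counts] if total else counts
```

Under these assumptions, a run in which both agents keep constant headings yields a $Δ \hat{ϑ}$ of zeros, which corresponds to the single-peak, straight-line case described in Section 4.3.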

Figure 9.

Calculation for the travel angle change, $Δ ϑ^{t}$ .

4.4 Action sequence analysis: sub-sequence similarity analysis

While the heading-change feature used in the frequency analysis provides a handle to analyse agent strategies at every time step, the action sub-sequence analysis captures the movement patterns on a larger time scale and allows the comparison of strategies or behaviour on a much higher level of abstraction. Similar to frequency analysis, the best individual from the final generation is chosen for this analysis from each of the 20 different experimental configurations. Since the analysis is carried out for 10 simulation runs with 30 different seeds, a total of $20 \times 10 \times 30 = 6000$ action sequences are recorded.

The sub-sequences are extracted by dividing the action sequences obtained from each simulation run into an equal number of windows (Figure 10 demonstrates this concept pictorially). Since each simulation run can last a different number of time steps, the number of sub-sequences $N_{s}$ is chosen based on the shortest sequence observed across all simulation runs. Sub-sequences are then extracted from each sequence with window length $ω_{i_{j}} = len (A_{i}) / N_{s}$, where $ω_{i_{j}}$, $j \in \{1, \dots, N_{s}\}$, refers to the $j^{th}$ sub-sequence of the $i^{th}$ action sequence $A_{i}$ and $len (A_{i})$ corresponds to the number of time steps in $A_{i}$.
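A minimal sketch of this windowing step follows. The text does not say how a sequence whose length is not a multiple of $N_{s}$ is handled, so the sketch assumes that window boundaries are placed at $\lfloor j \cdot len(A_{i}) / N_{s} \rfloor$, which spreads any remainder over the windows; the function name is hypothetical.

```python
def split_into_windows(actions, n_windows):
    """Divide one action sequence into n_windows contiguous sub-sequences.

    Boundaries sit at floor(j * len(actions) / n_windows), so window lengths
    differ by at most one step when the length is not an exact multiple.
    """
    n = len(actions)
    if n_windows <= 0 or n < n_windows:
        raise ValueError("need at least one action per window")
    bounds = [(j * n) // n_windows for j in range(n_windows + 1)]
    return [actions[bounds[j]:bounds[j + 1]] for j in range(n_windows)]
```

For example, a sequence of 7 actions split into 3 windows gives windows of 2, 2 and 3 actions, while two sequences of different lengths always produce the same number of windows, as in Figure 10.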

Figure 10.

Two example action sequences divided into an equal number of sub-sequences. Each action sequence has a different length (referred to as $T_{1}$ and $T_{2}$), which leads to different sub-sequence lengths (referred to as $ω_{1}$ and $ω_{2}$ respectively).

Next, two features, velocity and acceleration, are computed, corresponding to the rate of change in the position of the adaptive agents over the entire vector of sub-sequences. Formally, they are given by equations (3) and (4) respectively. Here ${\hat{P}}_{r}^{t_{1}}$ and ${\hat{P}}_{r}^{t_{2}}$ refer to the heading angles, measured relative to the horizontal axis in an anti-clockwise direction, of the adaptive agent at the start and end of a sub-sequence respectively. Notice that in our simulation setup, the speed (distance covered per time step) of both agents is constant. Hence, the velocity $V$ and acceleration $Δ V$ refer to the positional derivatives that define the direction of movement of an agent, e.g. the adaptive agent denoted as $r$, in the environment.

$$V_{r}^{t_{2}} = \frac{\| {\hat{P}}_{r}^{t_{2}} - {\hat{P}}_{r}^{t_{1}} \|^{2}}{t_{2} - t_{1}} \tag{3}$$

$$Δ V_{r}^{t_{2}} = \frac{V_{r}^{t_{2}} - V_{r}^{t_{1}}}{t_{2} - t_{1}} \tag{4}$$

Then, the relative direction movements between the adaptive agent $r$ and the pre-programmed agent b, based on velocity and acceleration, are shown in equations (5) and (6) respectively.

$$V_{r rel b}^{t_{1}} = V_{r}^{t_{1}} - V_{b}^{t_{1}} \tag{5}$$

$$Δ V_{r rel b}^{t_{1}} = Δ V_{r}^{t_{1}} - Δ V_{b}^{t_{1}} \tag{6}$$

Using this method, the actual trajectories of agents are mapped to a feature space with an $N_{s}$-dimensional velocity vector and an $(N_{s} - 1)$-dimensional acceleration vector. The behaviour of an agent using an action sequence $A_{i}$ can then be viewed in an $ℓ$-dimensional space, with $ℓ = 2 N_{s} - 1$, as $B (A_{i}) = \{V_{i}, Δ V_{i}\} = \{V_{i}^{1}, \dots, V_{i}^{N_{s}}, Δ V_{i}^{2}, \dots, Δ V_{i}^{N_{s}}\}$. The collection of action sequences is represented by a matrix $M$ of dimension $n \times ℓ$, where $n$ refers to the total number of action sequences used for analysis and $ℓ$ refers to the dimension of each action sequence.
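The mapping from windowed heading sequences to the $(2 N_{s} - 1)$-dimensional behaviour vector can be sketched as follows. This is one possible reading of equations (3) to (6), not a definitive implementation: it assumes that headings are scalar angles (so the squared norm reduces to a squared difference), that $t_{2} - t_{1}$ is the number of time steps in a window, and that the acceleration of window $j$ is divided by the length of window $j$. All function names are hypothetical.

```python
def window_velocity(headings):
    """Equation (3) for one window: squared heading change over the window length."""
    duration = len(headings)  # assumed to play the role of t2 - t1
    return (headings[-1] - headings[0]) ** 2 / duration


def accelerations(velocities, durations):
    """Equation (4): velocity change between consecutive windows,
    divided by the length of the later window (an assumption)."""
    return [(velocities[j] - velocities[j - 1]) / durations[j]
            for j in range(1, len(velocities))]


def behaviour_vector(adaptive_windows, preprogrammed_windows):
    """Relative velocity (eq. 5) followed by relative acceleration (eq. 6)."""
    durations = [len(w) for w in adaptive_windows]
    v_r = [window_velocity(w) for w in adaptive_windows]
    v_b = [window_velocity(w) for w in preprogrammed_windows]
    dv_r = accelerations(v_r, durations)
    dv_b = accelerations(v_b, durations)
    v_rel = [a - b for a, b in zip(v_r, v_b)]
    dv_rel = [a - b for a, b in zip(dv_r, dv_b)]
    return v_rel + dv_rel  # N_s velocities, then N_s - 1 accelerations
```

Stacking one such vector per action sequence gives the rows of the matrix $M$.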

Finally, the similarities between behaviours defined in terms of the feature space discussed above are analysed using the fuzzy c-means (FCM) clustering technique. Clustering is an unsupervised data-mining technique that categorises a set of instances into several groups which share higher similarities within the clusters but lower similarities between the clusters. Unlike the popular k-means method that assigns each instance to exactly one cluster, FCM allows the instances to belong to several clusters simultaneously with different degrees of membership in each cluster (Bezdek, 1981). We consider this to be a better technique for this analysis, owing to the complexity of the actions produced by the agents.
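For readers unfamiliar with FCM, the standard alternating update of centroids and memberships can be sketched in a few lines. The fuzzifier $m = 2$, the random initialisation, the stopping tolerance and the function name are assumptions made for this illustration; the paper does not report the settings it used.

```python
import random


def fuzzy_c_means(data, n_clusters, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (centroids, memberships)."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        s = sum(row)
        u.append([v / s for v in row])
    centroids = []
    for _ in range(n_iter):
        # Centroid update: mean of the data weighted by membership^m.
        centroids = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centroids.append([sum(w[i] * data[i][d] for i in range(n)) / total
                              for d in range(dim)])
        # Membership update from squared distances to each centroid.
        shift = 0.0
        for i in range(n):
            d2 = [sum((data[i][d] - c[d]) ** 2 for d in range(dim))
                  for c in centroids]
            if min(d2) == 0.0:
                # Point coincides with a centroid: full membership there.
                new = [1.0 if x == 0.0 else 0.0 for x in d2]
                s = sum(new)
                new = [v / s for v in new]
            else:
                inv = [x ** (-1.0 / (m - 1.0)) for x in d2]
                s = sum(inv)
                new = [v / s for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new, u[i])))
            u[i] = new
        if shift < tol:
            break
    return centroids, u
```

Each row of the returned membership matrix sums to one; taking the cluster with the largest membership per row gives a hard assignment when one is needed, for instance to count how many sequences fall under each strategy.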

Two fundamental issues that need to be addressed in clustering analysis are to determine the appropriate cluster size (the number of clusters) and the quality of the formed clusters. Cluster-validity analysis (Maulik & Bandyopadhyay, 2002; Vendramin, Campello, & Hruschka, 2009) is used in our work to address these two issues when clustering agent sequences.

Cluster-validity analysis is the assessment of a clustering process’s output. It usually involves a specific criterion of optimality. There are three main categories of validity indices: external assessment, internal assessment and relative test. An external assessment compares the discovered structure with respect to an external criterion that quantifies the degree of compatibility between the discovered clusters and the actual ones. For example, the use of data labels to evaluate clustering performance is an external assessment. On the other hand, an internal assessment determines intrinsically whether the discovered structure is appropriate for the data without any prior information. Lastly, a relative test compares two structures and measures the relative indices between them. In our work, cluster validity based on internal assessment is used, because our data carry no labels. Before a clustering process is performed, cluster validity is used to determine the appropriate value from a range of candidate cluster sizes. To perform the analysis, four well-known validity indices, namely the Silhouette index (Rousseeuw, 1987), the Davies–Bouldin index (Davies & Bouldin, 1979), the Calinski–Harabasz index (Caliński & Harabasz, 1974) and the Dunn index (Dunn, 1974), are selected for the assessment because they evaluate how well the discovered structure fits the data in terms of properties such as cluster separation and compactness. Since there is no ‘gold standard’ for selecting cluster-validity indices, we use all four indices as an ensemble to determine an appropriate cluster size.
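As an illustration of an internal validity index, the sketch below computes the Davies–Bouldin index, one of the four indices named above, for a hard partition. It assumes that fuzzy memberships have already been converted to hard labels (for example by taking the largest membership) and that Euclidean distance is used; the function name is hypothetical. Lower values indicate more compact, better separated clusters.

```python
import math


def davies_bouldin(data, labels):
    """Davies-Bouldin index of a hard partition (lower is better)."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    dim = len(data[0])
    centroids, scatter = {}, {}
    for k in clusters:
        members = [x for x, lab in zip(data, labels) if lab == k]
        c = [sum(x[d] for x in members) / len(members) for d in range(dim)]
        centroids[k] = c
        # Within-cluster scatter: mean distance of members to their centroid.
        scatter[k] = sum(math.dist(x, c) for x in members) / len(members)
    total = 0.0
    for i in clusters:
        # Worst-case similarity of cluster i to any other cluster.
        # Assumes distinct centroids; coincident centroids would divide by zero.
        total += max((scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
                     for j in clusters if j != i)
    return total / len(clusters)
```

Evaluating such an index for each candidate number of clusters, and comparing the resulting curves across several indices, is the kind of ensemble decision described above.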

In a nutshell, the sub-sequence clustering analysis consists of dividing action sequences across different simulation runs into equal numbers of sub-sequences, computing velocity and acceleration feature vectors for each of these sub-sequences, and measuring the similarity between behaviours (action sequences) using an FCM unsupervised clustering technique.

5 Results and analysis

In this section we show and discuss the results of the experiments run under the various experimental configurations discussed in Section 3. For ease of discussion, Table 3 gives labels to the different noise and deception levels used in the experimental configurations. The results are arranged in the subsections below according to the four types of analysis techniques (see Section 4) that are used to assimilate the findings from the experimental results.

Table 3.

Labelling of noise and deception levels for a descriptive narrative.

Definition	Parameter	Value	Description
Noise	Intel frequency	$N_{I} = 10$	Infrequent
		$N_{I} = 1$	Frequent
	Intel noise	$ζ = 0$	Accurate
		$ζ = U (0, 20)$	Noisy
Deception	Deception frequency	$N_{D} = 10$	Infrequent
		$N_{D} = 5$	Frequent
	Deception degree	$ζ^{(t)} = U (- 15^{\circ}, 15^{\circ})$	Low
		$ζ^{(t)} = U (- 30^{\circ}, 30^{\circ})$	High
	Combination	$N_{D} = 1$, $ζ^{(t)} = 0$	No deception

5.1 Capture/survival time

Referring to the capture-time analysis discussed in Section 4.1, Figure 11 shows the average capture times (averaged over 30 runs) for different combinations of noise levels in the information sensed by the pre-programmed agent. Each sub-figure refers to an experimental setup with a different combination of deception frequency and deception degree, with one curve per noise-level combination.

Figure 11.

The plots of average capture/survival time. Each sub-figure shows the average capture time over the number of generations for one of the combinations of $N_{D}$ and $ζ^{(t)}$ . For each level of deception, capture/survival time graphs are plotted for varying noise levels controlled by $N_{I}$ and ${\hat{α}}^{(t)}$ .

With the deception frequency set to low and high respectively, the rows formed by Figures 11(a)–11(c) and Figures 11(d)–11(f) show the average capture time as the degree of deception increases. These graphs can be read from two perspectives: from the adaptive agent’s perspective the plots refer to survival time, the average number of time steps the agent survived without being captured by the pre-programmed agent; from the pre-programmed agent’s perspective they refer to capture time, the average number of time steps taken to capture the adaptive agent. Figure 12 depicts a zoomed-in version of the last 50 generations in Figure 11.

Figure 12.

Zoom-in of Figure 11 for the last 50 generations.

A few interesting trends can be observed from these figures. First, an increasing average survival time can be observed across all setups, indicating that evolution is working and the adaptive agent improves its learning over time. Second, in accordance with intuition, the adaptive agent has shorter average survival times when the pre-programmed agent receives information more frequently. Third, the level of noise in the received information seems to have less impact on capture times than the frequency of received information. Not surprisingly, the worst survival times or best capture times are observed when the infrequent reception is compounded with higher noise level. Fourth, and the most interesting finding, is discovered by comparing the capture/survival times under infrequent and noisy information levels across different deception levels. A close look at the comparison shows that the difference between average capture/survival times under the two noise levels decreases with increasing deception levels when the frequency of receiving information is high. In contrast, the difference under the two noise levels increases with increasing deception levels when the frequency of receiving information is low.

Another interesting observation is the column-wise comparison between sub-figures (i.e. Figures 11(c)–11(f) and Figures 12(b)–12(e)). By having the same degree of deception, the difference between average capture/survival times under the two noise levels decreases with increasing deception frequencies when information is received infrequently.

This implies that a higher degree of deception has a counterbalancing effect on noise level when information is frequently received. In a situation of receiving infrequent information, a higher frequency of deception is a better option to counterbalance the effect of noise. In other words, the pre-programmed agent is better off with higher deception when it senses higher noise in information.

5.2 Trajectory visualisation

Referring to the discussion in Section 4.2, Figure 13 shows trajectories across different experimental configurations for the same starting conditions (the same seeded runs). The quality of information received by the pre-programmed agent deteriorates from left to right, while its adopted deception level increases from top to bottom. In each sub-figure, the solid dots show the locations of both agents in the environment at the start of the simulation run, and the crosses indicate the positions of the agents at the completion of the run. Based on Figure 13, the agent manoeuvring patterns are less complicated and follow only simple curved and straight-line trajectories when the pre-programmed agent is situated in the following conditions:

The deception of the pre-programmed agent is fixed and the quality of information is varied (left to right);

The quality of information is fixed and the level of deception adopted by the pre-programmed agent is varied (top to bottom).

Figure 13.

Trajectories shown by the pre-programmed and adaptive agents for different combinations of information and deception.

As the combinations of information and deception change from top left to bottom right, the trajectories show more complicated and interesting patterns, with most taking circular and spiral shapes. It is also interesting to observe that the areas travelled by the agents are smaller under such conditions. This means that the adaptive agent, using neuro-evolution, is able to produce more creative strategies as the distractions in information and deception increase, and it does not necessarily travel far in order to survive.

5.3 Action frequency distribution

Referring to the discussion in Section 4.3, Figures 14 and 15 show the distributions of the adaptive agent’s heading ($ϑ$) and its change ($Δ ϑ$) relative to the heading of the pre-programmed agent, respectively. For each of the 20 experimental configurations we have a vector of 300 samples ($ϑ$ and $Δ ϑ$). The figures show the frequency distributions of the two angular features, measured in radians over $[- π, π]$. For ease of comparison, curve fitting based on maximum-likelihood estimation is used to approximate the frequency distribution for each measure in each experimental scenario by a probability density function (pdf). The pdfs are generated by estimating the parameters of a normal mixture distribution (mixing ratio, mean and standard deviation) based on a 95% confidence interval. In each sub-figure, the quality of information is fixed but the deception level is varied.
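A maximum-likelihood fit of a normal mixture is commonly obtained with the expectation-maximisation (EM) algorithm, and the sketch below shows that procedure for one-dimensional data. It is an illustrative assumption about how such a fit could be produced, not a description of the fitting tool actually used: it ignores the wrap-around of angular data, it does not compute confidence intervals, and the caller-supplied initial means and the function names are hypothetical.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def fit_normal_mixture(samples, init_means, n_iter=200, min_std=1e-3):
    """EM estimates of (mixing ratios, means, standard deviations)."""
    k = len(init_means)
    n = len(samples)
    means = list(init_means)
    centre = sum(samples) / n
    spread = math.sqrt(sum((x - centre) ** 2 for x in samples) / n) or 1.0
    stds = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in samples:
            p = [w * normal_pdf(x, mu, s)
                 for w, mu, s in zip(weights, means, stds)]
            total = sum(p) or 1e-300
            resp.append([v / total for v in p])
        # M-step: re-estimate the parameters from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # component has no support; leave it unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, samples)) / nj
            var = sum(r[j] * (x - means[j]) ** 2
                      for r, x in zip(resp, samples)) / nj
            stds[j] = max(math.sqrt(var), min_std)
    return weights, means, stds
```

The fitted pdf at a point $x$ is then the sum over components of the mixing ratio times the component density, which is the kind of curve overlaid on the frequency distributions in Figures 14 and 15.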

Figure 14.

The probability plots of ϑ for different experimental configurations based on varying levels of input noise and deception.

Figure 15.

The probability plots of $Δ ϑ$ for different experimental configurations based on varying levels of input noise and deception.

It can be seen that the ϑ distributions across all experimental configurations are multi-modal, each consisting of four modes. The obvious observation from the sub-figures is that the adaptive agent actions follow similar distributions for a specific noise-level combination across different levels of deception. When the received information quality for the pre-programmed agent changes from frequent–accurate to infrequent–noisy, we can observe changes in the heights of the four peaks in the distributions, and these patterns are consistent across different levels of deception. Figure 14(a) associated with frequent–accurate information shows the highest peaks in the distribution. This implies that the adaptive agent adopts certain action strategies more commonly when the pre-programmed agent receives frequent and accurate information on its position. As frequently received information changes from accurate to noisy, there is a marginal reduction in the peaks’ heights overall as shown in Figure 14(b). The lowest peaks are observed in Figure 14(c), implying a uniformly distributed action strategy by adaptive agents when the pre-programmed agent receives accurate information infrequently. However, the introduction of noise into infrequent information as shown in Figure 14(d) encourages the adaptive agent to prefer specific actions.

When paired comparisons are made between frequent and infrequent information for accurate and noisy information respectively (i.e. Figures 14(a)–14(c) and Figures 14(b)–14(d)), we can see that the differences in the peaks’ heights are clearer for accurate information than for noisy information. The adaptive agent’s preference for certain actions drops much more when the frequency of received information reduces under accurate information than it does under noisy information. In other words, one of the key results of these experiments is that the frequency of received information generally has a higher impact in constructing a preference towards specific actions than the noise level of the information does.

On the other hand, analysing the pdfs of $Δ ϑ$ (shown in Figure 15) reveals that all pdf plots are uni-modal and highly concentrated near zero, regardless of information quality and deception level. This implies that an adaptive agent evolves highly robust strategies against a set of strategies adopted by the pre-programmed agent under different information-quality conditions and deception levels.

In summary, the findings from the action distributions show that the adaptive-agent behaviours resulting from neuro-evolution are more influenced by information quality than by deception. Even though the pre-programmed agent tries to confuse the adaptive agent by being deceptive, the adaptive agent is able to discover the dominant patterns in its movements, and thus use similar strategies to survive. However, the quality of the information influences the adaptive agent’s actions. The patterns shown in the distributions of $ϑ$ suggest that the adaptive agent’s preference for actions at certain values reduces in the following order: frequent–accurate, frequent–noisy, infrequent–noisy and infrequent–accurate. When the frequency of information is high, the preferences of the adaptive agent towards certain actions are less affected by the noise. However, if the information received by the pre-programmed agent is delayed, introducing noise into that information causes the adaptive agent to prefer certain actions more than it does when the information is accurate.

5.4 Sub-sequence similarity analysis

Recall from our discussion in Section 4.4 that the number of sub-sequences in each action sequence is chosen based on the smallest sequence size. Based on the 6000 samples of action sequences recorded in our experiments, the number of sub-sequences chosen in our analysis is 10. This means that our total feature vector size is 19, where the first 10 values refer to the velocity and the next nine values refer to the acceleration features, computed from the 10 sub-sequences in each sample. The total data set size for clustering analysis is then $6000 \times 19$. Recall also that the similarity analysis requires choosing the number of clusters to appropriately find similar patterns in the adaptive agent’s behaviour across a range of information conditions. The cluster-validity method based on internal criteria was chosen to evaluate the appropriate number of clusters (between two and ten) in our analysis. Figure 16 shows the plots of cluster validity for the set of action sequences based on the proposed indices. It can be seen that the highest validity across all four cluster-validity indices occurs with two clusters. Therefore, the number of clusters chosen in our analysis for both the velocity and the acceleration parameters is two.

Figure 16.

Cluster-validity indices for the action sub-sequence data set.

Table 4 shows the 19 feature values for two cluster centroids obtained through the FCM clustering method applied on the data set built from the velocity/acceleration features of sub-sequences. Figure 17 shows a plot of the two centroid values for both velocity and acceleration features.

Table 4.

Cluster centroids that represent the actions of the adaptive agent relative to the pre-programmed agent.

Feature	Window	Centroid 1 / Strategy 1	Centroid 2 / Strategy 2
$V_{r rel b}$	$ω_{1}$	9.7350	5.1761
	$ω_{2}$	10.6030	5.6861
	$ω_{3}$	10.7830	5.6966
	$ω_{4}$	10.4380	5.6434
	$ω_{5}$	10.1430	5.4967
	$ω_{6}$	10.0850	4.7920
	$ω_{7}$	10.0440	3.6245
	$ω_{8}$	10.0020	2.7445
	$ω_{9}$	10.0510	2.4442
	$ω_{10}$	10.0690	2.4856
$Δ V_{r rel b}$	$ω_{2}$	0.1062	0.0542
	$ω_{3}$	0.0369	0.0029
	$ω_{4}$	−0.0158	−0.0045
	$ω_{5}$	−0.0191	−0.0145
	$ω_{6}$	0.0064	−0.0682
	$ω_{7}$	0.0071	−0.1147
	$ω_{8}$	0.0137	−0.0832
	$ω_{9}$	0.0067	−0.0256
	$ω_{10}$	0.0078	0.0110

Figure 17.

$V_{r rel b}$ and $Δ V_{r rel b}$ for the action sequences.

While this analysis clearly clustered the action sequences in our sample into two strategies, a t-test is carried out with a 0.05 significance level to further evaluate the null hypothesis that the samples in both strategies have equal means, against the alternative hypothesis that the means are unequal. The results of the t-test are shown in Table 5. The mean scores for strategies 1 and 2 are denoted by $μ_{s 1}$ and $μ_{s 2}$ respectively.

Table 5.

Results of the t-tests on the mean scores of strategies 1 and 2.

Hypotheses	Reject $H_{0}$	p-value
$H_{0}$ : $μ_{s 1} = μ_{s 2}$ , $H_{1}$ : $μ_{s 1} \neq μ_{s 2}$	Yes	0.00
$H_{0}$ : $μ_{s 1} \geq μ_{s 2}$ , $H_{1}$ : $μ_{s 1} < μ_{s 2}$	Yes	0.00

Based on Table 5, the results show that there is enough evidence to reject the null hypothesis. Therefore, we can claim that the scores for the two strategies are significantly different. The scores for both strategies, based on mean and standard deviation, are shown in Table 6. Since the mean score of strategy 2 is higher than that of strategy 1, we wish to test the null hypothesis that the score samples from strategy 1 have a greater or equal mean compared with strategy 2, against the alternative that strategy 1’s mean is less than strategy 2’s mean. Again, a t-test is carried out with a 0.05 significance level to test the hypothesis, and the significance test shows that the null hypothesis is rejected. The result of the t-test supports the conclusion that the scores for strategy 1 are significantly lower than those for strategy 2. However, the scores for both strategies are still high, with both means exceeding 90%.

Since the patterns in the actions taken by the pre-programmed agent are influenced by the received information and its own deception, the evolution is also influenced by the effects of information and deception, given that it learns from the patterns in the pre-programmed agent’s actions. The best evolved solutions can be categorised into two main groups, where the differences between these two strategies lie in the mean and standard deviation of their scores. Strategy 1 has a lower mean and a higher standard deviation, while strategy 2 is associated with a higher mean and a lower standard deviation. Besides that, the results in Table 6 show that the frequency of strategy 2 (66.6%) is higher than that of strategy 1 (33.4%). This means evolution is likely to generate strategy 2, which is associated with higher scores.

Table 6.

Scores and frequencies for strategies 1 and 2.

Strategy	Score, $Mean \pm Stdev$	Frequency
Strategy 1	93.2±20.5	2001 (33.4%)
Strategy 2	97.3±9.8	3999 (66.6%)
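The test statistic can be recomputed from the summary statistics in Table 6 alone. The sketch below uses Welch's unequal-variance t-test, which is an assumption: the text does not state which variant of the t-test was applied, and Welch's form is chosen here only because the two standard deviations differ markedly. The function name is hypothetical.

```python
import math


def welch_t(mean1, std1, n1, mean2, std2, n2):
    """Welch's t statistic and approximate degrees of freedom
    computed from summary statistics of two independent samples."""
    se2_1 = std1 ** 2 / n1
    se2_2 = std2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(se2_1 + se2_2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_1 + se2_2) ** 2 / (se2_1 ** 2 / (n1 - 1) + se2_2 ** 2 / (n2 - 1))
    return t, df


# Summary statistics taken from Table 6 (strategy 1 versus strategy 2).
t_stat, dof = welch_t(93.2, 20.5, 2001, 97.3, 9.8, 3999)
```

With these inputs the statistic is approximately $-8.5$ on more than two thousand degrees of freedom, a magnitude in line with the rejections reported in Table 5 for both the two-sided and the one-sided hypotheses.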

Most of the strategies generated by the evolution are good and they share high similarity. A possible reason to explain the high similarity among the strategies is that evolution is able to extract the hidden patterns, even though they may be affected by information and deception, and thus produces optimum solutions. For example, the pre-programmed agent’s movements may become less obvious to the adaptive agent due to distraction in the pre-programmed agent’s information and deception. Due to the interaction between the pre-programmed agent and adaptive agent, the evolution of the adaptive agent uncovers the true intention of the pre-programmed agent and produces optimum solutions for the situation. Given that the true intention of the pre-programmed agent is to catch the adaptive agent, similar strategies are generated by the evolution, preventing the adaptive agent from being caught. This means the evolution manages to capture the dynamics between both agents and produces strategies that are mostly still capable of selecting the correct response, enabling the adaptive agent to survive for longer.

6 Conclusion

This paper presents an agent-based model to study adaptive behaviour under varying conditions of information quality. Information quality is varied in two dimensions: the noise in and frequency of the input signal, and purposeful deviation in output actions. Experiments were conducted to study the combined effect of these two information dimensions on the behaviour of an adaptive agent, operating against a controlled agent whose goal is to capture the adaptive agent. The analysis methodology included inspecting the survival time of the adaptive agent, the visual analysis of different trajectories exhibited by agent actions, and comparison of action sequences, under different informational conditions.

The analysis of capture/survival time suggests that a higher degree of deception counterbalances the effect of noise when information is received frequently. On the other hand, a higher frequency of deception has a counterbalancing effect on noise when information is received infrequently. In other words, the use of higher deception, which can be either the degree or the frequency of deception, as a potential solution to counterbalance the effect of noise depends on the frequency of receiving information.

The action distribution analysis suggests that the agent behaviour is influenced more by the quality of information than the level of deception. It seems that the characteristics of the task, such as information, can evoke strategies that partially determine the preferences of action in the adaptive agent.

Despite different configurations of information and deception, evolution is able to learn the dominant patterns of movement adopted by the pre-programmed agent and produce the best evolved solutions, where the actions generated by the solutions experience minimal changes. In other words, the strategies produced by the best evolved solutions are robust across the fixed strategies under different information conditions.

Analysis of action similarity reveals that the adaptive agent’s strategies across different combinations of information and deception can be categorised into two main groups. Even though both strategies achieve high fitness scores, one of them has a higher mean and lower standard deviation in terms of scores than the other does. Furthermore, the frequency of the better strategy is higher as well. This means neuro-evolution is likely to generate strategies with high performance.

Our next step in this research is to replace the artificial agents with human subjects in a similar environment, and analyse their behaviour using the methodology presented in this work. A comparison between the strategies taken by the human subjects, and the artificial agents studied in this paper, would lead us to determine how successfully human behaviour can be replicated by artificial agents. The same result can also be used to further refine our agent models to closely match human behaviour.

Footnotes

Funding

The first author would like to thank the Sultan Idris University of Education (UPSI) and Ministry of Higher Education, Malaysia, for providing the scholarship to carry out the PhD study at the University of New South Wales.

Author biographies

Shir Li Wang received a BSc in 2002 and an MSc in 2007, both from the University of Science Malaysia. She completed a PhD at the University of New South Wales, Canberra Campus, Australia in 2012. Her research interests focus on neural networks, evolutionary computation, clustering, data mining and adversarial learning.

Kamran Shafi holds a PhD in computer science, an MSc in telecoms engineering and a BSc in electrical engineering. His research focus is on the development of computational intelligence techniques that can be applied at various stages of data-centric predictive modelling in order to provide effective solutions to real-world decision problems in diverse domains including national defence, logistics and computer security. In this context, he has contributed to several disciplines including genetic-based machine learning, game theory and optimization. His PhD thesis ‘An online and adaptive signature-based approach for intrusion detection using learning classifier systems’ received the Stephen Fester Award for the most outstanding thesis on an information technology topic by a postgraduate research student in the school of ITEE at UNSW Canberra. His other major research achievements in the field of LCS research include the development of an LCS-based scenario mining approach in the context of the free-flight air traffic control concept and the development of an LCS-based multi-objective hyper-heuristic framework for the defence logistics problem. He has been a program committee member for GECCO and IEEE CEC since 2005. He was the publicity chair for the 2012 World Congress on Computational Intelligence (WCCI 2012).

Chris Lokan received a BSc in 1980 and a PhD in computer science in 1985, both from the Australian National University. He is a senior lecturer at UNSW Canberra. His teaching concentrates on software engineering. His main research interests are empirical software engineering, software effort and cost estimation, software benchmarking, complex adaptive systems, and data mining. He is a member of the ACM, the IEEE Computer Society, and the Australian Software Metrics Association.

Hussein Abbass is a professor of information technology at the University of New South Wales, Canberra Campus, Australia. He is a fellow of the Operational Research Society (UK), a fellow of the Australian Computer Society, and a senior member of the IEEE. He has had more than 200 refereed articles published. According to Microsoft Academics, he is in the top 0.3% most cited academics worldwide in artificial intelligence in the last 10 years. According to Google Scholar, he has more than 2800 citations, an h-index of 26 and an i10-index of 67. He is an associate editor of the IEEE Transactions on Evolutionary Computation, the IEEE Computational Intelligence Magazine, the American Institute of Mathematical Sciences Journal of Industrial and Management Optimization, and three other journals. His work integrates cognitive science, operations research and artificial intelligence. He was the general chair of the 2012 IEEE World Congress on Computational Intelligence, held in Brisbane in June 2012, which is the premier and largest event of the IEEE Computational Intelligence Society.

References

Abbass, H. A. (2009). Computational red teaming and cyber challenges. Platform Technologies Research Institute Annual Symposium, PTRI 09, 14–15 July 2009. (Invited Speech.)

Abbass, H. A., Bender, Gaidow, & Whitbread (2011). Computational red teaming: Past, present and future. IEEE Computational Intelligence Magazine, 6(1), 30–42.

Abbass, H. A., Bender, & Whitbread (2010a). Computational red teaming. In Defence operations research symposium (DORS). Adelaide, Australia.

Abbass, H. A., Bender, & Whitbread (2010b). Computational red teaming: Unravelling the pharaoh’s curse. In Computational Intelligence in Security and Defence Applications Workshop. Barcelona, Spain.

Bargh, Chen, & Burrows (1996). Automaticity of social behavior: Direct effects of trait construct and stereotype activation on action. Journal of Personality and Social Psychology, 71(2), 230.

Barreno, Bartlett, P. L., Chi, F. J., Joseph, A. D., Nelson, Rubinstein, B. I. P., … Tygar, J. D. (2008). Open problems in the security of learning. In Proceedings of the 1st ACM workshop on AISec (pp. 19–26). New York, NY: ACM.

Barreno, Nelson, Joseph, A. D., & Tygar, J. D. (2010). The security of machine learning. Machine Learning, 81(2), 121–148.

Barreno, Nelson, Sears, Joseph, A. D., & Tygar, J. D. (2006). Can machine learning be secure? In Proceedings of the 2006 ACM symposium on information, computer and communications security (ASIACCS) (pp. 16–25). New York, NY: ACM Press.

Beer & Gallagher (1992). Evolving dynamical neural networks for adaptive behavior. Adaptive Behavior, 1(1), 91–122.

Bezdek, J. C. (1981). Pattern recognition with fuzzy objective function algorithms. Norwell, MA: Kluwer.

Caliński & Harabasz (1974). A dendrite method for cluster analysis. Communications in Statistics – Theory and Methods, 3(1), 1–27.

Chaturvedi, Gupta, Mehta, & Yue (2000). Agent-based simulation approach to information warfare in the SEAS environment. In Proceedings of the 33rd annual Hawaii international conference on system sciences (pp. 1–10).

Dalvi, Domingos, Mausam, Sanghai, & Verma (2004). Adversarial classification. In Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 99–108). New York, NY: ACM.

Davies, D. L., & Bouldin, D. W. (1979). A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1(2), 224–227.

Dunn, J. C. (1974). Well-separated clusters and optimal fuzzy partitions. Journal of Cybernetics, 4(1), 95–104.

Einhorn, H. J., & Hogarth, R. M. (1981). Behavioral decision theory: Processes of judgement and choice. Annual Review of Psychology, 32(1), 53–88.

Floreano & Mondada (1994). Automatic creation of an autonomous agent: Genetic evolution of a neural-network driven robot. From Animals to Animats, 3, 421–430.

Gomez & Miikkulainen (1997). Incremental evolution of complex general behavior. Adaptive Behavior, 5(3–4), 317–342.

Harvey, Di Paolo, Wood, Quinn, & Tuci (2005). Evolutionary robotics: A new scientific tool for studying cognition. Artificial Life, 11(1–2), 79–98.

Jorgensen, Zhou, & Inge (2008). A multiple instance learning strategy for combating good word attacks on spam filters. Journal of Machine Learning Research, 9, 1115–1146.

Lauder (2009). Red dawn: The emergence of a red teaming capability in the Canadian forces. Canadian Army Journal, 12(2), 25–36.

Lima (2002). Putting predators back into behavioral predator–prey interactions. Trends in Ecology & Evolution, 17(2), 70–75.

Lowd & Meek (2005). Adversarial learning. In Proceedings of the eleventh ACM SIGKDD international conference on knowledge discovery in data mining (pp. 641–647). New York, NY: ACM.

Maes (1990). Situated agents can have goals. Robotics and Autonomous Systems, 6(1–2), 49–70.

Maulik & Bandyopadhyay (2002). Performance evaluation of clustering algorithms and validity indices. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(12), 1650–1654.

Nelson, Barreno, Jack Chi, Joseph, A. D., Rubinstein, B. I. P., Saini, … Xia (2009). Misleading a learner: Coopting your spam filter. In Machine learning in cyber trust (pp. 17–51). New York, NY: Springer.

Nelson & Joseph, A. D. (2006). Bounding an attack’s complexity for a simple learning model. In Proceedings of the first workshop on tackling computer system problems with machine learning techniques (SysML) (pp. 1–5).

Newsome, Karp, & Song (2006). Paragraph: Thwarting signature learning by training maliciously. In Recent advances in intrusion detection (pp. 81–105). Berlin/Heidelberg, Germany: Springer.

Nolfi & Floreano (2000). Evolutionary robotics: The biology, intelligence, and technology of self-organizing machines. Cambridge, MA: The MIT Press.

Nolfi, Parisi, & Elman, J. L. (1994). Learning and evolution in neural networks. Adaptive Behavior, 3(1), 5–28.

Riley & Veloso (2000). On behavior classification in adversarial environments. Distributed autonomous robotic systems, 4, 371–380.

Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65.

Simon, H. A. (1955). A behavioral model of rational choice. The Quarterly Journal of Economics, 69(1), 99–118.

Vallacher & Wegner (1987). What do people think they’re doing? Action identification and human behavior. Psychological Review, 94(1), 3.

Veloso & Meira Jr. (2006). Lazy associative classification for content-based spam detection. In Proceedings of the fourth Latin American web congress (pp. 154–161).

Vendramin, Campello, R. J. G. B., & Hruschka, E. R. (2009). On the comparison of relative clustering validity criteria. In Proceedings of the SIAM international conference on data mining (pp. 733–744). Sparks, NV.

Yang, Abbass, H. A., & Sarker (2006). Characterizing warfare in red teaming. IEEE Transactions on Systems, Man, and Cybernetics – Part B: Cybernetics, 36(2), 268–285.