Multi-objective mapping of full-mission simulators on heterogeneous distributed multi-processor systems

Abstract

Full-mission simulators (FMSs) are considered the most critical simulation tool belonging to the flight simulator family. FMSs include a faithful reproduction of fighter aircraft. They are used by armed forces for design, training, and investigation purposes. Due to the criticality of their timing constraints and the high computation cost of the whole simulation, FMSs need to run in a high-performance computing system. Heterogeneous distributed systems are among the leading computing platforms and can guarantee a significant increase in performance by providing a large number of powerful parallel execution resources. One of the most persistent challenges raised by these platforms is the difficulty of finding an optimal mapping of n tasks on m processing elements. The mapping problem is considered a variant of the quadratic assignment problem, in which an exhaustive search cannot be performed. It is NP-hard, so solving it requires the use of meta-heuristics, and it becomes more challenging when one has to optimize more than one objective with respect to the timing constraints. Multi-objective evolutionary algorithms have proven their efficiency when tackling this problem. Most of the existing works deal with task mapping by considering either a single objective or homogeneous architectures. Therefore, the main contribution of this paper is a framework based on the model-driven design paradigm allowing us to map a set of intercommunicating real-time tasks making up the FMS model onto the heterogeneous distributed multi-processor system model. We propose a multi-objective approach based on the well-known optimization algorithm “Non-dominated Sorting Genetic Algorithm-II” satisfying the tight timing constraints of the simulation and minimizing makespan, communication cost, and memory consumption simultaneously.

Keywords

Full-mission simulators, real-time systems, heterogeneous computing architectures, mapping problem, schedulability analysis, multi-objective optimization, local search algorithm, meta-heuristics

1. Introduction

A heterogeneous distributed multi-processor system (hereafter named “HDMS”) is a complex yet powerful architecture that refers to a collection of autonomous interconnected multiprocessor machines with various capabilities.¹ This type of architecture was developed to meet the high computation and timing requirements of real-time systems and has been widely used in industry, covering different domains such as aerospace, automotive, and avionics. Among the most critical systems that need to be executed on large distributed computing systems are full-mission simulators (FMSs). An FMS is a complex simulator allowing the crew to practice regular flight operations, fire weapons, and familiarize themselves with the cockpit in normal and extreme situations by faithfully reproducing the aircraft and the mission environment in which it will operate. Moreover, many FMSs are able to communicate with each other in distributed mission operations. The simulation provides an environment allowing military forces to train as they fight. For further details regarding FMSs, please refer to Section 2.

Exploiting the full potential of HDMS for improving FMS performance is a current challenge in aerospace and defense fields. The FMS software is developed by different teams with different specific backgrounds (e.g., mechanics, hydraulics, electronics). Therefore, efficient FMS execution on an HDMS requires a global view of the FMS and deep knowledge of the execution platform. In this context, we propose a new approach for efficiently improving the FMS performance using the HDMS. This approach is based mainly on a model-based paradigm. Starting from FMS and HDMS models, we apply our optimization algorithm for mapping the different real-time tasks making up the FMS on the components of the HDMS. The FMS and the HDMS models represent the basis for collaboration between the different teams involved in the FMS design, while the optimization algorithm, which takes the FMS and HDMS models as inputs, facilitates integration.

The problem of finding an optimal mapping of n tasks on m heterogeneous processing elements is NP-hard.² For example, if we assume that our aim is to map an application composed of 16 tasks on a quad-core architecture, the solution space that will be explored to find the optimal mapping schema contains $4^{16} = 4\,294\,967\,296$ possible solutions. Thus, if evaluating one solution takes $0.1$ s, evaluating the overall solution space takes more than 13 years. Therefore, the solution space defined as the set of possible mapping schemes is likely to be very large, and checking whether each candidate satisfies the problem’s constraints in addition to evaluating its fitness is unrealistic. While this exhaustive search technique is easy to implement and always leads to the optimal solution, its cost is prohibitive. Thus, the complexity involved in finding the optimal solution has been the main reason to accept good or near-optimal solutions, that is, trading accuracy for a faster exploration. These trade-offs have motivated the use of several meta-heuristics to tackle this issue. Based on the “no-free-lunch theorem,”³ previous works, and a set of experiments conducted internally, evolutionary algorithms proved their efficiency in solving our problem. However, this problem has not been widely studied in the case of HDMS compared to homogeneous systems. Taking the real-time constraints of FMSs into consideration makes it more complex.
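The arithmetic behind this estimate can be checked with a short sketch. The function name and the 365-day year are illustrative assumptions, not part of the original study.

```python
# Size of the exhaustive search space for n tasks on m processors is m ** n.
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: a 365-day year


def exhaustive_search_years(n_tasks: int, n_procs: int, secs_per_eval: float) -> float:
    """Years needed to evaluate every possible mapping exactly once."""
    return (n_procs ** n_tasks) * secs_per_eval / SECONDS_PER_YEAR


candidates = 4 ** 16                            # 16 tasks, 4 cores
years = exhaustive_search_years(16, 4, 0.1)     # 0.1 s per evaluation
print(candidates, round(years, 1))
```

Adding a single task multiplies the search space by m, which is why exhaustive enumeration stops being an option very quickly.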

In this work, by analyzing existing FMSs, we extract a series of key performance metrics. The solutions are rated according to three metrics: makespan, communication cost, and memory consumption. As these metrics might be conflicting, suitable trade-offs are considered. To the best of our knowledge, there is no other work in the literature that addresses all the following research axes simultaneously: heterogeneity of the target architecture, dependency among the tasks modeling the system, the respect of tight timing constraints, and multi-objective optimization. Accordingly, the main contribution of this paper is that we are providing an efficient framework for mapping real-time tasks implementing the FMS among the available components of the HDMS with respect to the hard real-time constraints and minimizing at the same time makespan, communication cost, and memory consumption.

The remainder of this paper is organized as follows. In Section 2 we briefly review some basic concepts. Section 3 discusses related work. In Section 4, we formally model the FMS in the context of our problem. In Section 5 we detail our implementation and we illustrate it by a set of experiments described in Section 6. Finally, Section 7 presents concluding remarks and lists our future work.

2. Background information

This section provides a broad overview of the basic concepts applied in this work. Thus, the reader who is familiar with such concepts may skip to the next section.

2.1. Full-mission simulators

In the fast growing aviation industry, military aircraft contribute substantially. Military training relies heavily on hands-on experience with the training engines, whether aircraft, tanks, ships, or submarines. One could easily imagine the high cost of training new recruits on these different devices. The National Training Systems Association (NTSA) reports that for the fiscal year of 2000, flying an F-16 fighter cost an estimated $5,000 per hour. That aside, the dangerous environment and difficulty of training might lead to human casualties as well as infrastructure damage that will cost billions of dollars. To mitigate these issues, simulators have long been employed in civil industry and especially the military to reduce cost and casualties. Simulators are nowadays used for various types of military training, such as training Air Force pilots to fly fighter aircraft and Apaches and Navy officers to navigate ships and submarines. Even better, the trainees have easily adapted to the use of simulators as a training tool due to the fact that they come from a generation acquainted with simulations such as video games and virtual reality. These simulators reproduce the sounds, motion, and visual scenes, and represent all the other instruments and systems to create a realistic training environment. This eased the integration of simulators, which are extremely cost effective and provide a safe environment. In the NTSA report, for training on an Apache helicopter, the cost is estimated at $3,101 per hour, whereas on a simulator the cost plummets to $70 per hour.⁴ FMSs are devices that artificially re-create aircraft flight and the external environment. These simulators are extremely complex, as they must replicate the equations that govern aerodynamics, flight controls, weapon load-out, and how the aircraft reacts to environmental factors such as air density, turbulence, wind shear, precipitation, etc.

2.2. Real-time systems

Systems are referred to as real-time when their correct behavior depends not only on the proper functioning of the operations they perform, but also on the time at which they are performed by respecting the system’s deadline.⁵ Therefore, in real-time applications, the timing requirements are the main constraints and controlling them is the predominant factor for assessing the quality of service.⁶

We distinguish two classes of real-time timing constraints: hard and soft, depending on the criticality of timing constraints. We consider a real-time system as hard when timing faults may cause some human or economic disaster. A real-time system is considered as soft when timing faults cause only some performance degradation. In terms of modeling, we use the term “tasks” to refer to the basic executable entities. These tasks could be periodic or aperiodic.

Regardless of the tasks’ behavior, it is important to state that some of them, once elected for execution, may be interrupted to allocate the processor to another one. Due to such behavior, they are called preemptive. On the other hand, when an elected task cannot be interrupted before the end of its execution, it is called non-preemptive.

2.3. Multi-objective optimization

Problems with more than one objective have the distinction of being much more difficult to treat than their mono-objective equivalent. The difficulty lies in the absence of a ranking criterion to compare solutions. A solution may be better than another for some objectives and worse for others.⁷ The solutions found by a multi-objective optimization approach have to be optimal with respect to distinct objectives, which are typically conflicting. It is not possible to find an optimal solution that satisfies all objectives, but rather a pool of efficient solutions characterized by the fact that their cost cannot be improved in one dimension without being worsened in another, as depicted in Figure 1. That is why the concept of an optimal solution becomes less relevant in multi-objective optimization. These solutions form the Pareto optimal front, named after the economist Vilfredo Pareto.⁸

Figure 1.

Example of a Pareto frontier. The pentagons represent feasible solutions. Triangles are not on the Pareto Frontier because they are dominated by pentagons. Points A and B are not strictly dominated by any other, and hence do lie on the frontier.

Mathematically, the multi-objective optimization problem is defined as follows: $Ω =$ decision space; $Ψ =$ solution space; $X = (x_{1}, x_{2}, \dots, x_{n}) \in Ω$ ; X is a vector of n decision variables; $F = (f_{1}, f_{2}, \dots, f_{m})$ ; m is the number of objective functions; and:

$$Y = F(X) = (f_{1}(X), f_{2}(X), \dots, f_{m}(X)) = (y_{1}, y_{2}, \dots, y_{m}) \in Ψ$$

(1)

In the literature, many approaches have been developed to address this problem and could be classified into two main categories as follows.

2.3.1. Scalar or weight-based approach

The weight-based approach consists of formulating a single-objective optimization problem such that its optimal solutions are optimal solutions to the multi-objective optimization problem. This technique is one of the oldest techniques in multi-objective optimization using heuristics such as genetic algorithms (GAs)⁹⁻¹¹ and simulated annealing.¹² Since setting a weight vector leads to a single point, the optimization process has to be performed with different weight vectors to find different solutions with various trade-offs. This produces an extensive computational cost, and the decision maker has to set the most suitable weight combinations to reproduce a representative part of the Pareto solutions. Furthermore, a main technical shortcoming of this approach is that the points lying on non-convex regions of the Pareto front are unreachable.
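The non-convexity limitation can be illustrated with a minimal sketch. The three cost vectors below are hypothetical and chosen so that all of them are mutually non-dominated while point C lies on a non-convex part of the front; the helper names are illustrative as well.

```python
def weighted_sum(objectives, weights):
    """Collapse an objective vector into one scalar (minimization)."""
    return sum(w * f for w, f in zip(weights, objectives))


# Hypothetical two-objective costs; none of these points dominates another.
points = {"A": (1.0, 10.0), "B": (10.0, 1.0), "C": (6.0, 6.0)}


def best_for(weights):
    """Name of the point with the smallest weighted sum for these weights."""
    return min(points, key=lambda name: weighted_sum(points[name], weights))


# Sweep the weight of the first objective from 0 to 1 in steps of 0.01.
winners = {best_for((k / 100, 1 - k / 100)) for k in range(101)}
print(sorted(winners))
```

In this sweep, C is never selected even though it is a legitimate trade-off, because for every weight vector either A or B has a smaller weighted sum.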

2.3.2. Pareto approach

The Pareto approach directly uses the concept of dominance in solution generation. Therefore, the Pareto optimum gives not a single solution but rather a set of solutions called non-dominated solutions. The main advantage of these approaches is the simultaneous optimization of conflicting objectives. Non-dominated Sorting Genetic Algorithm (NSGA)¹³, NSGA-II¹⁴, SPEA¹⁵ and SPEA-II¹⁶ are among the best-known multi-objective algorithms based on this technique.

3. Related Work

It is now more than a quarter of a century since researchers started publishing papers on distributing computation across the execution resources of parallel architectures. This section is a sampling of related literature and is not meant to be exhaustive. It covers the different levels of parallelism, the system heterogeneity, the optimization objectives, and whether the methodology considers real-time constraints.

Braun et al.¹⁷ presented a comparison among static heuristics for mapping applications onto heterogeneous distributed computing systems aiming to minimize the makespan. Through this comparison, which included GAs, tabu search, and simulated annealing among others, the experimental results showed that GAs consistently gave the best results for the parameters and implementation used in this exhaustive study. Nevertheless, this work is limited to independent tasks and a single objective. Owing to the relevance of biologically inspired methodologies for scheduling, Tumeo et al.¹⁸ provided an ant colony optimization for mapping and scheduling in heterogeneous multiprocessor systems. Their solution showed 64% and 55% better results compared to simulated annealing and tabu search respectively. Lu and He¹⁹ proposed a PSO-based GA hybrid algorithm to schedule a set of tasks on a heterogeneous multi-processor system to minimize the makespan taking into account the precedence constraints. The schedule obtained outperforms the GA and is within 9% of the optimal schedule. Kang and Zhang²⁰ presented a hybrid GA for static task mapping and scheduling on heterogeneous systems with satisfaction of the precedence requirements of the application.

A study of the efficiency of multi-objective evolutionary algorithms (MOEAs) in the task mapping and scheduling on heterogeneous systems is performed with two different objectives in Chitra et al.,²¹ minimizing the makespan and the average flow-time. Devi and Anju²² proposed a multi-objective scheduling algorithm using an MOEA for scheduling a collection of dependent tasks on available resources in a multi-processor environment. NSGA-II is used to get the Pareto optimal solutions to minimize at the same time the makespan and the reliability cost. Rajeswari et al.²³ presented an efficient allocation and scheduling method using multi-objective GAs for independent tasks. Such a procedure minimizes the makespan and flow-time simultaneously in a distributed computing system.

Samur and Bulkan²⁴ presented a GA approach to solve a bi-criteria problem (makespan and tardiness) in homogeneous parallel machines and tried to make their method general enough to be applied to other bi-criteria objective functions. Saranya et al.²⁵ presented a parallel multi-objective GA for the task dispatching problem in a heterogeneous distributed computing environment. It exploits the inherent parallelism of GAs on a multi-core processor to optimize the result. Navaz and Ansari²⁶ considered two objectives: execution time and total cost. They used the R-NSGA-II approach, which is based on an evolutionary computing paradigm and uses an epsilon-dominance-based MOEA. In Serafini and Dehuri,²⁷ a combinatorial multi-objective particle swarm optimization based algorithm is proposed to map a set of dependent tasks on heterogeneous systems with consideration of failure on processors and links. The obtained results outperform the NSGA-II. However, the authors did not consider the tuning of the NSGA-II parameters.

Dorronsoro et al.²⁸ investigated the problem of multi-objective mapping on Grids, optimizing simultaneously the makespan and its robustness. For this purpose, four different MOEAs were studied. These algorithms are NSGA-II, MOEA/D, IBEA and MOCell. From their experiments, the latter leads to the best results compared to the others. Miryani and Naghibzadeh²⁹ scheduled hard real-time systems on heterogeneous multi-processor systems taking into account the precedence relationship between tasks. Likewise, the authors studied the mapping problem from a multi-objective perspective aiming to minimize the completion time and the number of processors.

After this brief description, Table 1 depicts a detailed comparison of these works. In terms of the comparison’s characteristics, we identify the hardware architecture, real-time aspects, communication between tasks, and optimization strategy.

Table 1.

Comparison of related work.

Characteristics	Hardware Architecture			Real-time Aspects		Communication	Optimization
Author	Heterogeneity	Distributed	Multi-processor	Real-time	Preemption	Dependency	Multi-objective	Algorithm	Objectives
Braun et al.¹⁷	✓	✓	✓	×	×	×	×	11 heuristics	Makespan
Tumeo et al.¹⁸	✓	×	✓	×	×	✓	×	Ant colony	Makespan
Kang et al.¹⁹	✓	✓	✓	×	×	×	×	PSO-based genetic algorithm	Makespan
Kang et al.²⁰	✓	×	✓	×	×	✓	×	Hybrid genetic algorithm	Makespan
Chitra et al.²¹	✓	×	✓	×	×	✓	✓	Weight-based MOGA, MOEA	Makespan, flow-time, reliability cost
Devi et al.²²	✓	×	✓	✓	×	✓	✓	NSGA-II	Makespan, reliability cost
Rajeswari et al.²³	✓	✓	×	×	×	×	✓	Weight-based genetic algorithm	Makespan, flow-time
Samur et al.²⁴	×	✓	×	×	×	×	✓	Weight-based genetic algorithm	Makespan, tardiness
Saranya et al.²⁵	✓	✓	✓	×	×	×	✓	Parallel multi-objective genetic algorithm	Makespan, flow-time
Navas et al.²⁶	✓	✓	×	×	×	×	✓	NSGA-II	Makespan, total cost, reliability cost
Roy et al.²⁷	✓	✓	✓	×	×	✓	✓	Combinatorial multi-objective particle swarm	System cost, system reliability
Dorronsoro et al.²⁸	✓	✓	×	×	×	×	✓	NSGA-II, MOEA/D, IBEA, and MOCell	Makespan, robustness
Miriany et al.²⁹	✓	×	✓	✓	×	×	✓	MOGA	Completion time, number of processors
Our work	✓	✓	✓	✓	✓	✓	✓	NSGA-II	Makespan, communication cost, memory consumption

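PSO: particle swarm optimization; MOGA: multi-objective genetic algorithm; MOEA: multi-objective evolutionary algorithm; NSGA-II: Non-dominated Sorting Genetic Algorithm II; IBEA: indicator-based evolutionary algorithm; MOCell: multi-objective cellular genetic algorithm.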

Table 2.

GA parameters configuration.

GA Parameters	Value	GA Parameters	Value
Population size	100	Mutation probability	0.2
Number of generations	500	Selection operator	Tournament selection
Crossover operator	Single point crossover	Execution time ranges	$[0, 3]$ ms
Crossover probability	1.0	Communication cost ranges	$[0, 50]$ MB/s
Mutation operator	Uniform mutation	Memory consumption ranges	$[0, 10]$ kB

GA: genetic algorithm.

4. FMS model

An FMS is a very complex piece of software and exhibits a high level of system integration. Several system models need to communicate data to other systems to replicate functions of the aircraft. When developing military flight simulator software, the development team has to rely on the competence of experts from a multitude of domains, such as mechanics, power electronics, avionics, and more. The use of different models and their combination for the production of software is referred to as model-based design. This multi-domain modular approach to designing FMSs allows specialists to build, configure, and test their modules concurrently, regardless of other modules. After initial testing, the modules are connected and characterized accordingly to form the complete simulation software for test and validation purposes. This latter process is called system integration. Such a complex activity results in a network of sub-systems that need to be interconnected in order to deliver the platform’s intended functionality.

We aim to automate and optimize the integration process by efficiently mapping these real-time sub-systems (hereafter named tasks) implementing the FMS to the available resources of the HDMS. We formally define the graph modeling the FMS as $G = \langle T, E \rangle$, where $T = \{t_{i} \mid i \in \{1, \dots, n\}\}$ is a set of n vertices, each one representing a real-time task, and $E = \{e_{ij} \mid (i, j) \in \{1, \dots, n\} \times \{1, \dots, n\}\}$ is a set of edges that represents the dependencies among these tasks. The edges are weighted by the amount of data exchanged between each pair of connected tasks. Such graphs are commonly referred to as task graphs. We assume that there are n tasks and m processors belonging to different machines. In this paper, the precedence constraints are not considered. Therefore, from the scheduling point of view, the tasks are independent. Also, we assume that all the tasks are preemptive. Each processing unit can execute one task at a time, so that the system can process m tasks concurrently. Each task is defined as a tuple $t_{i} = \langle id_{i}, p_{i}, C_{i}, d_{i} \rangle$, where $id_{i}$ is the ID of the ith task, $p_{i}$ is the period of task $t_{i}$, $C_{i}$ is the execution time vector whose entry $c_{ij}$ is the expected execution time of task $t_{i}$ on processor $p_{j}$, and $d_{i}$ is the deadline of task $t_{i}$. Note that $p_{i}$ with a task index denotes a period, whereas $p_{j}$ with a processor index denotes a processor. In a mathematical formulation of the mapping problem, we define the function $Φ$ that assigns each of the n tasks to one of the m processors:

$$\begin{array}{l} Φ : \{t_{1}, \dots, t_{n}\} \to \{p_{1}, \dots, p_{m}\} \\ \forall i \in [1, n], \ \exists j \in [1, m] \ \text{s.t.} \ Φ(t_{i}) = p_{j} \end{array}$$

(2)

We start by defining the tasks’ expected execution time on the HDMS. This information is defined by specifying an $(n \times m)$ matrix named the execution time matrix, where $c_{ij}$ is the expected execution time of task i on processor j. This model expresses the execution heterogeneity among the HDMS resources. The elements along a row indicate the execution times of a specific task on the different processing units, and those along a column indicate the expected execution time of all tasks on a single node. We also need to define the amount of data exchanged between each pair of tasks. This information is defined in a square $(n \times n)$ matrix that we call the communication cost matrix (CCM), where $CCM(i, j)$ is the weight of the edge connecting $t_{i}$ to $t_{j}$ and is defined as $e_{ij}$. This matrix is the weighted adjacency matrix of the task graph. The CCM is stored as an upper triangular matrix since the amount of data exchanged between tasks $t_{i}$ and $t_{j}$ is the same as the amount of data exchanged between tasks $t_{j}$ and $t_{i}$. The main diagonal entries are equal to zero. Another important feature to consider in our model is the memory consumption of each task. This information is defined in a memory consumption vector with n elements, where each term $m_{i}$ describes the amount of memory needed to execute task $t_{i}$. For the scheduling strategy, we use the Rate Monotonic algorithm.³⁰ This is a fixed-priority algorithm that assigns static priorities to tasks on the basis of their periods: the shorter the period $p_{i}$, the higher the priority $Π_{i}$ of the task:

$$p_{i} \leq p_{j} \Leftrightarrow Π_{i} \geq Π_{j}$$

(3)

Liu and Layland³⁰ proved that a feasible schedule that always meets deadlines exists if the processor utilization U is below a specific bound. For each task set $Θ_{j}$ composed of k tasks assigned to processor j, this test has to succeed, and it has to succeed on all the processors of the target architecture to obtain a valid mapping solution. Thus, a sufficient condition for the schedulability of $Θ_{j}$ based on the processor utilization bound is defined by:

$$U = \sum_{i = 1}^{k} \frac{c_{ij}}{p_{i}} \leq k \left( 2^{\frac{1}{k}} - 1 \right)$$

(4)
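As a minimal sketch of the utilization test in Equation (4), the check for one processor could look as follows. The function names and the representation of a task as an (execution time, period) pair are illustrative assumptions.

```python
def rm_utilization_bound(k: int) -> float:
    """Liu and Layland bound k * (2^(1/k) - 1) for k tasks under Rate Monotonic."""
    return k * (2 ** (1 / k) - 1)


def is_schedulable(tasks) -> bool:
    """Utilization test for the tasks assigned to one processor.

    tasks: list of (c, p) pairs, where c is the execution time of the task on
    this processor and p is its period (same time unit for both).
    """
    if not tasks:
        return True  # assumption: an idle processor is trivially schedulable
    utilization = sum(c / p for c, p in tasks)
    return utilization <= rm_utilization_bound(len(tasks))


print(is_schedulable([(1, 4), (2, 8)]))  # utilization 0.5
print(is_schedulable([(3, 4), (2, 8)]))  # utilization 1.0
```

A mapping would be accepted only if this check succeeds for the task set of every processor. Because the bound is sufficient but not necessary, some task sets that fail it may still be schedulable.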

5. Proposed solution

To achieve a valid mapping solution respecting all the timing constraints and minimizing the defined fitness functions, our solution uses an approach involving two phases. Firstly, we assign the tasks implementing the FMS onto the available resources of the target architecture depending on the objective functions. Then, we perform the schedulability test of the assigned tasks to each processor of the execution platform to ensure the correctness of the solution from a scheduling point of view. The system model and the mapping approach are depicted in Figure 2.

Figure 2.

Mapping strategy of the FMS on the target platform.

Our optimization approach is implemented on top of the NSGA-II taking into account schedulability. The studied fitness functions of our implementation are set out in more detail below. We are using a closed-form expression for the three objectives to avoid the timing cost raised by evaluation using simulation.

Makespan (overall execution time) is the total length of the schedule when all the tasks on each processor finish their processing. For a processor j, it is the time interval between the start of the first task and the completion time of the last task of $Θ_{j}$. In this work, the makespan fitness function is defined as:

$$f_{1} = \min \left( \max_{1 \leq j \leq m} \sum_{i \in Θ_{j}} c_{ij} \right)$$

(5)

In Figure 3 an example of scheduling eight tasks on two processors is provided. The makespan is equal to 7.

Figure 3.

Example of scheduling eight tasks on two processors.
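The inner part of Equation (5), that is, the makespan of one candidate mapping, can be sketched as below. The data are hypothetical and do not reproduce Figure 3; processors are indexed from 0 here, whereas the paper numbers them from 1.

```python
def makespan(mapping, exec_time, n_procs):
    """Largest per-processor sum of execution times for one mapping.

    mapping[i]      : processor index (0-based) assigned to task i
    exec_time[i][j] : expected execution time c_ij of task i on processor j
    """
    load = [0.0] * n_procs
    for task, proc in enumerate(mapping):
        load[proc] += exec_time[task][proc]
    return max(load)


# Hypothetical 4-task, 2-processor execution time matrix.
exec_time = [[2, 3], [1, 2], [4, 1], [3, 3]]
print(makespan([0, 0, 1, 1], exec_time, 2))
```

The optimizer then looks for the mapping that minimizes this value.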

Communication cost. One of the most important sources of overhead in the computation time in HDMS arises from data transmission between processing units in the network, and reducing it represents the second fitness function of this study. A mathematical formulation of this fitness function is given below:

$$f_{2} = \min \left( \sum_{i = 1}^{n} \sum_{j = 1}^{n} e_{ij} \right), \quad i \in Θ_{α}, \ j \in Θ_{β}, \ α \neq β$$

(6)
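A minimal sketch of the quantity minimized in Equation (6) is given below. It assumes that the CCM is stored as an upper triangular matrix, as described in Section 4, so only the pairs with i < j are visited; the example matrix is hypothetical.

```python
def communication_cost(mapping, ccm):
    """Total data exchanged between tasks placed on different processors.

    mapping[i] : processor index assigned to task i
    ccm[i][j]  : data exchanged between tasks i and j (upper triangular)
    """
    n = len(mapping)
    return sum(
        ccm[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if mapping[i] != mapping[j]  # same-processor pairs cost nothing
    )


# Hypothetical 3-task communication cost matrix.
ccm = [[0, 5, 2],
       [0, 0, 7],
       [0, 0, 0]]
print(communication_cost([0, 0, 1], ccm))
```

Placing heavily communicating tasks on the same processor lowers this cost but tends to raise the makespan, which is one source of conflict between the objectives.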

Memory consumption. The third objective is minimizing the total amount of memory consumption per processor. In other words, we will distribute the memory consumption between processing elements. Seen from this angle, the problem is considered as a variant of the bin packing problem³¹ in which the bins refer to the amount of memory required to execute each task. The memory consumption fitness function is defined as:

$$f_{3} = \min \left( \max_{1 \leq j \leq m} \sum_{i \in Θ_{j}} m_{i} \right)$$

(7)
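Equation (7) has the same min-max shape as the makespan, applied to the memory consumption vector. A short sketch with hypothetical values:

```python
def peak_memory(mapping, mem, n_procs):
    """Largest per-processor memory demand for one mapping.

    mapping[i] : processor index (0-based) assigned to task i
    mem[i]     : memory m_i needed to execute task i
    """
    used = [0] * n_procs
    for task, proc in enumerate(mapping):
        used[proc] += mem[task]
    return max(used)


# Hypothetical memory consumption vector for four tasks.
mem = [4, 1, 3, 2]
print(peak_memory([0, 1, 0, 1], mem, 2))
```

Minimizing the largest per-processor demand spreads the memory consumption across the processing elements.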

In the bio-inspired GA, a given population represents a set of solutions to the problem and new generations are created through genetic operators. According to the theory of evolution, only the strongest individuals of the population are likely to survive and generate offspring, transmitting their biological inheritance to the next generation. NSGA-II differs from GAs not only in the fact that it addresses multi-objective optimization problems but also in the selection operation. It also gives the non-dominated solutions belonging to or near the Pareto front in one single run. Before selecting a number of individuals to apply the genetic operators on, the population is ranked on the basis of the non-dominance and crowding distance concepts. Below, we give more details about how we implement and adapt the NSGA-II to our problem to minimize the fitness functions and deliver valid solutions from a scheduling perspective.

The pseudo-code of the proposed solution is given in Algorithm 1.

Algorithm 1: Schedulability-based NSGA-II.

Initialize problem parameters;

Initialize genetic algorithm parameters;

Generate initial population with valid individuals;

Evaluate individuals of the initial population;

Non-dominated Sorting of the initial population;

gen = 1;

while gen ≤ Number of Generations do

  Apply Tournament Selection to create the Mating Pool;

  j = 1;

  while j ≤ Size of Intermediate population do

    Select two parents from the Mating Pool;

    Perform a crossover;

    Perform a mutation;

    Move to Valid Solution;

    Insert the Valid Solution to the Intermediate population;

    j = j + 1;

  end

  Recombination of the Old and Intermediate populations;

  Evaluate individuals of the New population;

  Non-dominated Sorting of the New population;

  gen = gen + 1;

end

Individual encoding. Each solution is represented as an array of size equal to the number of tasks, and the entry in position i indicates the processor allocated to task $t_{i}$. The genes in these chromosomes are in the range $[1, m]$. We are assuming an arbitrary topology for the physical connectivity of nodes in the distributed system. Figure 4 gives a chromosome example where a 10-task graph is mapped onto four processors.

Figure 4.

Example of chromosome (i.e., solution) encoding.

Initial population. A population is a collection of valid individuals. The initial population P consists of $| P |$ randomly generated valid solutions.
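The encoding and the generation of a valid initial population can be sketched as follows. The `is_valid` callback is a hypothetical stand-in for the per-processor schedulability test, the `max_tries` guard is an added assumption, and genes are 0-based here instead of lying in $[1, m]$.

```python
import random


def random_individual(n_tasks, n_procs, rng):
    """One chromosome: gene i is the processor index assigned to task i."""
    return [rng.randrange(n_procs) for _ in range(n_tasks)]


def initial_population(size, n_tasks, n_procs, is_valid, rng, max_tries=10_000):
    """Draw random chromosomes and keep only those accepted by is_valid."""
    population = []
    tries = 0
    while len(population) < size:
        if tries >= max_tries:
            raise RuntimeError("could not generate enough valid individuals")
        tries += 1
        candidate = random_individual(n_tasks, n_procs, rng)
        if is_valid(candidate):
            population.append(candidate)
    return population


rng = random.Random(0)
# Placeholder validity check that accepts everything, for illustration only.
population = initial_population(5, 10, 4, lambda ind: True, rng)
print(population[0])
```

Passing the random generator explicitly keeps runs reproducible, which helps when comparing parameter settings.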

Non-dominated sorting. NSGA-II uses the Pareto-based approach in its selection process and the quality of a given solution is based on its dominance indicator. Non-dominated sorting is a process in which the solutions of a given generation are assigned to different fronts using the dominance indicator of this solution compared to all the solutions of the current population. If we consider that $F_{i}$ is the front i where $rank (F_{i}) = i$ , then all the non-dominated solutions of P are assigned to front $F_{1}$ . Then, all the non-dominated solutions of $P - F_{1}$ will be assigned to $F_{2}$ and so on. We repeat this process until we assign all the solutions of P to the different fronts. Figure 5 gives an example of non-dominated sorting of a small population.

Figure 5.

Non-dominated sorting.
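The front-peeling procedure described above can be sketched directly. This is the straightforward repeated-filtering variant, not the bookkeeping-based fast non-dominated sort of NSGA-II, and the objective vectors are hypothetical; all objectives are minimized.

```python
def dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated_sort(objs):
    """Return fronts F1, F2, ... as lists of indices into objs."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        # Solutions not dominated by any other remaining solution.
        front = sorted(
            i for i in remaining
            if not any(dominates(objs[j], objs[i]) for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts


# Hypothetical two-objective values for six solutions.
objs = [(1, 5), (2, 2), (5, 1), (4, 4), (3, 3), (6, 6)]
print(non_dominated_sort(objs))
```

The position of a solution's front in the returned list is its rank, which the selection operator uses later.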

Crowding distance. Once we finish the non-dominated sorting of the current population, the crowding distance is calculated separately for the different fronts. It is defined as the distance between the two closest solutions on either side of a solution along each objective axis. This indicator provides a density estimation of solutions on each front and allows us to preserve diversity during the selection process. The crowding distance is calculated by:

$$\mathit{crowdist}_{i} = \sum_{m = 1}^{M} \frac{f_{m}^{i + 1} - f_{m}^{i - 1}}{f_{m}^{\max} - f_{m}^{\min}}$$

(8)
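A minimal sketch of Equation (8) for a single front follows. Assigning an infinite distance to the boundary solutions of each objective is the usual NSGA-II convention and is an assumption here, since the equation only covers interior solutions.

```python
import math


def crowding_distance(front):
    """Crowding distance of each objective vector in one front."""
    n = len(front)
    dist = [0.0] * n
    if n == 0:
        return dist
    for m in range(len(front[0])):
        # Indices of the front sorted along objective m.
        order = sorted(range(n), key=lambda i: front[i][m])
        f_min, f_max = front[order[0]][m], front[order[-1]][m]
        # Assumption: extreme solutions are always preserved.
        dist[order[0]] = dist[order[-1]] = math.inf
        if f_max == f_min:
            continue  # all values equal: this objective adds no spread
        for k in range(1, n - 1):
            gap = front[order[k + 1]][m] - front[order[k - 1]][m]
            dist[order[k]] += gap / (f_max - f_min)
    return dist


print(crowding_distance([(1, 5), (2, 2), (5, 1)]))
```

A larger value means the solution sits in a sparser region of its front, so preferring it helps keep the population diverse.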

Tournament selection. We are using a binary tournament selection operator based on the niched-comparison operator. Since this operator requires both the rank and crowding distance of each solution in the population, the comparison between two chromosomes randomly picked from the population to be part of the mating pool is carried out as follows:

The individual with the lower rank is selected if $rank_{i} \neq rank_{j}$.

If the two individuals belong to the same front $(rank_{i} = rank_{j})$, the individual with the greater crowding distance is selected.
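The two comparison rules can be sketched as a small selection routine. The function names and the way ranks and crowding distances are passed (as parallel lists indexed by individual) are illustrative assumptions.

```python
import random


def crowded_better(i, j, rank, crowd):
    """True if individual i beats individual j under the two rules above."""
    if rank[i] != rank[j]:
        return rank[i] < rank[j]      # lower rank (better front) wins
    return crowd[i] > crowd[j]        # same front: less crowded wins


def binary_tournament(rank, crowd, rng):
    """Pick two distinct individuals at random and return the winner's index."""
    i, j = rng.sample(range(len(rank)), 2)
    return i if crowded_better(i, j, rank, crowd) else j


# Hypothetical ranks and crowding distances for three individuals.
rank = [1, 2, 1]
crowd = [0.5, 9.0, 2.0]
rng = random.Random(0)
mating_pool = [binary_tournament(rank, crowd, rng) for _ in range(4)]
print(mating_pool)
```

With these values, individual 1 can never win a tournament because both of its possible opponents have a lower rank.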

Crossover. After selecting individuals, crossover involves crossing random couples of the mating pool to produce the individuals of the intermediate population. The child chromosome inherits the genetic material of its parents. Our operator is a single point crossover in which a point is randomly selected. The genes of the offspring are copied from one section of the first parent and the other section from the second parent, as depicted by Figure 6.

Figure 6.

Example of single point crossover operator.
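A single point crossover on the array encoding can be sketched as below. Returning both complementary children is an assumption made for illustration; the text only describes how one child is assembled from its two parents.

```python
import random


def single_point_crossover(parent1, parent2, rng):
    """Swap the tails of two equal-length chromosomes at a random cut point."""
    # The cut lies strictly inside the chromosome, so both parents contribute.
    cut = rng.randrange(1, len(parent1))
    child1 = parent1[:cut] + parent2[cut:]
    child2 = parent2[:cut] + parent1[cut:]
    return child1, child2


rng = random.Random(0)
p1 = [0, 0, 0, 0, 0, 0]
p2 = [1, 1, 1, 1, 1, 1]
c1, c2 = single_point_crossover(p1, p2, rng)
print(c1, c2)
```

The sketch requires chromosomes with at least two genes, since a cut point strictly inside a one-gene chromosome does not exist.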

Mutation. The mutation operator is applied with a much lower probability than the crossover operator. A new chromosome is created by copying a randomly picked chromosome and changing one or more of its genes, as depicted in Figure 7. For the mutation, we use a uniform operator, which adds a random value drawn from a unit rectangular (uniform) distribution to the chosen gene.

Figure 7.

Example of uniform mutation operator.
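The mutation step can be sketched as follows. This is only one possible reading of the operator: it assumes integer genes holding processor indices, a per-gene mutation probability `p_gene`, and a uniformly drawn non-zero offset wrapped modulo the number of processors so that the mutated gene stays valid. All of these names and choices are assumptions.

```python
import random
from typing import List


def uniform_mutation(
    chrom: List[int], n_proc: int, p_gene: float, rng: random.Random
) -> List[int]:
    """Return a mutated copy of chrom; the input chromosome is left untouched."""
    if n_proc < 2:
        raise ValueError("at least two processors are needed to change a gene")
    child = list(chrom)
    for k in range(len(child)):
        if rng.random() < p_gene:
            # Uniform non-zero offset, wrapped so the gene remains a valid index.
            offset = rng.randint(1, n_proc - 1)
            child[k] = (child[k] + offset) % n_proc
    return child


original = [0, 3, 1, 2, 2]
mutated_all = uniform_mutation(original, n_proc=4, p_gene=1.0, rng=random.Random(2))
mutated_none = uniform_mutation(original, n_proc=4, p_gene=0.0, rng=random.Random(2))
print(mutated_all, mutated_none)
```

With `p_gene=1.0` every gene moves to a different processor, and with `p_gene=0.0` the copy is identical to the original, which makes the two extremes easy to check.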

Movement to valid solutions. Since our work aims not only to optimize the predefined fitness functions but also to respect the timing constraints of the FMS through the schedulability test, the validity of the obtained solution is a key issue. Because our crossover and mutation operators act randomly on the input solutions, we cannot guarantee that all the processors of a new offspring will pass their schedulability tests. Therefore, we propose a local search algorithm that moves from an invalid solution to a valid one in its neighborhood by applying local changes. We assume that the target architecture is able to run the overall simulation, which ensures the existence of valid solutions in our solution space. However, to ensure that the algorithm does not fall into an infinite loop without finding any valid solution in the neighborhood, we define a maximum number of attempts. If the search algorithm fails to find a valid configuration after this predefined number of attempts, the initial invalid solution is rejected. The pseudocode of the movement towards a valid mapping configuration is given in Algorithm 2.

Algorithm 2: Movement to valid solution.

Input: Invalid Mapping Configuration

Output: Valid Mapping Configuration

    iter = 0;
    do
        i = 0;
        while i < Number of processors do
            Find the tasks mapped in the processor i;
            Test = Apply schedulability test in the processor i;
            if Test = fail then
                while Test = fail do
                    Put the last task mapped in processor i in the Queue;
                    Remove the last task from processor i;
                    Test = Apply schedulability test in the processor i;
                end
            else
                while Test = success and Queue.size ≠ 0 do
                    Add the last element of the Queue to the processor i;
                    Test = Apply schedulability test in the processor i;
                    if Test = success then
                        Remove the last task from the Queue;
                    else
                        Remove the added task from processor i;
                    end
                end
            end
            i = i + 1;
        end
        iter = iter + 1;
    while Queue.size ≠ 0 and iter < iter_max;
    if Queue.size ≠ 0 then
        Reject solution;
    end
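The repair procedure of Algorithm 2 can be sketched in executable form as follows. This is a simplified illustration under stated assumptions: a task is a pair (execution time, period), execution times do not depend on the processor, and the schedulability test is replaced by a placeholder utilization check (total utilization at most 1), which is not necessarily the test used in our framework. The names `schedulable` and `move_to_valid` are hypothetical.

```python
from typing import List, Optional, Tuple

Task = Tuple[float, float]  # (execution time, period), both in ms


def schedulable(tasks: List[Task]) -> bool:
    """Placeholder test (assumption): total processor utilization is at most 1."""
    return sum(c / t for c, t in tasks) <= 1.0


def move_to_valid(mapping: List[List[Task]], iter_max: int) -> Optional[List[List[Task]]]:
    """Return a repaired copy of mapping, or None if the solution is rejected."""
    procs = [list(p) for p in mapping]
    queue: List[Task] = []
    for _ in range(iter_max):
        for tasks in procs:
            if not schedulable(tasks):
                # Overloaded processor: move its last tasks to the queue.
                while not schedulable(tasks):
                    queue.append(tasks.pop())
            else:
                # Processor with spare capacity: try to absorb queued tasks.
                while queue:
                    tasks.append(queue[-1])
                    if schedulable(tasks):
                        queue.pop()
                    else:
                        tasks.pop()
                        break
        if not queue:
            return procs
    return None  # still unplaced tasks after iter_max passes


# Hypothetical mapping: processor 0 is overloaded, processor 1 has spare capacity.
invalid = [[(8.0, 16.0), (8.0, 16.0), (8.0, 16.0)], [(4.0, 16.0)]]
repaired = move_to_valid(invalid, iter_max=5)
rejected = move_to_valid([[(16.0, 16.0), (16.0, 16.0)]], iter_max=3)
print(repaired, rejected)
```

In the first call, one task migrates from the overloaded processor to the second one. In the second call, no processor can absorb the queued task, so the solution is rejected once the maximum number of passes is reached.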

Recombination. After applying the genetic operators to the individuals of the mating pool obtained by tournament selection, the individuals of the intermediate population are combined with the current population, and a selection is performed to set the individuals of the next generation. The fronts of the combined population are added to the next generation in increasing order of rank. If adding all the individuals of a front $F_{j}$ would exceed the population size, the individuals of $F_{j}$ are selected based on their crowding distance, in descending order, until the population of the next generation is full. At this stage, the individuals of the new population are sorted based on non-dominance and the crowding distances are updated. This process is repeated until the stopping criterion is reached.
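A minimal sketch of this survivor selection is given below, following the standard NSGA-II scheme. It assumes that the fronts and crowding distances of the combined population have already been computed; the name `next_generation` and the sample values are hypothetical.

```python
from typing import Dict, List


def next_generation(
    fronts: List[List[int]], crowd: Dict[int, float], pop_size: int
) -> List[int]:
    """Fill the next population front by front; the last front that does not
    fit entirely is truncated by descending crowding distance."""
    selected: List[int] = []
    for front in fronts:
        if len(selected) + len(front) <= pop_size:
            selected.extend(front)
        else:
            by_distance = sorted(front, key=lambda i: crowd[i], reverse=True)
            selected.extend(by_distance[: pop_size - len(selected)])
            break
    return selected


# Hypothetical combined population of seven individuals, five survivors.
fronts = [[0, 1, 2], [3, 4, 5], [6]]
crowd = {0: float("inf"), 1: 1.0, 2: float("inf"),
         3: 0.4, 4: float("inf"), 5: 1.2, 6: float("inf")}
survivors = next_generation(fronts, crowd, pop_size=5)
print(survivors)
```

Here the first front fits entirely, and the two remaining places are given to the least crowded individuals of the second front.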

6. Experimental study

This section is dedicated to the computational experiments in which the performance of our framework is evaluated. For this purpose, we developed a test protocol in which the data are randomly generated, using a procedure that produces values close to those of a real FMS. First, the execution times of the tasks implementing the FMS are randomly generated in the range $[0, 3]$ ms. Second, the amount of data exchanged between each pair of tasks is generated with a uniform distribution on $[0, 50]$ MB/s. Third, the memory consumption values are drawn from the interval $[0, 10]$ kB. Finally, the periods are generated in such a way that the fastest task runs with a period equal to 16 ms. This choice is motivated by visualization requirements, since the FMS runs at 30 fps on 60 Hz screens.
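The generation procedure above can be sketched as follows. The ranges are those stated in the text; everything else is an assumption made for illustration: uniform distributions for execution times and memory, periods drawn as small multiples of the 16 ms base period, and the names `generate_instance`, `exec_ms`, `mem_kb`, `period_ms`, and `comm`.

```python
import random
from typing import Dict, List, Tuple


def generate_instance(n_tasks: int, rng: random.Random, base_period_ms: float = 16.0) -> dict:
    """Draw a random FMS-like task set using the ranges of the test protocol."""
    exec_ms: List[float] = [rng.uniform(0.0, 3.0) for _ in range(n_tasks)]
    mem_kb: List[float] = [rng.uniform(0.0, 10.0) for _ in range(n_tasks)]
    # Assumption: periods are small multiples of the base period.
    period_ms: List[float] = [base_period_ms * rng.choice([1, 2, 4]) for _ in range(n_tasks)]
    period_ms[0] = base_period_ms  # make sure the fastest task runs at 16 ms
    # Data exchanged between each pair of tasks, uniform on [0, 50].
    comm: Dict[Tuple[int, int], float] = {
        (i, j): rng.uniform(0.0, 50.0)
        for i in range(n_tasks) for j in range(i + 1, n_tasks)
    }
    return {"exec_ms": exec_ms, "mem_kb": mem_kb, "period_ms": period_ms, "comm": comm}


instance = generate_instance(n_tasks=10, rng=random.Random(42))
print(min(instance["period_ms"]), len(instance["comm"]))
```

Passing an explicitly seeded generator makes each test set reproducible across the independent runs.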

Since NSGA-II is a variant of GAs, one of its main issues is parameter tuning, which is needed to improve the quality of the obtained solutions and to minimize the convergence time. To calibrate these discrete genetic parameters, we used a star plan, which starts from an initial configuration of the parameter values. Then, each parameter in turn sweeps a sample of its possible values while the others are held fixed at their latest values. When the performance metrics reach their best values, the parameter under tuning is fixed to its current value and we move on to the next parameter, and so on. This procedure was carried out on an FMS modeled by 100 tasks and 10 heterogeneous machines with four processors each. The genetic parameter values that yielded the best results on average are used in all our experiments.
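The star plan can be sketched as a one-parameter-at-a-time search. The example below is only illustrative: the parameter names, the candidate values, and the toy `score` function (lower is better) are hypothetical stand-ins for the actual performance metrics obtained by running the GA.

```python
from typing import Callable, Dict, List


def star_plan(
    initial: Dict[str, float],
    candidates: Dict[str, List[float]],
    score: Callable[[Dict[str, float]], float],
) -> Dict[str, float]:
    """Tune each parameter in turn while the others keep their latest values."""
    config = dict(initial)
    for name, values in candidates.items():
        best_value, best_score = config[name], score(config)
        for value in values:
            trial = {**config, name: value}
            trial_score = score(trial)
            if trial_score < best_score:
                best_value, best_score = value, trial_score
        config[name] = best_value  # freeze this parameter before moving on
    return config


def toy_score(cfg: Dict[str, float]) -> float:
    """Hypothetical stand-in for the averaged performance of a GA run."""
    return (cfg["pop_size"] - 100) ** 2 + 1000 * (cfg["p_cross"] - 0.9) ** 2


tuned = star_plan(
    initial={"pop_size": 50, "p_cross": 0.5},
    candidates={"pop_size": [50, 100, 200], "p_cross": [0.5, 0.7, 0.9]},
    score=toy_score,
)
print(tuned)
```

Such a plan requires far fewer runs than a full factorial design, at the cost of ignoring interactions between parameters.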

Throughout the experimental process, the focus is on measuring the impact of our mapping methodology on the FMS performance metrics. To quantify the sensitivity of the algorithm to the problem parameters (number of machines, number of tasks, connectivity degree, etc.), 30 independent runs are performed for each test set, so that a Gaussian approximation of the distribution of the reported averages is reasonable. We then compute the average values over these runs. Since the studied fitness functions have different scales, we apply feature scaling to create shifted and scaled versions of these values, which facilitates their comparison and analysis.
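One common form of feature scaling is min-max normalization, sketched below. Whether this exact variant matches the scaling applied to our results is an assumption; the name `min_max_scale` and the sample values are hypothetical.

```python
from typing import List


def min_max_scale(values: List[float]) -> List[float]:
    """Shift and scale the values linearly so that they lie in [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant series: nothing to spread
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical averaged makespan values for three test sets.
makespan = [10.0, 20.0, 40.0]
scaled = min_max_scale(makespan)
print(scaled)
```

After scaling, objectives measured in different units can be plotted on a common axis.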

In Figure 8, we present the sensitivity of the different fitness functions to the number of machines and to the number of tasks implementing the FMS. The study shows that increasing the number of machines of the HDMS helps reduce the makespan and the memory consumption, which is expected. For the communication cost, the algorithm tries to place tasks that communicate frequently on the same processor or machine to increase data locality. By doing so, it creates a conflict with the two previously mentioned fitness functions. Conversely, increasing the number of tasks implementing the FMS increases all the mapping metrics.

Figure 8.

Impact of varying the number of machines and the number of FMS tasks.

Another crucial parameter that differs from one FMS to another is the connectivity degree of the tasks implementing a given simulator. In the graph modeling the FMS, two tasks are connected if there is an edge between them. We define the connectivity degree as the average number of tasks directly connected to a given task.
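As a small worked example of this definition, the sketch below computes the connectivity degree of a task graph given as an undirected edge list. The name `connectivity_degree` and the sample graph are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def connectivity_degree(n_tasks: int, edges: List[Tuple[int, int]]) -> float:
    """Average number of tasks directly connected to a task."""
    neighbours: Dict[int, Set[int]] = {t: set() for t in range(n_tasks)}
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbours[u].add(v)
            neighbours[v].add(u)
    return sum(len(s) for s in neighbours.values()) / n_tasks


# Hypothetical graph: four tasks with degrees 3, 2, 2, and 1.
degree = connectivity_degree(4, [(0, 1), (0, 2), (0, 3), (1, 2)])
print(degree)
```

For an undirected graph without duplicate edges, this value equals twice the number of edges divided by the number of tasks.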

For this parameter, we studied connectivity degrees ranging from 5 to 50 for an FMS modeled by 100 tasks, in order to cover a wide range of simulator categories. From Figure 9, we note that the connectivity degree has a near-perfect linear correlation with the communication cost fitness function, with a correlation coefficient $\approx 1$. On the other hand, we notice only a minor impact on the other objectives. This slight impact is explained by the effect of this parameter on the trade-offs between the objectives.

Figure 9.

Impact of varying the connectivity degree.

In Figure 10, we show the progression of the three fitness functions during the evolution process. We notice that, as the number of generations increases, the fitness functions improve gradually while keeping a trade-off between the different metrics. Since we are using the optimal number of generations found by the tuning process, the figure does not show the steady state of the fitness functions after $Gen = 500$.

Figure 10.

Progression of the three fitness functions during the evolution process.

Finally, Figure 11 shows the non-dominated solutions obtained in a single run by NSGA-II. Each point in the three-dimensional space is a valid mapping configuration near or belonging to the Pareto front and respecting all the predefined timing constraints of the FMS. It is up to the decision makers to choose the most suitable one for their needs.

Figure 11.

Non-dominated solutions.

7. Conclusion and future work

In this paper, we proposed a new framework enabling the multi-objective mapping of the real-time tasks implementing an FMS onto HDMSs. The aim of our approach is to simultaneously minimize the makespan, the communication cost, and the memory consumption while ensuring that the hard timing constraints are met. We showed that exact algorithms cannot solve our problem, which led us to study MOEAs and to opt for NSGA-II. This paper also paves the way for future research building upon its contributions. In future work, we intend, on the one hand, to migrate to NSGA-III for more flexibility and scalability when integrating new objectives and, on the other hand, to implement a tool that selects the most representative sample of non-dominated solutions to facilitate the task of the decision maker. An in-depth study of quality indicators is needed for this purpose.

Footnotes

Funding

This work was partially supported by CAE Inc.

Author biographies

Ecole Polytechnique de Montreal, department of Computer and Software Engineering. Her research interests are in the areas of optimization, artificial intelligence and real-time systems.

Alexandra Aguiar holds a Ph.D. and a Master’s degree in Computer Science from the Pontifical Catholic University of Rio Grande do Sul (PUCRS), Brazil (2013 and 2008, respectively). From 2011 to 2014 she was an assistant professor at PUCRS, teaching undergraduate courses such as Embedded Systems and Operating Systems. She was also one of the leaders of the Hellfire Project in the Embedded Systems Group at PUCRS. Her research experience is mainly focused on multiprocessor embedded systems, real-time operating systems, and virtualization of embedded systems. Currently, she works as a postdoctoral fellow at École Polytechnique de Montréal, with a focus on mapping and scheduling of multiprocessor hard real-time systems.

Patricia Gilbert is a Software Development Lead at CAE in the Global Development Engineering department. She obtained her B. Eng. degree in Electrical Engineering from École Polytechnique de Montréal in 2003, with a Space Technologies specialization. She began working at CAE Inc. soon after as an Aircraft System Software Specialist. She later held various technical coordination roles on large-scale simulator prototype development projects. Since 2015, she has led an engineering team developing debrief and analytics technologies targeting the advancement of the flight simulation training experience.

Michel Galibois is a software architect at CAE in the Global Development Engineering department. He obtained his B. Eng. Degree in Electrical Engineering from École Polytechnique de Montréal in 1992. He has worked several years as an Avionics software engineer and as a flight training devices software integration specialist. He has now specialized in electronic interface systems and cloud infrastructures. He has been granted multiple patents for his interface systems innovations.

Jean-Pierre Rousseau, ing. (CAE Inc.), is a system and software architect for operational systems projects. He graduated from École Polytechnique de Montréal in Electrical Engineering, with a concentration in Avionics (1998). He started at CAE in 1998 as an Avionics software engineer. Since 2009, he has participated in several operational systems R&D programs for the Defence and Security division. These programs cover various subjects, from enhanced visual systems to unmanned aerial system operation to anti-submarine warfare.

Giovanni Beltrame received the MSc degree in Electrical Engineering and Computer Science from the University of Illinois, Chicago, in 2001, the Laurea degree in Computer Engineering from the Politecnico di Milano, Italy, in 2002, the MTech degree in Embedded Systems Design from CEFRIEL, Milan, in 2002, and a Ph.D. in Computer Engineering from the Politecnico di Milano in 2006. After his Ph.D., he worked as a microelectronics engineer at the European Space Agency (where he was awarded the title of ESA Research Fellow) on a number of projects ranging from radiation-tolerant systems to computer-aided design. In 2010 he moved to Montreal, Canada, where he is currently Associate Professor at Polytechnique Montreal in the Department of Computer and Software Engineering. His research interests include modeling and design of embedded systems, artificial intelligence, and robotics.

Gabriela Nicolescu is a professor at Polytechnique Montreal in the Department of Software and Computer Engineering. She obtained her B. Eng. degree in Electrical Engineering from UPB (Polytechnic University Bucharest) in 1998 and her Ph.D. degree in 2002 from INPG (Institut National Polytechnique de Grenoble), France. Her research interests are related to design methodologies, programming models, and security for advanced heterogeneous systems on chip integrating advanced technologies. She has co-authored more than 160 papers, including journal articles, conference papers, books, book chapters, and patents.
