Multi-fidelity black-box optimization for time-optimal quadrotor maneuvers

Abstract

We consider the problem of generating a time-optimal quadrotor trajectory for highly maneuverable vehicles, such as quadrotor aircraft. The problem is challenging because the optimal trajectory is located on the boundary of the set of dynamically feasible trajectories. This boundary is hard to model as it involves limitations of the entire system, including complex aerodynamic and electromechanical phenomena, in agile high-speed flight. In this work, we propose a multi-fidelity Bayesian optimization framework that models the feasibility constraints based on analytical approximation, numerical simulation, and real-world flight experiments. By combining evaluations at different fidelities, trajectory time is optimized while the number of costly flight experiments is kept to a minimum. The algorithm is thoroughly evaluated for the trajectory generation problem in two different scenarios: (1) connecting predetermined waypoints; (2) planning in obstacle-rich environments. For each scenario, we conduct both simulation and real-world flight experiments at speeds up to 11 m/s. Resulting trajectories were found to be significantly faster than those obtained through minimum-snap trajectory planning.

Keywords

Multi-fidelity BayesOpt quadrotor motion planning time-optimal trajectory

1. Introduction

In recent years, fast navigation of autonomous vehicles has received increasing interest. For several years an autonomous drone racing competition has been organized at IROS (Moon et al., 2019), and last year the AlphaPilot challenge introduced a similar competition to a more general audience (Guerra et al., 2019). Currently and in the near future, state-of-the-art algorithms in estimation and control are reaching a level of maturity, such that the bounds of what is physically possible with a given vehicle are approached. This presents the need for trajectory planning algorithms that fully exploit the capabilities of the vehicle and take into account the intricate limitations of vehicle dynamics, instead of relying on simplified models.

In this paper, we consider the problem of generating, i.e., planning, a dynamically feasible, time-optimal quadrotor trajectory. By definition, such an optimal trajectory is found at the boundary of the set of feasible trajectories. Hence, precise knowledge of the dynamic feasibility constraints is required to find the time optimum. This complicates the problem, as these feasibility constraints can become highly complex in light of high-acceleration flight and aggressive attitude changes, as required to achieve time optimality. The demanding maneuvers are affected by flight dynamics, but also by hardware and software for control and state estimation. The resulting set of non-convex feasibility constraints with memory (i.e., depending on present and past states and control inputs) cannot readily be incorporated in a typical trajectory planner for two main reasons. First, the feasibility constraints are not easily expressed in a convenient way, e.g., as constraints on an admissible set of control inputs and states. Instead, the feasibility of the trajectory must be considered in a holistic manner. Second, in most scenarios, precise modeling of these constraints is only possible through real-world experiments. As these experiments aim to seek the boundary of the feasible set, they are risky and potentially costly and should, thus, be kept to a minimum. In contrast to this objective, many trajectory optimization schemes rely on a large number of evaluations. These two issues form the main motivation for our algorithm, which uses a multi-fidelity optimization technique that can approximate the system feasibility constraints based on a limited number of experiments. It uses a Gaussian process (GP) black-box model to classify candidate trajectories as feasible or infeasible and is thereby able to plan increasingly fast trajectories as the model improves. An overview of the algorithm is shown in Figure 1.

Fig. 1.

Overview of the proposed algorithm that models dynamic feasibility constraints based on simulation and flight data to efficiently find the time-optimal trajectory.

This paper contains several contributions. First, we propose an algorithm for modeling of quadrotor feasibility constraints and generation of time-optimal trajectories based on GP classification. Second, we extend the applicability of the multi-fidelity deep Gaussian process kernel (multi-fidelity deep Gaussian process itself is a particular method) from the regression problem to the classification problem in order to obtain a multi-fidelity GP classification algorithm that can incorporate evaluations from analytical approximation, numerical simulation, and real-world flight experiments. Third, we design an acquisition process specifically tailored to experimental robotics. The acquisition function takes into account the additional cost of infeasible evaluations, as these may pose a threat to the vehicle and its surroundings. Candidate data points are generated using minimum-snap perturbations in order to maintain feasibility. Finally, we present an extensive evaluation of the proposed algorithm in both simulation and real-life flight experiments at speeds up to 11 m/s. A video of these experiments can be found in Extension 1. It is found that optimized trajectories are significantly faster than those obtained through minimum-snap trajectory planning.

An initial version of this paper was presented at Robotics: Science and Systems 2020 (Ryou et al., 2020). In the current work, we extend our algorithm to enable time-optimal trajectory planning in obstacle-rich environments based on convex polygonal decomposition. We also present additional simulation and flight experiments for a total of 12 distinct trajectories, as well as more elaborate analysis of said experiments.

The outline of the paper is as follows. In Section 2, we present the problem definition, and preliminaries on trajectory planning and Bayesian optimization. We detail our algorithm for the generation of time-optimal trajectories using iterative experiments in Section 3. In Section 4, we present experimental results from both simulation and real-world flights. Finally, we provide conclusions in Section 5.

2. Preliminaries

2.1. Quadrotor trajectory planning

We consider the problem of planning a time-optimal quadrotor trajectory. A trajectory $p : R_{\geq 0} \to R^{3} \times T$ , with $T$ the circle group, is a continuous function that maps time to position and yaw, i.e., $p (t) = {[\begin{matrix} p_{r} {(t)}^{T} & p_{ψ} (t) \end{matrix}]}^{T}$ . We define the following time-optimal planning problem:

\begin{matrix} \underset{p, T}{minimize} & T \\ subject to & p (0) = {\tilde{p}}^{start} \\ p (T) = {\tilde{p}}^{end} \\ p \in F \\ p \in P_{T} \end{matrix}

(1)

where ${\tilde{p}}^{start}$ and ${\tilde{p}}^{end}$ are the start and end points, respectively, each consisting of a prescribed position and yaw angle, i.e., $\tilde{p} = {[\begin{matrix} {\tilde{p}}_{r}^{T} & {\tilde{p}}_{ψ} \end{matrix}]}^{T} \in R^{3} \times T$ , and $F$ denotes the set of trajectories that satisfy the relevant geometric constraints, e.g., for obstacle avoidance. The function space $P_{T}$ is the set of all feasible trajectories, i.e., all trajectory functions that the quadrotor can successfully track over $[0, T]$ . Using differential flatness of the idealized quadcopter dynamics, we know that for any feasible trajectory, $p_{r} \in C^{4}$ and $p_{ψ} \in C^{2}$ over the interval $[0, T]$ (Mellinger and Kumar, 2011). However, more constraints should be incorporated when considering fast and agile trajectories. Critically, the complete system including aerodynamics, sensor and actuation hardware, and estimation and control software must be considered, instead of solely the idealized vehicle dynamics. This is necessary because all aspects of the system affect tracking performance during demanding maneuvers, e.g., as follows.

Significant aerodynamic forces and momenta act on the vehicle and need to be accounted for in control inputs. In contrast, vehicle aerodynamics can typically be neglected in low-speed flight.

State estimation errors due to delay and phase lag are exacerbated by the fast-changing vehicle state. Moreover, sensors measurements may incur additional noise due to large accelerations.

Actuation delay and bandwidth limitations, such as the mechanical time constant of the motors, inhibit tracking of fast-changing control inputs.

Battery internal resistance causes voltage to drop under large currents. Consequently, very high motor speeds can only be maintained for limited consecutive time periods.

What results is a set of non-convex feasibility constraints with memory (i.e., depending on present and past states and control inputs). These characteristics of $P_{T}$ make the planning problem hard to solve, even when using non-linear optimization techniques such as direct collocation or shooting methods. In addition, it is important to note that, in practice, no trajectory can be tracked perfectly due to stochastic sensing and actuation imperfections, so the definition of $P_{T}$ must incorporate an error bound.

Popular methods for trajectory planning avoid the complicated feasibility constraints by reformulating the optimization problem such that feasibility is the objective instead of a constraint (Mellinger and Kumar, 2011; Richter et al., 2016). In practice, this is achieved by minimizing high-order derivatives of the trajectory function. In particular, minimizing the fourth-order derivative of position, i.e., snap, and the second-order derivative of yaw results in a minimization of the required angular accelerations. This reduces trajectory agility and may thereby prevent activation of the aforementioned feasibility constraints. Before we detail the resulting minimum-snap optimization problem, we describe two formulations to incorporate geometric constraints imposed by $F$ . For certain applications, like mapping and surveillance, one may opt to define a sequence with $m - 1$ intermediate waypoints, i.e., $\tilde{p} = [\begin{matrix} {\tilde{p}}^{start} & {\tilde{p}}^{1} & \dots & {\tilde{p}}^{m - 1} & {\tilde{p}}^{end} \end{matrix}]$ , to be attained in order. The corresponding constraints take the form

p (\sum_{j = 1}^{i} x_{j}) = {\tilde{p}}^{i}, i = 1, \dots, m - 1

(2)

where $x = [\begin{matrix} x_{1} & \dots & x_{m} \end{matrix}]$ denotes the time allocation over trajectory segments, e.g., $x_{i}$ is the time allocated for the segment between waypoints ${\tilde{p}}^{i - 1}$ and ${\tilde{p}}^{i}$ . When planning trajectories through obstacle-rich environments, it may be preferable to incorporate collision avoidance constraints instead of waypoints. For instance, Tang et al. (2020) and Gao et al. (2018) utilized a series of convex boxes to describe the obstacle-free space, and Deits and Tedrake (2015) decomposed the obstacle-free space into a set of convex polytopes through semi-definite programming. A sequence of $m$ obstacle-free convex polytopes to pass through can be incorporated in the trajectory optimization as follows:

A_{i} p (t) \leq b_{i}, \forall t \in [\sum_{j = 1}^{i - 1} x_{j}, \sum_{j = 1}^{i} x_{j}], i = 1, \dots, m

(3)

where matrix $A_{i} \in R^{d_{i} \times 3}$ and vector $b_{i} \in R^{d_{i}}$ constrain the trajectory segment to be within a polytope of $d_{i}$ faces. The methods for construction and sequencing of these obstacle avoidance polytopes are described in Section 3.4.

The minimum-snap trajectory optimization problem is given by

\begin{matrix} \underset{p}{minimize} & σ (p, \sum_{i = 1}^{m} x_{i}) \\ subject to & p (0) = {\tilde{p}}^{start}, \\ p (\sum_{j = 1}^{m} x_{j}) = {\tilde{p}}^{end}, \\ p \in F, \end{matrix}

(4)

where

σ (p, T) = \int_{0}^{T} μ_{r} ‖ \frac{d^{4} p_{r}}{d t^{4}} ‖^{2} + μ_{ψ} {(\frac{d^{2} p_{ψ}}{d t^{2}})}^{2} dt

(5)

with $μ_{r}$ and $μ_{ψ}$ weighing parameters. The final constraint in (4) is given by either (2) or (3). By utilizing a piecewise polynomial function to describe the trajectory, differentiability constraints can be guaranteed by appropriately selecting the order of continuity between segments, and the optimization problem (4) can be formulated as a quadratic program (Mellinger and Kumar, 2011). In case of waypoint constraints, efficient solvers for unconstrained quadratic programming can be used (Richter et al., 2016). For convenience, we define the function that gives the minimizer trajectory of (4) for the time allocation $x$ as follows:

p = χ (x, \tilde{F})

(6)

where $\tilde{F}$ represents either $\tilde{p}$ (in case of waypoint constraints), or $({\tilde{p}}^{start}, {\tilde{p}}^{end}, A, b)$ with $A$ and $b$ containing all $A_{i}$ and $b_{i}$ , respectively (in case of polytope constraints).

Minimum-snap trajectory generation algorithms employ a two-step process based on (6). First, the minimum-snap trajectory for a (very large) initial guess of the total trajectory time is found, as follows:

\begin{matrix} \underset{x \in R_{\geq 0}^{m}}{minimize} & σ (χ (x, \tilde{F}), T) \\ subject to & T = \sum_{i = 1}^{m} x_{i} \end{matrix}

(7)

Next, the obtained time allocation is scaled down, i.e. ,

\begin{matrix} \underset{η \in R_{> 0}}{minimize} & T \\ subject to & T = \sum_{i = 1}^{m} η x_{i} \\ χ (η x, \tilde{F}) \in P_{T} \end{matrix}

(8)

The feasibility constraint is typically evaluated based on control inputs obtained from differential flatness of the idealized quadrotor dynamics (Mellinger and Kumar, 2011). Although this method can generate fast, feasible trajectories, it does not attain time-optimality for several reasons. First, the time allocation ratio obtained in (7) may not be optimal as the total trajectory time is decreased, i.e., there may exist an alternative allocation ratio that will enable lower feasible total trajectory time. Second, the method uses a simple feasibility model that fails to consider the trajectory as a whole, as is required for a realistic feasibility check.

Our proposed algorithm searches for a time-optimal trajectory by directly addressing the two major shortcomings of minimum-snap trajectory generation. It builds a realistic, probabilistic feasibility constraint model that considers the trajectory as a whole, and uses this model to find the optimal time allocation over trajectory segments.

2.2. Bayesian optimization

Bayesian optimization, or BayesOpt, is a class of algorithms that use machine learning techniques for solving optimization problems with black-box objective or constraint functions that are expensive to evaluate. Evaluation points are selected to model the unknown functions and approach the optimum with maximum efficiency, so that the total number of evaluations is kept to a minimum. Within the BayesOpt framework, GP modeling (Rasmussen and Williams, 2006) is widely used to build a surrogate model that approximates the unknown objective or constraint functions based on noisy measurements. Given data points $X = {x_{1}, \dots, x_{N}}$ with corresponding evaluations $y = {y_{1}, \dots, y_{N}}$ , GP modeling predicts the probability $P (y_{*} | y, x_{*}, X)$ for a test point $x_{*}$ . It assumes the joint distribution over the evaluations is a joint Gaussian

P (y, y_{*} | x_{*}, X) = N ([\begin{matrix} y \\ y_{*} \end{matrix}] | 0, [\begin{matrix} K (X, X) & K (X, x_{*}) \\ K (x_{*}, X) & K (x_{*}, x_{*}) \end{matrix}])

(9)

Then $P (y_{*} | y, x_{*}, X)$ can be obtained by marginalizing $y$ in (9). Each element of the covariance $K (X, X)$ is non-parametrically estimated by the corresponding data points and a covariance kernel $K_{ij} (X, X) = k (x_{i}, x_{j})$ with hyperparameters $θ$ . The hyperparameters are trained by maximizing the marginal likelihood $P (y | X)$ . As a non-parametric method, GP can flexibly model a functional distribution, even with a small number of data points. This functional approximation using GP is often called kriging or GP regression.

When modeling inequality constraints, Gaussian process classification (GPC) is often used instead of direct GP modeling. Rather than assuming a joint distribution over the evaluations, GPC assumes a joint Gaussian distribution over the evaluations and the latent variables $f = [\begin{matrix} f_{1}, \dots, f_{N} \end{matrix}]$ . The latent variables encode label probabilities for the evaluations, which can be obtained through a mapping onto the probability domain $[0, 1]$ , e.g. ,

Φ (x) = \int_{- \infty}^{x} N (s | 0, 1) ds

(10)

The latent variables and the hyperparameters of the kernel function are trained by maximizing the marginal likelihood function

\begin{matrix} P (y, f | X) & = Π_{i = n}^{N} P (y_{n} | f_{n}) P (f | X) \\ = Π_{n = 1}^{N} B (y_{n} | Φ (f_{n})) N (f | 0, K (X, X)) \end{matrix}

(11)

where $B (x)$ is the Bernoulli likelihood used to formulate $Φ (f_{n})$ as a probability distribution. The joint GP prior assumption given by (9) can now be used to estimate the distribution of $f_{*}$ , which is the latent variable corresponding to the class probability $y_{*}$ . The covariance between $X$ and a test point $x_{*}$ is modeled with the same covariance kernel, as follows

P (f, f_{*} | x_{*}, X) = N ([\begin{matrix} f \\ f_{*} \end{matrix}] | 0, [\begin{matrix} K (X, X) & K (X, x_{*}) \\ K (x_{*}, X) & K (x_{*}, x_{*}) \end{matrix}])

(12)

so that the distribution of the latent variable $f_{*}$ can be estimated as

P (f_{*} | x_{*}, X, y) = \int P (f_{*} | f, x_{*}, X) P (f | X, y) d f

(13)

The resulting class probability is obtained by

P (y_{*} | x_{*}, X, y) = \int P (y_{*} | f_{*}) P (f_{*} | x_{*}, X, y) d f_{*}

(14)

For more details on GPC and its implementation, the reader is referred to Nickisch and Rasmussen (2008).

The BayesOpt acquisition function is designed to select the next evaluation point by considering both reducing the uncertainty of the surrogate model and finding the precise optimum of the objective function. Based on the data $D$ obtained in previous evaluations, the surrogate model can be trained to approximate the unknown function. Next, the approximate optimal solution can be estimated using the trained surrogate model. Each next evaluation point is obtained by solving the following optimization problem:

x_{next} = \underset{x}{\arg \max} α (x | D)

(15)

where $α (x | D)$ is the acquisition function that represents the value of an evaluation point $x$ given the data $D$ . If the surrogate model approximates an unknown objective function, the acquisition function can for example be based on the expected improvement of the objective function (Mockus et al., 1978), the expected entropy reduction of the distribution over the location of the solution (Hernández-Lobato et al., 2014; Wang and Jegelka, 2017), or the upper bound of the optimum (Srinivas et al., 2012). If constraint functions are also modeled, the acquisition function must consider the uncertainty of both objective and constraint function models. For example, Gardner et al. (2014) and Gelbart (2015) used the product of expected objective improvement and probability of constraint satisfaction. Lindberg and Lee (2015) modified this approach to select data points more effciently utilizing the asymmetric design of the acquisition function. Hernández-Lobato et al. (2015) and Marco et al. (2019) incorporated constraint satisfaction into the estimation of the expected entropy reduction. In our proposed algorithm, Bayesian optimization is applied to model the feasibility constraints of the time-optimal trajectory planning problem. Related applications can be found in literature, where the acquisition function is designed considering the uncertainty of model feasibility functions and dynamics boundaries of control systems (Berkenkamp et al., 2016; Rai et al., 2019).

In multi-fidelity Bayesian optimization, function evaluations of different fidelities can be combined. For instance, a rough simulation or an expert’s opinion may serve as a low-fidelity model, while a high-accuracy simulation or real-world experiment serves as a high-fidelity model. The key idea is that combining cheap low-fidelity evaluations with expensive high-fidelity measurements improves overall efficiency. To incorporate information from multiple sources, the surrogate model must be modified to combine multi-fidelity evaluations. Kennedy and O’Hagan (2000) and Le Gratiet and Garnier (2014) used a linear transformation to describe the relationship between different fidelities. Suppose that we have $L$ fidelity levels, and each level is denoted by $l \in {l^{1}, l^{2}, \dots, l^{L}}$ , where $l^{1}$ is the level of the lowest-fidelity experiment and $l^{L}$ is the level of the highest-fidelity experiment. The relationship between adjacent fidelity levels $l^{j}$ and $l^{j - 1}$ can be modeled as

f_{l^{j}} (x) = ρ_{l^{j - 1}} f_{l^{j - 1}} (x) + δ_{l^{j}} (x)

(16)

where $f_{l^{j}} (x)$ and $f_{l^{j - 1}} (x)$ are the output distributions of $x$ for $l^{j}$ and for $l^{j - 1}$ , respectively. The bias distribution $δ_{l^{j}}$ is independent of $f_{l^{j - 1}}, \dots, f_{l^{1}}$ , and the constant $ρ_{l^{j - 1}}$ represents the correlation between the output distributions for the two adjacent fidelity levels. Perdikaris et al. (2017) and Cutajar et al. (2019) extended this method by including a more advanced non-linear space-dependent transformation. In addition to these auto-regressive models, other approaches for multi-fidelity optimization exist. For instance, Dribusch et al. (2010) utilized the decision boundary of a support vector machine (SVM) to reduce the search space of high-fidelity data points, and Xu et al. (2017) utilized the pairwise comparison of low-fidelity evaluations to determine the adversarial boundary of the high-fidelity model.

Similar to the surrogate model, the acquisition function has to be modified to incorporate multi-fidelity evaluations. In the multi-fidelity Bayesian optimization framework, the acquisition function not only selects the next evaluation point, but also its fidelity level, as follows:

x_{next}, l_{next} = \underset{x, l \in {l^{1}, \dots, l^{L}}}{\arg \max} α (x, l | D)

(17)

The acquisition function is modified by introducing weights based on the evaluation cost at the different fidelity levels. In practice, high-fidelity evaluations will have smaller weights compared with low-fidelity evaluations. This makes the algorithm less likely to select high-fidelity evaluations, so that the overall cost of the experiments is minimized. Common metrics used in acquisition functions are modified in this manner, e.g., weighted expected improvements (Huang et al., 2006), weighted expected entropy reductions (Takeno et al., 2019), and weighted upper bounds of the optimum (Kandasamy et al., 2016). Costabal et al. (2019) approached the multi-fidelity classification problem by differently weighting the uncertainty reduction of the classifiers at each fidelity level. As multi-fidelity Bayesian optimization has the potential to reduce the number of expensive high-fidelity experiments, it has been applied in various fields, including analog circuit design (Zhang et al., 2019), aircraft wing design (Rajnarayan, 2009), and control synthesis (Marco et al., 2017; Rai et al., 2019).

Although Bayesian optimization aims to contain the number of required function evaluations, it may still suffer from large computation cost as the number of data points increases. This is mainly due to inversion of a covariance matrix including all data points for uncertainty quantification in the surrogate function, leading to a computational cost proportional to the number of data points cubed. In particular, multi-fidelity Bayesian optimization may suffer from this issue, as the number of data points can quickly increase by adding low-fidelity evaluations. The problem is addressed by the inducing points method, which uses a set of pseudo-data points for uncertainty quantification (Hensman et al., 2013; Snelson and Ghahramani, 2006). The pseudo-data points are selected to minimize the Kullback–Leibler (KL) divergence between the function posterior given all data points (including the pseudo-data points), and the function posterior given only the pseudo-data points. As the number of pseudo-data points is much smaller than the number of actual data points, the inducing points method can greatly improve algorithm performance as the size of the dataset increases. The method was extended to the classification problem by Hensman et al. (2015), and applied to multi-fidelity Bayesian optimization by Cutajar et al. (2019). Gardner et al. (2018) further increased its efficiency with GPU acceleration.

3. Algorithm

Our proposed algorithm uses Bayesian optimization to efficiently minimize the total trajectory time $T$ by approximating the feasibility constraints of $P_{T}$ using multi-fidelity evaluations from various sources, such as simulation and real-world flight experiments. Unlike typical minimum-snap trajectory planning, the algorithm maintains the ideal planning formulation of (1) with minimum time as the objective and feasibility as a constraint. We exploit the fact that any time allocation over the trajectory segments can be mapped to a smooth trajectory that attains the given geometric constraints using (6). This enables us to transform the problem of finding the time-optimal trajectory to the problem of finding the optimal time allocation over segments. Therefore, we can formulate the following optimization problem on the finite-dimensional space $R_{\geq 0}^{m}$ that is the set of all possible time allocations:

\begin{matrix} \underset{T, x \in R_{\geq 0}^{m}}{minimize} & T \\ subject to & χ (x, \tilde{F}) \in P_{T} \\ T = \sum_{i = 1}^{m} x_{i} \end{matrix}

(18)

This formulation is based on the assumption that if there exists a feasible trajectory, i.e., a member of the set $P_{T}$ , that attains geometric constraints $\tilde{F}$ with time allocation $x$ , then $χ (x, \tilde{F}) \in P_{T}$ . This assumption is reasonable, because (4) optimizes for feasibility (by minimizing snap), so it is unlikely that its optimum is infeasible while there exists a feasible trajectory subject to the same time allocation and waypoints.

We use GPC to learn a surrogate model of the feasibility constraint in (18). The surrogate model combines results from sequential experiments at $L$ fidelity levels. An experiment $f_{l} (x)$ evaluates the feasibility of the trajectory $χ (x, \tilde{F})$ at fidelity level $l$ , resulting in a classification $y \in {feasible, infeasible}$ . Evaluations at fidelity level $l$ are gathered in dataset $D_{l} = {(x_{i}, y_{i})}_{i = 1, \dots, N_{l}}$ , where $N_{l}$ is the number of evaluations at fidelity level $l$ . The GPC probability model at fidelity level $l$ , i.e., $P_{l} (y = feasible | x)$ , is based on data points from fidelity levels $l$ and lower. The overall objective is to find the optimal time allocation that satisfies the feasibility constraint at the highest fidelity level with sufficient confidence, i.e., $P_{l^{L}} (y = feasible | x^{*}) \geq h$ .

Each evaluation is selected based on the acquisition function, which considers exploration and exploitation with the goal of maximizing the efficiency of the overall optimization process. Based on the current trained surrogate model, it estimates the expected improvements in objective function value and model accuracy. By iteratively improving the feasibility model and minimizing the objective function, our algorithm searches for the time allocation that minimizes total trajectory time. An overview of the complete algorithm is given in Algorithm 1, and its major components are detailed in ensuing sections.

Algorithm 1: Multi-fidelity black-box optimization for time-optimal trajectory planning
Input: Multi-fidelity experiments $f_{l}$ , $l = l^{1}, \dots, l^{L}$ ;
acquisition function $α$
1 Find initial solution by solving (8)
2 Initialize multi-fidelity dataset $D_{l}, \forall l \in {l^{1}, \dots, l^{L}}$
3 For $i = 1, \dots$ do
4 Build the surrogate model $M_{l}, \forall l \in {l^{1}, \dots, l^{L}}$
5 Generate candidate solutions $X$
6 $x_{i}, l_{i} \leftarrow {\arg \max}_{x \in X, l \in {l^{1}, \dots, l^{L}}} α (x, l \| D)$
7 $y_{i} \leftarrow f_{l_{i}} (x_{i})$
8 $D_{l_{i}} \leftarrow D_{l_{i}} \cup {(x_{i}, y_{i})}$
9 end
10 $T^{} \leftarrow \sum_{i = 1}^{m} x_{i}^{}$
11 $p^{} \leftarrow χ (x^{}, \tilde{F})$ Output: $T^{}$ , $p^{}$

3.1. Multi-fidelity constraint model

We define the GPC-based surrogate model $M$ as the set of latent variables $f = [\begin{matrix} f_{1}, \dots, f_{N} \end{matrix}]$ , the hyperparameters of covariance matrix $θ$ , and the hyperparameters of the inducing points $m$ and $S$ , such that $M = (f, θ, m, S)$ . The inducing points method is used to accelerate the optimization process based on the work by Hensman et al. (2015). This enables us to avoid the $O (N^{3})$ , with $N$ the number of data points, computational complexity otherwise associated with the calculation of the inverse covariance matrix, which is used for estimating the distribution of the latent variables, i.e., $P (f | X, y)$ . By introducing an additional variational distribution $q (u) = N (m, S)$ , the inducing point method approximates

P (f | X, y) ~ q (f) : = \int p (f | u) q (u) d u,

(19)

where the hyperparameters $m$ and $S$ represent the mean and covariance of the inducing points, respectively. The method uses the following variational lower bound as loss function to determine the latent variables and the hyperparameters:

L = \sum_{n = 1}^{N} E_{q (f_{n})} [\log p (y_{n} | f_{n})] - D_{KL} [q (u) ∥ p (u)]

(20)

In order to utilize the multi-fidelity optimization technique, we update the definition of the surrogate model and define the $l^{j}$ -fidelity surrogate model ( $j = 1, \dots, L$ ) as

M_{l^{j}} = {f^{l^{j}}, [θ^{l^{1}}, \dots, θ^{l^{j}}], m^{l^{j}}, S^{l^{j}}}

(21)

We use the multi-fidelity deep Gaussian process (MFDGP) by Cutajar et al. (2019) as the covariance kernel function to estimate the GP prior from multiple evaluation sources. The multi-fidelity GP has separate latent variables and inducing points $f^{l^{j}}, u^{l^{j}} ~ N (m^{l^{j}}, S^{l^{j}})$ for each fidelity $l^{j}$ . Thus, we define the surrogate model separately for each fidelity model. However, fidelity models share the hyperparameters of the covariance kernels. Specifically, the MFDGP models the relationship between adjacent fidelities with another GP as

\begin{matrix} k_{l^{j}} (x, x') = & k_{l^{j}, corr} (x, x') (σ_{l^{j}, linear}^{2} f_{l^{j - 1}} {(x)}^{T} f_{l^{j - 1}} (x') \\ + k_{l^{j}, prev} (f_{l^{j - 1}} (x), f_{l^{j - 1}} (x'))) + k_{l^{j}, bias} (x, x') \end{matrix}

(22)

where $f_{l^{j - 1}}$ is the GP estimation from the preceding fidelity level, $σ_{l^{j}, linear}$ is a constant scaling the linear covariance kernel, and $k_{l^{j}, prev}$ , $k_{l^{j}, corr}$ , and $k_{l^{j}, bias}$ represent the covariance with the preceding fidelity, the space-dependent correlation function and the bias function, respectively. Radial basis functions are used for these kernels. Their hyperparameters are trained by maximizing the following variational lower bound:

\begin{matrix} L = \sum_{j = 1}^{L} (\sum_{n = 1}^{N_{l^{j}}} E_{q_{l^{j}} (f_{n}^{l^{j}})} [\log p (y_{n}^{l^{j}} | f_{n}^{l^{j}})] \\ - D_{KL} [q_{l^{j}} (u^{l^{j}}) ∥ p (u^{l^{j}})]) \end{matrix}

(23)

where $N_{l^{j}}$ denotes the number of data points in the $l^{j}$ -fidelity dataset $D_{l^{j}}$ , and $q_{l^{j}} (f_{n}^{l^{j}}) : = \int p (f_{n}^{l^{j}} | u^{l^{j}}) q (u^{l^{j}}) d u^{l^{j}}$ , which is the same formulation as used in (20) for the hyperparameters of the $l^{j}$ -fidelity model. Note that inferencing of data points at $l^{j}$ -fidelity requires estimation of the covariance kernel at $l^{j}$ and lower fidelities (cf. (21)), whereas the $l^{j}$ -fidelity covariance kernel is trained based on the all data points at $l^{j}$ and higher fidelities. We follow the implementation of the inducing points method by Cutajar et al. (2019), but utilize the likelihood approximation by Hensman et al. (2015) to extend the multi-fidelity kernel to the classification problem. The overall GPC modeling is developed in the GPytorch framework with GPU acceleration (Gardner et al., 2018).

3.2. Acquisition function

We design the acquisition function considering both exploration to improve the surrogate model, and exploitation to find the optimal solution. In exploration, we aim to maximize the effectiveness of improving the surrogate model. Inspired by Costabal et al. (2019), we select the most uncertain sample near the decision boundary of the classifier. Since the latent function mean approaches zero at the decision boundary, this sample is found as the maximizer of

α_{explore} (x, l) = - \frac{| μ_{l} (x) |}{σ_{l} (x)} C_{l}

(24)

where $C_{l}$ reflects the cost of an evaluation at fidelity level $l$ . We note that, in practice, the feasible set is connected (feasibility of a time allocation implies feasibility of any time allocation that is element-wise larger), so that exploration on the boundary suffices. In exploitation, we utilize expected improvement with constraints (EIC) to quantify the expected effectiveness of a candidate data point based on the product of expected objective improvement and the probability of feasibility (Gardner et al., 2014). We modify EIC for use in an experimental robotics application, where the cost of evaluation depends on the outcome, as attempting infeasible experiments may pose danger to the vehicle and its surroundings. We consider not only the probability of success, but also the corresponding variance, as follows:

{\tilde{P}}_{l} (y = 1 | x) = P_{l} (μ_{l} (x) - β σ_{l} (x) \geq 0 | x)

(25)

where $β$ is a penalty on variance, and $μ_{l}$ and $σ_{l}$ are the mean and standard deviation of the posterior distribution $P (f | x, D)$ , respectively, obtained from (13). The resulting acquisition function is given by

α_{exploit} (x, l) = {\begin{matrix} α_{EI} (x) {\tilde{P}}_{l} (y = 1 | x), & if {\tilde{P}}_{l} (y = 1 | x) \geq h_{l} \\ 0, & otherwise \end{matrix}

(26)

where $h_{l}$ is based on the cost of an infeasible evaluation. As the objective function is deterministic, so is the expected improvement, i.e., $α_{EI} (x) = \sum_{i} {\bar{x}}_{i} - \sum_{i} x_{i}$ with $\bar{x}$ the current best solution.

Finally, we combine (24) and (26) to obtain

\begin{matrix} α (x, l) = \\ {\begin{matrix} α_{exploit} (x, l), & if \exists x \in X subject to α_{exploit} (x, l) > 0 \\ α_{explore} (x, l), & otherwise \end{matrix} \end{matrix}

(27)

Note that exploration only occurs if exploitation is ineffective. We found that this greedy approach works well in practice, as overly explorative searching leads to a large number of infeasible evaluations.

In case of a discontinuous acquisition function such as (27), Latin hypercube sampling (LHS) is often used to generate candidate solutions of the acquisition function (Srinivas et al., 2012). However, we observe that this method may fail to generate an optimal solution because it does not consider the correlation between adjacent time allocations. We address this by devising a sampling-based algorithm that uses a smooth perturbation of the current best solution. For this purpose, we extend the notion of goal-convergent exploration by Kalakrishnan et al. (2011). This method generates smooth trajectories by minimizing the expected sum of acceleration at each collocation point. Instead of acceleration, we minimize the expected jerk. As time allocation is inversely related to flight speed, this approximately minimizes the jerk of the speed profile, and thereby the snap of the perturbation, which is favorable for maintaining feasibility. The perturbation vector $ϵ$ is found by solving the following semi-definite program:

\begin{matrix} \underset{Σ \in R^{m \times m}}{minimize} & trace (A^{T} A Σ) (= E [ϵ A^{T} A ϵ]) \\ subject to & ϵ ~ N (0, Σ) \\ Σ ⪰ 0, Σ_{ii} = γ \end{matrix}

(28)

where $γ > 0$ scales the variance of perturbations and $A$ is the third-order finite differencing matrix. Finally, the set of $N_{c}$ candidate solutions is generated as $X = {\bar{x} ⊙ (1 + ϵ_{i})}_{i = 1, \dots, N_{c}}$ with $ϵ_{i} ~ N (0, Σ)$ and ⊙ the element-wise product. Candidates with negative elements are rejected.

3.3. Initialization

Initial data points to build the surrogate model are generated around the trajectory found by (8), which may differ between fidelity levels, as feasibility constraints differ. At fidelity levels with low evaluation cost, data point generation is done using LHS. At fidelity levels where this method imposes prohibitive evaluation cost, we use the fact that the initial trajectory is on the feasibility boundary. As such, time allocations with the same ratio but shorter total time are infeasible, while time allocations with the same ratio but larger total time are feasible. This enables generation of data points without any additional evaluation cost by scaling the initial trajectory.

The initial trajectory is also used for the normalization of data points. This is required because feasibility constraints differ between the different levels of fidelity, leading to bias in the dataset. Moreover, time allocations between trajectory segments may be at different scales of magnitude, which may make the training process numerically unstable. To resolve these issues, time allocation for each trajectory segment is scaled with the time allocation of the corresponding segment in the initial trajectory at the same fidelity level.

3.4. Geometric constraints

We consider two formulations to incorporate geometric constraints in the trajectory optimization problem: waypoint constraints and polytope constraints. In many applications, candidate trajectories are constrained to attain a pre-defined sequence of waypoints. By using a piecewise polynomial trajectory representation, the optimization can be formulated as an unconstrained quadratic program based on (4) and (2). In other scenarios, particularly when planning in obstacle-rich environments, waypoints may become an ineffective method to avoid collisions. Convex polytopes can instead be used to constrain the trajectory to the obstacle-free space without imposing waypoint restrictions. Each polynomial trajectory segment is now constrained to a polytope, leading to the constrained quadratic program (4) subject to (3). As described in Section 2, the function (6) that maps time allocation and geometric constraints to the corresponding minimum-snap trajectory can be formulated based on either waypoint constraints or polytope constraints. Hence, our proposed algorithm for multi-fidelity Bayesian optimization can be applied to either case without any modification.

In order to fully demonstrate the compatibility of our algorithm with polytope constraints, the remainder of this section details a procedure to construct such constraints. We focus on planning in two-dimensional obstacle-rich environments, and use the convex decomposition method proposed by Keil (1985) in combination with Dijkstra’s algorithm (Dijkstra, 1959). Major steps of the resulting algorithm to generate the geometric constraints are shown in Figure 2.

Fig. 2.

Algorithm to generate collision-free trajectories. (a) Initial environment with obstacles. (b) Transform into a simple polygon. (c) Decompose free space into polygons. (d) Generate the face graph. (e) Find the sequence of polygons. (f) Apply multi-fidelity optimization.

The free space of an environment, e.g., as shown in Figure 2(a), can be modeled as a polygon with inner holes. However, its decomposition into a set of convex polygons would be NP-hard (Keil, 1985). Thus, we add bridges to transform the free space into a simple polygon, i.e., a polygon without intersections or inner holes (Figure 3). To place these bridges, we model the border of the environment and the obstacles as polygons $q_{1}, \dots, q_{n}$ , and we generate a complete graph with corresponding vertices $w_{1}, \dots, w_{n}$ . Each edge in the graph is weighted according to the minimum Euclidean distance between the corresponding obstacles. If an edge is included in the minimum spanning tree, a bridge is placed on the shortest path that connects the corresponding polygons. As the bridges are added on the shortest line between the polygons, they will not intersect polygon faces or other bridges. Figure 2(b) shows the resulting simple polygon.

Fig. 3.

Polygon graph to convert free space into a simple polygon.

We obtain a convex decomposition of the resulting simple polygon using the method by Keil (1985). The pseudocode in Algorithm 2 represents the procedure, and Figure 2(c) shows the resulting set of convex polygons. Given trajectory start and end positions, a connecting sequence of polygons can be found by various methods. In some scenarios, such as drone racing on a known track, the sequence of polygons can be selected manually or based on a guiding trajectory. If this is not possible, one may find the shortest path from start to end using a planning algorithm in geometrical space and select the polygon sequence through which the resulting path passes. For our proposed algorithm, we first generate a graph connecting the center point of each polygon and its faces, as shown in Figure 2(d). The shortest path that connects the start point and end point can then be found efficiently through combinatorial planning using Dijkstra’s algorithm (Dijkstra, 1959). Finally, the sequence of polygons on the shortest path is selected to construct the geometric constraints, as shown in Figure 2(e).

Algorithm 2: Convex polygonal decomposition
Input: A simple polygon $P$ with vertices $v_{1}, \dots, v_{n}$
Output: $E_{d}$
1 Definition:
2 Notch vertex: vertex with a reflex interior angle
3 Visibility pair $(v_{i}, v_{j})$ : set of vertices $(v_{i}, v_{j})$ such that $edge (v_{i}, v_{j})$ does not intersect with other faces
4 Function: Decomp ( P ):
5 $E_{d} = \emptyset$
6 $num_edges = \infty$
7 forNotch vertex $v_{i}$ do
8 for $v_{j}$ ∈ P where $(v_{i}, v_{j})$ is visibility pair do
9 $P_{i, j}$ = subpolygon with $v_{i}, v_{i + 1}, \dots, v_{j}$
10 $P_{j, i}$ = subpolygon with $v_{j}, v_{j + 1}, \dots, v_{i}$
11 $E_{t}$ = Decomp $P_{i, j}$ ∪Decomp $P_{j, i}$
12 If $\| E_{t} \| < num_edges$ then
13 $num_edges = \| E_{t} \|$
14 $\| E_{d} \| = E_{t} \cup {edge (v_{i}, v_{j})}$
15 end
16 end
17 end
18 return $E_{d}$

Let $Q_{i} = {p_{i, 0}, n_{i, 0}, \dots, p_{i, d_{i} - 1}, n_{i, d_{i} - 1}}$ denote the $i$ th polygon, where $p_{i, j}$ and $n_{i, j}$ are the adjacent vertex and the normal vector of the $j$ th face, respectively, and $d_{i}$ is the number of faces. The corresponding polynomial trajectory segment is constrained to be within the convex polygon by the following linear constraints:

\begin{matrix} A_{i} & = {[n_{i, 0} \dots n_{i, d_{i} - 1}]}^{T} \\ b_{i} & = {[n_{i, 0}^{T} p_{i, 0} \dots n_{i, d_{i} - 1}^{T} p_{i, 0}]}^{T} \\ A_{i} & p (t) \leq b_{i}, \forall t \in [\sum_{j = 1}^{i - 1} x_{j}, \sum_{j = 1}^{i} x_{j}] \end{matrix}

(29)

For each polygonal cell, such a set of corresponding linear constraints is incorporated into (4).

3.5. Levels of fidelity

Evaluations at three different fidelity levels are used for quadrotor trajectory planning. Low-fidelity evaluations are based on differential flatness of the quadrotor dynamics, which enables us to transform a trajectory and its time derivatives from the output space, i.e., position and yaw angle with derivatives, to the state and control input space, i.e., position, velocity, attitude, angular rate, and motor speeds (Mellinger and Kumar, 2011). The resulting reference control input trajectory $u (t) = ζ (p, t)$ would enable a hypothetical perfect quadcopter to track the trajectory $p$ . Feasibility of the control input values can be evaluated at relatively small computational cost, and as such can serve as a cheap, cursory evaluation of trajectory feasibility. The set of feasible trajectories at fidelity level $l^{1}$ is defined as

P_{T}^{l^{1}} = {p | ζ (p, t) \in [\underline{u}, \bar{u}]^{4} for all t \in [0, T]},

(30)

where $\underline{u}$ is the minimum and $\bar{u}$ is the maximum motor speed.

Medium-fidelity evaluations are obtained using the open-source multicopter dynamics and inertial measurement simulation by Guerra et al. (2019) with the trajectory tracking controller by Tal and Karaman (2021). The feasible set is defined as

P_{T}^{l^{2}} = {p | {‖ p_{r} (t) - r (t) ‖}_{2} \leq \bar{r} \land | p_{ψ} (t) - ψ (t) | \leq \bar{ψ} for all t \in [0, T]}

(31)

where $\bar{r}$ is the maximum allowable Euclidean position tracking error, and $\bar{ψ}$ is the maximum allowable yaw tracking error. When planning in close proximity to obstacles, the vehicle may collide with an obstacle, even when tracking the reference trajectory within the bounds defined by (31). Therefore, we also exclude any trajectories that result in a collision from the feasible set. At this fidelity level, stochastic measurement and actuation noise, motor dynamics, and simplified aerodynamic effects are incorporated. We note that these factors are typically unfavorable, leading to the reduction of the feasible set compared with low fidelity. At the same time, the controller may be able to perform adequate tracking even if reference control inputs are infeasible, so that neither feasible set is a subset of the other. In order to account for stochastic effects, each evaluation consists of multiple simulations that all need to succeed for a trajectory to be deemed feasible.

High-fidelity evaluations are obtained from real-world experiments using a quadcopter and motion capture system. At this fidelity level, each evaluation incorporates dynamics of the full system, including actuation and sensor systems, vehicle vibrations, unsteady aerodynamics, battery performance, and estimation and control algorithms. This provides a highly accurate assessment of feasibility, but comes at great cost as it involves actual flying hardware and cannot be executed any faster than real-time. We use the same controller as in simulation, and again perform multiple flights to account for stochastic effects. The feasible set is identical to (31), and any trajectories resulting in a collision are again deemed infeasible. The overall objective of the algorithm is to find the time-optimal trajectory, subject to feasibility at the highest-fidelity model included.

4. Experimental results

We present experimental results that demonstrate the operation and performance of our proposed algorithm. First, we focus on a simple two-segment trajectory. Through visualization of the corresponding two-dimensional solution space, insights in the algorithm operation, particularly its acquisition function, are provided. Next, we present an extensive experimental evaluation of the proposed algorithm in two scenarios. For the first scenario, we apply the algorithm to obtain time-optimal trajectories based on a predetermined sequence of waypoints through an environment without any obstacles. For the second scenario, we define only start and end points and consider environments that include obstacles. We apply the proposed algorithm including polygonal decomposition to find a time-optimal trajectory that avoids the obstacles. In both scenarios, we evaluate the proposed algorithm in a simulation environment, and in a hybrid environment including simulation and real-world experiments.

4.1. Two-segment trajectory

In order to illustrate the operation of our proposed algorithm, we present results for the simple two-segment trajectory shown in Figure 4(a). The yaw rate, velocity, acceleration, and jerk are constrained to zero at the first waypoint. Results were obtained in the simulation environment, where we incorporate low-fidelity evaluations using reference control input feasibility, and medium-fidelity evaluations using the multicopter simulation. For the medium-fidelity evaluations, we set the maximum Euclidean position tracking error to 20 cm and the maximum yaw tracking error to 15°. To reflect the difference in evaluation cost between evaluations, we set the parameters of the acquisition functions (24) and (26) as $C_{l^{1}} = 1$ and $h_{l^{1}} = 0.1$ corresponding to the low-fidelity evaluations, and $C_{l^{2}} = 10$ and $h_{l^{2}} = 0.4$ corresponding to the medium-fidelity evaluations, and $β = 3.0$ .

Fig. 4.

Waypoints and feasibility maps for two-segment trajectory optimization in the simulation environment. (a) Trajectory with waypoints. (b) Initial medium-fidelity feasibility. (c) Final medium-fidelity feasibility.

As shown in Figure 4(b), the low-fidelity dataset is initialized using LHS of 400 data points, and the medium-fidelity dataset is initialized using 20 evenly scaled data points. Given the low dimension of the two-segment time allocation, third-order finite differencing as required by (28) cannot be applied. Instead, we resort to LHS for generation of candidate solutions for the acquisition function. We run the algorithm 20 times for 50 iterations using different random seeds. Each medium-fidelity evaluation along with preceding low-fidelity evaluations is considered a single iteration, and the number of low-fidelity evaluations per iteration is limited to 20. This limit is imposed to prevent the acquisition function from selecting too many successive low-fidelity evaluations, which may otherwise occur at iterations where the selected acquisition parameters do not behave well with the state of the medium-fidelity surrogate model.

Figure 4(b) and (c) show the initial and the final medium-fidelity feasibility maps for one of the runs, respectively. Note that all medium-fidelity evaluations are in proximity of the final solution, as this promising region was first found by low-fidelity evaluations. To evaluate our algorithm, we compare the total trajectory time of the best solution at each iteration with the trajectory time obtained by (8) using the same feasibility evaluation constraints. Figure 5 shows a flight time improvement of approximately 2% compared with the initial trajectory. In addition, we compare the trajectory smoothness given by (5). For this comparison, we scale the total trajectory time to the value obtained by (8), while maintaining the allocation ratio found by our algorithm. Thereby, the comparison is based solely on the difference in allocation ratio, and not affected by total trajectory time. Figure 5 shows that our algorithm selects time allocation ratios that result in increased snap and yaw acceleration at the same total trajectory time. Despite this, the corresponding trajectories are still feasible at total trajectory times smaller than the initial trajectories. This shows that our algorithm is able to find an allocation ratio that compares favorably to that found by (7) and (8), and is thereby able to find feasible trajectories with smaller total trajectory time. Indeed, we can conclude that plain snap minimization does not result in the time-optimal trajectory. Our approach based on modeling of realistic feasibility constraints converges to a quicker feasible trajectory with increased snap. Although the trajectory duration is only improved by a modest 2%, this does illustrate and validate the rationale behind our proposed multi-fidelity trajectory optimization algorithm. In Sections 4.2 and 4.3, we show that larger improvements can be obtained for more complicated multi-segment trajectories.

Fig. 5.

Mean and standard deviation of relative trajectory time and smoothness for two-segment trajectory, obtained over 20 random seeds in the simulation environment.

4.2. Waypoint trajectories

Next, we apply our algorithm to the more complicated multi-segment trajectories shown in Figure 6. For each trajectory, yaw rate, velocity, acceleration, and jerk are constrained to zero at the first and final waypoints. The trajectories range from 9 to 19 segments. Owing to the corresponding increase in dimensionality of the solution space, the minimum-jerk perturbation given by (28) can now be used to generate the candidate solution set. The scaling factor $γ$ is set to 0.2.

Fig. 6.

Multi-segment trajectories with starting point and subsequent waypoints indicated by green and red arrows, respectively.

We present results for all eight multi-segment trajectories using the simulation environment, which incorporates low-fidelity evaluations using reference control input feasibility and medium-fidelity evaluations using the multicopter simulation. In addition, we apply our algorithm to a subset of three trajectories in the hybrid environment, which incorporates medium-fidelity evaluations using the multicopter simulation and high-fidelity evaluations using real-world flight experiments. We chose not to incorporate low-fidelity evaluations in the hybrid environment, because their evaluation time is relatively very similar to medium-fidelity evaluations when compared with the time-of-flight experiments. Consequently, they provide less-valuable data points at similar cost. In both environments, we set the parameters of the acquisition functions (24) and (26) as $C_{l^{j}} = 1$ , $C_{l^{j + 1}} = 10$ , $h_{l^{j}} = 0.1$ , $h_{l^{j + 1}} = 0.4$ , and $β = 3.0$ where $l^{j}$ is the lowest fidelity in the environment. We set the maximum Euclidean position tracking error to 20 cm and the maximum yaw tracking error to 15° for all medium- and high-fidelity evaluations.

For the experiments in the simulation environment, we initialize the surrogate model using 1,000 low-fidelity samples. The maximum amount of low-fidelity evaluations per iteration is set to 50. The results in Figure 7 show that the algorithm is able to significantly reduce the trajectory time for each of the eight evaluated trajectories. The largest improvement is obtained for Trajectory 3, where the flight time is reduced by 22% on average. The optimized trajectory is feasible despite having almost four times larger snap and yaw acceleration, which shows our algorithm’s capability to find faster feasible trajectories. This is achieved by changing the time allocation ratio between trajectory segments, as shown in Figure 8. Figure 9 shows that the resulting trajectories have an increased speed on most segments, but also slow down on some segments. Speed decreases often occur during or just prior to tight turns and help maintain tracking feasibility as the speed over preceding and subsequent segments is increased. This interaction between trajectory segments is not captured by the low-fidelity control input evaluations, and demonstrates how using multi-fidelity modeling towards a holistic approach to optimization results in faster feasible trajectories.

Fig. 7.

Mean and standard deviation of relative trajectory time and smoothness for multi-segment trajectories, obtained over 20 random seeds in the simulation environment.

Fig. 8.

Average relative time allocation of initial and optimized multi-segment trajectories, obtained over 20 random seeds in the simulation environment.

Fig. 9.

Speed profiles of initial and optimized multi-segment trajectories, obtained over 20 random seeds in the simulation environment. Shading indicates standard deviation.

In the hybrid environment, we optimize three of the waypoint trajectories shown in Figure 6 based on medium-fidelity evaluations using the multicopter simulation and high-fidelity evaluations using real-world quadrotor flight experiments. The medium-fidelity dataset is initialized using LHS and the high-fidelity dataset using the scaling method. At each optimization step, the acquisition function selects the type of evaluation ( i.e., simulation or real-world experiment) and updates the feasibility model based on the evaluation result. The corresponding computation time varies between 5 and 10 seconds, depending on the convergence time of the GP feasibility model. The time spent on computation is relatively insignificant in the hybrid environment, because real-world flight experiments take several minutes on average when including steps such as battery replacement. Comparison of Figure 7 with Figure 10 shows that the obtained flight time improvements are similar to those from the simulation environment. However, Figures 11 and 12 show time allocations different from those obtained in simulation (cf. Figures 8 and 9). In fact, the time allocation ratios obtained in the simulation environment may not be feasible in flight with the real-world vehicle, as it may behave differently than the medium-fidelity simulation. This underlines the importance of incorporating real-world flight experiments in the trajectory optimization. At the same time, the inclusion of medium-fidelity simulation evaluations is essential to find effective samples for high-fidelity evaluations, which enables the algorithm to lower the number of required real-world flights. Without this feature, optimization of the trajectories in Figure 11 would be infeasible, as it amounts to search over a nine- or ten-dimensional solution space.

Fig. 10.

Mean and standard deviation of relative trajectory time and smoothness for multi-segment trajectories, obtained over five random seeds in the hybrid environment using simulation and real-world flights.

Fig. 11.

Average relative time allocation of initial and optimized multi-segment trajectories, obtained over five random seeds in the hybrid environment using simulation and real-world flights.

Fig. 12.

Speed profiles of initial and optimized multi-segment trajectories, obtained over five random seeds in the hybrid environment using simulation and real-world flights. Shading indicates standard deviation.

The optimized trajectories in the hybrid environment show similar characteristics as in the simulated environment. The time allocation is reduced for most segments, but also increased for some in order to maintain feasibility. It can be seen that the vehicle decelerates before the turns and accelerates through them. During the experiments, we observed that entering turns with reduced speed stabilizes tracking on the remainder of the trajectory, which can eventually reduce the overall flight time. We also found that our vehicle is generally able to stabilize to static hover very quickly, allowing it to finish the trajectory quite aggressively.

4.3. Polytope trajectories

In this section, our proposed algorithm including convex decomposition is applied to the planning of trajectories between start and end points in environments with obstacles, as shown in Figure 13. Velocity, acceleration, and jerk are constrained to zero at the start and end points, and yaw and yaw rate are constrained to zero for the whole trajectory. We focus on planning in environments with two-dimensional polygonal obstacles, and constrain the height to be constant throughout each trajectory. The acquisition function hyperparameters and the method for generating the initial candidate solution set are identical to those described in Section 4.2.

Fig. 13.

Initial and average optimized polytope trajectories, obtained over 20 random seeds in the simulation environment. Shading indicates standard deviation.

We first verify the algorithm for constructing polytope constraints by comparing the minimum-snap segment time allocations obtained from (7) for waypoint and polytope constraints. The waypoint sequences are obtained using straight-line RRT*, as proposed by Richter et al. (2016). For both types of constraints, the total trajectory times are scaled down according to (8) based on the medium-fidelity simulation with the maximum Euclidean position tracking error set to 20 cm and the maximum yaw tracking error set to 15°. The resulting polytope and waypoint trajectories are shown in blue in Figures 13 and 14, respectively. The corresponding flight times are given in the first two columns of Table 1. It can be seen that the polytope trajectories are all faster than the waypoint trajectories. This is not surprising, as the waypoints represent stricter constraints on the trajectory geometry. As they are set without considering any vehicle model, they may inhibit finding time-optimal trajectories when considering a dynamics model. Hence, polytope constraints are particularly advantageous to our application. In addition, we note that obstacle avoidance is imposed throughout the reference trajectory by using polytope constraints, whereas this is only the case at the actual waypoints for waypoint constraints. Consequently, it may be necessary to add additional waypoints for collision avoidance, which further inhibits trajectory optimization.

Fig. 14.

Trajectories based on waypoints from straight-line RRT*.

Table 1.

Comparison of flight times for trajectories obtained through minimum-snap planning with waypoint and polytope constraints, and our proposed algorithm for multi-fidelity Bayesian optimization (MFBO). The final column lists mean and standard deviation obtained over 20 random seeds in the simulation environment.

	Min-snap	Min-snap	MFBO
Trajectory	(waypoint)	(polytope)	(ours)
9	3.302 s	3.116 s	2.717 ± 0.0954 s
10	4.490 s	3.743 s	3.538 ± 0.104 s
11	5.637 s	5.110 s	4.704 ± 0.0706 s
12	5.637 s	3.364 s	3.028 ± 0.0760 s

Our proposed algorithm for multi-fidelity Bayesian optimization is used to optimize the time allocation ratio for each trajectory in Figure 13 using the simulation environment, which incorporates low-fidelity evaluations based on reference control input feasibility and medium-fidelity evaluations based on the multicopter simulation. By comparing the blue and red lines, it can be seen that the leeway provided by the polytope constraints is exploited to subtly adapt the trajectory geometry. The resulting decreased flight times are given in the final column of Table 1. Figures 15 and 16 show the corresponding change in speed profile and time allocation. Our proposed algorithm is able to reduce flight times by adjusting the time allocation such that the reference trajectory is further away from the obstacles. In practice, this means that its feasibility is less sensitive to tracking error. Hence, the tracking error bounds can be exploited to a larger degree, before the trajectory becomes infeasible due to an obstacle violation, resulting in faster trajectories.

Fig. 15.

Speed profiles and average relative time allocation of initial and optimized polytope trajectories, obtained over 20 random seeds in the simulation environment. Shading indicates standard deviation.

Fig. 16.

Mean and standard deviation of relative trajectory time and smoothness for polytope trajectories, obtained over 20 random seeds in the simulation environment.

In the hybrid environment, we optimize one of the polytope trajectories by incorporating medium-fidelity evaluations based on the multicopter simulation and high-fidelity evaluations based on real-world flight experiments. The real-world flights were performed in the space shown in Figure 17. Figure 18 shows that the trajectory geometry was significantly adapted during the Bayesian optimization process. The resulting trajectory is further away from the first two obstacles. In addition, the speed is decreased in proximity of the first obstacle, as can be seen in Figure 19. Subsequent trajectory segments can now be performed at significantly increased speed, resulting in an average flight time reduction of over 20%, as shown in Figure 20.

Fig. 17.

Environment for the real-world flight experiments for Trajectory 9 (cf. Figure 18).

Fig. 18.

Initial and average optimized polytope trajectories, obtained over five random seeds in the hybrid environment using simulation and real-world flights. Shading indicates standard deviation.

Fig. 19.

Speed profiles and average relative time allocation of initial and optimized polytope trajectory, obtained over five random seeds in the hybrid environment using simulation and real-world flights. Shading indicates standard deviation.

Fig. 20.

Mean and standard deviation of relative trajectory time and smoothness for polytope trajectory, obtained over five random seeds in the hybrid environment using simulation and real-world flights.

5. Conclusion

We have presented an algorithm for the generation of dynamically feasible time-optimal quadrotor trajectories based on multi-fidelity Bayesian optimization. The vehicle feasibility constraints have been efficiently modeled by incorporating evaluations from analytical approximation, numerical simulation, and real-world experiments in multi-fidelity GPC. We have conducted extensive evaluations through simulation and real-life experiments for both waypoint-constrained and polytope-constrained trajectories. It has been found that the algorithm is able to generate feasible trajectories that are significantly faster than those obtained from existing minimum-snap trajectory optimization algorithms. This demonstrates the capability of our algorithm to efficiently model the feasibility boundary and ultimately plan faster trajectories by incorporating multi-fidelity data, including flight experiments.

The current work has several limitations that provide interesting avenues for future research. With regards to planning in environments with obstacles, the algorithm for decomposition of the free space into convex polytopes is limited to two-dimensional environments. Applicability may be extended to three-dimensional environments by utilizing three-dimensional convex decomposition algorithms studied in the field of computer graphics (Attene et al., 2008). Once the set of three-dimensional convex polyhedrons is obtained, the face graph can be generated as in Figure 2(d), and the remaining steps of the proposed algorithm can be applied without modification to optimize the three-dimensional trajectory.

Although our proposed algorithm generates a time-optimal trajectory for a prescribed sequence of polytope constraints, the resulting trajectory may still be sub-optimal due to the application of Dijkstra’s algorithm. As the algorithm does not account for the real flight-time for each polytope cell during construction of the constraints, there may exist an alternative sequence of polytope constraints that ultimately results in a quicker trajectory. We expect that this issue may be addressed by utilizing more precise heuristics for path planning on the face graph, or simply by selecting several sequences of polytope constraints and comparing the corresponding time-optimal trajectories.

The current algorithm is applied to a given set of fixed waypoints or obstacles, and thus needs to be trained from the beginning for different waypoint or obstacle positions. Analysis of the obtained feasibility constraint models may provide insights in general properties of time-optimal trajectories, which may inform the design of planning algorithms. In this respect, we note that the obtained time-optimal trajectories share similar properties, such as deceleration prior to turns in proximity of obstacles. Constructing a general feasibility model based on these shared properties may enable trajectory optimization subject to arbitrary waypoint and obstacle configurations without the need for additional training.

The proposed algorithm can be generalized to motion planning problems subject to general dynamics equations by modeling the corresponding feasibility boundary as a classification problem. To be specific, if there exists a finite-dimensional parameterization of trajectories, the proposed multi-fidelity Bayesian optimization framework can be applied to generate time-optimal trajectories. For instance, when generating robotic manipulator trajectories through minimum-jerk optimization, the time allocation over trajectory segments can be optimized with the proposed algorithm. In addition, the same framework may be applied to optimize trajectories subject to binary constraints other than dynamic feasibility. For example, when planning for a mobile robot with perception system, perception error constraints may be formulated as a binary classification and incorporated in multi-fidelity Bayesian optimization.

Footnotes

Appendix. Index to multimedia extensions

Archives of IJRR multimedia extensions published prior to 2014 can be found at http://www.ijrr.org, after 2014 all videos are available on the IJRR YouTube channel at http://www.youtube.com/user/ijrrmultimedia

SAGE-Journals-Accessible-Video-Player 10.1177/02783649211033317.M1 sj-vid-1-ijr-10.1177_02783649211033317 Extension 1

Video of the experiments

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partly funded by the ONR (grant number N000141712670).

References

Attene

Mortara

Spagnuolo

Falcidieno

(2008) Hierarchical convex approximation of 3D shapes for fast region selection. Computer Graphics Forum 27: 1323–1332.

Berkenkamp

Schoellig

Krause

(2016) Safe controller optimization for quadrotors with Gaussian processes. In: International Conference on Robotics and Automation (ICRA). IEEE, pp. 491–496.

Costabal

Perdikaris

Kuhl

Hurtado

(2019) Multi-fidelity classification using Gaussian processes: accelerating the prediction of large-scale computational models. arXiv preprint arXiv:1905.03406.

Cutajar

Pullin

Damianou

Lawrence

González

(2019) Deep Gaussian processes for multi-fidelity modeling. arXiv preprint arXiv:1903.07320.

Deits

Tedrake

(2015) Efficient mixed-integer planning for uavs in cluttered environments. In: 2015 IEEE international conference on robotics and automation (ICRA). IEEE, pp. 42–49.

Dijkstra

(1959) A note on two problems in connexion with graphs. Numerische Mathematik 1(1): 269–271.

Dribusch

Missoum

Beran

(2010) A multifidelity approach for the construction of explicit decision boundaries: application to aeroelasticity. Structural and Multidisciplinary Optimization 42(5): 693–705.

Gao

Lin

Shen

(2018) Online safe trajectory generation for quadrotors using fast marching method and bernstein basis polynomial. In: 2018 IEEE International Conference on Robotics and Automation (ICRA). IEEE, pp. 344–351.

Gardner

Kusner

Weinberger

Cunningham

(2014) Bayesian optimization with inequality constraints. In: International Conference on Machine Learning (ICML). PMLR, pp. 937–945.

10.

Gardner

Pleiss

Bindel

Weinberger

Wilson

(2018) GPyTorch: blackbox matrix–matrix Gaussian process inference with GPU acceleration. In: Conference on Neural Information Processing Systems (NeurIPS), pp. 7576–7586.

11.

Gelbart

(2015) Constrained Bayesian optimization and applications. PhD Thesis, Harvard University.

12.

Guerra

Tal

Murali

Ryou

Karaman

(2019) FlightGoggles: Photorealistic sensor simulation for perception-driven robotics using photogrammetry and virtual reality. arXiv preprint arXiv:1905.11377.

13.

Hensman

Fusi

Lawrence

(2013) Gaussian processes for big data. In: Conference on Uncertainty in Artificial Intelligence. AUAI.

14.

Hensman

Matthews

Ghahramani

(2015) Scalable variational Gaussian process classification. In: International Conference on Artificial Intelligence and Statistics. PMLR, pp. 351–360.

15.

Hernández-Lobato

Gelbart

Hoffman

Adams

Ghahramani

(2015) Predictive entropy search for Bayesian optimization with unknown constraints. In: Proceedings of the 32nd International Conference on International Conference on Machine Learning (ICML’15). New York: ACM Press, 2015, pp. 1699–1707.

16.

Hernández-Lobato

Hoffman

Ghahramani

(2014) Predictive entropy search for efficient global optimization of black-box functions. In: Conference on Neural Information Processing Systems (NeurIPS), pp. 918–926.

17.

Huang

Allen

Notz

Miller

(2006) Sequential kriging optimization using multiple-fidelity evaluations. Structural and Multidisciplinary Optimization 32(5): 369–382.

18.

Kalakrishnan

Chitta

Theodorou

Pastor

Schaal

(2011) STOMP: Stochastic trajectory optimization for motion planning. In: International Conference on Robotics and Automation (ICRA). IEEE, pp. 4569–4574.

19.

Kandasamy

Dasarathy

Oliva

Schneider

Poczos

(2016) Multi-fidelity Gaussian process bandit optimisation. arXiv preprint arXiv:1603.06288v4.

20.

Keil

(1985) Decomposing a polygon into simpler components. SIAM Journal on Computing 14(4): 799–817.

21.

Kennedy

O’Hagan

(2000) Predicting the output from a complex computer code when fast approximations are available. Biometrika 87(1): 1–13.

22.

Le Gratiet

Garnier

(2014) Recursive co-kriging model for design of computer experiments with multiple levels of fidelity. International Journal for Uncertainty Quantification 4(5): 365–386.

23.

Lindberg

Lee

(2015) Optimization under constraints by applying an asymmetric entropy measure. Journal of Computational and Graphical Statistics 24(2): 379–393.

24.

Marco

Baumann

Hennig

Trimpe

(2019) Classified regression for Bayesian optimization: Robot learning with unknown penalties. arXiv preprint arXiv:1907.10383.

25.

Marco

Berkenkamp

Hennig

Schoellig

Krause

Schaal

Trimpe

(2017) Virtual vs. real: Trading off simulations and physical experiments in reinforcement learning with Bayesian optimization. In: International Conference on Robotics and Automation (ICRA). IEEE, pp. 1557–1563.

26.

Mellinger

Kumar

(2011) Minimum snap trajectory generation and control for quadrotors. In: International Conference on Robotics and Automation (ICRA). IEEE, pp. 2520–2525.

27.

Mockus

Tiesis

Zilinskas

(1978) The application of Bayesian methods for seeking the extremum. Towards Global Optimisation 2: 117–129.

28.

Moon

Martinez-Carranza

Cieslewski

, et al. (2019) Challenges and implemented technologies used in autonomous drone racing. Intelligent Service Robotics 12(2): 137–148.

29.

Nickisch

Rasmussen

(2008) Approximations for binary Gaussian process classification. Journal of Machine Learning Research 9: 2035–2078.

30.

Perdikaris

Raissi

Damianou

Lawrence

Karniadakis

(2017) Nonlinear information fusion algorithms for data-efficient multi-fidelity modelling. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 473(2198): 20160751.

31.

Rai

Antonova

Meier

Atkeson

(2019) Using simulation to improve sample-efficiency of Bayesian optimization for bipedal robots. Journal of Machine Learning Research 20(49): 1–24.

32.

Rajnarayan

(2009) Trading risk and performance for engineering design optimization using multifidelity analyses. PhD Thesis, Stanford University.

33.

Rasmussen

Williams

(2006) Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press.

34.

Richter

Bry

Roy

(2016) Polynomial trajectory planning for aggressive quadrotor flight in dense indoor environments. In: International Symposium on Robotics Research (ISRR). Berlin: Springer, pp. 649–666.

35.

Ryou

Tal

Karaman

(2020) Multi-fidelity black-box optimization for time-optimal quadrotor maneuvers. In: Robotics: Science and Systems (RSS). Available at http://www.roboticsproceedings.org/rss16/p032.pdf (accessed July 2021).

36.

Snelson

Ghahramani

(2006) Sparse Gaussian processes using pseudo-inputs. In: Conference on Neural Information Processing Systems (NeurIPS), pp. 1257–1264.

37.

Srinivas

Krause

Kakade

Seeger

(2012) Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory 58(5): 3250–3265.

38.

Takeno

Fukuoka

Tsukada

, et al. (2019) Multi-fidelity Bayesian optimization with max-value entropy search. arXiv preprint arXiv:1901.08275.

39.

Tal

Karaman

(2021) Accurate tracking of aggressive quadrotor trajectories using incremental nonlinear dynamic inversion and differential flatness. IEEE Transactions on Control Systems Technology 29(3): 1203–1218.

40.

Tang

Sun

Hauser

(2020) Enhancing bilevel optimization for uav time-optimal trajectory using a duality gap approach. In: 2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, pp. 2515–2521.

41.

Wang

Jegelka

(2017) Max-value entropy search for efficient Bayesian optimization. In: International Conference on Machine Learning (ICML). JMLR, pp. 3627–3635.

42.

Zhang

Miller

Singh

Dubrawski

(2017) Noise-tolerant interactive learning using pairwise comparisons. In: Advances in Neural Information Processing Systems, pp. 2431–2440.

43.

Zhang

Lyu

Yang

Yan

Zhou

Zeng

(2019) An efficient multi-fidelity Bayesian optimization approach for analog Circuit Synthesis. In: Design Automation Conference (DAC). New York: ACM/IEEE.