Abstract
We consider the problem of generating a time-optimal quadrotor trajectory for highly maneuverable vehicles, such as quadrotor aircraft. The problem is challenging because the optimal trajectory is located on the boundary of the set of dynamically feasible trajectories. This boundary is hard to model as it involves limitations of the entire system, including complex aerodynamic and electromechanical phenomena, in agile high-speed flight. In this work, we propose a multi-fidelity Bayesian optimization framework that models the feasibility constraints based on analytical approximation, numerical simulation, and real-world flight experiments. By combining evaluations at different fidelities, trajectory time is optimized while the number of costly flight experiments is kept to a minimum. The algorithm is thoroughly evaluated for the trajectory generation problem in two different scenarios: (1) connecting predetermined waypoints; (2) planning in obstacle-rich environments. For each scenario, we conduct both simulation and real-world flight experiments at speeds up to 11 m/s. Resulting trajectories were found to be significantly faster than those obtained through minimum-snap trajectory planning.
1. Introduction
In recent years, fast navigation of autonomous vehicles has received increasing interest. For several years an autonomous drone racing competition has been organized at IROS (Moon et al., 2019), and last year the AlphaPilot challenge introduced a similar competition to a more general audience (Guerra et al., 2019). Currently and in the near future, state-of-the-art algorithms in estimation and control are reaching a level of maturity, such that the bounds of what is physically possible with a given vehicle are approached. This presents the need for trajectory planning algorithms that fully exploit the capabilities of the vehicle and take into account the intricate limitations of vehicle dynamics, instead of relying on simplified models.
In this paper, we consider the problem of generating, i.e., planning, a dynamically feasible, time-optimal quadrotor trajectory. By definition, such an optimal trajectory is found at the boundary of the set of feasible trajectories. Hence, precise knowledge of the dynamic feasibility constraints is required to find the time optimum. This complicates the problem, as these feasibility constraints can become highly complex in light of high-acceleration flight and aggressive attitude changes, as required to achieve time optimality. The demanding maneuvers are affected by flight dynamics, but also by hardware and software for control and state estimation. The resulting set of non-convex feasibility constraints with memory (i.e., depending on present and past states and control inputs) cannot readily be incorporated in a typical trajectory planner for two main reasons. First, the feasibility constraints are not easily expressed in a convenient way, e.g., as constraints on an admissible set of control inputs and states. Instead, the feasibility of the trajectory must be considered in a holistic manner. Second, in most scenarios, precise modeling of these constraints is only possible through real-world experiments. As these experiments aim to seek the boundary of the feasible set, they are risky and potentially costly and should, thus, be kept to a minimum. In contrast to this objective, many trajectory optimization schemes rely on a large number of evaluations. These two issues form the main motivation for our algorithm, which uses a multi-fidelity optimization technique that can approximate the system feasibility constraints based on a limited number of experiments. It uses a Gaussian process (GP) black-box model to classify candidate trajectories as feasible or infeasible and is thereby able to plan increasingly fast trajectories as the model improves. An overview of the algorithm is shown in Figure 1.

Overview of the proposed algorithm that models dynamic feasibility constraints based on simulation and flight data to efficiently find the time-optimal trajectory.
This paper contains several contributions. First, we propose an algorithm for modeling of quadrotor feasibility constraints and generation of time-optimal trajectories based on GP classification. Second, we extend the applicability of the multi-fidelity deep Gaussian process kernel (multi-fidelity deep Gaussian process itself is a particular method) from the regression problem to the classification problem in order to obtain a multi-fidelity GP classification algorithm that can incorporate evaluations from analytical approximation, numerical simulation, and real-world flight experiments. Third, we design an acquisition process specifically tailored to experimental robotics. The acquisition function takes into account the additional cost of infeasible evaluations, as these may pose a threat to the vehicle and its surroundings. Candidate data points are generated using minimum-snap perturbations in order to maintain feasibility. Finally, we present an extensive evaluation of the proposed algorithm in both simulation and real-life flight experiments at speeds up to 11 m/s. A video of these experiments can be found in Extension 1. It is found that optimized trajectories are significantly faster than those obtained through minimum-snap trajectory planning.
An initial version of this paper was presented at Robotics: Science and Systems 2020 (Ryou et al., 2020). In the current work, we extend our algorithm to enable time-optimal trajectory planning in obstacle-rich environments based on convex polygonal decomposition. We also present additional simulation and flight experiments for a total of 12 distinct trajectories, as well as more elaborate analysis of said experiments.
The outline of the paper is as follows. In Section 2, we present the problem definition, and preliminaries on trajectory planning and Bayesian optimization. We detail our algorithm for the generation of time-optimal trajectories using iterative experiments in Section 3. In Section 4, we present experimental results from both simulation and real-world flights. Finally, we provide conclusions in Section 5.
2. Preliminaries
2.1. Quadrotor trajectory planning
We consider the problem of planning a time-optimal quadrotor trajectory. A trajectory
where
Significant aerodynamic forces and momenta act on the vehicle and need to be accounted for in control inputs. In contrast, vehicle aerodynamics can typically be neglected in low-speed flight.
State estimation errors due to delay and phase lag are exacerbated by the fast-changing vehicle state. Moreover, sensors measurements may incur additional noise due to large accelerations.
Actuation delay and bandwidth limitations, such as the mechanical time constant of the motors, inhibit tracking of fast-changing control inputs.
Battery internal resistance causes voltage to drop under large currents. Consequently, very high motor speeds can only be maintained for limited consecutive time periods.
What results is a set of non-convex feasibility constraints with memory (i.e., depending on present and past states and control inputs). These characteristics of
Popular methods for trajectory planning avoid the complicated feasibility constraints by reformulating the optimization problem such that feasibility is the objective instead of a constraint (Mellinger and Kumar, 2011; Richter et al., 2016). In practice, this is achieved by minimizing high-order derivatives of the trajectory function. In particular, minimizing the fourth-order derivative of position, i.e., snap, and the second-order derivative of yaw results in a minimization of the required angular accelerations. This reduces trajectory agility and may thereby prevent activation of the aforementioned feasibility constraints. Before we detail the resulting minimum-snap optimization problem, we describe two formulations to incorporate geometric constraints imposed by
where
where matrix
The minimum-snap trajectory optimization problem is given by
where
with
where
Minimum-snap trajectory generation algorithms employ a two-step process based on (6). First, the minimum-snap trajectory for a (very large) initial guess of the total trajectory time is found, as follows:
Next, the obtained time allocation is scaled down, i.e. ,
The feasibility constraint is typically evaluated based on control inputs obtained from differential flatness of the idealized quadrotor dynamics (Mellinger and Kumar, 2011). Although this method can generate fast, feasible trajectories, it does not attain time-optimality for several reasons. First, the time allocation ratio obtained in (7) may not be optimal as the total trajectory time is decreased, i.e., there may exist an alternative allocation ratio that will enable lower feasible total trajectory time. Second, the method uses a simple feasibility model that fails to consider the trajectory as a whole, as is required for a realistic feasibility check.
Our proposed algorithm searches for a time-optimal trajectory by directly addressing the two major shortcomings of minimum-snap trajectory generation. It builds a realistic, probabilistic feasibility constraint model that considers the trajectory as a whole, and uses this model to find the optimal time allocation over trajectory segments.
2.2. Bayesian optimization
Bayesian optimization, or BayesOpt, is a class of algorithms that use machine learning techniques for solving optimization problems with black-box objective or constraint functions that are expensive to evaluate. Evaluation points are selected to model the unknown functions and approach the optimum with maximum efficiency, so that the total number of evaluations is kept to a minimum. Within the BayesOpt framework, GP modeling (Rasmussen and Williams, 2006) is widely used to build a surrogate model that approximates the unknown objective or constraint functions based on noisy measurements. Given data points
Then
When modeling inequality constraints, Gaussian process classification (GPC) is often used instead of direct GP modeling. Rather than assuming a joint distribution over the evaluations, GPC assumes a joint Gaussian distribution over the evaluations and the latent variables
The latent variables and the hyperparameters of the kernel function are trained by maximizing the marginal likelihood function
where
so that the distribution of the latent variable
The resulting class probability is obtained by
For more details on GPC and its implementation, the reader is referred to Nickisch and Rasmussen (2008).
The BayesOpt acquisition function is designed to select the next evaluation point by considering both reducing the uncertainty of the surrogate model and finding the precise optimum of the objective function. Based on the data
where
In multi-fidelity Bayesian optimization, function evaluations of different fidelities can be combined. For instance, a rough simulation or an expert’s opinion may serve as a low-fidelity model, while a high-accuracy simulation or real-world experiment serves as a high-fidelity model. The key idea is that combining cheap low-fidelity evaluations with expensive high-fidelity measurements improves overall efficiency. To incorporate information from multiple sources, the surrogate model must be modified to combine multi-fidelity evaluations. Kennedy and O’Hagan (2000) and Le Gratiet and Garnier (2014) used a linear transformation to describe the relationship between different fidelities. Suppose that we have
where
Similar to the surrogate model, the acquisition function has to be modified to incorporate multi-fidelity evaluations. In the multi-fidelity Bayesian optimization framework, the acquisition function not only selects the next evaluation point, but also its fidelity level, as follows:
The acquisition function is modified by introducing weights based on the evaluation cost at the different fidelity levels. In practice, high-fidelity evaluations will have smaller weights compared with low-fidelity evaluations. This makes the algorithm less likely to select high-fidelity evaluations, so that the overall cost of the experiments is minimized. Common metrics used in acquisition functions are modified in this manner, e.g., weighted expected improvements (Huang et al., 2006), weighted expected entropy reductions (Takeno et al., 2019), and weighted upper bounds of the optimum (Kandasamy et al., 2016). Costabal et al. (2019) approached the multi-fidelity classification problem by differently weighting the uncertainty reduction of the classifiers at each fidelity level. As multi-fidelity Bayesian optimization has the potential to reduce the number of expensive high-fidelity experiments, it has been applied in various fields, including analog circuit design (Zhang et al., 2019), aircraft wing design (Rajnarayan, 2009), and control synthesis (Marco et al., 2017; Rai et al., 2019).
Although Bayesian optimization aims to contain the number of required function evaluations, it may still suffer from large computation cost as the number of data points increases. This is mainly due to inversion of a covariance matrix including all data points for uncertainty quantification in the surrogate function, leading to a computational cost proportional to the number of data points cubed. In particular, multi-fidelity Bayesian optimization may suffer from this issue, as the number of data points can quickly increase by adding low-fidelity evaluations. The problem is addressed by the inducing points method, which uses a set of pseudo-data points for uncertainty quantification (Hensman et al., 2013; Snelson and Ghahramani, 2006). The pseudo-data points are selected to minimize the Kullback–Leibler (KL) divergence between the function posterior given all data points (including the pseudo-data points), and the function posterior given only the pseudo-data points. As the number of pseudo-data points is much smaller than the number of actual data points, the inducing points method can greatly improve algorithm performance as the size of the dataset increases. The method was extended to the classification problem by Hensman et al. (2015), and applied to multi-fidelity Bayesian optimization by Cutajar et al. (2019). Gardner et al. (2018) further increased its efficiency with GPU acceleration.
3. Algorithm
Our proposed algorithm uses Bayesian optimization to efficiently minimize the total trajectory time
This formulation is based on the assumption that if there exists a feasible trajectory, i.e., a member of the set
We use GPC to learn a surrogate model of the feasibility constraint in (18). The surrogate model combines results from sequential experiments at
Each evaluation is selected based on the acquisition function, which considers exploration and exploitation with the goal of maximizing the efficiency of the overall optimization process. Based on the current trained surrogate model, it estimates the expected improvements in objective function value and model accuracy. By iteratively improving the feasibility model and minimizing the objective function, our algorithm searches for the time allocation that minimizes total trajectory time. An overview of the complete algorithm is given in Algorithm 1, and its major components are detailed in ensuing sections.
3.1. Multi-fidelity constraint model
We define the GPC-based surrogate model
where the hyperparameters
In order to utilize the multi-fidelity optimization technique, we update the definition of the surrogate model and define the
We use the multi-fidelity deep Gaussian process (MFDGP) by Cutajar et al. (2019) as the covariance kernel function to estimate the GP prior from multiple evaluation sources. The multi-fidelity GP has separate latent variables and inducing points
where
where
3.2. Acquisition function
We design the acquisition function considering both exploration to improve the surrogate model, and exploitation to find the optimal solution. In exploration, we aim to maximize the effectiveness of improving the surrogate model. Inspired by Costabal et al. (2019), we select the most uncertain sample near the decision boundary of the classifier. Since the latent function mean approaches zero at the decision boundary, this sample is found as the maximizer of
where
where
where
Finally, we combine (24) and (26) to obtain
Note that exploration only occurs if exploitation is ineffective. We found that this greedy approach works well in practice, as overly explorative searching leads to a large number of infeasible evaluations.
In case of a discontinuous acquisition function such as (27), Latin hypercube sampling (LHS) is often used to generate candidate solutions of the acquisition function (Srinivas et al., 2012). However, we observe that this method may fail to generate an optimal solution because it does not consider the correlation between adjacent time allocations. We address this by devising a sampling-based algorithm that uses a smooth perturbation of the current best solution. For this purpose, we extend the notion of goal-convergent exploration by Kalakrishnan et al. (2011). This method generates smooth trajectories by minimizing the expected sum of acceleration at each collocation point. Instead of acceleration, we minimize the expected jerk. As time allocation is inversely related to flight speed, this approximately minimizes the jerk of the speed profile, and thereby the snap of the perturbation, which is favorable for maintaining feasibility. The perturbation vector
where
3.3. Initialization
Initial data points to build the surrogate model are generated around the trajectory found by (8), which may differ between fidelity levels, as feasibility constraints differ. At fidelity levels with low evaluation cost, data point generation is done using LHS. At fidelity levels where this method imposes prohibitive evaluation cost, we use the fact that the initial trajectory is on the feasibility boundary. As such, time allocations with the same ratio but shorter total time are infeasible, while time allocations with the same ratio but larger total time are feasible. This enables generation of data points without any additional evaluation cost by scaling the initial trajectory.
The initial trajectory is also used for the normalization of data points. This is required because feasibility constraints differ between the different levels of fidelity, leading to bias in the dataset. Moreover, time allocations between trajectory segments may be at different scales of magnitude, which may make the training process numerically unstable. To resolve these issues, time allocation for each trajectory segment is scaled with the time allocation of the corresponding segment in the initial trajectory at the same fidelity level.
3.4. Geometric constraints
We consider two formulations to incorporate geometric constraints in the trajectory optimization problem: waypoint constraints and polytope constraints. In many applications, candidate trajectories are constrained to attain a pre-defined sequence of waypoints. By using a piecewise polynomial trajectory representation, the optimization can be formulated as an unconstrained quadratic program based on (4) and (2). In other scenarios, particularly when planning in obstacle-rich environments, waypoints may become an ineffective method to avoid collisions. Convex polytopes can instead be used to constrain the trajectory to the obstacle-free space without imposing waypoint restrictions. Each polynomial trajectory segment is now constrained to a polytope, leading to the constrained quadratic program (4) subject to (3). As described in Section 2, the function (6) that maps time allocation and geometric constraints to the corresponding minimum-snap trajectory can be formulated based on either waypoint constraints or polytope constraints. Hence, our proposed algorithm for multi-fidelity Bayesian optimization can be applied to either case without any modification.
In order to fully demonstrate the compatibility of our algorithm with polytope constraints, the remainder of this section details a procedure to construct such constraints. We focus on planning in two-dimensional obstacle-rich environments, and use the convex decomposition method proposed by Keil (1985) in combination with Dijkstra’s algorithm (Dijkstra, 1959). Major steps of the resulting algorithm to generate the geometric constraints are shown in Figure 2.

Algorithm to generate collision-free trajectories. (a) Initial environment with obstacles. (b) Transform into a simple polygon. (c) Decompose free space into polygons. (d) Generate the face graph. (e) Find the sequence of polygons. (f) Apply multi-fidelity optimization.
The free space of an environment, e.g., as shown in Figure 2(a), can be modeled as a polygon with inner holes. However, its decomposition into a set of convex polygons would be NP-hard (Keil, 1985). Thus, we add bridges to transform the free space into a simple polygon, i.e., a polygon without intersections or inner holes (Figure 3). To place these bridges, we model the border of the environment and the obstacles as polygons

Polygon graph to convert free space into a simple polygon.
We obtain a convex decomposition of the resulting simple polygon using the method by Keil (1985). The pseudocode in Algorithm 2 represents the procedure, and Figure 2(c) shows the resulting set of convex polygons. Given trajectory start and end positions, a connecting sequence of polygons can be found by various methods. In some scenarios, such as drone racing on a known track, the sequence of polygons can be selected manually or based on a guiding trajectory. If this is not possible, one may find the shortest path from start to end using a planning algorithm in geometrical space and select the polygon sequence through which the resulting path passes. For our proposed algorithm, we first generate a graph connecting the center point of each polygon and its faces, as shown in Figure 2(d). The shortest path that connects the start point and end point can then be found efficiently through combinatorial planning using Dijkstra’s algorithm (Dijkstra, 1959). Finally, the sequence of polygons on the shortest path is selected to construct the geometric constraints, as shown in Figure 2(e).
Let
For each polygonal cell, such a set of corresponding linear constraints is incorporated into (4).
3.5. Levels of fidelity
Evaluations at three different fidelity levels are used for quadrotor trajectory planning. Low-fidelity evaluations are based on differential flatness of the quadrotor dynamics, which enables us to transform a trajectory and its time derivatives from the output space, i.e., position and yaw angle with derivatives, to the state and control input space, i.e., position, velocity, attitude, angular rate, and motor speeds (Mellinger and Kumar, 2011). The resulting reference control input trajectory
where
Medium-fidelity evaluations are obtained using the open-source multicopter dynamics and inertial measurement simulation by Guerra et al. (2019) with the trajectory tracking controller by Tal and Karaman (2021). The feasible set is defined as
where
High-fidelity evaluations are obtained from real-world experiments using a quadcopter and motion capture system. At this fidelity level, each evaluation incorporates dynamics of the full system, including actuation and sensor systems, vehicle vibrations, unsteady aerodynamics, battery performance, and estimation and control algorithms. This provides a highly accurate assessment of feasibility, but comes at great cost as it involves actual flying hardware and cannot be executed any faster than real-time. We use the same controller as in simulation, and again perform multiple flights to account for stochastic effects. The feasible set is identical to (31), and any trajectories resulting in a collision are again deemed infeasible. The overall objective of the algorithm is to find the time-optimal trajectory, subject to feasibility at the highest-fidelity model included.
4. Experimental results
We present experimental results that demonstrate the operation and performance of our proposed algorithm. First, we focus on a simple two-segment trajectory. Through visualization of the corresponding two-dimensional solution space, insights in the algorithm operation, particularly its acquisition function, are provided. Next, we present an extensive experimental evaluation of the proposed algorithm in two scenarios. For the first scenario, we apply the algorithm to obtain time-optimal trajectories based on a predetermined sequence of waypoints through an environment without any obstacles. For the second scenario, we define only start and end points and consider environments that include obstacles. We apply the proposed algorithm including polygonal decomposition to find a time-optimal trajectory that avoids the obstacles. In both scenarios, we evaluate the proposed algorithm in a simulation environment, and in a hybrid environment including simulation and real-world experiments.
4.1. Two-segment trajectory
In order to illustrate the operation of our proposed algorithm, we present results for the simple two-segment trajectory shown in Figure 4(a). The yaw rate, velocity, acceleration, and jerk are constrained to zero at the first waypoint. Results were obtained in the simulation environment, where we incorporate low-fidelity evaluations using reference control input feasibility, and medium-fidelity evaluations using the multicopter simulation. For the medium-fidelity evaluations, we set the maximum Euclidean position tracking error to 20 cm and the maximum yaw tracking error to 15°. To reflect the difference in evaluation cost between evaluations, we set the parameters of the acquisition functions (24) and (26) as

Waypoints and feasibility maps for two-segment trajectory optimization in the simulation environment. (a) Trajectory with waypoints. (b) Initial medium-fidelity feasibility. (c) Final medium-fidelity feasibility.
As shown in Figure 4(b), the low-fidelity dataset is initialized using LHS of 400 data points, and the medium-fidelity dataset is initialized using 20 evenly scaled data points. Given the low dimension of the two-segment time allocation, third-order finite differencing as required by (28) cannot be applied. Instead, we resort to LHS for generation of candidate solutions for the acquisition function. We run the algorithm 20 times for 50 iterations using different random seeds. Each medium-fidelity evaluation along with preceding low-fidelity evaluations is considered a single iteration, and the number of low-fidelity evaluations per iteration is limited to 20. This limit is imposed to prevent the acquisition function from selecting too many successive low-fidelity evaluations, which may otherwise occur at iterations where the selected acquisition parameters do not behave well with the state of the medium-fidelity surrogate model.
Figure 4(b) and (c) show the initial and the final medium-fidelity feasibility maps for one of the runs, respectively. Note that all medium-fidelity evaluations are in proximity of the final solution, as this promising region was first found by low-fidelity evaluations. To evaluate our algorithm, we compare the total trajectory time of the best solution at each iteration with the trajectory time obtained by (8) using the same feasibility evaluation constraints. Figure 5 shows a flight time improvement of approximately 2% compared with the initial trajectory. In addition, we compare the trajectory smoothness given by (5). For this comparison, we scale the total trajectory time to the value obtained by (8), while maintaining the allocation ratio found by our algorithm. Thereby, the comparison is based solely on the difference in allocation ratio, and not affected by total trajectory time. Figure 5 shows that our algorithm selects time allocation ratios that result in increased snap and yaw acceleration at the same total trajectory time. Despite this, the corresponding trajectories are still feasible at total trajectory times smaller than the initial trajectories. This shows that our algorithm is able to find an allocation ratio that compares favorably to that found by (7) and (8), and is thereby able to find feasible trajectories with smaller total trajectory time. Indeed, we can conclude that plain snap minimization does not result in the time-optimal trajectory. Our approach based on modeling of realistic feasibility constraints converges to a quicker feasible trajectory with increased snap. Although the trajectory duration is only improved by a modest 2%, this does illustrate and validate the rationale behind our proposed multi-fidelity trajectory optimization algorithm. In Sections 4.2 and 4.3, we show that larger improvements can be obtained for more complicated multi-segment trajectories.

Mean and standard deviation of relative trajectory time and smoothness for two-segment trajectory, obtained over 20 random seeds in the simulation environment.
4.2. Waypoint trajectories
Next, we apply our algorithm to the more complicated multi-segment trajectories shown in Figure 6. For each trajectory, yaw rate, velocity, acceleration, and jerk are constrained to zero at the first and final waypoints. The trajectories range from 9 to 19 segments. Owing to the corresponding increase in dimensionality of the solution space, the minimum-jerk perturbation given by (28) can now be used to generate the candidate solution set. The scaling factor

Multi-segment trajectories with starting point and subsequent waypoints indicated by green and red arrows, respectively.
We present results for all eight multi-segment trajectories using the simulation environment, which incorporates low-fidelity evaluations using reference control input feasibility and medium-fidelity evaluations using the multicopter simulation. In addition, we apply our algorithm to a subset of three trajectories in the hybrid environment, which incorporates medium-fidelity evaluations using the multicopter simulation and high-fidelity evaluations using real-world flight experiments. We chose not to incorporate low-fidelity evaluations in the hybrid environment, because their evaluation time is relatively very similar to medium-fidelity evaluations when compared with the time-of-flight experiments. Consequently, they provide less-valuable data points at similar cost. In both environments, we set the parameters of the acquisition functions (24) and (26) as
For the experiments in the simulation environment, we initialize the surrogate model using 1,000 low-fidelity samples. The maximum amount of low-fidelity evaluations per iteration is set to 50. The results in Figure 7 show that the algorithm is able to significantly reduce the trajectory time for each of the eight evaluated trajectories. The largest improvement is obtained for Trajectory 3, where the flight time is reduced by 22% on average. The optimized trajectory is feasible despite having almost four times larger snap and yaw acceleration, which shows our algorithm’s capability to find faster feasible trajectories. This is achieved by changing the time allocation ratio between trajectory segments, as shown in Figure 8. Figure 9 shows that the resulting trajectories have an increased speed on most segments, but also slow down on some segments. Speed decreases often occur during or just prior to tight turns and help maintain tracking feasibility as the speed over preceding and subsequent segments is increased. This interaction between trajectory segments is not captured by the low-fidelity control input evaluations, and demonstrates how using multi-fidelity modeling towards a holistic approach to optimization results in faster feasible trajectories.

Mean and standard deviation of relative trajectory time and smoothness for multi-segment trajectories, obtained over 20 random seeds in the simulation environment.

Average relative time allocation of initial and optimized multi-segment trajectories, obtained over 20 random seeds in the simulation environment.

Speed profiles of initial and optimized multi-segment trajectories, obtained over 20 random seeds in the simulation environment. Shading indicates standard deviation.
In the hybrid environment, we optimize three of the waypoint trajectories shown in Figure 6 based on medium-fidelity evaluations using the multicopter simulation and high-fidelity evaluations using real-world quadrotor flight experiments. The medium-fidelity dataset is initialized using LHS and the high-fidelity dataset using the scaling method. At each optimization step, the acquisition function selects the type of evaluation ( i.e., simulation or real-world experiment) and updates the feasibility model based on the evaluation result. The corresponding computation time varies between 5 and 10 seconds, depending on the convergence time of the GP feasibility model. The time spent on computation is relatively insignificant in the hybrid environment, because real-world flight experiments take several minutes on average when including steps such as battery replacement. Comparison of Figure 7 with Figure 10 shows that the obtained flight time improvements are similar to those from the simulation environment. However, Figures 11 and 12 show time allocations different from those obtained in simulation (cf. Figures 8 and 9). In fact, the time allocation ratios obtained in the simulation environment may not be feasible in flight with the real-world vehicle, as it may behave differently than the medium-fidelity simulation. This underlines the importance of incorporating real-world flight experiments in the trajectory optimization. At the same time, the inclusion of medium-fidelity simulation evaluations is essential to find effective samples for high-fidelity evaluations, which enables the algorithm to lower the number of required real-world flights. Without this feature, optimization of the trajectories in Figure 11 would be infeasible, as it amounts to search over a nine- or ten-dimensional solution space.

Mean and standard deviation of relative trajectory time and smoothness for multi-segment trajectories, obtained over five random seeds in the hybrid environment using simulation and real-world flights.

Average relative time allocation of initial and optimized multi-segment trajectories, obtained over five random seeds in the hybrid environment using simulation and real-world flights.

Speed profiles of initial and optimized multi-segment trajectories, obtained over five random seeds in the hybrid environment using simulation and real-world flights. Shading indicates standard deviation.
The optimized trajectories in the hybrid environment show similar characteristics as in the simulated environment. The time allocation is reduced for most segments, but also increased for some in order to maintain feasibility. It can be seen that the vehicle decelerates before the turns and accelerates through them. During the experiments, we observed that entering turns with reduced speed stabilizes tracking on the remainder of the trajectory, which can eventually reduce the overall flight time. We also found that our vehicle is generally able to stabilize to static hover very quickly, allowing it to finish the trajectory quite aggressively.
4.3. Polytope trajectories
In this section, our proposed algorithm including convex decomposition is applied to the planning of trajectories between start and end points in environments with obstacles, as shown in Figure 13. Velocity, acceleration, and jerk are constrained to zero at the start and end points, and yaw and yaw rate are constrained to zero for the whole trajectory. We focus on planning in environments with two-dimensional polygonal obstacles, and constrain the height to be constant throughout each trajectory. The acquisition function hyperparameters and the method for generating the initial candidate solution set are identical to those described in Section 4.2.

Initial and average optimized polytope trajectories, obtained over 20 random seeds in the simulation environment. Shading indicates standard deviation.
We first verify the algorithm for constructing polytope constraints by comparing the minimum-snap segment time allocations obtained from (7) for waypoint and polytope constraints. The waypoint sequences are obtained using straight-line RRT*, as proposed by Richter et al. (2016). For both types of constraints, the total trajectory times are scaled down according to (8) based on the medium-fidelity simulation with the maximum Euclidean position tracking error set to 20 cm and the maximum yaw tracking error set to 15°. The resulting polytope and waypoint trajectories are shown in blue in Figures 13 and 14, respectively. The corresponding flight times are given in the first two columns of Table 1. It can be seen that the polytope trajectories are all faster than the waypoint trajectories. This is not surprising, as the waypoints represent stricter constraints on the trajectory geometry. As they are set without considering any vehicle model, they may inhibit finding time-optimal trajectories when considering a dynamics model. Hence, polytope constraints are particularly advantageous to our application. In addition, we note that obstacle avoidance is imposed throughout the reference trajectory by using polytope constraints, whereas this is only the case at the actual waypoints for waypoint constraints. Consequently, it may be necessary to add additional waypoints for collision avoidance, which further inhibits trajectory optimization.

Trajectories based on waypoints from straight-line RRT*.
Comparison of flight times for trajectories obtained through minimum-snap planning with waypoint and polytope constraints, and our proposed algorithm for multi-fidelity Bayesian optimization (MFBO). The final column lists mean and standard deviation obtained over 20 random seeds in the simulation environment.
Our proposed algorithm for multi-fidelity Bayesian optimization is used to optimize the time allocation ratio for each trajectory in Figure 13 using the simulation environment, which incorporates low-fidelity evaluations based on reference control input feasibility and medium-fidelity evaluations based on the multicopter simulation. By comparing the blue and red lines, it can be seen that the leeway provided by the polytope constraints is exploited to subtly adapt the trajectory geometry. The resulting decreased flight times are given in the final column of Table 1. Figures 15 and 16 show the corresponding change in speed profile and time allocation. Our proposed algorithm is able to reduce flight times by adjusting the time allocation such that the reference trajectory is further away from the obstacles. In practice, this means that its feasibility is less sensitive to tracking error. Hence, the tracking error bounds can be exploited to a larger degree, before the trajectory becomes infeasible due to an obstacle violation, resulting in faster trajectories.

Speed profiles and average relative time allocation of initial and optimized polytope trajectories, obtained over 20 random seeds in the simulation environment. Shading indicates standard deviation.

Mean and standard deviation of relative trajectory time and smoothness for polytope trajectories, obtained over 20 random seeds in the simulation environment.
In the hybrid environment, we optimize one of the polytope trajectories by incorporating medium-fidelity evaluations based on the multicopter simulation and high-fidelity evaluations based on real-world flight experiments. The real-world flights were performed in the space shown in Figure 17. Figure 18 shows that the trajectory geometry was significantly adapted during the Bayesian optimization process. The resulting trajectory is further away from the first two obstacles. In addition, the speed is decreased in proximity of the first obstacle, as can be seen in Figure 19. Subsequent trajectory segments can now be performed at significantly increased speed, resulting in an average flight time reduction of over 20%, as shown in Figure 20.

Environment for the real-world flight experiments for Trajectory 9 (cf. Figure 18).

Initial and average optimized polytope trajectories, obtained over five random seeds in the hybrid environment using simulation and real-world flights. Shading indicates standard deviation.

Speed profiles and average relative time allocation of initial and optimized polytope trajectory, obtained over five random seeds in the hybrid environment using simulation and real-world flights. Shading indicates standard deviation.

Mean and standard deviation of relative trajectory time and smoothness for polytope trajectory, obtained over five random seeds in the hybrid environment using simulation and real-world flights.
5. Conclusion
We have presented an algorithm for the generation of dynamically feasible time-optimal quadrotor trajectories based on multi-fidelity Bayesian optimization. The vehicle feasibility constraints have been efficiently modeled by incorporating evaluations from analytical approximation, numerical simulation, and real-world experiments in multi-fidelity GPC. We have conducted extensive evaluations through simulation and real-life experiments for both waypoint-constrained and polytope-constrained trajectories. It has been found that the algorithm is able to generate feasible trajectories that are significantly faster than those obtained from existing minimum-snap trajectory optimization algorithms. This demonstrates the capability of our algorithm to efficiently model the feasibility boundary and ultimately plan faster trajectories by incorporating multi-fidelity data, including flight experiments.
The current work has several limitations that provide interesting avenues for future research. With regards to planning in environments with obstacles, the algorithm for decomposition of the free space into convex polytopes is limited to two-dimensional environments. Applicability may be extended to three-dimensional environments by utilizing three-dimensional convex decomposition algorithms studied in the field of computer graphics (Attene et al., 2008). Once the set of three-dimensional convex polyhedrons is obtained, the face graph can be generated as in Figure 2(d), and the remaining steps of the proposed algorithm can be applied without modification to optimize the three-dimensional trajectory.
Although our proposed algorithm generates a time-optimal trajectory for a prescribed sequence of polytope constraints, the resulting trajectory may still be sub-optimal due to the application of Dijkstra’s algorithm. As the algorithm does not account for the real flight-time for each polytope cell during construction of the constraints, there may exist an alternative sequence of polytope constraints that ultimately results in a quicker trajectory. We expect that this issue may be addressed by utilizing more precise heuristics for path planning on the face graph, or simply by selecting several sequences of polytope constraints and comparing the corresponding time-optimal trajectories.
The current algorithm is applied to a given set of fixed waypoints or obstacles, and thus needs to be trained from the beginning for different waypoint or obstacle positions. Analysis of the obtained feasibility constraint models may provide insights in general properties of time-optimal trajectories, which may inform the design of planning algorithms. In this respect, we note that the obtained time-optimal trajectories share similar properties, such as deceleration prior to turns in proximity of obstacles. Constructing a general feasibility model based on these shared properties may enable trajectory optimization subject to arbitrary waypoint and obstacle configurations without the need for additional training.
The proposed algorithm can be generalized to motion planning problems subject to general dynamics equations by modeling the corresponding feasibility boundary as a classification problem. To be specific, if there exists a finite-dimensional parameterization of trajectories, the proposed multi-fidelity Bayesian optimization framework can be applied to generate time-optimal trajectories. For instance, when generating robotic manipulator trajectories through minimum-jerk optimization, the time allocation over trajectory segments can be optimized with the proposed algorithm. In addition, the same framework may be applied to optimize trajectories subject to binary constraints other than dynamic feasibility. For example, when planning for a mobile robot with perception system, perception error constraints may be formulated as a binary classification and incorporated in multi-fidelity Bayesian optimization.
Footnotes
Appendix. Index to multimedia extensions
Archives of IJRR multimedia extensions published prior to 2014 can be found at http://www.ijrr.org, after 2014 all videos are available on the IJRR YouTube channel at http://www.youtube.com/user/ijrrmultimedia
Video of the experiments
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partly funded by the ONR (grant number N000141712670).
