Abstract
This paper presents a Robust Constrained Learning-based Nonlinear Model Predictive Control (RC-LB-NMPC) algorithm for path-tracking in off-road terrain. For mobile robots, constraints may represent solid obstacles or localization limits. As a result, constraint satisfaction is required for safety. Constraint satisfaction is typically guaranteed through the use of accurate, a priori models or robust control. However, accurate models are generally not available for off-road operation. Furthermore, robust controllers are often conservative, since model uncertainty is not updated online. In this work our goal is to use learning to generate low-uncertainty, non-parametric models in situ. Based on these models, the predictive controller computes both linear and angular velocities in real-time, such that the robot drives at or near its capabilities while respecting path and localization constraints. Localization for the controller is provided by an on-board, vision-based mapping and navigation system enabling operation in large-scale, off-road environments. The paper presents experimental results, including over 5 km of travel by a 900 kg skid-steered robot at speeds of up to 2.0 m/s. The result is a robust, learning controller that provides safe, conservative control during initial trials when model uncertainty is high and converges to high-performance, optimal control during later trials when model uncertainty is reduced with experience.
1. Introduction
Model Predictive Control (MPC) is a powerful algorithm capable of computing optimal inputs while seamlessly addressing state and input constraints. At each time-step, the controller solves a finite-horizon optimization problem based on the current state and a model of the system (Rawlings and Mayne, 2009). For mobile robots, state constraints may represent physical obstacles or localization limits, and, as a result, constraint satisfaction is tantamount to safety (Figure 1). However, system models used in practice are often uncertain and constraint satisfaction is difficult to guarantee in general. For example, operation in off-road terrain frequently leads to modeling errors due to surface materials (e.g. snow, sand, grass), terrain topography (e.g. side-slopes, inclines), and complex robot dynamics (Ostafew et al., 2015a).

State constraints for mobile robots frequently represent physical obstacles, thus constraint satisfaction is required for safety. We present a Robust Constrained Learning-based Nonlinear MPC algorithm to guarantee constraint satisfaction while improving performance through learning. The algorithm is tested on a 900 kg Clearpath Grizzly traveling up to 2.0 m/s on off-road paths with tight constraints.
Robust Constrained MPC (RC-MPC) is an active area of research and endeavors to provide guarantees on constraint satisfaction when considering uncertain systems (Mayne, 2014). At each time-step, RC-MPC aims to identify inputs such that all plausible predicted trajectories satisfy the given constraints considering a fixed estimate of model uncertainty. However, such approaches are generally conservative since the models are not updated online.
In this paper, we investigate a Robust Constrained Learning-based Nonlinear Model Predictive Control (RC-LB-NMPC) algorithm for a path-repeating mobile robot operating in challenging outdoor terrain. Learning-based algorithms alleviate the need for significant engineering work on identifying and modeling all effects that a model-based controller may be required to mitigate (Nguyen-Tuong and Peters, 2011; Schaal and Atkeson, 2010). Moreover, learning-based algorithms allow for the system to act in anticipation of disturbances.
The algorithm uses a fixed, simple robot model and a learned, nonparametric disturbance model. Disturbances represent measured discrepancies between the a priori model and observed system behavior. Essentially, the algorithm begins with a simple process model and high model uncertainty for the first trial and learns an accurate, low-uncertainty model over successive trials. Disturbances are modeled as a Gaussian process (GP) (Rasmussen, 2006) based on previous experience as a function of state, input, and other relevant variables. We use a Sigma-Point Transform (Julier and Uhlmann, 2004) to efficiently compute the mean and variance of predicted state sequences given the two-component, learned model. We then apply restricted constraints to the predicted sequence such that the 3

The predictive controller optimizes both the linear and angular velocities over a prediction horizon such that the 3
Localization for the controller is provided by an on-board, Visual Teach & Repeat (VT&R) mapping and navigation algorithm enabling operation in large-scale, off-road environments (Furgale and Barfoot, 2010; Stenning et al., 2013). In the first operational phase, the teach phase, the robot is manually piloted along the desired path. Localization in this phase is obtained relative to the robot’s starting position by Visual Odometry (VO), computing pose changes over sequential images based on extracted feature maps (three-dimensional positions and associated descriptors). At discrete points along the desired path, the algorithm stores the currently viewed feature map. During the repeat phase, the algorithm relocalizes against stored feature maps given the current robot view, generating feedback for a path-tracking controller. As long as a sufficient number of feature matches are made between the live robot view and the stored feature maps, the system generates consistent localization over trials and is able to support a learning control algorithm.
This work differs from traditional constrained NMPC algorithms in two main ways. Firstly, in traditional implementations, the process model is specified a priori and remains unchanged during operation. In this work, we augment the process model with a learned disturbance model in order to predict the mean and uncertainty of effects not captured by the fixed process model. Secondly, constrained NMPC approaches do not typically account for model uncertainty. In this work, we apply robust constraints in real-time considering the learned uncertainty. We provide robust constraint satisfaction when uncertainty is high and increased performance as uncertainty is reduced through learning. This paper is a major extension of previous work. We have previously investigated unconstrained LB-NMPC (Ostafew et al., 2015a) and unconstrained Min–Max LB-NMPC, where control inputs are optimized based on worst-case scenarios (Ostafew et al., 2015b). However, the Min–Max approach does not provide tuning knobs to directly specify tracking error constraints (the only tuning inputs are the MPC weights). Significant additions include robust constraints considering the learned uncertainty and presentation of the necessary nonlinear techniques used to solve the resulting optimization problem. The structure of this paper is as follows. Section 2 relates our work to other research in this field. Sections 3 and 4 describe the proposed RC-LB-NMPC algorithm and implementation details. Section 5 presents experimental results. Finally, Sections 6 and 7 present a discussion and conclusion.
2. Related work
This work is aimed at achieving robust constrained, high-performance path-tracking in spite of unknown disturbances. In the following, we provide a brief review of approaches involving constrained and unconstrained MPC (linear and nonlinear). Then we provide a background on learning control approaches.
2.1. Model Predictive Control
Model Predictive Control (MPC) has received a significant amount of attention in recent years due to its ability to handle complex linear and nonlinear systems with state and input constraints (Mayne, 2014; Rawlings and Mayne, 2009). It has been proposed for mobile robots (Howard et al., 2009; Klančar and Škrjanc, 2007; Krenn et al., 2013; Kühne et al., 2005; Peters and Iagnemma, 2008), unmanned aerial vehicles (Kumar and Michael, 2012; Mueller and D’Andrea, 2013), and humanoid robots (Erez et al., 2013; Gouaillier et al., 2010; Lafaye et al., 2014). However, in each of these examples, the proposed algorithms do not use learned models or account for model uncertainty when applying constraints. As a result, they either require high-accuracy models representing the system and environmental interactions, or risk violating the given constraints.
Tube MPC is a robust constrained algorithm that accounts for model uncertainty when considering constraints (Langson et al., 2004; Mayne et al., 2011). The algorithm applies restricted constraints to nominal predictions such that all possible state sequences based on the given uncertainty satisfy the constraints. Recently, González et al. (2011) and Farrokhsiar et al. (2013) apply Tube MPC to mobile robots. However, unlike their work, our robust constrained algorithm is based on a fixed a priori model and a learned, nonparametric disturbance model. This allows us to apply robust constraints while improving performance through learning. Aswani et al. (2013) propose Learning-based Tube-MPC with a bounded, learned model and use computationally expensive reachability analysis to compute restricted constraints. In this work, state sequences are predicted using an efficient Sigma-Point Transform, and restricted constraints are defined such that the predicted 3
2.2. Learning controllers
Unlike controllers based on fixed models, controllers using learned models gather data over time, incrementally constructing accurate approximations of the true system model. Learned models enable the controller to act in anticipation of disturbances. Learned models are commonly modeled as GPs, Locally Weighted Projection Regression, or Support Vector Regression (Sigaud et al., 2011). In this paper, we model disturbances as a GP based on input–output data from previous trials. This approach enables both model flexibility and consistent uncertainty estimates (Rasmussen, 2006). Modeling a process model as a GP for learning has been previously proposed. For example, Nguyen-Tuong et al. (2008) and Meier et al. (2014) model the inverse dynamics of manipulator arms as a GP based on previous input–output data. Unlike these examples, we model the mean and uncertainty of forward dynamics and propose an RC-NMPC algorithm that seamlessly incorporates the GP model, and robust state and input constraints. Iterative Learning Control (ILC) is another common approach to learning from experience for the control of high-performance robots (Hehn and D’Andrea, 2014; Moore, 2012; Ostafew et al., 2013; Schoellig et al., 2012). ILC constructs an acausal feedforward signal over sequential trials using error information from any previous trial. However, ILC-based learning controllers are only capable of incorporating state and input constraints when constructing the feedforward signal. On the other hand, our model-based RC-LB-NMPC algorithm uses the learned model at each time-step, enabling generalization from learned experiences and real-time, robust constraint satisfaction.
3. Mathematical formulation
At a given sample time, NMPC finds a sequence of control inputs that optimizes the plant behavior over a prediction horizon based on the current state. The first input in the optimal sequence is then applied to the system. The entire process is repeated at the next sample time for the new system state. In this section, we first present our overall robust constrained NMPC algorithm (Figure 3). We then present our methods of predicting uncertain state sequences and estimating disturbances.

The RC-LB-NMPC algorithm is composed of two parts: (i) the robust constrained, path-tracking NMPC algorithm based on an a priori process model, and (ii) the GP-based disturbance model. During the first trial, the RC-NMPC algorithm relies solely on the a priori process model to follow the desired path
3.1. Robust Constrained Nonlinear Model Predictive Control
Consider the following nonlinear, state-space system:
with observable system state
with disturbance dependency
As previously mentioned, the goal of NMPC is to find a set of controls that optimizes the predicted plant behavior over a given horizon. We define the cost function to be minimized over the next K time-steps as
where
We also define
where

Here we show a 1D example of the computation of robust constraints,
where
In addition, we consider
where
Considering the current estimated system state,
where the equality constraints are used to enforce the process model, and
where
where
solve for the value of
and compute an updated value for
where
3.2. Predicting uncertain trajectories
Since the state

Here we show the mean and lateral
where
where
This process is repeated K times, until the complete sequence
3.3. Gaussian process disturbance model
We model the disturbance
The resulting data pair,
In this work, we train a separate GP for each dimension in
with zero mean and kernel function,
where
where
Unlike our previous unconstrained LB-NMPC algorithm (Ostafew et al., 2015a), which used only the predicted mean of a disturbance, here we make use of both the predicted mean and variance. As previously mentioned (Section 3.2), the variance of the disturbance is used to predict state sequences with mean and uncertainty given a sequence of control inputs. Then, as detailed in Section 3.1, the predicted uncertainty is used to compute robust constraints in order to provide guarantees on constraint satisfaction. Finally, we include further detail on the storage and retrieval of observations for online operation in Section 4.2.
3.4. Gaussian process hyperparameter selection
Solving for the optimal hyperparameters,
4. Implementation
4.1. Robot model
In this paper, robots are modeled as unicycle-type vehicles (Figure 6) with state

Definition of the robot pose variables,
When the time between control signal updates is defined as
which represents a simple kinematic model for our robot; it does not account for dynamics or environmental disturbances. Instead, we depend on the learned model to account for these disturbances. As presented in previous work (Ostafew et al., 2015a), we enable the learned model (2) to represent a system with first-order dynamics by adding historic states and inputs to the disturbance dependency
where
In order to build and query the learned model,
Since
Since obstacle detection in unstructured terrain is still an open problem, lateral and heading tracking error limits are selected prior to experiments and a feasible, desired path is manually defined during the VT&R teach mode considering the given limits. Input constraints are then defined considering the maximum allowed linear and angular speeds, and the curvature of the desired path
and
4.2. Managing experiences
In order to ensure the RC-LB-NMPC algorithm is executed in constant computation time, our implementation requires the ability to use a subset of the observed experiences when computing a disturbance. In this work, we employ a local model based on a single sliding local model. As experiences are gained, they are stored in bins,
5. Experimental testing
5.1. Overview
We tested the RC-LB-NMPC algorithm on a 50 kg Clearpath Husky and a 900 kg Clearpath Grizzly in three different experiments with many different surface materials and topographies. This resulted in over 5 km of path-tracking by the RC-LB-NMPC algorithm. The first and second experiments (Sections 5.3 and 5.4) compared unconstrained LB-NMPC, constrained LB-NMPC (C-LB-NMPC), and the proposed RC-LB-NMPC algorithm. The three variants solved the same optimization problem (8) with the following two exceptions. Firstly, state constraints (5) were disabled for the LB-NMPC algorithm. Secondly, the C-LB-NMPC algorithm included only non-robust state constraints (i.e.,

Testing culminated in the third experiment where the path, shown here, was defined around the University of Toronto Institute for Aerospace Studies (UTIAS) campus. The 900-m-long path passed in between trees and structures and over vegetation, pavement, inclines, and side-slopes. (Imagery: Google)
In all experiments, the controller described in Section 3 was implemented and run in addition to the VT&R software on a Lenovo W530 laptop with an Intel 2.6 Ghz Core i7 processor with 16 GB of RAM. The camera in all tests was a Point Grey Bumblebee XB3 stereo camera. The resulting real-time localization and control signals were generated at approximately 10 Hz. As previously mentioned, hyperparameter selection is currently an offline process, taking up to five minutes for 5,000 accumulated experiences. Since GPS was not available, improvements due to the RC-LB-NMPC algorithm were quantified by the localization provided by the VT&R algorithm. The VT&R algorithm is based on Visual Odometry and provides localization with errors less than 4 cm/m when compared against GPS ground-truth (Stenning et al., 2013).
5.2. Tuning parameters
The performance of the system was adjusted using the NMPC weighting matrices
5.3. Experiment 1: Indoor algorithm comparison
The first experiment compared three algorithms (unconstrained LB-NMPC, C-LB-NMPC, and RC-LB-NMPC) over three trials using a 50 kg Clearpath Husky robot on an indoor, flat, concrete surface (Figure 9). As previously mentioned, the algorithms solve the same optimization problem (7) at every time-step, considering identical tuning parameters with the exception of constraints. Since the concrete did not develop tire ruts and the lighting did not change over the course of the experiment, these results provide a clear comparison of the algorithms.
We show the maximum tracking errors and travel times vs. trial in Figure 8. The unconstrained LB-NMPC algorithm results in the largest tracking errors in the first trial, decreasing in later trials through learning. By adding constraints, the C-LB-NMPC algorithm reduced the errors during the first trial, but was overconfident and failed to satisfy the lateral constraints. On the other hand, the RC-LB-NMPC algorithm resulted in constraint satisfaction throughout all trials but incurred increased travel time relative to the other algorithms when the model was uncertain (i.e., during the first trial). Figure 10 shows tracking errors and control inputs vs. distance along the path for all trials. The LB-NMPC and C-LB-NMPC algorithms showed lateral constraint violations at 7, 16, and 24 m along the path (i.e., the turns) in the first trial. The RC-LB-NMPC algorithm avoided such constraint violations in the first trial partly by lowering the speed to approximately 0.5 m/s throughout the path. This speed represents roughly half of the maximum speed possible by the Husky robot. The ability of the RC-LB-NMPC algorithm to optimally compute the linear and angular speeds of the robot in real time while robustly meeting constraints represents one of the advantages of this algorithm. In practice, control algorithms must be capable of reacting robustly to given state constraints in tandem with scheduled speeds. This experiment also shows how the RC-LB-NMPC algorithm produces optimal results similar to the unconstrained algorithm after the learned model has collected experience. In this way, the RC-LB-NMPC algorithm behaves conservatively when the learned model is uncertain, and optimally considering the given cost function when the model uncertainty has been reduced through learning.

The maximum tracking errors and travel times vs. trial for the three algorithms in experiment 1. The RC-LB-NMPC algorithm results in constraint satisfaction at the cost of increased travel time (i.e., slower speed) during the first two trials.

(Top) The desired path and (bottom) the 50 kg Clearpath Husky at the start of the experiment 1 path. The smooth concrete floor and constant lighting providing identical path conditions for all trials.

Tracking errors and commanded inputs vs. distance during the first, second, and third trials of experiment 1. The RC-LB-NMPC algorithm automatically reduces speed in order to meet lateral and heading constraints during the first trial. As model uncertainty is reduced through learning, the RC-LB-NMPC algorithm naturally increases speed and thus performance.
5.4. Experiment 2: Off-road algorithm comparison
The second experiment once again compared the three algorithms (LB-NMPC, C-LB-NMPC, and RC-LB-NMPC). However, in this experiment we tested with a 900 kg Clearpath Grizzly robot on a more challenging and realistic 900-m-long, off-road path (Figure 12) in the University of Toronto Institute for Aerospace Studies (UTIAS) MarsDome. In this experiment, the Grizzly was limited to 2.0 m/s considering path roughness. Since the path was on loose material and the Grizzly is quite heavy, significant ruts developed over the course of the experiment, resulting in evolving disturbances from trial to trial. In this situation, we rely on the learned model uncertainty to capture the effect of the evolving disturbances, and only the RC-LB-NMPC algorithm uses the learned uncertainty to guarantee constraint satisfaction. We show the maximum tracking errors and travel times vs. trial in Figure 11. Without state constraints the LB-NMPC algorithm results in the fastest travel time and also the highest lateral errors of the first trial. During later trials, the errors are reduced through learning, with no notable changes in travel time. By adding constraints, the C-LB-NMPC algorithm reduced the errors during the first trial but still failed to satisfy the lateral constraints. As with experiment 1, the RC-LB-NMPC algorithm resulted in constraint satisfaction throughout all trials and decreased travel time through learning. This confirmed our goal of providing guaranteed constraint satisfaction while improving performance through learning. Figure 13 shows tracking errors and control inputs vs. distance along the path for trials 1, 2, and 10. Without constraints, the LB-NMPC algorithm resulted in the highest speeds and errors. The constraints led the C-LB-NMPC algorithm to reduce speeds in general but the algorithm was overconfident and exceeded limits during each trial. Specifically, the turn at 65m is an example where significant ruts developed, causing wheel slip. Since the RC-LB-NMPC algorithm robustly incorporated learned model uncertainty, the robot successfully passed through this section during each trial. Figure 14 shows measured and predicted disturbances used by the RC-LB-NMPC algorithm. The most complex disturbances affected the angular state of the robot,

The maximum tracking errors and travel times vs. trial from experiment 2. Of the three algorithms, RC-LB-NMPC is the only one to respect constraints in all trials at the cost of higher travel time.

(Top) Lateral constraints outlining the experiment 2 path, along with the RC-LB-NMPC actual path and speed (trial 10). (Bottom) The 900 kg Clearpath Grizzly at the path start.

Tracking errors and commanded inputs vs. distance during the first, second, and tenth trials of experiment 2. The most challenging turn occurred at 60m along the path, where significant ruts developed in the loose gravel.

Modeled disturbances,
5.5. Experiment 3: Field test
Finally, the third experiment thoroughly tested the RC-LB-NMPC algorithm on a 900-m-long, off-road path repeated five times (Figures 7 and 15). The third path covered a wide range of surfaces, including grass, dirt, pavement, side slopes, and inclines while passing through trees, solid structures, and dense foliage. Once again, we tested with a 900 kg Clearpath Grizzly robot limited to 2.0 m/s considering the path roughness. We show that over the course of the experiment, the RC-LB-NMPC algorithm met constraints and improved performance over sequential trials (Figure 16). Moreover, the experiment shows that the algorithm as formulated was able to consistently deliver control input updates at 10 Hz despite the relatively large dataset gathered over the course of the five trials. Specifically, the learned model had accumulated over 25,000 experiences by the fifth trial, relying on the experience management scheme to extract up to 180 relevant experiences at any given time-step. Figure 17 shows tracking errors and control inputs vs. distance along the path for trials one, two, and five. The first trial is representative of a conservative, non-learning RC-NMPC algorithm: without experience, the predicted disturbance is zero at every time-step but with high uncertainty, capturing the wide range of possible disturbances. After just one trial, the learned model has collected enough experience to significantly reduce uncertainty and increase performance. Modeling disturbances as a GP enables efficient interpolation and extrapolation from data collected during previous trials. Finally, Figure 18 shows tracking error and velocity distributions from trials one and five. The results show that over the course of the experiment, the lateral error distribution widens but continues to fit within the limits. In practice, we do not expect the tracking errors to go to zero, since the RC-LB-NMPC algorithm is an optimal control algorithm, balancing tracking errors and control inputs subject to the constraints.

(Center) The experiment 3 desired path roughly overlaid on the UTIAS campus. (Left/Right) images showing path sections with constraint-satisfying terrain (green) and unsafe obstacles (red) highlighted. Without an obstacle detection algorithm, the desired path is manually taught with the required clearance and safe/unsafe areas shown are manually highlighted for illustration purposes. (Satellite Imagery: Google)

The maximum tracking errors and travel times vs. trial for experiment 3. Tracking errors met the given constraints while learning reduced the travel time by almost 50%.

Tracking errors and commanded inputs vs. distance during the first, second, and fifth trials of experiment 3. Results in the first trial mimic that of a non-learning RC-NMPC algorithm with high model uncertainty representing all possible disturbances and thus conservative inputs.

Error and velocity distributions for trials one and five of experiment 3. Learning resulted in a relatively wider distribution of lateral errors as the algorithm became more confident. In practice, the algorithm is seeking to balance tracking errors and control inputs subject to the given contraints. As a result, we do not expect that tracking errors will be driven to zero.
6. Discussion
This work combines concepts of RC-MPC and machine learning in a flexible way. Through extensive experimental results, we have shown practical operation considering a relatively large robot, challenging terrain, and paths requiring tight state constraints. With respect to future work, our modular approach provides the opportunity to investigate and improve both the underlying GP disturbance model and the RC-NMPC algorithm.
With respect to the learned model, Jordan and Mitchell (2015) present a survey of recent trends and prospects for machine learning. Specifically, they highlight the opportunity and need for further research into team-based learning. The goal is to identify effective methods of transferring data between robots such that any improvements found by one robot can be shared by all robots in a team. As opposed to our experiments demonstrating a single robot tracking a single path, one could imagine a large network of paths with many robots improving performance collectively and safely. Also, one of our key requirements for real-time operation is the ability to rapidly identify relevant experiences for our local disturbance model. In our work, experiences are selected based on the nearest path vertex and recent commanded velocity and old experiences are discarded in order to maintain a manageable database. However, further work could involve evaluating the sensitivity of the learned model to this selection process and proposing improvements. For example, we do not currently address the situation where disturbances change predictably over time, such as the arrival of snow or puddles. As such, our algorithm will likely be overconfident driving a snowy path for the first time if the algorithm had previously learned the path disturbances during the summer. Finally, convergence of the learned model is also an open question. In practice, convergence rates are complicated by the evolution of the environment due to the robot’s activity and daily variations. For example, repeating the same path caused ruts to form, which resulted in a change in the disturbances affecting the nominal process model. In such conditions, it is unclear when the algorithm will converge. In general, it is expected that improvements to the disturbance model will result in the largest improvements to overall algorithm performance and reductions in computational complexity.
With respect to the RC-NMPC algorithm, future work includes further integration of the state constraints with a real-time obstacle detection algorithm and investigations into the conservativeness of the algorithm. As previously mentioned, one key benefit of our algorithm is that disturbances and robust constraints are computed in real-time. As such, future work on incorporating new or dynamic obstacles in the optimization problem would be of use for robots exploring new environments or operating in dynamic environments. Secondly, conservativeness of robust algorithms is an ongoing topic for MPC research in general. As can be seen in Figures 10, 13, and 17, the tracking errors remain relatively far from the actual constraints. Reductions in conservativeness will allow for increases in the performance of practical robots.
7. Conclusion
In summary, this paper presents a Robust Constrained Learning-based Nonlinear Model Predictive Control (RC-LB-NMPC) algorithm for a path-repeating, mobile robot operating in challenging off-road terrain. The goal is to guarantee constraint satisfaction while increasing performance through learning. In previous work, we demonstrated unconstrained LB-NMPC, where tracking errors were reduced using real-world experience instead of pre-programing accurate analytical models (Ostafew et al., 2015a). This work represents a major extension where we use the learned model uncertainty to compute and apply robust state and input constraints.
The RC-LB-NMPC algorithm uses a fixed, simple robot model and a learned, nonparametric disturbance model. Disturbances represent measured discrepancies between the a priori model and observed system behavior. We use a Sigma-Point Transform to efficiently compute the mean and variance of predicted state sequences given the two-component, learned, stochastic model. Finally, we apply restricted constraints to the mean predicted sequence such that the 3
Footnotes
Acknowledgements
This research was funded by the Ontario Ministry of Research and Innovation’s Early Researcher Award Program, by the Natural Sciences and Engineering Research Council of Canada (NSERC) through the NSERC Canadian Field Robotics Network (NCFRN), and by Clearpath Robotics.
