Abstract
Robust model predictive control (RMPC) is an effective technology for controlling uncertain systems while robustly handling constraints, and its closed-loop performance relies heavily on the selection of the objective function. However, objective functions are typically chosen to be close to the real control objectives, even though an objective function that leads to less conservative constraints often provides better closed-loop performance. In this paper, we propose an automatic tuning framework for RMPC in iterative tasks. In particular, we parameterize RMPC and develop a Bayesian optimization (BO) method to tune it by solving a black-box optimization problem. We then introduce an efficient transfer learning framework within BO, which speeds up the search process and enhances the controller performance. The effectiveness of the proposed tuning framework is illustrated on numerical examples.
Introduction
Model predictive control (MPC) has attained remarkable success in recent decades because of its disturbance rejection (Draeger et al., 1995) and constraint handling (Morari and Lee, 1999) capabilities. It has been widely applied in various fields (Darby and Nikolaou, 2012), such as robotics, process control and reinforcement learning (Chua et al., 2018; Kordabad et al., 2021; Pfrommer et al., 2022). However, the performance of MPC controllers can be degraded by a number of factors, including an uncertain system model (Piga et al., 2019), a limited terminal set (Rosolia and Borrelli, 2017, 2018), or an inappropriate objective function (Marco et al., 2016). Most classic MPC controllers are designed manually and offline and remain largely unchanged during control, which makes these issues hard to resolve or compensate for. With increasing computational power and recent progress in machine learning, learning-based MPC methods, which exploit the data collected during operation to automatically design the MPC controllers online, have received growing research interest.
Performing iterative tasks in control systems has been extensively studied in the literature (Rosolia et al., 2022; Wabersich and Zeilinger, 2022; Zhang et al., 2021), where one task execution is usually called an “iteration.” Iterative learning control (ILC) is an effective strategy for iterative tasks that achieves high closed-loop performance by learning from previous iterations. At each iteration, the system starts from the same initial condition and the goal is to track a given reference trajectory. Recently, an ILC controller based on MPC, called iterative learning model predictive control (LMPC) (Rosolia and Borrelli, 2017, 2018), has been proposed that does not require a given reference. The LMPC policy recursively constructs a terminal safe set from historical trajectories while updating a terminal cost function. Moreover, LMPC guarantees recursive feasibility and monotonic performance improvement at each iteration for a linear time-invariant (LTI) system (Rosolia and Borrelli, 2017). LMPC has been successfully applied in an autonomous racing example (Rosolia et al., 2017a). A detailed introduction to LMPC can be found in Rosolia and Borrelli (2018).
In the case of systems with unknown uncertainties, deterministic MPC schemes lose all guarantees of constraint satisfaction. In contrast, robust model predictive control (RMPC) (Chisci et al., 2001; Mayne et al., 2005) is a powerful tool for handling uncertainties. A review of RMPC can be found in Saltik et al. (2018). Correspondingly, Rosolia et al. (2017b, 2022) proposed robust learning model predictive control (RLMPC), which robustly satisfies system constraints while iteratively improving the control performance. However, increasing the robustness of LMPC controllers usually degrades control performance. Because the performance degradation in RLMPC is induced by the conservatism of the constraints, an efficient remedy is to revisit the hyperparameter tuning process (Garriga and Soroush, 2010), which has been proven effective in maximizing the closed-loop performance on a true system (Piga et al., 2019).
Bayesian optimization (BO) has recently been growing in popularity as a self-tuning strategy for controllers. BO is a powerful technique for optimizing complicated black-box functions (Brochu et al., 2010) and has been widely used for hyperparameter tuning in machine learning algorithms (Snoek et al., 2015). Compared with other derivative-free optimization algorithms, such as particle swarm optimization and genetic algorithms, BO is considerably faster and more data-efficient (Lu et al., 2020). BO has been adopted in a few MPC tuning problems (Guzman et al., 2022; Lu et al., 2020; Piga et al., 2019; Sorourifar et al., 2021; Stróżecki et al., 2021) to attain higher overall closed-loop performance. A review of BO can be found in Shahriari et al. (2016). However, to the best of our knowledge, there is still no research on using BO for tuning the parameters in LMPC while robustly satisfying constraints.
In this work, we propose a data-driven controller tuning framework that is the first to consider robust constraint satisfaction when combining BO and MPC in iterative tasks. Given an LTI system with unknown uncertainties, our framework maximizes the integrated performance of RLMPC using BO while robustly satisfying all the constraints. Moreover, we propose an efficient scheme to better apply BO in iterative tasks. Considering the similarity of the datasets among the different iterations in LMPC, we introduce a transfer learning framework, which effectively extracts the information from previous iterations and speeds up the search process.
The remainder of this paper is organized as follows. In the “Problem statement” section, we present the primary form of the robust optimal control problem and discuss the necessity of using BO. Next, we provide the fundamental background of RLMPC and BO in the “Technical background” section. Then, we introduce our method in the “RLMPC tuning using BO” section, including proposing a parameterized RLMPC problem, combining BO with RLMPC and introducing a transfer learning framework for RLMPC. An illustrative example and an ablation study are provided in the “Numerical example” section. Finally, the “Conclusion” section concludes this paper.
Problem statement
Given an initial state
where
We consider that the state and input are subject to the constraints
where
We aim to optimize the performance of the following infinite horizon optimal control problem under unknown uncertainties
where
Nonetheless, the optimal control problem (equation (3)) is usually hard to solve directly due to the infinite control horizon and the unknown uncertainty. In this paper, we use RLMPC to solve the problem in a finite-horizon fashion and improve its performance using BO.
Technical background
This section presents the technical background that provides the basis of RLMPC and BO.
RLMPC
Control policy approximation
A common approach to solve problems with bounded uncertainties is to decouple the dynamics (equation (1)) into a nominal state
where
where
Since the real cost in equation (3) is difficult to optimize straightforwardly, we consider a nominal cost of the form
As a result, given the feedback policy (equation (4)), the nominal dynamics (equation (5)) and the error dynamics (equation (6)), the robust approximation of equation (3) can be formulated as
where
Subsequently, to turn problem (equation (7)) into a finite-horizon optimal control problem, we introduce the notions of a convex sampled safe set and a terminal cost in RLMPC. We mainly borrow the following notions from Rosolia et al. (2017b). A more detailed introduction to LMPC theory can be found in Rosolia and Borrelli (2017b).
Convex safe set
Consider an iterative task aiming to optimize the problem (equation (7)). At the
Then, we can introduce the definition of the sampled safe set
where
Considering that a convex terminal set will significantly reduce the computational burden, we then try to construct a convex safe set, which is the convex hull of
Terminal cost
The definition of the terminal cost is given based on the convex safe set
Then, we introduce the Barycentric function (Jones and Morari, 2010) as the terminal cost
where
Accordingly,
Finally, integrating the notions above, we present an RLMPC problem that solves the following finite-horizon optimal control problem
While problem (equation (14)) is an alternative form for solving the problem (equation (3)), an inappropriate selection of the objective function in equation (7) might give rise to suboptimal performance. Therefore, we propose a BO-based tuning method for performance improvement in the “RLMPC tuning using BO” section.
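To make the barycentric terminal cost described above more concrete, the following is a minimal sketch for a scalar state. It assumes the form commonly used in the LMPC literature, in which the terminal cost at a point is the smallest convex combination of stored cost-to-go values whose matching combination of stored states reproduces that point. The function name, the stored states and the stored costs are hypothetical and are not taken from this paper. For a scalar state, an optimal combination needs at most two stored points, so enumerating pairs is sufficient.

```python
import math

def barycentric_cost(x, states, costs):
    """Barycentric terminal cost for a scalar state (illustrative sketch).

    Returns the minimum of sum(lam_i * costs[i]) subject to
    sum(lam_i * states[i]) == x, lam_i >= 0 and sum(lam_i) == 1.
    In one dimension an optimal solution uses at most two stored points,
    so it is enough to enumerate pairs that bracket x.
    """
    best = math.inf  # stays infinite when x lies outside the convex safe set
    n = len(states)
    for i in range(n):
        for j in range(i, n):
            lo, hi = states[i], states[j]
            c_lo, c_hi = costs[i], costs[j]
            if lo > hi:
                lo, hi, c_lo, c_hi = hi, lo, c_hi, c_lo
            if not (lo <= x <= hi):
                continue
            if hi == lo:
                value = min(c_lo, c_hi)
            else:
                w = (x - lo) / (hi - lo)  # weight on the upper point
                value = (1.0 - w) * c_lo + w * c_hi
            best = min(best, value)
    return best

# Hypothetical stored states and their cost-to-go values.
stored_states = [0.0, 1.0, 2.0, 4.0]
stored_costs = [0.0, 3.0, 2.0, 8.0]

q_inside = barycentric_cost(1.0, stored_states, stored_costs)
q_outside = barycentric_cost(5.0, stored_states, stored_costs)
print(q_inside, q_outside)
```

In this toy data the stored point at state 1.0 has cost 3.0, yet the barycentric cost there is lower, because mixing the points at 0.0 and 2.0 is cheaper. A state outside the convex hull of the stored states receives an infinite cost, which mirrors the role of the convex safe set as a terminal constraint.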
BO
The principal idea of BO is to construct a surrogate model of the black-box objective function from a set of data. The fitted surrogate model provides a posterior distribution of the objective function, according to which an acquisition function is applied to determine where to sample next. Then, we add the newly sampled data to the dataset and update the surrogate model. This process terminates once the surrogate model converges to the optimum or the maximum number of BO trials is reached.
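The loop described above can be sketched in a few dozen lines. The following is an illustrative, dependency-free example and not the implementation used in this paper: it assumes a zero-mean GP with a squared-exponential kernel, a confidence-bound acquisition written in its lower-bound form because the response is minimized, a scalar tuning parameter on a fixed grid, and a hypothetical quadratic `toy_cost` standing in for the closed-loop cost. The kernel length scale, the jitter and `beta` are arbitrary choices.

```python
import math
from statistics import mean, pstdev

def rbf(a, b, length=0.3):
    """Squared-exponential kernel between two scalar parameters."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular factor L with A = L L^T."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def backward_solve(L, y):
    """Solve L^T x = y for lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(X, y, x_new, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = backward_solve(L, forward_solve(L, y))
    k_star = [rbf(a, x_new) for a in X]
    v = forward_solve(L, k_star)
    mu = sum(k * a for k, a in zip(k_star, alpha))
    var = max(rbf(x_new, x_new) - sum(vi * vi for vi in v), 0.0)
    return mu, math.sqrt(var)

def bayes_opt(cost, candidates, initial, n_trials=12, beta=2.0):
    """Minimize `cost` over `candidates` with a GP surrogate and a lower confidence bound."""
    X = list(initial)
    y = [cost(x) for x in X]
    for _ in range(n_trials):
        # Standardize the responses so the zero-mean, unit-variance prior is reasonable.
        mu_y, sd_y = mean(y), pstdev(y) or 1.0
        y_std = [(v - mu_y) / sd_y for v in y]

        def lcb(x):
            mu, sd = gp_posterior(X, y_std, x)
            return mu - beta * sd  # low mean (exploitation) or high uncertainty (exploration)

        x_next = min((c for c in candidates if c not in X), key=lcb)
        X.append(x_next)
        y.append(cost(x_next))  # evaluate the black box and grow the dataset
    best = min(range(len(X)), key=lambda i: y[i])
    return X[best], y[best]

def toy_cost(theta):
    """Hypothetical black-box response with its minimum at theta = 0.6."""
    return (theta - 0.6) ** 2

grid = [i / 100 for i in range(101)]
best_theta, best_cost = bayes_opt(toy_cost, grid, initial=[0.0, 1.0])
print(best_theta, best_cost)
```

The surrogate is refitted from scratch at every candidate for brevity; a practical implementation would factorize the kernel matrix once per trial and would typically also learn the kernel hyperparameters.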
Gaussian process
The Gaussian process (GP) (Williams and Rasmussen, 2006) model is the most commonly used surrogate model in BO. Given a mean function
where
where
Acquisition function
Based on the posterior functions (equations (15) and (16)), an acquisition function is then employed to search for the next sampled point. The acquisition function leads the sampled points to the optimum solution by exploration and exploitation. The acquisition functions can be classified as either using improvement-based criteria, such as probability of improvement (PI) (Kushner, 1964) and expected improvement (EI) (Mockus, 1998), or using confidence-based criteria, such as Gaussian process upper confidence bound (GP-UCB) (Srinivas et al., 2012). We choose GP-UCB as the acquisition function used in our work. At the
where
RLMPC tuning using BO
As mentioned in the “Control policy approximation” section, the optimal control problem (equation (3)) is reformulated into an RLMPC form by decoupling the dynamics and introducing an approximated nominal cost. However, performance degradation still exists due to the inappropriate objective. In this section, we propose a parameter tuning framework that automatically adjusts the quadratic objective function and the conservatism of the constraints to achieve performance improvement.
Parameterized RLMPC
In regard to the inappropriate objective, we choose the weight matrices of the cost function
Since
Meanwhile, the terminal cost
The feedback gain
where
Correspondingly, the error set (equation (8)) is parameterized as
where
We reformulate the problem (equation (14)) after parameterization as
Thus, at the
Generic BO for RLMPC
In this section, we introduce a fundamental framework that combines a generic BO with RLMPC for parameter tuning. The transfer learning method is introduced in the next section. A universal schematic that explains how BO works for RLMPC is shown in Figure 1.

Figure 1. Illustration of how the BO method can be combined with RLMPC tuning. During each iteration, RLMPC receives the recommended optimal parameters from BO and computes the closed-loop performance repeatedly. At the end of the iteration, RLMPC chooses the optimal parameters from the current iteration and updates its sampled safe set.
Considering the
Finally, we choose the point with the minimum response value as the optimal point
Efficient BO
A major drawback when using generic BO is that the sampled data from previous iterations are discarded, which results in a loss of data efficiency. In this section, we introduce efficient BO, a transfer learning method for BO that further improves the data efficiency and controller performance.
In the RLMPC problem (equation (14)), the only difference among different iterations is the scale of convex sampled safe sets. Intuitively, the datasets across the various iterations may have certain similar features, so that transfer learning can be utilized to obtain more information from the former iterations for the generic BO.
A simple and direct way to implement transfer learning is to retain the historical data from previous iterations and use them to construct the surrogate model for the current iteration. However, because the terminal sets vary, the same data point may have very different response values in different iterations, which has an adverse effect on transfer learning.
Nevertheless, different iterations may produce similar-looking functions of response values. We display a toy example in Figure 2 to verify and show that the overall shape of the response surface varies minimally between different iterations as the response surface obtains a similar level of improvement while updating the sampled safe set. Accordingly, it is suitable to introduce an efficient transfer learning framework proposed by Yogatama and Mann (2014), which fits a surrogate GP model using the normalized response values instead of the original ones. Specifically, at the
where
Therefore, efficient BO constructs the initial surrogate model using the data points with normalized response values from previous iterations instead of randomly sampling points like generic BO. Compared with generic BO, the initial surrogate model in efficient BO contains more useful data and is closer to the actual black-box function. As a result, efficient BO is more conducive to finding the optimal

Figure 2. A toy example of response surfaces and the variation in the overall response value between iterations. (a) and (b) The response surfaces of iterations 1 and 2. (c) The variation of the response value between these two iterations, that is, the response value of iteration 2 minus the response value of iteration 1. The change in response value is similar over most of the domain, which shows that the response surface is approximately translated and thus keeps a similar shape between the two iterations.
As for the specific settings of efficient BO, we set an upper bound on the size of the training set, because too much data slows down the GP fitting process and the earliest data points can degrade the overall performance. When choosing an optimal point, we only select from the data points sampled during the current iteration.
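The data handling described in this section can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' code: the normalization is taken to be a per-iteration standardization (subtract the mean, divide by the standard deviation) in the spirit of Yogatama and Mann (2014), the size cap is assumed to keep the most recent points, and all function names and numbers are hypothetical.

```python
from statistics import mean, pstdev

def normalize(responses):
    """Standardize the response values of one iteration (assumed normalization)."""
    mu = mean(responses)
    sigma = pstdev(responses) or 1.0  # guard against identical responses
    return [(r - mu) / sigma for r in responses]

def build_training_set(history, max_size):
    """Pool normalized data from previous iterations, capped at max_size points.

    history: list of iterations, oldest first; each iteration is a list of
             (parameter, response) pairs. max_size must be positive.
    The cap keeps the most recent points, so the earliest data drop out first.
    """
    pooled = []
    for trials in history:
        z = normalize([response for _, response in trials])
        pooled.extend((param, zi) for (param, _), zi in zip(trials, z))
    return pooled[-max_size:]

def best_of_current(current_trials):
    """Pick the optimal point only among points sampled in the current iteration."""
    return min(current_trials, key=lambda trial: trial[1])

# Hypothetical data: two previous iterations with two trials each.
history = [
    [(0.1, 900.0), (0.5, 800.0)],
    [(0.2, 850.0), (0.6, 750.0)],
]
initial_data = build_training_set(history, max_size=3)
current_trials = [(0.3, 790.0), (0.55, 760.0), (0.7, 775.0)]
print(initial_data)
print(best_of_current(current_trials))
```

Although the raw costs of the two iterations differ by 50, both map to the same normalized values, which is the property that lets data from different iterations share one surrogate model.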
Numerical example
In this section, we apply the proposed efficient BO for RLMPC to solve the following optimal control problem
We introduce two examples to illustrate the effectiveness of efficient BO. All of the experiments were carried out using Python 3.9.13. The optimization problems in the RLMPC are solved with the Ipopt solver, accessed through the CasADi 3.5.5 package. To account for the randomness in BO, we run each algorithm six times independently with different random seeds.
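The comparisons later in this section report the current best performance averaged over the independent runs. A minimal sketch of that bookkeeping is given below, assuming that "current best" means the running minimum of the cost within a run and that the spread is the standard error of the mean across runs. The cost values are hypothetical, and only two runs are used to keep the example short.

```python
import math
from itertools import accumulate
from statistics import mean, stdev

def running_best(costs):
    """Lowest cost observed so far at each iteration of a single run."""
    return list(accumulate(costs, min))

def summarize(runs):
    """Mean and standard error of the running best cost across independent runs."""
    best_curves = [running_best(run) for run in runs]
    per_iteration = list(zip(*best_curves))  # group the runs by iteration index
    means = [mean(values) for values in per_iteration]
    errors = [stdev(values) / math.sqrt(len(values)) for values in per_iteration]
    return means, errors

# Hypothetical costs from two runs with different random seeds.
runs = [
    [5.0, 4.0, 6.0, 3.0],
    [7.0, 2.0, 8.0, 1.0],
]
means, errors = summarize(runs)
print(means)
print(errors)
```

With six runs, as in the experiments, the same functions apply unchanged; only the length of `runs` differs.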
L2 regulator
First, we consider an example of L2 regulator, where the linear system is given as
The constraints of state and control input are given by
We implement RLMPC with a control horizon
For efficient BO, we set the number of trials
Each tuned parameter
The average of the current best performance for RLMPC and the efficient BO is shown in Table 1. The bold number in their respective columns denotes the iteration at which they reach the minimum. It can be seen that joined with the proposed efficient BO method, the performance of RLMPC receives a significant improvement from
Table 1. Comparison of the average current best performance between RLMPC and efficient BO.
The bold number in their respective columns denotes the iteration at which they reach the minimum.
Continuous stirred tank reactor
The other example we consider is the linearized continuous stirred tank reactor (CSTR) with an exothermic irreversible first-order reaction
The linearized CSTR system is given based on dos Reis de Souza et al. (2022), where
The state and control input are bounded as
In this example, the control horizon
The number of trials
We list the current best performance of RLMPC and the efficient BO in Table 2, and we have bolded the minimum values for the respective columns. It shows that efficient BO improves the performance of RLMPC from 955.62 to 759.06.
Table 2. Comparison of the average current best performance between RLMPC and efficient BO.
The bold number in their respective columns denotes the iteration at which they reach the minimum.
As we demonstrate in the “Parameterized RLMPC” section, under different selections of
Effectiveness analysis of efficient BO
Figure 3 compares the state trajectory of RLMPC with that of efficient BO. Figures 3(a) and (b) show the trajectories of the L2 regulator and the CSTR, respectively. The controller tuned by efficient BO clearly deviates from the RLMPC trajectory and finds a shorter path to the origin, which decreases the cost.

Figure 3. Comparison of the state trajectory between RLMPC (yellow lines) and efficient BO (purple dashed lines). The yellow and purple solid points indicate the nominal states of RLMPC and the efficient BO, respectively: (a) L2 regulator and (b) CSTR.
Furthermore, we plot the state and control input tubes to illustrate how efficient BO can outperform RLMPC, as the tubes intuitively reflect the conservatism of the constraints. The state tubes are shown in Figure 4, where Figure 4(a) shows the state tubes of the L2 regulator and Figure 4(b) those of the CSTR. The input tubes of the L2 regulator and the CSTR are shown in Figures 5 and 6, respectively. Figure 4 shows that efficient BO enlarges the state tubes, which results in more conservative state constraints. In contrast, Figures 5 and 6 indicate that the input tubes of efficient BO are clearly smaller than those of RLMPC (we only compare the first five input tubes), which leaves the controller a wider range of admissible nominal inputs. Together with the nominal state and control input trajectories, the tubes in Figures 4–6 show the range of possible real states and control inputs under disturbances. Therefore, for the real state and control input to satisfy the constraints, the larger the tubes, the smaller the selectable range of the nominal state and control input. For example, in Figure 6, the tube of the first control input of efficient BO is visibly smaller than that of RLMPC, so efficient BO can choose a larger nominal control input, and the same is true for the second to fifth nominal control inputs. Because it can choose larger actions, efficient BO obtains a shorter nominal state trajectory, as shown in Figure 3. Meanwhile, although the state tubes of efficient BO are enlarged and efficient BO can only select nominal states closer to the origin, the shortened nominal state trajectories still robustly guarantee the satisfaction of all constraints. In these two examples, efficient BO tightens the state constraints while loosening the input constraints. Although the larger input partially increases the cost, the shorter state trajectory reduces the cost by far more. As a result, the overall cost is significantly reduced, which explains the results in Tables 1 and 2.
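The relation between tube size and the admissible nominal range can be illustrated with a one-dimensional interval constraint. The sketch below assumes a symmetric tube of a given half-width around the nominal value, so the nominal constraint is the original interval shrunk by that half-width on each side. The bounds and tube sizes are hypothetical and are not the values from the two examples.

```python
def tightened_interval(lower, upper, tube_half_width):
    """Range left for the nominal variable once the tube is subtracted.

    If the real value may deviate from the nominal one by at most
    tube_half_width in either direction, the nominal value must stay at
    least that far away from both bounds.
    """
    lo = lower + tube_half_width
    hi = upper - tube_half_width
    if lo > hi:
        raise ValueError("tube is larger than the constraint set")
    return lo, hi

# Hypothetical input constraint u in [-2, 2] with two different tube sizes.
large_tube_range = tightened_interval(-2.0, 2.0, tube_half_width=0.5)
small_tube_range = tightened_interval(-2.0, 2.0, tube_half_width=0.25)
print(large_tube_range)  # narrower nominal range
print(small_tube_range)  # wider nominal range
```

A smaller input tube therefore admits larger nominal inputs, while a larger state tube restricts the nominal state to a smaller region, which is the trade-off discussed above.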

Figure 4. Comparison of the state tubes between RLMPC (yellow polytopes) and the efficient BO (purple polytopes). We only plot the first three state tubes of CSTR for clarity.

Figure 5. Comparison of the control input tubes between RLMPC (yellow error bar) and efficient BO (purple error bar) in L2 regulator. The length of the error bars indicates the size of the input tubes. The yellow and purple solid points indicate the nominal control inputs of RLMPC and efficient BO, respectively. The gray lines connecting these two subgraphs are used to clearly compare the length of the tubes.

Figure 6. Comparison of the control input tubes between RLMPC (yellow error bar) and efficient BO (purple error bar) in CSTR. The meaning of each line is the same as that in Figure 5.
Comparison between the efficient BO and the other baselines
We compare the proposed efficient BO with several baselines, which can be regarded as an ablation study. The baselines are as follows:
Generic BO: A combination of RLMPC and BO without transfer learning, proposed in the “Generic BO for RLMPC” section.
Unnormalized efficient BO: A transfer learning method that, unlike the efficient BO, constructs the GP surrogate model from the original response values rather than the normalized ones.
Unlimited efficient BO: A transfer learning method that, unlike the efficient BO, constructs the GP surrogate model without an upper bound on the size of the training set.
Figure 7 plots the average of the current best performance, with the standard error, for the efficient BO and the three baselines. The results for the L2 regulator are shown in Figure 7(a). After the fifteenth iteration, the efficient BO searches faster and with less variance. Compared with the generic BO, the efficient BO lowers the average cost from 785.44 to 781.50, a reported average performance improvement of 7.26%. Likewise, the efficient BO is reported to perform 4.49% better than the unnormalized efficient BO (784.00 to 781.50) and 5.49% better than the unlimited efficient BO (784.53 to 781.50), on average.

Figure 7. Average of the current lowest cost of the efficient BO, generic BO, unnormalized efficient BO and unlimited efficient BO. The shaded region represents ±1 standard deviation: (a) L2 regulator and (b) CSTR.
Similarly, the results for the CSTR in Figure 7(b) show that efficient BO outperforms generic BO by 6.63% on average, lowering the cost from 771.29 to 759.06. In addition, efficient BO achieves an improvement of 3.65% over unnormalized efficient BO (765.99 to 759.06). The improvement of efficient BO with respect to unlimited efficient BO is 10.54%, from 777.82 to 759.06.
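The percentages quoted for the CSTR are larger than the raw relative change in cost (for instance, 771.29 to 759.06 is a reduction of about 1.6%). One reading that reproduces the reported figures, offered here as an assumption rather than a stated definition, is that each gain is measured relative to the baseline's own cost reduction over plain RLMPC (955.62 in Table 2). The short check below uses only costs quoted in the text; the function name is hypothetical.

```python
def relative_gain(rlmpc_cost, baseline_cost, efficient_cost):
    """Extra cost reduction of efficient BO, in percent of the baseline's reduction.

    Assumed definition:
        100 * (baseline - efficient) / (rlmpc - baseline)
    """
    return 100.0 * (baseline_cost - efficient_cost) / (rlmpc_cost - baseline_cost)

RLMPC_COST = 955.62      # plain RLMPC on the CSTR, as quoted in the text
EFFICIENT_COST = 759.06  # efficient BO on the CSTR

baselines = {
    "generic BO": 771.29,
    "unnormalized efficient BO": 765.99,
    "unlimited efficient BO": 777.82,
}

gains = {
    name: relative_gain(RLMPC_COST, cost, EFFICIENT_COST)
    for name, cost in baselines.items()
}
for name, gain in gains.items():
    print(f"{name}: {gain:.2f}%")
```

Under this reading the first two values round to the reported 6.63% and 3.65%, and the third comes out within about 0.01 percentage points of the reported 10.54%, a gap that could plausibly stem from rounding of the averaged costs.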
As we have discussed in the “Effectiveness analysis of efficient BO” section, the lower the cost, the better
Conclusion
In this paper, we propose a framework that automatically tunes the parameters in RMPC to attain performance improvement for iterative tasks. In particular, we formulate the parameter tuning as a black-box optimization problem, where we use BO to find the optimal objective function and the optimal shape of RLMPC tubes. Moreover, regarding the connections among different iterations, we introduce a transfer learning framework for BO, the efficient BO, which has a higher searching speed and enhanced performance. We demonstrate the effectiveness of the proposed framework using two examples. Meanwhile, we compare the shape of tubes to more intuitively demonstrate how BO works on parameter tuning. Finally, we conduct an ablation study to validate the advantages of efficient BO.
The application of existing methods to high-dimensional systems is still limited, since the computational burden of GP increases rapidly with the dimension. Moreover, a high-dimensional GP needs far more data to construct an accurate surrogate model, which is usually challenging to obtain in practical settings. Future research will consider improved GP techniques, for example, dynamic sparse GPs (Hewing et al., 2020), which may relieve the above issues and allow efficient BO to be applied to a wider range of domains.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the National Key Research & Development Program of China (No. 2021YFC1809003).
