The need for automatically tuning robot controllers is real. Substantial engineering effort is invested today, in both research and industry, to design sophisticated feedback control techniques that allow a robot to complete a task under given performance requirements. Examples include making a quadruped robot walk with a very specific gait [1] or balancing an inverted pole with a robot arm [ ].
On a practical level, the control design problem ultimately involves the user's choice of a set of parameters related directly or indirectly to the robot controller. This choice is usually critical for the robot to meet the performance requirements and therefore has to be made wisely.
Still, it is usually the case that the user must choose (or manually tune) multiple parameters. In those situations, wise choices become harder to make as the number of dimensions grows, leaving the user with little more than doubtful intuition.
Manual tuning has long been replaced by reinforcement learning (RL), a framework that allows the robot to learn a specific task (sometimes from scratch) by collecting rewards over repeated experiments [2], [3]. While RL is a promising framework for controller tuning, it often requires many robot experiments to find suitable controllers, which is undesirable, as the robot can deteriorate or wear out. Therefore, considerable research effort has been invested in the data efficiency of RL, aiming to learn controllers from as few experiments as possible.
Recently, Bayesian optimization (BO) has been proposed for RL as a promising approach in this direction. BO employs a probabilistic description of the performance function (typically a Gaussian process (GP)), which allows the parameters for the next iteration to be selected in a principled manner, e.g., by maximizing the expected improvement upon the best controller observed so far [4], or by maximizing the information gain [ ].
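As a concrete illustration, the sketch below runs a minimal BO loop on a one-dimensional toy objective: a GP surrogate (here via scikit-learn) is refit after every evaluation, and the next parameter is the maximizer of the expected improvement. The objective `cost` and all numerical values are placeholders standing in for an actual robot experiment.

```python
# Minimal sketch: Bayesian optimization with a GP surrogate and the
# expected-improvement (EI) acquisition, for one controller parameter.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def cost(theta):
    # Hypothetical performance measure of a controller with parameter theta
    # (in practice, one robot experiment per evaluation).
    return np.sin(3.0 * theta) + 0.5 * theta**2

rng = np.random.default_rng(0)
grid = np.linspace(-2.0, 2.0, 400).reshape(-1, 1)   # candidate parameters
X = rng.uniform(-2.0, 2.0, size=(3, 1))             # initial experiments
y = cost(X).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-4)
for _ in range(10):                                  # 10 BO iterations
    gp.fit(X, y)
    mu, sigma = gp.predict(grid, return_std=True)
    best = y.min()                                   # best cost so far
    # Expected improvement (for minimization) at every candidate.
    z = (best - mu) / np.maximum(sigma, 1e-9)
    ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
    x_next = grid[np.argmax(ei)].reshape(1, -1)      # most promising parameter
    X = np.vstack([X, x_next])
    y = np.append(y, cost(x_next).ravel())

print("best parameter:", X[y.argmin()], "best cost:", y.min())
```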
This research project focuses on studying how BO can help learn feedback controllers in fewer iterations. In [ ], a robotic arm learns from scratch the parameters of a linear quadratic regulator (LQR) to balance an inverted pole, when only a poor linearized model is available. As the Bayesian optimizer, we use entropy search (ES) [ ] and demonstrate its effectiveness in a real setting for the first time.
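To make the controller parameterization concrete, the following sketch computes an LQR feedback gain from a linearized model by solving the continuous-time Riccati equation; the parameters exposed to the optimizer are the (log-scaled) diagonal cost weights, while the gain follows from the model. The pendulum model and all numerical values are illustrative assumptions, not the actual setup of the project.

```python
# Sketch: LQR gain as a function of tunable cost weights, for an
# (assumed) linearized inverted pendulum around the upright equilibrium.
import numpy as np
from scipy.linalg import solve_continuous_are

# State x = [angle, angular velocity], input u = torque (illustrative values).
A = np.array([[0.0, 1.0],
              [9.81, 0.0]])
B = np.array([[0.0],
              [1.0]])

def lqr_gain(theta):
    # theta parameterizes the LQR cost weights; this mapping is what a
    # Bayesian optimizer can search over instead of tuning K directly.
    Q = np.diag(10.0 ** np.asarray(theta))  # positive weights via log-scaling
    R = np.eye(1)
    P = solve_continuous_are(A, B, Q, R)
    return np.linalg.solve(R, B.T @ P)      # K = R^{-1} B^T P

K = lqr_gain([1.0, 0.0])                     # e.g., Q = diag(10, 1)
print("LQR gain:", K)
# Sanity check: the closed-loop matrix A - B K should be stable.
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K))
```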
In [ ], we approach the tuning problem by including a system simulator (based on a good, but imperfect, model) as an additional data source alongside the real-world experiments. We model the trust we put in the simulator versus the trust we put in the real system. While real experiments are more trustworthy, they also come at a higher cost than simulations (e.g., longer waiting times until an experiment is done). An intelligent agent based on ES decides at each iteration which source is preferable, in order to reduce cost while learning faster at the same time.
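The sketch below conveys the flavor of this cost-aware source selection, with two loud caveats: the project's criterion is entropy search over a joint model of both sources, whereas here a single GP with a binary fidelity input and a simple variance-per-cost proxy stands in for the information-gain computation. All objective functions and cost values are hypothetical.

```python
# Simplified stand-in for cost-aware selection between simulator and robot.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

COST = {0: 1.0, 1: 10.0}   # relative cost: simulator cheap, real expensive

def evaluate(theta, source):
    # Hypothetical objective: the simulator (source 0) is a biased
    # version of the real system (source 1).
    bias = 0.3 if source == 0 else 0.0
    return np.sin(3.0 * theta) + 0.5 * theta**2 + bias

# Data columns: [theta, source flag]; start with two cheap simulations.
X = np.array([[-1.0, 0.0], [1.0, 0.0]])
y = np.array([evaluate(t, int(s)) for t, s in X])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=[0.5, 1.0]), alpha=1e-4)
thetas = np.linspace(-2.0, 2.0, 200)
for _ in range(8):
    gp.fit(X, y)
    # For each source, find the candidate with the largest predictive
    # uncertainty, then trade that off against the experiment cost.
    scores, picks = {}, {}
    for s in (0, 1):
        cand = np.column_stack([thetas, np.full_like(thetas, s)])
        _, sd = gp.predict(cand, return_std=True)
        picks[s] = thetas[sd.argmax()]
        scores[s] = sd.max() / COST[s]       # uncertainty per unit cost
    s = max(scores, key=scores.get)          # preferred source this iteration
    X = np.vstack([X, [picks[s], s]])
    y = np.append(y, evaluate(picks[s], s))

print("experiments on the real system:", int(X[:, 1].sum()), "of", len(X))
```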
While [ ] and [ ] use standard GPs to model the performance function, in [ ] we propose a GP specifically designed to learn LQR controllers from data. The structural information of the LQR stochastic optimization problem is exploited to construct a set of customized kernels, which we call LQR kernels. With them, we can learn optimal controllers from data more accurately than with standard choices (e.g., the squared exponential kernel).
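The snippet below is not the LQR kernel from the paper, but a hypothetical illustration of the general idea of baking structural knowledge into a kernel: controller parameters are compared through the LQR cost they induce on a nominal model, rather than through their raw Euclidean distance, so that weight choices with similar closed-loop behavior are deemed similar.

```python
# Illustrative structure-aware kernel (an assumption, not the LQR kernel
# from the paper): a squared-exponential kernel applied to the nominal
# LQR cost J(theta) = trace(P(theta)) instead of to theta itself.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [9.81, 0.0]])   # nominal model (assumption)
B = np.array([[0.0], [1.0]])

def nominal_cost(theta):
    # Feature map: expected LQR cost on the nominal model for a
    # zero-mean, unit-covariance random initial state.
    Q = np.diag(10.0 ** np.asarray(theta))
    P = solve_continuous_are(A, B, Q, np.eye(1))
    return np.trace(P)

def lqr_structured_kernel(theta1, theta2, lengthscale=1.0):
    # SE kernel on the cost features; composing SE with a feature map
    # preserves positive semidefiniteness.
    d = nominal_cost(theta1) - nominal_cost(theta2)
    return np.exp(-0.5 * (d / lengthscale) ** 2)

print(lqr_structured_kernel([1.0, 0.0], [0.9, 0.1]))
```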
[1] M. Neunert, F. Farshidian, A. Winkler, and J. Buchli, "Trajectory optimization through contacts and automatic gait discovery for quadrupeds," IEEE Robotics and Automation Letters, vol. 2, no. 3, pp. 1502–1509, 2017.
[2] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. MIT Press, 1998.
[3] J. Kober, J. A. Bagnell, and J. Peters, "Reinforcement learning in robotics: A survey," The International Journal of Robotics Research, vol. 32, no. 11, pp. 1238–1274, 2013.
[4] D. R. Jones, M. Schonlau, and W. J. Welch, "Efficient global optimization of expensive black-box functions," Journal of Global Optimization, vol. 13, no. 4, pp. 455–492, 1998.