Publications | Probabilistic Numerics - Max Planck Institute for Intelligent Systems

86 results

2021

Robot Learning with Crash Constraints

Marco, A., Baumann, D., Khadiv, M., Hennig, P., Righetti, L., Trimpe, S.

IEEE Robotics and Automation Letters, 6(2):1439-1446, IEEE, February 2021 (article)

Abstract

In the past decade, numerous machine learning algorithms have been shown to successfully learn optimal policies to control real robotic systems. However, it is common to encounter failing behaviors as the learning loop progresses. Specifically, in robot applications where failing is undesired but not catastrophic, many algorithms struggle with leveraging data obtained from failures. This is usually caused by (i) the failed experiment ending prematurely, or (ii) the acquired data being scarce or corrupted. Both complicate the design of proper reward functions to penalize failures. In this paper, we propose a framework that addresses those issues. We consider failing behaviors as those that violate a constraint and address the problem of learning with crash constraints, where no data is obtained upon constraint violation. The no-data case is addressed by a novel GP model (GPCR) for the constraint that combines discrete events (failure/success) with continuous observations (only obtained upon success). We demonstrate the effectiveness of our framework on simulated benchmarks and on a real jumping quadruped, where the constraint threshold is unknown a priori. Experimental data is collected, by means of constrained Bayesian optimization, directly on the real robot. Our results outperform manual tuning and GPCR proves useful for estimating the constraint threshold.

link (url) DOI [BibTex]

2020

Three-dimensional models of core-collapse supernovae from low-mass progenitors with implications for Crab

Stockinger, G., Janka, H., Kresse, D., Melson, T., Ertl, T., Gabler, M., Gessner, A., Wongwathanarat, A., Tolstov, A., Leung, S., Nomoto, K., Heger, A.

Monthly Notices of the Royal Astronomical Society, 496(2):2039-2084, August 2020 (article)

DOI [BibTex]

Convergence rates of Gaussian ODE filters

Kersting, H., Sullivan, T. J., Hennig, P.

Statistics and Computing, 30(6):1791-1816, 2020 (article)

DOI [BibTex]

Analytical probabilistic modeling of dose-volume histograms

Wahl, N., Hennig, P., Wieser, H., Bangert, M.

Medical Physics, 47(10):5260-5273, 2020 (article)

DOI [BibTex]

2019

Limitations of the empirical Fisher approximation for natural gradient descent

Kunstner, F., Hennig, P., Balles, L.

Advances in Neural Information Processing Systems 32 (NeurIPS 2019), pages: 4158-4169, (Editors: H. Wallach and H. Larochelle and A. Beygelzimer and F. d’Alché-Buc and E. Fox and R. Garnett), Curran Associates, Inc., 33rd Annual Conference on Neural Information Processing Systems, December 2019 (conference)

link (url) [BibTex]

Convergence Guarantees for Adaptive Bayesian Quadrature Methods

Kanagawa, M., Hennig, P.

Advances in Neural Information Processing Systems 32 (NeurIPS 2019), pages: 6234-6245, (Editors: H. Wallach and H. Larochelle and A. Beygelzimer and F. d’Alché-Buc and E. Fox and R. Garnett), Curran Associates, Inc., 33rd Annual Conference on Neural Information Processing Systems, December 2019 (conference)

link (url) [BibTex]

Probabilistic Solutions To Ordinary Differential Equations As Non-Linear Bayesian Filtering: A New Perspective

Tronarp, F., Kersting, H., Särkkä, S., Hennig, P.

Statistics and Computing, 29(6):1297-1315, 2019 (article)

Abstract

We formulate probabilistic numerical approximations to solutions of ordinary differential equations (ODEs) as problems in Gaussian process (GP) regression with non-linear measurement functions. This is achieved by defining the measurement sequence to consist of the observations of the difference between the derivative of the GP and the vector field evaluated at the GP, which are all identically zero at the solution of the ODE. When the GP has a state-space representation, the problem can be reduced to a Bayesian state estimation problem and all widely-used approximations to the Bayesian filtering and smoothing problems become applicable. Furthermore, all previous GP-based ODE solvers, which were formulated in terms of generating synthetic measurements of the vector field, come out as specific approximations. We derive novel solvers, both Gaussian and non-Gaussian, from the Bayesian state estimation problem posed in this paper and compare them with other probabilistic solvers in illustrative experiments.

link (url) DOI Project Page [BibTex]
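The filtering view above can be made concrete with a small sketch. It is our own minimal illustration, not one of the solvers derived in the paper: it assumes a scalar autonomous ODE $\dot{x} = f(x)$, a once-integrated Wiener process prior on $(x, \dot{x})$ with a free diffusion scale `sigma2`, and a zeroth-order linearisation of the zero-valued measurement $\dot{x} - f(x)$ (the Jacobian of $f$ is ignored, often called EK0). The function name and parameters are hypothetical.

```python
import math

def ek0_ode_filter(f, y0, t_end, h, sigma2=1.0):
    """Hypothetical sketch: solve the scalar ODE y' = f(y) by Gaussian filtering.

    The state [y, y'] gets a once-integrated Wiener process prior with diffusion
    sigma2; every step conditions on the residual y' - f(y) being zero, with the
    Jacobian of f ignored (EK0). Returns the posterior mean and variance of y(t_end).
    """
    m = [y0, f(y0)]                  # exact initial value and slope
    P = [[0.0, 0.0], [0.0, 0.0]]     # no initial uncertainty
    for _ in range(round(t_end / h)):
        # Predict: m- = A m and P- = A P A^T + Q, with A = [[1, h], [0, 1]]
        # and Q = sigma2 * [[h^3/3, h^2/2], [h^2/2, h]].
        mp = [m[0] + h * m[1], m[1]]
        p00 = P[0][0] + 2 * h * P[0][1] + h * h * P[1][1] + sigma2 * h ** 3 / 3
        p01 = P[0][1] + h * P[1][1] + sigma2 * h ** 2 / 2
        p11 = P[1][1] + sigma2 * h
        # Update: observe z = y' - f(y) = 0 with H = [0, 1] and no measurement noise.
        r = mp[1] - f(mp[0])         # residual at the predicted mean
        gain = [p01 / p11, 1.0]      # K = P- H^T / (H P- H^T)
        m = [mp[0] - gain[0] * r, mp[1] - gain[1] * r]
        # P = P- - K S K^T: the slope becomes exact, so only Var[y] survives.
        P = [[p00 - p01 * p01 / p11, 0.0], [0.0, 0.0]]
    return m[0], P[0][0]

mean, var = ek0_ode_filter(lambda y: -y, y0=1.0, t_end=1.0, h=0.01)
print(mean, math.exp(-1.0), var)
```

For $y' = -y$, $y(0) = 1$ the returned mean should lie close to $e^{-1}$. The returned variance scales with `sigma2`, which this sketch does not calibrate, so it should not be read as a reliable error estimate.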

Active Multi-Information Source Bayesian Quadrature

Gessner, A., González, J., Mahsereci, M.

Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence (UAI 2019), pages: 712-721, (Editors: Adams, R. P. and Gogate, V.), UAI 2019, July 2019 (conference)

link (url) [BibTex]

DeepOBS: A Deep Learning Optimizer Benchmark Suite

Schneider, F., Balles, L., Hennig, P.

7th International Conference on Learning Representations (ICLR), May 2019 (conference)

link (url) [BibTex]

Active Probabilistic Inference on Matrices for Pre-Conditioning in Stochastic Optimization

de Roos, F., Hennig, P.

Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), 89, pages: 1448-1457, (Editors: Kamalika Chaudhuri and Masashi Sugiyama), PMLR, April 2019 (conference)

Abstract

Pre-conditioning is a well-known concept that can significantly improve the convergence of optimization algorithms. For noise-free problems, where good pre-conditioners are not known a priori, iterative linear algebra methods offer one way to efficiently construct them. For the stochastic optimization problems that dominate contemporary machine learning, however, this approach is not readily available. We propose an iterative algorithm inspired by classic iterative linear solvers that uses a probabilistic model to actively infer a pre-conditioner in situations where Hessian-projections can only be constructed with strong Gaussian noise. The algorithm is empirically demonstrated to efficiently construct effective pre-conditioners for stochastic gradient descent and its variants. Experiments on problems of comparably low dimensionality show improved convergence. In very high-dimensional problems, such as those encountered in deep learning, the pre-conditioner effectively becomes an automatic learning-rate adaptation scheme, which we also empirically show to work well.

PDF link (url) [BibTex]
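The claim that pre-conditioning can significantly improve convergence is easy to see on a toy problem. This sketch is not the paper's algorithm, which infers the pre-conditioner from noisy Hessian projections. It simply assumes the exact diagonal Hessian of an ill-conditioned quadratic is known and compares plain gradient descent with diagonally pre-conditioned gradient descent. All names are hypothetical.

```python
def gd_iterations(curvatures, x0, lr, precondition, tol=1e-6, max_iter=100_000):
    """Count gradient-descent steps on f(x) = 0.5 * sum(d_i * x_i**2) until max|x_i| < tol.

    With precondition=True every gradient entry is divided by its curvature d_i,
    i.e. the step is pre-conditioned with the (here exactly known) inverse diagonal Hessian.
    """
    x = list(x0)
    for step in range(max_iter):
        if max(abs(xi) for xi in x) < tol:
            return step
        grad = [d * xi for d, xi in zip(curvatures, x)]
        if precondition:
            grad = [g / d for g, d in zip(grad, curvatures)]
        x = [xi - lr * g for xi, g in zip(x, grad)]
    return max_iter

curv = [1.0, 100.0]  # condition number 100
plain = gd_iterations(curv, [1.0, 1.0], lr=0.01, precondition=False)  # lr = 1 / max curvature
precond = gd_iterations(curv, [1.0, 1.0], lr=1.0, precondition=True)
print(plain, precond)
```

Plain gradient descent must choose its step size for the stiffest direction, so the flat direction shrinks only by a factor 0.99 per step and needs well over a thousand steps. The pre-conditioned update equalises the curvatures and, in this idealised noise-free case, reaches the minimum in a single step.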

Fast and Robust Shortest Paths on Manifolds Learned from Data

Arvanitidis, G., Hauberg, S., Hennig, P., Schober, M.

Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), 89, pages: 1506-1515, (Editors: Kamalika Chaudhuri and Masashi Sugiyama), PMLR, April 2019 (conference)

PDF link (url) Project Page [BibTex]

On the positivity and magnitudes of Bayesian quadrature weights

Karvonen, T., Kanagawa, M., Särkkä, S.

Statistics and Computing, 29, pages: 1317-1333, 2019 (article)

DOI [BibTex]

Probabilistic solutions to ordinary differential equations as nonlinear Bayesian filtering: a new perspective

Tronarp, F., Kersting, H., Särkkä, S., Hennig, P.

Statistics and Computing, 29(6):1297-1315, 2019 (article)

DOI [BibTex]

Dense connectomic reconstruction in layer 4 of the somatosensory cortex

Motta, A., Berning, M., Boergens, K. M., Staffler, B., Beining, M., Loomba, S., Hennig, P., Wissler, H., Helmstaedter, M.

Science, 366(6469):eaay3134, 2019 (article)

DOI Project Page [BibTex]

Probabilistic Linear Solvers: A Unifying View

Bartels, S., Cockayne, J., Ipsen, I., Hennig, P.

Statistics and Computing, 29(6):1249-1263, 2019 (article)

link (url) DOI [BibTex]

2018

Kernel Recursive ABC: Point Estimation with Intractable Likelihood

Kajihara, T., Kanagawa, M., Yamazaki, K., Fukumizu, K.

Proceedings of the 35th International Conference on Machine Learning, pages: 2405-2414, PMLR, July 2018 (conference)

Abstract

We propose a novel approach to parameter estimation for simulator-based statistical models with intractable likelihood. Our proposed method involves recursive application of kernel ABC and kernel herding to the same observed data. We provide a theoretical explanation regarding why the approach works, showing (for the population setting) that, under a certain assumption, point estimates obtained with this method converge to the true parameter, as recursion proceeds. We have conducted a variety of numerical experiments, including parameter estimation for a real-world pedestrian flow simulator, and show that in most cases our method outperforms existing approaches.

Paper [BibTex]

Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients

Balles, L., Hennig, P.

Proceedings of the 35th International Conference on Machine Learning (ICML), 80, pages: 404-413, Proceedings of Machine Learning Research, (Editors: Jennifer Dy and Andreas Krause), PMLR, ICML, July 2018 (conference)

Abstract

The ADAM optimizer is exceedingly popular in the deep learning community. Often it works very well, sometimes it doesn't. Why? We interpret ADAM as a combination of two aspects: for each weight, the update direction is determined by the sign of stochastic gradients, whereas the update magnitude is determined by an estimate of their relative variance. We disentangle these two aspects and analyze them in isolation, gaining insight into the mechanisms underlying ADAM. This analysis also extends recent results on adverse effects of ADAM on generalization, isolating the sign aspect as the problematic one. Transferring the variance adaptation to SGD gives rise to a novel method, completing the practitioner's toolbox for problems where ADAM fails.

link (url) Project Page [BibTex]
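The sign/magnitude reading of ADAM can be checked with a few lines of arithmetic. This is an idealised sketch under our own assumptions: ADAM's moving averages are replaced by exact sample moments $m \approx \mathbb{E}[g]$ and $v \approx \mathbb{E}[g^2]$ for a single weight, and bias correction and the epsilon term are dropped. It is not the variance-adapted SGD method proposed in the paper. The function names are hypothetical.

```python
import math

def moments(grads):
    """Sample first and second moments of a list of scalar stochastic gradients."""
    n = len(grads)
    return sum(grads) / n, sum(g * g for g in grads) / n

def split_adam_step(m, v):
    """Idealised ADAM step m / sqrt(v), written as direction * magnitude."""
    direction = math.copysign(1.0, m)      # sign of the mean gradient
    magnitude = abs(m) / math.sqrt(v)      # always in [0, 1]
    return direction, magnitude

def magnitude_from_relative_variance(mu, var):
    """|mu| / sqrt(mu^2 + var) = 1 / sqrt(1 + var / mu^2): only the relative variance matters."""
    return 1.0 / math.sqrt(1.0 + var / mu ** 2)

m, v = moments([1.5, -0.5])                # m = 0.5, v = 1.25, so the variance is v - m^2 = 1.0
direction, magnitude = split_adam_step(m, v)
print(direction, magnitude, magnitude_from_relative_variance(m, v - m * m))
```

Because $v = \mu^2 + \sigma^2$ under these assumptions, the step length depends only on $\sigma^2/\mu^2$, while its direction is the sign of the mean gradient.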

Convergence Rates of Gaussian ODE Filters

Kersting, H., Sullivan, T. J., Hennig, P.

arXiv preprint 2018, arXiv:1807.09737 [math.NA], July 2018 (article)

Abstract

A recently-introduced class of probabilistic (uncertainty-aware) solvers for ordinary differential equations (ODEs) applies Gaussian (Kalman) filtering to initial value problems. These methods model the true solution $x$ and its first $q$ derivatives a priori as a Gauss-Markov process $\boldsymbol{X}$, which is then iteratively conditioned on information about $\dot{x}$. We prove worst-case local convergence rates of order $h^{q+1}$ for a wide range of versions of this Gaussian ODE filter, as well as global convergence rates of order $h^q$ in the case of $q=1$ and an integrated Brownian motion prior, and analyse how inaccurate information on $\dot{x}$ coming from approximate evaluations of $f$ affects these rates. Moreover, we present explicit formulas for the steady states and show that the posterior confidence intervals are well calibrated in all considered cases that exhibit global convergence, in the sense that they globally contract at the same rate as the truncation error.

link (url) Project Page [BibTex]
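For the case $q = 1$ singled out in this abstract, the integrated Brownian motion prior has a standard closed-form discretisation. Over a step of size $h$, with a diffusion scale $\sigma^2$ that we introduce here as a free parameter, the state $\boldsymbol{X} = (x, \dot{x})$ evolves as:

```latex
\boldsymbol{X}(t+h) \mid \boldsymbol{X}(t) \sim \mathcal{N}\!\big(A(h)\,\boldsymbol{X}(t),\; Q(h)\big),
\qquad
A(h) = \begin{pmatrix} 1 & h \\ 0 & 1 \end{pmatrix},
\qquad
Q(h) = \sigma^2 \begin{pmatrix} h^3/3 & h^2/2 \\ h^2/2 & h \end{pmatrix}.
```

In the simplest version of the filter, each step predicts with $A(h)$ and $Q(h)$ and then conditions the second component on the value of $f$ at the predicted first component. In this setting the abstract's rates read as a local error of order $h^{q+1} = h^2$ and a global error of order $h^q = h$.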

Counterfactual Mean Embedding: A Kernel Method for Nonparametric Causal Inference

Muandet, K., Kanagawa, M., Saengkyongam, S., Marukatat, S.

Workshop on Machine Learning for Causal Inference, Counterfactual Prediction, and Autonomous Action (CausalML) at ICML, July 2018 (conference)

[BibTex]

Gaussian Processes and Kernel Methods: A Review on Connections and Equivalences

Kanagawa, M., Hennig, P., Sejdinovic, D., Sriperumbudur, B. K.

arXiv e-prints, arXiv:1807.02582 [stat.ML], 2018 (article)

Abstract

This paper is an attempt to bridge the conceptual gaps between researchers working on the two widely used approaches based on positive definite kernels: Bayesian learning or inference using Gaussian processes on the one side, and frequentist kernel methods based on reproducing kernel Hilbert spaces on the other. It is widely known in machine learning that these two formalisms are closely related; for instance, the estimator of kernel ridge regression is identical to the posterior mean of Gaussian process regression. However, they have been studied and developed almost independently by two essentially separate communities, and this makes it difficult to seamlessly transfer results between them. Our aim is to overcome this potential difficulty. To this end, we review several old and new results and concepts from either side, and juxtapose algorithmic quantities from each framework to highlight close similarities. We also provide discussions on subtle philosophical and theoretical differences between the two approaches.

arXiv [BibTex]
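The equivalence quoted in this abstract, that the kernel ridge regression estimator equals the GP posterior mean, can be checked numerically. The sketch assumes kernel ridge regression minimises $\frac{1}{n}\sum_i (y_i - f(x_i))^2 + \lambda \|f\|^2$, in which case the two coincide when the GP noise variance is $\sigma^2 = n\lambda$. To make the comparison non-trivial, KRR is solved in an explicit two-dimensional feature space (features $[1, x]$, so $k(x, x') = 1 + x x'$, an illustrative choice), while the GP mean is computed from the $n \times n$ kernel matrix. All function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        for i in range(c + 1, n):
            factor = M[i][c] / M[c][c]
            for j in range(c, n + 1):
                M[i][j] -= factor * M[c][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def features(x):
    """Illustrative explicit feature map phi(x) = [1, x]."""
    return [1.0, x]

def kernel(x, xp):
    """k(x, x') = <phi(x), phi(x')> = 1 + x * x'."""
    return sum(u * v for u, v in zip(features(x), features(xp)))

def gp_posterior_mean(X, y, x_star, noise_var):
    """GP regression mean k_*^T (K + noise_var I)^{-1} y, computed with the n x n kernel matrix."""
    n = len(X)
    K = [[kernel(X[i], X[j]) + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(K, y)
    return sum(kernel(x_star, xi) * ai for xi, ai in zip(X, alpha))

def krr_primal(X, y, x_star, lam):
    """Minimise (1/n) sum_i (y_i - w.phi(x_i))^2 + lam ||w||^2 in feature space,
    i.e. solve (Phi^T Phi + n lam I) w = Phi^T y, then predict w.phi(x_star)."""
    n = len(X)
    Phi = [features(xi) for xi in X]
    d = len(Phi[0])
    A = [[sum(p[a] * p[b] for p in Phi) + (n * lam if a == b else 0.0) for b in range(d)]
         for a in range(d)]
    rhs = [sum(p[a] * yi for p, yi in zip(Phi, y)) for a in range(d)]
    w = solve(A, rhs)
    return sum(wa * fa for wa, fa in zip(w, features(x_star)))

X, y, lam = [0.0, 1.0, 2.0, 3.0], [0.1, 0.9, 2.2, 2.8], 0.05
print(gp_posterior_mean(X, y, 1.5, noise_var=len(X) * lam), krr_primal(X, y, 1.5, lam))
```

The two printed numbers should agree up to rounding. Their agreement follows from the identity $\Phi^\top(\Phi\Phi^\top + n\lambda I)^{-1} = (\Phi^\top\Phi + n\lambda I)^{-1}\Phi^\top$.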

Counterfactual Mean Embedding: A Kernel Method for Nonparametric Causal Inference

Muandet, K., Kanagawa, M., Saengkyongam, S., Marukatat, S.

arXiv e-prints, arXiv:1805.08845v1 [stat.ML], 2018 (article)

Abstract

This paper introduces a novel Hilbert space representation of a counterfactual distribution, called the counterfactual mean embedding (CME), with applications in nonparametric causal inference. Counterfactual prediction has become a ubiquitous tool in machine learning applications, such as online advertisement, recommendation systems, and medical diagnosis, whose performance relies on certain interventions. To infer the outcomes of such interventions, we propose to embed the associated counterfactual distribution into a reproducing kernel Hilbert space (RKHS) endowed with a positive definite kernel. Under appropriate assumptions, the CME allows us to perform causal inference over the entire landscape of the counterfactual distribution. The CME can be estimated consistently from observational data without requiring any parametric assumption about the underlying distributions. We also derive a rate of convergence which depends on the smoothness of the conditional mean and the Radon-Nikodym derivative of the underlying marginal distributions. Our framework can deal not only with real-valued outcomes, but potentially also with more complex and structured outcomes such as images, sequences, and graphs. Lastly, our experimental results on off-policy evaluation tasks demonstrate the advantages of the proposed estimator.

arXiv [BibTex]

Model-based Kernel Sum Rule: Kernel Bayesian Inference with Probabilistic Models

Nishiyama, Y., Kanagawa, M., Gretton, A., Fukumizu, K.

arXiv e-prints, arXiv:1409.5178v2 [stat.ML], 2018 (article)

Abstract

Kernel Bayesian inference is a powerful nonparametric approach to performing Bayesian inference in reproducing kernel Hilbert spaces or feature spaces. In this approach, kernel means are estimated instead of probability distributions, and these estimates can be used for subsequent probabilistic operations (as for inference in graphical models) or in computing the expectations of smooth functions, for instance. Various algorithms for kernel Bayesian inference have been obtained by combining basic rules such as the kernel sum rule (KSR), kernel chain rule, kernel product rule and kernel Bayes' rule. However, the current framework only deals with fully nonparametric inference (i.e., all conditional relations are learned nonparametrically), and it does not allow for flexible combinations of nonparametric and parametric inference, which are practically important. Our contribution is in providing a novel technique to realize such combinations. We introduce a new KSR referred to as the model-based KSR (Mb-KSR), which employs the sum rule in feature spaces under a parametric setting. Incorporating the Mb-KSR into existing kernel Bayesian framework provides a richer framework for hybrid (nonparametric and parametric) kernel Bayesian inference. As a practical application, we propose a novel filtering algorithm for state space models based on the Mb-KSR, which combines the nonparametric learning of an observation process using kernel mean embedding and the additive Gaussian noise model for a state transition process. While we focus on additive Gaussian noise models in this study, the idea can be extended to other noise models, such as the Cauchy and alpha-stable noise models.

arXiv [BibTex]
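Part of what makes a model-based sum rule tractable with additive Gaussian noise is a closed-form Gaussian integral. The sketch below is our own illustration under stated assumptions, not the Mb-KSR estimator from the paper. It assumes a one-dimensional state, a Gaussian kernel with bandwidth `ell`, and a linear transition $x' = a x + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, s^2)$. It computes the expected kernel $\mathbb{E}_{x' \sim \mathcal{N}(\mu, s^2)}[k(x', z)] = \frac{\ell}{\sqrt{\ell^2 + s^2}} \exp\!\big(-\frac{(\mu - z)^2}{2(\ell^2 + s^2)}\big)$, checks it against numerical quadrature, and pushes a weighted-sample kernel mean through the transition. All names are hypothetical.

```python
import math

def gauss_kernel(x, z, ell):
    """Unnormalised Gaussian kernel with bandwidth ell."""
    return math.exp(-(x - z) ** 2 / (2 * ell ** 2))

def expected_kernel_closed_form(z, mu, s2, ell):
    """E_{x' ~ N(mu, s2)}[k(x', z)]: a Gaussian in z with widened bandwidth ell^2 + s2."""
    t = ell ** 2 + s2
    return ell / math.sqrt(t) * math.exp(-(mu - z) ** 2 / (2 * t))

def expected_kernel_numeric(z, mu, s2, ell, n=20001, width=12.0):
    """Trapezoidal rule on mu +/- width standard deviations, as a check of the closed form."""
    sd = math.sqrt(s2)
    a, b = mu - width * sd, mu + width * sd
    h = (b - a) / (n - 1)
    total = 0.0
    for i in range(n):
        x = a + i * h
        w = 0.5 if i in (0, n - 1) else 1.0
        pdf = math.exp(-(x - mu) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)
        total += w * gauss_kernel(x, z, ell) * pdf
    return total * h

def predictive_embedding(weights, points, a, s2, ell):
    """Push the kernel mean sum_i w_i k(., x_i) through x' = a x + N(0, s2) noise;
    returns the resulting function of the evaluation point z."""
    return lambda z: sum(w * expected_kernel_closed_form(z, a * x, s2, ell)
                         for w, x in zip(weights, points))

print(expected_kernel_closed_form(0.7, 0.2, 0.3, 0.5), expected_kernel_numeric(0.7, 0.2, 0.3, 0.5))
emb = predictive_embedding([0.5, 0.5], [0.0, 1.0], a=0.9, s2=0.3, ell=0.5)
print(emb(0.4))
```

Because the result is again a sum of Gaussian functions, no sampling of the transition is needed. This is the kind of shortcut a parametric (Gaussian-noise) transition model offers over learning that transition nonparametrically.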

A probabilistic model for the numerical solution of initial value problems

Schober, M., Särkkä, S., Hennig, P.

Statistics and Computing, 29(1):99–122, 2018 (article)

Abstract

We study connections between ordinary differential equation (ODE) solvers and probabilistic regression methods in statistics. We provide a new view of probabilistic ODE solvers as active inference agents operating on stochastic differential equation models that estimate the unknown initial value problem (IVP) solution from approximate observations of the solution derivative, as provided by the ODE dynamics. Adding to this picture, we show that several multistep methods of Nordsieck form can be recast as Kalman filtering on q-times integrated Wiener processes. Doing so provides a family of IVP solvers that return a Gaussian posterior measure, rather than a point estimate. We show that some such methods have low computational overhead, nontrivial convergence order, and that the posterior has a calibrated concentration rate. Additionally, we suggest a step size adaptation algorithm which completes the proposed method to a practically useful implementation, which we experimentally evaluate using a representative set of standard codes in the DETEST benchmark set.

PDF Code DOI Project Page [BibTex]

Probabilistic Approaches to Stochastic Optimization

Mahsereci, M.

Eberhard Karls Universität Tübingen, Germany, 2018 (phdthesis)

link (url) Project Page [BibTex]

Analytical incorporation of fractionation effects in probabilistic treatment planning for intensity-modulated proton therapy

Wahl, N., Hennig, P., Wieser, H., Bangert, M.

Medical Physics, 45(4):1317-1328, 2018 (article)

DOI [BibTex]

Large sample analysis of the median heuristic

Garreau, D., Jitkrittum, W., Kanagawa, M.

2018 (misc) In preparation

arXiv [BibTex]

Probabilistic Ordinary Differential Equation Solvers — Theory and Applications

Schober, M.

Eberhard Karls Universität Tübingen, Germany, 2018 (phdthesis)

[BibTex]

2017

On the Design of LQR Kernels for Efficient Controller Learning

Marco, A., Hennig, P., Schaal, S., Trimpe, S.

Proceedings of the 56th IEEE Annual Conference on Decision and Control (CDC), pages: 5193-5200, IEEE, IEEE Conference on Decision and Control, December 2017 (conference)

Abstract

Finding optimal feedback controllers for nonlinear dynamic systems from data is hard. Recently, Bayesian optimization (BO) has been proposed as a powerful framework for direct controller tuning from experimental trials. For selecting the next query point and finding the global optimum, BO relies on a probabilistic description of the latent objective function, typically a Gaussian process (GP). As is shown herein, GPs with a common kernel choice can, however, lead to poor learning outcomes on standard quadratic control problems. For a first-order system, we construct two kernels that specifically leverage the structure of the well-known Linear Quadratic Regulator (LQR), yet retain the flexibility of Bayesian nonparametric learning. Simulations of uncertain linear and nonlinear systems demonstrate that the LQR kernels yield superior learning performance.

arXiv PDF On the Design of LQR Kernels for Efficient Controller Learning - CDC presentation DOI Project Page Project Page [BibTex]
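The quadratic structure these kernels exploit is already visible in the scalar case. For illustration we assume a discrete-time first-order system $x_{k+1} = a x_k + b u_k$ with static feedback $u_k = -K x_k$ and cost $\sum_k (q x_k^2 + r u_k^2)$; the paper's exact setting may differ. For stabilising gains, the cost as a function of the gain is $J(K) = (q + r K^2)\, x_0^2 / (1 - (a - bK)^2)$, and its minimum equals $P x_0^2$, where $P$ solves the scalar discrete Riccati equation. This is the kind of objective that BO would model; the sketch does not reproduce the LQR kernels themselves. Function names are hypothetical.

```python
import math

def lqr_cost(K, a, b, q, r, x0=1.0):
    """Infinite-horizon cost of u = -K x for x' = a x + b u; inf if the closed loop is unstable."""
    rho = a - b * K                      # closed-loop pole
    if abs(rho) >= 1.0:
        return math.inf
    # Geometric series: x_k = rho^k x0, stage cost (q + r K^2) x_k^2
    return (q + r * K * K) * x0 * x0 / (1.0 - rho * rho)

def riccati_gain(a, b, q, r, iters=1000):
    """Fixed-point iteration of the scalar discrete algebraic Riccati equation.
    Returns the optimal gain K* and the value-function coefficient P."""
    P = q
    for _ in range(iters):
        P = q + a * a * P - (a * b * P) ** 2 / (r + b * b * P)
    return a * b * P / (r + b * b * P), P

a, b, q, r = 1.2, 1.0, 1.0, 1.0          # open-loop unstable plant
K_opt, P = riccati_gain(a, b, q, r)
print(K_opt, P, lqr_cost(K_opt, a, b, q, r))
```

The printed cost at `K_opt` should match `P`, and any nearby gain costs more. The cost also blows up as the gain approaches the stability boundary, which is one reason a generic stationary kernel can fit such objectives poorly.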

Probabilistic Line Searches for Stochastic Optimization

Mahsereci, M., Hennig, P.

Journal of Machine Learning Research, 18(119):1-59, November 2017 (article)

link (url) Project Page [BibTex]

Coupling Adaptive Batch Sizes with Learning Rates

Balles, L., Romero, J., Hennig, P.

In Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence (UAI), pages: ID 141, (Editors: Gal Elidan, Kristian Kersting, and Alexander T. Ihler), August 2017 (inproceedings)

Abstract

Mini-batch stochastic gradient descent and variants thereof have become standard for large-scale empirical risk minimization like the training of neural networks. These methods are usually used with a constant batch size chosen by simple empirical inspection. The batch size significantly influences the behavior of the stochastic optimization algorithm, though, since it determines the variance of the gradient estimates. This variance also changes over the optimization process; when using a constant batch size, stability and convergence are thus often enforced by means of a (manually tuned) decreasing learning rate schedule. We propose a practical method for dynamic batch size adaptation. It estimates the variance of the stochastic gradients and adapts the batch size to decrease the variance proportionally to the value of the objective function, removing the need for the aforementioned learning rate decrease. In contrast to recent related work, our algorithm couples the batch size to the learning rate, directly reflecting the known relationship between the two. On three image classification benchmarks, our batch size adaptation yields faster optimization convergence, while simultaneously simplifying learning rate tuning. A TensorFlow implementation is available.

Code link (url) Project Page [BibTex]
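A batch-size rule in the spirit of this abstract can be sketched as follows. We assume, without having checked it against the paper, a rule of the form $B = \alpha\, \mathrm{tr}(\Sigma) / L$, with learning rate $\alpha$, per-example gradient covariance $\Sigma$ and current loss $L$. The mini-batch gradient variance $\mathrm{tr}(\Sigma)/B$ then equals $L/\alpha$: it is proportional to the objective and tied to the learning rate, as the abstract describes. The clipping bounds and function names are hypothetical.

```python
import math

def gradient_variance_trace(per_example_grads):
    """Unbiased estimate of tr(Cov[g]) from per-example gradients (B lists of length D)."""
    n = len(per_example_grads)
    d = len(per_example_grads[0])
    mean = [sum(g[j] for g in per_example_grads) / n for j in range(d)]
    return sum(sum((g[j] - mean[j]) ** 2 for g in per_example_grads) / (n - 1)
               for j in range(d))

def adaptive_batch_size(per_example_grads, loss, lr, b_min=16, b_max=4096):
    """Hypothetical rule B = lr * tr(Cov[g]) / loss, rounded up and clipped to [b_min, b_max]."""
    b = lr * gradient_variance_trace(per_example_grads) / max(loss, 1e-12)
    return int(min(max(math.ceil(b), b_min), b_max))

G = [[1.0, 0.0], [3.0, 0.0], [1.0, 2.0], [3.0, 2.0]]   # toy per-example gradients
print(adaptive_batch_size(G, loss=0.001, lr=0.1), adaptive_batch_size(G, loss=10.0, lr=0.1))
```

As the loss falls, the suggested batch size grows, which replaces a hand-tuned learning-rate decay with a growing batch.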

Dynamic Time-of-Flight

Schober, M., Adam, A., Yair, O., Mazor, S., Nowozin, S.

Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2017, pages: 170-179, IEEE, Piscataway, NJ, USA, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017 (conference)

DOI [BibTex]

Virtual vs. Real: Trading Off Simulations and Physical Experiments in Reinforcement Learning with Bayesian Optimization

Marco, A., Berkenkamp, F., Hennig, P., Schoellig, A. P., Krause, A., Schaal, S., Trimpe, S.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages: 1557-1563, IEEE, Piscataway, NJ, USA, May 2017 (inproceedings)

PDF arXiv ICRA 2017 Spotlight presentation Virtual vs. Real - Video explanation DOI Project Page Project Page [BibTex]

Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets

Klein, A., Falkner, S., Bartels, S., Hennig, P., Hutter, F.

Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS), 54, pages: 528-536, Proceedings of Machine Learning Research, (Editors: Singh, Aarti and Zhu, Jerry), PMLR, April 2017 (conference)

pdf link (url) Project Page [BibTex]

Early Stopping Without a Validation Set

Mahsereci, M., Balles, L., Lassner, C., Hennig, P.

arXiv preprint arXiv:1703.09580, 2017 (article)

Abstract

Early stopping is a widely used technique to prevent poor generalization performance when training an over-expressive model by means of gradient-based optimization. To find a good point to halt the optimizer, a common practice is to split the dataset into a training and a smaller validation set to obtain an ongoing estimate of the generalization performance. In this paper we propose a novel early stopping criterion which is based on fast-to-compute, local statistics of the computed gradients and entirely removes the need for a held-out validation set. Our experiments show that this is a viable approach in the setting of least-squares and logistic regression as well as neural networks.

link (url) Project Page [BibTex]
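
As an illustration of how a stopping rule can be built from local gradient statistics alone, here is a minimal sketch. The rule (stop once the mini-batch mean gradient is, averaged over dimensions, no larger than its estimated standard error) is our reading of the idea described above and the function name `should_stop` is hypothetical; treat the exact form as an assumption rather than the paper's definitive criterion.

```python
from statistics import fmean


def should_stop(per_example_grads, eps=1e-12):
    """Gradient-statistics stopping test (illustrative sketch, assumed form).

    per_example_grads: list of |S| >= 2 per-example gradient vectors, each a
    list of D floats, taken from one mini-batch S at the current weights.
    Returns True when the mini-batch mean gradient is, averaged over the D
    dimensions, no larger than its estimated standard error, i.e. when the
    observed gradient is indistinguishable from sampling noise.
    """
    batch = len(per_example_grads)
    dim = len(per_example_grads[0])
    snr_sum = 0.0
    for k in range(dim):
        comps = [g[k] for g in per_example_grads]
        mean_k = fmean(comps)
        # Unbiased per-example variance of gradient component k.
        var_k = sum((c - mean_k) ** 2 for c in comps) / (batch - 1)
        # Squared mean divided by the variance of the mean (var_k / batch).
        snr_sum += mean_k ** 2 / max(var_k / batch, eps)
    return 1.0 - snr_sum / dim > 0.0
```

In use, one would compute per-example gradients on each training mini-batch and halt the optimizer the first time `should_stop` returns True; no held-out data enters the decision.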

Krylov Subspace Recycling for Fast Iterative Least-Squares in Machine Learning

Roos, F. D., Hennig, P.

arXiv preprint arXiv:1706.00241, 2017 (article)

Abstract

Solving symmetric positive definite linear problems is a fundamental computational task in machine learning. The exact solution, famously, is cubically expensive in the size of the matrix. To alleviate this problem, several linear-time approximations, such as spectral and inducing-point methods, have been suggested and are now in wide use. These are low-rank approximations that choose the low-rank space a priori and do not refine it over time. While this allows linear cost in the data-set size, it also causes a finite, uncorrected approximation error. Researchers in numerical linear algebra have explored ways to iteratively refine such low-rank approximations, at the cost of a small number of matrix-vector multiplications. This idea is particularly interesting in the many situations in machine learning where one has to solve a sequence of related symmetric positive definite linear problems. From the machine learning perspective, such deflation methods can be interpreted as transfer learning of a low-rank approximation across a time series of numerical tasks. We study the use of such methods for our field. Our empirical results show that, on regression and classification problems of intermediate size, this approach can interpolate between low computational cost and numerical precision.

link (url) Project Page [BibTex]
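
To make the recycling idea concrete, here is a minimal sketch, assuming a small dense symmetric positive definite system stored as nested lists: plain conjugate gradients (CG), warm-started from the Galerkin projection of the solution onto a single "recycled" direction `w` kept from an earlier, related solve. This is only the initialization step of deflation-style methods, not the full deflated CG studied in the paper; all helper names are hypothetical.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def projected_start(A, b, w):
    """Best approximation to A^{-1} b within span{w}, measured in the A-norm."""
    c = dot(w, b) / dot(w, matvec(A, w))
    return [c * wi for wi in w]


def cg(A, b, x0, tol=1e-10, maxiter=1000):
    """Conjugate gradients for SPD A; returns (solution, iterations used)."""
    x = list(x0)
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
    p = list(r)
    rs = dot(r, r)
    iters = 0
    while rs ** 0.5 > tol and iters < maxiter:
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
        iters += 1
    return x, iters
```

A single reused direction removes that component of the error before the first iteration; deflation methods generalize this to a k-dimensional subspace and also keep later search directions A-orthogonal to it.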

Convergence Analysis of Deterministic Kernel-Based Quadrature Rules in Misspecified Settings

Kanagawa, M., Sriperumbudur, B. K., Fukumizu, K.

arXiv e-prints, arXiv:1709.00147v1 [math.NA], 2017 (article)

Abstract

This paper presents a convergence analysis of kernel-based quadrature rules in misspecified settings, focusing on deterministic quadrature in Sobolev spaces. In particular, we deal with misspecified settings where a test integrand is less smooth than the Sobolev RKHS on which the quadrature rule is based. We provide convergence guarantees based on two different assumptions on a quadrature rule: one on quadrature weights, and the other on design points. More precisely, we show that convergence rates can be derived (i) if the sum of absolute weights remains constant (or does not increase quickly), or (ii) if the minimum distance between design points does not decrease very quickly. As a consequence of the latter result, we derive a rate of convergence for Bayesian quadrature in misspecified settings. We reveal a condition on design points to make Bayesian quadrature robust to misspecification, and show that, under this condition, it may adaptively achieve the optimal rate of convergence in the Sobolev space of a lesser order (i.e., of the unknown smoothness of a test integrand), under a slightly stronger regularity condition on the integrand.

arXiv [BibTex]
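
For readers less familiar with the objects in this analysis, the sketch below computes Bayesian-quadrature weights $w = K^{-1} z$ on [0, 1] with the uniform measure, plus the sum of absolute weights that condition (i) refers to. The choice of the exponential (Matérn-1/2) kernel, whose RKHS on an interval is norm-equivalent to a first-order Sobolev space, as well as the lengthscale, node layout, and helper names, are illustrative assumptions.

```python
import math


def exp_kernel(x, y, ell):
    """Exponential (Matern-1/2) kernel."""
    return math.exp(-abs(x - y) / ell)


def kernel_mean(x, ell):
    """z(x) = integral over [0, 1] of exp(-|t - x| / ell) dt (uniform measure)."""
    return ell * (2.0 - math.exp(-x / ell) - math.exp(-(1.0 - x) / ell))


def solve(M, v):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(v)
    A = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (A[i][n] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x


def bq_weights(nodes, ell):
    """Bayesian-quadrature weights w = K^{-1} z for the uniform measure on [0, 1]."""
    K = [[exp_kernel(a, b, ell) for b in nodes] for a in nodes]
    z = [kernel_mean(a, ell) for a in nodes]
    return solve(K, z)


nodes = [i / 10 for i in range(11)]
w = bq_weights(nodes, ell=0.5)
estimate = sum(wi * x ** 2 for wi, x in zip(w, nodes))  # true integral is 1/3
abs_sum = sum(abs(wi) for wi in w)                      # quantity in condition (i)
```

Tracking `abs_sum` as nodes are added is one way to check condition (i) empirically for a given design.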

Fast Bayesian hyperparameter optimization on large datasets

Klein, A., Falkner, S., Bartels, S., Hennig, P., Hutter, F.

Electronic Journal of Statistics, 11, 2017 (article)

[BibTex]

Nonparametric Disturbance Correction and Nonlinear Dual Control

Klenske, E. D.

(24098), ETH Zurich, 2017 (phdthesis)

DOI [BibTex]

Probabilistic Active Learning of Functions in Structural Causal Models

Rubenstein, P. K., Tolstikhin, I., Hennig, P., Schölkopf, B.

2017 (misc)

Arxiv [BibTex]

New Directions for Learning with Kernels and Gaussian Processes (Dagstuhl Seminar 16481)

Gretton, A., Hennig, P., Rasmussen, C., Schölkopf, B.

Dagstuhl Reports, 6(11):142-167, 2017 (article)

DOI [BibTex]

Efficiency of analytical and sampling-based uncertainty propagation in intensity-modulated proton therapy

Wahl, N., Hennig, P., Wieser, H. P., Bangert, M.

Physics in Medicine & Biology, 62(14):5790-5807, 2017 (article)

Abstract

The sensitivity of intensity-modulated proton therapy (IMPT) treatment plans to uncertainties can be quantified and mitigated with robust/min-max and stochastic/probabilistic treatment analysis and optimization techniques. Those methods usually rely on sparse random, importance, or worst-case sampling. Inevitably, this imposes a trade-off between computational speed and accuracy of the uncertainty propagation. Here, we investigate analytical probabilistic modeling (APM) as an alternative for uncertainty propagation and minimization in IMPT that does not rely on scenario sampling. APM propagates probability distributions over range and setup uncertainties via a Gaussian pencil-beam approximation into moments of the probability distributions over the resulting dose in closed form. It supports arbitrary correlation models and allows for efficient incorporation of fractionation effects regarding random and systematic errors. We evaluate the trade-off between run-time and accuracy of APM uncertainty computations on three patient datasets. Results are compared against reference computations using importance and random sampling. Two approximation techniques to accelerate uncertainty propagation and minimization based on probabilistic treatment plan optimization are presented. Runtimes are measured on CPU and GPU platforms; dosimetric accuracy is quantified in comparison to a sampling-based benchmark (5000 random samples). APM accurately propagates range and setup uncertainties into dose uncertainties at competitive run-times (GPU $\leq 5$ min). The resulting standard deviation (expectation value) of dose shows average global $\gamma_{3\%/3\,\mathrm{mm}}$ pass rates between 94.2% and 99.9% (98.4% and 100.0%).
All investigated importance sampling strategies provided less accuracy at higher run-times considering only a single fraction. Considering fractionation, APM uncertainty propagation and treatment plan optimization was proven to be possible at constant time complexity, while run-times of sampling-based computations are linear in the number of fractions. Using sum sampling within APM, uncertainty propagation can only be accelerated at the cost of reduced accuracy in variance calculations. For probabilistic plan optimization, we were able to approximate the necessary pre-computations within seconds, yielding treatment plans of similar quality as gained from exact uncertainty propagation. APM is suited to enhance the trade-off between speed and accuracy in uncertainty propagation and probabilistic treatment plan optimization, especially in the context of fractionation. This brings fully-fledged APM computations within reach of clinical application.

link (url) [BibTex]
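
The core of closed-form propagation can be illustrated in one dimension under simplifying assumptions that are ours, not the paper's: a single unit-peak Gaussian lateral dose profile of width `sigma` and a Gaussian setup error on its center with standard deviation `s`. The expectation of one Gaussian under another is again a Gaussian, so the expected dose is available analytically, while a sampling-based estimate only approximates it. Function names are hypothetical.

```python
import math
import random


def dose(x, center, sigma):
    """Unit-peak Gaussian lateral profile of a single pencil beam (1-D toy model)."""
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


def expected_dose_analytic(x, mean_center, sigma, s):
    """E[dose] when center ~ N(mean_center, s^2): the Gaussians combine in closed form."""
    var = sigma ** 2 + s ** 2
    return sigma / math.sqrt(var) * math.exp(-((x - mean_center) ** 2) / (2.0 * var))


def expected_dose_sampled(x, mean_center, sigma, s, n, seed=0):
    """Monte Carlo estimate of the same expectation from n sampled setup errors."""
    rng = random.Random(seed)
    total = sum(dose(x, rng.gauss(mean_center, s), sigma) for _ in range(n))
    return total / n


analytic = expected_dose_analytic(x=2.0, mean_center=0.0, sigma=3.0, s=2.0)
sampled = expected_dose_sampled(x=2.0, mean_center=0.0, sigma=3.0, s=2.0, n=100_000)
```

The analytic value costs a single evaluation, whereas the Monte Carlo error shrinks only as $1/\sqrt{n}$; this is the speed/accuracy trade-off that the abstract quantifies for full treatment plans.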

Analytical probabilistic modeling of RBE-weighted dose for ion therapy

Wieser, H., Hennig, P., Wahl, N., Bangert, M.

Physics in Medicine and Biology (PMB), 62(23):8959-8982, 2017 (article)

link (url) [BibTex]

2016

Approximate dual control maintaining the value of information with an application to building control

Klenske, E. D., Hennig, P., Schölkopf, B., Zeilinger, M. N.

In European Control Conference (ECC), pages: 800-806, June 2016 (inproceedings)

PDF DOI [BibTex]

Active Uncertainty Calibration in Bayesian ODE Solvers

Kersting, H., Hennig, P.

Proceedings of the 32nd Conference on Uncertainty in Artificial Intelligence (UAI), pages: 309-318, (Editors: Ihler, Alexander T. and Janzing, Dominik), June 2016 (conference)

Abstract

There is resurging interest, in statistics and machine learning, in solvers for ordinary differential equations (ODEs) that return probability measures instead of point estimates. Recently, Conrad et al. introduced a sampling-based class of methods that are 'well-calibrated' in a specific sense. But the computational cost of these methods is significantly above that of classic methods. On the other hand, Schober et al. pointed out a precise connection between classic Runge-Kutta ODE solvers and Gaussian filters, which gives only a rough probabilistic calibration, but at negligible cost overhead. By formulating the solution of ODEs as approximate inference in linear Gaussian SDEs, we investigate a range of probabilistic ODE solvers that bridge the trade-off between computational cost and probabilistic calibration, and identify the inaccurate gradient measurement as the crucial source of uncertainty. We propose the novel filtering-based method Bayesian Quadrature filtering (BQF), which uses Bayesian quadrature to actively learn the imprecision in the gradient measurement by collecting multiple gradient evaluations.

link (url) [BibTex]
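
As background for the filtering view in this abstract, the sketch below is a minimal Gaussian-filter ODE solver for a scalar problem $y' = f(y)$: a once-integrated Wiener process prior on $(y, y')$ and a Kalman update that treats $f$ evaluated at the predicted mean as a noise-free observation of $y'$. This is the cheap filtering baseline, not the paper's BQF method; the diffusion `q`, the step size, and the function name are assumptions.

```python
import math


def filter_ode_solve(f, y0, t_end, h, q=1.0):
    """Gaussian-filter ODE solver with an integrated-Wiener-process prior.

    State is (y, y'); returns the posterior mean and variance of y at t_end.
    """
    m = [y0, f(y0)]                       # exact initial value and slope
    P = [[0.0, 0.0], [0.0, 0.0]]          # no initial uncertainty
    for _ in range(round(t_end / h)):
        # Predict: m- = A m, P- = A P A^T + Q with A = [[1, h], [0, 1]].
        mp = [m[0] + h * m[1], m[1]]
        a00 = P[0][0] + h * (P[1][0] + P[0][1]) + h * h * P[1][1]
        a01 = P[0][1] + h * P[1][1]
        a11 = P[1][1]
        Pp = [[a00 + q * h ** 3 / 3, a01 + q * h ** 2 / 2],
              [a01 + q * h ** 2 / 2, a11 + q * h]]
        # Update: observe y' = f(predicted y) with H = [0, 1] and zero noise.
        S = Pp[1][1]
        K = [Pp[0][1] / S, Pp[1][1] / S]
        resid = f(mp[0]) - mp[1]
        m = [mp[0] + K[0] * resid, mp[1] + K[1] * resid]
        P = [[Pp[0][0] - K[0] * K[0] * S, Pp[0][1] - K[0] * K[1] * S],
             [Pp[1][0] - K[1] * K[0] * S, Pp[1][1] - K[1] * K[1] * S]]
    return m[0], P[0][0]


approx, var = filter_ode_solve(lambda y: -y, y0=1.0, t_end=1.0, h=0.01)
exact = math.exp(-1.0)
```

In this sketch the posterior variance of $y$ grows by $qh^3/12$ per step whatever $f$ is, which illustrates why such filters are described as only roughly calibrated.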

Automatic LQR Tuning Based on Gaussian Process Global Optimization

Marco, A., Hennig, P., Bohg, J., Schaal, S., Trimpe, S.

In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), pages: 270-277, IEEE, IEEE International Conference on Robotics and Automation, May 2016 (inproceedings)

Abstract

This paper proposes an automatic controller tuning framework based on linear optimal control combined with Bayesian optimization. With this framework, an initial set of controller gains is automatically improved according to a pre-defined performance objective evaluated from experimental data. The underlying Bayesian optimization algorithm is Entropy Search, which represents the latent objective as a Gaussian process and constructs an explicit belief over the location of the objective minimum. This is used to maximize the information gain from each experimental evaluation. Thus, this framework should yield improved controllers with fewer evaluations compared to alternative approaches. A seven-degree-of-freedom robot arm balancing an inverted pole is used as the experimental demonstrator. Results on two- and four-dimensional tuning problems highlight the method’s potential for automatic controller tuning on robotic platforms.

Video - Automatic LQR Tuning Based on Gaussian Process Global Optimization - ICRA 2016 Video - Automatic Controller Tuning on a Two-legged Robot PDF DOI Project Page Project Page [BibTex]
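
A toy version of the tuning loop described above, under assumptions of our own: a scalar discrete-time plant $x_{k+1} = a x_k + b u_k$, an LQR gain computed from a candidate state weight $q$ (input weight fixed at $r = 1$), and a rollout cost on a slightly different "true" plant standing in for the physical experiment. For brevity a coarse grid search replaces Entropy Search, so this shows only the structure of the loop, not the paper's optimizer; all names and numbers are hypothetical.

```python
def lqr_gain(a, b, q, r, iters=500):
    """Scalar discrete-time LQR gain via fixed-point iteration of the Riccati equation."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def rollout_cost(k, a_true, b_true, x0=1.0, steps=50):
    """Quadratic cost of u = -k x on the 'true' plant (stand-in for an experiment)."""
    x, cost = x0, 0.0
    for _ in range(steps):
        u = -k * x
        cost += x * x + 0.1 * u * u   # performance objective, distinct from LQR weights
        x = a_true * x + b_true * u
    return cost


# Nominal model used for design vs. plant used for evaluation (both assumed).
a_model, b_model = 1.0, 1.0
a_true, b_true = 1.05, 0.8
candidates = [10 ** (e / 4) for e in range(-8, 9)]  # q from 0.01 to 100
best_q = min(candidates,
             key=lambda q: rollout_cost(lqr_gain(a_model, b_model, q, 1.0), a_true, b_true))
best_gain = lqr_gain(a_model, b_model, best_q, 1.0)
```

Replacing the fixed grid with an acquisition rule that decides where to evaluate next, based on a Gaussian-process belief over the cost, is where a Bayesian optimizer such as Entropy Search would enter.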

Batch Bayesian Optimization via Local Penalization

González, J., Dai, Z., Hennig, P., Lawrence, N.

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS), 51, pages: 648-657, JMLR Workshop and Conference Proceedings, (Editors: Gretton, A. and Robert, C. C.), May 2016 (conference)

link (url) Project Page [BibTex]

Probabilistic Approximate Least-Squares

Bartels, S., Hennig, P.

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS), 51, pages: 676-684, JMLR Workshop and Conference Proceedings, (Editors: Gretton, A. and Robert, C. C.), May 2016 (conference)

Abstract

Least-squares and kernel-ridge / Gaussian process regression are among the foundational algorithms of statistics and machine learning. Famously, the worst-case cost of exact nonparametric regression grows cubically with the data-set size; but a growing number of approximations have been developed that estimate good solutions at lower cost. These algorithms typically return point estimators, without measures of uncertainty. Leveraging recent results casting elementary linear algebra operations as probabilistic inference, we propose a new approximate method for nonparametric least-squares that affords a probabilistic uncertainty estimate over the error between the approximate and exact least-squares solution (this is not the same as the posterior variance of the associated Gaussian process regressor). This allows estimating the error of the least-squares solution on a subset of the data relative to the full-data solution. The uncertainty can be used to control the computational effort invested in the approximation. Our algorithm has linear cost in the data-set size, and a simple formal form, so that it can be implemented with a few lines of code in programming languages with linear algebra functionality.

link (url) [BibTex]

Gaussian Process-Based Predictive Control for Periodic Error Correction

Klenske, E. D., Zeilinger, M., Schölkopf, B., Hennig, P.

IEEE Transactions on Control Systems Technology, 24(1):110-121, 2016 (article)

PDF DOI [BibTex]

Dual Control for Approximate Bayesian Reinforcement Learning

Klenske, E. D., Hennig, P.

Journal of Machine Learning Research, 17(127):1-30, 2016 (article)

PDF link (url) [BibTex]

2015

Automatic LQR Tuning Based on Gaussian Process Optimization: Early Experimental Results

Marco, A., Hennig, P., Bohg, J., Schaal, S., Trimpe, S.

Machine Learning in Planning and Control of Robot Motion Workshop at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), October 2015 (conference)

Abstract

This paper proposes an automatic controller tuning framework based on linear optimal control combined with Bayesian optimization. With this framework, an initial set of controller gains is automatically improved according to a pre-defined performance objective evaluated from experimental data. The underlying Bayesian optimization algorithm is Entropy Search, which represents the latent objective as a Gaussian process and constructs an explicit belief over the location of the objective minimum. This is used to maximize the information gain from each experimental evaluation. Thus, this framework should yield improved controllers with fewer evaluations compared to alternative approaches. A seven-degree-of-freedom robot arm balancing an inverted pole is used as the experimental demonstrator. Preliminary results of a low-dimensional tuning problem highlight the method’s potential for automatic controller tuning on robotic platforms.

PDF DOI Project Page Project Page [BibTex]
