I gave a talk at Microsoft Research – Cambridge – 09/2015
I gave a talk at the Workshop on Probabilistic Numerics @DALI 2015
Stochastic optimization methods have become an increasingly important tool in minimizing nonlinear high dimensional objectives where only noisy function and gradient evaluations are available.
Probabilistic numerics provides a framework to address these challenges by explicitly modeling noise and uncertainty.
Optimization can be cast as an inference problem where unknown quantities (e.g. the gradient or the Hessian at every location) is inferred through previously collected noisy gradient evaluations.
Often the noise variances on gradients and values is known or can be estimated with low overhead (e.g for empirical risk minimization) such that the optimizer has a quantitative understanding of how uncertain its inputs are.
My work includes i) robust estimation of first- and second order search directions for stochastic optimization. The work is closely related to previous work by Hennig and Kiefel [ ] who showed that quasi-Newton methods, such as the BFGS rule, arise as the mean of a Gaussian distribution over the elements of the Hessian matrix of an optimization objective. More distantly related work includes e.g Hennig [ ] on the solution of linear solvers.
Further areas of research are: ii) automated step size adaptation in stochatic settings, where we extended the classic line search paradigm of deterministic optimization to a fully probabilistic one [ ]. iii) overfitting prevention by early-stopping without the help of a validation set [ ] based on a lightweight statistical test, which compares gradient magintudes to their noise.
The goal of my work is a smart machine (optimizer) that needs little to no human expert knowledge to perform its task, even communicating important information about the optimization progress to us.
About me: I am a physicist by training and I love models, math and abstract thinking. Simulations and testing own theories on data (by writing a computer program) is an essential part of science for me.
Early stopping is a widely used technique to prevent poor generalization performance when training an over-expressive model by means of gradient-based optimization. To find a good point to halt the optimizer, a common practice is to split the dataset into a training and a smaller validation set to obtain an ongoing estimate of the generalization performance. In this paper we propose a novel early stopping criterion which is based on fast-to-compute, local statistics of the computed gradients and entirely removes the need for a held-out validation set. Our experiments show that this is a viable approach in the setting of least-squares and logistic regression as well as neural networks.
In Advances in Neural Information Processing Systems 28, pages: 181-189, (Editors: C. Cortes, N.D. Lawrence, D.D. Lee, M. Sugiyama and R. Garnett), Curran Associates, Inc., 29th Annual Conference on Neural Information Processing Systems (NIPS), 2015 (inproceedings)
In deterministic optimization, line searches are a standard tool ensuring stability and efficiency. Where only stochastic gradients are available, no direct equivalent has so far been formulated, because uncertain gradients do not allow for a strict sequence of decisions collapsing the search space. We construct a probabilistic line search by combining the structure of existing deterministic methods with notions from Bayesian optimization. Our method retains a Gaussian process surrogate of the univariate optimization objective, and uses a probabilistic belief over the Wolfe conditions to monitor the descent. The algorithm has very low computational cost, and no user-controlled parameters. Experiments show that it effectively removes the need to define a learning rate for stochastic gradient descent.
[You can find the matlab research code under `attachments' below. The zip-file contains a minimal working example. The docstring in probLineSearch.m contains additional information. A more polished implementation in C++ will be published here at a later point. For comments and questions about the code please write to email@example.com.]
Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments and to use this understanding to design future systems