I gave a talk at Microsoft Research – Cambridge – 09/2015
I gave a talk at the Workshop on Probabilistic Numerics @DALI 2015
Stochastic optimization methods have become an increasingly important tool in minimizing nonlinear high dimensional objectives where only noisy function and gradient evaluations are available.
Probabilistic numerics provides a framework to address these challenges by explicitly modeling noise and uncertainty.
We see optimization as an inference problem where the unknown gradient (e.g. for gradient decent) or the unknown Hessian (e.g. for quasi-Newton methods) is inferred through previously collected noisy gradient evaluations.
In fact Hennig and Kiefel [ ] showed that quasi-Newton methods, such as the BFGS rule,
arise as the mean of a Gaussian distribution over the elements of the Hessian matrix of an optimization objective.
Aside search directions, probabilistic numerics can help to adjust step sizes (learning rates) by explicitly modeling noise and informing the optimizer about it [ ].
The goal is to have a smart machine (optimizer) that needs little to no human expert knowledge to perform its task, even communicating important information about the optimization progress to us.
About me: I am a physicist by training and I love models, math and abstract thinking. Simulations and testing own theories on data (by writing a computer program) is an essential part of science for me.
Early stopping is a widely used technique to prevent poor generalization performance when training an over-expressive model by means of gradient-based optimization. To find a good point to halt the optimizer, a common practice is to split the dataset into a training and a smaller validation set to obtain an ongoing estimate of the generalization performance. In this paper we propose a novel early stopping criterion which is based on fast-to-compute, local statistics of the computed gradients and entirely removes the need for a held-out validation set. Our experiments show that this is a viable approach in the setting of least-squares and logistic regression as well as neural networks.
In Advances in Neural Information Processing Systems 28, pages: 181-189, (Editors: C. Cortes, N.D. Lawrence, D.D. Lee, M. Sugiyama and R. Garnett), Curran Associates, Inc., 29th Annual Conference on Neural Information Processing Systems (NIPS), 2015 (inproceedings)
In deterministic optimization, line searches are a standard tool ensuring stability and efficiency. Where only stochastic gradients are available, no direct equivalent has so far been formulated, because uncertain gradients do not allow for a strict sequence of decisions collapsing the search space. We construct a probabilistic line search by combining the structure of existing deterministic methods with notions from Bayesian optimization. Our method retains a Gaussian process surrogate of the univariate optimization objective, and uses a probabilistic belief over the Wolfe conditions to monitor the descent. The algorithm has very low computational cost, and no user-controlled parameters. Experiments show that it effectively removes the need to define a learning rate for stochastic gradient descent.
[You can find the matlab research code under `attachments' below. The zip-file contains a minimal working example. The docstring in probLineSearch.m contains additional information. A more polished implementation in C++ will be published here at a later point. For comments and questions about the code please write to firstname.lastname@example.org.]
Our goal is to understand the principles of Perception, Action and Learning in autonomous systems that successfully interact with complex environments and to use this understanding to design future systems