Intelligent Systems

Group Talks

  • Adrián Javaloy
  • S2 seminar room

The problem of text normalization is simple to state: transform a given arbitrary text into its spoken form. In the context of text-to-speech systems, which we will focus on, this can be exemplified by turning the text “$200” into “two hundred dollars”. Recently, interest in solving this problem with deep-learning techniques has risen, since it is a highly context-dependent problem that is still being solved with ad-hoc solutions; so much so that Google even started a contest on Kaggle around it. In this talk we will see how this problem has been approached as part of a Master's thesis. Namely, the problem is tackled as if it were a machine-translation problem from English to normalized English, and the proposed architecture is therefore a neural machine translation architecture with the addition of traditional attention mechanisms. Such a network is typically composed of an encoder and a decoder, both of which are multi-layer LSTM networks. As part of this work, and with the aim of demonstrating the feasibility of convolutional neural networks for natural-language-processing problems, we propose and compare different convolutional architectures for the encoder. In particular, we propose a new architecture called the Causal Feature Extractor, which proves to be an effective encoder as well as an attention-friendly architecture.
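As a rough illustration of the architecture described above (not the thesis implementation), the sketch below frames normalization as sequence-to-sequence translation: a multi-layer LSTM encoder, a multi-layer LSTM decoder, and a dot-product attention step over the encoder states. The class name, vocabulary sizes and dimensions are made up for the example, and PyTorch is assumed.

```python
# Minimal sketch of an encoder-decoder normalizer with dot-product attention.
import torch
import torch.nn as nn

class Seq2SeqNormalizer(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb=64, hid=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hid, num_layers=2, batch_first=True)
        self.decoder = nn.LSTM(emb, hid, num_layers=2, batch_first=True)
        self.out = nn.Linear(2 * hid, tgt_vocab)

    def forward(self, src, tgt):
        enc_out, state = self.encoder(self.src_emb(src))        # (B, S, H)
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)     # (B, T, H)
        # Dot-product attention: each decoder step attends over encoder states.
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))    # (B, T, S)
        weights = torch.softmax(scores, dim=-1)
        context = torch.bmm(weights, enc_out)                   # (B, T, H)
        return self.out(torch.cat([dec_out, context], dim=-1))  # (B, T, V)

# Toy usage: characters of "$200" as source tokens, words of
# "two hundred dollars" as target tokens (random ids, teacher forcing).
src = torch.randint(0, 50, (1, 4))
tgt = torch.randint(0, 100, (1, 3))
logits = Seq2SeqNormalizer(50, 100)(src, tgt)
print(logits.shape)  # torch.Size([1, 3, 100])
```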

Organizers: Philipp Hennig


  • Sergio Pascual Díaz
  • S2.014

My plan is to present the motivation behind Deep GPs, as well as some of the currently available approximate inference schemes and their limitations. Then, I will explain how Deep GPs fit into the BayesOpt framework and the specific problems they could potentially solve.

Organizers: Philipp Hennig


  • Giacomo Garegnani
  • Tübingen, S2 seminar room

We present a novel probabilistic integrator for ordinary differential equations (ODEs) which allows for uncertainty quantification of the numerical error [1]. In particular, we randomise the time steps and build a probability measure on the deterministic solution, which collapses to the true solution of the ODE with the same rate of convergence as the underlying deterministic scheme. The intrinsic nature of the random perturbation guarantees that our probabilistic integrator conserves some geometric properties of the deterministic method it is built on, such as the conservation of first integrals or the symplecticity of the flow. Finally, we present a procedure to incorporate our probabilistic solver into the framework of Bayesian inverse problems, showing how inaccurate posterior concentrations given by deterministic methods can be corrected by a probabilistic interpretation of the numerical solution.
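As a loose sketch of the random time-stepping idea (not the authors' code from [1]), the snippet below perturbs each step of a forward Euler scheme and uses an ensemble of perturbed runs to quantify numerical uncertainty; the perturbation size and the test problem are illustrative assumptions.

```python
# Minimal sketch: forward Euler with randomly perturbed step sizes; the spread
# of an ensemble of runs reflects the numerical error of the deterministic scheme.
import numpy as np

def euler_random_steps(f, y0, t_end, h, p=2, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    t, y, ys = 0.0, np.asarray(y0, float), [np.asarray(y0, float)]
    while t < t_end - 1e-12:
        hk = h + rng.uniform(-h**p, h**p)   # random step with mean h (assumed form)
        y = y + hk * f(t, y)                # deterministic Euler map, random step
        t += h                              # solution indexed on the fixed grid
        ys.append(y)
    return np.array(ys)

# Ensemble for y' = -y, y(0) = 1; the spread at t = 1 quantifies the error.
f = lambda t, y: -y
samples = np.array([euler_random_steps(f, [1.0], 1.0, 0.01)[-1] for _ in range(30)])
print(samples.mean(), samples.std(), np.exp(-1.0))
```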

Organizers: Hans Kersting


Meta-learning statistics and augmentations for few shot learning

IS Colloquium
  • 25 September 2017 • 11:15—12:15
  • Amos Storkey
  • Tübingen, MPI_IS Lecture Hall (ground floor)

In this talk I introduce the neural statistician as an approach to meta-learning. The neural statistician learns to appropriately summarise datasets through a learnt statistic vector. This can be used for few-shot learning by computing the statistic vectors for the presented data and using these statistics as context variables for one-shot classification and generation. I will show how we can generalise the neural statistician to a context-aware learner that learns to characterise and combine independently learnt contexts. I will also demonstrate an approach for meta-learning data-augmentation strategies. Acknowledgments: This work is joint work with Harri Edwards, Antreas Antoniou, and Conor Durkan.
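The core mechanism, summarising a dataset into a single statistic vector by pooling per-example encodings, can be sketched in a few lines. The network sizes, the mean-pooling choice and the toy nearest-context classification rule below are illustrative assumptions rather than the authors' model (PyTorch assumed).

```python
# Minimal sketch: pool per-example encodings into a dataset-level statistic
# vector, then use it as a context variable for few-shot classification.
import torch
import torch.nn as nn

class StatisticNetwork(nn.Module):
    def __init__(self, x_dim=32, h_dim=64, c_dim=16):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU(),
                                    nn.Linear(h_dim, h_dim))
        self.to_context = nn.Linear(h_dim, c_dim)

    def forward(self, dataset):          # dataset: (n_examples, x_dim)
        h = self.encode(dataset)         # per-example features
        pooled = h.mean(dim=0)           # exchangeable summary of the dataset
        return self.to_context(pooled)   # statistic / context vector

# Toy few-shot use: one statistic vector per support set, then a
# nearest-context rule for a query (treated here as a one-item set).
net = StatisticNetwork()
support_sets = [torch.randn(5, 32) for _ in range(3)]    # 3 classes, 5 shots each
contexts = torch.stack([net(s) for s in support_sets])   # (3, c_dim)
query = net(torch.randn(1, 32))                          # (c_dim,)
dists = torch.cdist(query.unsqueeze(0), contexts)        # (1, 3)
print(int(dists.argmin()))
```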

Organizers: Philipp Hennig


A Locally Adaptive Normal Distribution

Talk
  • 05 September 2017 • 14:00—15:30
  • Georgios Arvanitidis
  • S2 Seminar Room

The fundamental building block in many learning models is the distance measure that is used. Usually, the straight-line Euclidean distance is used for simplicity. Replacing this stiff distance measure with a flexible one could potentially give a better representation of the actual distance between two points. I will present how the normal distribution changes when the distance measure respects the underlying structure of the data. In particular, a Riemannian manifold is learned from the observations, and geodesic curves, i.e. length-minimising curves under the Riemannian metric, can then be computed. With this flexible distance measure we obtain a normal distribution that locally adapts to the data. A maximum likelihood estimation scheme is provided for inference of the mean and covariance parameters, together with a systematic way to choose the parameter defining the Riemannian manifold. Results on synthetic and real-world data demonstrate the ability of the proposed model to fit non-trivial probability distributions.
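One simple way to make the distance measure respect the data is sketched below: a diagonal Riemannian metric built from kernel-weighted local variances, so that curves passing through regions with little data become long. The specific metric form, bandwidth sigma and regulariser rho are illustrative assumptions, not necessarily the construction used in the talk.

```python
# Minimal sketch of a locally adaptive metric and the length of a curve under it.
import numpy as np

def local_metric(x, data, sigma=0.5, rho=1e-3):
    """Diagonal metric M(x): large where the data density is low."""
    w = np.exp(-np.sum((data - x) ** 2, axis=1) / (2 * sigma ** 2))
    var = (w[:, None] * (data - x) ** 2).sum(axis=0) / (w.sum() + 1e-12)
    return np.diag(1.0 / (var + rho))

def curve_length(points, data, **kw):
    """Riemannian length of a discretised curve: sum of sqrt(dx^T M dx)."""
    return sum(np.sqrt(d @ local_metric(0.5 * (a + b), data, **kw) @ d)
               for a, b in zip(points[:-1], points[1:])
               for d in [b - a])

data = np.random.randn(200, 2)                 # observations defining the manifold
straight = np.linspace([-2, -2], [2, 2], 50)   # a candidate curve between two points
print(curve_length(straight, data))            # a geodesic would minimise this value
```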

Organizers: Philipp Hennig


  • Azzurra Ruggeri

How do young children learn so much about the world, and so efficiently? This talk presents recent studies investigating, theoretically and empirically, how children actively seek information in their physical and social environments as evidence to test and dynamically revise their hypotheses and theories over time. In particular, it will focus on how children adapt their active learning strategies, such as question-asking and explorative behavior, in response to the task characteristics, to the statistical structure of the hypothesis space, and to the feedback received. Such adaptiveness and flexibility are crucial for achieving efficiency in situations of uncertainty, when testing alternative hypotheses, making decisions, drawing causal inferences and solving categorization tasks.

Organizers: Philipp Hennig, Georg Martius


Some parallels between classical and kernel quadrature

Talk
  • 04 July 2017 • 11:00—12:15
  • Toni Karvonen
  • S2 seminar room

This talk draws three parallels between classical algebraic quadrature rules, which are exact for polynomials of low degree, and kernel (or Bayesian) quadrature rules: i) Computational efficiency. Constructing scalable multivariate algebraic quadrature rules is challenging, whereas kernel quadrature requires solving a linear system of equations, which quickly becomes computationally prohibitive. Fully symmetric sets and Smolyak sparse grids can be used to solve both problems. ii) Derivatives and optimal rules. The algebraic degree of a Gaussian quadrature rule cannot be improved by adding derivative evaluations of the integrand. The same holds for optimal kernel quadrature rules, in the sense that derivatives are of no help in minimising the worst-case error (or posterior integral variance). iii) Positivity of the weights. Essentially as a consequence of the preceding property, both Gaussian and optimal kernel quadrature rules have positive weights (i.e., they are positive linear functionals).
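For parallel (i), the linear system at the heart of kernel quadrature can be made concrete. The sketch below assumes a Gaussian kernel, a standard normal integration measure (for which the kernel mean embedding has a closed form) and fixed, user-chosen nodes, and solves K w = z for the weights.

```python
# Minimal sketch of kernel quadrature weights: solve K w = z, where K is the
# kernel Gram matrix at the nodes and z holds the kernel mean embeddings.
import numpy as np

def kernel_quadrature_weights(nodes, ell=1.0):
    X = np.asarray(nodes, float)
    K = np.exp(-(X[:, None] - X[None, :]) ** 2 / (2 * ell ** 2))    # Gram matrix
    # Kernel mean of a Gaussian kernel under the N(0, 1) measure (closed form).
    z = np.sqrt(ell**2 / (ell**2 + 1)) * np.exp(-X**2 / (2 * (ell**2 + 1)))
    return np.linalg.solve(K, z)

nodes = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
w = kernel_quadrature_weights(nodes)
# Approximate E[f(X)] for X ~ N(0, 1) with f(x) = x^2, whose true value is 1.
print(w, w @ nodes**2)
```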

Organizers: Alexandra Gessner


Bayesian Probabilistic Numerical Methods

Talk
  • 13 June 2017 • 11:00—12:00
  • Jon Cockayne

The emergent field of probabilistic numerics has thus far lacked rigorous statistical foundations. We establish that a class of Bayesian probabilistic numerical methods can be cast as the solution to certain non-standard Bayesian inverse problems. This allows us to establish general conditions under which Bayesian probabilistic numerical methods are well-defined, encompassing both non-linear models and non-Gaussian prior distributions. For general computation, a numerical approximation scheme is developed and its asymptotic convergence is established. The theoretical development is then extended to pipelines of numerical computation, wherein several probabilistic numerical methods are composed to perform more challenging numerical tasks. The contribution highlights an important research frontier at the interface of numerical analysis and uncertainty quantification, with some illustrative applications presented.

Organizers: Michael Schober


Inference with Kernel Embeddings

Talk
  • 22 May 2017 • 11:00—12:15
  • Dino Sejdinovic

Kernel embeddings of distributions and the Maximum Mean Discrepancy (MMD), the resulting distance between distributions, are useful tools for fully nonparametric hypothesis testing and for learning on distributional inputs. I will give an overview of this framework and present some of its recent applications within the context of approximate Bayesian inference. Further, I will discuss a recent modification of MMD which aims to encode invariance to additive symmetric noise and leads to learning on distributions that is robust to distributional covariate shift, e.g. where measurement noise on the training data differs from that on the testing data.
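For reference, a standard unbiased estimator of the squared MMD between two samples can be written in a few lines; the RBF kernel and fixed bandwidth below are illustrative assumptions.

```python
# Minimal sketch of the unbiased two-sample MMD^2 estimator with an RBF kernel;
# a value well above zero suggests the two distributions differ.
import numpy as np

def rbf(A, B, sigma=1.0):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-d2 / (2 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    Kxx, Kyy, Kxy = rbf(X, X, sigma), rbf(Y, Y, sigma), rbf(X, Y, sigma)
    n, m = len(X), len(Y)
    return ((Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
            - 2 * Kxy.mean())

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1, (200, 2))
Y = rng.normal(0.5, 1, (200, 2))   # shifted distribution
print(mmd2_unbiased(X[:100], X[100:]), mmd2_unbiased(X, Y))  # near 0 vs. clearly positive
```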

Organizers: Philipp Hennig


  • Philipp Berens
  • tba

The retina in the eye performs complex computations to transmit only behaviourally relevant information about our visual environment to the brain. These computations are implemented by numerous different cell types that form complex circuits. New experimental and computational methods make it possible to study the cellular diversity of the retina in detail; the goal of obtaining a complete list of all the cell types in the retina, and thus of its “building blocks”, is within reach. I will review our recent contributions in this area, showing how analyzing multimodal datasets from electron microscopy and functional imaging can yield insights into the cellular organization of retinal circuits.

Organizers: Philipp Hennig