|09:30 to 11:00
|Praneeth Netrapalli (Microsoft Research, India)
Introductory lectures on first-order convex optimization
In these two lectures, we will introduce the first order black box setting of convex optimization, describe some fundamental algorithms: gradient descent, Nesterov’s accelerated gradient descent, mirror descent, stochastic gradient descent and stochastic variance reduced gradient and derive bounds on their performance for appropriate (smooth/nonsmooth) classes of convex functions. The lectures will not assume any prior background in optimization from the audience.
|11:30 to 12:30
|Pratik Chaudhari (University of Pennsylvania, USA)
Unraveling the mysteries of stochastic gradient descent for deep neural networks
Deep networks are mysterious. These over-parametrized machine learning models, trained with rudimentary optimization algorithms on non-convex landscapes in millions of dimensions have defied attempts to put a sound theoretical footing beneath their impressive performance.
This talk will shed light upon some of these mysteries. I will employ diverse ideas---from thermodynamics and optimal transportation to partial differential equations, control theory and Bayesian inference---and paint a picture of the training process of deep networks. Along the way, I will develop state-of-the-art algorithms for non-convex optimization.
I will conclude with a few recent results on learning with few labeled data.
|14:30 to 15:30
|Hariharan Narayanan (TIFR, India)
Random concave functions on an equilateral lattice with periodic hessians
We show that a random concave function having a periodic hessian on an equilateral lattice has a quadratic scaling limit, if the average hessian of the function satisfies certain conditions. We consider the set of all concave functions g on an equilateral lattice L that when shifted by an element of nL, incur addition by a linear function (this condition is equivalent to the periodicity of the hessian of g). We identify this set, up to addition by a constant, with a convex polytope P(s), where s corresponds to the average hessian. We show that the diameter of P(s) is bounded below by c(s)n^2 in sup norm, where c(s) is a positive constant depending only on s. Our main result is that, for any for any epsilon > 0, the normalized Lebesgue measure of all points in P(s) that are not contained in a cube Q of side length epsilon, centered at the unique (up to addition of a linear term) quadratic polynomial with hessian s, tends to 0 as n tends to ∞. Such results for other function spaces would lead to a better understanding of function fitting in Bayesian contexts. ” Arxiv preprint: “Random concave functions on an equilateral lattice with periodic hessians.” (2019), https://arxiv.org/abs/1909.08586 .
|16:00 to 17:30
|Marc Mezard (École normale supérieure of Paris, France)
Artificial intelligence: success, limits, myths and threats (Lecture 1)
Artificial Intelligence is about to have a dramatic impact on many sectors of human activity. In the last ten years, thanks to the development of machine learning in “deep networks”, we have experienced spectacular breakthroughs in diverse applications such as automatic interpretation of images, speech recognition, consumer profiling, or go and chess playing. Algorithms are now competing with the best professionals at analyzing skin cancer symptoms or detecting specific anomalies in radiology; and much more is to come. Worrisome perspectives are frequently raised, from massive job destruction to autonomous decision-making “warrior” robots.
In this talk, we shall open the black box of deep networks and explore how they are programmed to learn from data by themselves. This will allow us to understand their limits, to question whether their achievements have anything to do with “intelligence”, and to reflect on the foundations of scientific intelligence.
|09:30 to 11:00
|Gérard Ben Arous (Courant Institute of Mathematical Sciences, USA)
Exploring the random landscapes of inference
I will survey recent progress on the study of the landscapes of inference.
This will be done in the hard case of Tensor PCA, then for the p+k model, and finally for the usual inference problems in the class of Generalized Linear Models (which are not linear, as everybody knows). This will use recent works by G. Biroli, C. Cammarota, R. Gheissari, A. Jagannath, S. Mei, A. Maillard, A. Montanari, M. Nica, V.Ros, E. Subag and more.
|11:30 to 12:30
|Piyush Srivastava (TIFR, India)
Zeros of polynomials, decay of correlations, and algorithms
The problem of approximating partition functions of spin systems is an important and well-studied algorithmic primitive, in part due to its strong connections with Gibbs sampling. Randomized methods based on
This talk discusses a third method, proposed by Alexander Barvinok, which is based on the geometry of polynomials, and, in hindsight, is related to the Yang-Lee theory of phase transitions. We discuss two
We also discuss emerging connections between the geometry of polynomials, correlation decay, and Markov chain Monte Carlo, that are suggested by this line of work.
Based on joint work with Jingcheng Liu and Alistair Sinclair.
1. Liu, J., Sinclair, A. & Srivastava, P. The Ising Partition Function: Zeros and Deterministic Approximation. J Stat Phys 174, 287â€“315, 2019.
2. Liu, J., Sinclair, A. & Srivastava, P. Fisher zeros and correlation decay in the Ising model. J Math Phys 60, 103304, 2019.
3. Liu J., Sinclair, A. & Srivastava P. A deterministic algorithm for counting colorings with 2Δ colors. Extended abstract in Proc. IEEE
|14:30 to 15:30
|Serge Gratton (Université de Toulouse, USA)
Optimization with inexact gradient and function
|16:00 to 17:30
|Lenaic Chizat (University of Orsay, France)
Analyses of gradient methods for the optimization of wide two layer neural networks
Gradient-based optimization algorithms applied to artificial neural networks with many parameters typically lead to models with good train and test performance. Two lines of theoretical research have recently emerged as attempts to explain this phenomenon: the lazy training and the mean-field analysis. The lazy training analysis studies the situation when a model remains in a small neighborhood of its initialization throughout training and implicitly performs a kernel regression; and the mean-field analysis is a general description of the training dynamics of infinitely wide two-layer neural networks. In this talk, I will present the insights brought by these analyses, their connections, strengths and limitations.
This talk is based on joint work with Francis Bach and Edouard Oyallon