Time | Speaker | Title | Resources | |
---|---|---|---|---|

09:00 to 09:45 | Piyush Srivastava |
Structure recovery in graphical models This talk will offer a brief introduction to the problem of recovering block structures in graphical models from observed labellings. Such problems arise, e.g., in the general paradigm of "community detection", where one knows that a system consists of components that can be divided into two different "communities", but needs to find which community each component is in by looking at some observed behaviour of the individual components. After a brief introduction to the classical statistical approaches for modelling this problem, we will look at the a recent approach inspired from statistical mechanics that is based on a variant of the Ising model. We will then see how the phase transitions in this Ising model relate to the statistical problem of recovering block structures. Based on joint work with Quentin Berthet and Philippe Rigollet. |
||

09:45 to 10:30 | Pravesh Kothari |
Average-Case Algorithmic Thresholds via Sum-of-Squares Fundamental problems in unsupervised machine learning and average-case-complexity are concerned with recovering some hidden signal or structure surrounded by random noise. Examples include Sparse PCA, Learning Mixtures of Gaussians, Planted Clique, and Refuting Random Constraint Satisfaction Problems (CSPs). The fundamental question in such problems is determining the minimum signal to noise ratio (SNR) - the relative strength of signal compared to the noise - that allows efficient recovery of the hidden structure. The state-of-the-art algorithms for these problems are based on convex relaxations/spectral methods and typically require SNR that is markedly higher than the statistical threshold - the minimum SNR for recovery using a possibly inefficient algorithm. Are these gaps inherent? Can we reliably predict computational thresholds? The goal of this talk is to introduce the sum-of-squares semi-definite programming hierarchy as an algorithmic framework to reliably answer such questions. The SoS method is a general purpose scheme of semi-definite programming relaxations that captures and extends all known algorithmic techniques for solving the problems above. At the same time, recent developments have yielded a principled method to obtain the precise SoS-SNR thresholds. In the absence of the general techniques to prove average-case lower bounds based on standard assumptions, SoS lower bounds allowing a systematic approach for understanding computational vs statistical complexity trade-offs within this algorithmic framework. In the talk, I'll outline the use of the SoS method to design algorithms for average-case problems and discuss the principled method to prove sharp SoS lower bounds. |
||

10:30 to 11:00 | -- | Tea/Coffee | ||

11:00 to 11:45 | Anand Louis |
Accelerated Newton Iteration for Roots of Black Box Polynomials We study the problem of computing the largest root of a real rooted polynomial p(x) to within error ε given only black box access to it, i.e., for any x∈R, the algorithm can query an oracle for the value of p(x), but the algorithm is not allowed access to the coefficients of p(x). A folklore result for this problem is that the largest root of a polynomial can be computed in O(n log(1/ε)) polynomial queries using the Newton iteration. We give a simple algorithm that queries the oracle at only O(log n log(1/ε)) points, where n is the degree of the polynomial. Our algorithm is based on a novel approach for accelerating the Newton method by using higher derivatives. Based on joint work with Santosh Vempala |
||

11:45 to 12:30 | Praneeth Netrapalli |
How to escape saddle points efficiently Non-convex optimization is ubiquitous in modern machine learning applications. The most widely used methods to solve these non-convex problems in practice are gradient descent based algorithms. Unfortunately, however, it is well known that gradient descent and variants can get stuck in saddle points, while in practice it is desirable to converge to at least a local minimum. A good way to differentiate saddle points from local minima is the order of stationarity – in several non-convex problems, especially those related to matrix factorization, it has been shown recently that **allsecond order stationary points** are local minima while there are **several first order stationary points** which are saddle points. In this talk, we focus on understanding the convergence of gradient descent based methods to second order stationary points. In contrast, most existing results on gradient descent show convergence only to first order stationary points. We show the following. - Perturbed gradient descent converges to second order stationary points in a number of iterations which depends only poly-logarithmically on dimension (i.e., it is almost “dimension-free”). The convergence rate of this procedure matches the well-known convergence rate of gradient descent to first-order stationary points, up to log factors, and
- A variant of Nesterov’s accelerated gradient descent converges to second order stationary points at a faster rate than perturbed gradient descent.
The key technical components behind these results are - understanding geometry around saddle points
- designing a potential function for Nesterov’s accelerated gradient descent for non-convex problems.
Based on joint works with Chi Jin, Rong Ge, Sham M. Kakade and Mike Jordan. |
||

12:30 to 14:00 | -- | Lunch | ||

14:00 to 14:45 | Siddhartha Barman |
Algorithmic Applications of An Approximate Version of Caratheodory's Theorem In this talk I will present algorithmic applications of an approximate version of Caratheodory's theorem. The theorem states that given a set of d-dimensional vectors vectors X, for every vector in the convex hull of X there exists an epsilon-close (under the p-norm, for 2 <= p < \infinity) vector that can be expressed as a convex combination of at most b vectors of X, where the bound b depends on epsilon and the norm p, and is independent of the ambient dimension d. This theorem can be obtained by instantiating Maurey's lemma (c.f. Pisier 1980/81 and Carl 1985). I will describe how this approximate version of Caratheodory's theorem leads novel additive approximation algorithms for finding (i) Nash equilibria in two-player games (ii) dense subgraphs. https://arxiv.org/abs/1406.2296 |
||

14:45 to 15:30 | Rohit Gurjar | Derandomizing the Isolation Lemma and Parallel Algorithms | ||

15:30 to 16:00 | -- | Tea/Coffee | ||

16:00 to 16:45 | Ankit Garg |
Optimization in invariant theory, and applications to derandomization and quantum information theory Invariant theory is a classical field of mathematics that studies actions of groups on vector spaces and the underlying symmetries. An important object related to a group action is the null cone, which is the set of common zeroes of all homogeneous invariant polynomials. I will talk about the computational problem of testing membership in the null cone, its formulation in terms of a geodesically convex optimization problem and special cases for which we know polynomial time algorithms. I will also describe applications in derandomization (efficient deterministic algorithms for a non-commutative analogue of the polynomial identity testing problem) and quantum information theory. Based on joint works with Peter Burgisser, Leonid Gurvits, Rafael Oliveira, Michael Walter and Avi Wigderson. |
||

16:45 to 17:30 | Ramprasad Saptarishi |
Efficiently decoding Reed-Muller codes from random errors Consider the following setting. Suppose we are given as input a "corrupted" truth-table of a polynomial f(x1,..,xm) of degree r = o(√m), with a random set of 1/2 - o(1) fraction of the evaluations flipped. Can the polynomial f be recovered? Turns out we can, and we can do it efficiently! The above question is a demonstration of the reliability of Reed-Muller codes under the random error model. For Reed-Muller codes over F_2, the message is thought of as an m-variate polynomial of degree r, and its encoding is just the evaluation over all of F_2^m. In this talk, we shall study the resilience of RM codes in the *random error* model. We shall see that under random errors, RM codes can efficiently decode many more errors than its minimum distance. (For example, in the above toy example, minimum distance is about 1/2^{√m} but we can correct close to 1/2-fraction of random errors). This builds on a recent work of [Abbe-Shpilka-Wigderson-2015] who established strong connections between decoding erasures and decoding errors. The main result in this talk would be constructive versions of those connections that yield efficient decoding algorithms. Based on joint work with Amir Shpilka and Ben Lee Volk. |