Tuesday, 02 January 2018

This talk will offer a brief introduction to the problem of recovering block structures in graphical models from observed labellings. Such problems arise, e.g., in the general paradigm of "community detection", where one knows that a system consists of components that can be divided into two different "communities", but needs to find which community each component is in by looking at some observed behaviour of the individual components. After a brief introduction to the classical statistical approaches for modelling this problem, we will look at the a recent approach inspired from statistical mechanics that is based on a variant of the Ising model. We will then see how the phase transitions in this Ising model relate to the statistical problem of recovering block structures.

Based on joint work with Quentin Berthet and Philippe Rigollet.

Fundamental problems in unsupervised machine learning and average-case-complexity are concerned with recovering some hidden signal or structure surrounded by random noise. Examples include Sparse PCA, Learning Mixtures of Gaussians, Planted Clique, and Refuting Random Constraint Satisfaction Problems (CSPs). The fundamental question in such problems is determining the minimum signal to noise ratio (SNR) - the relative strength of signal compared to the noise - that allows efficient recovery of the hidden structure. The state-of-the-art algorithms for these problems are based on convex relaxations/spectral methods and typically require SNR that is markedly higher than the statistical threshold - the minimum SNR for recovery using a possibly inefficient algorithm. Are these gaps inherent? Can we reliably predict computational thresholds?

The goal of this talk is to introduce the sum-of-squares semi-definite programming hierarchy as an algorithmic framework to reliably answer such questions. The SoS method is a general purpose scheme of semi-definite programming relaxations that captures and extends all known algorithmic techniques for solving the problems above. At the same time, recent developments have yielded a principled method to obtain the precise SoS-SNR thresholds. In the absence of the general techniques to prove average-case lower bounds based on standard assumptions, SoS lower bounds allowing a systematic approach for understanding computational vs statistical complexity trade-offs within this algorithmic framework. In the talk, I'll outline the use of the SoS method to design algorithms for average-case problems and discuss the principled method to prove sharp SoS lower bounds.

We study the problem of computing the largest root of a real rooted polynomial p(x) to within error ε given only black box access to it, i.e., for any x∈R, the algorithm can query an oracle for the value of p(x), but the algorithm is not allowed access to the coefficients of p(x). A folklore result for this problem is that the largest root of a polynomial can be computed in O(n log(1/ε)) polynomial queries using the Newton iteration. We give a simple algorithm that queries the oracle at only O(log n log(1/ε)) points, where n is the degree of the polynomial. Our algorithm is based on a novel approach for accelerating the Newton method by using higher derivatives.

Based on joint work with Santosh Vempala

Non-convex optimization is ubiquitous in modern machine learning applications. The most widely used methods to solve these non-convex problems in practice are gradient descent based algorithms. Unfortunately, however, it is well known that gradient descent and variants can get stuck in saddle points, while in practice it is desirable to converge to at least a local minimum. A good way to differentiate saddle points from local minima is the order of stationarity – in several non-convex problems, especially those related to matrix factorization, it has been shown recently that **allsecond order stationary points** are local minima while there are **several first order stationary points** which are saddle points.

In this talk, we focus on understanding the convergence of gradient descent based methods to second order stationary points. In contrast, most existing results on gradient descent show convergence only to first order stationary points. We show the following.

- Perturbed gradient descent converges to second order stationary points in a number of iterations which depends only poly-logarithmically on dimension (i.e., it is almost “dimension-free”). The convergence rate of this procedure matches the well-known convergence rate of gradient descent to first-order stationary points, up to log factors, and
- A variant of Nesterov’s accelerated gradient descent converges to second order stationary points at a faster rate than perturbed gradient descent.

The key technical components behind these results are

- understanding geometry around saddle points
- designing a potential function for Nesterov’s accelerated gradient descent for non-convex problems.

Based on joint works with Chi Jin, Rong Ge, Sham M. Kakade and Mike Jordan.

In this talk I will present algorithmic applications of an approximate version of Caratheodory's theorem. The theorem states that given a set of d-dimensional vectors vectors X, for every vector in the convex hull of X there exists an epsilon-close (under the p-norm, for 2 <= p < \infinity) vector that can be expressed as a convex combination of at most b vectors of X, where the bound b depends on epsilon and the norm p, and is independent of the ambient dimension d. This theorem can be obtained by instantiating Maurey's lemma (c.f. Pisier 1980/81 and Carl 1985).

I will describe how this approximate version of Caratheodory's theorem leads novel additive approximation algorithms for finding (i) Nash equilibria in two-player games (ii) dense subgraphs. https://arxiv.org/abs/1406.2296

Invariant theory is a classical field of mathematics that studies actions of groups on vector spaces and the underlying symmetries. An important object related to a group action is the null cone, which is the set of common zeroes of all homogeneous invariant polynomials. I will talk about the computational problem of testing membership in the null cone, its formulation in terms of a geodesically convex optimization problem and special cases for which we know polynomial time algorithms. I will also describe applications in derandomization (efficient deterministic algorithms for a non-commutative analogue of the polynomial identity testing problem) and quantum information theory.

Based on joint works with Peter Burgisser, Leonid Gurvits, Rafael Oliveira, Michael Walter and Avi Wigderson.

Consider the following setting. Suppose we are given as input a "corrupted" truth-table of a polynomial f(x1,..,xm) of degree r = o(√m), with a random set of 1/2 - o(1) fraction of the evaluations flipped. Can the polynomial f be recovered? Turns out we can, and we can do it efficiently!

The above question is a demonstration of the reliability of Reed-Muller codes under the random error model. For Reed-Muller codes over F_2, the message is thought of as an m-variate polynomial of degree r, and its encoding is just the evaluation over all of F_2^m.

In this talk, we shall study the resilience of RM codes in the *random error* model. We shall see that under random errors, RM codes can efficiently decode many more errors than its minimum distance. (For example, in the above toy example, minimum distance is about 1/2^{√m} but we can correct close to 1/2-fraction of random errors). This builds on a recent work of [Abbe-Shpilka-Wigderson-2015] who established strong connections between decoding erasures and decoding errors. The main result in this talk would be constructive versions of those connections that yield efficient decoding algorithms.

Based on joint work with Amir Shpilka and Ben Lee Volk.

Wednesday, 03 January 2018

In this talk, I will introduce a new class of combinatorial markets in which agents have covering constraints over resources required and are interested in delay minimization. Our market model is applicable to several settings including cloud computing (scheduling) and communicating over a network.

I will discuss how this model is quite different from the traditional models, to the extent that neither do the classical equilibrium existence results seem to apply to it nor do any of the efficient algorithmic techniques developed to compute equilibria. In particular, our model does not satisfy the condition of non-satiation, which is used critically to show the existence of equilibria in traditional market models, and show that our set of equilibrium prices could be a connected, non-convex set.

I will discuss a proof of the existence of equilibria and a polynomial time algorithm for finding one, drawing heavily on techniques from LP duality and submodular minimization. Finally, I will show that our model inherits many of the fairness properties of traditional equilibrium models as well as new models, such as CEEI.

Based on a joint work with Nikhil Devanur, Jugal Garg, Vijay V. Vazirani, and Sadra Yazdanbod

Fisher market equilibrium is a powerful solution concept that has found several surprising applications even in non-market settings which do not involve any exchange of money but only require the remarkable fairness and efficiency properties of equilibria, e.g., scheduling, auctions, mechanism design, fair division, etc. A very recent new application is approximation algorithms for maximizing the Nash social welfare (NSW) when allocating a set of indivisible items to a set of agents.

In this talk, I will start with the Fisher market model and its connection with the NSW problem. Then, I will show how to design a constant-factor approximation algorithm for maximizing the NSW when agents have budget-additive valuations. Budget-additive valuations represent an important class of submodular functions. They have attracted a lot of research interest in recent years due to many interesting applications. For every e > 0, our algorithm obtains a (2.404 + e)-approximation in time polynomial in the input size and 1/e.

Based on a joint work with Martin Hoefer and Kurt Mehlhorn.

This talk is about learning Sums of Independent Commonly Supported Integer Random Variables, or SICSIRVs.

For S a finite set of natural numbers, an "S-supported SICSIRV" is a random variable which is the sum of N independent random variables each of which is supported on S. So, for example, if S = {0,1}, then an S-supported SICSIRV is a sum of N independent but not necessarily identically distributed Bernoulli random variables (a.k.a. a Poisson Binomial random variable). Daskalakis et. al. (STOC 2012, FOCS 2013) have shown that for S={0,1,...,k-1}, there is an algorithm for learning an unknown S-SICSIRV using poly(k,1/\epsilon) draws from the distribution and running in poly(k,1/\epsilon) time, independent of N.

In this work we investigate the problem of learning an unknown S-SICSIRV where S={a_1,...,a_k} may be any finite set. We show that

- when |S|=3, it is possible to learn any S-SICSIRV with poly(1/\epsilon) samples, independent of the values a_1,a_2,a_3;
- when |S| >= 4, it is possible to learn any S-SICSIRV with poly(1/\epsilon, \log \log a_k) samples; and
- already when |S|=4, at least \log \log a_k samples may be required for learning.

We thus establish a sharp transition in the complexity of learning S-SICSIRVs when the size of S goes from three to four.

Based on joint work with Phil Long (Google) and Rocco Servedio (Columbia)

At the heart of most algorithms today, there is an optimization engine trying to provide the best decision with partial information observed thus far in time, the so-called problem of online learning. Often it becomes difficult to find near-optimal solutions to these problems due to their inherent combinatorial structure that leads to certain computational bottlenecks, for instance, computing Bregman projections for online mirror descent and its variants. Motivated by these bottlenecks, we consider three fundamental convex and combinatorial optimization problems. First, we provide a conceptually simple algorithm to minimize separable convex functions over submodular base polytopes. For cardinality-based submodular functions, we show the current fastest-known running time of O(n(log n+d)), where n is the size of the ground set and d is the number of distinct values of the submodular function (d<=n). Next, we consider the problem of movement along a line while staying feasible in submodular polytopes. The use of the discrete Newton’s method for this line search problem is natural, but no strongly polynomial bound on its number of iterations was known. We solve this open problem by providing a quadratic bound of O(n2), resulting in a running time improvement by at least n6 over the state of the art. Finally, we show how efficient counting methods can be used for convex minimization.

Based on oint work with Michel Goemans and Patrick Jaillet.