09:30 to 10:20 |
Lénaïc Chizat (EPFL, Switzerland) |
Particle methods for optimization over measures (Lecture 2) Many data processing problems can be cast as minimizing a convex functional over (a subset of) the space of signed measures. These include problems as diverse as training two-layer neural networks, sparse spikes deconvolution, Wasserstein barycenters, and trajectory inference. In this mini-course I will present recent conceptual tools and theoretical guarantees pertaining to "particle methods", a versatile and practical class of approaches for solving such problems in which the unknown measure is parameterized as a sum of Dirac masses. We will cover fixed-support methods, conic particle methods, and noisy gradient methods, and discuss their links with Wasserstein geometry and with (generalized) Langevin dynamics for sampling. Concepts from Optimal Transport, Stochastic Differential Equations (SDEs) and Partial Differential Equations (PDEs) will be introduced and used throughout the course.
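As a concrete illustration of the particle parameterization mentioned in the abstract, here is a minimal sketch (not taken from the lecture) of a particle gradient method on a toy sparse-deconvolution objective; the kernel width, particle count, step size, and spike locations are assumptions made for the example.

```python
# Minimal sketch of a particle method: the unknown measure is parameterized as
# a sum of Dirac masses and the particle positions follow gradient descent on
# the objective. Toy setup (all parameters are illustrative assumptions):
# recover spike locations from a blurred observation.
import numpy as np

rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 100)   # observation grid
sigma = 0.05                        # assumed width of the Gaussian blur kernel

def phi(theta):
    """Feature map: Gaussian bump centered at a particle position theta."""
    return np.exp(-(grid - theta) ** 2 / (2 * sigma ** 2))

# Ground-truth measure: two spikes; observation y = mean of their features
true_positions = np.array([0.3, 0.7])
y = np.mean([phi(t) for t in true_positions], axis=0)

# Particle parameterization: mu = (1/n) * sum_i delta_{theta_i}
n_particles = 20
theta = rng.uniform(0.0, 1.0, size=n_particles)

lr = 0.01
for step in range(3000):
    residual = np.mean([phi(t) for t in theta], axis=0) - y   # model(mu) - y
    # Gradient of F(mu) = 0.5 * ||model(mu) - y||^2 w.r.t. each particle position
    grads = np.array([
        residual @ (phi(t) * (grid - t) / sigma ** 2) / n_particles
        for t in theta
    ])
    theta -= lr * grads

# The particle cloud should concentrate near the true spike locations 0.3 and 0.7
print("particle positions after training:", np.round(np.sort(theta), 2))
```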
|
|
|
10:30 to 11:20 |
Lénaïc Chizat (EPFL, Switzerland) |
Particle methods for optimization over measures (Lecture 3) Many data processing problems can be cast as minimizing a convex functional over (a subset of) the space of signed measures. These include problems as diverse as training two-layer neural networks, sparse spikes deconvolution, Wasserstein barycenters, and trajectory inference. In this mini-course I will present recent conceptual tools and theoretical guarantees pertaining to "particle methods", a versatile and practical class of approaches for solving such problems in which the unknown measure is parameterized as a sum of Dirac masses. We will cover fixed-support methods, conic particle methods, and noisy gradient methods, and discuss their links with Wasserstein geometry and with (generalized) Langevin dynamics for sampling. Concepts from Optimal Transport, Stochastic Differential Equations (SDEs) and Partial Differential Equations (PDEs) will be introduced and used throughout the course.
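To complement the particle-gradient sketch after Lecture 2, here is a minimal sketch of the noisy-gradient / Langevin idea referred to in the abstract; the potential, step size, and temperature are assumptions chosen purely for illustration.

```python
# Minimal sketch of Langevin dynamics: adding Gaussian noise to gradient steps
# on the particle positions turns optimization into sampling from a Gibbs
# measure proportional to exp(-V / temperature). Toy quadratic potential.
import numpy as np

rng = np.random.default_rng(1)

def grad_V(x):
    """Gradient of an assumed toy potential V(x) = x^2 / 2."""
    return x

tau = 0.01          # step size of the Euler-Maruyama discretization (assumption)
temperature = 0.5   # noise level (assumption)
particles = rng.normal(size=1000) * 3.0   # initial particle cloud

for _ in range(5000):
    noise = rng.normal(size=particles.shape)
    particles = particles - tau * grad_V(particles) + np.sqrt(2 * temperature * tau) * noise

# At stationarity the cloud approximates N(0, temperature) for this potential.
print("empirical variance:", particles.var(), "(target:", temperature, ")")
```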
|
|
|
11:50 to 12:40 |
Laurent Massoulié (INRIA, France) |
Graph alignment: informational and computational limits (Lecture 3) Graph alignment is a generic high-dimensional statistical inference problem with many applications, ranging from social network data de-anonymization to separation of functional regions in cortical surfaces. The aim of this short course is to present several recent theoretical results on graph alignment when the graphs to be aligned come from a specific generative model, namely the correlated Erdős–Rényi random graph model. Specifically, we shall first present results on information-theoretic limits to feasibility of graph alignment: below some threshold, the observed graphs do not contain enough information for alignment to be feasible, while above that threshold, some algorithms are known to succeed at alignment, although in exponential time. We shall then present results on computational feasibility of alignment, describing a second threshold which determines when a family of ‘local’, polynomial-time algorithms succeeds at alignment. Together, these results show a rich phenomenology for graph alignment, displaying an ‘impossible phase’, a ‘hard phase’ and an ‘easy phase’ for feasibility of the task.
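For concreteness, here is a small sketch of one common construction of the correlated Erdős–Rényi model underlying graph alignment; the parameters n, p, and the correlation s are placeholder choices, not values from the course.

```python
# Sketch of a correlated Erdos-Renyi pair: sample a parent graph, subsample its
# edges twice independently, and relabel the second copy by a hidden
# permutation that the alignment task must recover. Parameters are assumptions.
import numpy as np

rng = np.random.default_rng(42)
n, p, s = 200, 0.1, 0.8   # nodes, parent edge probability, edge-retention rate

# Parent graph as a symmetric boolean adjacency matrix
upper = np.triu(rng.random((n, n)) < p, k=1)
parent = upper | upper.T

def subsample(adj):
    """Keep each edge of the parent independently with probability s."""
    keep = np.triu(rng.random((n, n)) < s, k=1)
    kept = np.triu(adj, k=1) & keep
    return kept | kept.T

G1 = subsample(parent)
perm = rng.permutation(n)                      # hidden ground-truth alignment
G2 = subsample(parent)[np.ix_(perm, perm)]     # relabeled second observation

# The alignment problem: given only G1 and G2, estimate `perm`.
print("edge counts:", G1.sum() // 2, G2.sum() // 2)
```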
|
|
|
14:30 to 15:15 |
Anish Agarwal (Amazon, India) |
Causal Matrix Completion: Applications to Offline Causal Reinforcement Learning Matrix completion is the study of recovering an underlying matrix from a sparse subset of noisy observations. Traditionally, it is assumed entries of the matrix are “missing completely at random” (MCAR), i.e., each entry is revealed at random, independent of everything else, with uniform probability. This is likely unrealistic due to the presence of “latent confounders”, i.e., unobserved factors that determine both the entries of the underlying matrix and the missingness pattern in the observed matrix. For example, in the context of movie recommender systems—a canonical application for matrix completion—a user who vehemently dislikes horror films is unlikely to ever watch horror films. In general, these confounders yield “missing not at random” (MNAR) data, which can severely impact any inference procedure. We develop a formal causal model for matrix completion through the language of potential outcomes, and provide novel identification arguments for a variety of causal estimands of interest. We combine a nearest neighbors approach for matrix completion—popularly known as collaborative filtering—with the synthetic controls approach for panel data to design a simple two-step algorithm, which we call “synthetic nearest neighbors” (SNN), to estimate these causal estimands. We prove entry-wise finite-sample consistency and asymptotic normality of the SNN estimator for matrix completion with MNAR data.
Lastly, we show how algorithms for reinforcement learning, where the data is collected offline and there is potentially unobserved confounding in how actions are picked, can be effectively designed through the lens of causal tensor completion, an extension of matrix completion to higher dimensions.
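As a rough illustration of the nearest-neighbors ingredient mentioned above (not the SNN estimator from the talk), here is a simplified sketch of row-based nearest-neighbor imputation; the similarity measure, the parameter k, and the toy matrix are assumptions made for the example.

```python
# Simplified sketch of collaborative-filtering-style nearest-neighbor
# imputation: estimate a missing entry (i, j) by averaging column j over the
# rows most similar to row i on their commonly observed columns.
import numpy as np

def nn_impute(M, mask, i, j, k=2):
    """Estimate M[i, j] from the k rows closest to row i that observe column j.

    M    : data matrix (values at unobserved positions are ignored)
    mask : boolean matrix, True where an entry is observed
    """
    dists = []
    for r in range(M.shape[0]):
        if r == i or not mask[r, j]:
            continue
        common = mask[i] & mask[r]          # columns observed by both rows
        if common.sum() == 0:
            continue
        dists.append((np.mean((M[i, common] - M[r, common]) ** 2), r))
    neighbors = [r for _, r in sorted(dists)[:k]]
    return np.mean([M[r, j] for r in neighbors]) if neighbors else np.nan

# Toy usage: hide one entry of a small matrix and re-estimate it
M = np.array([
    [1., 2., 3.],    # entry (0, 2) will be hidden and re-estimated
    [1., 2., 3.],
    [1., 2., 3.],
    [9., 9., 9.],
])
mask = np.ones_like(M, dtype=bool)
mask[0, 2] = False
print("true 3.0, imputed:", nn_impute(M, mask, 0, 2, k=2))
```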
|
|
|
15:15 to 16:00 |
Siva Theja Maguluri (Georgia Tech, USA) |
Finite-time Convergence Guarantees of Contractive Stochastic Approximation: Mean-Square and Tail Bounds Motivated by applications in Reinforcement Learning (RL), this talk focuses on the Stochastic Approximation (SA) method to find fixed points of a contractive operator. First proposed by Robbins and Monro, SA is a popular approach for solving fixed-point equations when the information is corrupted by noise. We consider the SA algorithm for operators that are contractive under arbitrary norms (especially the l-infinity norm). We present finite-sample bounds on the mean-square error, which are established using a Lyapunov framework based on infimal convolution and the generalized Moreau envelope. We then present our more recent result on concentration of the tail error, even when the iterates are not bounded by a constant. These tail bounds are obtained using exponential supermartingales in conjunction with the Moreau envelope and a novel bootstrapping approach. Our results immediately imply the state-of-the-art sample complexity results for a large class of RL algorithms.
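To make the iteration concrete, here is a small sketch of stochastic approximation for a contractive operator on a toy Bellman-style fixed-point problem; the operator, noise level, and step-size schedule are assumptions chosen for illustration.

```python
# Sketch of contractive stochastic approximation: find the fixed point of an
# operator F that is a gamma-contraction in the sup norm, using only
# noise-corrupted evaluations, via x_{k+1} = x_k + alpha_k * (F(x_k) + noise - x_k).
import numpy as np

rng = np.random.default_rng(0)
n, gamma = 5, 0.9
P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)   # row-stochastic matrix
r = rng.random(n)

def F(x):
    """Bellman-like affine operator, a gamma-contraction in the l-infinity norm."""
    return r + gamma * P @ x

x_star = np.linalg.solve(np.eye(n) - gamma * P, r)   # exact fixed point, for reference

x = np.zeros(n)
for k in range(1, 50001):
    alpha = 1.0 / (1.0 + k / 1000.0)                 # diminishing step size (assumption)
    noisy_F = F(x) + 0.1 * rng.normal(size=n)        # noise-corrupted operator evaluation
    x = x + alpha * (noisy_F - x)

print("sup-norm error:", np.max(np.abs(x - x_star)))
```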
|
|
|
16:30 to 17:30 |
Michael Jordan (University of California, USA) |
An Alternative View on AI: Collaborative Learning, Incentives, Social Welfare, and Dynamics Artificial intelligence (AI) has focused on a paradigm in which intelligence inheres in a single, autonomous agent. Social issues are entirely secondary in this paradigm. When AI systems are deployed in social contexts, however, the overall design of such systems is often naive: a centralized entity provides services to passive agents and reaps the rewards. Such a paradigm need not be the dominant paradigm for information technology. In a broader framing, agents are active, they are cooperative, and they wish to obtain value from their participation in learning-based systems. Agents may supply data and other resources to the system, but only if it is in their interest to do so. Critically, intelligence inheres as much in the overall system as it does in individual agents, be they humans or computers. This is a perspective familiar in the social sciences, and a first goal in this line of work is to bring economics into contact with the computing and data sciences. The long-term goal is twofold: to provide a broader conceptual foundation for emerging real-world AI systems, and to upend received wisdom in the computational, economic, and inferential disciplines.
|
|
|