Monday, 05 August 2019
New challenges in Data Science necessitate the combination of optimization techniques and Monte Carlo methods. Incorporating optimization tools in Monte Carlo methods is a known technique for improving their efficiency, by learning on the fly a better value of some design parameters. Incorporating Monte Carlo techniques in optimization algorithms allows the introduction of (stochastic) numerical approximations of intractable quantities. In both cases, this yields algorithms which are perturbations of simpler ones whose asymptotic behavior is (usually) well known. When introducing these intertwinings, it is fundamental to ensure that they do not destroy convergence: convergence of the sampler to the target distribution, or convergence of the optimization method to the target set of solutions.
In the lectures, two points of view will be successively considered:

sufficient conditions on the adaptation mechanism of Monte Carlo algorithms, in order to identify the asymptotic distribution of the sampled points.

sufficient conditions on the stochastic perturbations, in order to ensure the correct convergence of the perturbed optimization algorithm. We will essentially consider the case of Stochastic Approximation (SA) algorithms (and possibly also Majorize-Minimization methods).
As a preliminary of these two parts, examples from Computational Statistics will be introduced.
In both parts, we will consider the Markovian case: Monte Carlo methods based on Markov chains (MCMC), and stochastic optimization fed with MCMC samples, which implies, as an unusual setting for SA, a biased stochastic approximation of intractable quantities.
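The biased-SA setting above can be sketched in a few lines: a Robbins-Monro recursion for the mean of a target distribution, fed with correlated samples from a random-walk Metropolis chain rather than i.i.d. draws. The standard Gaussian target and the step-size exponent below are illustrative choices, not taken from the lectures.

```python
import numpy as np

rng = np.random.default_rng(0)

def mh_step(x, log_target, scale=1.0):
    """One random-walk Metropolis step targeting exp(log_target)."""
    prop = x + scale * rng.normal()
    if np.log(rng.uniform()) < log_target(prop) - log_target(x):
        return prop
    return x

log_target = lambda x: -0.5 * x**2   # standard normal target, mean 0

theta, x = 5.0, 0.0
for n in range(1, 20001):
    x = mh_step(x, log_target)       # Markovian (hence biased) sample
    gamma = 1.0 / n**0.7             # Robbins-Monro step size
    theta += gamma * (x - theta)     # SA update for the mean

print(round(theta, 2))
```

Despite the Markovian noise, the iterate still converges to the mean of the target, which is the kind of behavior the sufficient conditions of the lectures are designed to guarantee.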
The task of estimating a matrix based on its noisy, partial observation has emerged as the canonical challenge across a variety of fields spanning Inference, Machine Learning and Statistics over the past decade or so. Popularized examples abound, including Recommendation Systems, Asymptotic Graph Theory (e.g. Graphons), Network Science (e.g. Community Detection), Social Data Processing (e.g. Ranking and Crowdsourcing), Causal Inference (e.g. Synthetic Control), Panel Data Analysis, Bioinformatics (e.g. DNA sequencing) and more.
The purpose of this tutorial is to provide a comprehensive survey of various algorithmic and analytic approaches developed over the past decade. The goal is to ground these developments in the context of a “universal” model through connections with the theory of exchangeability (cf. De Finetti (1937), Aldous and Hoover (1980s)). Particular attention will be paid to the statistical and computational tradeoffs that arise in this class of problems. Open questions pertaining to conjectured fundamental limits and mysterious empirical algorithmic successes will be discussed.
- Matrix Estimation: Model, Statement and Examples
- Overview of Algorithms
- Singular Value Thresholding
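A one-shot singular value thresholding estimator, in the spirit of the third session topic, can be sketched as follows; the rank-2 ground truth, the sampling rate and the threshold constant are illustrative choices, not the tutorial's.

```python
import numpy as np

rng = np.random.default_rng(1)

def svt_estimate(Y, mask, p, tau):
    """One-shot singular value thresholding: rescale observed entries
    by 1/p, then soft-threshold the singular values at tau."""
    filled = np.where(mask, Y, 0.0) / p
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    s = np.maximum(s - tau, 0.0)        # keep only the large spectrum
    return (U * s) @ Vt

# ground truth: rank-2, 200 x 200, entries observed independently w.p. p
n, r, p = 200, 2, 0.5
M = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))
mask = rng.uniform(size=(n, n)) < p
tau = 3.0 * np.sqrt(n / p)              # heuristic noise-level threshold
Mhat = svt_estimate(M, mask, p, tau)

rel_err = np.linalg.norm(Mhat - M) / np.linalg.norm(M)
print(round(rel_err, 2))
```

The rescaled, zero-filled matrix is an unbiased proxy for M, and thresholding removes the spectral contribution of the masking noise, leaving a low-rank estimate with small relative error.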
The course will discuss a variety of stochastic processes over networks, which are both of interest in their own right and also play a role in many applications in computer science, as well as the social and biological sciences.
Lecture 1: Rumours and first passage percolation
Consider the complete graph on n vertices, where there is an edge between every pair of vertices. We associate with each edge a random weight, or length, drawn from an exponential distribution, independently across edges. We begin by studying the length of the shortest path between two vertices: its expectation and its fluctuations. We go on to study some related combinatorial optimisation problems.
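A quick simulation of this model (my own sketch; n and the seed are arbitrary): put i.i.d. Exp(1) weights on the edges of K_n and run Dijkstra. The observed distance is tiny, consistent with the known log n / n scaling of typical distances in this model.

```python
import heapq
import numpy as np

rng = np.random.default_rng(2)
n = 200
# i.i.d. Exp(1) weight on each edge of the complete graph K_n
W = rng.exponential(1.0, size=(n, n))
W = np.triu(W, 1); W = W + W.T          # symmetrize, zero diagonal

def dijkstra(W, src):
    """Shortest weighted distances from src in a dense graph."""
    dist = np.full(len(W), np.inf); dist[src] = 0.0
    pq = [(0.0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue
        for v, w in enumerate(W[u]):
            if v != u and d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

d01 = dijkstra(W, 0)[1]                 # distance between vertices 0 and 1
print(round(d01, 3))
```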
Lecture 2: Voter model and variants
Consider a set of n agents connected by an arbitrary network. Each agent has an opinion, taking values in a finite set. Agents update their opinions by contacting other agents at random times and adopting their opinions. We study the time to consensus, and the probability of consensus on each possible value, in this model. We go on to discuss some variants of this model.
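A discrete-time sketch of the voter model on the complete graph (parameters illustrative): since the fraction of agents holding a given opinion is a martingale, the probability of consensus on an opinion equals its initial fraction, so over many runs with balanced random initial opinions each value should win roughly half the time.

```python
import numpy as np

rng = np.random.default_rng(3)

def voter_until_consensus(opinions):
    """Discrete-time voter model on the complete graph: a uniformly
    chosen agent copies the opinion of a uniformly chosen agent."""
    x = opinions.copy()
    n = len(x)
    while x.min() != x.max():
        i, j = rng.integers(n), rng.integers(n)
        x[i] = x[j]
    return x[0]

n = 50
# count how often consensus lands on opinion 1 over 200 runs
wins = sum(voter_until_consensus(rng.integers(0, 2, n)) for _ in range(200))
print(wins)
```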
Lecture 3: Epidemics
We study the contact process or SIS epidemic on a connected network. Here, each node can be infected or susceptible. Susceptible nodes are infected at some constant rate by each of their infected neighbours, while infected nodes spontaneously recover after a random time and become immediately susceptible. We describe the survival time of the epidemic in large networks, and a sharp transition that it exhibits in many random graph models.
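A discrete-time caricature of the SIS dynamics on a cycle (infection and recovery probabilities per step instead of rates; parameters illustrative): in the subcritical regime the epidemic dies out quickly, one side of the sharp transition described in the lecture.

```python
import numpy as np

rng = np.random.default_rng(4)

def sis_extinction_time(adj, beta, delta, max_steps=100000):
    """Discrete-time SIS on an adjacency-list graph: each infected node
    infects each susceptible neighbour w.p. beta per step, then
    recovers w.p. delta per step. Returns the extinction time."""
    n = len(adj)
    infected = np.zeros(n, bool); infected[0] = True
    for t in range(1, max_steps + 1):
        if not infected.any():
            return t
        new = infected.copy()
        for u in np.flatnonzero(infected):
            for v in adj[u]:
                if not infected[v] and rng.uniform() < beta:
                    new[v] = True
            if rng.uniform() < delta:
                new[u] = False
        infected = new
    return max_steps

# cycle on 100 nodes, subcritical rates: the epidemic dies out fast
n = 100
adj = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
t = sis_extinction_time(adj, beta=0.1, delta=0.8)
print(t)
```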
The talk will describe a controlled version of the Collatz-Wielandt formula for the principal eigenvalue of a nonnegative matrix, applied to the optimal risk-sensitive reward in discrete-time risk-sensitive reward problems. This in turn leads to a Donsker-Varadhan type variational formula for the same. Extensions to continuous-time risk-sensitive problems, and applications to control with chance constraints and linear/dynamic programming formulations for degenerate finite state/action risk-sensitive reward problems, are described.
Parts of these results are joint work with Venkat Anantharam, Ari Arapostathis, K. Suresh Kumar and Jerzy Filar.
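The classical (uncontrolled) Collatz-Wielandt formula behind the talk states that, for a nonnegative matrix A and a positive vector v, min_i (Av)_i/v_i <= lambda <= max_i (Av)_i/v_i, with equality at the Perron eigenvector. A small numerical check on a random positive matrix (my own illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.uniform(0.1, 1.0, size=(6, 6))   # positive matrix: Perron root exists

v = np.ones(6)
for _ in range(200):                      # power iteration toward Perron vector
    v = A @ v
    v /= v.sum()

ratios = (A @ v) / v                      # Collatz-Wielandt ratios
lo, hi = ratios.min(), ratios.max()
lam = max(abs(np.linalg.eigvals(A)))      # spectral radius = Perron eigenvalue
print(round(lo, 6), round(hi, 6), round(lam, 6))
```

As the iteration converges, the lower and upper Collatz-Wielandt bounds pinch together at the principal eigenvalue.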
Tuesday, 06 August 2019
- Convex Relaxation
- Nonparametric Nearest Neighbor
- Connection to Collaborative Filtering
- Nonconvex Approach: Matrix Estimation as a Case Study
- Beyond Matrix Estimation: Tensors, PCR, Time Series Imputation
Continuing from the short course lectures on the topic, this talk will present recent works on Stochastic Approximation combined with adaptive MCMC methods, motivated by problems in Computational Machine Learning. More precisely, we will consider the convergence of Proximal-Gradient based methods (also called “forward-backward” methods) for the optimization of a composite objective function, i.e. a function defined as the sum of a smooth part and a nonsmooth part. The convergence will be established, and we will discuss many implementation issues, such as the choice of the design parameters, the benefit of averaging techniques, the benefit of Nesterov acceleration schemes, etc.
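The deterministic forward-backward iteration underlying the talk can be sketched for the lasso objective 0.5*||Ax-b||^2 + lam*||x||_1, whose proximal map is soft thresholding. Problem sizes and lam are illustrative; the stochastic version discussed in the talk would replace the exact gradient with a noisy (e.g. MCMC-based) one.

```python
import numpy as np

rng = np.random.default_rng(6)
n, d = 60, 20
A = rng.normal(size=(n, d))
b = rng.normal(size=n)
lam = 1.0

soft = lambda x, t: np.sign(x) * np.maximum(np.abs(x) - t, 0.0)
obj = lambda x: 0.5 * np.sum((A @ x - b) ** 2) + lam * np.abs(x).sum()

L = np.linalg.norm(A, 2) ** 2            # Lipschitz constant of the gradient
x = np.zeros(d)
vals = [obj(x)]
for _ in range(300):
    g = A.T @ (A @ x - b)                # forward (gradient) step
    x = soft(x - g / L, lam / L)         # backward (proximal) step
    vals.append(obj(x))

print(round(vals[0], 3), round(vals[-1], 3))
```

With step size 1/L the composite objective decreases monotonically, which is the baseline behavior that stochastic perturbations must be shown not to destroy.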
Wednesday, 07 August 2019
I shall gently introduce and review a class of reversible spin systems evolving with constrained Glauber dynamics known as "kinetically constrained models" (KCM). After being introduced in the physics community in order to better understand, in simple models, some of the basic features of the so-called "glassy dynamics", they became the object of mathematical research starting with the first work by D. Aldous and P. Diaconis in 2001. KCM also appear naturally in some random walk models for upper triangular matrices, and some of the techniques developed to study their time evolution have been used in other contexts (e.g. dynamics of random surfaces). In order to make the three lectures as self-contained as possible, during the first lecture I shall quickly review the basic notions of mixing time and relaxation time for continuous-time, reversible Markov chains.
As a tool to understand the relative entropy decay and approach to stationarity of finite Markov chains, the so-called Modified Logarithmic Sobolev Inequality (MLSI) has been in the literature for at least a couple of decades. The study of various versions of the MLSI, and their relation to other functional inequalities on discrete spaces, was formally introduced (in 2003) by Bobkov and the speaker. While typically harder to establish than the spectral gap (Poincaré inequality), thanks to various researchers there has recently been impressive progress on this topic in a few directions: concentration of measure, mixing time of the matroid basis-exchange random walk, as well as connections to notions of discrete curvature.
The author will review the above in a series of 3 lectures in a reasonably self-contained manner and will mention several open problems.
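For a concrete feel of the entropy decay in question, one can check numerically on a small reversible chain that the relative entropy D(mu P^t || pi) is nonincreasing along the dynamics, and compute the spectral gap it is usually compared against. The birth-and-death chain below is my own toy example, not from the lectures.

```python
import numpy as np

# reversible birth-and-death chain on {0,...,4}
P = np.array([
    [0.50, 0.50, 0.00, 0.00, 0.00],
    [0.25, 0.50, 0.25, 0.00, 0.00],
    [0.00, 0.25, 0.50, 0.25, 0.00],
    [0.00, 0.00, 0.25, 0.50, 0.25],
    [0.00, 0.00, 0.00, 0.50, 0.50],
])
pi = np.array([1, 2, 2, 2, 1]) / 8.0      # stationary, by detailed balance

def kl(p, q):
    """Relative entropy D(p || q), with the 0 log 0 = 0 convention."""
    m = p > 0
    return float(np.sum(p[m] * np.log(p[m] / q[m])))

mu = np.array([1.0, 0, 0, 0, 0])          # start from a point mass
ents = []
for t in range(40):
    ents.append(kl(mu, pi))
    mu = mu @ P

gap = 1.0 - sorted(np.linalg.eigvals(P).real, reverse=True)[1]
print(round(ents[0], 3), round(ents[-1], 8), round(gap, 3))
```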
The “what ifs?”, or the ability to explore counterfactuals, is central to the study of causal inference. Randomized control and A/B testing provide an approach to address this when counterfactuals can be experimented with simultaneously. However, in a large number of scenarios, such as policy evaluation, this is not feasible: we can’t have two Massachusetts, one with Gun Control and the other without, at the same time, so that we can evaluate the impact of Gun Control on crime rate!
To address this challenge in a data-driven manner, the method of synthetic control was proposed by Abadie et al. (2003), where synthetic or virtual counterfactuals are formed using historical data: impose Gun Control in Massachusetts to observe the crime rate under Gun Control, and estimate the crime rate of Massachusetts without Gun Control as a combination of the crime rates of other states (without Gun Control). Despite the tremendous success of this method, it suffers from two limitations: it is not robust to noisy, missing observations, and it is not able to incorporate auxiliary information, such as the High School Dropout rate, when estimating counterfactuals with respect to Crime Rate. In this talk, we will describe a method that addresses these limitations, building on recent advances in matrix and tensor estimation, and present a novel application as a forecasting method for time series trajectories, exemplified through the game of Cricket.
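A stripped-down sketch of the synthetic control idea on simulated panel data (plain least-squares weights here, instead of the simplex-constrained weights of Abadie et al.; all data are synthetic): fit weights over donor units on the pre-treatment periods, then extrapolate the counterfactual trajectory.

```python
import numpy as np

rng = np.random.default_rng(7)
T0, T1, J = 40, 10, 5                    # pre/post periods, donor units
donors = rng.normal(size=(T0 + T1, J)).cumsum(axis=0)  # donor trajectories
w_true = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
target = donors @ w_true + 0.01 * rng.normal(size=T0 + T1)

# fit weights on pre-treatment data only
w, *_ = np.linalg.lstsq(donors[:T0], target[:T0], rcond=None)
counterfactual = donors[T0:] @ w         # synthetic post-treatment path

err = np.max(np.abs(counterfactual - target[T0:]))
print(round(err, 3))
```

Because the target truly is a combination of the donors here, the synthetic path tracks the held-out periods closely; the talk's matrix/tensor estimation machinery is what makes this robust to noise and missingness.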
Thursday, 08 August 2019
Sampling distributions over high-dimensional state spaces is a problem which has recently attracted a lot of research effort; applications include Bayesian nonparametrics, Bayesian inverse problems and aggregation of estimators. All these problems boil down to sampling a target distribution $\pi$ having a density w.r.t. the Lebesgue measure on $\mathbb{R}^d$, known up to a normalisation factor: $x \mapsto \frac{1}{Z}e^{-U(x)}$, where $U$ is continuously differentiable and smooth. In these lectures, we study a sampling technique based on the Euler discretization of the Langevin stochastic differential equation. Contrary to the Metropolis Adjusted Langevin Algorithm (MALA), we do not apply a Metropolis-Hastings correction. We obtain, for both constant and decreasing step sizes in the Euler discretization, non-asymptotic bounds for the convergence to stationarity in both total variation and Wasserstein distances. Particular attention is paid to the dependence on the dimension of the state space, to demonstrate the applicability of this method in the high-dimensional setting. These bounds are based on recently obtained estimates of the convergence of the Langevin diffusion to stationarity using either transport, Poincaré or log-Sobolev inequalities. We also investigate the convergence of an appropriately weighted empirical measure, and we report sharp bounds for the mean squared error and an exponential deviation inequality for Lipschitz functions.
We will also cover the stochastic version of the Langevin dynamics.
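The unadjusted Langevin scheme of the lectures, written for the toy target N(0,1) with $U(x) = x^2/2$ (step size and run length are illustrative): the Euler iteration $X_{k+1} = X_k - \gamma \nabla U(X_k) + \sqrt{2\gamma}\,\xi_{k+1}$, with no Metropolis-Hastings correction.

```python
import numpy as np

rng = np.random.default_rng(8)
gamma = 0.01                              # constant step size
grad_U = lambda x: x                      # U(x) = x^2/2, target N(0,1)

x, samples = 0.0, []
for k in range(200000):
    # Euler discretization of the Langevin SDE, no accept/reject step
    x = x - gamma * grad_U(x) + np.sqrt(2 * gamma) * rng.normal()
    samples.append(x)

s = np.array(samples[50000:])             # discard burn-in
print(round(s.mean(), 2), round(s.var(), 2))
```

The empirical moments land close to those of N(0,1); the constant step size leaves a small discretization bias (here of order gamma), which is exactly what the non-asymptotic bounds of the lectures quantify.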
There are many models of random symmetric square matrices for which the empirical distribution of eigenvalues converges to a nonrandom limiting distribution. Starting with Wigner, a powerful method to show this has been the method of moments, which works by first showing that the trace of any power of the matrix converges (after appropriate scaling) to a number, and secondly that these numbers form the moment sequence of a unique probability measure. However, it is in general a difficult problem to determine properties of a distribution, such as absolute continuity or smoothness of the density, from its moment sequence. In this talk we work with random Toeplitz and Hankel matrices and show the absolute continuity of the corresponding limiting distributions. The existence of the limiting distribution was already shown by Bryc, Dembo and Jiang (2003/06), and absolute continuity was settled for the Toeplitz case by Sen and Virag (2011). The result is new for the Hankel matrix. Our methods also work for Toeplitz and Hankel matrices defined by certain other groups, such as $\mathbb{Z}^d$. The key idea is to write the random matrix under consideration as a sum of two or more independent random circulant-like matrices.
All this is joint work with Anish Mallick.
Friday, 09 August 2019
We revisit policies for multi-armed bandits based on the cost-biased maximum likelihood method developed earlier for adaptive control of Markov chains. We establish regret bounds both in an expected as well as in a sample-path sense, and exhibit simulation results that suggest they have the best empirical regret performance compared to current alternatives for exponential-family reward distributions. We also provide comparative results vis-à-vis current candidates on computational times per decision.
[Joint work with Xi Liu, Ping-Chun Hsieh, and Anirban Bhattacharya].
We consider algorithms with unbounded computation time that probe the entries of the adjacency matrix of an n-vertex graph and output a clique. We show positive and negative results (what is doable versus what is unlikely), assuming the input graph is an Erdős-Rényi random graph with edge probability 1/2, under the assumption that the algorithm uses a constant number of rounds, with each round limited to a linear number (or, more generally, o(n^2)) of probes. Several questions remain open.
This is joint work with Uri Feige, David Gamarnik, Joe Neeman, and Miklos Racz.
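For intuition, a single adaptive probing pass already finds a clique of size about log2(n) in G(n, 1/2), by greedily keeping every vertex that is adjacent to all vertices kept so far (n and the seed below are arbitrary; this is a baseline, not one of the talk's algorithms).

```python
import numpy as np

rng = np.random.default_rng(12)
n = 2000
# adjacency matrix of G(n, 1/2)
A = np.triu(rng.integers(0, 2, size=(n, n)), 1)
A = A + A.T

clique = []
for v in range(n):                        # greedy single pass over vertices
    if all(A[v, u] for u in clique):      # probe edges to the current clique
        clique.append(v)

# every pair inside the output must be an edge
ok = all(A[u, w] for u in clique for w in clique if u != w)
print(len(clique), ok)
```

Each vertex survives the test with probability 2^(-k) when the current clique has size k, so the greedy clique grows to roughly log2(n); closing the factor-of-2 gap to the true clique number ~2 log2(n) under limited probing is related to the open questions of the talk.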
Saturday, 10 August 2019
Kinetically constrained models (KCM) are reversible interacting particle systems with continuous-time Markov dynamics of Glauber type, which have been extensively used in the physics literature to model the liquid-glass transition, a major and long-standing open problem in condensed matter physics. They also represent a natural stochastic (and non-monotone) counterpart of the family of cellular automata known as 𝒰-bootstrap percolation, thoroughly analyzed by P. Balister, B. Bollobás, H. Duminil-Copin, R. Morris, J. Przykucki, P. Smith and A. Uzzell. I shall present a series of universality results for the mean infection time of the origin for KCM, which have been obtained in various collaborations with C. Toninelli, L. Marêché, I. Hartarski, and R. Morris.
Consider a finitely banded Toeplitz matrix $T_N$ and let $E_N$ be an error matrix with $N^{1/2}\|E_N\|_{\rm HS} \to 0$ polynomially in its dimension $N$. In this talk, we will discuss spectral properties of $T_N+E_N$, such as the limit of the bulk of the spectrum, as well as the identification of the point process induced by the outliers. We will see that under a very mild assumption on the entries of $E_N$, the limit is a pushforward of the uniform measure on the unit circle by the symbol of the Toeplitz matrix $T_N$. Furthermore, there are no outliers outside the spectrum of the limiting Toeplitz operator, whereas in its complement the limit of the point process induced by the outliers is given by the zero set of some random analytic function. These confirm the predictions based on the pseudospectrum of $T_N$.
The talk is based on joint works with Elliot Paquette and Ofer Zeitouni.
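The phenomenon is easy to observe numerically for the simplest banded Toeplitz matrix, the shift with symbol $f(z) = z$: unperturbed, it is nilpotent with all eigenvalues 0, yet a tiny random perturbation spreads the spectrum out near the unit circle, the image of the symbol (the perturbation scale below is an illustrative choice).

```python
import numpy as np

rng = np.random.default_rng(9)
N = 200
T = np.diag(np.ones(N - 1), 1)           # shift matrix: symbol f(z) = z
E = 1e-8 * rng.normal(size=(N, N))       # vanishingly small perturbation
ev = np.linalg.eigvals(T + E)

# T alone has spectrum {0}; the perturbed eigenvalues cluster near
# the unit circle, i.e. the image of the symbol of T
moduli = np.abs(ev)
print(round(moduli.mean(), 2), round(moduli.max(), 2))
```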
Variational Bayesian (VB) methods have emerged as a highly competitive, fast alternative to Markov chain Monte Carlo methods for computing Bayesian posteriors. The essential idea behind VB is to turn the posterior computation problem into one of optimization over an appropriate measure (sub)space. This talk will first present an overview of the VB method and recent progress on understanding properties of VB posterior estimators, in particular asymptotic consistency. I will then pivot towards our recent work on using VB methods for data-driven decision-making problems contextualized by parameterized stochastic models, where tail risks are important. In particular, I will present our recent work on bounding optimality gaps for a risk-sensitive VB methodology and point out connections to recent work on fairness in machine learning methods.
This is joint work with Vinayak A. Rao (Purdue Statistics) and Prateek Jaiswal (Purdue Industrial Engineering).
We consider a homogeneous random connection model (RCM) and an inhomogeneous random connection model (iRCM) in the connectivity regime. The vertex set is a homogeneous Poisson point process on the unit torus. An edge exists between any pair of nodes with a probability specified by a connection function. In the iRCM, the vertices are endowed with independently distributed weights, and the connection function depends on these weights. The weights are assumed to have a heavy-tailed distribution. We shall discuss scaling regimes under which the numbers of isolated nodes in the two models stabilize as the intensity becomes large. In this regime, the number of isolated nodes converges in distribution to a Poisson random variable. We also discuss sufficient conditions for the graph to be connected with high probability.
While most theoretical analyses of stochastic gradient descent, as applied to empirical risk minimization problems, consider sampling stochastic gradients with replacement, in practice stochastic gradient descent without replacement (SGDo) is the method of choice, as it is empirically observed to have a much better rate of convergence (Bottou, 2009). However, SGDo's convergence properties are not well understood, as sampling without replacement leads to coupling between iterates and gradients. By using the method of exchangeable pairs to bound the Wasserstein distance, we provide the first non-asymptotic results for SGDo when applied to general smooth, strongly convex functions. In particular, we show that SGDo converges at a rate of $O(1/K^2)$, while SGD is known to converge at an $O(1/K)$ rate, where $K$ denotes the number of passes over the data and is required to be large enough. Existing results for SGDo in this setting require an additional Hessian Lipschitz assumption (Gurbuzbalaban et al., 2015; HaoChen & Sra, 2018). For small $K$, we show that SGDo can achieve the same convergence rate as SGD for general smooth, strongly convex functions. Existing results in this setting require $K=1$ and hold only for generalized linear models (Shamir, 2016).
Joint work with Dheeraj Nagaraj (MIT) and Prateek Jain (MSR India).
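The with-replacement versus without-replacement distinction can be sketched on a toy one-dimensional least-squares problem; this is only an illustration of the two sampling schemes, not the paper's analysis, and all step sizes and problem sizes below are illustrative.

```python
import random

def sgd(a, b, epochs, lr, replacement, seed):
    """Minimise f(w) = (1/n) * sum_i (a_i*w - b_i)^2 by stochastic gradient
    steps, sampling indices with replacement (SGD) or running through a fresh
    random permutation each epoch (SGDo, sampling without replacement)."""
    rng = random.Random(seed)
    n = len(a)
    w = 0.0
    for _ in range(epochs):
        order = ([rng.randrange(n) for _ in range(n)] if replacement
                 else rng.sample(range(n), n))
        for i in order:
            grad = 2.0 * a[i] * (a[i] * w - b[i])   # gradient of one summand
            w -= lr * grad
    return w

rng = random.Random(0)
a = [1.0 + rng.random() for _ in range(50)]        # a_i > 0: strongly convex
b = [2.0 * ai + 0.1 * rng.gauss(0, 1) for ai in a]
# Closed-form least-squares optimum for comparison.
w_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)
w_sgd = sgd(a, b, epochs=200, lr=0.002, replacement=True, seed=1)
w_sgdo = sgd(a, b, epochs=200, lr=0.002, replacement=False, seed=1)
print(abs(w_sgd - w_star), abs(w_sgdo - w_star))
```

Both variants approach the optimum here; the $O(1/K^2)$ versus $O(1/K)$ separation in the abstract concerns the rate at which they do so over many passes.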
In this talk I will discuss Hamiltonian Monte Carlo (HMC), a Metropolis–Hastings algorithm designed to sample from a target probability density π on R^d. The method was first proposed by Duane et al. (1987) in computational physics. It was later introduced to the statistics community in the early work of Neal (1993) and quickly gained popularity; see for example Liu (2008, chap. 9), Neal (2011) and Girolami and Calderhead (2011). The most attractive feature of the HMC algorithm is that it allows proposals, obtained by integrating a system of Hamiltonian equations, that are far away from the current position yet still have a high probability of being accepted. The HMC algorithm therefore offers promise for eliminating the random walk behavior of most classical Monte Carlo algorithms. The distance between the current state and the proposal is controlled by the length of the time interval over which the Hamiltonian equations are integrated; see Liu (2008, chap. 9) and Sanz-Serna (2014).
I will discuss the irreducibility and geometric ergodicity of the Hamiltonian Monte Carlo (HMC) algorithm.
We consider cases where the number of steps of the symplectic integrator is either fixed or random. Under mild conditions on the potential F associated with the target distribution π, we first show that the Markov kernel associated with the HMC algorithm is irreducible and recurrent.
Under more stringent conditions, we then establish that the Markov kernel is Harris recurrent. Finally, we provide verifiable conditions on F under which the HMC sampler is geometrically ergodic.
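A minimal sketch of the HMC transition described above, with a leapfrog integrator and a fixed number of steps, targeting a one-dimensional standard Gaussian; all tuning parameters here are illustrative choices, not from the talk.

```python
import math
import random

def hmc_step(x, logpi_grad, logpi, eps, n_leap, rng):
    """One HMC transition targeting density pi on R: draw a Gaussian momentum,
    integrate Hamilton's equations with the leapfrog scheme for n_leap steps,
    then Metropolis accept/reject to correct the discretisation error."""
    p = rng.gauss(0.0, 1.0)
    x_new, p_new = x, p
    p_new += 0.5 * eps * logpi_grad(x_new)         # initial half momentum step
    for k in range(n_leap):
        x_new += eps * p_new                       # full position step
        if k < n_leap - 1:
            p_new += eps * logpi_grad(x_new)       # full momentum step
    p_new += 0.5 * eps * logpi_grad(x_new)         # final half momentum step
    # Hamiltonian H(x, p) = -log pi(x) + p^2 / 2; accept w.p. exp(H_old - H_new).
    h_old = -logpi(x) + 0.5 * p * p
    h_new = -logpi(x_new) + 0.5 * p_new * p_new
    if math.log(rng.random()) < h_old - h_new:
        return x_new
    return x

logpi = lambda x: -0.5 * x * x      # standard Gaussian, up to a constant
grad = lambda x: -x
rng = random.Random(0)
x, samples = 0.0, []
for _ in range(5000):
    x = hmc_step(x, grad, logpi, eps=0.3, n_leap=10, rng=rng)
    samples.append(x)
mean = sum(samples) / len(samples)
var = sum(s * s for s in samples) / len(samples)
print(mean, var)
```

With ten leapfrog steps the proposal travels far from the current state while the energy error, and hence the rejection rate, stays small, which is exactly the feature the abstract highlights.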
References:
Duane, S., A. D. Kennedy, B. J. Pendleton, and D. Roweth. 1987. “Hybrid Monte Carlo.” Physics Letters B 195 (2): 216–22. https://doi.org/10.1016/0370-2693(87)91197-X.
Girolami, M., and B. Calderhead. 2011. “Riemann Manifold Langevin and Hamiltonian Monte Carlo Methods.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 (2): 123–214. https://doi.org/10.1111/j.1467-9868.2010.00765.x.
Liu, J. S. 2008. Monte Carlo Strategies in Scientific Computing. Springer.
Neal, R. M. 1993. “Probabilistic Inference Using Markov Chain Monte Carlo Methods.” CRG-TR-93-1. Department of Computer Science, University of Toronto.
———. 2011. “MCMC Using Hamiltonian Dynamics.” In Handbook of Markov Chain Monte Carlo, 113–62. Chapman & Hall/CRC Handbooks of Modern Statistical Methods. Boca Raton, FL: CRC Press.
Sanz-Serna, J. M. 2014. “Markov Chain Monte Carlo and Numerical Differential Equations.” In Current Challenges in Stability Issues for Numerical Differential Equations: Cetraro, Italy 2011, edited by Luca Dieci and Nicola Guglielmi, 39–88. Cham: Springer. https://doi.org/10.1007/978-3-319-01300-8_2.
Monday, 12 August 2019
In this series of lectures we will review recent results at the intersection of optimal transport, stochastic OR and statistics. After reviewing basic notions of optimal transport costs and Wasserstein distances, we will discuss distributionally robust performance analysis and optimization results. For example, we will show how the theory of diffusion approximations can be harnessed to provide model-robust sample path estimates for general stochastic systems. In addition, using the same mathematical principles, we will show that many machine learning algorithms, such as sqrt-Lasso, regularized logistic regression, group Lasso, adaptive Lasso and support vector machines, among many others, admit distributionally robust representations based on optimal transport costs. Additional connections to artificial intelligence via generative networks will also be discussed. Finally, we will discuss the fundamental principles of a novel statistical methodology which can be used to optimally choose the uncertainty size in distributionally robust formulations.
Latent Gaussian processes are widely applied in many fields, such as statistics, inverse problems, and machine learning. A popular method for inference is through the posterior distribution, sampling from which is typically carried out by Markov chain Monte Carlo (MCMC) algorithms. However, the infinite-dimensional framework creates certain technical hurdles which need to be addressed with care. Taking a cue from recent developments in adaptive Metropolis-adjusted MCMC algorithms, we propose a family of proposals to sample from "good" infinite-dimensional measures. We discuss the relevant issues concerning convergence of our proposed schemes, and also demonstrate their efficiency via standard computational examples.
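One well-known member of such dimension-robust proposal families is the preconditioned Crank–Nicolson (pCN) proposal; the sketch below is that standard example, not necessarily the family proposed in the talk, and illustrates why its acceptance rate does not collapse as the discretization dimension grows.

```python
import math
import random

def pcn_chain(log_like, dim, beta, n_iter, seed=0):
    """Preconditioned Crank-Nicolson MCMC for a posterior with a standard
    Gaussian N(0, I) prior on R^dim.  The proposal
        y = sqrt(1 - beta^2) * x + beta * xi,   xi ~ N(0, I),
    is reversible with respect to the prior, so the Metropolis ratio involves
    only the likelihood, and the acceptance rate does not degenerate with dim."""
    rng = random.Random(seed)
    x = [0.0] * dim
    ll_x = log_like(x)
    accepts = 0
    for _ in range(n_iter):
        y = [math.sqrt(1 - beta ** 2) * xi + beta * rng.gauss(0, 1) for xi in x]
        ll_y = log_like(y)
        if math.log(rng.random()) < ll_y - ll_x:   # likelihood-only ratio
            x, ll_x = y, ll_y
            accepts += 1
    return x, accepts / n_iter

# Toy likelihood: a noisy Gaussian observation of the first coordinate only.
log_like = lambda u: -0.5 * (u[0] - 1.0) ** 2
_, rate_small = pcn_chain(log_like, dim=5, beta=0.2, n_iter=2000)
_, rate_large = pcn_chain(log_like, dim=500, beta=0.2, n_iter=2000)
print(rate_small, rate_large)
```

A random walk proposal would see its acceptance rate drop sharply with dimension on the same problem; the prior-reversible pCN move avoids this, which is the key to "good" samplers on infinite-dimensional measures.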
A stochastic process with discrete or continuous time parameter is called regenerative if, after some initial segment, the process can be broken into i.i.d. components. Examples include Brownian motion in one and two dimensions, Ornstein–Uhlenbeck processes, recurrent irreducible Markov chains with discrete state space, Harris irreducible and recurrent Markov chains on a general state space, and semi-Markov processes driven by a recurrent irreducible Harris chain. The invariant measure need not be finite.
Some ergodic theorems will be discussed and applications will be given.
Tuesday, 13 August 2019
Nash equilibria for symmetric multi-player games are notoriously hard to compute, even for moderate numbers of players. In this course, I will describe how, as the number of players goes to infinity, these equilibria can be approximated by solutions to what is known as a mean-field game, which can be viewed as a game for a continuum of players. I will first describe static games and related approximation theorems, and then discuss results for differential games.
We give a tutorial overview of program synthesis, from its first formulation by Church in 1957, through its pragmatic evolution via sketching and programming-by-examples, and compare program synthesis with supervised machine learning. We then present our recent efforts in combining program synthesis and machine learning techniques to solve the problem of synthesizing extractors from heterogeneous data, as well as in solving NLP problems. Finally, we explore several opportunities at the intersection of program synthesis (and, more broadly, the PL community) and machine learning, such as pruning and ranking programs during synthesis, neural program synthesis, and automatic differentiation.
Rigidity of eigenvalues about their classical locations is a topic of interest in random matrix theory that has seen much progress in recent years, both in the classical Gaussian case and under more general universal settings. We shall recall some of these results, and discuss two examples of how rigidity results for certain classical random matrix ensembles lead to interesting consequences in exactly solvable models of last passage percolation via some remarkable bijections.
Wednesday, 14 August 2019
Talk 1: Sequential Stopping for Parallel Monte Carlo
The most natural way to run many Monte Carlo computations is to sample until a given error tolerance is achieved. The solution of this problem goes back to Chow and Robbins when the underlying variance can be easily estimated from the samples generated. But there are many Monte Carlo problems in which estimating variances is difficult. We discuss a new procedure for use in such settings, in which the Bessel process turns out to play a key role. It turns out to be especially well suited for parallel computing applications.
The work discussed in this talk is joint with Jing Dong.
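For contrast with the talk's setting, the following is a sketch of the classical Chow–Robbins-style rule for the easy case in which the variance is estimable from the samples; the tolerance, confidence level, and minimum sample size are illustrative choices.

```python
import math
import random

def chow_robbins(sampler, eps, z=1.96, min_n=30):
    """Fixed-width sequential stopping in the Chow-Robbins spirit: keep
    sampling until the half-width of the normal-approximation confidence
    interval, built from the running variance estimate, drops below eps.
    Returns (estimate, number of samples used)."""
    xs = []
    while True:
        xs.append(sampler())
        n = len(xs)
        if n < min_n:
            continue                               # burn-in before stopping
        mean = sum(xs) / n
        var = sum((x - mean) ** 2 for x in xs) / (n - 1)
        if z * math.sqrt(var / n) <= eps:          # CI half-width small enough
            return mean, n

rng = random.Random(0)
est, n_used = chow_robbins(lambda: rng.gauss(5.0, 2.0), eps=0.1)
print(est, n_used)
```

The sample size is chosen adaptively by the data; the hard case the talk addresses is precisely when the variance estimate feeding this rule is unreliable.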
Talk 2: Nonstationary Markov Processes: Approximations and Numerical Methods
In many Markov modeling contexts, the system under consideration exhibits strong time-of-day effects, day-of-week effects, or seasonality effects. In fact, most real-world systems that are modeled as Markov processes exhibit such nonstationarities. Nevertheless, the great majority of the academic literature focuses on modeling and theory for Markov processes with stationary transition probabilities, which describe systems that have no such time-of-day effects. In this talk, we will briefly describe three different methodologies for handling nonstationary Markov models. The first approach involves analytical approximations, anchored on traditional stationary analysis, that are available when the transition probabilities are slowly changing over the time horizon in question. The second class of methods relates to the use of stable numerical solvers for the Kolmogorov differential equations that exploit the stochastic structure present in these models. Finally, the third methodology involves the use of Monte Carlo simulation to study such nonstationary processes. In this setting, we will discuss the question of how to initialize such simulations, and the role of backwards coupling in this context.
This work is joint with Harsha Honnappa, Alex Infanger, Mohammad Mousavi, and Zeyu Zheng.
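A minimal sketch of the second methodology above: integrating the Kolmogorov forward equations for a toy two-state chain with a sinusoidal time-of-day effect, using a plain Euler step as a stand-in for the stable solvers the talk discusses.

```python
import math

def kolmogorov_forward(p0, Q, t_end, dt=1e-3):
    """Integrate the Kolmogorov forward equations p'(t) = p(t) Q(t) for a
    finite-state Markov chain with time-dependent generator Q(t), by explicit
    Euler steps.  Row sums of Q are zero, so total probability is conserved."""
    p = list(p0)
    n = len(p)
    t = 0.0
    for _ in range(int(round(t_end / dt))):
        Qt = Q(t)
        p = [p[j] + dt * sum(p[i] * Qt[i][j] for i in range(n))
             for j in range(n)]
        t += dt
    return p

# Two-state chain with a sinusoidal "time-of-day" up-rate and constant down-rate.
def Q(t):
    lam = 1.0 + 0.5 * math.sin(2 * math.pi * t)
    mu = 2.0
    return [[-lam, lam], [mu, -mu]]

p = kolmogorov_forward([1.0, 0.0], Q, t_end=5.0)
print(p)
```

Because the generator's rows sum to zero, the Euler update conserves total probability exactly; stability and stiffness are what make industrial-scale versions of this computation delicate.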
Talk 3: Experimentation with Temporal Interference: Adaptive Markov Chain Sampling and Estimation
For many experiments, the treatment is a new algorithm or policy (e.g., a matching policy in a ride-sharing system, delivery system, or ad auction). These policies change the state of the system, and so classical A/B tests that randomize platform participants to treatment or control are infeasible. Instead, the typical design is a “switchback” experiment: each policy is alternately applied in successive time intervals, and then the performance of each policy is estimated by its sample average across intervals where it was used.
Despite their appeal, switchback designs suffer from temporal interference: the application of policy A in one interval affects the initial state of the system as seen by policy B in the next interval. Motivated by the need to mitigate temporal interference, in this talk we study optimal experimental design and estimation for the comparison of two policies. Our key innovation is to model the problem as one of efficiently estimating the difference of the steady-state reward between two different Markov chains on the same state space (the two policies).
While the dynamic programming optimality principle does not hold here, it turns out that the optimal policy can be computed as the solution to a convex optimization problem over the standard polyhedron of feasible state-action frequencies.
This is joint work with Ramesh Johari and Mohammad Rasouli.
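The estimand and the interference problem can be illustrated on a toy pair of two-state chains: long separate runs recover each steady-state reward, while a naive switchback with short intervals inherits the state left by the other policy and badly distorts the comparison. All transition matrices and interval lengths below are illustrative, not from the talk.

```python
import random

def step(x, P, rng):
    """One transition of a finite Markov chain with matrix P from state x."""
    u, acc = rng.random(), 0.0
    for y, pxy in enumerate(P[x]):
        acc += pxy
        if u < acc:
            return y
    return len(P) - 1

def long_run_reward(P, reward, n_steps, rng):
    """Estimate the steady-state reward of one chain by a single long run."""
    x, total = 0, 0.0
    for _ in range(n_steps):
        total += reward[x]
        x = step(x, P, rng)
    return total / n_steps

def switchback(P_a, P_b, reward, interval, n_intervals, rng):
    """Naive switchback estimate of the steady-state reward difference:
    alternate the two chains in short intervals on one trajectory.  Each
    interval starts from the state the *other* policy left behind, which is
    exactly the temporal interference discussed above."""
    x, sums, counts = 0, [0.0, 0.0], [0, 0]
    for k in range(n_intervals):
        P = P_a if k % 2 == 0 else P_b
        for _ in range(interval):
            sums[k % 2] += reward[x]
            counts[k % 2] += 1
            x = step(x, P, rng)
    return sums[0] / counts[0] - sums[1] / counts[1]

# Policy A: sticky chain, stationary reward 1/2.  Policy B: biased toward
# state 0, stationary reward 1/3.  True difference: 1/6.
P_a = [[0.9, 0.1], [0.1, 0.9]]
P_b = [[0.8, 0.2], [0.4, 0.6]]
reward = [0.0, 1.0]
rng = random.Random(0)
true_diff = (long_run_reward(P_a, reward, 200000, rng)
             - long_run_reward(P_b, reward, 200000, rng))
sb_diff = switchback(P_a, P_b, reward, interval=2, n_intervals=50000, rng=rng)
print(true_diff, sb_diff)
```

With interval length 2 the sticky chain never reaches its own steady state within its intervals, so the switchback estimate lands far from the true difference; longer intervals reduce this bias at the cost of fewer comparisons, which is the design trade-off the talk optimizes.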
Empirical findings have shown that many real-world networks share fascinating features. Indeed, many real-world networks are small worlds, in the sense that typical distances are much smaller than the size of the network. Further, many real-world networks are scale-free, in the sense that there is high variability in the number of connections of the elements of the networks, making them highly inhomogeneous.
Spurred by these empirical findings, many models have been proposed for such networks. In this lecture series, we discuss empirical findings of real-world networks, and describe some of the random graph models proposed for them. In particular, we will discuss the classical Erdős–Rényi random graph, and then move to the more relevant configuration model, generalized random graphs and preferential attachment models. We then discuss the small-world phenomenon in these random graph models and its link to 'six degrees of separation'. We highlight some of the ingredients used in the proofs, namely the tree-like nature of the random graphs under consideration and their connection to branching processes.
We close by discussing the more recent problems of information diffusion on random graphs, where the edges are equipped with general independent and identically distributed edge weights, leading to first-passage percolation. Such first-passage percolation problems can also be used to describe epidemics on networks, and competition between different species exploring a network.
**Lecture 1**: Realworld networks and random graphs
This lecture series is based on joint work with, amongst others: Gerard Hooghiemstra, Shankar Bhamidi, Júlia Komjáthy, Piet Van Mieghem, Henri van den Esker, and Dmitri Znamenski.
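A quick empirical check of the small-world heuristic on the simplest of the models above: for the Erdős–Rényi graph G(n, p) with mean degree c = np, typical graph distances grow like log n / log c. The sketch below estimates the median distance from one vertex by breadth-first search; n and c are illustrative.

```python
import math
import random
from collections import deque

def gnp(n, p, rng):
    """Adjacency lists of an Erdos-Renyi random graph G(n, p)."""
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def bfs_dist(adj, s):
    """Graph distances from s to every vertex in its component."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

rng = random.Random(0)
n, c = 1000, 8.0                      # mean degree c = n * p
adj = gnp(n, c / n, rng)
dist = bfs_dist(adj, 0)
typical = sorted(dist.values())[len(dist) // 2]   # median distance from vertex 0
print(typical, math.log(n) / math.log(c))         # both of order log n / log c
```

The median distance is a small single-digit number even for a thousand vertices, matching the branching-process picture in which the ball of radius k around a vertex grows like c^k.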
Sampling-based methods arise in many statistical, computational, and engineering settings. In engineering settings, sampling can provide an easy means of constructing distributed algorithms that scale well and avoid the need for centralized information-gathering. In computational environments, the use of sampling often leads to algorithms whose complexity is relatively insensitive to dimensional effects, largely overcoming the “curse of dimensionality”. In this talk, we will give an overview of these ideas and discuss some additional problem contexts within which sampling-based approaches are proving fruitful.
Thursday, 15 August 2019
**Lecture 2**: Degree structure and connectivity in random graphs.
This lecture series is based on joint work with, amongst others: Gerard Hooghiemstra, Shankar Bhamidi, Júlia Komjáthy, Piet Van Mieghem, Henri van den Esker, and Dmitri Znamenski.
We consider infinite-horizon (ergodic) risk-sensitive control problems for a class of diffusions controlled through the drift and driven by a jump Lévy process and a nondegenerate Wiener process. Under suitable hypotheses, we fully characterize the value function by analyzing the associated integro-differential equations. In the process, we also study the Dirichlet eigenvalue problem for bounded smooth domains. If time permits, we will also discuss a few related control problems.
A special case of the classical Dvoretzky theorem states that the space of $n$-dimensional real vectors equipped with the $\ell_1$ norm admits a large “Euclidean section”, i.e. a subspace of dimension $\Theta(n)$ on which a scaled $\ell_1$ norm essentially agrees with the Euclidean norm. In particular, such a subspace can be realized as the column space of a "tall" $n \times (n/k)$ random matrix $A$ with identically distributed independent Gaussian entries $(k > 1)$.
This special case of the Dvoretzky theorem has a natural interpretation in the setting of encoding real vectors for transmission across an adversarial noisy channel when the vector $x$ to be encoded is given in advance (the so-called “block coding” scenario), so that the encoding can be computed in an “offline” fashion. Motivated by the same problem in the setting when the encoding has to be “online”, i.e., has to be carried out as each entry of $x$ becomes available, we give randomized constructions of triangular matrices with properties similar to Dvoretzky matrices. The guarantees provided by these constructions in the “online” scenario are close to, but still somewhat worse than, those provided by the Dvoretzky theorem in the “block” scenario, and the question of finding the optimal triangular version of the Dvoretzky theorem remains open.
Joint work with Leonard J. Schulman.
The study of $r \to p$ operator norms of large random matrices is a problem of importance in many areas; for example, when $r = p = 2$, this reduces to the question of finding the largest singular value of the matrix, whereas when $p$ is the Hölder conjugate of $r$, the problem reduces to the famous Grothendieck $\ell_p$ problem, which is of relevance in optimization. We analyze this norm for fairly general sequences of nonnegative random matrices with dimension growing to infinity, and show that it satisfies a slightly nonstandard central limit theorem. Along the way, we obtain sharp approximations for the maximizing vector, which may be of independent interest.
This is based on joint work with Debankur Mukherjee and Souvik Dhara.
Consider a set of $n$ nodes, each of whom has a single message or packet to transmit to all the other nodes. The nodes are connected via a random graph with constant edge probability $p$. Time is split into rounds where, in each round, each node may broadcast a single packet to all its neighbours, simultaneously with all other nodes. The goal is to minimise the time until all nodes have received all packets.
We study two algorithms for this problem. First, we study a random relaying scheme and show that the number of rounds it requires is bounded below and above by constant multiples of $\log n$. Next, we consider a scheme that uses random network coding, and show that it can disseminate the message to all nodes in a constant number of rounds, with high probability.
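The random relaying scheme can be sketched directly: each node repeatedly broadcasts a uniformly chosen packet from its current collection. The graph size and edge probability below are illustrative, and the simulation assumes the sampled graph is connected (which holds with overwhelming probability at these parameters).

```python
import random

def broadcast_rounds(n, p, seed=0):
    """Random relaying on G(n, p): every node starts with its own packet; in
    each round every node broadcasts one uniformly chosen packet from its
    current collection to all its neighbours, simultaneously.  Returns the
    number of rounds until every node holds all n packets."""
    rng = random.Random(seed)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    have = [{i} for i in range(n)]      # packets currently held by each node
    rounds = 0
    while any(len(h) < n for h in have):
        # Choices are fixed before any delivery, so all broadcasts in a
        # round happen simultaneously.
        sent = [rng.choice(sorted(h)) for h in have]
        for i in range(n):
            for j in adj[i]:
                have[j].add(sent[i])
        rounds += 1
    return rounds

rounds = broadcast_rounds(60, 0.5)
print(rounds)
```

The round count here is a small multiple of log n, consistent with the bound stated in the abstract; the network coding scheme improves this to a constant by sending random linear combinations of packets instead.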
Friday, 16 August 2019
**Lecture 3**: Smallworld phenomenon in random graphs.
This lecture series is based on joint work with, amongst others: Gerard Hooghiemstra, Shankar Bhamidi, Júlia Komjáthy, Piet Van Mieghem, Henri van den Esker, and Dmitri Znamenski.
We consider reconstruction of a manifold, or invariant manifold learning, where a smooth Riemannian manifold $M$ is determined from intrinsic distances (that is, geodesic distances) of points in a discrete subset of $M$. In the problem studied, the Riemannian manifold $(M,g)$ is considered as an abstract metric space with intrinsic distances, not as an embedded submanifold of an ambient Euclidean space. Let $\{X_1,X_2,\dots,X_N\}$ be a set of $N$ points sampled randomly from an unknown Riemannian manifold $M$. We assume that we are given the numbers $D_{jk}=d_M(X_j,X_k)+\eta_{jk}$, where $j,k\in \{1,2,\dots,N\}$. Here, $d_M(X_j,X_k)$ are geodesic distances and the $\eta_{jk}$ are independent, identically distributed random variables such that $\mathbb{E}\, e^{\eta_{jk}}$ is finite. We show that when $N$ is large enough, it is possible to construct an approximation of the Riemannian manifold $(M,g)$ with high probability. This problem is a generalization of the geometric Whitney problem with random measurement errors. We also consider the case when the information on the noisy distance $D_{jk}$ of points $X_j$ and $X_k$ is missing with some probability. In particular, we consider the case when we have no information on points that are far away.
This is joint work with Charles Fefferman, Sergei Ivanov and Matti Lassas.
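A standard Euclidean baseline for this kind of reconstruction problem is classical multidimensional scaling, which recovers an embedding from a noisy pairwise distance matrix. The sketch below uses a flat toy "manifold" (a square), whereas the construction in the talk handles genuinely Riemannian metrics; it is a baseline, not the talk's method.

```python
import numpy as np

def classical_mds(D, dim):
    """Classical multidimensional scaling: given an N x N matrix of (noisy)
    pairwise distances, return an embedding in R^dim whose Euclidean
    distances approximate them.  Exact when the underlying metric is
    Euclidean and noise-free."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                    # double-centred Gram matrix
    w, v = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]                # top-dim eigenpairs
    return v[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

rng = np.random.default_rng(0)
X = rng.random((200, 2))                           # flat toy "manifold"
D_true = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
noise = 0.01 * rng.standard_normal(D_true.shape)
D = D_true + (noise + noise.T) / 2                 # symmetric noisy distances
np.fill_diagonal(D, 0.0)
Y = classical_mds(D, 2)
D_hat = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1))
err = np.abs(D_hat - D_true).mean()
print(err)
```

For curved manifolds, geodesic and Euclidean distances disagree, which is precisely why the intrinsic, coordinate-free reconstruction studied in the talk is needed.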
We shall discuss the limiting behaviour of the largest eigenvalue of the adjacency matrix of an inhomogeneous Erdős–Rényi random graph. We shall show that in certain cases, after appropriate centering and scaling, the largest eigenvalue converges in distribution to a Gaussian random variable. If time permits, we shall discuss open problems regarding the spectra of the inhomogeneous Erdős–Rényi random graph and the configuration model.
The talk is based on joint work with Arijit Chakrabarty and Sukrit Chakraborty.
Classical multi-armed bandit problems use the expected value of an arm as a metric to evaluate its goodness. However, the expected value is a risk-neutral metric. In many applications, such as finance, one is interested in balancing the expected return of an arm (or portfolio) with the risk associated with that return. In this work, we consider the problem of selecting the arm that optimizes a linear combination of the expected reward and the associated Conditional Value at Risk (CVaR). We allow the reward distributions to be unbounded or even heavy-tailed. For this problem, we devise algorithms that are entirely environment oblivious, i.e., the algorithm is not aware of any information on the reward distributions, including bounds on the moments/tails, or the suboptimality gaps across arms.
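The risk-adjusted criterion can be sketched directly from samples. Below, the empirical CVaR is taken as the mean of the worst α-fraction of rewards, and the weight ξ on CVaR is an illustrative parameterization of the linear combination, not necessarily the one in the work.

```python
import random

def cvar(samples, alpha):
    """Empirical Conditional Value at Risk at level alpha for rewards:
    the average of the worst (smallest) alpha-fraction of the samples."""
    xs = sorted(samples)
    k = max(1, int(round(alpha * len(xs))))
    return sum(xs[:k]) / k

def score(samples, alpha, xi):
    """Illustrative risk-adjusted criterion: (1 - xi) * mean + xi * CVaR."""
    mean = sum(samples) / len(samples)
    return (1 - xi) * mean + xi * cvar(samples, alpha)

rng = random.Random(0)
safe = [rng.gauss(1.0, 0.1) for _ in range(20000)]    # same mean, low risk
risky = [rng.gauss(1.0, 2.0) for _ in range(20000)]   # same mean, high risk
s_safe = score(safe, alpha=0.05, xi=0.5)
s_risky = score(risky, alpha=0.05, xi=0.5)
print(s_safe, s_risky)
```

Both arms have the same expected reward, yet the risk-adjusted score clearly separates them; the hard part addressed in the work is making such comparisons with bandit feedback, heavy tails, and no prior knowledge of the distributions.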
Distributionally robust optimization (DRO) based on optimal transport has been shown to enhance out-of-sample performance in statistical estimation; this has been done by connecting DRO with well-known estimators in machine learning. But DRO can be applied systematically, and there is plenty of flexibility in choosing the uncertainty set to achieve desirable performance. This talk explores computational tools that enable the systematic use of a wide range of uncertainty sets, and corresponding statistical tools for making inference from decisions obtained from data-driven DRO formulations. Joint work with K. Murthy, N. Si, and F. Zhang.
Saturday, 17 August 2019
In this talk we shall discuss a natural class of dynamics in dense networks arising from resampling in multi-type populations. We consider a sequence of finite graphs whose dynamics are governed by the empirical type distribution of the whole population at that time; in the large-population-size limit these dynamics converge to a random process in the space of graphons.
This is joint work with Frank den Hollander and Adrian Röllin.
In this talk we will look at the eigenvalue problem on $\mathbb{R}^d$ for a class of second order, elliptic operators of the form $\mathcal{L}^f = a^{ij}\partial_{x_i}\partial_{x_j} + b^{i}\partial_{x_i} + f$, associated with nondegenerate diffusions. We show that strict monotonicity of the principal eigenvalue of the operator with respect to the potential function $f$ fully characterizes the ergodic properties of the associated ground state diffusion and the uniqueness of the ground state, and we present a comprehensive study of the eigenvalue problem from this point of view. This allows us to extend or strengthen various results in the literature for a class of viscous Hamilton-Jacobi equations of ergodic type with smooth coefficients to equations with measurable drift and potential. We also apply these results to the study of the infinite horizon risk-sensitive control problem for diffusions, and establish the existence of optimal Markov controls.
This is a joint work with A. Arapostathis and A. Biswas.
In this work, we deal with extreme value theory in the context of continued fractions, using techniques from probability theory, ergodic theory and real analysis. We give an upper bound for the rate of convergence in the Doeblin-Iosifescu asymptotics for the exceedances of digits obtained from the regular continued fraction expansion of a number chosen randomly from (0,1) according to the Gauss measure. As a consequence, we significantly improve the best known upper bound on the rate of convergence of the maxima, improving an error term used in the proof of a conjecture of Paul Erdős. We observe that the asymptotics of order statistics and of the extremal point process can also be investigated using our methods.
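The objects in this abstract are easy to simulate (illustrative only; floating-point precision limits the number of digits that can be extracted exactly): the Gauss measure has distribution function $F(x)=\log_2(1+x)$, so it can be sampled by inverse transform, and the continued fraction digits come from iterating the Gauss map.

```python
import numpy as np

rng = np.random.default_rng(2)

def gauss_sample():
    # Inverse-CDF sampling from the Gauss measure on (0, 1),
    # whose distribution function is F(x) = log2(1 + x).
    return 2.0 ** rng.random() - 1.0

def cf_digits(x, n):
    """First n digits of the regular continued fraction expansion of x,
    obtained by iterating the Gauss map x -> 1/x - floor(1/x)."""
    digits = []
    for _ in range(n):
        if x <= 0.0:
            break
        x = 1.0 / x
        a = int(x)
        digits.append(a)
        x -= a
    return digits

# Maximum of the first digits; the exceedances and maxima of these digits
# are the quantities whose asymptotics the abstract describes.
print(max(cf_digits(gauss_sample(), 50)))
```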
This talk is based on a joint work with Anish Ghosh (TIFR Mumbai) and Maxim Sølund Kirsebom (Univ of Hamburg).
Given a vector of unknown probability distributions, or arms, each of which can be sampled independently, we consider the problem of identifying the partition to which this vector belongs from a finitely partitioned universe of such vectors of distributions. We primarily consider partitions where the vector of means of the arms lies either in a given set or in its complement in a Euclidean space. The sets considered correspond to half-spaces, or to cases where either the set or its complement is a polytope or, more generally, a convex set. In these settings, exploiting the problem geometry, we characterize lower bounds on the mean number of samples from each arm that any δ-correct algorithm (an algorithm that restricts the probability of wrong identification to a pre-specified δ) must satisfy. Further, we propose δ-correct algorithms that, for small δ, match these bounds. Traditionally, the multi-armed bandit literature focuses on restrictive distribution families such as Bernoulli distributions, sub-Gaussian distributions, or those belonging to the single-parameter exponential family. We discuss how our analysis extends to general distributions for which an explicit upper bound on the (1+ϵ)-th moment is available.
Citation patterns between papers form a window to how science works. In citation networks, the nodes are papers and the directed edges are formed by one paper citing the other. Citation networks give us a wealth of information about differences between subfields, how scientists collaborate, etc. These insights are useful to interpret citation statistics of papers and scientists beyond simply counting citations. How long does it take publications to be cited? How long does it take papers to be forgotten, and how much does this depend on the citation patterns of the papers early on? How many citations do papers get, and how do these develop in time? How variable are citation patterns when comparing papers within a field, and between fields? Can one quantify what a 'good' paper is based on citation network structure?
In this talk, we present empirical data on citation patterns obtained from the Web of Science database. We investigate the aging of citation patterns and the presence of dynamical power laws in them, as well as how predictable citations are after having been observed for a while. We then continue by describing a possible model for them, based on preferential attachment models with aging and fitness, that on a qualitative level shows connectivity patterns similar to those in citation networks. We also discuss PageRank, an algorithm that assigns an order to the vertices of a directed network, and show how local weak convergence can be used to analyze PageRank in large graph limits. We close with open problems.
We assume no prior knowledge in graph theory, probability or otherwise. This is joint work with Alessandro Garavaglia, Nelly Litvak, and Gerhard Woeginger.
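For readers new to PageRank, a minimal power-iteration sketch on a citation-style directed graph; the damping factor 0.85 and the uniform redistribution of mass from dangling nodes (papers citing nothing in the corpus) are standard conventions, not choices from the talk:

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank by power iteration on a directed adjacency matrix.
    adj[i, j] = 1 means an edge i -> j (paper i cites paper j).
    Dangling nodes (no out-links) have their mass spread uniformly."""
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    out_deg = adj.sum(axis=1)
    dangling = out_deg == 0
    # Row-stochastic transition matrix; dangling rows stay all-zero.
    flow = np.divide(adj, out_deg[:, None], out=np.zeros_like(adj),
                     where=out_deg[:, None] > 0)
    r = np.full(n, 1.0 / n)  # start from the uniform distribution
    for _ in range(max_iter):
        r_new = (damping * (r @ flow + r[dangling].sum() / n)
                 + (1 - damping) / n)
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new
    return r
```

On a directed 3-cycle, for instance, symmetry forces every vertex to receive rank 1/3.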