Monday, 04 August 2025
The course will cover the basics of reinforcement learning theory. We will start by implementing simple gradient-based algorithms in PyTorch and using them to solve standard control problems like CartPole and the Atari 2600 game Pong. Along the way, we will explore how to optimize both the sample complexity (the number of interactions with the environment) and the computational complexity (GPU hours) needed to learn an optimal policy.
Lecture notes and setup instructions - https://gomahajan.github.io/icts/rlbootcamp.html
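As a taste of the hands-on portion, here is a minimal sketch of a REINFORCE-style policy-gradient loop on CartPole in PyTorch. It assumes the gymnasium and torch packages, and the network size, learning rate, and episode count are illustrative choices; it is an outline of the approach, not the course's actual code.

```python
# Minimal REINFORCE sketch for CartPole (illustrative only; not the course's official code).
# Assumed dependencies: gymnasium, torch.
import gymnasium as gym
import torch
import torch.nn as nn

env = gym.make("CartPole-v1")
policy = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.99

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Discounted return-to-go at every time step of the episode.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # simple variance reduction

    # Policy-gradient objective: maximize E[log pi(a|s) * return].
    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In this framing, sample complexity is the number of env.step calls consumed before the policy balances the pole, while computational complexity is the wall-clock or GPU cost of the optimization itself.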
Reinforcement learning (RL) has been the driver behind many of the most significant advances in artificial intelligence over the past decade---ranging from achieving superhuman performance in complex games like Go and StarCraft to applications in autonomous driving, robotics, and economic simulations. RL is even playing a crucial role in the fine-tuning of large language models and the training of AI agents more broadly. Many of these problems, however, are fundamentally multi-agent in nature: an agent's success is inextricably linked to the decisions of others. Despite these empirical successes and a wealth of research on "single-agent" RL and its variants, multi-agent reinforcement learning (MARL) remains relatively under-explored theoretically, with the presence of multiple learning agents giving rise to a unique set of challenges for algorithm design and analysis.
This tutorial will give an overview of the research landscape in MARL, aiming to highlight the core theoretical principles that enable agents to learn and adapt in the presence of others. Using the formal framework of Markov games and building on a foundation in game theory, we will explore the different solution concepts and algorithms in the field. The discussion will examine the inherent inefficiencies of classical methods like fictitious play and policy gradient algorithms and build toward the principles underpinning modern, provably efficient learning methods for large games (i.e., algorithms that make use of function approximation like deep neural networks). Ultimately, we will identify key open problems and promising new research directions for the future of multi-agent learning.
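As background for the discussion of fictitious play, the sketch below runs it on a small zero-sum matrix game (matching pennies); the specific game and the plain empirical best-response loop are illustrative assumptions, not material taken from the tutorial.

```python
# Fictitious play on matching pennies (illustrative sketch; not the tutorial's code).
import numpy as np

A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])   # row player's payoff; the column player receives -A

row_counts = np.ones(2)        # empirical action counts (initialized uniformly)
col_counts = np.ones(2)

for t in range(10000):
    # Each player best-responds to the opponent's empirical mixed strategy.
    col_freq = col_counts / col_counts.sum()
    row_freq = row_counts / row_counts.sum()
    row_action = np.argmax(A @ col_freq)   # row player maximizes expected payoff
    col_action = np.argmin(row_freq @ A)   # column player minimizes it (zero-sum)
    row_counts[row_action] += 1
    col_counts[col_action] += 1

print("empirical row strategy:", row_counts / row_counts.sum())
print("empirical col strategy:", col_counts / col_counts.sum())
# In zero-sum games the empirical frequencies converge to a Nash equilibrium
# (here (1/2, 1/2) for both players), but convergence can be very slow --
# one of the inefficiencies the tutorial contrasts with modern methods.
```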
More information - https://www.icts.res.in/lectures/PL_Aug2025
Tuesday, 05 August 2025
The course will cover the basics of reinforcement learning theory. We will start by implementing simple gradient-based algorithms in PyTorch and using them to solve standard control problems like CartPole and the Atari 2600 game Pong. Along the way, we will explore how to optimize both the sample complexity (the number of interactions with the environment) and the computational complexity (GPU hours) needed to learn an optimal policy.
Lecture notes and setup instructions - https://gomahajan.github.io/icts/rlbootcamp.html
Reinforcement learning (RL) has been the driver behind many of the most significant advances in artificial intelligence over the past decade---ranging from achieving superhuman performance in complex games like Go and StarCraft to applications in autonomous driving, robotics, and economic simulations. RL is even playing a crucial role in the fine-tuning of large language models and the training of AI agents more broadly. Many of these problems, however, are fundamentally multi-agent in nature: an agent's success is inextricably linked to the decisions of others. Despite these empirical successes and a wealth of research on "single-agent" RL and its variants, multi-agent reinforcement learning (MARL) remains relatively under-explored theoretically, with the presence of multiple learning agents giving rise to a unique set of challenges for algorithm design and analysis.
This tutorial will give an overview of the research landscape in MARL, aiming to highlight the core theoretical principles that enable agents to learn and adapt in the presence of others. Using the formal framework of Markov games and building on a foundation in game theory, we will explore the different solution concepts and algorithms in the field. The discussion will examine the inherent inefficiencies of classical methods like fictitious play and policy gradient algorithms and build toward the principles underpinning modern, provably efficient learning methods for large games (i.e., algorithms that make use of function approximation like deep neural networks). Ultimately, we will identify key open problems and promising new research directions for the future of multi-agent learning.
Optimal transport studies the problem of rearranging one distribution into another while minimizing an associated cost. The past decade has witnessed tremendous progress in our understanding of the computational, methodological and statistical aspects of optimal transport (OT). Recent interest in OT has blossomed due to its close connections with diffusion models.
I will introduce the mathematical framework of OT, and then quickly transition to studying how well various objects in the OT framework (OT distances, and OT maps) can be estimated from samples of the underlying distributions.
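To make the sample-based estimation question concrete, here is a sketch that computes an entropy-regularized (Sinkhorn) plug-in estimate of the OT cost between two empirical samples; the Gaussian data, squared-Euclidean cost, regularization level, and iteration count are illustrative choices, not taken from the lectures.

```python
# Entropic OT (Sinkhorn) between two empirical samples -- an illustrative sketch.
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(loc=0.0, scale=1.0, size=(n, 2))   # samples from the source distribution
Y = rng.normal(loc=2.0, scale=1.0, size=(n, 2))   # samples from the target distribution

# Squared-Euclidean cost between every pair of sample points.
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)

eps = 0.5                        # entropic regularization strength (illustrative)
K = np.exp(-C / eps)             # Gibbs kernel
a = np.full(n, 1.0 / n)          # uniform weights on the empirical points
b = np.full(n, 1.0 / n)

u = np.ones(n)
for _ in range(500):             # Sinkhorn fixed-point iterations
    v = b / (K.T @ u)
    u = a / (K @ v)

P = u[:, None] * K * v[None, :]  # entropic coupling between the two samples
cost = (P * C).sum()             # plug-in estimate of the (regularized) OT cost
print(f"estimated entropic OT cost: {cost:.3f}")
```

The statistical questions in the lectures concern how estimates of this kind behave as the sample size grows and the regularization is varied.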
Wednesday, 06 August 2025
The course will cover the basics of reinforcement learning theory. We will start by implementing simple gradient-based algorithms in PyTorch and using them to solve standard control problems like CartPole and the Atari 2600 game Pong. Along the way, we will explore how to optimize both the sample complexity (the number of interactions with the environment) and the computational complexity (GPU hours) needed to learn an optimal policy.
Lecture notes, and setup instructions - https://gomahajan.github.io/icts/rlbootcamp.html
TBA
In this series of talks, I will introduce basic elements of generative modeling with diffusion and flow models from first principles. This includes a short introduction to stochastic calculus, ordinary differential equations, the evolution of probability measures, the Fokker-Planck equation, and the continuity equation. We will then apply these ideas to describe training and inference algorithms for diffusion models.
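For orientation, the standard score-based formulation ties these ingredients together roughly as follows (a sketch in common notation; the lectures' notation and level of generality may differ):

```latex
% Forward noising SDE, its Fokker-Planck equation, and the reverse-time dynamics
% (standard score-based diffusion formulation; shown for orientation only).
\begin{align}
  \mathrm{d}X_t &= f(X_t, t)\,\mathrm{d}t + g(t)\,\mathrm{d}W_t
    && \text{(forward SDE)} \\
  \partial_t p_t &= -\nabla\!\cdot\!\big(f\,p_t\big) + \tfrac{1}{2}\,g(t)^2\,\Delta p_t
    && \text{(Fokker--Planck)} \\
  \mathrm{d}X_t &= \big[f(X_t, t) - g(t)^2\,\nabla_x \log p_t(X_t)\big]\,\mathrm{d}t
     + g(t)\,\mathrm{d}\bar{W}_t
    && \text{(reverse-time SDE)} \\
  \frac{\mathrm{d}X_t}{\mathrm{d}t} &= f(X_t, t) - \tfrac{1}{2}\,g(t)^2\,\nabla_x \log p_t(X_t)
    && \text{(probability-flow ODE)}
\end{align}
```

Training amounts to learning the score \(\nabla_x \log p_t\) from data, and inference amounts to integrating the reverse-time SDE or the probability-flow ODE.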
TBA
Optimal transport studies the problem of rearranging one distribution into another while minimizing an associated cost. The past decade has witnessed tremendous progress in our understanding of the computational, methodological and statistical aspects of optimal transport (OT). Recent interest in OT has blossomed due to its close connections with diffusion models.
I will introduce the mathematical framework of OT, and then quickly transition to studying how well various objects in the OT framework (OT distances, and OT maps) can be estimated from samples of the underlying distributions.
Thursday, 07 August 2025
The course will cover the basics of reinforcement learning theory. We will start by implementing simple gradient-based algorithms in PyTorch and using them to solve standard control problems like CartPole and the Atari 2600 game Pong. Along the way, we will explore how to optimize both the sample complexity (the number of interactions with the environment) and the computational complexity (GPU hours) needed to learn an optimal policy.
Lecture notes, and setup instructions - https://gomahajan.github.io/icts/rlbootcamp.html
In this series of talks, I will introduce basic elements of generative modeling with diffusion and flow models from first principles. This includes a short introduction to stochastic calculus, ordinary differential equations, the evolution of probability measures, the Fokker-Planck equation, and the continuity equation. We will then apply these ideas to describe training and inference algorithms for diffusion models.
TBA
In this series of talks, I will introduce basic elements of generative modeling with diffusion and flow models from first principles. This includes a short introduction to stochastic calculus, ordinary differential equations, the evolution of probability measures, the Fokker-Planck equation, and the continuity equation. We will then apply these ideas to describe training and inference algorithms for diffusion models.
Optimal transport studies the problem of rearranging one distribution into another while minimizing an associated cost. The past decade has witnessed tremendous progress in our understanding of the computational, methodological and statistical aspects of optimal transport (OT). Recent interest in OT has blossomed due to its close connections with diffusion models.
I will introduce the mathematical framework of OT, and then quickly transition to studying how well various objects in the OT framework (OT distances, and OT maps) can be estimated from samples of the underlying distributions.
Friday, 08 August 2025
In this series of talks, I will introduce basic elements of generative modeling with diffusion and flow models from first principles. This includes a short introduction to stochastic calculus, ordinary differential equations, the evolution of probability measures, the Fokker-Planck equation, and the continuity equation. We will then apply these ideas to describe training and inference algorithms for diffusion models.
TBA
Data assimilation is a set of methods for incorporating sparse observations of a complex dynamical system, either deterministic or stochastic, into incomplete models of these systems. Mathematically, this is the problem of nonlinear filtering; computationally, these methods draw on a variety of techniques, including Markov chain Monte Carlo, optimization, and importance sampling. This tutorial will begin with a quick introduction to the Bayesian underpinnings of data assimilation, followed by applications to chaotic dynamical systems.
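As a minimal illustration of the Bayesian filtering viewpoint, here is a sketch of a bootstrap particle filter (sequential importance sampling with resampling) for a toy one-dimensional system; the dynamics, noise levels, and particle count are illustrative assumptions rather than a realistic assimilation setup.

```python
# Bootstrap particle filter for a toy 1-D system -- an illustrative sketch of
# sequential importance sampling with resampling (not a production DA system).
import numpy as np

rng = np.random.default_rng(1)
T, N = 50, 1000                 # time steps, particles
q, r = 0.5, 1.0                 # process and observation noise standard deviations

def dynamics(x):
    # A mildly nonlinear map standing in for the (possibly chaotic) model.
    return 0.9 * x + 2.0 * np.sin(x)

# Simulate a "true" trajectory and noisy observations of it.
x_true, obs = np.zeros(T), np.zeros(T)
x = 0.0
for t in range(T):
    x = dynamics(x) + q * rng.normal()
    x_true[t] = x
    obs[t] = x + r * rng.normal()

# Particle filter: propagate, reweight by the observation likelihood, resample.
particles = rng.normal(0.0, 1.0, size=N)
estimates = np.zeros(T)
for t in range(T):
    particles = dynamics(particles) + q * rng.normal(size=N)   # predict step
    log_w = -0.5 * ((obs[t] - particles) / r) ** 2              # Gaussian likelihood
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    estimates[t] = np.sum(w * particles)                        # posterior-mean estimate
    idx = rng.choice(N, size=N, p=w)                            # multinomial resampling
    particles = particles[idx]

print("RMSE of filtered estimate:", np.sqrt(np.mean((estimates - x_true) ** 2)))
```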
Optimal transport studies the problem of rearranging one distribution into another while minimizing an associated cost. The past decade has witnessed tremendous progress in our understanding of the computational, methodological and statistical aspects of optimal transport (OT). Recent interest in OT has blossomed due to its close connections with diffusion models.
I will introduce the mathematical framework of OT, and then quickly transition to studying how well various objects in the OT framework (OT distances, and OT maps) can be estimated from samples of the underlying distributions.
Monday, 11 August 2025
In recent years, large language models (LLMs) have achieved unprecedented success across various disciplines, including natural language processing, computer vision, and reinforcement learning. This success has spurred a flourishing body of research aimed at understanding these models, from both theoretical perspectives such as representation and optimization, and scientific approaches such as interpretability.
To understand LLMs, an important research theme in the machine learning community is to model the input as mathematically structured data (e.g. Markov chains), where we have complete knowledge and control of the data properties. The goal is to use this controlled input to gain valuable insights into what solutions LLMs learn and how they learn them (e.g. induction head). This understanding is crucial, given the increasing ubiquity of the models, especially in safety-critical applications, and our limited understanding of them.
While the aforementioned works using this structured approach provide valuable insights into the inner workings of LLMs, the breadth and diversity of the field make it increasingly challenging for both experts and non-experts to stay abreast. To address this, our tutorial aims to provide a unifying perspective on recent advances in the analysis of LLMs, from a representational-cum-learning viewpoint. To this end, we focus on the two predominant classes of language models that have driven the AI revolution: transformers and recurrent models such as state-space models (SSMs). For these models, we discuss several concrete results, including their representational capacities, optimization landscape, and mechanistic interpretability. Building upon these perspectives, we outline several important future directions in this field, aiming to foster a clearer understanding of language models and to aid in the creation of more efficient architectures.
References and a detailed explanation of our tutorial are here: https://capricious-comb-
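To make the structured-data idea concrete, the snippet below samples a sequence from a random first-order Markov chain and computes the in-context bigram statistic that an induction-head-style mechanism is commonly argued to approximate; the vocabulary size and sequence length are illustrative choices, not details from the tutorial.

```python
# Structured data for LLM analysis: a sequence from a random first-order Markov chain,
# plus the in-context bigram predictor that an induction-head mechanism approximates.
# (Illustrative sketch; parameters are not taken from the tutorial.)
import numpy as np

rng = np.random.default_rng(0)
V, L = 5, 256                                   # vocabulary size, sequence length

# Sample a random transition matrix (each row is a categorical distribution).
P = rng.dirichlet(alpha=np.ones(V), size=V)

# Generate one sequence from the chain.
seq = [rng.integers(V)]
for _ in range(L - 1):
    seq.append(rng.choice(V, p=P[seq[-1]]))
seq = np.array(seq)

# In-context bigram estimate of P(next | current) from counts within the sequence:
# this is the statistic an induction head effectively computes at test time.
counts = np.zeros((V, V))
for a, b in zip(seq[:-1], seq[1:]):
    counts[a, b] += 1
bigram = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)

print("true transition row for last token: ", np.round(P[seq[-1]], 2))
print("in-context estimate of that row:    ", np.round(bigram[seq[-1]], 2))
```

Because the data-generating process is fully known here, one can compare what a trained transformer or SSM predicts against this explicit optimal statistic.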
This talk will consist of two parts. In the first part, we will present an overview of posterior sampling with diffusion models and motivate the connection to inverse problems. Specific topics that we will cover include Gibbs sampling, importance sampling, and approximations for test-time optimization (a.k.a. training-free approaches such as DPS) with diffusion models. In the second part, we will discuss algorithms for image editing, stylization, etc., that are in production in large-scale settings. Specifically, we will discuss both diffusion- and flow-based algorithms (PSLD, STSL, RB Modulation, RF Inversion) that operate in the latent space of SOTA foundation models (such as Stable Diffusion or Flux).
Diffusions class videos are posted on YouTube (and lecture notes link is also posted in the video caption). Link: https://www.youtube.com/
TBA
Designing effective collaboration between humans and AI systems is crucial for leveraging their complementary abilities in complex decision tasks. But how should agents possessing unique, private knowledge—like a human expert and an AI model—interact to reach decisions better than either could alone? If they were perfect Bayesians with a shared prior, Aumann's classical agreement theorem suggests that conversation leads, via agreement, to a prediction that improves accuracy. However, this relies on implausible assumptions about their knowledge and computational power.
We show how to recover and generalize these guarantees using only computationally and statistically tractable assumptions. We develop efficient "collaboration protocols" where parties iteratively exchange only low-dimensional information – their current predictions or best-response actions – without needing to share underlying features. These protocols are grounded in conditions like conversation calibration/swap regret, which relax full Bayesian rationality and are computationally efficient to enforce. First, we prove this simple interaction leads to fast convergence to agreement, generalizing quantitative bounds even to high-dimensional and action-based settings. Second, we introduce a weak learning condition under which this agreement process inherently aggregates the parties' distinct information; that is, via our protocols, agents arrive at final predictions that are provably competitive with an optimal predictor having access to their joint features. Together, these results offer a new, practical foundation for building systems that achieve the power of pooled knowledge through tractable interaction alone.
This talk is based on joint work with the amazing Natalie Collina, Varun Gupta, Ira Globus-Harris, Aaron Roth, and Mirah Shi.
The success of modern AI models defies classical theoretical wisdom. Classical theory recommended the use of convex optimization, and yet AI models learn by optimizing highly non-convex functions. Classical theory prescribed controlling model complexity, and yet AI models are very complex, so complex that they often memorize the training data. Classical wisdom recommended a careful and interpretable choice of model architecture, and yet modern architectures rarely offer a parsimonious representation of a target distribution class.
The discovery that learning can take place in completely unexpected scenarios poses beautiful conceptual challenges. I will try to survey recent work towards addressing them.
Tuesday, 12 August 2025
In recent years, large language models (LLMs) have achieved unprecedented success across various disciplines, including natural language processing, computer vision, and reinforcement learning. This success has spurred a flourishing body of research aimed at understanding these models, from both theoretical perspectives such as representation and optimization, and scientific approaches such as interpretability.
To understand LLMs, an important research theme in the machine learning community is to model the input as mathematically structured data (e.g. Markov chains), where we have complete knowledge and control of the data properties. The goal is to use this controlled input to gain valuable insights into what solutions LLMs learn and how they learn them (e.g. induction head). This understanding is crucial, given the increasing ubiquity of the models, especially in safety-critical applications, and our limited understanding of them.
While the aforementioned works using this structured approach provide valuable insights into the inner workings of LLMs, the breadth and diversity of the field make it increasingly challenging for both experts and non-experts to stay abreast. To address this, our tutorial aims to provide a unifying perspective on recent advances in the analysis of LLMs, from a representational-cum-learning viewpoint. To this end, we focus on the two predominant classes of language models that have driven the AI revolution: transformers and recurrent models such as state-space models (SSMs). For these models, we discuss several concrete results, including their representational capacities, optimization landscape, and mechanistic interpretability. Building upon these perspectives, we outline several important future directions in this field, aiming to foster a clearer understanding of language models and to aid in the creation of more efficient architectures.
References and a detailed explanation of our tutorial are here: https://capricious-comb-
The success of modern AI models defies classical theoretical wisdom. Classical theory recommended the use of convex optimization, and yet AI models learn by optimizing highly non-convex functions. Classical theory prescribed controlling model complexity, and yet AI models are very complex, so complex that they often memorize the training data. Classical wisdom recommended a careful and interpretable choice of model architecture, and yet modern architectures rarely offer a parsimonious representation of a target distribution class.
The discovery that learning can take place in completely unexpected scenarios poses beautiful conceptual challenges. I will try to survey recent work towards addressing them.
This talk will consist of two parts. In the first part, we will present an overview of posterior sampling with diffusion models and motivate the connection to inverse problems. Specific topics that we will cover include Gibbs sampling, importance sampling, and approximations for test-time optimization (a.k.a. training-free approaches such as DPS) with diffusion models. In the second part, we will discuss algorithms for image editing, stylization, etc., that are in production in large-scale settings. Specifically, we will discuss both diffusion- and flow-based algorithms (PSLD, STSL, RB Modulation, RF Inversion) that operate in the latent space of SOTA foundation models (such as Stable Diffusion or Flux).
Diffusions class videos are posted on YouTube (and lecture notes link is also posted in the video caption). Link: https://www.youtube.com/
The classical paradigm of randomness in the sciences is that of i.i.d. random variables, and going beyond i.i.d. is often considered a difficulty and a challenge to be overcome. In this talk, we will explore a new perspective, wherein strongly constrained random systems in fact help to understand fundamental problems in machine learning. In particular, we will discuss strongly correlated particle systems that are well-motivated from statistical and quantum physics, including in particular determinantal probability measures. These will be used to shed important light on questions of fundamental interest in learning theory, focussing on applications to novel sampling techniques and advances in stochastic gradient descent.
Can a sample from one parametric statistical model (the source) be transformed into a sample from a different (target) model? Versions of this question were asked as far back as 1950, and a beautiful asymptotic theory of equivalence of experiments emerged in the latter half of the 20th century. Motivated by problems spanning information-computation gaps and differentially private data analysis, we ask the analogous non-asymptotic question in high-dimensional problems and with algorithmic considerations. We show how a single observation from some source models can be approximately transformed to a single observation from a large class of target models by a computationally efficient algorithm. I will present several such reductions and discuss their applications to the aforementioned problems.
This is joint work with Mengqi Lou and Guy Bresler.
Wednesday, 13 August 2025
The classical paradigm of randomness in the sciences is that of i.i.d. random variables, and going beyond i.i.d. is often considered a difficulty and a challenge to be overcome. In this talk, we will explore a new perspective, wherein strongly constrained random systems in fact help to understand fundamental problems in machine learning. In particular, we will discuss strongly correlated particle systems that are well-motivated from statistical and quantum physics, including in particular determinantal probability measures. These will be used to shed important light on questions of fundamental interest in learning theory, focussing on applications to novel sampling techniques and advances in stochastic gradient descent.
The success of modern AI models defies classical theoretical wisdom. Classical theory recommended the use of convex optimization, and yet AI models learn by optimizing highly non-convex functions. Classical theory prescribed controlling model complexity, and yet AI models are very complex, so complex that they often memorize the training data. Classical wisdom recommended a careful and interpretable choice of model architecture, and yet modern architectures rarely offer a parsimonious representation of a target distribution class.
The discovery that learning can take place in completely unexpected scenarios poses beautiful conceptual challenges. I will try to survey recent work towards addressing them.
When a neural network becomes extremely wide or deep, its learning dynamics simplify and can be described by the same “mean-field” ideas that explain magnetism and fluids. I will walk through these ideas step-by-step, showing how they suggest practical recipes for initialization and optimization that scale smoothly from small models to cutting-edge transformers. I will also discuss neural scaling laws—empirical power-law rules that relate model size, data, and compute—and illustrate them with solvable toy models.
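For orientation, a commonly fitted parametric form for such scaling laws is the following (stated here as an assumption for illustration; the solvable toy models in the talk may use different forms):

```latex
% A commonly fitted (Chinchilla-style) parametric scaling law, shown for orientation only.
L(N, D) \;=\; E \;+\; \frac{A}{N^{\alpha}} \;+\; \frac{B}{D^{\beta}},
\qquad
\text{compute-optimal allocation: } \min_{N,\,D}\, L(N, D)
\ \text{ subject to }\ C \approx 6\,N D,
```

where N is the number of parameters, D the number of training tokens, E an irreducible loss, A, B, α, β fitted constants, and C ≈ 6ND the usual rough estimate of training FLOPs for transformers.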
Vector search is a fundamental problem with numerous applications in machine learning, computer vision, recommendation systems, and more. While vector search has been extensively studied, modern applications have introduced new requirements, such as diversity, multivector, and multifilter constraints, among others. In this talk, we explore these emerging research directions, with a focus on diversity and multivector embeddings in vector search.
For both problems, we propose the first provable graph-based algorithms that efficiently return approximate solutions. Our algorithms leverage popular graph-based methods, enabling us to build on existing, efficient implementations. Experimental results show that our algorithms outperform other approaches.
When sampling from a base measure tilted by a reward model, a popular trick is to approximate the score of the tilted measure with the sum of the base score and the gradient of the reward. It is well-known that this does not sample from the base distribution but nevertheless seems to do something interesting and useful, e.g., classifier-free guidance (CFG) and diffusion posterior sampling (DPS). In this talk, I provide some theoretical perspectives on what this method actually samples from, focusing on a simple mixture model setting. In the first part, I will rigorously characterize the dynamics of CFG, proving that it generates archetypal and low-diversity samples in a certain precise sense. In the second part, I will show that for linear inverse problems, DPS with a careful choice of initialization simultaneously boosts reward and likelihood under the prior. I will then describe some experiments demonstrating that DPS with this initialization scheme achieves strong performance on hard image restoration tasks like large box inpainting. Based on https://arxiv.org/abs/2409.13074 and https://arxiv.org/abs/2506.10955
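For readers new to the two methods, the standard score-level forms being analyzed are roughly the following (one common parameterization, not necessarily the talk's own notation):

```latex
% Background only: standard score-level forms of CFG and DPS (common parameterization).
\begin{align*}
  \text{CFG with guidance strength } w \ge 0:\quad
  \nabla_x \log p_t^{\mathrm{CFG}}(x \mid c)
    &= (1 + w)\,\nabla_x \log p_t(x \mid c) \;-\; w\,\nabla_x \log p_t(x), \\
  \text{DPS for an observation } y:\quad
  \nabla_x \log p_t(x \mid y)
    &\approx \nabla_x \log p_t(x) \;+\; \nabla_x \log p\bigl(y \mid \hat{x}_0(x)\bigr),
\end{align*}
% where \hat{x}_0(x) is the Tweedie (posterior-mean) estimate of the clean sample given the noisy x at time t.
```

The talk characterizes what sampling with these surrogate scores actually produces in a tractable mixture-model setting.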
Thursday, 14 August 2025
When a neural network becomes extremely wide or deep, its learning dynamics simplify and can be described by the same “mean-field” ideas that explain magnetism and fluids. I will walk through these ideas step-by-step, showing how they suggest practical recipes for initialization and optimization that scale smoothly from small models to cutting-edge transformers. I will also discuss neural scaling laws—empirical power-law rules that relate model size, data, and compute—and illustrate them with solvable toy models.
We consider the problem of finding the optimal value of n in the n-step temporal difference (TD) learning algorithm. Our objective function for the optimization problem is the average root mean squared error (RMSE). We find the optimal n by resorting to a model-free optimization technique involving a one-simulation simultaneous perturbation stochastic approximation (SPSA)-based procedure. Whereas SPSA is a zeroth-order continuous optimization procedure, we adapt it to the discrete optimization setting by using a random projection operator. We prove the asymptotic convergence of the recursion by showing that the sequence of n-updates obtained using zeroth-order stochastic gradient search converges almost surely to an internally chain transitive invariant set of an associated differential inclusion. This results in convergence of the discrete parameter sequence to the optimal n in n-step TD. Through experiments, we show that the optimal value of n is achieved with our algorithm for arbitrary initial values. We further show using numerical evaluations that the proposed algorithm outperforms a well-known discrete parameter stochastic optimization algorithm, ‘Optimal Computing Budget Allocation (OCBA)’, on benchmark RL tasks.
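For reference, the n-step TD target and tabular value update whose horizon n is being optimized are the standard ones (generic notation, not the paper's exact symbols):

```latex
% Standard n-step TD target and value update (generic notation; not the paper's symbols).
G_{t:t+n} \;=\; \sum_{k=0}^{n-1} \gamma^{k}\, R_{t+k+1} \;+\; \gamma^{n}\, V(S_{t+n}),
\qquad
V(S_t) \;\leftarrow\; V(S_t) \;+\; \alpha\,\bigl(G_{t:t+n} - V(S_t)\bigr).
```

Small n gives low-variance but more biased targets, large n the reverse, which is why the choice of n is worth optimizing.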
Bandit convex optimization is a powerful framework for sequential decision-making, but existing algorithms with optimal regret guarantees are often too computationally expensive for high-dimensional problems. This talk introduces a simple and practical BCO algorithm, the Bandit Newton Step, which leverages second-order information for decision-making. We will show that our algorithm obtains an optimal $O(T^{1/2})$ regret bound for a large and practical class of functions that satisfy a condition we call “$\kappa$-convexity,” which includes linear, quadratic, and generalized linear losses. In addition to optimal regret, this method is the most efficient known algorithm for several well-studied applications including bandit logistic regression.
Furthermore, we'll discuss the extension of our method to online convex optimization with memory. We show that for loss functions with a certain affine structure, the extended algorithm attains optimal regret. This leads to an optimal regret algorithm for the bandit Linear-Quadratic (LQ) control problem under a fully adversarial noise model, resolving a key open question in the field. Finally, we contrast this result by proving that bandit problems with more general memory structures are fundamentally harder, establishing a tight $\Omega(T^{2/3})$ lower bound on regret.
In causal inference, the true causal order and the graph of causal interactions can be uniquely determined if you have sufficient interventional data. Interventions are local randomized control trials (RCTs) in which different variables in a causal graph, taken one or a few at a time, are randomized. We consider a harder problem, where the causal variables are not directly observed and are "latent". Instead, we observe a high-dimensional transformation (as images, etc.) of the true causal variables. The central problem in causal representation learning is to invert the unknown transformation between the true causal variables and the observations, up to coordinate-wise scaling and permutation. We show that this is possible with enough interventional diversity by exploiting two key ideas: a) represent interventional distributions in terms of their scores (gradients of log-likelihoods); b) the encoder-decoder pair that minimizes reconstruction loss and sparsifies the score difference in the latent space is the optimal pair. We show various versions of these results for linear transforms and general transforms, with mild regularity assumptions on the diversity of interventions. We will also discuss empirical results on some simple image datasets.
Joint work with Burak Varici (CMU), Emre Acarturk (RPI), Abhishek Kumar (Amazon, ex-GDM), and Ali Tajer (RPI).
In this work, we address the challenge of identifying the optimal arm in a stochastic multi-armed bandit scenario with the minimum number of arm pulls, given a predefined error probability in a fixed confidence setting. Our focus is on examining the asymptotic behavior of sample complexity and the distribution of arm weights upon termination, as the error threshold is scaled to zero, under confidence-interval based algorithms. Specifically, we analyze the asymptotic sample complexity and termination weight fractions for the well-known LUCB algorithm, and introduce a new variant, the LUCB Greedy algorithm. We demonstrate that the upper bounds on the sample complexities for both algorithms are asymptotically within a (universal) constant factor of the established lower bounds.
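As background on the algorithm family being analyzed, here is a hedged sketch of a basic LUCB loop for best-arm identification with Bernoulli arms; the arm means, confidence-width constants, and stopping rule follow the generic LUCB template rather than the exact variants studied in the talk.

```python
# Basic LUCB loop for best-arm identification with Bernoulli arms -- illustrative sketch;
# the exploration-rate constants here are generic, not those analyzed in the talk.
import numpy as np

rng = np.random.default_rng(0)
means = np.array([0.30, 0.50, 0.55, 0.70])   # unknown true means (used only to simulate pulls)
K, delta = len(means), 0.05                  # number of arms, target error probability

counts = np.ones(K)                           # pull each arm once to initialize
sums = rng.binomial(1, means).astype(float)
t = K

def width(n, t):
    # A standard-style confidence width; the exact form varies across LUCB variants.
    return np.sqrt(np.log(5 * K * t**4 / (4 * delta)) / (2 * n))

while True:
    t += 1
    mu = sums / counts
    h = np.argmax(mu)                         # empirical best arm
    ucb = mu + width(counts, t)
    ucb[h] = -np.inf
    l = np.argmax(ucb)                        # strongest challenger by upper confidence bound
    # Stop when the best arm's lower bound dominates the challenger's upper bound.
    if mu[h] - width(counts[h], t) > mu[l] + width(counts[l], t):
        break
    for arm in (h, l):                        # LUCB samples both contenders each round
        sums[arm] += rng.binomial(1, means[arm])
        counts[arm] += 1

print("declared best arm:", int(h), "after", int(counts.sum()), "pulls")
```

The quantities studied in the talk are how the total number of pulls and the final per-arm sampling fractions behave as delta tends to zero under such confidence-interval-based rules.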