Monday, 29 January 2024

Geometry is a powerful tool in many scientific domains, from fundamental physics to social science. It also plays and important role in the study of complex systems, especially from the point of constructing models for networks. Here nodes are assigned positions in some geometric space and connections are created based on the distances in that space. In this talk I will highlight advances for two key aspects of geometry in complex network and touch upon some challenges geometric network models present us with.

In the first part I will discuss a family of models for complex networks that uses geometry. They generalize many known models and can create networks that exhibit key features found in many real-world networks: sparse, power-law degrees and clustering. I will present results on local weak convergence of these models and on the clustering features of the resulting networks. In the second part I shall touch upon the challenge of recovering geometric information from networks. That is, if we see the resulting network can we identify the underlying geometric space? I will present results that show that a discrete notion of curvature, known as Olliver-Ricci curvature, can actually uncover the curvature of the hidden geometry. In addition I will present large scale simulation that show how well this works in practice for Euclidean, Spherical and Hyperbolic spaces.

Bayesian networks (or directed acyclic graphical models) have been a cornerstone of the modeling of causal inference. They have also led to the study of rich new combinatorial notions such as d-separation and Markov equivalence. Although these structures have been studied extensively since at least the 80s, many natural combinatorial questions about them—including basic ones regarding counting and uniform sampling—are still not fully understood. This talk will be a broad survey of some such problems, and recent progress on them.

While a small part of this talk will be based on joint work with collaborators, most of it will be a survey of work by others, and will be based on discussions with—and work of—Vidya Sagar Sharma, a PhD student at TIFR.

Consider the empirical measure process associated with a mean-field Markov interacting particle system of N particles. The Freidlin-Wentzell quasipotential is a natural candidate rate function for the sequence of invariant measures indexed by N. We discuss two counterexamples on countably infinite state spaces where the quasipotential is not the rate function. We will then provide some sufficient conditions where the situation is remedied. The talk will be based on joint work with Sarath Yasodharan. https://arxiv.org/abs/2110.12640

I discuss some results on a very simple random recursive tree model that gives rise to interesting emergent phenomena, such as a highly skewed degree sequence. Unlike in the preferential attachment tree, the attachment rule is local: given the tree with n vertices, select a vertex uniformly at random and attach vertex n+1 to a uniformly random neighbour (or friend) of the selected vertex. We study various local and global properties of the resulting tree, but many questions remain open. This talk is based on a work in progress with Louigi Addario-Berry, Simon Briend, Luc Devroye, Céline Kerriou and Gabor Lugosi.

Consider a \emph{probabilistic tree automaton} (PTA) defined on the rooted regular tree $T_{m}$, comprising the alphabet $\{B,R\}$ (where $B$ stands for the colour blue, and $R$ for the colour red), in which the stochastic update rule, referred to as the \emph{majority policy}, is described as follows: let $u$ be a vertex of $T_{m}$, with children $u_{1}, u_{2}, \ldots, u_{m}$. Let $C_{t}(u_{i}) \in \{B,R\}$ denote the \emph{colour} or \emph{state} of $u_{i}$ at time step $t$, for each $i \in \{1,2,\ldots,m\}$. Associated with $u_{i}$ is a random variable $X_{t}(u_{i})$, where $X_{t}(u_{1}), X_{t}(u_{2}), \ldots, X_{t}(u_{m})$ are i.i.d.\ Bernoulli$(p)$. Conditioned on $C_{t}(u_{1}), C_{t}(u_{2}), \ldots, C_{t}(u_{m})$, we set

\[

C_{t+1}(u) =

\begin{cases}

B & \text{if } \sum_{i=1}^{m}X_{t}(u_{i})\mathbf{1}_{C_{t}(u_{i})=B} > \sum_{i=1}^{m}X_{t}(u_{i})\mathbf{1}_{C_{t}(u_{i})=R}, \\

R & \text{if } \sum_{i=1}^{m}X_{t}(u_{i})\mathbf{1}_{C_{t}(u_{i})=B} < \sum_{i=1}^{m}X_{t}(u_{i})\mathbf{1}_{C_{t}(u_{i})=R},

\end{cases}

\]

and $C_{t+1}(u)$ equals $B$ with probability $\frac{1}{2}$ and $R$ with probability $\frac{1}{2}$, otherwise. For each $m \in \mathbb{N}$ with $m \geqslant 2$, starting with an i.i.d.\ Bernoulli$(\pi)$ assignment of colours to the vertices of $T_{m}$ (i.e.\ $C_{0}(v)$, for all $v \in T_{m}$, are i.i.d.\ and $C_{0}(v) = B$ with probability $\pi$), we show that there exists a $p(m) \in (0,1)$ such that there is a unique stationary distribution when $p \leq p(m)$, and there are multiple stationary distributions when $p > p(m)$.

Tuesday, 30 January 2024

Let $G_n$ be an undirected finite graph on $n\in\N$ vertices labelled by $[n] = \{1,\ldots,n\}$. For $i \in [n]$, let $\Delta_{i,n}$ be the \emph{friendship bias} of vertex $i$, defined as the difference between the average degree of the neighbours of vertex $i$ and the degree of vertex $i$ itself when $i$ is not isolated, and zero when $i$ is isolated. Let $\mu_n$ denote the \emph{friendship-bias empirical distribution}, i.e., the measure that puts mass $\frac{1}{n}$ at each $\Delta_{i,n}$, $i \in [n]$. The friendship paradox says that if $G_n$ has no self-loops, then $\int_\R x\mu_n(\dd x) \geq 0$, with equality if and only if in each connected component of $G_n$ all the degrees are the same.

We show that if $(G_n)_{n\in\N}$ is a sequence of sparse random graphs that converges to a rooted random tree in the sense of convergence locally in probability, then $\mu_n$ converges weakly to a limiting measure $\mu$ that is expressible in terms of the law of the rooted random tree. We study $\mu$ for four classes of sparse random graphs: the homogeneous Erd\H{o}s-R\'enyi random graph, the inhomogeneous Erd\H{o}s-R\'enyi random graph, the configuration model and the preferential attachment model. In particular, we compute the first two moments of $\mu$, identify the right tail of $\mu$, and argue that $\mu([0,\infty))\geq\tfrac12$, a property we refer to as \emph{friendship-paradox significance}.

Joint work with R.S. Hazra and A. Parvaneh.

Recently models from population genetics were used to define stochastic dynamics in the space of graphons arising as continuum limits of dense graphs. In this talk we consider a Markov chain in the space of finite graphs that resembles a Moran model with resampling and mutation - also known as infinitely many alleles model. We encode the finite graphs as graphemes, which can be represented as a triple consisting of a vertex set, an adjacency matrix and a sampling measure. We equip the space of graphons with convergence of sample subgraph densities and show that the grapheme-valued Markov chain converges to a grapheme-valued diffusion as the number of vertices goes to infinity. We show that the grapheme-valued diffusion has a stationary distribution that is linked to the GEM distribution.

We describe the cluster of large deviations events that arise when one such event occurs. We work in the framework of an infinite moving average process with a noise that has finite exponential moments. The cluster turns out to have different shapes in the cases when the moving average process has short memory and long memory.

I will start discussing the link between meeting time of random walks and consensus time in Markovian opinion dynamics on finite networks. I will then focus on sparse random directed graphs and give an account on a recent result in which we characterize in such directed setup how random walks meet and related implications for opinion dynamics. I will in particular discuss how the directed degree structure of the underlying network may speed up or slow down meetings of different random walks.

Joint work with F. Capannoli, R. Hazra and M.Quattropani.

Empirical findings have shown that many real-world networks are scale-free, in the sense that there is a high variability in the number of connections of the elements of the networks. Spurred by these empirical findings, models have been proposed

for such networks.

Percolation on networks is one of the simplest models for network functionality. It can be interpreted as describing the effect of random attacks on the network, where edges are removed independently with a fixed probability, or the result of a simple epidemic

on the network.

We investigate the percolation critical behavior for a popular models of complex networks, the Poisson random graph, which can be interpreted as a model with multi-edges, or single edges by collapsing the multi-edges. We identify what the critical values are, and how they scale with the graph size. Interestingly, this scaling turns out to be rather different for the multi-edge case compared to the single-edge case in the scale-free regime. This clears up part of the confusion in the physics literature. Furthermore, the single-edge case has an unexpected phase transition at the appropriate scale of the percolation parameter, where the size of the largest component jumps from a random value to a much larger almost deterministic value that is proportional to the root of the graph size.

[This is joint work with Shankar Bhamidi, Souvik Dhara, Johan van Leeuwaarden and Sanchayan Sen.]

Wednesday, 31 January 2024

We study random permutations of 1,...,n drawn at random according to the Mallows distribution. For n ∈ N and q > 0, the distribution Mallows(n, q) samples a random permutation Π_n of 1,...,n in such a way that each has probability proportional to q^{inv(π)}, where inv(π) is the number of inversions (pairs 1 ≤ i < j ≤ n for which π(i) > π(j)).

This distribution was introduced in the late fifties by C.L. Mallows in the context of “statistical ranking models” and has since been studied in connection with a diverse range of topics.

In the present work we consider cycle counts. That is, for l fixed we study the vector (C_1(Π_n ), ..., C_l(Π_n)) where C_i(π) denotes the number of cycles of length i in π and Π_n is sampled according to the Mallows distribution.

When q = 1 then the Mallows distribution is simply the uniform distribution on S_n. A classical result going back to Kolchin and Goncharoff states that in this case, the vector of cycle counts tends in distribution to a vector of independent Poisson random variables, with means 1,1/2,1/3,...,1/l.

Surprisingly, the problem of finding analogues of this result for q is not 1 has largely escaped attention until now. In the talk, I plan to discuss our proof of the fact that if 0 < q < 1 is fixed and n → ∞ then the cycle counts have linear means and the vector of cycle counts can be suitably rescaled to tend to a joint Gaussian distribution. Our results also show that when q > 1 there is a striking difference between the behaviour of the even and the odd cycles. The even cycle counts still have linear means and when properly rescaled tend to a multivariate Gaussian distribution, while for the odd cycle counts on the other hand, the limiting behaviour depends on the parity of n when q > 1.

Time permitting, I may also discuss some results on the (probabilities of) properties of permutations that can be expressed in first order logic.

(Based on joint work with Jimmy He, Fiona Skerman and Teun Verstraaten)

We prove central limit theorem for spin systems on spatial random graphs. The spin systems are finite-valued and satisfy weak mixing. The random graphs are constructed on weakly mixing point processes such as Poisson or Ginibre processes. As illustrative examples, we shall consider the hard-core model and Widom-Rowlinson model on k-nearest neighbour graphs or Gilbert graph. Weak mixing in these spin systems hold for low-activity parameters and using this we establish `Lipschitz-stabilization' of spins on the point process. Such a stabilization suffices central limit theorem. This is joint work with Bartek Blaszczyszyn (ENS-INRIA, Paris) and Joseph Yukich (Lehigh University, Bethlehem).

A common assumption in many statistical network models is that edges are independent conditionally on the vertex properties. An example is the Stochastic Block Model, where the edges are assumed to be independent assuming each vertex belongs to a known community and assuming the edge probabilities only depend on the communities of the respective end points. We discuss how to statistically test this assumption using local graph statistics.

Thursday, 01 February 2024

This talk will give an overview of a class of distributed recursive algorithms on networks.

Hamilton-Jacobi (HJ) equations are the equations that descibe the infinitesimal evolution of the value function of control problems, and as such play a key role in the description of dynamic large deviatons principles. In infinite dimensional context, like for the Wasserstein-2 space that plays a role in Dawson-Gärtner type large deviations, well-posedness of the HJ equation is a challenging, and not yet fully understood problem that lately is receiving a lot of attention.

We propose a new way of looking at the Hamilton-Jacobi equation in terms of upper and lower bounds for the Hamiltonian in terms of functional inequalities. We do this in a the context of infinitesimal Riemannian geodesic spaces. We establish uniqueness in this context, and additionally establish existence for our formulation of the problem in the context of the Wasserstein 2 space.

We are given finitely many unknown probability distributions that can be sampled from and our aim is, through sequential sampling, to identify the one with the largest mean. This is a classical problem in statistics, simulation and learning theory. Lately, methods have been proposed that identify a sample complexity lower bound that any algorithm providing probabilistic correctness guarantees must satisfy, and algorithms have been developed that asymptotically match these lower bounds even for general sampling distributions, as the probabilistic error guarantees converge to zero. We review these ideas and propose a novel algorithm that relies on exploiting the underlying fluid structure in the evolution of the optimal sampling process and improves upon existing asymptotically optimal algorithms.

Systems with lattice geometry can be renormalized exploiting their coordinates in metric space, which naturally define the coarse-grained nodes. By contrast, complex networks defy the usual techniques, due to their small-world character and lack of explicit geometric embedding. Current network renormalization approaches require strong assumptions (e.g., community structure, hyperbolicity, scale-free topology), thus remaining incompatible with generic graphs and ordinary lattices. Here we introduce a graph renormalization scheme valid for any hierarchy of heterogeneous coarse-grainings, thereby allowing for the definition of “block-nodes” across multiple scales. This approach identifies a class of scale-invariant networks characterized by a necessary and specific dependence on additive hidden variables attached to nodes, plus optional dyadic factors. If the hidden variables are annealed, they lead to realistic scale-free networks with assortativity and finite local clustering, even in the sparse regime and in the absence of geometry. If they are quenched, they can guide the renormalization of real-world networks with node attributes and distance-dependence or communities. As an application, we derive an accurate multiscale model of the International Trade Network applicable across arbitrary geographic partitions. These results highlight a deep conceptual distinction between scale-free and scale-invariant networks, and they provide a geometry-free route to renormalization.

References: Phys. Rev. Research 5, 043101 (2023) and https://arxiv.org/abs/2212.08462

There is empirical evidence that collaboration in academia has increased significantly during the past few decades, perhaps due to the breathtaking advancements in communication and technology during this period. Although there have been several studies on the dynamical aspects of collaboration, systematic statistical models have been lacking. In this talk, we will present a dynamic mean-field model of academic collaboration and discuss the associated estimation framework. We will consider several popular indices of collaboration from the scientometrics literature and study their dynamics under the proposed model. Using metadata from arXiv, we will also look at empirical aspects of the mean-field collaboration dynamics in disciplines such as Computer Science, Mathematics and Physics.

Based on joint work with S. Chatterjee and T. Sadhukhan.

Friday, 02 February 2024

Geometric network models formalize the natural idea that similar vertices are likely to connect. Indeed, many real-world networks can be accurately modeled by positioning vertices of a network graph in hyperbolic spaces. Nevertheless, if one observes only network connections, it is not obvious how to establish the presence of a latent geometry. This problem is an example of a broader problem of model selection for networks: how to model a real-life network with a random graph as accurately as possible? For detecting geometry, triangle counts and clustering coefficients are the standard statistics in the literature. In this talk we show that these statistics are insufficient because they fail to detect geometry induced by hyperbolic spaces. We, therefore, introduce a different statistic, weighted triangles, which weighs triangles based on their evidence for geometry. We show analytically, as well as on synthetic and real-world data, that weighted triangles are a powerful statistic to detect hyperbolic geometry in networks. This is a joint work with Riccardo Michielan and Clara Stegehuis.

We consider two classes of natural stochastic processes on finite unlabeled graphs. These are Euclidean stochastic optimization algorithms on the adjacency matrix of weighted graphs and a modified version of the Metropolis MCMC algorithm on stochastic block models over unweighted graphs. In both cases we show that, as the size of the graph goes to infinity, the random trajectories of the stochastic processes converge to deterministic curves on the space of measure-valued graphons. Measure-valued graphons, introduced by Lov\'{a}sz and Szegedy in \cite{lovasz2010decorated}, are a refinement of the concept of graphons that can distinguish between two infinite exchangeable arrays that give rise to the same graphon limit. We introduce new metrics on this space which provide us with a natural notion of convergence for our limit theorems. This notion is equivalent to the convergence of infinite-exchangeable arrays. Under suitable assumptions and a specified time-scaling, the Metropolis chain admits a diffusion limit as the number of vertices go to infinity. We then demonstrate that, in an appropriately formulated zero-noise limit, the stochastic process of adjacency matrices of this diffusion converges to a deterministic gradient flow curve on the space of graphons introduced in~\cite{Oh2023}. A novel feature of this approach is that it provides a precise exponential convergence rate for the Metropolis chain in a certain limiting regime. The connection between a natural Metropolis chain commonly used in exponential random graph models and gradient flows on graphons, to the best of our knowledge, is new in the literature as well. This is joint work with Soumik Pal, Raghav Somani, and Raghavendra Tripathi.

I will describe some results on the bulk of the spectrum of sparse and dense inhomogeneous random graphs and how the two spectrums are related. Based on joint work with Luca Avena and Nandan Malhotra.