Probability Distribution


DEVRIENDT, Karel, MARTIN-GUTIERREZ, Samuel and LAMBIOTTE, Renaud, 2022. Variance and Covariance of Distributions on Graphs. SIAM Review. 5 May 2022. Vol. 64, no. 2, p. 343–359. DOI 10.1137/20M1361328.

> We develop a theory to measure the variance and covariance of probability distributions defined on the nodes of a graph, which takes into account the distance between nodes. Our approach generalizes the usual (co)variance to the setting of weighted graphs and retains many of its intuitive and desired properties. Interestingly, we find that a number of famous concepts in graph theory and network science can be reinterpreted in this setting as variances and covariances of particular distributions. As a particular application, we define the maximum variance problem on graphs with respect to the effective resistance distance, and we characterize the solutions to this problem both numerically and theoretically. We show how the maximum variance distribution is concentrated on the boundary of the graph, and illustrate this in the case of random geometric graphs. Our theoretical results are supported by a number of experiments on a network of mathematical concepts, where we use the variance and covariance as analytical tools to study the (co)occurrence of concepts in scientific papers with respect to the (network) relations between these concepts.

The variance of a probability distribution is a fundamental concept in the toolkit of probability theory and statistics and is routinely applied throughout science, engineering and numerous practical settings. Intuitively speaking, the variance captures how spread out the outcomes of a distribution are, and thus reflects the inherent variability in this distribution. In many practical cases, however, probability distributions are defined on the nodes of a network: websites on the internet, individuals in a social network, neurons in the brain, etc. These nodes are the building blocks of a network, and when studying distributions or signals defined on nodes it is natural to take the underlying network structure into account. As the usual definition of variance cannot take this structure into account, we thus lack a basic methodological tool when analysing distributions and signals on a graph.

[…] we propose measures of variance and covariance for distributions defined on a network, which take into account the underlying structure of the network by considering the distances between nodes. These distances provide a notion of what it means to be ‘spread out’ on the network, which in turn allows us to define (co)variances of distributions on the network. Our proposed formulas for variance and covariance take a very simple mathematical form (as a quadratic product and matrix trace, respectively) yet still capture many of the intuitive and mathematical properties of the usual (co)variance. To illustrate our new measures in practice, we apply the proposed variance and covariance measures to the analysis of an empirical network of mathematical concepts with data from a collection of scientific papers.
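The "quadratic product" and "matrix trace" forms mentioned above can be illustrated with a minimal sketch. The excerpt does not spell out the formulas, so the ones below are assumptions chosen to be consistent with that description: the variance is taken as half the quadratic form of the distribution with the matrix of squared distances, and the covariance of a joint distribution as half the trace of that matrix multiplied by the difference between the product of the marginals and the joint. When nodes are points on a line with Euclidean distances, both reduce to the usual variance and covariance. The function names `graph_variance` and `graph_covariance` are hypothetical.

```python
import numpy as np


def graph_variance(p, dist):
    """Assumed form: var(p) = 1/2 * sum_ij p_i p_j d(i, j)^2 (a quadratic product)."""
    p = np.asarray(p, dtype=float)
    d2 = np.asarray(dist, dtype=float) ** 2  # matrix of squared distances
    return 0.5 * p @ d2 @ p


def graph_covariance(joint, dist):
    """Assumed form: cov(P) = 1/2 * tr(D2 (p q^T - P)^T), with p, q the marginals of P."""
    joint = np.asarray(joint, dtype=float)
    d2 = np.asarray(dist, dtype=float) ** 2
    p = joint.sum(axis=1)  # marginal over rows
    q = joint.sum(axis=0)  # marginal over columns
    return 0.5 * np.trace(d2 @ (np.outer(p, q) - joint).T)


# Example: a path graph on 3 nodes, using hop-count (shortest-path) distances.
nodes = np.arange(3)
dist = np.abs(nodes[:, None] - nodes[None, :])

uniform = np.full(3, 1 / 3)
point_mass = np.array([1.0, 0.0, 0.0])

var_uniform = graph_variance(uniform, dist)      # spread over all nodes
var_point = graph_variance(point_mass, dist)     # concentrated on one node

# A joint distribution in which both variables always sit on the same node,
# and one in which they are independent.
cov_identical = graph_covariance(np.diag(uniform), dist)
cov_independent = graph_covariance(np.outer(uniform, uniform), dist)
```

In this sketch a distribution concentrated on a single node has zero variance, a perfectly coupled joint distribution has covariance equal to the variance of its marginal, and an independent joint distribution has zero covariance, mirroring the behaviour of the usual (co)variance. Other distances, such as the effective resistance distance mentioned in the abstract, could be passed as `dist` in the same way.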

Our approach allows for a unified and intuitive treatment of the structural (relations between concepts) and functional data (usage of concepts in papers) in this system and we describe some qualitative and quantitative findings. As a second application, we show that the variance and covariance of some particular distributions correspond to previously known graph characteristics, offering a new framework to interpret and understand them.