Definitions (2.1) and (2.2) measure the variance and covariance of distributions on a graph with respect to a certain ‘distance’ between the nodes. The best-known distance on graphs is the shortest-path (or geodesic) distance, where $d(i,j)$ is the length of the shortest path between nodes $i$ and $j$. Besides capturing the intuitive notion of a distance between nodes, the geodesic distance also satisfies the mathematical properties of a metric [8]. Another important metric between the nodes of a graph is the effective resistance [27, 3, 17, 39]. Like the geodesic distance, the effective resistance reflects the length of the paths between a pair of nodes. However, instead of taking only the shortest path into account, the effective resistance is influenced by all paths (and their lengths) between a pair of nodes, and it becomes smaller as more paths are available. Because of this more integrative notion of distance and its favorable mathematical properties, the effective resistance is often preferred over the shortest-path distance when studying networks.

We write $\omega_{ij}$ for the effective resistance between nodes $i$ and $j$, and define it in terms of the Laplacian matrix of the graph. The Laplacian matrix $Q$ of a graph with $n$ nodes is the $n \times n$ matrix with diagonal entries $(Q)_{ii} = k_i$, off-diagonal entries $(Q)_{ij} = -c_{ij}$ for all links $(i,j)$, and zero otherwise. The effective resistance is then

$$\omega_{ij} = (e_i - e_j)^T Q^\dagger (e_i - e_j),$$

where $e_i$ is the unit vector with entries $(e_i)_k = 1$ if $k = i$ and zero otherwise, and $Q^\dagger$ is the Moore-Penrose pseudoinverse of the Laplacian. For our application, the most relevant property is that both $\omega$ and its square root $\sqrt{\omega}$ are metrics between the nodes of a network [27, 12].
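The definition of $\omega_{ij}$ can be checked numerically on small graphs. The following minimal Python sketch uses NumPy; the helper names `laplacian`, `effective_resistance` and `satisfies_triangle_inequality` are our own and purely illustrative. It builds $Q$ from a list of weighted links, computes all effective resistances at once via the pseudoinverse, and compares a path with a cycle to show how additional paths lower the resistance.

```python
import numpy as np


def laplacian(n, weighted_edges):
    """Laplacian Q with (Q)_ii = k_i (weighted degree) and (Q)_ij = -c_ij."""
    Q = np.zeros((n, n))
    for i, j, c in weighted_edges:
        Q[i, j] -= c
        Q[j, i] -= c
        Q[i, i] += c
        Q[j, j] += c
    return Q


def effective_resistance(Q):
    """All omega_ij = (e_i - e_j)^T Q^+ (e_i - e_j) at once."""
    Qp = np.linalg.pinv(Q)  # Moore-Penrose pseudoinverse Q^+
    d = np.diag(Qp)
    # Expanding the quadratic form gives Q^+_ii + Q^+_jj - 2 Q^+_ij
    return d[:, None] + d[None, :] - 2 * Qp


def satisfies_triangle_inequality(D, tol=1e-9):
    """Check D[i, j] <= D[i, k] + D[k, j] for all triples of nodes."""
    n = D.shape[0]
    return all(
        D[i, j] <= D[i, k] + D[k, j] + tol
        for i in range(n) for j in range(n) for k in range(n)
    )


# Unit link weights c_ij = 1: path 0-1-2 versus the 4-cycle 0-1-2-3-0
path = effective_resistance(laplacian(3, [(0, 1, 1.0), (1, 2, 1.0)]))
cycle = effective_resistance(
    laplacian(4, [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 0, 1.0)])
)
print(round(path[0, 2], 6))   # 2.0: one path of length 2, equal to the geodesic distance
print(round(cycle[0, 2], 6))  # 1.0: two paths of length 2 in parallel
print(round(cycle[0, 1], 6))  # 0.75: direct link in parallel with a path of length 3
print(satisfies_triangle_inequality(cycle), satisfies_triangle_inequality(np.sqrt(cycle)))
```

The opposite nodes of the 4-cycle have geodesic distance 2 but effective resistance 1, which illustrates how parallel paths reduce $\omega$. Computing the full pseudoinverse costs $O(n^3)$ time, which is unproblematic for graphs with a few thousand nodes.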
[…]
# 3. Variance and covariance in a network of knowledge

As an example application, we study a ‘network of knowledge’ made up of mathematical ideas and results, with links between related concepts. The code and data for our analysis are available on GitHub [33]. We consider a list of mathematical concepts (theorems, lemmas, equations) compiled from four Wikipedia pages that list these concepts, and we infer links between the concepts from hyperlinks between their respective Wikipedia pages. Details on data retrieval and filtering can be found in [41], where a higher-order variant of this network was investigated. The resulting network of concepts consists of $n = 1150$ nodes and $m = 4109$ links and is shown in Figure 2.
Fig. 2. Hyperlink network of Wikipedia pages of the considered mathematical concepts (see [41]). The size of the nodes is proportional to their PageRank and the color coding corresponds to communities found using the Louvain algorithm.
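The link inference described in this section can be sketched as follows. This is a minimal sketch under a hypothetical input format: the hyperlinks are assumed to be available as a Python dictionary mapping each page title to the titles it links to, and the function name `build_concept_graph` is illustrative. Links are treated as undirected, and links to pages outside the concept list as well as self-links are dropped. The actual retrieval and filtering steps are those of [41] and may differ.

```python
from typing import Dict, Iterable, Set, Tuple


def build_concept_graph(
    concepts: Iterable[str],
    hyperlinks: Dict[str, Iterable[str]],
) -> Tuple[Dict[str, int], Set[Tuple[int, int]]]:
    """Map each concept to a node index and collect undirected concept links.

    A link (i, j) is kept if either page links to the other; links that leave
    the concept list and self-links are discarded.
    """
    index = {name: k for k, name in enumerate(sorted(set(concepts)))}
    edges: Set[Tuple[int, int]] = set()
    for source, targets in hyperlinks.items():
        if source not in index:
            continue  # page is not one of the listed concepts
        for target in targets:
            if target in index and target != source:
                i, j = index[source], index[target]
                edges.add((min(i, j), max(i, j)))  # store each link once
    return index, edges


# Hypothetical toy input: three concepts and a few hyperlinks between their pages
concepts = ["Central limit theorem", "Law of large numbers", "Bayes' theorem"]
hyperlinks = {
    "Central limit theorem": ["Law of large numbers", "Normal distribution"],
    "Law of large numbers": ["Central limit theorem"],
    "Bayes' theorem": ["Bayes' theorem"],
}
index, edges = build_concept_graph(concepts, hyperlinks)
print(len(index), len(edges))  # 3 1
```

Storing each link as an ordered pair `(min, max)` merges reciprocal hyperlinks into a single undirected link, which is what the Laplacian-based distances above require.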
We regard this network as the underlying structure of mathematical concepts and use it to investigate how these concepts are used in practice, as reflected by their occurrences in scientific papers.
To study the functional aspect of the network of knowledge, we use a corpus of more than 140,000 papers from the arXiv, together with the mathematical concepts used in them. For each paper, we record which of the mathematical concepts appear in it and represent this set by a uniform distribution over the used concepts. Every paper $i$ thus has a corresponding subset of concepts $V_i$ and a distribution $p^{(i)}$ that is uniform over this set, i.e. $p^{(i)}_k = 1/|V_i|$ if $k \in V_i$ and $p^{(i)}_k = 0$ otherwise.
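A small sketch of how a paper's concept list could be turned into the distribution $p^{(i)}$, assuming concepts are identified by integer indices $0, \dots, n-1$. The function name `paper_distribution` is hypothetical. Repeated mentions of a concept are collapsed, since the distribution is uniform over the set $V_i$.

```python
import numpy as np


def paper_distribution(concept_ids, n):
    """Uniform distribution over the set V_i of concepts used in one paper.

    p[k] = 1/|V_i| if concept k appears in the paper and 0 otherwise;
    only presence counts, not how often a concept is mentioned.
    """
    used = sorted(set(concept_ids))
    if not used:
        raise ValueError("paper uses none of the listed concepts")
    p = np.zeros(n)
    p[used] = 1.0 / len(used)
    return p


# Hypothetical paper mentioning concepts 3, 7 and 7 again, in a 10-concept network
p = paper_distribution([3, 7, 7], n=10)
```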
A first question we consider is whether mathematical papers contain ‘coherent’ sets of concepts, that is, concepts that lie close together in the network. In terms of variance, this question can be addressed by comparing the variance of the paper distributions $p^{(i)}$ with that of a null model of ‘virtual papers’.
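The comparison with a null model can be sketched as below. Two assumptions are made here and labeled as such. First, since definition (2.1) is not reproduced in this excerpt, the variance is taken to have the form $\mathrm{var}(p) = \tfrac{1}{2}\sum_{i,j} p_i p_j\, d(i,j)^2$ with $d = \sqrt{\omega}$, i.e. $\tfrac{1}{2} p^T \Omega p$ with $\Omega = (\omega_{ij})$. Second, the ‘virtual papers’ are modeled hypothetically as the same number of concepts drawn uniformly at random without replacement; the null model actually used in the analysis may differ. All function names are illustrative.

```python
import numpy as np


def effective_resistance(Q):
    """omega_ij = Q^+_ii + Q^+_jj - 2 Q^+_ij from the Laplacian pseudoinverse."""
    Qp = np.linalg.pinv(Q)
    d = np.diag(Qp)
    return d[:, None] + d[None, :] - 2 * Qp


def variance(p, omega):
    """ASSUMED form of (2.1): 1/2 * sum_ij p_i p_j omega_ij."""
    return 0.5 * p @ omega @ p


def uniform(subset, n):
    """Uniform distribution over a set of concept indices."""
    p = np.zeros(n)
    p[list(subset)] = 1.0 / len(subset)
    return p


def null_model_variances(size, omega, samples=1000, seed=0):
    """HYPOTHETICAL null model: 'virtual papers' with `size` concepts drawn
    uniformly at random without replacement from all n concepts."""
    rng = np.random.default_rng(seed)
    n = omega.shape[0]
    return np.array([
        variance(uniform(rng.choice(n, size=size, replace=False), n), omega)
        for _ in range(samples)
    ])


# Toy network: path graph 0-1-...-9 with unit weights, where omega_ij = |i - j|
n = 10
Q = np.diag([1.0] + [2.0] * (n - 2) + [1.0]) - np.eye(n, k=1) - np.eye(n, k=-1)
omega = effective_resistance(Q)
observed = variance(uniform([0, 1], n), omega)  # two adjacent concepts
null = null_model_variances(2, omega)
print(observed, null.mean())
```

A paper whose variance lies well below the bulk of the null distribution uses concepts that are closer together in the network than randomly chosen ones, which is one way to make ‘coherent’ precise.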
[33] S. Martin-Gutierrez, R. Lambiotte, and K. Devriendt, Variance and covariance of distributions on graphs, https://github.com/rlambiot/variance, 2020.
[41] V. Salnikov, D. Cassese, R. Lambiotte, and N. S. Jones, Co-occurrence simplicial complexes in mathematics: identifying the holes of knowledge, Applied Network Science, 3 (2018), https://doi.org/10.1007/s41109-018-0074-3.