What’s the backstory of what I’m calling the multicomputational paradigm? For me, the realization that one can formulate such a broad and general new paradigm is something that’s emerged only over the past year or so. But given what we now know, it’s possible to go back and see a quite tangled web of indications and precursors of it stretching back decades and perhaps more than a century. A key technical step in the development of the multicomputational paradigm was the idea of what I named “multiway systems”. I first used the term “multiway systems” in 1992 when I included a placeholder for a future section about them in an early draft of what would become my 2002 book A New Kind of Science. A major theme of my work during the development of A New Kind of Science was exploring as broadly as possible the computational universe of simple programs. I had already in the 1980s extensively studied cellular automata—and discovered all sorts of interesting phenomena like computational irreducibility in them. But now I wanted to see what happened in other parts of the computational universe. So I started investigating—and inventing—different kinds of systems with different underlying structures. And there, in Chapter 5, sandwiched between a section on Network Systems (one of the precursors of the discrete model of space in our Physics Project) and Systems Based on Constraints, is a section entitled Multiway Systems. The basic thrust is already the core multicomputational one: to break away from the idea of (as I put it) a “simple one-dimensional arrangement of states in time”.
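To make the core idea concrete, here is a minimal sketch in Python (my own illustration, with arbitrarily chosen rules rather than an example from the book) of a multiway string substitution system: at each step every rule is applied at every possible position in every current state, and all of the distinct results are kept.

```python
# A minimal multiway string substitution system: from each state, apply
# every rule at every matching position, and keep all distinct successors.

def successors(state, rules):
    """All strings reachable from `state` by one application of one rule."""
    result = set()
    for lhs, rhs in rules:
        i = state.find(lhs)
        while i != -1:
            result.add(state[:i] + rhs + state[i + len(lhs):])
            i = state.find(lhs, i + 1)
    return result

def multiway_evolution(initial, rules, steps):
    """Successive generations of states, starting from a single state."""
    generations = [{initial}]
    for _ in range(steps):
        generations.append(set().union(*(successors(s, rules)
                                         for s in generations[-1])))
    return generations

# Two illustrative rules: A -> AB and B -> A
for n, gen in enumerate(multiway_evolution("A", [("A", "AB"), ("B", "A")], 4)):
    print(n, sorted(gen))
```

Instead of a single “line” of states in time, each step produces a whole set of states, and the record of which state leads to which is the multiway graph.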
I studied multiway systems as abstract systems. Later in the book I studied them as idealizations of mathematical proofs. And I mentioned them as a possible (but as I thought then, rather unsatisfactory) underpinning for quantum mechanics. I came back to multiway systems quite a few times over the years. But it was only when we started the Wolfram Physics Project in the latter half of 2019 that it began to become clear (particularly through an insight of Jonathan Gorard’s) just how central multiway systems would end up being to fundamental physics.
The basic idea of multiway systems is in a sense so straightforward and seemingly obvious that one might assume it had arisen many times. And in a sense it has, at least in special cases and particular forms—in quite a range of fields, under a wide variety of different names—though realistically in many cases we can only see the “multiway system character” when we look back from what we now know.
The rather trivial barely-a-real-multiway-system case of pure trees no doubt arose for example with the construction of family trees, presumably already in antiquity. Another special and rather trivial case arose in the construction of Pascal’s triangle, which one can think of as working just like the “very simple multiway system” shown above. (In Pascal’s triangle, of course, the emphasis is not on the pattern of states, but on the path weights, which are binomial coefficients.) Another very early—if implicit—use of multiway systems was in recreational mathematics puzzles and games. Perhaps in antiquity, and certainly by 800 AD, there was for example the wolf-goat-cabbage river crossing problem, whose possible histories form a multiway system. Mechanical puzzles like Chinese rings are perhaps even older. And games like Mancala and Three Men’s Morris might date from early antiquity—even though the explicit “mathematical” concept of “game trees” (which are typically multiway graphs that include merging) seems to have only arisen (stimulated by discussions of chess strategies) as an “application of set theory” by Ernst Zermelo in 1912.
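As a concrete illustration of the river-crossing case, here is a rough sketch (my own encoding of the standard puzzle) that builds the full graph of legal states and crossings, of which every possible history of the puzzle is a path:

```python
# The wolf-goat-cabbage puzzle as a multiway system: a state records which
# bank the farmer is on, and which items are on the left bank.
ITEMS = frozenset({"wolf", "goat", "cabbage"})

def safe(bank):
    """A bank without the farmer must not pair wolf/goat or goat/cabbage."""
    return not ({"wolf", "goat"} <= bank or {"goat", "cabbage"} <= bank)

def moves(state):
    """All legal crossings: the farmer rows, carrying at most one item."""
    farmer, left = state
    here = left if farmer == "L" else ITEMS - left
    for cargo in [None, *sorted(here)]:
        new_left = set(left)
        if cargo is not None:
            if farmer == "L":
                new_left.discard(cargo)
            else:
                new_left.add(cargo)
        # The bank the farmer leaves behind must be safe.
        unattended = new_left if farmer == "L" else ITEMS - frozenset(new_left)
        if safe(unattended):
            yield ("R" if farmer == "L" else "L", frozenset(new_left))

# Exhaustive construction of the graph of all legal states and crossings.
start, goal = ("L", ITEMS), ("R", frozenset())
seen, frontier, edges = {start}, [start], []
while frontier:
    state = frontier.pop()
    for nxt in moves(state):
        edges.append((state, nxt))
        if nxt not in seen:
            seen.add(nxt)
            frontier.append(nxt)

print(len(seen), "legal states,", len(edges), "crossings;",
      "goal reachable:", goal in seen)
```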
In mathematics, multiway systems seem to have first appeared—again somewhat implicitly—in connection with groups. Given words in a group, written out as sequences of generators, successive application of relations in the group essentially yields a multiway string substitution system—and in 1878 Arthur Cayley drew a kind of dual of this to give what’s now called a Cayley graph.
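As a small concrete example (my own, not Cayley’s): one can construct a Cayley graph directly from permutation generators, here for the symmetric group S3.

```python
# Cayley graph of the symmetric group S3: vertices are group elements
# (permutations), and each generator contributes one edge out of each vertex.

def compose(p, q):
    """Composition of permutations-as-tuples: apply q first, then p."""
    return tuple(p[i] for i in q)

GENERATORS = {"s": (1, 0, 2),   # swap the first two elements
              "r": (1, 2, 0)}   # cyclic rotation

identity = (0, 1, 2)
elements, frontier, edges = {identity}, [identity], []
while frontier:                 # close the vertex set under the generators
    g = frontier.pop()
    for name, h in GENERATORS.items():
        gh = compose(h, g)      # act on g by the generator h
        edges.append((g, gh, name))
        if gh not in elements:
            elements.add(gh)
            frontier.append(gh)

print(len(elements), "elements,", len(edges), "generator edges")  # 6 and 12
```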
But the first more-readily-recognizable examples of multiway systems seem to have appeared in the early 1900s in connection with efforts to find minimal representations of axiomatic mathematics. The basic idea—which arose several times—was to think of the process of mathematical deduction as consisting of the progressive transformation of something like sequences of symbols according to certain rules. (And, yes, this basic idea is also what the Wolfram Language now uses for representing general computational processes.) The notion was that any given deduction (or proof) would correspond to a particular sequence of transformations. But if one looks at all possible sequences, what one has is a multiway system.
And in what seems to be the first known explicit example, Axel Thue in 1914 considered (in a paper entitled “Problems Concerning the Transformation of Symbol Sequences According to Given Rules”) what are essentially string equivalences (two-way string transformations) and discussed what paths could exist between strings—with the result that string substitution systems are now sometimes called “semi-Thue systems”. In 1921 Emil Post then considered (one-way) string transformations (or, as he called them, “productions”). But, like Thue, he focused on the “decision problem” of whether one string was reachable from another—and never seems to have considered the overall multiway structure.
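In modern terms, this “decision problem” is just path finding in the multiway graph. Here is a minimal sketch, with productions chosen purely for illustration, and a length cutoff to keep the search finite (in general the problem is undecidable):

```python
# Is `target` reachable from `source` under one-way string productions?
# Breadth-first search through the multiway graph of rewrites.
from collections import deque

def one_step(state, rules):
    """Every string obtained by one application of one production."""
    for lhs, rhs in rules:
        i = state.find(lhs)
        while i != -1:
            yield state[:i] + rhs + state[i + len(lhs):]
            i = state.find(lhs, i + 1)

def reachable(source, target, rules, max_length=12):
    queue, seen = deque([source]), {source}
    while queue:
        state = queue.popleft()
        if state == target:
            return True
        for nxt in one_step(state, rules):
            if nxt not in seen and len(nxt) <= max_length:
                seen.add(nxt)
                queue.append(nxt)
    return False   # not reachable within the length cutoff

# Illustrative productions: AB -> BA and B -> AB
print(reachable("ABB", "BBA", [("AB", "BA"), ("B", "AB")]))  # True
```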
Thue and Post (and later Markov with his so-called “normal algorithms”) considered strings. In 1920 Moses Schönfinkel introduced his S, K combinators and defined transformations between what amount to symbolic expressions. And here again it turns out there can be ambiguity in how the transformations are applied, leading to what we’d now consider a multiway system. And the same issue arose for Alonzo Church’s lambda calculus (introduced around 1930). But in 1936 Church and Rosser showed that at least for lambda calculus and combinators the multiway structure basically doesn’t matter so long as the transformations terminate: the final result is always the same. (Our “causal invariance” is a more general version of this kind of property.)

Even in antiquity the idea seems to have existed that sentences in languages were constructed from words according to certain grammatical rules that could (in modern terms) perhaps be recursively applied. For a long time this wasn’t particularly formalized (except conceivably for Sanskrit). But finally in 1956—following work on string substitution systems—there emerged the concept of generative grammars. And while the focus was on things like what sentences could be generated, the underlying representation of the process of generating all possible sentences from grammatical rules can again be thought of as a multiway system.

In a related but somewhat different direction, the development of various kinds of (often switch-like) devices with discrete states had led by the 1940s to the fairly formal notion of a finite automaton. In many engineering setups one wants just one “deterministic” path to be followed between the states of the automaton. But by 1959 there was explicit discussion of nondeterministic finite automata, in which many paths could be followed. And while in principle tracing all these paths would have yielded a multiway system, it was quickly discovered that as far as the set of possible strings generated or recognized was concerned, there was always an equivalent deterministic finite automaton.

One way to think about multiway systems is that they’re the result of “repeated nondeterminism”—in which there are multiple outcomes possible at a sequence of steps. And over the course of the past century or so quite a few different kinds of systems based on repeated nondeterminism have arisen—a simple example being a random walk, investigated since the late 1800s. Usually in studying systems like this, however, one’s interested either in single “random instances” or in some kind of “overall probability distribution”—and not in the more detailed “map of possible histories” defined by a multiway system.

Multiway systems are in a sense specifically about the structure of “progressive evolution” (normally in time). But given what amount to rules for a multiway system, one can also ask essentially combinatorial questions about the distribution of all possible states. And one place where this has been done for over a century is in estimating equilibrium properties of systems in statistical physics—as summarized by the so-called partition function. Even after all these years there is, however, much less development of any general formalism for non-equilibrium statistical mechanics—though some diagrammatic techniques are perhaps reminiscent of multiway systems (and our full multicomputational approach might now make great progress).
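As a minimal illustration of that last point (a standard textbook computation, chosen by me as an example): the partition function of a small open Ising chain can be computed by brute-force enumeration of all possible spin states, and checked against the known closed form.

```python
# Brute-force partition function for an open 1D Ising chain: sum the
# Boltzmann weight exp(-beta * E) over every one of the 2^n spin states.
from itertools import product
from math import exp, cosh

def energy(spins, J=1.0):
    """Nearest-neighbor coupling energy of one spin configuration."""
    return -J * sum(s * t for s, t in zip(spins, spins[1:]))

def partition_function(n, beta):
    return sum(exp(-beta * energy(spins))
               for spins in product([-1, +1], repeat=n))

n, beta = 8, 0.5
print(partition_function(n, beta))       # enumerated over all 256 states
print(2**n * cosh(beta) ** (n - 1))      # exact closed form, for comparison
```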
Yet another place where multiway systems have implicitly appeared is in studying systems in which asynchronous events occur. The systems can be based on Boolean algebra, database updating or other kinds of ultimately computational rules. And in making proofs about whether such systems are, for example, “correct” or “safe”, one needs to consider all possible sequences of asynchronous updates. Normally this is done using various optimized implicit methods, but ultimately there’s always effectively a multiway system underneath.

One class of models that’s been rediscovered multiple times since 1939—particularly in various systems engineering contexts—is so-called Petri nets. Basically these generalize finite automata by defining rules for multiple “markers” to move around a graph—and once again, tracing all possible histories would give a multiway system, so that for example “reachability” questions amount to path finding (a minimal sketch of this appears below). (Note that—as we saw earlier—a token-event graph can be “rolled up” into what amounts to a Petri net by completely deduplicating all instances of equivalent tokens.)

In the development of computer science, particularly beginning in the 1970s, there were various investigations of parallel and distributed computer systems in which different operations can occur concurrently, but at flexible or essentially asynchronous times. Concepts like channels, message passing and coroutines were developed—and formal models like Communicating Sequential Processes were constructed. Once again the set of possible histories can be thought of as a multiway system, and methods like process algebra in effect provide a formalism for describing certain aspects of what can happen.

I know of a few perhaps-closer approaches to our conception of multiway systems. One is so-called Böhm trees. In studying “term rewriting” systems like combinators, lambda calculus and their generalizations, the initial focus was on sequences of transformations that terminate in some kind of “answer” (or “normal form”). But starting in the late 1970s a small amount of work was done on the non-terminating case, and on what we would now call multiway graphs describing it.

We usually think of multiway graphs as being generated progressively by following certain rules. But if we just look at the final graphs, they (often) have the mathematical structure of Hasse diagrams for partial orderings. Most posets that get constructed in mathematics don’t have particularly convenient interpretations in terms of interesting multiway systems (especially when the posets are finite). But for example the “partially ordered set of finite causets” (or “poscau”) generated by all possible sequential growth paths for causal sets (and studied since about 2000) can be thought of as a multiway system.
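Here is that Petri-net sketch: a toy net of my own devising (purely for illustration), with the reachability graph, i.e. the multiway system over markings, built by tracing every possible firing.

```python
# A toy Petri net: a marking is a multiset of tokens on places, and a
# transition can fire whenever its input tokens are present. Tracing every
# possible firing from the initial marking gives the reachability graph.
from collections import Counter

TRANSITIONS = [  # each consumes the `ins` tokens and produces the `outs`
    {"ins": Counter(p1=1),       "outs": Counter(p2=1, p3=1)},
    {"ins": Counter(p2=1, p3=1), "outs": Counter(p1=1)},
    {"ins": Counter(p3=1),       "outs": Counter(p4=1)},
]

def enabled(marking, t):
    return all(marking[p] >= n for p, n in t["ins"].items())

def fire(marking, t):
    return marking - t["ins"] + t["outs"]   # Counter arithmetic drops zeros

def reachability_graph(initial):
    """All markings reachable under any firing order, plus the transitions."""
    key = lambda m: frozenset(m.items())
    seen, frontier, edges = {key(initial)}, [initial], []
    while frontier:
        m = frontier.pop()
        for i, t in enumerate(TRANSITIONS):
            if enabled(m, t):
                m2 = fire(m, t)
                edges.append((dict(m), i, dict(m2)))
                if key(m2) not in seen:
                    seen.add(key(m2))
                    frontier.append(m2)
    return seen, edges

markings, edges = reachability_graph(Counter(p1=1))
print(len(markings), "reachable markings,", len(edges), "possible firings")
```

A “reachability” question is then just whether a path exists between two markings in this graph.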
Beginning around the 1960s, the general idea of studying systems based on repeated rewriting—of strings or other “terms”—developed essentially into its own field, though with connections to formal language theory, automated theorem proving, combinatorics on words and other areas. For the most part what was studied were questions of termination, decidability and various kinds of path equivalence—but apparently not what we would now consider “full multiway structure”. Already in the late 1960s there started to be discussion of rewriting systems (and grammars) based on graphs. But unlike for strings and trees, defining how rewriting might structurally work was already complicated for graphs (leading to concepts like double-pushout rewriting). And in systematically organizing this structure (as well as those for other diagrammatic calculi), connections were made with category theory—and this in turn led to connections with formalizations of distributed computing such as process algebra and π calculus. The concept that rewritings can occur “in parallel” led to use of monoidal categories, and consideration of higher categories provided yet another (though rather abstract) perspective on what we now call multiway systems.
(It might be worth mentioning that I studied graph rewriting in the 1990s in connection with possible models of fundamental physics, but was somewhat unhappy with its structural complexity—which is what led me finally in 2018 to start studying hypergraph rewriting, and to develop the foundations of our Physics Project. Back in the 1990s I did consider the possibility of multiway graph rewriting—but it took the whole development of our Physics Project for me to understand its potential significance.)
**An important feature of the multicomputational paradigm is the role of the observer and the concept of sampling multiway systems, for example in “slices” corresponding to certain “reference frames”.** And here again there are historical precursors.
The term “reference frame” was introduced in the 1800s to organize ideas about characterizing motion—and was generalized by the introduction of special relativity in 1905 and further by general relativity in 1915. And when we talk about slices of multiway systems they’re structurally very much like discrete analogs of sequences of spacelike hypersurfaces in continuum spacetime in relativity. There’s a difference of interpretation, though: in relativity one’s dealing specifically with physical space, whereas in our multiway systems we’re dealing first and foremost with branchial space.
But beyond the specifics of relativity and spacetime, starting in the 1940s there emerged in mathematics—especially in dynamical systems theory—the general notion of foliations, which are basically the continuous analogs of our “slices” in multiway systems. We can think of slices of multiway systems as defining a certain ordering in which we (or an observer) “scan” the multiway system. In mathematics, starting about a century ago, there were various situations in which different scanning orders for discrete sets were considered—notably in connection with diagonalization arguments, space-filling curves, etc. But the notion of scanning orders became more prominent through the development of practical algorithms for computers.
The basic point is that any nontrivial recursive algorithm has multiple possible branches for recursion (i.e. it defines a multiway system), and in running the algorithm one has to decide in what order to follow these. A typical example involves scanning a tree, and deciding in what order to visit the nodes. One possibility is to go “depth first”, visiting nodes all the way down to the bottom of one branch first—and this approach was used even by hand for solving mazes before 1900. But by the end of the 1950s, in the course of actual computer implementations, it was noticed that one could also go “breadth first” (the two orders are contrasted in the sketch below). Algorithms for things like searching are an important use case for different scanning orders (or, in our way of describing things, different reference frames). But another use case, intimately tied into things like term rewriting, is for evaluation orders. And indeed—though I didn’t recognize it at the time—my own work on evaluation orders in symbolic computation around 1980 is quite related to what we’re now doing with multicomputation. (Evaluation orders are also related to lazy evaluation and to recent ideas like CRDTs.)
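The difference between these two scanning orders comes down to a single data structure. A minimal sketch on a small hand-built tree:

```python
# Depth-first vs breadth-first visiting orders on the same tree: the only
# difference is whether pending nodes wait on a stack or on a queue.
from collections import deque

TREE = {1: [2, 3], 2: [4, 5], 3: [6, 7], 4: [], 5: [], 6: [], 7: []}

def scan(root, depth_first):
    pending, order = deque([root]), []
    while pending:
        node = pending.pop() if depth_first else pending.popleft()
        order.append(node)
        # For depth-first, push children reversed so the left branch is next.
        pending.extend(reversed(TREE[node]) if depth_first else TREE[node])
    return order

print("depth first: ", scan(1, True))    # [1, 2, 4, 5, 3, 6, 7]
print("breadth first:", scan(1, False))  # [1, 2, 3, 4, 5, 6, 7]
```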
The most typical “workflow” for a computation is a direct one (corresponding to what I call here the computational paradigm): start with an “input” and progressively operate on it to generate “output”. But another “workflow” is effectively to define some goal, and then to try to find a path that achieves it. Early examples, in automated theorem proving, were already implemented in the 1950s. By the 1970s the approach was used in practice in logic programming, and was also formalized at a theoretical level in nondeterministic Turing machines and NP problems. At an underlying level, the setup was “very multiway”. But the focus was almost always just on finding particular “winning” paths—and not on looking at the whole multiway structure. Slight exceptions from the 1980s were various studies of “distributions of difficulty” in NP problems, and early considerations of “quantum Turing machines” in which superpositions of possible paths were considered.
But as so often happens in the history of theoretical science, it was only through the development of a new conceptual framework—around our Physics Project—that the whole structure of multiway systems was able to emerge, complete with ideas like causal graphs, branchial space, etc.
Beyond the formal structure of multiway systems, etc., another important aspect of the multicomputational paradigm is **the central role of the observer**. And in a sense it might seem antithetical to “objective” theoretical science to even have to discuss the observer. But the development of relativity and quantum mechanics (as well as statistical mechanics and the concept of entropy) in the early 1900s was predicated on being “more realistic” about observers. And indeed what we now see is that the role of observers in these theories is deeply connected to their fundamentally multicomputational character.
The whole question of how one should think about observers in science has been discussed, arguably for centuries, particularly in the philosophy of physics. But for the most part there hasn’t been much intersection with computational ideas—with exceptions including Heinz von Foerster’s “Second-Order Cybernetics” from the late 1960s, John Wheeler’s “It from Bit” ideas, and to some extent my own investigations about the origins of perceived complexity.
In some respects it might seem surprising that there’s such a long and tangled backstory of “almost sightings” of multiway systems and the multicomputational paradigm. But in a sense this is just a sign of the fact that the multicomputational paradigm really is a new paradigm: it’s something that requires a new conceptual framework, without which it really can’t be grasped. And it’s a tribute to how fundamental the multicomputational paradigm is that there have been so many “shadows” of it over the years.
But now that the paradigm is “out” I’m excited to see what it leads to—and I fully expect this new fourth paradigm for theoretical science to be at least as important and productive as the three we already have. – Stephen Wolfram (September 9, 2021)