Machine Learning

One can think of machine learning as being about deducing models of what can happen in the world from collections of “training examples”. Often one imagines a collection of conceivable inputs (say, possible arrays of pixels corresponding to images), for which one wants to “learn a structure for the space”—in such a way that one can, for example, find a “manifold-style” “coordinatization” in terms of a feature vector.

How can one make a multicomputational model for this process? Imagine, for example, that one has a neural net with a certain architecture. The “state” of the net is defined by the values of a large number of weights. Then one can imagine a multiway graph in which each state can be updated according to a large number of different events. Each possible event might correspond to the incremental update of weights associated with back-propagating the effect of adding in a single new training example. In present-day neural net training one normally follows a single path, applying one particular (perhaps randomly chosen) sequence of weight updates. But in principle there’s a whole multiway graph of possible training sequences (a toy sketch of such a graph appears at the end of this section).

The branchial space associated with this graph in effect defines a space of possible models obtained after a certain amount of training—complete with a measure on models (derived from path weights), distances between models, etc.

But what about a token-event graph? Present-day neural nets—with their usual back-propagation methods—tend to show very little factorizability in the updating of weights. But if one could treat certain collections of weights as “independently updatable”, then one could use these to define tokens—and eventually expect to identify some kind of “localized-effects-in-the-net” space (a second sketch below illustrates this).

But if training is associated with the multiway (or token-event) graph, what is evaluation? One possible answer is that it’s basically associated with the reference frame that we pick for the net. Running the net might generate some collection of output numbers—but then we have to choose some way to organize these numbers to determine whether they mean, say, that an image is of a cat or a dog. And it is this choice that in effect corresponds to our “reference frame” for sampling the net.

What does this mean, for example, about what is learnable? Perhaps this is where the analog of Einstein’s equations comes in—defining the possible large-scale structure of the underlying space, and telling us what reference frames can be set up with computationally bounded effort?
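As a concrete, entirely hypothetical illustration of the multiway-training-graph idea, here is a minimal Python sketch. Each “state” is just the weight of a one-parameter linear model, each “event” is a single-example gradient update, and applying the training examples in different orders generates the branches. The tiny model, the example data and all the names are assumptions made for illustration, not anything defined above.

```python
# A minimal sketch (not anything from the text above) of a multiway graph of
# neural-net training: each state is a weight value, and each event is a
# single-example gradient update.  Different orderings of the training
# examples give different branches.
import numpy as np

# Toy training set for a one-parameter linear model y = w * x.
EXAMPLES = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0)]
LEARNING_RATE = 0.1


def update(w, example):
    """One "event": a single-example gradient step for squared loss."""
    x, y = example
    grad = 2.0 * (w * x - y) * x
    return w - LEARNING_RATE * grad


def multiway_states(w0, n_steps):
    """All weight states reachable after n_steps single-example updates,
    keyed by the sequence of example indices applied (the "path")."""
    states = {(): w0}
    for _ in range(n_steps):
        next_states = {}
        for path, w in states.items():
            for i, example in enumerate(EXAMPLES):
                next_states[path + (i,)] = update(w, example)
        states = next_states
    return states


if __name__ == "__main__":
    slice_states = multiway_states(w0=0.0, n_steps=3)
    ws = np.array(list(slice_states.values()))
    # Crude stand-in for a branchial-space summary: how far apart the models
    # reached after the same number of updates have ended up.
    print(f"{len(ws)} states after 3 updates; "
          f"weights span [{ws.min():.3f}, {ws.max():.3f}], std {ws.std():.3f}")
```

With three examples and three steps the sketch enumerates 3^3 = 27 paths; the spread among the resulting weights is the kind of thing a branchial-space picture, with its path-weight measure and model-to-model distances, would organize.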
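The token-event-graph idea can be sketched in the same hypothetical spirit: suppose the weights were partitioned into blocks that really were independently updatable. Then each block version would be a token, each single-example update an event consuming and producing only the blocks it touches, and the resulting bipartite edge list records the factorized structure. The partition and the map from examples to touched blocks below are purely illustrative assumptions.

```python
# A sketch of a token-event graph for training, under the (hypothetical)
# assumption that the weights split into independently updatable blocks.
from collections import defaultdict

# Which weight blocks each training example touches -- purely illustrative.
TOUCHES = {"ex0": ["block_a"], "ex1": ["block_b"], "ex2": ["block_a", "block_b"]}


def token_event_graph(example_sequence):
    """Return directed edges of a bipartite token-event graph for one
    particular update order.  A token is (block, version); an event is
    (example, step)."""
    version = defaultdict(int)  # current version number of each block
    edges = []
    for step, ex in enumerate(example_sequence):
        event = (ex, step)
        for block in TOUCHES[ex]:
            edges.append(((block, version[block]), event))  # token consumed
            version[block] += 1
            edges.append((event, (block, version[block])))  # token produced
    return edges


if __name__ == "__main__":
    for edge in token_event_graph(["ex0", "ex1", "ex2"]):
        print(edge)
```

In this toy setup the events for ex0 and ex1 touch disjoint blocks, so they could be applied in either order with the same result; that commutativity is exactly the kind of factorizability that present-day back-propagation, as noted above, mostly lacks.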