Mapping Stimuli to Reactions

Assuming our eyes are working properly, we will have our raw perceptions sorted into nice-looking hash buckets. But what should we do with them now?

First, we should consider why the hash buckets exist in the first place. It is worth describing explicitly what gives them a reason to come into being, because the buckets are the distinctions drawn as a result of having some intention.

Some intentions are about using the incoming side of our interfaces, such as “distinguish all the pieces of equipment in my home office”. Others, however, also require using the outgoing side, for example: “look up a friend’s phone number and call”.

Intention induces relationships between the hash buckets corresponding to the things we perceive and those corresponding to the things we know how to do. This map is built so that, after a period of training, we can quickly select the actions that we predict will satisfy our requirements with high probability. Of course, the things to do are themselves sorted into hash buckets as well.
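As a rough illustration, such a map can be pictured as a table keyed by perception bucket that learns from feedback which action tends to work. The following is a minimal sketch under loudly stated assumptions: perceptions are single numbers quantized by bucket width, success is a yes/no signal, and the `ReactionMap` class, its methods, and the Laplace-smoothed success estimate are all hypothetical choices made for illustration, not a prescribed design.

```python
class ReactionMap:
    """Hypothetical map from perception buckets to actions, trained by feedback."""

    def __init__(self, actions, width):
        self.actions = list(actions)
        self.width = width   # bucket width: a coarser width means lower resolution
        self.stats = {}      # (perception bucket, action) -> [successes, attempts]

    def bucket(self, perception):
        """Quantize a raw numeric perception into a bucket index."""
        return int(perception // self.width)

    def estimate(self, key, action):
        """Laplace-smoothed estimate of the probability that `action` succeeds."""
        successes, attempts = self.stats.get((key, action), (0, 0))
        return (successes + 1) / (attempts + 2)

    def choose(self, perception):
        """Select the action predicted most likely to satisfy the intention."""
        key = self.bucket(perception)
        return max(self.actions, key=lambda action: self.estimate(key, action))

    def train(self, perception, action, succeeded):
        """Record the outcome of taking `action` after this perception."""
        record = self.stats.setdefault((self.bucket(perception), action), [0, 0])
        record[0] += int(succeeded)
        record[1] += 1


# Toy training run (assumed intention: "keep the room near 20 degrees").
reactions = ReactionMap(["heat", "cool"], width=5)
for temperature in range(40):
    for action in ("heat", "cool"):
        wanted = "heat" if temperature < 20 else "cool"
        reactions.train(temperature, action, succeeded=(action == wanted))

print(reactions.choose(12), reactions.choose(33))  # heat cool
```

After training, selecting an action is a single table lookup per candidate, which is the "quickly select" part; all the expensive work happened during training.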

Thus, in the same way that the 3x + 1 problem implied a behavior map in the integers, the space of distinctions observed as hash buckets according to our intentions needs to be mapped into the space of hash buckets corresponding to things to do. The problem now is how to choose the map wisely, so that the feedback cycle of repeatedly applying it converges to an attractor whose emergent properties satisfy our intentions.
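For reference, the 3x + 1 map itself fits in a few lines. This sketch iterates it from a starting integer until the orbit reaches 1, which lies on the cycle 1 → 4 → 2 → 1 that acts as the map's attractor. Whether every starting value eventually gets there is the still unproven Collatz conjecture, so the hypothetical `orbit` helper keeps a step limit.

```python
def collatz_step(n):
    """The 3x + 1 map: halve even numbers, send odd n to 3n + 1."""
    return n // 2 if n % 2 == 0 else 3 * n + 1


def orbit(n, max_steps=1000):
    """Iterate the map from n until it reaches 1 (on the cycle 1 -> 4 -> 2 -> 1)."""
    path = [n]
    while n != 1 and len(path) <= max_steps:
        n = collatz_step(n)
        path.append(n)
    return path


print(orbit(6))  # [6, 3, 10, 5, 16, 8, 4, 2, 1]
```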

This can be difficult for two reasons. If we see ourselves, together with the two hash bucket spaces, embedded in a space that has time as one of its dimensions, the first problem is that it is frequently impossible to have full knowledge of our position in this space, because our presence affects it too much. The second problem is that, even when we achieve a reasonable degree of clarity of observation, our hash buckets typically have limited resolution, and even then they can only resolve a subset of the information space we live in. More concretely, there are only so many colors we can see, so many sounds we can hear, and in general only so much resolution with which we can perceive.

Because of this lack of precision, the things we decide to do are, most of the time, approximations made in the hope that the actual results will be close enough to what we predicted. It is truly amazing that we can do so much with so little. At the same time, this means we are bound to run into cases where our approximations are not as good as we need them to be. What are we going to do about that?

Let’s be bold and take for granted that we are always going to have prediction error. Then, predicting too far out into the future will cause the prediction error to accumulate and grow quickly.

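Note how closely this relates to the numerical integration of differential equations, where the small error committed at each step accumulates as the integration proceeds further away from the initial conditions!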
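To make the analogy concrete, here is a small sketch using forward Euler, chosen here only as the simplest integrator, on y' = y with y(0) = 1, whose exact solution is exp(t). The step size h = 0.1 and the horizons are arbitrary assumptions; the point is that the gap between the numerical and the exact solution widens as we integrate further out.

```python
import math


def euler(f, y0, h, steps):
    """Forward Euler: advance y' = f(t, y) by `steps` steps of size h."""
    t, y = 0.0, y0
    for _ in range(steps):
        y += h * f(t, y)   # each step commits a small local error
        t += h
    return y


def growth(_t, y):
    """Right-hand side of y' = y, whose exact solution is exp(t)."""
    return y


h = 0.1
errors = []
for steps in (10, 20, 40):
    approx = euler(growth, 1.0, h, steps)
    exact = math.exp(h * steps)
    errors.append(abs(exact - approx))
    print(f"t = {h * steps:.1f}  error = {errors[-1]:.4f}")
```

Doubling the horizon more than doubles the error here, because each step's error is fed into every later step.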

The better our hash buckets model the actual process at hand, the smaller the prediction error will be. However, no matter how good our understanding of something is, there is always the risk that something we are not aware of is at play.

Hence the huge market for risk management of all sorts.

It follows that, more than being able to make extremely good predictions about the future, what matters is the ability to make corrections and adjust our predictions on the fly as soon as prediction error is detected, together with moving in the information space in a way that always leaves room to make such corrections. Thus, given some general direction in which we know we want to move in our information space, we should spend time making sure that, no matter which path we take, the prediction error is not leading us astray and the required precision of movement is not beyond our reach. In short, we need to ensure that our lack of precision does not move us so far away from the attractor we want to reach that we end up in the influence zone (or basin) of an unwanted attractor instead.
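A minimal sketch of the difference between predicting open loop and correcting on the fly, under the assumption of a process that drifts at a constant rate while our model underestimates that rate by 10 percent. The `worst_error` function and its parameters are hypothetical. Without correction the gap grows with the horizon; re-anchoring on each observation keeps it bounded by a single step's error.

```python
def worst_error(true_rate, model_rate, steps, correct):
    """Largest gap between reality and our prediction over the run.

    If `correct` is True we observe the actual state after every step and
    re-anchor the prediction on it; otherwise we keep predicting open loop.
    """
    actual = prediction = 0.0
    worst = 0.0
    for _ in range(steps):
        actual += true_rate        # what really happens
        prediction += model_rate   # what our slightly wrong model expects
        worst = max(worst, abs(actual - prediction))
        if correct:
            prediction = actual    # error detected: adjust on the fly
    return worst


# The model underestimates the drift by 10 percent.
print(worst_error(1.0, 0.9, steps=100, correct=False))  # grows with the horizon (about 10)
print(worst_error(1.0, 0.9, steps=100, correct=True))   # stays near one step's error (about 0.1)
```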

If we allow our environment to be represented, up to some degree of accuracy, by a system of linear equations, then its long-term behavior, and in particular its attractors, will be governed by the eigenvalues and eigenvectors of the approximating matrix. Thus, to let it converge, all we need to do is make sure our prediction error stays small compared to the pull exerted by the eigenvalues. If all we are looking for is our particular attractor, it does not even matter if the prediction error is comparatively large: as long as it is bounded and the dominant eigenvalue is strictly larger in magnitude than the others, the state is still drawn quickly towards the corresponding eigenvector.
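A small numerical sketch of this claim, under stated assumptions: the environment is a hypothetical 2x2 matrix with eigenvalues 2 (eigenvector [1, 1]) and 1 (eigenvector [1, -1]), and prediction error is modeled as a uniform perturbation bounded by 0.1 added after every step. Despite the perturbation, the state ends up aligned with the dominant eigenvector, much as in power iteration.

```python
import math
import random


def step(a, x):
    """Apply the 2x2 matrix a to the vector x."""
    return [a[0][0] * x[0] + a[0][1] * x[1],
            a[1][0] * x[0] + a[1][1] * x[1]]


def iterate_with_error(a, x0, steps, error_bound, seed=0):
    """Iterate x -> a x + e, where e is a bounded stand-in for prediction error."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        x = step(a, x)
        x = [xi + rng.uniform(-error_bound, error_bound) for xi in x]
    return x


def alignment(x, v):
    """|cos| of the angle between x and v (1.0 means the same direction)."""
    dot = x[0] * v[0] + x[1] * v[1]
    return abs(dot) / (math.hypot(*x) * math.hypot(*v))


A = [[1.5, 0.5], [0.5, 1.5]]   # eigenvalues 2 and 1
x = iterate_with_error(A, [1.0, 0.0], steps=40, error_bound=0.1)
print(alignment(x, [1.0, 1.0]))   # very close to 1.0
```

The sketch only covers the favorable case described in the text; with equal-magnitude eigenvalues, or a perturbation that is not bounded, no such alignment is guaranteed.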

Moreover, note how this lets us draw a distinction between programming languages that are early bound and those that are late bound. Early binding assumes that the developer’s predictions about the future are of an extraordinary, almost clairvoyant quality, and this assumption becomes all the more demanding as soon as one accepts that things change all the time. As such, when it turns out that the attractor reached is too far away from the desired one, the cost of the necessary correction rises so much that success typically lies beyond reach.
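The difference can be hinted at even within a single language. Python is itself late bound, so this sketch only mimics the distinction at the level of name lookup, and `price`, `early`, and `late` are hypothetical names. A reference captured ahead of time keeps the original decision, while a name looked up at call time picks up a later correction.

```python
def price(item):
    return 10            # initial prediction, decided up front


early = price            # early bound: the function object is captured now


def late(item):
    return price(item)   # late bound: the name is looked up at each call


def price(item):         # the world changed, so we correct the prediction
    return 12


print(early("book"), late("book"))  # 10 12
```

With early binding, every place that captured the old decision must be found and fixed; with late binding, the correction propagates by itself.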

Progress is defined in terms of the contraction of the possible solution space.

Let’s review a concrete example that illustrates how these mechanisms work. Broadly speaking, there are two ways to write programs that play chess. On one hand, there is the brute force approach, which examines every possible move up to some practical limit. On the other hand, there are algorithms that try to determine whether certain play combinations are worth examining further. In some cases, they may also decide to stop analyzing and wait until the next turn, when new information is known. Today’s chess programs are somewhere between these two extremes.
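As one concrete contrast, the sketch below compares exhaustive minimax with alpha-beta pruning on a hypothetical two-ply game tree whose leaves are static evaluation scores. Alpha-beta is named here as a swapped-in illustration: it is an exact pruning that never changes the result, whereas the selective programs described above may also discard lines heuristically. The node counts show how much work is saved by not examining lines that cannot affect the decision.

```python
def minimax(node, maximizing, counter):
    """Brute force: examine every move down to the leaves."""
    counter["nodes"] += 1
    if isinstance(node, (int, float)):   # leaf: static evaluation score
        return node
    scores = [minimax(child, not maximizing, counter) for child in node]
    return max(scores) if maximizing else min(scores)


def alphabeta(node, maximizing, counter, alpha=float("-inf"), beta=float("inf")):
    """Same result as minimax, but skips lines that cannot affect the choice."""
    counter["nodes"] += 1
    if isinstance(node, (int, float)):
        return node
    best = float("-inf") if maximizing else float("inf")
    for child in node:
        score = alphabeta(child, not maximizing, counter, alpha, beta)
        if maximizing:
            best = max(best, score)
            alpha = max(alpha, best)
        else:
            best = min(best, score)
            beta = min(beta, best)
        if alpha >= beta:
            break   # the opponent would never allow this line: prune it
    return best


# Hypothetical two-ply tree: our three moves, each with three replies.
tree = [[3, 5, 6], [2, 9, 1], [0, 8, 4]]
full, pruned = {"nodes": 0}, {"nodes": 0}
print(minimax(tree, True, full), full["nodes"])        # 3 13
print(alphabeta(tree, True, pruned), pruned["nodes"])  # 3 9
```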

[…]

Going from chess back to our more abstract problem, how can we summarize what we learned from the discussion above? To put it succinctly, the design of the map between hash buckets corresponding to perceived distinctions and hash buckets corresponding to actions to take depends heavily on the estimated quality of our predictions.

If we think we can predict well over long periods of time, then naturally we should increase the resolution of the hash buckets to take advantage of this. However, we should keep in mind that high-quality predictions tend to take significant time to calculate. In some cases, the time lag between computing a prediction and acting on it can introduce so much prediction error that some or all of the effort spent deciding what to do becomes useless or even counterproductive.

If, on the other hand, we can only predict reasonably well over short periods of time, we might as well lower the resolution of the hash buckets we are using. What is the point of spending a lot of energy coming up with a prediction more precise than the known minimum prediction error allows? In these cases, it is more important to correct any deviations as quickly as possible than to determine the best possible course of action. In other words, when predictions are known to be of limited quality, we should use a good enough prediction, calculated efficiently, together with a consistent error correction mechanism.
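One way to see why resolution finer than the prediction error is wasted: assume, hypothetically, outcomes spread over [0, 100) and predictions that miss by a uniform error of at most 2 units, then measure how often the predicted bucket matches the actual one for several bucket widths. The `hit_rate` function is an illustrative helper. In expectation the hit rate is 1/8 for width 0.5, 1/2 for width 2, 7/8 for width 8, and 31/32 for width 32, so buckets narrower than the error are mostly missed no matter how carefully the prediction was computed.

```python
import random


def hit_rate(width, error_bound, samples=10_000, seed=1):
    """Fraction of predictions that land in the same bucket as the outcome."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        actual = rng.uniform(0.0, 100.0)
        predicted = actual + rng.uniform(-error_bound, error_bound)  # bounded error
        hits += (actual // width) == (predicted // width)
    return hits / samples


# Assumed prediction error of +/- 2 units; bucket widths from fine to coarse.
for width in (0.5, 2.0, 8.0, 32.0):
    print(width, round(hit_rate(width, error_bound=2.0), 3))
```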

Another case in which it can be beneficial to lower the hash bucket resolution is when we can show that coarse, comparatively cheap short-term predictions produce results just as good as more expensive, longer-term considerations. As a valuable side effect, this may also lower the time lag of the feedback cycle without compromising the overall convergence of the behavior to its attractor, making it easier to react quickly to unexpected circumstances.
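A toy model of this trade-off, built entirely on assumptions: a target moves at constant speed, a controller applies a simple proportional correction with gain 0.5, and better precision is assumed to cost extra steps of delay. The hypothetical `tracking_error` function compares a coarse but fresh observation (rounded down to whole units, one step old) with an exact but slow one (five steps old). Under these assumptions the steady-state error of the exact controller works out to speed/gain + speed × (delay − 1) = 4.2, while the coarse one stays around 2, because its extra rounding error is smaller than the error caused by the longer lag.

```python
import math


def tracking_error(delay, bucket_width, gain=0.5, speed=0.7, steps=200):
    """Average tracking error over the last 50 steps for a moving target.

    The controller acts on the target position as it was `delay` steps ago,
    optionally coarsened into buckets of `bucket_width` (0 means exact).
    """
    seen = []          # target positions observed so far
    position = 0.0
    errors = []
    for t in range(steps):
        target = speed * t
        seen.append(target)
        observed = seen[max(0, t - delay)]   # stale by `delay` steps
        if bucket_width:
            observed = bucket_width * math.floor(observed / bucket_width)
        position += gain * (observed - position)   # proportional correction
        errors.append(abs(target - position))
    tail = errors[-50:]
    return sum(tail) / len(tail)


cheap = tracking_error(delay=1, bucket_width=1.0)      # coarse but fresh
expensive = tracking_error(delay=5, bucket_width=0.0)  # exact but slow
print(round(cheap, 2), round(expensive, 2))
```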
