The Space to Traverse

Since we will be considering the traversal of a space, the first thing we should consider is the shape of the space in question. The Smalltalk image can be seen in many ways, but as we are interested in references, we could take a look at it from a garbage collection point of view.

With these intentions, the image “starts” at some globally visible object that is considered to be not garbage by convention. Typically, this is the `Smalltalk` dictionary. Anything that is directly or indirectly referenced by this non garbage context is said to be not garbage either.

And what is a reference? If we look at objects from a Smalltalk point of view, references occur via regular instance names. For example, associations reference the values of their `key` and `value` instance names. In addition, references also occur by means of index based storage. For instance, arrays reference whatever is stored in any of their integer named slots.

These are the only ways in which an object can reference another object. This is because things like class names and pool dictionaries are implemented in terms of instance names or integer named slots.

So it seems we could think of the Smalltalk image as a tree starting at `Smalltalk`. But since the leaves could easily reference their parents or even `Smalltalk` itself, it does not quite qualify as a tree. In reality, it is a graph where nodes are objects and references are represented by arrows connecting the nodes.

Interestingly, the node for Smalltalk is not special by itself. We just start traversing the graph from this node only because of our convention to never consider it garbage.

To be more precise, we are interest in finding out all the objects that are connected to `Smalltalk`. The fact that we are going to say they are not garbage does not matter as far as the connectedness check itself is concerned.

Also, note how the arrows only have one direction: no object has immediate knowledge of where it is referenced from. This is why tools like a reference finder become necessary so that we can derive this knowledge from the graph.

The truly interesting references to an object are the ones that occur closest to `Smalltalk`. Since these are the most natural reference paths, we should build a breadth first reference finder — one that works as some sort of onion peeler in reverse.

Unfortunately, the nature of the space and the implementation techniques available make it too easy to implement depth first reference finders instead. For example, we could let an object iterate over instance names and indexable slots. However, this forces the static context to remember which ones it chose to dive through, and things typically get quite complicated soon thereafter. As if managing that iteration was not difficult enough already!

We could also try to take advantage of an existing primitive that answers where a given object is directly referenced from. But, alas, the answer of this primitive leaves us with figuring out a breadth first path from `Smalltalk` to the object in question, while the information we are provided happens to be oriented in a direction opposite to the one we want. Making sense out of the result is typically left as an exercise for the developer. Sorting through the mess of false positives, even when done mechanically, takes a lot of time and effort.

To summarize, while these techniques make it very easy to implement a depth first reference finder, the problem is that depth first traversal has the irritating behavior of producing extremely long and convoluted reference paths. What we need is a breadth first reference finder, and these tools give us no easy way to implement one. It looks like this is going nowhere.

But there is another way to take a look at objects and their mechanisms for referencing other objects. What if we saw the reference graph in terms of forms and distinctions instead?

~