One More Level Of Indirection

It is said that there is no problem in computer science which cannot be solved by one more level of indirection (this sounds like a quote, has anybody references?).

The quote is credited to David Wheeler: en.wikipedia.org (computer_scientist) [[please fix the URL -- how does one encode parens?)

But it also can add complexity to the system, and therefore is not a free lunch.

Definition of 'indirection' for the purpose of this page:

Proposed: Indirection is the ability to reference something using a name, reference, or container instead of the value itself. (from Wiki Pedia)

Proposed: Indirection refers to dispatch of communications via an intermediary. The intermediary is the cause of indirection, and is (strictly) an entity capable of examining a message and making a decision regarding it (e.g. to forward it or answer it, or on where to forward it, etc.). Because there may be more than one such intermediary, each intermediary along a communications path constitutes one level of indirection. Levels of indirection may vary on a per interaction basis.

Note that names of constant things (as often seen in functional programming) do not constitute 'indirection' under this definition because the 'name' is unable to make any decisions regarding which value to return, meaning one would have a perfectly equivalent system by simply rolling the value associated with the constant name directly into the source that originally referenced it. This is important: if a system is logically equivalent to one without indirection, then one shouldn't be able to say there is an indirection. That said, variable names do constitute indirection - they 'decide' to return whatever it is they've most recently been told to return, and such decision is at least binary (boolean) as it can vary over time.

OTOH, one might distinguish usefully between indirections that exist only prior to Compile Time and indirections that continue to exist in the runtime. Compilation, and particularly Partial Evaluation and specialization, are strategies that automatically reduce runtime indirection (sometimes at expense of code-size, often at significant gain in performance). Compile Time Resolution would also fit in there somewhere, introducing a Compile Time-only indirection.

And indeed if you think about it all problems can be made to fit this pattern:

Customer needs more than one address: Add relation between customer and address.

Site should display in multiple languages: Add resolution of keys to translated strings by language.

Content should be displayed in multiple formats: Add selection of format and delegate to that.

File gets too large: Split into parts and simulate by indirection.

Need to boot more then an OS: add bootloader which selects OS.

Need redundant hard discs: Use multiple discs and let RAID controller simulate a redundant one.

...

I think you need to take some care to ensure that these actually qualify as 'indirection'. If you interpret the meaning too loosely, of course every pattern will fit. But it will also be weakened to the point its value is questionable (a bit like saying that all problems in computer science are solved by reconfiguring hardware and software). 'Indirection' as used in the typical context of the phrase above refers primarily to runtime dispatch of communications via an intermediary. As such, while most of your above examples work, the 'relation between customer and address' does not. Further, most of your examples below do not (e.g. the modular componentization, use of moderator, reducing strength, parallelization), though caching is an indirection.

I agree. Seems I have fallen prey to the Tautological Definition Fallacy myself. I added a definition of indirection at the top. By the way a moderator (in the sense of one being asked for futher information) is an indirection.

For a 'moderator' to qualify for indirection, it would mean you ask everything of the moderator and the moderator then figures out whom to ask. This is a sense of 'moderator' I have heard before. Not, of course, that this would actually help for problems of irreducible complexity (if you can't split a problem into components, you trivially can't split it across components in different people's skulls). Having a moderator only simplifies the problem of figuring out of whom to ask a question.

I mentioned the moderator as a hint at how people might split knowhow among each other. The organizational processes are more complex than that of course, but you get the idea: Essential complexity can be handled not by reducing it to have one person understand it but by having people understand it collectively.

But what about 'non-functional' aspects?

System is too slow:

Add caching (which is an indirection of access) OR

Parallelize by distributing access (indirection over loadbalancer)

Reduce strength (e.g. move code out of the loop, this is also a kind of caching; if handled by the compiler this amounts to another compilation phase - which is an indirection)

System is too complex to understand

split into smaller components which are understandable. The indirection is the call from one component to the one below.

If the complexity is irreducible (Essential Complexity): Use more people to understand the problem and have a moderator!

...

I think this could even be generalized to science at large.

What do you think?


In trying to find who first made this quote, I found an interesting counter-quote, equally unattributed:

Any performance problem can be solved by removing a layer of indirection.

I found it here: 64.233.167.104

That's probably for the same reason of Ridiculous Simplicity Demands Ridiculous Resources vs. Ridiculous Simplicity Gives Ridiculous Resources. It depends on the kind of indirection. An indirection that has an excessive span (too many callers/callees) causes bottlenecks while too little span (only one caller/calle) is simply overhead. The trick seems to be to have many average indirections (like in the memory hierarchy).

Or, according to Java and Ruby advocates.. any performance problem can be solved by buying more CPU's. Har Har Har

When you consider the cost of those additional CPUs versus the cost of programmer's paychecks for the amount of time optimizing code, they're right.

{Maybe. I think that would depend on how readily the problem at hand is parallelized, and how well Ruby and Java in particular are at taking advantage of the parallelization inherent to the problem. As languages go, they certainly are poorly designed for it... but I can see how "Java and Ruby advocates" wouldn't solve that by switching to a better language for parallelization. And extra CPUs will never help for problems of indirection across large network connections or deep indirections against a slow hard disk.}

I think these issues are quite self-evident, even to those who code in Ruby/Python/etc. Where I've worked, most software was written in Java, with glue logic provided by Perl, Python, etc. The "throw more CPUs at it" is the one truism that applies across these companies. Strangely enough, although none of these languages are conducive to parallelism, and the problems they are solving aren't even conducive to parallelism, somehow people still find ways of parallelizing. Even in a 100% sequential problem, you still have several, distinct stages of processing. Were this not possible, it would be utterly impossible to program a computer to solve it. Recognizing this, you can set up multiple CPUs, each concentrating on a particular stage, and pipelining between them. Yeah, latency in solving a problem is still high (in fact, it'll be higher!). However, when you have a lot of stuff to process, you end up saving time overall. --Samuel Falvo

{Straightforward parallelizing helps if you need to solve the same problem repeatedly. Pipelining helps if you need to solve the same problem repeatedly but either the problem itself or the behavior needs to vary based on previous problems solved. I believe both are 'conducive' to parallelization. Neither of them improve performance when the task is just to perform one, very large problem that requires all the data. Sometimes the only solution to parallelizing is to change the problem.}

It seems that parallelism is accomplished much more easily when we can dissect the problem. If the problem is hard to dissect, it is harder to parallelize. I had a visualization that I was opening up my web browser (with 120 tabs open) and I used 120 different web servers to do something useful for me. After all, the internet is parallel capable in some respects (but not setup to make use of parallelism easy for the typical person, since a browser just gets dumb web pages from one server at a time.. but think about the 120 simultaneous tabs open visualization for a minute). The real issue after I had this visualization of 120 tabs open, was thinking of what it is that these 120 web servers in each tab that I am remotely controlling will do for me that is useful (laughing out loud.. well, if each server returned a resultset and then a I had a collector program that grabbed all the results and stored them in a distributed database, I'd be back to the ideas noted in Eventual Side Effects again.).


See original on c2.com