Smalltalk Blocks Are Thunks In Disguise

One of the (maybe not so) dirty secrets of Smalltalk Language is that its block construct is a cleverly-disguised, OO friendly version of a thunk. What Isa Thunk? In general terms, it's a mini-procedure (or a pointer to same, generally with no arguments) which is passed to a function; when the function wants to retrieve the argument in question it evaluates the thunk and uses the returned value as the argument. Thunks were introduced in Algol Sixty as the implementation technique for Call By Name--a nowadays-controversial (and often Considered Harmful) parameter passing mode that Algol-60 included in order to allow functions to simulate macros. As macros (especially the simplistic, text-substitution macros of that era, and of certain languages invented by the phone company) by their nature have Normal Order Evaluation (meaning that arguments are not evaluated until used; and re-evaluated multiple times if used multiple times), thunks were the only reasonable implementation mechanism.

Algol Sixty, whose Call By Name "feature" was notoriously difficult and error-prone to use; perhaps have given thunks a bad name. Call By Thunk, after all, is the most general parameter passing mode there is--it can do anything that Call By Value and Call By Reference can do; and it can do things--like Lazy Evaluation and Normal Order Evaluation--that the other two modes cannot do (at least not without explicitly passing in functions to perform the deferred evaluation--in other words, Call By Thunk). Call By Thunk is much slower of course; you have the overhead of two jumps for each argument handled in this manner (one into the thunk, one back out)--but it's the most general. That said, most languages are designed to avoid thunks; and they are used as a last resort.

Which brings us to Smalltalk. Smalltalk is unusual among programming languages in that its control structures are implemented as messages (in other words, functions) rather than as compiler built-ins. Consider the two "minimal" control structures need for Structured Programming--selection (if/then/elif/else) and looping (for, while, etc.) The former requires Lazy Evaluation to be semantically correct--an if statement that evaluated both the true and false cases (even if returning only the "correct" value) would be virtually useless. The latter requires Normal Order Evaluation, as the body of the loop may be executed n times, where n is a non-negative integer.

Smalltalk Language, like most modern languages (excluding explicitly non-strict Functional Programming Languages like Haskell Language) features Strict Evaluation of function arguments--each argument is evaluated exactly once, before the body of the function. The ifTrue: ifFalse: method strictly evalutes both arguments (the true and false part) before the body is executed. Likewise for the timesRepeat: method on an integer; its argument is executed once before the body.

Which violates the required semantics for selection and looping constructs. So what to do?

Enter the Smalltalk block. By enclosing a bunch of expressions in [], you create an object which responds to the "value" message; it is a pointer to this block object that gets passed and evaluated. Evaluating the pointer does nothing. The body of the true ifTrue: ifFalse: then sends the value massage to the true side; false ifTrue: ifFalse: does the opposite. timesRepeat: repeatedly sends value to its argument.

By this arrangement--of sending a message (calling a function) to cause the data/expression referred to by the block object to be generated--Smalltalk is essentially simulating Call By Thunk. Unlike manual thunk-writing in most procedural/OO languages, which is a pain (it's easier in Functional Programming Languages, where a thunk can be easily made with a lambda), blocks in Smalltalk make it convenient and simple.

(um.. Blocks are lambda's)

[Blocks are more than just lambdas; they are full-fledged objects. Of course, where you draw that particular line is an interesting question; lambdas with variable capture are largely equivalent to objects; object systems formalize a bunch of additional stuff. Most OO languages use blocks, or other sorts of Functor Objects, to model lambdas; some even subsume all functions under the "object" label. An Abstraction Inversion, to be sure; but one which doesn't bother me at all...]

Exactly, functional programmers like to say their functions are first class objects, which makes perfect sense to me. Lambda's are basically full fledged objects too, they have private variables. They are just anonymous, classless objects. You don't need a class to have an object. Objects are just a list of lambda's, either way you look at it, it makes me happy.

Smalltalk Blocks have two other features necessary for them to be used in this manner. First, they have Lexical Scoping, and can do Variable Capture on their surrounding scope(s); functions do not normally nest in Smalltalk. Second, a return from a block returns from the enclosing function, rather than returning from the value: message itself (transferring control back up to the function that sent the value message to the block)--this is necessary to simulate early exit in loops, or if (x) { return; } constructs.

In short, Smalltalk blocks are a clever and OO-friendly way of implementing thunks. Which isn't a bad thing, of course... but if such thunkery weren't in place and so convenient; Smalltalk would need to do like most other languages do--and implement its low-level constructs as special forms. While that would most likely be faster, it wouldn't have the flexibility that Smalltalkers love about their favorite language.

See original on c2.com