Smalltalk Syntax

Smalltalk syntax is enjoyable for some similar reasons to Python Syntax.

Smalltalk is completely message based. The only way to interact with an object is to send messages to it. So a good place to start is Smalltalk Message Rules.

Braces: Braces are needed, but for different reasons than in a traditional Algol-like language.

Smallest amount of typing: In some ways Smalltalk is very compact, but the culture encourages long method and variable names for readability. The argument is that you're going to type the code once and read it 10 times:
which should be made more efficient ?

Simplicity of syntax: There are actually very few rules that make up the language. There's no special syntax for data structures since everything is an object.

Blocks: Blocks are a powerful and clear way of specifying closures (and useful for lazy evaluation). Sort of changes the idioms for using collections compared to Java, for example:

''Java version'' Iterator it = myCollection.iterator(); while (it.hasNext()) { Myobject anObject = (Myobject) it.next(); anObject.doSomething(); }

''Smalltalk version'' myCollection do: [:anObject | anObject doSomething].

Readability: Once one becomes familiar with the syntax (and rules of precedence), it is amazingly clear to read. Algol languages seem to be cluttered with dots and brackets.

Multi-parameter messages: Inherently self-documenting since every parameter is labeled in the method name, for example:

''Java version'' myDictionary.put(key, value);

''Smalltalk version'' myDictionary at: key put: value.

''...or'' myString findString: 'foo' startingAt: 2

No. This is of course nice, but for decades, people have called everything you could imagine "self-documenting", up to and including C and assembler. No. Something is "self-documenting" only if it somehow includes multiple paragraphs of freeform English text. Everything is "self-documenting" if the programmer is sufficiently disciplined, and nothing is if they are the opposite. Example:

''Java version'' jkfdjkfjdk.xxxxrrrrr(yytytytde, qererere);

''Smalltalk version'' jkfdjkfjdk xxxxrrrrr: 'yytytytde' qererere: 2;

Standard methods in standard classes can't go wrong to that extent, of course, but the standard methods/classes are not the real problem in languages, because they're well known and guaranteed to have thorough documentation, etc.


I am currently trying to write (with Lex/Yacc) a parser for Smalltalk.

(Just for fun, because I like writing small interpreters and Smalltalk syntax seemed easy enough.) Fair enough, the resulting grammar is probably quite a lot simpler than a C++ grammar, but nevertheless I discovered the following syntactic warts that make Smalltalk somewhat more challenging to parse:

Cascades are problematic with LALR(1) recognition. Take a construct like 1+2+3;+4, where the final +4 message should be sent to the result of (1+2). The problem is that the parser has to handle the last binary message just prior to the semicolon different than the previous ones, since the receiver of this last message is also the receiver of all the cascaded messages. Budds Little Smalltalk avoids this problem by changing the rules:
in Little Smalltalk the whole expression in front of the semicolon (1+2+3) becomes the receiver of +4. Fix: parse a slightly too lax grammar (expr cascade), and then do a semantic check.

My Smalltalk transformed 1+2+3;+4 into (1+2)+3;+4 before parsing it. I think that makes the parsing problem easier -- Tom Stambaugh

More to the point would be to parse the obvious way, then simply walk the parse tree, using ; nodes to flag that the cascade needs to be retargeted from child to grandchild.

foo should be lexed as identifier foo, foo: should be lexed as keyword #foo:, but foo:= should be lexed as two identifiers:
identifier foo and the assignment operator. This doesn't work with the longest match rule of Lex. Fix: add a new syntactic category: identifier directly followed by :=.

It's not impossible in lex. You can use forward context to rule out foo: if the lookahead is =. Or more optimally, treat : as an operator in lex, and combine with names in the yacc rule.

This problem has to do with ANSI's "Smalltalk Interchange Format":
in that format, whether a bang(!)-delimited piece of text should be interpreted as a statement list or a method definition cannot be determined by the parser: for this, semantic information is needed. (Namely: did we just read a "Class Name methodsFor: 'protocol'".)

Budds Little Smalltalk solved these problems by bending the rules of the language a little. Would this be acceptable to you?

Uh, this would not be acceptable to me. Mostly because I view it as unnecessary. The "Smalltalk Interchange Format" (which I grew up calling "Chunk Format") is designed to be read by an object in a live Smalltalk environment, not a compiler -- the receiver unpacks the bang-delimited string and passes it to the compiler when needed. -- Tom Stambaugh


Note, that Smalltalk has no special if, case and looping constructs, but the base-library contains methods implementing "if" and looping and they are usually optimized aggresively by the compiler. There is no Smalltalk Case Statement and there is usually no need for it.


A nice demo of extreme simplicity in Smalltalk Syntax:


See original on c2.com