Homoiconic Languages

Languages in which program code is represented as the language's fundamental data type are called 'homoiconic'. Such languages allow code and data to be Deeply Intertwingled, so that new code can be generated and manipulated by the program itself at runtime.

Examples (in alphabetical order):

Jay Language -- halfway; operators written in "tacit" style are structured, manipulable data, but those written in "explicit" style are just text.

Tool Command Language - however, this is done at a higher level, simply representing everything as text.


Lisp specifies that code and data are both S-expressions, code S-expressions can be passed/returned/assigned to variables and have contents mutated, so code is First Class.

The canonical example language is Lisp; both code and data are primarily represented in the same form: as trees (S-expressions) that are trivially and explicitly available for both unrestricted manipulation as data and also for unrestricted execution as code.

Prolog is another example: A Prolog program is itself a sequence of Prolog terms that can be read, manipulated and evaluated using built-in mechanisms.

Homoiconicity is a fundamental language property, not an algorithm nor an implementation issue. Lisp implemented in Pascal is still homoiconic. C is not homoiconic, whether the compiler is implemented in C or in Pascal, and C cannot be made homoiconic without adding fundamentally new and different constructs to the language definition.

Look at a Homoiconic Example In Many Programming Languages for concrete examples of the homoiconic property.

heteroiconic: non-homoiconic; it is nowhere near as commonly used as the term "homoiconic", though.

weakly homoiconic (term invented for this page): a language (not a program) that has homoiconic features sort of tacked onto the side, but the core language itself is not homoiconic (like Tick Cee below). Such languages can be called "homoiconic", but they are not the best archetypal examples of the category.

Note that Homoiconic Languages are strongly related to languages with a Meta Circular Evaluator, because it is always easy to make a Meta Circular Evaluator for a homoiconic language, but they are really two different topics. The Lisp example above is not metacircular, but it is homoiconic. You can write (with difficulty) an interpreter for C in C, but it will not make C homoiconic.

Eliminate this category, mention that meta-circularity is a prerequisite for homoiconicity, but that it doesn't imply homoiconicity. Also, the "strong" vs. "pure" seems to be "real-world implementation" vs. "mathematical ideal". We should mention that (afaik, I could be wrong here) no implemented language is a 'pure homoiconic' language.

strongly homoiconic (term invented for this page): a language (not a program) that supports homoiconicity for its core language features. E.g. Lisp, Tcl, Snobol.

pure homoiconic (term invented for this page): a language where homoiconicity applies to all language constructs. This certainly applies to core Lisp 1.0 as defined by John Mc Carthy in his landmark 1960 paper, which had only S-Expressions, but may not apply to most later dialects that supported data types other than Ess Expressions, because e.g. an array can't be directly evaluated as code (although this is possible indirectly).

Meta Circular Evaluator (more commonly, "Meta Circular Interpreter"): it is possible to trivially implement a homoiconic language in itself. "Trivially" means that the semantics need not be specified explicitly; instead, they are implemented directly by the language construct being implemented. For instance, Lisp eval might be implemented by calling eval. If you don't already know what eval does, then reading the source code for the Meta Circular Interpreter might not enlighten you, it might just say "eval means eval". Thus "metacircular".

It is famous that the core of the Lisp language can be written in about 20 lines of Lisp. This is possible because the implementation is a Meta Circular Interpreter. The average person who writes a C compiler or interpreter requires about 20,000 lines of C to do so, and must be (or become) moderately expert about compilers or interpreters. Implementing Lisp in Lisp as a Meta Circular Interpreter teaches one extremely little about compilers/interpreters for other languages.

Understanding that that is true, and understanding why it is true, may well be a prerequisite for understanding the rest of the page.


Yes, but ...

One of the infamous features of the standard Unix C compiler (I don't know if it is true of gcc) was that the definitions for the characters like \n were not explicit. When you got to the part of the compiler that defined them, they were defined in terms of themselves. Somewhere in time there had to have been a compiler that gave the numeric values of these characters, but it had been rewritten to be just a tiny bit metacircular.

Certainly that's true of gcc. Things like that are always true of any language written in themselves; they need to be Boot Strapped somehow. That's not the same thing as homoiconic, but yes, it is at least "a tiny bit metacircular".

See Ken Thompson's Turing Award lecture "Reflections On Trusting Trust" for some non-trivial discussion.


Example

To make this more concrete, let's pretend that C has new homoiconic features, where an abstract syntax tree is a datatype called Tree, backslash can convert any language construct into data, and the function eval() takes data representing an abstract syntax tree and runs it:

Tree forloop1 = \for (i=0; i<10; ++i); Tree forloop2 = \for (i=0; i<1000; ++i); Tree dosomething1 = \printf("hello"); Tree dosomething2 = \printf("goodbye"); Tree tmp1, tmp2;

tmp1 = concatenate(forloop1, dosomething1); eval(tmp1); tmp2 = concatenate(forloop2, dosomething2); eval(tmp2);

This would print "hello" 10 times, and then print "goodbye" 1000 times. Furthermore,

prinf("%d", forloop1->right->right->data);

...conceivably would select the integer 10 from within forloop1, and print it.

This modification to C is just off the top of my head, not something well-thought out; adding this sort of thing to C is very strange. However it is a natural part of some languages (Lisp, Tcl, Snobol...)

Without these added features, C is not at all homoiconic (nor is C++, Java, Ada, Fortran).


Here's a little deeper discussion of the topic, from a Rochester course on Scheme Language:

[a homoiconic language] can be implemented with a '''MetaCircularInterpreter'''.

Most functional languages are NOT homoiconic.

We can use meta-circularity to define the semantics of Scheme, formally. Suppose M is a denotational function mapping Scheme expressions to their meaning, where the meaning is a mathematical object.

Also suppose I is the Scheme interpreter (itself a Scheme expression). For all Scheme expressions E, M(E) = (M(I)) (E), or put another way, M(I) = M.

Now let H(F) = F(I) where F is any function that takes a Scheme expression as its argument. We have H(M) = M(I) = M, so M is a *fixed point* of H. We can use H and the tools of denotational semantics to obtain a rigorous definition of M.


[Since a lot of argument has occurred on this page from people who weren't familiar with the notion of "homoiconic" previously, it would be nice if we had some more comments from folks who were previously familiar with the concept. There's been all too little of that, with the result that some of the arguments risk being a case of "the blind leading the blind", and then still future generations of readers see that, and make their own interpretations based on whatever random things were said here...after 10 generations of such things, hmm...remember the party/kids game of "telephone"? Distortion/entropy becomes inevitable, in the absence of error correcting codes. Proof: see Shannon, 1948. Seriously.]

Prerequisites:

code must be a first-class data type

first-class blocks: a representation of a collection of code

ability to create, inspect, mutate, and execute blocks at runtime; the creation/inspection/mutation constructs should be the same as those used to manipulate regular (non-programmatic) data in the language, for it to be strongly homoiconic.

See Homoiconicity Classification, in an attempt to shorten this page a bit.


I do not think the ability to *mutate* code is a prerequisite of homoiconicity. A purely functional language with the ability to *construct* code and evaluate it is homoiconic.


Here's a Rule Of Thumb regarding homoiconicity, as I understand it: if a value in a first-class variable type can be passed to the primary language evaluator directly and interpreted without any further modification (that is to say, the internal representation of the value is the same as the internal representation of code), then the language is homoiconic. (Comments and corrections welcome.)

But 'primary language evaluator' is an implementation concept, not a language concept. Sure, some language specs require an 'eval' function - but they never say anything about the 'internal representation' of either value or code. For example, eval in Corman Lisp takes a list, compiles it down to x86 machine code, and jumps to it. Eval in Goo Language compiles the AST to C, runs gcc on it, and dynamically loads it into memory. It's certainly possible to create an eval in Java that takes an AST of objects and interprets them inline.

The compiled Common Lisp program is not homoiconic. It is not equivalent to data lists. The source form was. The interpreted form was. Translations of Common Lisp to other languages may or may not preserve any given property, such as homoiconicity.

Note that it is possible for a Common Lisp system to never have an interpreter but still be a compiler. This is pretty common (SBCL and OpenMCL, for example). The compiled form is not homoiconic, but the source still is. Since the source is discarded immediately, doesn't that mean the language is not homoiconic? The answer is no. Lisp is still homoiconic even in this environment.

Q: How about Tick Cee (see www.pdos.lcs.mit.edu ) an extended variant of ansi C where you can store an expression in a variable, does it qualify as homoiconic?

A: Tick Cee comes much closer to meeting the definition, since it does do some of the things I've been trying to point out that C does not: in Tick Cee you can represent a program (expressions, statements...) as data, you can compile that data, and execute it, all at runtime, and you can do manipulations of that data prior to compiling it. And all of that is supported by the language, at Level 0. Comparatively speaking, that overcomes most of the obstacles.

But not all obstacles. It still fails the primary one! We're talking about "homoiconic". That means "same representation" of data and code. Tick Cee has a special datatype "cspec" that is used for this runtime code generation stuff. That "cspec" type cannot be used with any non-program data, and it has its own special operators that can only be used on "cspec" and not on anything else, etc etc etc. At this point, on the other hand, I really don't want to argue about it, so if you want to call Tick Cee homoiconic, ok, whatever, I suppose it's arguably close enough, but it's not a great example of homoiconicity.

Tick Cee is also kind of a kludge; you can't have dynamic generation of code-generating code (this isn't commonly useful, but it certainly is a flag of inelegance), and they support only compilation, not interpretation, which is again a red flag. These are side issues, but in the other 3 homoiconic languages mentioned on this page, none of this is a problem.

"homoiconic" isn't just some nitpicky definition, it's a signal of a clean deep elegant language property. Just one property; the rest of the language might be dirty, shallow, and trashy. :-) But a very important property; note again the connection with Meta Circular Interpreters (which is not identical to "interpreter"; the "metacircular" part means something).

(I should mention that, in terms of the C world, I think that Tick Cee is actually kind of cool; there's a different set of relative standards.)


This is actually closely related to the subject of Quine Programs, but I don't know if the relationship is clear without understanding the topological Brouwer Fixed Point Theorem alluded to in the Rochester course quote at top of page. (In good mathematical tradition, Brouwer went mad and renounced his own theorem, but no one believed him. A warning? "Abandon All Hope, Ye Who Enter Here", as Dante said. :-)

I notice there's at least one web page out there that talks about this - although it says that the fixed point theorem involved is not related to Brouwer's, which I believe is wrong. I'm not actually recommending this page, but just for completeness sake: www.eleves.ens.fr:8080


Homoiconicity is a Language Shield. It makes it trivial to switch to/from using dynamic code as an implementation detail, whereas in e.g. Cee Language, deciding to use dynamic code is not a minor detail, it requires jumping through huge hoops to accomplish -- and so if it is ever used in a program, odds are that it will always and forever continue to be used in a program. There's a barrier to entry, and then a major subsystem to throw away, which seems the opposite of a Language Shield. Therefore homoiconity in e.g. Lisp must be a Language Shield. -- dm


What about a language where code can be represented and manipulated as a first-class datatype, but only at compile-time? For example, Nemerle Language's macro facilities let you manipulate code as lists of expressions, pattern-match against subexpressions, etc, but it has no concept of an eval function, or anything like a REPL. Where would a language like that fit in, as far as homoiconicity goes?


Ironically, while C and C++ are not homoiconic, Cee Preprocessor macros would be if it permitted rewriting of macro definitions (it doesn't -- it sort of does, through the magic of having a header file include itself recursively). The C preprocessor works by a limited form of string rewriting; if it could operate on itself, it could treat the preprocessor macros as re-writable strings in the macro code itself. It would then be possible to do things such as

#define INSERT(x) #include x #define MESSAGE "Testing, %s..." #define DEF(x) #define x(y) MESSAGE, y

INSERT(<cstdio>) DEF(w)

main() {

printf("If this works, then CPP is partially homoiconic.\n"); printf(w("one, two, three")); }

However, this is all hypothetical, as the standard does not allow the unquoted sharp in macros, which would be necessary to allow you to add new definitions programmatically. Note also that even if it were possible, it would only apply to the preprocessor macros (which make up a separate language that can be run - using cpp(1), the stand-alone version of the preprocessor - independently of C), not to C itself. It would present something of a degenerate case: a homoiconic language that is not Turing Equivalent. - Jay Osako

P.S.: Does anyone know if the m4 macro processor can re-write its own macros?


''from the but-that's-a-completely-different-sort-of-lambda dept.:''

Punsters may rejoice in the fact that both Alan Turing and the Universal Turing Machine are... no, I'd better not say it... - Jay Osako, who, like a Xanadu link, is Bi Visible


Homoiconic is what Hofstadter describes in Goedel Escher Bach as self-representation. He makes a very big distinction between self representation and self reference. -- Joel Jones


Having spent some time reading the various homoiconic related pages here, I've concluded that, like other language issues, some people want their language to be included no matter what. In an attempt to reduce friction and improve value, what if this page were reorganized so languages are grouped by their relative homoiconicity (strong, pure, weak, hetero)? That is, at the top of the list would be things like Lisp, which nobody argues about; "code and data are represented as a fundamental datatype in the language"; that's the criterion for 100% "definitional compliance". Below that would be definitions with other criteria, and the languages that meet those criteria.

If the Lisp example is a good compliance test, use that; if not, come up a better example, or even more than one. Create an example and let people implement that functionality in their language of choice (i.e. "Here's how it looks in language-x"). Rather than argue, people can compare code and form their own opinions.

If even a few people can come to consensus on the categories, criteria, and examples, I think that might go a long way. -- Gregg Irwin


Does this count for homoiconicity ?

(defun f (x) (car x)) (symbol-function 'f)

; all portable ; However the results are different ; which makes things like instrumentation (an obvious application to justify homoiconicity) ; non-portable. Folks do not get easy access to their code as data ; as it was promissed :

; In one implementation we get: ; #<CLOSURE F (X) (DECLARE (SYSTEM::IN-DEFUN F)) (BLOCK F (CAR X))> ; while in another ; (LAMBDA-BLOCK F (X) (CAR X))

;Furthermore if we make the mistake: (compile 'f) ; Our homoiconicity went down the drain altogether, ; the code behind f is no longer first class

There is standard FUNCTION-LAMBDA-EXPRESSION function that gives somewhat portable sexp of the function. so

(function-lambda-expression (symbol-function 'f))

on clisp will result in

'(lambda (x) (declare (system::in-defun f)) (block f (car x)))

while on lispworks it results in

'(lambda (x) (declare (lambda-name f)) (block f (car x)))

both implementation gives something implementation dependent. However the fourth element of this list is the same.,(block f (car x)), which is the code of the function. What you do above, typing just "#'f" to lisp listenr,will just print the implementation dependent way of representing function object as string. Using FUNCTION-LAMBDA-EXPRESSION will give the code as sexp, which you can then manipulate.

But still, once you compile the function #'f, FUNCTION-LAMBDA-EXPRESSION function won't work any more.

Let's explore one step further:

(caddr (cadddr (function-lambda-expression (symbol-function 'f)))) ; this will give me (car x) ; let's try some real stuff

(setq body (caddr (cadddr (function-lambda-expression (symbol-function 'f))))) ; now body points to the real code for the function, or so it seems. ; let's try it

(rplaca body 'cdr) ; now the function should be fun x -> cdr x ; let's inspect again (symbol-function 'f) ; this returns #'(LAMBDA (X) (DECLARE (LAMBDA-NAME F)) (BLOCK F (CDR X))) ; It looks like we are successful, aren't we ? (f '(1 . 2)) ; this returns 1, OOPS ; it seems there's an optimization underneath our interpreter (compile 'f) ; OK (f '(1 . 2)) ; this returns 2 HOORAY. ; Or sort of, because in order to force our first homoiconic manipulation of f ; we destroyed f's homoiconicity

Then we move to CLisp and observe that the above code works without compiling f. Is this standard ? www.lisp.org . Yes, it returns a lambda expression (it doesn't say that it returns the lambda expression use by the interpreter . See www.lisp.org . Well, it seems that ANSI Common Lisp doesn't gurantee full homoiconicity. I was thinking of how to write generic code that given a function symbol it instruments the code by logging every call to the function. That would be a nice showcase of homoiconicity. But it ain't that easy, and it's not guarantee to work.

This commentary seems to imply the requirement that all code be accessible as data, all the time, in a "fully homoiconic" language. Is that really a necessary characteristic? Or is it the only case that a homoiconic language allows you to build data structures that can subsequently be used as code? (In any case, requiring all code to be accessible as data at all times mandates overhead that I'd rather not have unconditionally required.)

"program code is represented as the language's fundamental data type" -- That's the key phrase for me. It's not about overhead, having an EVAL statement, or any other implementation issue. What the statement implies, which I think is important, is that the laguage has a "fundamental data type". It's about not having to build data structures you can use as code, thanks to the language's design. --Gregg Irwin


I still do not get it. From the above I gather a language is homoiconic if the representation used for data is the same as the representation used for code. There is one huge problem with this definition. Unlike the representation of data, the representation used for code occurs in the *meta*-language, not in the *object*-language. So there is no way of comparing the two representations. Firstly, because a language has no access to its meta-language, and secondly because a language can be implemented in many different meta-languages.

Consider this example in LISP:

'(ADD 2 3)

This is data: It's a list with three elements. The first element is the string "ADD", the second element is the number 2, and the last element is the number three. (The quote in front (') is a short hand for "treat this as data.")

Consider this example in LISP:

(CAR '(ADD 2 3))

"CAR" is a function that returns the first item of a list. This line of code returns the value "ADD".

Consider this example in LISP:

(EVAL '(ADD 2 3))

The "EVAL" function evaluates a list as if it is code. In other words, it causes the "ADD" function to be called, in this example, returning the sum -- the numeric value 5.

In LISP, it's trivially easy to generate and execute code at run time. It's part of the language.


This sounds an awful lot like assembler (machine-language) and doing the one thing every assembler programmer learned the hard way not to do: write self-modifying code.

It's a useful form of code generation. It's the basis of many Aspect Oriented Programming implementations. It's a powerful tool. But like any powerful tool, it can be a problem if used carelessly.


If I may be excused for descending to the crude level of implementation for a moment... I must note that a homoiconic language seems to require that the language runtime include a compiler/loader or interpreter for the language itself. I'm not an expert, but it seems that this could cause some implementation problems... what if you want to use the language for some special-purpose program, say for an embedded controller, where you don't want to slog extra megabytes of code around with you?

You would have to use a different language, I suppose. Or not use the homoiconic features of the language. It's a really nice tool, but there are trade-offs. -- Simon Heath

Yes, but homoiconic languages tend to allow for small implementations of the compiler/interpreter part (there are Lisp interpreters, that fit into 2K of C-code).


Is Snobol still homoiconic when one uses Snocone?


By maintaining a one-to-one correspondence between the interpreted, byte code interpreted, and compiled structure of a lisp/scheme language, we can also edit, modify, and evaluate code at the byte-code and also the compiled c-code level. For the c-code, threaded function calls style would preserve the one-to-one correspondence. This would require 3 orthogonal code parsers/tree walkers, but the resulting language would be VERY homoiconic, self-referential, metacirular, etc . I know of no such language or implementation, but sounds like a fun project. --James Goldan



See original on c2.com