Global Variables Considered Harmful

W.A. Wulf, M. Shaw; Global Variables Considered Harmful, ACM SIGPLAN Notices 8:2, Feb 1973, pp. 80-86

Considered by many to be one of the classic papers of computer science. Nowadays its thesis seems obvious: global variables enjoy a reputation only slightly better than that of the infamous Go To statement. I still use 'em occasionally, but cringe whenever I do.

However, the interesting thing about this paper is that it also claims to skewer a prominent language feature whose effect is similar to that of Global Variables, and which is embraced by many here on Wards Wiki: Lexical Scoping, as found in languages such as Pascal Language, Algol Language, and, more importantly, Lisp Language (and, to a lesser extent, Smalltalk Language).

In fact, the arguments in the paper are really targeted at Nested Scopes; they apply equally (or more forcefully) to Dynamic Scoping combined with Nested Scopes.

In any case, the paper gives four main arguments:

Side effects. Just as with globals, functions that modify variables other than their own locals can cause surprises of all sorts; if pass-by-reference is used, aliasing can occur when it isn't expected.

Indiscriminate access. The programmer cannot prevent nested sub-procedures from accessing and modifying the local variables of the procedures that enclose them.

Vulnerability. New declarations may be interposed between when a variable is declared in an outer scope and when it is used in an inner scope.

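No overlapping definitions. It is difficult to control shared access to variables: to share a variable between two procedures, it must be declared in a scope that encloses both, which exposes it to everything else in that scope as well.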
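The aliasing hazard under Side effects can be sketched in standard C, using pointers to stand in for pass-by-reference (an assumption: C itself only has pass-by-value). The function names are hypothetical. When both parameters point at the same variable, the first update silently changes the second operand.

```c
/* Adds *src to *dst twice. The caller presumably expects *dst to grow
   by 2 * (original *src). */
void add_twice(int *dst, const int *src)
{
    *dst += *src;   /* if dst == src, this also changes *src ...      */
    *dst += *src;   /* ... so the second addition uses the new value  */
}

/* Both parameters alias one variable: result is 4 * start, not 3 * start. */
int aliased_result(int start)
{
    int x = start;
    add_twice(&x, &x);
    return x;
}

/* No aliasing: y grows by exactly 2 * s, as the caller expects. */
int unaliased_result(int start, int s)
{
    int y = start;
    add_twice(&y, &s);
    return y;
}
```

For example, `aliased_result(1)` yields 4 where a reader of `add_twice` would expect 3.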


This paper is outdated; nested lexical scoping is considered a very desirable feature these days. Here are counterarguments to each of the points above:

Side effects: Some languages have "final" or immutable variables. In that case, accessing only final variables in outer scopes (either as a coding convention, or enforced by the language) would eliminate any concerns about "surprising" side effects. Note that nested lexical scoping is also applicable to Pure Functional Languages, in which all variables are immutable.

The point about pass-by-reference is largely irrelevant since very few modern programming languages use pass-by-reference. (Languages that pass "references" by value are not pass-by-reference.)

[Huh? Pass-by-reference ALWAYS requires passing a reference by value. That's how it works. The question is whether the referenced object is a COPY of the caller's object, or an ALIAS for the caller's value. Most modern languages pass by reference for non-primitive types.]
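The distinction being argued over can be made concrete in C, which has no pass-by-reference and emulates it by passing a pointer by value. A minimal sketch with hypothetical function names: reassigning the pointer parameter only changes the callee's copy of it, while writing through it changes the caller's object.

```c
/* The pointer is a copy: pointing it elsewhere does not affect the caller. */
void rebind_local(int *p)
{
    static int elsewhere = 0;
    p = &elsewhere;  /* rebinds only the callee's copy of the pointer */
    *p = 99;         /* modifies 'elsewhere', not the caller's variable */
}

/* Writing through the copied pointer does reach the caller's object. */
void write_through(int *p)
{
    *p = 99;
}

int after_rebind(void)
{
    int x = 1;
    rebind_local(&x);
    return x;        /* still 1 */
}

int after_write(void)
{
    int x = 1;
    write_through(&x);
    return x;        /* now 99 */
}
```

With true pass-by-reference (for instance a C++ `int &` parameter), the parameter is itself an alias for the caller's variable, so there is no separate pointer copy that could be rebound.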

Indiscriminate access: Essentially a repetition of point 1, and the same counterargument applies.

Vulnerability: "New declarations may be interposed between when a variable is declared in an outer scope and when it is used in an inner scope." So Dont Do That. This is not difficult to avoid, and it would be easy for a compiler to warn about this situation (it should not be an error).

Note that under any rule other than resolving a name to its innermost declaration, moving a code fragment into another context could change its meaning whenever one of the fragment's own local variables coincidentally shadows a variable in the new outer scope. In any case, variables can and should be renamed where shadowing makes a program too confusing.
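The Vulnerability case can be reproduced with C's nested block scopes. A minimal sketch with hypothetical function names; GCC and Clang will flag the first function when compiled with `-Wshadow`, which fits the suggestion that a warning, not an error, is the natural remedy.

```c
/* Intended: add 5 to the outer 'total'. */
int sum_with_interposed_declaration(void)
{
    int total = 10;           /* declared in the outer scope */
    {
        int total = 0;        /* interposed declaration shadows the outer one */
        total += 5;           /* updates the inner 'total' only */
    }
    return total;             /* 10: the update above never reached it */
}

/* Same logic with the inner variable renamed, so nothing is shadowed. */
int sum_without_shadowing(void)
{
    int total = 10;
    {
        int part = 0;
        part += 5;
        total += part;
    }
    return total;             /* 15 */
}
```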

No overlapping definitions: Lexical Scoping does not make it difficult to control shared access to variables. On the contrary, Object Capability Languages generally support nested lexical scoping precisely because it makes it easier to avoid global variables, and thereby makes controlling shared access to variables easier.

Many arguments against global variables that are independent of both Lexical Scoping and Nested Scopes are given in Global Variables Are Bad.


In addition (the paper doesn't discuss this), implementing Lexical Scoping (with inner functions having access to outer functions' variables) poses many implementation difficulties for the language. You need a Static Chain (or a "display") to access variables defined in enclosing scopes, and implementing First Class Lexical Closures becomes a pain in the butt.
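To illustrate what a Static Chain involves, here is a hand-compiled sketch in standard C (which has no nested functions): each activation record is an explicit struct carrying a pointer to the record of its lexically enclosing function, and a variable declared d levels out is reached by following d links. All names (Frame, lookup, outer, middle, inner) are hypothetical, and real compilers lay frames out differently.

```c
#include <stddef.h>

/* A hand-built activation record: one local plus the static link. */
typedef struct Frame {
    struct Frame *static_link;  /* record of the lexically enclosing function */
    int local;                  /* the single variable this function declares */
} Frame;

/* Follow 'depth' static links, then read that frame's local. */
int lookup(const Frame *f, int depth)
{
    while (depth-- > 0)
        f = f->static_link;     /* one dereference per nesting level */
    return f->local;
}

/* Models: outer() { int a; middle() { int b; inner() { int c; ... } } } */
int inner(Frame *enclosing)
{
    Frame self = { enclosing, 3 };          /* c = 3 */
    /* c is 0 levels out, b is 1 level out, a is 2 levels out */
    return lookup(&self, 0) + 10 * lookup(&self, 1) + 100 * lookup(&self, 2);
}

int middle(Frame *enclosing)
{
    Frame self = { enclosing, 2 };          /* b = 2 */
    return inner(&self);
}

int outer(void)
{
    Frame self = { NULL, 1 };               /* a = 1 */
    return middle(&self);
}
```

A display replaces the walk in `lookup` with an array indexed by nesting level, trading the per-access cost for extra bookkeeping on every call and return.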

This is merely conjecture on my part, but I think one of the reasons Cee Language was so successful early on is that it threw Lexical Scoping into the bin. That simplified both the semantics and the implementation of the language greatly. The success of C (and C++, No Flames Please) is good evidence that Lexical Scoping is not needed.

[Huh? C is lexically scoped. And lexical scope has nothing to do with "procedures being able to modify global variables". That is a totally orthogonal issue. How is it any better when a dynamically scoped language allows global variables to be altered by a procedure? In fact, it can be much worse, since the variable name built into the procedure could then make it refer to some variable that a poor sap calling the procedure just happened to give the same name.]

Yes, C is lexically scoped. The above comments were made before this page was changed to distinguish between Lexical Scoping and Nested Scopes. It is Lexical Scoping combined with Nested Scopes that causes the implementation difficulty.

In fact, Dynamic Scoping combined with Nested Scopes is even more difficult. Henry Baker wrote a paper on this, which is online at home.pipeline.com. Despite the title "The Buried Binding and Dead Binding Problems of Lisp 1.5", the problems it describes apply to Dynamic Scoping in general.

Class-like variables can be used in C by declaring them static within a .c file. Such variables are visible to all functions within the file, but hidden from functions outside it.
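A minimal sketch of that idiom, assuming it lives in its own .c file (the file and function names are hypothetical). Note that every function in the file can still read and write the variable, which is arguably the paper's overlapping-definitions complaint in miniature.

```c
/* counter.c (hypothetical file name) */

/* File-scope 'static' gives internal linkage: visible to every function
   in this file, invisible by name to other translation units. */
static int count = 0;

void counter_increment(void)
{
    ++count;
}

int counter_value(void)
{
    return count;
}

/* Nothing stops an unrelated function in the same file from touching it. */
void unrelated_helper(void)
{
    count = -1;
}
```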

[This misunderstands the point. Here's an example of C extended to have nested lexical scoping:]

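```c
#include <stdio.h>

/* Hypothetical: standard C has no nested functions. GCC accepts them as an
   extension, but calling g() through its address after f() has returned is
   undefined there, which is exactly the difficulty discussed below. */
typedef void (*FuncPtr)(void);

FuncPtr f(void)    /* f returns a pointer to a function */
{
    int i = 0;

    /* a nested, lexically scoped function that accesses f's local i */
    void g(void) { printf("%d\n", i); }

    g();           /* ==> 0 */
    ++i;
    g();           /* ==> 1 */
    return g;      /* return a pointer to the nested function g from f */
}

int main(void)
{
    FuncPtr h;
    h = f();       /* prints 0, then 1 */
    (*h)();        /* indirect call to g(); should print 1 */
    return 0;
}
```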

This illustrates the difficulty. When g() is called through the FuncPtr h in main(), it must still have valid access to the stack variable i in f(), which means that the C compiler is not allowed to throw away the stack frame(s) created by f() when f() returns. Also, if f() is recursive, the i that g() references must be the one in the particular activation of f() that created that g(), not merely whichever f() frame is most recent.
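One common way to give the earlier example the behaviour it wants, sketched in standard C under the assumption that we compile g() by hand: move the captured variable i out of f()'s stack frame into a heap-allocated environment, and return a pair of (code pointer, environment). This is closure conversion, a different technique from keeping the stack frame alive; Env, Closure, g_code and call are hypothetical names.

```c
#include <stdio.h>
#include <stdlib.h>

/* Heap-allocated environment holding what g() captures from f(). */
typedef struct {
    int i;
} Env;

/* A closure: code pointer plus the environment it closes over. */
typedef struct {
    int (*code)(Env *);
    Env *env;
} Closure;

/* Body of g(): prints i and, for testability, returns it. */
static int g_code(Env *env)
{
    printf("%d\n", env->i);
    return env->i;
}

static int call(Closure c)
{
    return c.code(c.env);
}

/* f(): i lives in a fresh heap Env rather than on f's stack, so it
   outlives f's frame. The caller frees the returned closure's env. */
Closure f(void)
{
    Env *env = malloc(sizeof *env);
    if (env == NULL)
        abort();
    env->i = 0;
    Closure g = { g_code, env };
    call(g);      /* prints 0 */
    ++env->i;
    call(g);      /* prints 1 */
    return g;
}
```

Usage: `Closure h = f(); call(h); free(h.env);` prints 0, 1, then 1. Because each call to f() allocates a fresh Env, a recursive f() hands each g() its own i, as lexical scoping requires; the price is a heap allocation per call and an explicit free, which a garbage-collected language would handle for you.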

This is full "nested lexical scoping". As someone said above, it requires a Static Chain or display to achieve this effect, plus fancy bookkeeping to track stack frames. It's all hugely complicated compared with C semantics, and the obvious implementations can, in the worst case, need work proportional to the nesting depth just to access the value of "i".

As they said, it is indeed arguable that leaving out this crud helped with C's success.

Furthermore, Lisp's "modern" lexical scoping is preferable to the older dynamic scoping, and something like it is needed just to create local variables with, e.g., the LET macro, at least in the paradigm the Lisp world is accustomed to. Even so, there is a strong argument that it is evil in the absolute.

<... any language that I'm aware of...>

Please tell me you don't mean "first-class" as equivalent to "can be manipulated as a value or object at runtime", because (A) I don't think that's the most accurate definition of "first-class", and (B) macros exist only at compile-read time by definition, in any language, including assembler, where they were first invented. So of course they can never be manipulated at run time; that would be self-contradictory. But they could be First Class at compile-read time... I would have thought that there were indeed Lisp dialects that did so.

<... might be possible to Unify Macros And Functions at some point. Lisp Language comes close in that it can read in new code on the fly (and process macros invoked therein). As discussed in Cee Preprocessor Statements, there are things that can be done only with macros (and things that can be done only with functions, at least in current languages).>

And from the innermost function's point of view, a variable declared in an enclosing scope is in fact "global".

[No, macros are dynamically scoped, and C/C++ (leaving the preprocessor aside) are lexically scoped. Lexical versus dynamic scope has nothing to do with whether nested scopes are allowed. C is simpler to implement (and in many cases simpler to understand) than the Algol family of languages because it does not allow nested function definitions, not because it lacks lexical scope. When you define a global variable in the same file as a function, that one outer level of scope is lexical: the function, when called, always refers to the variable in its own lexical scope. However, when a macro with a free variable in its definition is expanded, that free variable ends up referring to whatever variable of that name is visible at the point of expansion. So macros are scoped dynamically, not lexically.]
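A minimal standard-C sketch of the point above (all names hypothetical): the function's free variable `x` is fixed by where the function is defined, while the macro's free variable `x` is resolved wherever the macro happens to be expanded.

```c
/* File-scope variable: the function below is bound to this x lexically. */
static int x = 1;

/* The function's free variable x always means the file-scope x. */
int add_x_fn(int n)
{
    return n + x;
}

/* The macro's free variable x means whatever x is visible where it expands. */
#define ADD_X_MACRO(n) ((n) + x)

int demo_function(void)
{
    int x = 100;           /* a local x at the call site ...                */
    (void)x;
    return add_x_fn(1);    /* ... is ignored: the function sees the global */
}

int demo_macro(void)
{
    int x = 100;           /* a local x at the expansion site ...          */
    return ADD_X_MACRO(1); /* ... is captured: expands to ((1) + x)        */
}
```

Here `demo_function()` returns 2 and `demo_macro()` returns 101.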

More precisely, macros are Shallow Binding dynamic scoping: they look up variables in the scope of their expansion, but they don't trace unbound variables down the dynamic call stack. Dynamic Scoping, as used in early Lisp dialects and Elisp, continues searching the call stack until it finds the appropriate variable. This is Deep Binding.
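Since C has no dynamic scoping, here is a hypothetical simulation of the Deep Binding lookup described above: a global stack of (name, value) bindings that callers push on entry and pop on exit, with lookup searching from the most recent binding downward. The names (dyn_bind, dyn_unbind, dyn_lookup, report, caller_a, caller_b) are illustrative and not taken from any Lisp implementation. It also shows the "poor sap" problem raised earlier: report() sees whichever "limit" its caller happened to bind.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_BINDINGS 64

typedef struct {
    const char *name;
    int value;
} Binding;

/* The dynamic environment: most recent binding on top. */
static Binding bindings[MAX_BINDINGS];
static int depth = 0;

static void dyn_bind(const char *name, int value)
{
    if (depth == MAX_BINDINGS)
        abort();
    bindings[depth].name = name;
    bindings[depth].value = value;
    ++depth;
}

static void dyn_unbind(void)
{
    --depth;
}

/* Deep binding: search down the stack for the nearest binding of name. */
static int dyn_lookup(const char *name, int fallback)
{
    for (int k = depth - 1; k >= 0; --k)
        if (strcmp(bindings[k].name, name) == 0)
            return bindings[k].value;
    return fallback;          /* unbound */
}

/* A procedure whose free variable "limit" is resolved dynamically. */
static int report(void)
{
    return dyn_lookup("limit", -1);
}

int caller_a(void)
{
    dyn_bind("limit", 10);    /* intended for report() */
    int r = report();
    dyn_unbind();
    return r;
}

int caller_b(void)
{
    dyn_bind("limit", 999);   /* unrelated variable that happens to share the name */
    dyn_bind("count", 3);
    int r = report();         /* search skips "count", finds 999 */
    dyn_unbind();
    dyn_unbind();
    return r;
}
```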

Perhaps, but now I'm wondering whether there's an authoritative source for getting the terminology right. After all, you can have scoping based on lexical level as well as Nested Scopes and still not need chains or displays; more often than not, many of these issues get crammed together into a single term, as has happened with Lexical Scoping.


See original on c2.com