Scoping Rules

The scoping rules of a Programming Language dictate how Free Variables - symbol names which are found in the body of a scope (a function, macro, class, whatever) but not defined there - are resolved.

Several different strategies exist:

Disallowing them altogether. Very rare; though some languages may disallow free references to all but a pre-defined set of symbols (often called keywords or special forms) which are provided by the language.

Only two scoping levels - global and local. Cee Language and many assemblers use this rule. A Free Variable used in a C function must refer to a "global" symbol (meaning one defined at either file or global scope, as opposed to within a function body). This simplifies the implementation of C greatly (and many C programmers don't miss more advanced scope rules at all). Cee Plus Plus sticks to the C tradition in many ways, though the presence of features such as classes and namespaces causes C++ to relax the scoping rules quite a bit. (Still, no C++ function can bind to the local variables of another C++ function.) Java, with Inner Classes, relaxes the scoping rules further.

Lexical Scoping. Used by Common Lisp, Scheme Language, Algol Language, Pascal Language, and many others. If a variable isn't found in a given scope, the enclosing scope is searched, repeating until the outermost scope is reached. Two important variants of this are Deep Binding and Shallow Binding. With Deep Binding, variables are bound to the environment where the function is defined; with Shallow Binding, variables are bound to the environment where the function is called. Most languages which support Lexical Scoping support Deep Binding for functions; most macro systems (excluding Scheme's Define Syntax) use Shallow Binding.

It is possible to divide Deep Binding further into two separate forms (I am not aware of any generally accepted terminology to describe these forms). In one form, a Free Variable is essentially an alias for the actual variable in the enclosing scope; if that variable changes, then the value seen by the function being considered also changes. (If the enclosing scope has exited, which is possible when First Class functions are mixed with Lexical Scoping, the variable lives on, starting from the value it had at exit.) Most languages which support Deep Binding support this form.

In the other form, the Free Variable takes the value that it has at the point when the function in question is first defined, and does not change. (In other words, the function using the Free Variable makes a copy of the value provided by the enclosing scope.) I'm not aware of any languages which do this for Free Variables, though objects used as closures have this behavior. Java Inner Classes (when defined within a function) sidestep this issue by only allowing references to variables in the enclosing function which are declared to be final - in other words, those whose value does not change.
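The two forms of Deep Binding described above can be sketched with C++11 lambdas, which let the programmer choose either behavior per captured variable. This is only an illustration; the function names `scope_by_alias` and `scope_by_copy` are made up for the example.

```cpp
#include <string>

// The Free Variable is an alias for the variable in the enclosing scope:
// a change made after the closure is defined is visible inside it.
std::string scope_by_alias()
{
    std::string scope = "at definition";
    auto read = [&scope]() { return scope; };  // capture by reference
    scope = "after change";
    return read();
}

// The Free Variable is copied when the closure is defined:
// a later change to the original is not seen.
std::string scope_by_copy()
{
    std::string scope = "at definition";
    auto read = [scope]() { return scope; };   // capture by value
    scope = "after change";
    return read();
}
```

Here `scope_by_alias()` yields "after change" and `scope_by_copy()` yields "at definition". Unlike the languages discussed above, C++ does not keep a variable captured by reference alive once the enclosing scope has exited, so the aliasing form is only safe while that scope is still active.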

Dynamic Scoping (early dialects of Lisp, Common Lisp special variables, exported environment variables in Unix Os): The caller is checked for a binding for the variable; if one is found, it is used. Otherwise, the caller's caller is checked, and so on. If no definition is found, it is either an error or a default value is used, depending on the semantics of the language.
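The lookup described above (check the caller, then the caller's caller, and so on) can be imitated in a lexically scoped language with an explicit stack of bindings. The following is a rough sketch, not how any particular Lisp implements it; `Frame`, `Call`, `lookup` and `call_stack` are hypothetical names, and this version falls back to a default value rather than raising an error.

```cpp
#include <map>
#include <string>
#include <vector>

// One frame of bindings per active call, innermost call last.
typedef std::map<std::string, std::string> Frame;
static std::vector<Frame> call_stack;

// Search the current call first, then each caller in turn.
std::string lookup(const std::string& name, const std::string& fallback)
{
    for (std::vector<Frame>::reverse_iterator it = call_stack.rbegin();
         it != call_stack.rend(); ++it) {
        Frame::const_iterator found = it->find(name);
        if (found != it->end())
            return found->second;
    }
    return fallback;  // no binding anywhere on the call chain
}

// Pushes a frame on entry to a "function" and pops it on exit.
struct Call {
    Call() { call_stack.push_back(Frame()); }
    explicit Call(const Frame& bindings) { call_stack.push_back(bindings); }
    ~Call() { call_stack.pop_back(); }
};

std::string show()
{
    Call frame;                    // show binds nothing itself
    return lookup("r", "unbound");
}

std::string scope()
{
    Frame locals;
    locals["r"] = "Dynamic";       // a local r, visible to anything scope calls
    Call frame(locals);
    return show();
}
```

Calling `show()` directly finds no binding and returns the default, while calling it through `scope()` returns the caller's value "Dynamic".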

The following C-like program illustrates the different scoping rules. Whichever rule is applied, that is what the program would print. Of course, this is not legal C in real life, as Cee Language doesn't allow nested functions.

    #include <stdio.h>

    int main (void)
    {
        const char *scope = "Lexical, deep, by copy";

        void print_scope (void)
        {
            printf ("%s\n", scope);
        }

        scope = "Lexical, deep, aliasing";

        /* Returns ptr to function; pretend it's a closure */
        void (*helper_func (void)) (void)
        {
            const char *scope = "Lexical, shallow scoping";

            void do_print_scope (void)
            {
                print_scope();
            }

            return do_print_scope;
        }

        void do_it (void)
        {
            const char *scope = "Dynamic Scoping";

            helper_func()();  /* Call the function returned by helper_func() */
        }

        do_it();  /* Print what scoping we are using */
    }


Another example, in pseudo Pascal Language: (from 'Compilers: Principles, Techniques, and Tools')

    program scoping;
    var r : string;

    procedure show;
    begin
        writeln(r);
    end;

    procedure scope;
    var r : string;
    begin
        r := 'Dynamic';
        show;
    end;

    begin
        r := 'Scope';
        show;
        r := 'Lexical';
        scope;
    end.
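For comparison, here is a rough C++ rendering of the same program. C++ resolves the free `r` in `show` lexically, so this sketch can only demonstrate the lexical outcome. The `output` vector and the `run` function are additions for the example; they stand in for `writeln` and the main program body.

```cpp
#include <string>
#include <vector>

static std::string r;                    // the program-level r
static std::vector<std::string> output;  // collects what writeln would print

// r is free in show; it binds to the program-level r.
void show() { output.push_back(r); }

void scope()
{
    std::string r = "Dynamic";  // local r, invisible to show()
    (void)r;                    // silence unused-variable warnings
    show();
}

std::vector<std::string> run()
{
    output.clear();
    r = "Scope";
    show();
    r = "Lexical";
    scope();
    return output;
}
```

With Lexical Scoping the lines produced are "Scope" and then "Lexical". A dynamically scoped language would produce "Scope" and then "Dynamic", because `show` would pick up the `r` of its caller.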


Note: C does use Lexical Scoping but does not allow function definitions to be nested (Standard C doesn't, but GNU C does). C++ also uses Lexical Scoping. Namespaces and classes are just lexical scopes, like any other, as are Java Inner Classes.

For example:

    #include <stdio.h>

    int func() {
        int outer_local = 1;
        {
            // this block introduces a new lexical scope
            int inner_local = 2;

            printf( "%i %i\n", inner_local, outer_local );
        }
        // inner_local is no longer in scope.
        return 0;
    }

Will print:

    2 1

The important point about C/C++ is that it doesn't allow Free Variables in one function to refer to anything defined in another function; this eliminates the need for a Static Chain and/or closures.


So this perfectly valid ISO 9899:1999 code does not use Lexical Scoping?

    int main() {
        for (int i = 0; i < 10; i++) {
            dosomething(i);
        }
        // i is no longer accessible.
    }

This is not a GNU extension, this is valid C99. AFAIK the standard does say that blocks group a set of declarations and statements into a syntactic unit, and a compound statement is a block.


Actually, C++ does allow nested functions; you just have to spell the internal function differently. The following is not legal in C++:

    void outer (int x)
    {
        void inner (int y)   // error: function definition inside a function
        {
            cout << y+1;
        }

        inner(x);
    }

The following, however, is.

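    #include <iostream>
    using std::cout;

    void outer (int x)
    {
        class {
        public:   // operator() must be public to be callable from outer
            void operator () (int y) { cout << y+1; }
        } inner;

        inner(x);
    }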

In other words, functors are a great way of faking nested functions. Of course, the functor (still) has no access to the variables defined in outer.
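One common workaround, sketched here on the assumption that copying the value is acceptable, is to hand the functor the values it needs through its constructor. The names `Inner`, `offset` and `offset_`, and the choice to return the result instead of printing it, are hypothetical and picked so the behavior is easy to check.

```cpp
// A local class cannot name outer's automatic variables directly,
// so the value is copied into the functor when it is constructed.
int outer(int x)
{
    int offset = 1;  // a local of outer that the "nested function" needs

    class Inner {
    public:
        explicit Inner(int value) : offset_(value) {}
        int operator()(int y) const { return y + offset_; }
    private:
        int offset_;  // private copy of outer's offset
    };

    Inner inner(offset);
    offset = 100;     // does not affect the copy held by inner
    return inner(x);
}
```

With this sketch `outer(5)` returns 6: the functor behaves like the "by copy" form of Deep Binding, since the later assignment to `offset` is not seen.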


See original on c2.com