What Is Null

What is this thing called "NULL"?!?

Null or Nil Characteristics:

Etymo: Latin nullus "not any, none," from ne- "not, no" (see un-) + illus "any," dim. of unus "one."

Semantically: Very difficult. In most cases, NULL or nil means "no value" or "not applicable". Sometimes (like, when passed to a routine as an argument) it means "no explicit value". In SQL, it often means "not known". Sometimes it might mean "whatever". Given this overloading of meanings, it is a little problematic - you sometimes encounter problems related to confusion about the meaning of NULL. Even so, it is a very useful concept - but heavy use of NULLs is often a sign that you should be using lists of values or tagged unions instead.

Semantically: Very simple. It is the same as the Japanese character "mu" [Chinese "wu"; see Mu Answer]. It means "no thing". In logic it means "neither yes nor no" or "dumb question". You call that simple? The character has at least four different readings in different languages. In addition, it has a native reading in Japanese Language of "nai" which means "does not exist" or "there is no". Depending on the context, this means at least six different things. Hell, that makes the computer nil concept seem pretty well-defined... Yes, I call "mu" simple.

Nil In Lisp: The Lisp Language defined NIL (and perhaps invented) the use of "NIL" as a special object that is both a Symbol and a List - it is the only such object. NIL is the same as (). () is the empty list. NIL is a constant that has NIL as its value. Lists in Lisp are built from Cons Cells. Thus, if there is some list (1 2 3), the "FIRST" function answers the first element, which can be either a Atom or a list (in this case the number 1). The "REST" function answers "the rest of the list" (in this case, "(2 3)"). Every list has an implied "NIL" at the end, so that the "REST" of the list "(3)" is NIL. This use of NIL to terminate a list is reminiscent of the use of "NULL" to terminate a char array in Cee Language (which came later). It is also reminiscent of the use of The Nil Object in Smalltalk Language (not surprising, given Dan Bobrow's influence on the Smalltalk Language team). The type of NIL is NULL. NULL is also a predicate that returns T when its argument is NIL. NIL is also (NOT T). It is also the only object that is (NOT T). So NIL serves as the value for logical false. NIL is also the empty datatype.

Nil In Scheme: Scheme Language doesn't have a value named nil, but nil as generally taken to be the empty list (as in other Lisps). However, unlike Common Lisp, the booleans in Scheme are #t and #f (instead of t and nil). The empty list is written as '(). Scheme advocates claim that this clarifies case expressions.

This is a strange position to take, given that the Lisp community has happily, and without any problems whatsoever, been punning on the multiple meanings of NIL for forty years. Perhaps the real problem lies elsewhere? -- Alain Picard

It bugged enough Lispers that Scheme Language finally completely split the boolean concept and list concept. [rest of argument about different treatment of nil in Common Lisp and Scheme moved to Is Scheme Lisp]

Null Pointer: Pointers in some languages, like C, C++, and Pascal, can be given the value "NULL," which indicates that this pointer does not currently point to a valid location in memory. Dereferencing a Null Pointer causes Undefined Behavior. [Often Null Pointer is declared as 0 (cast to the appropriate type), which causes some machines to generate a hardware interrupt when it is dereferenced. Usually the runtime (when configured to do so) catches this and throws an exception, often through a software interrupt. In older C implementations the results are undetermined - in other words, it blows up in your face.] NULL is commonly used to terminate lists and other complex structures, or to indicate that the given object is absent or unknown.

Nul Character: The NUL character, with code point 0 in most Coded Character Sets (ASCII, ISO-Latin-x, Unicode, etc.). This is not a pointer.

Nul Code Unit: In some Character Encoding Formats, in general more than one Code Unit is used to represent a character. Nevertheless, the zero Code Unit is almost always reserved for terminating a string, and cannot occur in any character other than Nul Character (this also applies to Character Encoding Formats in which Code Units are more than 8 bits). It can therefore be used to terminate string representations (e.g. Char Star strings in C).

Null String: The string containing no characters; Empty String. (This is typical usage for C, where some people call an empty "C style" string ("") a "Null String". Physically, it is a pointer to a character array that starts with a Nul Character, '\0'.)

Is this standard terminology? In my C days, I thought of {char *s = NULL;} as a Null String, and {char *s = "";} as an Empty String. Yes, this is pretty standard.

The Nil Object: The Smalltalk Language uses "nil" - a reference to a unique object (Singleton Pattern) of the class Undefined Object; used to represent the absence of an object.

None: Python Language uses "None" in much the same way that Smalltalk Language uses The Nil Object

Void: In the Eiffel Language, object references are initialized to Void. Void is a constant reference to a singleton object of a class that is derived from all other classes in the system (the Bottom Type in type theory). Therefore any reference can be assigned the value Void or tested for (in)equality to Void without breaking or complicating the type system. The Void reference does not implement any methods (Eiffel allows a derived class to undefine base class methods or hide methods from client objects).

Null Object: A design pattern (Category Pattern) for providing an object with "do nothing" functionality.

Relational Null Value: The "NULL" value, as it's used in relational databases. The rules for using nulls appropriately are quite subtle, e.g. two different nulls are never equal. See Nulls And Relational Model.

Variant Null: The NULL type (value?) of Micro Soft's Com Variant type. (Variants can hold values of several selected types, including integers, strings, and arrays.) In the case of a NULL Com Variant, the vt member of the structure will be set to VT_NULL and will contain no valid data in the union member. Thus a Variant Null is primarily a type, not a value.

Nothing: In Visual Basic - a Null Value. (Technically the variant type is 'vtObject' and the value is null. Is not the same thing as Variant Null.)

Actually, I think it's closer to VB's Null Pointer equivalent. The keyword Null in VB is a Variant Null.

Variant Empty: Used for "clean" newly initialized Variant variables that "haven't yet been given a type or value." 'Empty' keyword in Visual Basic. 'Var Type(unintializedVariant) = vbEmpty'

Missing: In Visual Basic, 'Missing' is used for optional Variant parameters that have no default value. Missing is a particular value representing an error.

Un Def:In Perl Language, "for functions that can be used in either a scalar or list context, nonabortive failure is generally indicated in a scalar context by returning the undefined value, and in a list context by returning the null list", according to the perlfunc documentation. The undefine function - undef() - undefines a scalar variable, an array element or an entire array.

Nothing: In Haskell Language the other possible value of the 'Maybe' datatype. If you type something as 'Maybe String' it can contain either 'Just textValue' or 'Nothing' where Just and Nothing are type discriminants to be used in typecase expressions (or pattern-matching). There's an equivalent type in Ocaml Language, called option, instead of Maybe, and the cases are called None and Some instead of Nothing and Just. You can have several levels of 'maybeness' without losing type information, that is if you have a 'Hash Table Int (Maybe String)' (a hash-table with Int keys and nullable String values) the lookup function has a correct type signature (via type inference): 'lookup :: Hash Table keyType valueType -> Maybe valueType', in this case the result is 'Maybe (Maybe String)'.

Statistics Missing: In statistical programs such as Splus/R and SPSS there is a 'missing' value for each type of data, representing data that is unknown. (almost) Any operation applied to this value returns 'missing'. This has the effect that you need not take missings into account when transforming the data.

Failures In Icon: When an operation in Icon "fails" (but in an expected way, such as looking up an element in an array and finding it's not there) it returns a "condition". These can be used to cause backtracking (the feature d'ĂȘtre of Icon Language); but they are returned in contexts where Null might be used instead. Note that failures are different from exceptions.

---- A few other things that act like NULL in some contexts.

end():

Not really NULL, but similar. Cee Plus Plus Standard Template Library containers all implement a method called end() which returns an External Iterator which points to one past the endof the container. (For vector, this might actually be a pointer off the end of an array; for other containers it might be something else). Each container (and each instance of a container) has its own unique "end" iterator; but all of them have behavior similar to NULL. Such an iterator is not dereferenceable; trying to apply unary *, -> or [] usually (hopefully) throws an exception. One difference between end() iterators and NULL is that some pointer arithmetic on end() is meaningful; end()-- is a valid iterator (for bidirectional and random-access containers at least).

Uhh... I'm fairly certain that no iterator operation is allowed to throw - EVER. Could you imagine trying to make something exception-safe if you couldn't even count on being able to move around iterators? Also, that checking would impose overhead, which is unacceptable. In any case, proper use of STL algorithms will keep you from illegitimately using a past-the-end iterator, in the same way that [] is safe on built-in arrays if your for loop through the indexes is done right. (Note that none of what I just said applied to a debugging checked-STL, which everyone should at least be using if NDEBUG is undefined.)

[Dereferencing an end iterator has undefined behavior, which means that it certainly can throw an exception, or perhaps cause the Yosemite supervolcano to erupt. End iterators may be somewhat like NULL in that similar sorts of operations are defined or undefined on them; however, they're much more like past-the-end pointers when you consider the role some of them can play in arithmetic expressions.]

EOF:

End-of-file macro in C/C++. Usually an integer value guaranteed not to be a valid value for a char type (since char is signed in some C/C++ compilers). If passed to a function expecting a char argument (which is really an int in C, but not C++, due to integer promotion rules), might have strange and wonderful effects.

nil:

Represents of lack of positive or negative value in a numeric field.

Self Reference

It is an old trick to use a self-reference (this == this.next) to terminate lists instead of a NULL pointer. It may be an old trick, but I find it rather dubious. Not only it risks an infinite loop rather than Fail Fast on a badly written recursion or loop condition, but it also destroys the uniformity of representation: if you use this trick the empty list is a special case. [More precisely, one wants the empty list (if allowed) to be a special case, but in a sensible, elegant way. It's ugly to use a trick which invites a coding error to show up as a non-ending loop rather than immediate failure.]


EDS had a "OWL," a research language related to the Smalltalk Language. It supported several NULL-related concepts:

Similar to Smalltalk Language "nil". Normally used to represent relational database NULL values.

Not Set: Initial value for variables that had not been initialized. Often used to trigger lazy evaluation methods to go find an appropriate value.

Unknown: After due consideration, the system has determined that it cannot find a value. Typically, after prompting the user, and finding that they don't know, this value would propagate to all dependent values.

(I think there were two or three others too, but I no longer have access to the documentation.)

I often found relational database Three Valued Logic to be annoying. This system's 5-value "boolean" logic could be really confusing... Like "true and unknown ==> unknown", "true or unknown ==> true". -- Jeff Grigg


Yes. The problem of trying to add semantic to null is that the list goes on forever. Chris Date (from the database community) wrote a paper a few years ago that showed a huge number of possible meanings for null in a given context (e.g. not applicable, not known, not declared, and so on). If I remember, the essence of the article was that trying to enumerate different forms of null was basically impossible, since somebody will always come up with a new meaning. The way we went at AT&T (where I was working at the time) was to use different Null Object classes, each with different meanings. When an application needed a new meaning, we added a new Null Object class. So, there were no fixed semantics. -- Anthony Lauder

[argument about different treatment of nil in Lisp and Scheme moved to Is Scheme Lisp]

Will it not always be the case that a separate return value is required to indicate "not found" in a hash lookup? Perl Language has "exists" as well as "defined". The instant one tries to allow storage of a value meaning "does not exist" (as opposed to merely "undefined") in the hash, you go back to needing another return value to distinguish the cases.

It is just the problem of Quoting Meta Characters all over again. -- Matthew Astley


Null String and Null String Pointer a problem

One of the problems with C/C++ type languages is that a Null String and a Null String pointer are handled inconsistently in libraries. Most take either as meaning "no value", but some require a Null String and fall over if passed a Null pointer or reference.

With the addition of Unicode or STL strings this becomes even more confusing, where there are now three possible Null values:

A null string pointer or reference.

A zero length string

A string with a single NULL character.

When calling system functions it is often necessary to convert the string to old fashioned Char Star strings, and the zero length string needs to be converted to one of the others, again inconsistently.


According to a recent interview with Joshua Bloch, Java 1.5's automatic unboxing will convert null to 0. Oh dear. There is some hope, though. It's not settled, and it could change to throwing an exception.

After some discussion they decided to throw an exception instead. There's a mention about it at Lambda The Ultimate.


In type theory, NIL/NULL/whathaveyou is usually handled in one of several ways:

As a special unit type; thus a pointer or reference to T which can be NULL is really UNION (T, UNIT(NULL)). A dereference of such a pointer or reference involves (implicitly) checking which case of the union is active, and performing the cast if not NULL, doing something else (throwing an exception, one hopes) if it is NULL. Java references kinda act this way, even though Java doesn't have unions (typesafe or otherwise). Use of a distinguished address (like zero) which points to something that will cause a page fault if peeked/poked is a convenient optimization of this mechanism.

or

As the Bottom Type (an empty type which is the subtype of all other types). This is essentially what Eiffel does with its Void type.

Myself, I prefer the former. For one, it allows multiple NULLs with different meanings; though I'm not aware of any languages which define multiple NULLs like that. For another, many Functional Programming Languages use the Bottom Type to indicate a raised exception (which I think is also braindead) or to indicate divergence (which kindasorta makes sense - "returning" a non-existent type to indicate a condition which, according to the Halting Problem Theorem, we cannot detect...) For a third reason, it allows us to define two types of references; those which can be NULL and those which can't. (C++ has this capability; though it's easily bypassed... a functional Java variant called Nice (Nice Language) also provides references which are guaranteed to not be NULL). If we use Bottom Type for NULL, then **ALL** objects in the system might possibly be NULL (including things like ints and bools where it doesn't make sense in most cases).

-- engineer_scotty (Scott Johnson)


What is NULL? Baby, don't hurt me...


I've found this helpful. Given the other discussions on site about Null in databases, I thought I'd contribute several sections of definitions from ISO's evolving set of Geographic Information standards.

ISO/PDTS 19103 Conceptual Schema Language (also in 19107 Spatial Schema):

NULL means that the value asked for is undefined. This Technical Specification assumes that all NULL values are equivalent. If a NULL is returned when an object has been requested, this means that no object matching the criteria specified exists. EMPTY refers to objects that can be interpreted as being sets that contain no elements. Unlike programming systems that provide strongly typed aggregates, this Technical Specification uses the mathematical tautology that there is one and only one empty set. Any object representing the empty set is equivalent to any other set that does the same. Other than being empty, such sets lack any inherent information, and so a NULL value is assumed equivalent to an EMPTY set in the appropriate context.

But ISO/CD 19126 Geographic Markup Language (an XML encoding) introduces a "convenience type" gml::Null Type which is a union of several enumerated values: inapplicable, missing, template (i.e. value will be available later), unknown, withheld and anyURI (for which the example is the lovely but non-existent my.big.org . This allows much more clarity of meaning than merely allowing 'minOccurs=0'. We're investigating it for a large database model we're developing.


The problem with NULL is that it is context-sensitive and each context can change the definition. To makes NULL consistent and remove context-sensitivity, you would need several types similar to NULL (uninitialized vars, pointer to nothing, empty object, empty var, etc). In Math it is even more of a problem since the possible meanings of NULL change the actual result. Does A+B+Null = A+B, or undefined, or null, or does it throw an error? This is confounding for developers and language designers, almost a Holy War (then again everything in language design is).

Does A + B + null = A + B? If a counter has the value of null, then the counter has not yet had anything placed in it, so its value is 0. So in that case, I would answer "yes". I can think of many situations where the assumption of 0 for null would be appropriate - for instance, for a stock quantity from a database. -- Peter Lynch


Donald Rumsfeld has a profound understanding of null -

"There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know."


All of this, and nobody's mentioned NaN (in FP math)? (As of January 2007)


Nice, schmice. You can have non-nullable references in stock Java now, with generics:

public final class Ref {

T obj; public Ref (T obj) { set(obj); } public void set (T obj) { if (obj == null) throw new NullPointerException(); this.obj = obj; } public T get () { return obj; }

}

For an immutable version, just make the set method private.

The downside is that the way Java generics work, the Ref may be more cumbersome to use than the T. For example, if Bar derives from Foo, you can assign a Bar to a reference of type Foo, but not a Ref to a Ref. So you need those variables to be of type Ref, which is Syntactic Salt at its finest. (This assignability rule is for a simple reason: otherwise you could assign a Ref reference to a Ref reference, use a Foo that is not a Bar in a set method invoked on the latter, and then pull a non-Bar out via the get method on the Bar reference. This does mean that you can't use the set method on a Ref though, although you can assign a new Ref object with a new referent.


What Is Null you ask?

Null is what you thought your freed pointer was reset to.

Null is what the function that -should- always return a valid pointer just gave you.

Null is how much bureaucracy a system needs on top of "nothing".

Null is the alpha and the omega, everything begins as null, is initialized, then reset and reused as Null.

Null is the absence of a number. Some say Null is the absence of a zero.

The Null that can be named is not the true Null.


Null is the value of x before and after it had a value.

Because Null is the value of all things we do not yet know, Null's domain may be the largest domain.


In some languages (Oz Language being an example) one may introduce a single-assignment variable without actually assigning it. You may then utilize this unassigned variable inside a data structure. One may unify two unassigned variables - i.e. asserting they are the same, such that if one is assigned, then so will be the other. Similarly, you may assign such a variable to a structure that contains yet more unassigned variables. All this is a weakly expressive form of Constraint Programming. More expressive Constraint Programming would allow you to restrict variables as well as assign them, i.e. to say that X > Y without knowing X or Y. In a secure language, the authority to assign a variable or manipulate constraints may be separate from the ability to view the variable. Oz Language added security via 'X=!!Y', which says that X can view Y but cannot assign Y.

I've long believed that Nulls - at least in the context of data (Sql Null) - should really be identified by these sort of 'unassigned free variables'. This would allow us to make very interesting observations, the way we do with algebraic and geometric expressions in mathematics. It would be possible, for example, to perform proper joins between tables containing these free variables. If we were using a Table Browser, one might distribute authority to assign these 'variables', such that one could meaningfully assign to a field that was previously unknown... and update the proper record.

Of course, it would still lead to interesting issues, such as: the sum of 10+X+12 can only be reduced to 22+X, and a join between two tables limited by X>Y may return some interesting 'contingent' records.



See original on c2.com