Patricia

Donald R. Morrison [JACM 15 (1968), 514–534] has discovered a very pretty way to Form N-Node Search Trees based on the binary representation of keys, without storing keys in the nodes. His method, called “Patricia” (Practical Algorithm To Retrieve Information Coded In Alphanumeric), is especially suitable for dealing with extremely long, variable-length keys such as titles or phrases stored within a large bulk file. A closely related algorithm was published at almost exactly the same time in Germany by G. Gwehenberger, Elektronische Rechenanlagen 10 (1968), 223–226.

Patricia’s basic idea is to build a binary Trie, but to avoid one-way branching by including in each node the number of bits to skip over before making the next test.

There are several ways to exploit this idea; perhaps the simplest to explain is illustrated in Fig. 33. We have a TEXT array of bits, which is usually quite long; it may be stored as an external direct-access file, since each search accesses TEXT only once. Each key to be stored in our table is specified by a starting place in the text, and it can be imagined to go from this starting place all the way to the end of the text. (Patricia does not search for strict equality between key and argument; instead, it will determine whether or not there exists a key beginning with the argument.)

Fig. 33. An example of Patricia's tree and TEXT.

The situation depicted in Fig. 33 involves seven keys, one starting at each word, namely “THIS IS THE HOUSE THAT JACK BUILT?” and “IS THE HOUSE THAT JACK BUILT?” and . . . and “BUILT?”. There is one important restriction, namely that no one key may be a prefix of another; this restriction can be met if we end the text with a unique end-of-text code (in this case “?”) that appears nowhere else. The same restriction was implicit in the Trie scheme of Algorithm T, where “⊔” was the termination code.

~

KNUTH, Donald Ervin, 1997. The art of computer programming. v. 3. Sorting and Searching. 2nd ed. Reading, Mass: Addison-Wesley. ISBN 0-201-89685-0, p. 498–499.