Mark And Sweep

Mark and sweep is a technique in Garbage Collection to free all unreferenced objects.

It works with three passes.

First, for all objects in the system, if they have their mark bit set, it is cleared.

The second pass traverses all pointers, starting at the accessible roots of a program (conventionally, globals, the stack, and registers) and for each object traversed it marks the object.

The third pass walks the heap linearly again and removes all objects that are not marked.

This approach requires two linear walks of the heap (steps 1 and 3) and one recursive traversal of the heap, which can be inefficient. In addition, this technique does not address fragmentation at all. A related technique, Stop And Copy does - but it has other drawbacks.

There is a simple variation of Mark And Sweep that needs to make only one linear sweep. It works simply by reversing the meaning of the mark bit from sweep to sweep. I am also under the impression that in practice marking tends to dominate sweeping, but this of course depends on what percentage of the heap is actually live at GC-time. -- Curtis Bartley

I've seen a variant that uses two bits - one bit for even sweeps and the other bit for the odd sweep. The even sweep clears the odd bit, and vice versa. It still takes a second pass to clean out garbage. -- Dave Smith

You can have one linear sweep and only one mark bit. The final sweep unsets the mark bits as it passes through all the objects. See www.brpreiss.com .

If one wishes to provide a way for objects to release their resources when they're about to be removed (what Java calls finalizing), an additional pass must be made; you can't just call the finalizer as you sweep, since if one dead object has a reference to another dead object, and uses it in its finalizer, the process would fail if the referenced object had already been deleted. The finalizing pass is after marking but before sweeping, and simply calls the finalizers of all dead objects. -- Mikael Brockman

I came to this page to look for what Mark And Sweep meant. When I read the descriptions however, it is reminiscent of Clock algorithm for page file replacement used in many operating systems. -- Anand Hariharan

I think it's worth noting that all Mark And Sweep algorithms have an execution time proportional to the size of the entire object space rather than the number of live objects. This (in addition to the need to pause system operation while executing) makes them less attractive than the various semi-space scavenging algorithms that copy live objects from "from space" to "to space". No such pauses are needed, and execution time is proportional to the number of live objects rather than available space. A price paid for this advantage is a factor of two increase in the available memory required.

'Paying for the garbage' is only true of naive implementations. Explained here is a method of mark&sweep that gets around "marking everything unused" by making that part of its allocation step: people.csail.mit.edu

A copying collector doesn't need to be a semi-space collector. For a copying collector, you only need enough free space to allocate the largest object in the heap - i.e. if you keep all objects under a page, then you only need one free page because (by nature) you'll free at least a page for every page you fill.

It's probably also worth noting that this feature can be an advantage in certain situations. For example, garbage collectors in embedded systems frequently use Mark And Sweep because memory sizes are small enough that capacity is the limiting factor. Often, the later stages in a generational collector also use Mark And Sweep because a high proportion of old objects remain live at each collection (this is the whole point of Generational Garbage Collection). -- Jonathan Tang


In the naïve mark-and-sweep method, each object in memory has a flag (typically a single bit) reserved for garbage collection use only. This flag is always cleared, except during the collection cycle. The first stage of collection sweeps the entire 'root set', marking each accessible object as being 'in-use'. All objects transitively accessible from the root set are marked, as well. Finally, each object in memory is again examined; those with the in-use flag still cleared are Not Reachable by any program or data, and their memory is freed. (For objects which are marked in-use, the in-use flag is cleared again, preparing for the next cycle.)

Wiki Pedia: Garbage Collection

See original on c2.com