Conditions Are Power-law Distributed

Eric Dobbs brings to our attention an article by Kent Beck.

Sometimes the Unix pipe operator, streams, and filters are the right tool for the job. This page may grow into a collection of interesting combinations.

> Kent Beck demonstrates that the distribution of if statements in a code base follows a power law.

* Extract the if statements from our codebase.
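The original command is not preserved on this page, so here is a minimal sketch of one way this extraction could look. It assumes a C-style syntax in which each `if (...) {` sits on a single line. The function name `extract_conditions` is hypothetical, and multi-line conditions or brace-less ifs would be missed.

```shell
# Hypothetical helper: print the condition of every one-line `if (...) {`
# (including `} else if (...) {`) found under a directory.
# Assumption: C-style syntax with the opening brace on the same line.
extract_conditions() {
  # -r recurse, -h omit file names, -I skip binary files
  grep -rhI 'if' "${1:-.}" |
    sed -nE 's/^[[:space:]]*(\}[[:space:]]*else[[:space:]]+)?if[[:space:]]*\((.*)\)[[:space:]]*\{[[:space:]]*$/\2/p'
}

# Usage: extract_conditions path/to/src > conditions.txt
```

A real codebase would probably need a pattern tuned to its language and formatting conventions; the point is only that grep and sed can reduce source files to one condition per line.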

Now we have just the conditions. How many of each are there? uniq only collapses adjacent duplicate lines, so first sort them, then pass them through uniq -c to count them.
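As a small sketch of the counting step, assuming one condition per line on standard input (the function name `count_conditions` and the sample conditions are made up for illustration):

```shell
# Hypothetical helper: prefix each distinct input line with its number of occurrences.
count_conditions() {
  # LC_ALL=C makes sort compare bytes, so identical lines always end up adjacent.
  LC_ALL=C sort | uniq -c
}

# Sample input: three conditions, one of them repeated.
printf '%s\n' 'x == null' 'a && b' 'x == null' | count_conditions
```

On the sample input this reports a count of 1 for `a && b` and 2 for `x == null`; the amount of padding before each count differs between uniq implementations.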

Sort these numerically in reverse order and we can see the heavy hitters.
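A sketch of the ranking step, assuming input in the "count condition" shape that uniq -c produces (the function name `rank_conditions` and the sample lines are illustrative only):

```shell
# Hypothetical helper: order "count condition" lines from most to least frequent.
rank_conditions() {
  # -n compares the leading number numerically (leading blanks are skipped),
  # -r reverses the order so the largest counts come first.
  sort -rn
}

# Sample input shaped like `uniq -c` output.
printf '%s\n' '      1 a && b' '     12 x == null' '      3 err != nil' | rank_conditions
```

Piping the result through `head` would show only the top few conditions.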

What we want eventually is a histogram showing how many single-use conditions there are, how many conditions are used twice, etc. Use “cut” to extract the counts, then the same “sort | uniq -c” trick to get a histogram.
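One possible shape for this step is sketched below. Because uniq -c pads its counts with leading spaces, and the width of that padding varies between implementations, this sketch strips the padding with sed before handing the line to cut. That sed step is an addition of this sketch, not something the text above prescribes, and the function name `histogram` is hypothetical.

```shell
# Hypothetical helper: turn "count condition" lines into a histogram of the counts.
# Each output line reads: <number of conditions> <times each of them is used>.
histogram() {
  sed -E 's/^[[:space:]]+//' |   # drop the padding uniq -c puts before each count
    cut -d ' ' -f 1 |            # keep only the count, discard the condition text
    sort -n |                    # bring equal counts together, smallest first
    uniq -c                      # count how many conditions share each count
}

# Sample input: three conditions used once, one used twice, one used five times.
printf '%s\n' '      1 a' '      1 b' '      2 c' '      1 d' '      5 e' | histogram
```

For the sample input the histogram has three rows: three conditions used once, one used twice, and one used five times.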

Sure enough, there are lots of conditions (28K) used once, many fewer used twice, many fewer used three times, and on down. Down at the bottom we have one condition used 2332 times. Graphing this data we get an inkling that we’re not in Normalistan any more.

Switching both axes to a logarithmic scale shows something like a power-law distribution.
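The reason log axes are revealing: if the number of conditions used k times is proportional to k raised to some negative power, then the logarithm of that number is a linear function of the logarithm of k, so the points fall roughly on a straight line. A minimal sketch of the transformation, assuming histogram lines of the form "number-of-conditions times-used" (the function name `loglog` and the sample numbers are invented for illustration, not taken from the data above):

```shell
# Hypothetical helper: convert "<number of conditions> <times used>" lines
# into "log10(times used) log10(number of conditions)" points for plotting.
loglog() {
  awk '$1 > 0 && $2 > 0 {
    # awk only provides the natural log, so divide by log(10) to get base 10.
    printf "%.3f %.3f\n", log($2) / log(10), log($1) / log(10)
  }'
}

# Made-up sample: 100 conditions used once, 10 used ten times, 1 used a hundred times.
printf '%s\n' '    100 1' '     10 10' '      1 100' | loglog
```

The made-up sample is an exact power law, so its three points lie on a straight line with slope -1; real data would only approximate a line.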

There is a trend in how often a condition “ought” to appear. And there you have it — Preferential Attachment at work. The more often a condition appears in a codebase, the more likely that condition is to be used the next time a conditional appears.