Software Metrics

Much has been written about software metrics from a product point of view. This author (craig@hubley.com) considers the most useful metrics to be those which measure the degree to which the software development effort reflects the priorities of its end users and funders, not necessarily in that order.

Some such useful metrics include the 'churn' of a project, which is the percentage of lines of code written that are changed before the (next) release, and the 'dross' of a project, which is the number of lines written that are never executed or that are dropped before release. A more technical metric, measuring the cohesion of the source code, is the number of places in the code body that a programmer must reference and/or change while making what he psychologically considers to be 'one bug fix' or 'one change'.
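
One way to approximate churn from version-control history is sketched below, assuming the history lives in a git repository with release tags named rel-1 and rel-2 (both names are placeholders). It measures lines touched between the two tags against the whole code base at the earlier tag, a crude stand-in for 'lines written':

# Hypothetical sketch: approximate churn between two tagged releases in git.
changed=$(git log rel-1..rel-2 --numstat --pretty=tformat: |
  awk '{lines += $1 + $2} END {print lines}')
baseline=$(git ls-tree -r rel-1 --name-only | while read -r f; do
  git show "rel-1:$f" | wc -l
done | awk '{sum += $1} END {print sum}')
echo "churn: $(echo "scale=1; 100 * $changed / $baseline" | bc)%"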


Overview of Software Metrics

NumberOfChildClasses

NPathComplexity


I want both Agility and the spirit of ISO and CMMI. I want lightweight metrics that are practical for a small informal team. They should be concrete, concise, and unbiased.

My small team at IBM has worked with Laurie Williams at NC State over the last year to measure our XP project. We found we improved our defect injection rate by a factor of two. But more importantly, we came up with a measurement framework for anyone who wants to replicate our study.

We give you the Xp Evaluation Framework. Please let me know if you have feedback or suggestions. You don't have to use it, but if you're looking for ideas on what to measure it may be helpful to you. I plan to continue to use it on my projects.


Going a bit afield from the first paragraph above, ... a vastly more general (and therefore vague) question: What is the role of metrics in ideal software development?

The motivation for this question is simple: I want to be a better programmer. How do I get there? Well, some of the standard wiki answers (deduced from recent traffic here and some more general desiderata) might include:

Write programs. Becoming better involves experience.

Reflect upon your code. Strive to understand what worked and what didn't.

Read Tomes of Lore. These range from generic OO (Booch or Meyer) to language specific (Coplien or Beck) to project-type specific (Greenspun) to patterns (the PLOPs) to ...

Perform Unit Tests

But the question remains: What is it that these things do? If I hand you an application (both source and executable), it should take under a day to get an idea of the author's skill levels and whether the code is any good (which is not the same as whether the program is any good or whether the requirements were met or ...).

Which means that, in principle, there ought to be tools that can measure this as well (based on the general principle that Anything I can be certain of deducing in a single day can be deduced by a computer program).

This, I think, is the idea behind metrics. Or, at least, one of the ideas.

But, it seems to me, in all the talk here, formal metrics don't appear much. Can the benefits of Refactor Mercilessly be put in terms of metrics? Are there general heuristics for doing so? The Law Of Demeter seems to be somewhat measurable. But what about the other practices espoused on Wiki?

As I said, a vague question. But I seem to be getting more interested in metrics these days and so I thought I'd ask.


Refactoring improves coupling and cohesion. This can be measured. The tool Pattern Lint [1] helps pinpoint some kinds of design warts by weighting static and dynamic object communication and comparing them. Two papers accessible from the above link have some examples. -- Aamod Sane

Aamod - are you asserting that it is possible to measure coupling and (especially) cohesion mechanically? I wouldn't have thought so; have you a reference? --Ron Jeffries

S.R. Chidamber and C.F. Kemerer, "A Metrics Suite for Object Oriented Design," IEEE Trans. Software Eng., vol. 20, no. 6, pp. 476-493, 1994

and

L.C. Briand, J.W. Daly, and J.K. Wüst, "A Unified Framework for Coupling Measurement in Object-Oriented Systems," IEEE Trans. Software Eng., vol. 25, no. 1, 1999 (don't know the exact pages)
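
Not that a grep is a real coupling measure, but even a crude count shows that something mechanical is possible. The sketch below tallies project-local #include lines per C++ header as a rough coupling proxy; the src/ layout is an assumption, and a genuine class-level metric (per the papers above) would look at classes, not files.

# Hypothetical sketch: rank headers by how many project headers they include.
for h in src/*.h; do
  deps=$(grep -c '#include "' "$h")
  echo "$deps $h"
done | sort -rn | head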


Anything I can be certain of deducing in a single day can be deduced by a computer program

Well, fortunately for us humans, it ain't necessarily so. We can look at a class and say whether the instance names and method names are meaningful: computers can't. We can quickly see whether the class seems to have one function or more than one: computers can't.

There are some metrics that might be interesting: for example, you could look for duplicate code, or multiple messages sent to the same variable, or messages implemented but never sent. You could identify how many classes do not have Unit Tests.
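
A minimal sketch of the last of those counts, assuming a C++ tree with one class per header under src/ and tests named <Class>Test.cc under test/ (both conventions are assumptions, not anything stated above):

# Hypothetical sketch: count headers with no matching unit test file.
untested=0
for h in src/*.h; do
  name=$(basename "$h" .h)
  [ -f "test/${name}Test.cc" ] || untested=$((untested + 1))
done
echo "$untested classes without Unit Tests"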

The "only" metrics that matter to Extreme Programming are things like the number of Unit Tests and the Functional Test scores (Quality), your Resource levels, the number of cards done (Scope), whether you are working overtime (Time). Many of us like to know other things, like number of classes and methods over time, average number of methods per class, average number of lines per method. Once I actually used lines per method to find some bad code to look at ... but 99 times out of 100, the code metrics we've done have been literally useless.

I'm sure some folks will disagree strongly on this topic. Let's hear it, "some folks"!


Just a thought. If The Source Code Is The Design and programming is designing, then I want to know whether electrical and mechanical engineers use metrics to evaluate their ability to create detailed designs in a particular amount of time, or whether this is just one of software's petty masochisms. -- Michael Feathers


As far as software metrics go, I am interested in only a few measurements:

Time on task

LOC

Defects

(these are basically what PSP gathers)

What I need more than anything in the world is a way that I can collect this information reliably as I work without having to slow down excessively. I already face the fact that management does not feel that metrics have anything to offer us in our environment, so I have to prove their value quickly.
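
A minimal sketch of one low-friction way to capture time on task and defects, with LOC sampled separately; the helper names, log path, and format are all made up:

# Hypothetical sketch: append timestamped entries to a plain-text log as you work.
pspstart()  { echo "$(date +%s) START  $*" >> ~/psp.log; }
pspstop()   { echo "$(date +%s) STOP   $*" >> ~/psp.log; }
pspdefect() { echo "$(date +%s) DEFECT $*" >> ~/psp.log; }
# LOC can be sampled automatically, e.g. from a nightly cron job:
# find src -name '*.cc' -o -name '*.h' | xargs cat | wc -l >> ~/psp.loc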

Anybody want to jump in and give me some advice? I'm very willing to share my results with the community.


I was recently trying to get a handle on some new code (that is, old code that was new to me). People had pride in this old code, and I wanted a few measures that could be used to quantify some of my uneasiness. I didn't have any fancy tools, so I went for something very simple: LOC per class.

These are easily obtained. In C++, I got the number of classes (in a package) as

grep "class " <dir>/*.h | grep -v \; | wc -l

and LOC as

cat <dir>/*.cc | wc -l

When expressed as a ratio, I found that the code I had felt comfortable with had a ratio of about 200 LOC/class and the less good had about 800 LOC/class. The 'better' code also had more whitespace, so the ratios underestimate the difference.

These figures served their purpose, which was simply to seed a discussion. Is this a useful metric more generally?
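
As an extension of the same idea, here is a sketch that reports the ratio for every package directory at once; the src/*/ layout and the *.h / *.cc suffixes are assumptions carried over from the commands above:

# Hypothetical sketch: LOC per class for each package directory.
for dir in src/*/; do
  classes=$(grep "class " "$dir"*.h 2>/dev/null | grep -v \; | wc -l)
  loc=$(cat "$dir"*.cc 2>/dev/null | wc -l)
  [ "$classes" -gt 0 ] && echo "$dir $((loc / classes)) LOC/class"
done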


In automotive software we haven't found metrics useful, beyond our continual need to be aware of code and data size and execution speed. There are some pretty horrific measures of complexity, which we like to see small, but we don't enforce them after the event (which is the only time you can get the metrics), because we encourage practices which lead to non-complex code.

When I was at Rover Group, I invented a very simple metric for prioritising tests. Our overall process was: design it, code it, check it on the bench, check it in a vehicle, do formal white box tests (using ICE), release it. The theory was that some stuff was more likely to be buggy than other stuff, and I decided that more complexity probably means more chances of bugs. I 'measured' complexity by multiplying object size in bytes by the number of branches. We then tested the big numbers first, and ignored the ones below a threshold. The downside for you guys is that we were programming in assembler, which made it trivial to count branches.
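
Purely to illustrate the arithmetic, the score might be computed for a compiled module like this; the object file name, the section chosen, and the x86-style branch mnemonics are all assumptions, not what Rover used:

# Hypothetical sketch: priority score = code size in bytes * number of branches.
bytes=$(size -A module.o | awk '$1 == ".text" {print $2}')
branches=$(objdump -d module.o | grep -cE '\b(jmp|je|jne|jg|jge|jl|jle)\b')
echo "priority score: $((bytes * branches))"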

When I look at the ideas espoused by XP, I see most benefit for me in the concept of writing the tests first, then writing code to pass the tests. The main difference I see is that you seem to be using the tests to confirm that the code is correct. I always write tests in an attempt to break the code. I'm then only happy if I fail to break the code. I've read somewhere else around here that your end users are more interested in the service they get (presumably in response to bug reports) than in whether there are any bugs in the shipped product. I can't afford to ship product with any bugs at all -- people's lives may depend on it -- which means I have to take what you've done and massage it into a form suitable for safety-critical development.

Like it says on my home page, I'm open to ideas...

-- Paul Tiplady, 2000/07/14


Software is about risks, threats, and iterators that tell you where to find risks or threats next. If you didn't think there were any cogent risks or threats, you wouldn't have a computer in front of you, you'd just be doing the job without its "help". Software is a negative abstraction activity - it removes doubt, it doesn't add anything... if you make the user pay attention to something he could have ignored, you have just increased the danger. -- Craig Hubley, 2002/07/26


The key evaluation criterion for any project, software or otherwise, is "does it increase productivity?" When the current software is eventually replaced, did it cost less to develop and maintain the software than to not use the software and hire additional operational personnel instead? This is essentially what projects try to predict with a Cost-Benefit Analysis; unfortunately, "costs" and "benefits" are highly subjective and there is no agreed-upon measurement system for either, though some imperfect surrogates could be defined. The key issue is that "costs" come from the development side and "benefits" come from the operations side. Any software metric needs to compare changes on the development side with changes on the operations side. Any metric that only considers the development side (which is true of most so-called software metrics) is not useful. -- Wayne Mack


Often I find projects struggling because the definition of project success was too narrow (features delivered on time, usually). As nasty as metrics can be, they can help to build a broader and more balanced picture of progress, taking into account factors like maintainability, total cost of ownership, organisational learning and so on. The danger with metrics is that you often get what you measure, so their design and implementation - as well as how they're acted upon - need to be refined along with the code itself.

Jason Gorman

