Metrics: Analysis


It's fine to collect metrics, but what do they tell us?  It's tempting to apply them in many different contexts, but when are metrics telling us something and when are we reading too much into the numbers?

Defect Density

Defect Density is computed by: ( number of defects ) / ( lines of code / 1000 ).

This is the number of defects found, normalized to a unit amount of code.  1000 lines of code, or "kLOC", is often used as a standard base measure.  The higher the defect density, the more defects you are uncovering per unit of code reviewed.

It is impossible to give a universal "expected" value for defect density.  Mature, stable code might have defect densities as low as 5 defects/kLOC; new code written by junior developers might have 100-200.
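As a concrete illustration, here is a minimal sketch in Python of the computation; the counts in the example are hypothetical.

    # Minimal sketch: defect density in defects per 1000 lines of code (kLOC).
    # The counts in the example are hypothetical.
    def defect_density(defects_found, lines_of_code):
        """Return defects per kLOC for a single file or review."""
        return defects_found / (lines_of_code / 1000.0)

    # Example: 12 defects found while reviewing a 1500-line file.
    print(defect_density(12, 1500))   # 8.0 defects/kLOC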

What can defect density tell us?

Let's run an experiment.  We take a reviewer and have him inspect many different source files.  Source files vary in size from 50 lines to 2000 lines.  The reviewer inspects about 200 lines at a time so as not to get tired.  We'll record the number of defects found for each file.

What would we expect to find?  First, longer files ought to have more defects than shorter ones, simply because there's more code.  More code means more that could go wrong.  Second, some files should contain more defects than others because they are "risky" -- maybe because they are complex, or because their routines are difficult to unit-test, or because their routines are reused by most of the system and therefore must be very accurately specified and implemented.

If we measure defect density here, we handle the first effect by normalizing "number of defects" to the amount of code under review, so now we can sensibly compare small and large files.  So the remaining variation in defect density might have a lot to do with the file's "risk" in the system.  This is, in fact, the effect we find from experiments in the field.

So defect density can, among other things, determine which files are risky, which in turn might help you plan how much code review, design work, testing, and time to allocate when modifying one of those files.
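For example, a short Python sketch like the following -- with hypothetical file names and counts -- could rank files by defect density to highlight which ones deserve the extra attention:

    # Sketch: rank files by defect density to flag "risky" ones.
    # File names and counts are hypothetical.
    reviews = [
        {"file": "parser.c", "defects": 14, "loc": 1800},
        {"file": "utils.c",  "defects": 2,  "loc": 900},
        {"file": "sched.c",  "defects": 9,  "loc": 400},
    ]

    for r in reviews:
        r["density"] = r["defects"] / (r["loc"] / 1000.0)

    # Highest density first: these files may warrant more careful changes.
    for r in sorted(reviews, key=lambda r: r["density"], reverse=True):
        print(f'{r["file"]}: {r["density"]:.1f} defects/kLOC')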

Now let's run another experiment.  We'll take a chunk of code with 5 known algorithm bugs and give it to various reviewers.  We'll see how many of the defects each reviewer can find in 20 minutes.  The more of the known defects a reviewer finds, the more effective that reviewer is at reviewing that kind of code.

Of course in real life the nature of the code and the amount of code under review varies greatly, so you can't just look at the number of defects found in each review -- you naturally expect more defects from a 200-line change than from a 2-line change.  Defect density provides this normalization so you can compare reviewers across many reviews.
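To make that comparison concrete, a Python sketch like this one -- with hypothetical reviewers and counts -- aggregates each reviewer's defects and lines across all of their reviews before computing a density, so a single unusually small or large change doesn't dominate:

    # Sketch: compare reviewers by aggregate defect density across their reviews.
    # Reviewer names and per-review counts are hypothetical.
    from collections import defaultdict

    reviews = [                                  # (reviewer, defects, LOC)
        ("alice", 3, 250), ("alice", 7, 1200),
        ("bob",   1, 40),  ("bob",   5, 600),
    ]

    totals = defaultdict(lambda: [0, 0])         # reviewer -> [defects, LOC]
    for reviewer, defects, loc in reviews:
        totals[reviewer][0] += defects
        totals[reviewer][1] += loc

    for reviewer, (defects, loc) in totals.items():
        print(f"{reviewer}: {defects / (loc / 1000.0):.1f} defects/kLOC")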

If you're comparing defect density across many reviews done by a single person, you are measuring the relative "risk" of various files and modules.

Inspection Rate

Inspection Rate is computed by: ( lines of code reviewed ) / ( hours to do the review ).

This is a measure of how fast we review code.  A sensible rate for complex code might be 100 LOC/hour; generally, good reviews fall in the range of 200-500 LOC/hour.  Anything at 800 LOC/hour or higher indicates the reviewer hasn't really looked at the code -- we have found by experiment that this is too fast to actually read and critique source code.

Some managers insist that their developers try to increase their inspection rate; after all, that would mean "review efficiency" is improving.  This is a fallacy.  In fact, the slower the review, the better job the reviewers are doing.  Careful work means taking your time.

Instead, use inspection rate to help you predict the amount of time needed to complete a code change.  If you know this is roughly a "1000-line change" and your typical inspection rate is 200 LOC/hour, you can budget 5 hours for the code review step in your development schedule.
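As a sketch of the arithmetic -- the change size and rate below are just the hypothetical figures from the example above -- the budgeting step is a single division:

    # Sketch: budget code-review time from a typical inspection rate.
    # The change size and historical rate are hypothetical.
    def review_hours(lines_changed, loc_per_hour):
        """Estimated hours to review a change at a given inspection rate."""
        return lines_changed / loc_per_hour

    # A roughly 1000-line change at a typical 200 LOC/hour:
    print(review_hours(1000, 200))   # 5.0 hours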

If anything, a manager might insist on a slower inspection rate, especially on a stable branch, core module, or close to product release when everyone wants to be more careful about what changes in the code.

Defect Rate

Defect Rate is computed by: ( number of defects ) / ( hours to do the review ).

This is the speed at which reviewers uncover defects in code.  Typical values range between 5 and 20 defects/hour, possibly less for mature code, but not usually much greater.
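A minimal sketch of the computation, with hypothetical counts:

    # Sketch: defect rate in defects uncovered per hour of review.
    # The counts in the example are hypothetical.
    def defect_rate(defects_found, review_hours):
        """Return defects uncovered per hour of review."""
        return defects_found / review_hours

    # Example: 9 defects found over 1.5 hours of review.
    print(defect_rate(9, 1.5))   # 6.0 defects/hour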

The same caveats about encouraging faster or slower inspection rates also apply to defect rates.  See the Inspection Rate section above for details.

Metrics Applied

If we've learned one thing about metrics and code review, it is this: every group is different, but most groups are self-consistent.  This means that metrics and trends that apply to one group don't necessarily apply to another, but within a single group metrics are usually fairly consistent.

This between-group difference can be attributed to the myriad variables that enter into software development: the background, experience, and domain knowledge of the authors and reviewers; programming languages and libraries; development patterns at different stages of a product's life-cycle; project management techniques; local culture; the number of developers on the team; whether the team members are physically together or separate; etc.