Size matters

Report: Open source code higher quality – until it supersizes

Elliot Bentley

Which code is better-written: open source or proprietary? The answer depends on the size of the codebase, it seems.

An analysis has revealed that while the difference is negligible between smaller projects, OSS tends to produce higher-quality mid-sized codebases – but when it comes to projects with over a million lines of code, proprietary software wins out.

The report comes from Coverity, a company that provides static analysis of code quality for both open and closed source projects. This year marks the fifth anniversary of Coverity’s annual report, which took into account over 450,000,000 lines of code.

Coverity measures software quality by “defect density” – the number of detectable high- or medium-impact defects per 1,000 lines of code. The 254 open-source projects in the study, which include Linux, PHP and Apache, had an average defect density of 0.69 – almost exactly the same as the 300 proprietary codebases sampled, which had an average of 0.68.
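
To make the metric concrete, here is a minimal Python sketch of the calculation as described above: defects divided by thousands of lines of code. The function name and the sample project (138 defects in 200,000 lines) are made up for illustration and are not taken from Coverity's report or tooling.

```python
def defect_density(defects: int, lines_of_code: int) -> float:
    """Return defects per 1,000 lines of code."""
    if lines_of_code <= 0:
        raise ValueError("lines_of_code must be positive")
    return defects / (lines_of_code / 1000)


# Hypothetical project, not from the report: 138 defects in 200,000 lines.
density = defect_density(138, 200_000)
print(round(density, 2))  # 0.69
```

On this measure, a project of that size would need fewer than 200 detectable defects to come in under a density of 1.0.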

Both figures are well below the “accepted industry standard defect density” of 1.0. However, this may be an unrealistic reflection of the software industry at large: those uninterested in producing high-quality code are unlikely to bother having it analysed. In addition, the lack of formal deadlines and highly transparent nature of open-source development probably encourage higher-quality code.

More intriguing is the correlation between large codebases and quality. In projects with between 500,000 and one million lines of code, defect density was 0.98 in proprietary software but only 0.44 in open-source software. Above one million lines, however, the positions reversed, to 0.66 and 0.75 respectively.

Codebases of this size are apparently too large for open-source projects to manage; Coverity themselves put it down to “differing dynamics within open source and proprietary development teams, as well as the point at which these teams implement formalized development testing processes”.

A notable exception to this trend is the Linux codebase, which has consistently scored above average in terms of code quality. In fact, this year’s scan is the best on record – the 7.6 million lines of code analysed were found to have a defect density of 0.59.

You can read the full report for free over on Coverity’s website. Photo by Michael Himbeault.