Clojure and Scala are less bug-prone, Python induces more defects, study shows
Have you ever wondered if the language of your choice is bug-prone? A study of the effect of programming languages on software quality (based on GitHub data) has concluded that “functional languages are somewhat better than procedural languages.” Let’s see the results.
Researchers analyzed a large data set from GitHub (728 projects, 63 million SLOC, 29,000 authors, 1.5 million commits, in 17 languages) to answer the following question: What is the effect of programming languages on software quality?
They discovered that:
- Language design does have a significant, but modest effect on software quality. Most notably, it does appear that disallowing type confusion is modestly better than allowing it, and among functional languages, static typing is also somewhat better than dynamic typing.
- Functional languages are somewhat better than procedural languages. It is worth noting that these modest effects arising from language design are overwhelmingly dominated by the process factors such as project size, team size, and commit size.
Warning: “Even these modest effects might quite possibly be due to other, intangible process factors, for example, the preference of certain personality types for functional, static languages that disallow type confusion,” according to the study, published in the October issue of Communications of the ACM.
Q1. Are some languages more defect-prone than others?
Result 1: Some languages have a greater association with defects than other languages, although the effect is small. C, C++, Objective-C, PHP, and Python are associated with a greater number of defect fixes, while Clojure, Haskell, Ruby, and Scala are less likely than average to result in defect-fixing commits.
Q2. Which language properties relate to defects?
Result 1: Researchers aggregated languages by language class and analyzed the relationship to defects. The functional languages as a group show a strong difference from the average. Within that group, the statically typed languages have a substantially smaller coefficient, yet both functional language classes have the same standard error. This is strong evidence that functional static languages are less error-prone than functional dynamic languages. However, the z-tests only test whether each coefficient is different from zero, not whether the two coefficients differ from each other.
Result 2: There is a small but significant relationship between language class and defects. Functional languages are associated with fewer defects than either procedural or scripting languages.
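The caveat about z-tests in the first Q2 result can be made concrete with a small sketch. The coefficients and standard errors below are made-up numbers chosen only for illustration, not values from the study, and the difference test assumes the two estimates are independent, which is a simplification.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def z_vs_zero(coef, se):
    """z statistic for the hypothesis 'this coefficient is zero'."""
    return coef / se

def z_difference(coef_a, se_a, coef_b, se_b):
    """z statistic for 'the two coefficients are equal'.

    Assumes independent estimates (covariance ignored), a simplification.
    """
    return (coef_a - coef_b) / math.sqrt(se_a ** 2 + se_b ** 2)

# Hypothetical values, NOT taken from the study.
static_coef, dynamic_coef, shared_se = -0.25, -0.13, 0.05

# Each coefficient is tested against zero on its own...
p_static = two_sided_p(z_vs_zero(static_coef, shared_se))
p_dynamic = two_sided_p(z_vs_zero(dynamic_coef, shared_se))

# ...which says nothing yet about whether they differ from each other.
p_diff = two_sided_p(z_difference(static_coef, shared_se,
                                  dynamic_coef, shared_se))

print(p_static, p_dynamic, p_diff)
```

With these invented inputs, both coefficients are clearly different from zero, while the direct comparison between them is much weaker. That is why a separate test is needed before claiming one class beats the other.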
Q3. Does language defect proneness depend on domain?
Result 3: There is no general relationship between application domain and language defect proneness.
Q4. What is the relation between language and bug category?
Result 4: Defect types are strongly associated with languages; some defect types, such as memory errors and concurrency errors, also depend on language primitives. Language matters more for specific categories than it does for defects overall.
The language and project data was extracted from the GitHub Archive, a database that records all public GitHub activities. Researchers chose the top 19 programming languages on GitHub, excluded CSS, Shell script, and Vim script because they are not considered general-purpose languages, and added TypeScript. For each language, they retrieved the top 50 projects primarily written in that language.
They aggregated projects by primary language and selected the languages with the most projects for further analysis. For each language, they filtered the project repositories written primarily in that language by popularity, measured by the number of stars, and dropped the projects with fewer than 28 commits.
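The selection step described above could be sketched roughly as follows. The `Project` record and the `select_projects` function are hypothetical names, and the order of operations (take the most-starred projects first, then drop the small ones) is an assumption based on the description rather than the researchers' actual tooling.

```python
from dataclasses import dataclass

@dataclass
class Project:
    # Hypothetical record; field names are assumptions for illustration.
    name: str
    language: str  # primary language of the repository
    stars: int
    commits: int

def select_projects(projects, top_n=50, min_commits=28):
    """Per language: keep the top_n most-starred projects, then drop
    those with fewer than min_commits commits."""
    by_language = {}
    for project in projects:
        by_language.setdefault(project.language, []).append(project)

    selected = {}
    for language, group in by_language.items():
        most_starred = sorted(group, key=lambda p: p.stars, reverse=True)[:top_n]
        selected[language] = [p for p in most_starred if p.commits >= min_commits]
    return selected
```

Applied to 17 languages with 50 candidates each, a filter like this would start from 850 repositories, which is consistent with the 728 projects that remained in the study.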
For each project (728 in total!), researchers downloaded the non-merged commits, commit logs, author date, and author name using git. They computed code churn and the number of files modified per commit from the number of added and deleted lines per file, then retrieved the languages associated with each commit from the extensions of the modified files. For each commit, they calculated its commit age as the time elapsed between the first commit of the corresponding project and the commit's own date. They also calculated other project-related statistics, including the maximum commit age of a project and the total number of developers, which were used as control variables in the regression model. Researchers identified bug fix commits made to individual projects by searching the commit log for error-related keywords: “error,” “bug,” “fix,” “issue,” “mistake,” “incorrect,” “fault,” “defect,” and “flaw.”
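As a rough illustration of these per-commit measurements, here is a minimal sketch. The function names, the `(path, added, deleted)` tuple layout, and the small extension table are assumptions made for this example. Matching keywords at the start of a word is also an assumption, since the article does not say how the search was implemented.

```python
import os
import re
from datetime import datetime

# Keywords listed in the article. Matching at a word start is an assumption:
# "fixed" and "bugs" count, while "prefix" and "debug" do not.
BUG_KEYWORDS = ("error", "bug", "fix", "issue", "mistake",
                "incorrect", "fault", "defect", "flaw")
BUG_PATTERN = re.compile(r"\b(?:" + "|".join(BUG_KEYWORDS) + r")", re.IGNORECASE)

# Hypothetical and deliberately incomplete extension table.
EXTENSION_TO_LANGUAGE = {".py": "Python", ".scala": "Scala",
                         ".clj": "Clojure", ".hs": "Haskell", ".c": "C"}

def is_bug_fix(message):
    """True if the commit message contains one of the bug keywords."""
    return BUG_PATTERN.search(message) is not None

def commit_age_days(commit_date, first_commit_date):
    """Whole days elapsed between the project's first commit and this one."""
    return (commit_date - first_commit_date).days

def churn_and_file_count(file_changes):
    """file_changes: list of (path, added_lines, deleted_lines) tuples."""
    churn = sum(added + deleted for _, added, deleted in file_changes)
    return churn, len(file_changes)

def languages_touched(file_changes):
    """Languages inferred from the extensions of the modified files."""
    found = set()
    for path, _, _ in file_changes:
        extension = os.path.splitext(path)[1].lower()
        if extension in EXTENSION_TO_LANGUAGE:
            found.add(EXTENSION_TO_LANGUAGE[extension])
    return found
```

A keyword search like this is cheap but imprecise: a message such as "Fix typo in README" would be counted as a bug fix even though no defect was involved.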
Researchers defined language classes based on several properties of the language thought to influence language quality.
The projects were classified into different domains based on their features and function, using a mix of automated and manual techniques. Their project descriptions and README files were analyzed using Latent Dirichlet Allocation (LDA).
Then they categorized the bugs based on their cause and impact in two phases: keyword search and supervised classification. Researchers used negative binomial regression to model the number of defective commits against other factors related to software projects.
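For readers unfamiliar with the technique, the textbook form of a negative binomial regression looks roughly like this. The symbols are generic notation chosen for this sketch, not the study's own. Here y_i stands for the number of bug-fix commits in project i, x_{ij} for predictors such as language and the control variables, and θ for the dispersion parameter.

```latex
\begin{align*}
y_i &\sim \mathrm{NegBin}(\mu_i, \theta), \\
\log \mu_i &= \beta_0 + \sum_{j} \beta_j x_{ij}, \\
\operatorname{Var}(y_i) &= \mu_i + \frac{\mu_i^2}{\theta}.
\end{align*}
```

The extra variance term is what distinguishes this model from Poisson regression: it allows count data to vary more than their mean, which is common for commit counts.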