When you want to know if it is cheaper to rewrite a piece of software from scratch

A promising new metric to track maintainability

Alexander von Zitzewitz

A good metric to measure software maintainability is the holy grail of software metrics. In this article, Alexander von Zitzewitz, software architect and CEO at hello2morrow, explores a promising new metric to track maintainability.

A good metric to measure software maintainability is the holy grail of software metrics. What we would like to achieve with such a metric is that its values more or less conform with the developers’ own judgment of the maintainability of their software system. If that succeeded, we could track the metric in our nightly builds and use it as the canary in the coal mine: if values deteriorate, it is time for refactoring. We could also use it to compare the health of all the software systems within an organization. And it could help to decide whether it is cheaper to rewrite a piece of software from scratch instead of trying to refactor it.

A good starting point for achieving our goals is to look at metrics for coupling and cyclic dependencies. High coupling will definitely affect maintainability in a negative way. The same is true for big cyclic groups of packages/namespaces or classes. Growing cyclic coupling is a good indicator of structural erosion.

A good design, on the other hand, uses layering (horizontal) and separation of functional components (vertical). The cutting of a software system by functional aspects is what I call “verticalization”. The next diagram shows what I mean by that:

A good vertical design

The different functional components are sitting within their own silos and dependencies between those are not cyclical, i.e. there is a clear hierarchy between the silos. You could also describe that as vertical layering; or as micro-services within a monolith.

Unfortunately, many software systems fail at verticalization. The main reason is that there is nobody to force you to organize your code into silos. Since it is hard to do this in the right way, the boundaries between the silos blur, and functionality that should reside in a single silo is spread out over several of them. That, in turn, promotes the creation of cyclic dependencies between the silos. And from there maintainability goes down the drain at an ever-increasing rate.

Defining a new metric

Now how could we measure verticalization? First of all, we must create a layered dependency graph of the elements comprising the system. We call those elements “components”, and the definition of a component depends on the language. For most languages, a component is a single source file. In special cases like C or C++, a component is a combination of related source and header files. But we can only create a properly layered dependency graph if we do not have cyclic dependencies between components. So as a first step we will combine all cyclic groups into single nodes.
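
As a rough sketch of that first step, the following Python snippet collapses cycle groups into single logical nodes using Tarjan's algorithm for strongly connected components. The function names and the tiny example graph are hypothetical and only serve as illustration; this is not how Sonargraph implements it.

```python
from collections import defaultdict


def strongly_connected_components(graph):
    """Tarjan's algorithm. graph maps a component to the components it depends on.

    Recursive for brevity, so very deep graphs would need an iterative variant.
    """
    index, low, on_stack, stack, result = {}, {}, set(), [], []
    counter = [0]

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the root of a cycle group (or a single acyclic node)
            group = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                group.add(w)
                if w == v:
                    break
            result.append(frozenset(group))

    nodes = set(graph) | {w for deps in graph.values() for w in deps}
    for v in sorted(nodes):
        if v not in index:
            visit(v)
    return result


def condense(graph):
    """Return the logical nodes and the (acyclic) dependencies between them."""
    groups = strongly_connected_components(graph)
    owner = {v: g for g in groups for v in g}
    edges = defaultdict(set)
    for v, deps in graph.items():
        for w in deps:
            if owner[v] != owner[w]:
                edges[owner[v]].add(owner[w])
    return groups, edges


# Hypothetical example: F, G and H depend on each other in a cycle.
example = {"X": ["F"], "F": ["G"], "G": ["H"], "H": ["F", "A"], "A": []}
logical_nodes, logical_edges = condense(example)
```

In this example the five components collapse into three logical nodes: X, the cycle group {F, G, H} and A. The resulting graph is acyclic and can therefore be layered.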

A layered dependency graph with a cycle group treated as a single logical node

In the example above, nodes F, G and H form a cycle group, so we combine them into a single logical node called FGH. After doing that we get three layers (levels). The bottom layer only has incoming dependencies, the top layer only has outgoing dependencies. From a maintainability point of view we want as many components as possible that have no incoming dependencies, because they can be changed without affecting other parts of the system. For the remaining components we want them to influence as few components as possible in the layers above them.

Node A in our example influences only E, I and J (directly and indirectly). B, on the other hand, influences everything in level 2 and level 3 except E and I. The cycle group FGH obviously has a negative impact on that. So we could say that A should contribute more to maintainability than B, because it has a lower probability of breaking something in the layers above. For each logical node i we could compute a contributing value c_i to a new metric estimating maintainability:

c_i = \frac{size(i) * (1 - \frac{inf(i)}{n})}{n}

where n is the total number of components, size(i) is the number of components in the logical node (only greater than one for logical nodes created out of cycle groups) and inf(i) is the number of components influenced by logical node i.

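Now let's compute c_i for node A, which has size(A) = 1 and influences three components (E, I and J):

c_A = \frac{1 * (1 - \frac{3}{n})}{n}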

If you add up c_i for all logical nodes you get the first version of our new metric “Maintainability Level” ML:

ML_1 = 100 * \sum_{i=1}^{k} c_i

where k is the total number of logical nodes, which is smaller than n if there are cyclic component dependencies. We multiply by 100 to get a percentage value between 0 and 100.
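
To make the computation more tangible, here is a minimal Python sketch. It assumes that the contribution of a logical node is size(i) * (1 - inf(i) / n) / n, which is one reading consistent with the definitions of n, size(i) and inf(i) above, and that inf(i) counts the components of all logical nodes that depend on node i directly or indirectly. The function name and the small example system are hypothetical.

```python
def maintainability_level_v1(sizes, depends_on):
    """First version of ML for an already condensed (acyclic) graph.

    sizes:      logical node -> number of components it contains
    depends_on: logical node -> set of logical nodes it uses
    Returns (ML value, contribution per logical node).
    """
    n = sum(sizes.values())

    # Invert the edges so we can walk from a node to everything that uses it.
    used_by = {node: set() for node in sizes}
    for node, targets in depends_on.items():
        for target in targets:
            used_by[target].add(node)

    def influenced(node):
        """Number of components that depend on node, directly or indirectly."""
        seen, todo = set(), [node]
        while todo:
            for user in used_by[todo.pop()]:
                if user not in seen:
                    seen.add(user)
                    todo.append(user)
        return sum(sizes[user] for user in seen)

    contributions = {
        node: sizes[node] * (1 - influenced(node) / n) / n for node in sizes
    }
    return 100 * sum(contributions.values()), contributions


# Hypothetical system with 5 components: X uses the cycle group FGH, FGH uses A.
sizes = {"X": 1, "FGH": 3, "A": 1}
depends_on = {"X": {"FGH"}, "FGH": {"A"}}
ml_1, contributions = maintainability_level_v1(sizes, depends_on)
```

In this example X has no incoming dependencies and contributes its maximum of 1/5 = 0.2, FGH influences one component and contributes 0.48, and A influences the other four components and contributes only 0.04. The sum gives a value of 72.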

Since every system will have dependencies, it is impossible to reach 100% unless all the components in your system have no incoming dependencies. But all the nodes on the topmost level will contribute their maximum contribution value to the metric. And the contributions of nodes on lower levels will shrink the more nodes they influence on higher levels. Cycle groups increase the number of nodes influenced on higher levels for all members and therefore have a tendency to influence the metric negatively.

Now we know that cyclic dependencies have a negative influence on maintainability, especially if the cycle group contains a larger number of nodes. In our first version of ML we would not see that negative influence if the node created by the cycle group is on the topmost layer. Therefore we add a penalty for cycle groups with more than 5 nodes:

penalty(i) = \begin{cases} \frac{5}{size(i)} & \text{if } size(i) > 5 \\ 1 & \text{otherwise} \end{cases}

In our case a penalty value of 1 means no penalty. Values less than 1 lower the contributing value of a logical node. For example, if you have a cycle group with 100 nodes it will only contribute 5% (\frac{5}{100}) of its original contribution value. The second version of ML now also considers the penalty:

ML_2 = 100 * \sum_{i=1}^{k} c_i * penalty(i)
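
The penalty can be sketched in a few lines of Python. The snippet assumes a penalty of 5 / size(i) for cycle groups with more than 5 nodes, as suggested by the 100-node example above, and the same assumed contribution formula size(i) * (1 - inf(i) / n) / n. All names are hypothetical.

```python
def penalty(size, threshold=5):
    """1.0 means no penalty; cycle groups above the threshold are scaled down."""
    return threshold / size if size > threshold else 1.0


def maintainability_level_v2(nodes, n):
    """Second version of ML.

    nodes: list of (size, influenced) tuples, one per logical node
    n:     total number of components
    """
    return 100 * sum(
        size * (1 - influenced / n) / n * penalty(size)
        for size, influenced in nodes
    )


# Hypothetical worst case: all 100 components form one cycle group.
# Without the penalty this single topmost node would score a perfect 100.
ml_2_one_big_cycle = maintainability_level_v2([(100, 0)], n=100)
```

For the single 100-node cycle group the penalty is 0.05, so the value drops from 100 to 5.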

This metric already works quite well. When we run it on well-designed systems we get values over 90. For systems with no recognizable architecture, like Apache Cassandra, we get a value in the twenties.

Apache Cassandra: 477 components in a gigantic cycle group

Fine tuning the metric

When we tested this metric we made two observations that required adjustments:

  • It did not work very well for small modules with fewer than 100 components. Here we often got relatively low ML values because a small number of components increases relative coupling naturally without really negatively affecting maintainability.
  • We had one client Java project that was considered by its developers to have bad maintainability, but the metric showed a value in the high nineties. On closer inspection we found out that the project did indeed have a good and almost cycle free component structure, but the package structure was a total mess. Almost all the packages in the most critical module were in a single cycle group. This usually happens when there is no clear strategy to assign classes to packages. That will confuse developers because it is hard to find classes if there is no clear package assignment strategy.

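The first issue could be solved by adding a sliding minimum value for ML if the scope to be analyzed has fewer than 100 components:

ML_3 = \begin{cases} (100 - n) + \frac{n}{100} * ML_2 & \text{if } n < 100 \\ ML_2 & \text{otherwise} \end{cases}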

where n is again the number of components. The variant can be justified by arguing that small systems are easier to maintain in the first place. So with the sliding minimum value a system with 40 components can never have an ML value below 60.
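
A minimal sketch of such a sliding minimum, assuming the form (100 - n) + n / 100 * ML_2 for scopes with fewer than 100 components. That form is an assumption chosen so that a system with 40 components can never score below 60; the function name is hypothetical.

```python
def maintainability_level_v3(ml_2, n):
    """Apply a sliding minimum to ML_2 for scopes with fewer than 100 components."""
    if n < 100:
        # The floor is 100 - n; ML_2 only scales the remaining n points.
        return (100 - n) + n / 100 * ml_2
    return ml_2


worst_small = maintainability_level_v3(ml_2=0.0, n=40)     # floor for 40 components
typical_small = maintainability_level_v3(ml_2=50.0, n=40)
large = maintainability_level_v3(ml_2=50.0, n=200)         # unchanged
```

With 40 components, an ML_2 of 0 is lifted to 60 and an ML_2 of 50 to 80, while scopes with 100 or more components keep their value.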

The second issue is harder to solve. Here we decided to compute a second metric that would measure package cyclicity. The cyclicity of a package cycle group is the square of the number of packages in the group. A cycle group of 5 elements has a cyclicity of 25. The cyclicity of a whole system is just the sum of the cyclicity of all cycle groups in the system. The relative cyclicity of a system is defined as follows:

relativeCyclicity = 100 * \frac{\sqrt{sumOfCyclicity}}{n}

where n is again the total number of packages. As an example, assume a system with 100 packages. If all these packages are in a single cycle group, the relative cyclicity can be computed as 100 * \frac{\sqrt{100^2}}{100}, which equals 100, meaning 100% relative cyclicity. If on the other hand we have 50 cycle groups of 2 packages each, we get 100 * \frac{\sqrt{50*2^2}}{100}, which is approximately 14%. That is what we want, because bigger cycle groups are a lot worse than smaller ones. So we compute ML_{alt} like this:

ML_{alt} = 100 * (1 - \frac{\sqrt{sumOfPackageCyclicity}}{n_p})

where n_p is the total number of packages. For smaller systems with fewer than 20 packages we again add a sliding minimum value analogous to ML_3.
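
The two examples above can be reproduced with a short Python sketch. It assumes that ML_{alt} is simply 100 minus the relative package cyclicity and leaves out the sliding minimum for systems with fewer than 20 packages, since its exact form is not spelled out here. The function names are hypothetical.

```python
import math


def relative_cyclicity(cycle_group_sizes, n):
    """Relative cyclicity in percent.

    cycle_group_sizes: number of packages in each package cycle group
    n:                 total number of packages in the system
    """
    cyclicity = sum(size ** 2 for size in cycle_group_sizes)
    return 100 * math.sqrt(cyclicity) / n


def maintainability_level_alt(cycle_group_sizes, n_p):
    """Assumed form of ML_alt: 100 minus the relative package cyclicity."""
    return 100 - relative_cyclicity(cycle_group_sizes, n_p)


one_big_group = relative_cyclicity([100], n=100)       # all packages in one cycle
many_small_groups = relative_cyclicity([2] * 50, n=100)
ml_alt_small_groups = maintainability_level_alt([2] * 50, n_p=100)
```

The single big cycle group yields a relative cyclicity of 100, while 50 groups of 2 packages yield about 14.1, which corresponds to an ML_{alt} of roughly 85.9 under the stated assumption.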

Now the final formula for ML is defined as the minimum between the two alternative computations:

ML = min(ML_3, ML_{alt})

Here we simply argue that for good maintainability both the component structure and the package/namespace structure must be well designed. If one or both suffer from bad design or structural erosion, maintainability will decrease too.

Multi module systems

For systems with more than one module we compute ML for each module. Then we compute the weighted average (by number of components in the module) for all the larger modules for the system. To decide which modules are weighted we sort the modules by decreasing size and add each module to the weighted average until either 75% of all components have been added to the weighted average or the module contains at least 100 components.
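
The selection rule can be read in more than one way. The following Python sketch implements one possible interpretation, which is an assumption on our part here: modules are visited by decreasing size, and a module is included as long as fewer than 75% of all components have been covered so far, or if it has at least 100 components itself. The function name and the example numbers are hypothetical.

```python
def system_maintainability_level(modules):
    """Weighted average of per-module ML values over the larger modules.

    modules: list of (component_count, ml) tuples, one per module
    """
    if not modules:
        raise ValueError("at least one module is required")

    total = sum(count for count, _ in modules)
    covered = 0
    weighted_sum = 0.0
    for count, ml in sorted(modules, key=lambda m: m[0], reverse=True):
        # Assumed stop condition: enough components covered and module is small.
        if covered >= 0.75 * total and count < 100:
            break
        weighted_sum += count * ml
        covered += count
    return weighted_sum / covered


# Hypothetical system: two large modules and two small ones.
modules = [(600, 90.0), (300, 60.0), (50, 20.0), (50, 10.0)]
system_ml = system_maintainability_level(modules)
```

In this example the two large modules already cover 90% of all components, so the two small modules are ignored and the result is (600 * 90 + 300 * 60) / 900 = 80.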

The reasoning for this is that the action usually happens in the larger, more complex modules. Small modules are not hard to maintain and have very little influence on the overall maintainability of a system.

Try it yourself

Now you might wonder what this metric would say about the software you are working on. You can use our free tool Sonargraph-Explorer to compute the metric for your system written in Java, C# or Python. ML_{alt} is currently only considered for Java and C#. For systems written in C or C++ you would need our commercial tool Sonargraph-Architect.

ML in Sonargraph’s metric view

Of course we are very interested in hearing your feedback. Does the metric align with your gut feeling about maintainability or not? Do you have suggestions or ideas to further improve the metric? Please leave your comments below in the comment section.


The work on ML was inspired by a paper about another promising metric called DL (Decoupling Level). DL is based on the research work of Ran Mo, Yuanfang Cai, Rick Kazman, Lu Xiao and Qiong Feng from Drexel University and the University of Hawaii. Unfortunately, a part of the algorithm computing DL is protected by a patent, so we are not able to provide this metric in Sonargraph at this point. It would be interesting to compare those two metrics on a range of different projects.

This post was originally published on hello2morrow.

Alexander von Zitzewitz
Alexander von Zitzewitz is a serial entrepreneur in the software business and one of the founders of hello2morrow, an ISV specializing in static analysis tools that can enforce architecture and quality rules during development and maintenance of software systems. He has worked in the industry since the early 1980s and focuses on the role of software architecture and technical quality in successful project outcomes. He moved from Germany to Massachusetts in 2008 to develop hello2morrow’s business in North America.
