We have a large Java/Groovy application with a JUnit test suite containing thousands of tests, which now takes several minutes to run. We wanted to reduce the run time, so we needed a report of test times sorted from longest to shortest, letting us focus on the worst offenders.

We used the JUnit RunListener facility to calculate the test run times and then sort and display a report at the end of the test suite. Specify the test suite class name in the testSuite property (field).

This class is implemented in Groovy, but the Java version would be very similar.

import org.junit.runner.Description
import org.junit.runner.JUnitCore
import org.junit.runner.Result
import org.junit.runner.notification.RunListener

/**
 * Run a JUnit test suite and build a list of all tests (test methods) sorted
 * by run time, in descending order. Run this class as a Java Application.
 * Specify the test suite class in the testSuite property (field).
 */
class RunTestSuiteTimingReport {

  private static Class testSuite = MyTestSuite
  private static long showTestTimesGreaterThanOrEqualToMilliseconds = 1000L

  private static class TimingListener extends RunListener {

    // Start time and elapsed time (both in milliseconds) for each test method
    private Map<Description, Long> testStartTimes = [:]
    private Map<Description, Long> testRunTimes = [:]

    void testStarted(Description description) throws Exception {
      testStartTimes[description] = System.currentTimeMillis()
    }

    void testFinished(Description description) throws Exception {
      long startTime = testStartTimes[description]
      testRunTimes[description] = System.currentTimeMillis() - startTime
    }

    void testRunFinished(Result result) throws Exception {
      println "Test suite finished; ${testRunTimes.size()} total tests"
      // descending order, so reverse comparison
      def sortedRunTimes = testRunTimes.sort { a, b -> b.value <=> a.value }
      for (Description description : sortedRunTimes.keySet()) {
        long runTime = sortedRunTimes[description]
        // the times are sorted, so stop at the first test below the threshold
        if (runTime < showTestTimesGreaterThanOrEqualToMilliseconds) {
          break
        }
        println " - $description: $runTime ms"
      }
    }
  }

  static void main(String... args) {
    JUnitCore core = new JUnitCore()
    core.addListener(new TimingListener())
    core.run(testSuite)
  }
}

(This article originally appeared in the November 2012 issue of GroovyMag)

The ABC metric can be used to assess the size of Groovy code, from single methods all the way up through entire codebases. In this article, we see how to measure ABC using GMetrics, and how to enforce an upper limit using CodeNarc to improve the maintainability of our code.

The ABC metric was developed by Jerry Fitzpatrick in 1997 to provide a measure of software size and also address some of the deficiencies of using lines of code and other size metrics. It is defined as a function of the number of assignments, branches and conditions within a program [1]. It can be used to objectively and consistently measure the size of methods, classes and entire programs or libraries. The ABC metric can be used to compare relative code size across multiple codebases and even across different implementation languages.

          Other Options for measuring code size

The most familiar and most-used approach for measuring code size is lines of code (LOC). It is simple and quick to calculate, and has a straightforward, intuitive appeal. But it also has several drawbacks, including:

  • It is affected by coding style, including stylistic conventions such as placement of opening braces (in C-based languages) and standards for line-breaks and line-length.
  • It is not comparable across different implementation languages or platforms.
  • There are several variations in how LOC can be calculated. Should it include blank lines? Should it include comments?

Function points are another well-known approach for measuring program size. They allow estimating the amount of business functionality of a program based on the number of outputs, inquiries, inputs, internal files, and external interfaces. The approach analyzes a program specification rather than the source code, enabling estimation of program size before its implementation. That has several benefits, but function points suffer from limitations that restrict their applicability and popularity, including:

  • Function points are difficult and costly to calculate. They cannot be automatically generated through static analysis of source code or specifications. They require labor-intensive, manual, skilled analysis.
  • Depending on how it is performed, function point analysis can produce subjective counts. Different function point counters may arrive at different results.
  • There are several variations, standards and customizations of the original function point definition.
  • Function points are not appropriate for measuring the size of individual classes or methods.
  • Function point analysis discounts the complexity and effort required in modern user interfaces.

          ABC: Why should you care?

The ABC metric was designed to specifically address many of the shortcomings described above and provide a language-agnostic measurement of program size.

Here are some reasons why you should care:

  • The ABC metric provides an objective, consistent, easy and automatic way to measure code size.
  • The ABC metric is independent of coding style and implementation language. It can be used to compare relative sizes of programs and libraries across different projects and different technology stacks.
  • You can use the ABC metric value (magnitude) as the unit of measurement to assess and enforce size thresholds for methods and classes, with the goal of producing clean code – specifically small, simple methods and classes. Smaller methods and classes tend to be easier to understand, test and maintain.
  • You can use the ABC size metric in combination with other metrics, such as cyclomatic complexity or code coverage, as inputs for more holistic analysis and planning.
  • Measuring and monitoring method and class size can be used as another layer of automated code review. It can flag new code containing excessively-large methods or classes, identifying candidates for refactoring and/or more intensive testing. Similarly, you can use it to assess an existing (legacy) code base and point toward potential trouble spots.

          The formula

For Groovy code, the formula for calculating the ABC metric is:

Start with A=0, B=0, C=0, and then:

  • Add one to the assignment count (A) for each occurrence of an assignment operator, excluding constant declarations: =, *=, /=, %=, +=, <<=, >>=, &=, |=, ^=, and >>>=.
  • Add one to the assignment count (A) for each occurrence of an increment or decrement operator (prefix or postfix): ++ and --.
  • Add one to the branch count (B) for each method call or property access.
  • Add one to the branch count (B) for each occurrence of the new operator.
  • Add one to the condition count (C) for each use of a conditional operator: ==, !=, <=, >=, <, >, <=>, =~ and ==~.
  • Add one to the condition count (C) for each use of the following keywords: else, case, default, try and catch, and for each use of the ? operator.
  • Add one to the condition count (C) for each unary conditional expression. These are cases where a single variable/field/value is treated as a boolean value. Examples include if (x) and return !ready.

The resulting raw ABC value represents a three-dimensional vector, and is written as an ordered triplet of integers: <A, B, C>. For example, a raw ABC value of <7, 2, 4> would represent A=7, B=2 and C=4.

That ABC vector is converted into a single, scalar value by calculating the magnitude of the vector:

|ABC| = sqrt((A*A)+(B*B)+(C*C))

For example, the ABC vector of <7, 2, 4> would be converted to sqrt(49 + 4 + 16) == sqrt(69) == 8.3066. The ABC score is conventionally rounded to a single decimal place, so that would be 8.3.
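The magnitude calculation above can be sketched in a few lines. The snippet below is only an illustration in Java (the formula is language-agnostic); the class and method names are made up for this example and are not part of GMetrics.

```java
// Minimal sketch of the ABC magnitude formula; AbcMagnitude is a hypothetical name.
class AbcMagnitude {

    /** Magnitude of the ABC vector <a, b, c>: sqrt(a*a + b*b + c*c). */
    static double magnitude(int a, int b, int c) {
        return Math.sqrt((double) a * a + (double) b * b + (double) c * c);
    }

    /** Magnitude rounded to one decimal place, the conventional way to report it. */
    static double score(int a, int b, int c) {
        return Math.round(magnitude(a, b, c) * 10.0) / 10.0;
    }

    public static void main(String[] args) {
        System.out.println(score(7, 2, 4));   // prints 8.3
        System.out.println(score(0, 2, 1));   // prints 2.2
    }
}
```

Keeping the raw vector around and reducing it to a scalar only for reporting preserves the information about whether a method is heavy on assignments, branches or conditions.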

          Some examples

Listing 1 shows a simple Groovy method. The source lines of the method are annotated to indicate assignments (A), branches (B) and conditions (C). The findAdministrator() method contains two branches (B) and one condition (C). Note that the line PlanAdministrator.findById( contains both a method call and a property access, and so contributes two to the branch (B) count. The method’s ABC vector is <0, 2, 1>, which results in a magnitude of 2.2.

PlanAdministrator findAdministrator(Plan plan) {
    if (!plan) {                            // C
        return null
    }
    PlanAdministrator.findById(     // BB
}

Listing 1: Sample Groovy method with an ABC vector of <0, 2, 1> and magnitude of 2.2

Listing 2 shows another annotated method, from a Grails application. Its ABC vector is <2, 8, 1> which results in a magnitude of 8.3.

boolean isPlanReferencedByActivity(Plan plan) {
    def c = PlanActivity.createCriteria()   // AB
    def refCount = c.count {                // AB
       eq("plan", plan)                      // B
       or {                                  // B
         eq("standardCode", plan.code)       // BB
         eq("overrideCode", plan.code)       // BB
       }
    }
    return (refCount > 0)                   // C
}

Listing 2: Another sample Groovy method, with an ABC vector of <2, 8, 1> and magnitude of 8.3

In Groovy and Grails, closures are often used in place of methods. If a class field is initialized to a closure, then the ABC value for that closure can be analyzed just like a method. Listing 3 shows a Grails controller action (closure). Its ABC vector is <15, 39, 6>, which results in a magnitude of 42.2. It was reduced somewhat from the original, to fit in the listing. Even in its reduced form, it includes several assignments (15, actually) and many branches, as well as nested if statements. The closure contains several distinct sections. Perhaps it is worth reviewing and possibly refactoring?

def updatePid = { CustomerCommand cmd ->
   def customer = getCurrentCustomer()
   if (customer) { =
      cmd.origCustNum = customer.custNum
      cmd.origSsn = customer.ssn
      cmd.origPid =
      cmd.plan = customer.plan
      boolean isSsn = isPidFromSSN(cmd.plan)
      if (isSsn) { = cmd.ssn
      } else { = cmd.custNum
      if (! {
         def results = changePid(cmd, isSsn)
         if (results.success) {
            flash.message = "Updated PID ${}"
            def planId =
            def p = [planId:planId,]
            redirect(action:'show', params:p)
         } else {

           def errMsg = results.errorMessage
           if (results.field) {
                 "ignore", errMsg)
           } else {
              cmd.errors.reject("part.bad", errMsg)
      } else {
         flash.message = "No change in PID"
      renderForCustomer(cmd, 'changePid')

Listing 3: Groovy method, with an ABC vector of <15, 39, 6> and magnitude of 42.2

          How much is too much?

Guidelines and thresholds for method and class size often generate debate and disagreement. It is impossible to establish limits that are acceptable to everyone and that apply in every situation. Nevertheless, having guidelines in place can help to improve code quality, consistency and maintainability. The specific numbers chosen are inevitably somewhat arbitrary. In some cases, methods or classes that exceed the size threshold can be justified, but many times such code warrants further attention and review for refactoring opportunities.

A blog post by Jake Scruggs specifies risk classifications for interpreting ABC magnitude values from the Flog tool (for Ruby). His guidelines for ABC values include:

  • 0 – 10 = Awesome
  • 11 – 20 = Good enough
  • 21 – 40 = Might need refactoring
  • 41 – 60 = Possible to justify

Anything above 60 is dangerous or worse. [4]
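As a rough illustration, those bands can be turned into a lookup. The sketch below is in Java and uses hypothetical names. It also makes one assumption the guidelines do not state: since ABC scores are decimals, each upper bound is treated as inclusive, so a score of 10.4 falls into the next band up.

```java
// Sketch only: maps an ABC magnitude to the bands quoted above.
// AbcRisk and classify are hypothetical names; the inclusive upper
// bounds are an assumption made for this example.
class AbcRisk {

    static String classify(double score) {
        if (score < 0) {
            throw new IllegalArgumentException("ABC score cannot be negative: " + score);
        }
        if (score <= 10) return "Awesome";
        if (score <= 20) return "Good enough";
        if (score <= 40) return "Might need refactoring";
        if (score <= 60) return "Possible to justify";
        return "Dangerous or worse";
    }

    public static void main(String[] args) {
        System.out.println(classify(2.2));    // Listing 1: prints Awesome
        System.out.println(classify(8.3));    // Listing 2: prints Awesome
        System.out.println(classify(42.2));   // Listing 3: prints Possible to justify
    }
}
```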

The folks behind the eXcentia Sonar ABC Metric Plugin (for Java) define a table of risk classifications for ABC scores for both methods and classes. They have based their classifications on “exhaustive experiments with real and very large projects” (translated):

  • 0 – 15 = No risk
  • 15 – 30 = Moderate risk
  • 30 – 45 = High risk
  • 45 – 60 = Very high risk

They also provide a similar table for class-level ABC values. [5]

Even though the Flog tool is for Ruby code, and the Sonar ABC Plugin is for Java code, the ABC score is calculated similarly (though adapted somewhat to account for language specifics) and the guidelines are transferable. By any of the above guidelines, the method in Listing 3 is too big.

Of course, you will need to come up with your own threshold based on your team’s conventions and sensibilities.

          Measurement: GMetrics

Fortunately, it is quite easy to calculate the ABC metric, either in an ad hoc way, or as part of an automated build process, including continuous integration. There are tools for several popular languages. The GMetrics tool calculates the ABC metric for Groovy source code (among other metrics).

Figure 1 shows an example of the standard GMetrics HTML report. It uses the MetricSet shown in Listing 4, which includes only the ABC metric. The report includes the default columns for both the total and average metric values at the method, class and package levels.

Note that the ABC metric value is only calculated at the method level. The values for the class and package levels are aggregated from the method-level values.


Figure 1: GMetrics HTML Report.

metricset {
   ABC {
   }
}

Listing 4: GMetrics metric set specifying only the ABC metric

Figure 2 shows a GMetrics Single-Series HTML report of the methods with the highest ABC values, limited to values greater than 20.0, in descending order. This type of report requires that we configure a metric, level and function to specify a single series of data. In this case, we specified the “ABC” metric, at the “method” level and the “total” function value. We also specified a sort value (“descending”), a greaterThan value (“20.0”) and a maxResults value (“10”) to further customize the report. Listing 5 shows the corresponding Ant script that generates the GMetrics reports in Figures 1 and 2.


Figure 2: GMetrics Single-Series HTML Report: largest methods (ABC)

<taskdef name="gmetrics" classname="org.gmetrics.ant.GMetricsTask"/>
<target name="gmetrics-abc">
   <gmetrics metricSetFile="gmetrics-abc-metricset.groovy">
      <report type="">
         <option name="outputFile"
            value="reports/GMetricsReport.html" />
         <option name="title" value="ABC Metric Example" />
      </report>
      <report type="">
         <option name="outputFile"
            value="reports/LargestABCMethods.html" />
         <option name="title" value="Largest Methods (ABC)" />
         <option name="metric" value="ABC" />
         <option name="level" value="method" />
         <option name="function" value="total" />
         <option name="maxResults" value="10" />
         <option name="sort" value="descending" />
         <option name="greaterThan" value="20.0" />
      </report>
      <fileset dir="../src/grails">
         <include name="**/*.groovy"/>
      </fileset>
   </gmetrics>
</target>

Listing 5: Ant script for GMetrics

The GMetrics Ant Task is included with the GMetrics distribution. There are also GMetrics plugins for Grails, Griffon and Sonar. See [6] for more information.
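Conceptually, the greaterThan, sort and maxResults options in Listing 5 amount to a filter, a descending sort and a limit over the method-level values. The Java sketch below illustrates that idea only; it is not how GMetrics implements the report, and the class name, method name and the importPlans entry are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch only: filter, sort and limit method-level ABC scores.
class LargestMethods {

    /** Keeps scores strictly greater than the threshold, largest first, up to maxResults. */
    static Map<String, Double> largest(Map<String, Double> abcByMethod,
                                       double greaterThan, int maxResults) {
        return abcByMethod.entrySet().stream()
                .filter(e -> e.getValue() > greaterThan)
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .limit(maxResults)
                .collect(Collectors.toMap(
                        Map.Entry::getKey, Map.Entry::getValue,
                        (first, second) -> first, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("findAdministrator", 2.2);            // Listing 1
        scores.put("isPlanReferencedByActivity", 8.3);   // Listing 2
        scores.put("importPlans", 25.0);                 // hypothetical method
        scores.put("updatePid", 42.2);                   // Listing 3
        // Prints "updatePid: 42.2" then "importPlans: 25.0"
        largest(scores, 20.0, 10)
                .forEach((method, abc) -> System.out.println(method + ": " + abc));
    }
}
```

Collecting into a LinkedHashMap keeps the sorted order, which a plain HashMap would lose.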

          Enforcement: CodeNarc

The CodeNarc tool analyzes Groovy source code for defects, bad practices, inconsistencies, unused code, etc. Like GMetrics, it comes with an Ant Task.

CodeNarc includes an AbcComplexity rule, which uses the GMetrics ABC metric under the covers. The CodeNarc rule enables specifying upper limits for method size as well as class-level average method size. Note that the AbcComplexity rule was improperly named within CodeNarc, because the ABC metric measures program size, not complexity.

Listing 6 shows the CodeNarc ruleset containing the AbcComplexity rule. The ruleset customizes the maxMethodComplexity rule property to a value of 20, rather than the overly-generous default of 60. It also explicitly sets the maxClassAverageMethodComplexity rule property to 60 (which also happens to be the default), so that we do not see any class-level violations for now.

ruleset {
   AbcComplexity {
      maxMethodComplexity = 20
      maxClassAverageMethodComplexity = 60
   }
}

Listing 6: CodeNarc ruleset

Listing 7 shows the Ant build file that executes CodeNarc with that ruleset. Figure 3 shows the CodeNarc report (excerpt), including the violations of the AbcComplexity rule. Note that we have configured the CodeNarc Ant Task to fail the build if there are any priority 1 or 2 violations. Thus, we are enforcing an upper limit on the size of the methods within our code base.

<taskdef name="codenarc" classname="org.codenarc.ant.CodeNarcTask"/>
<target name="codenarc-abc">
      <report type="html" toFile="reports/CodeNarcReport.html" title="ABC Example"/>
      <fileset dir="../src/grails">
         <include name="**/*.groovy"/>
      </fileset>
</target>

Listing 7: Ant script for CodeNarc


Figure 3: CodeNarc AbcComplexity rule violations

The CodeNarc Ant Task is included with the CodeNarc distribution. There are also CodeNarc plugins for Grails, Griffon, Gradle, Maven, Hudson/Jenkins and Sonar, as well as IDE plugins for IntelliJ IDEA and Eclipse. See [7] for more information.


The ABC metric is a useful and easy way to assess a code base and get a valuable perspective on its maintainability. For Groovy code, we can measure method and class size and identify potential trouble spots with GMetrics, and enforce an upper threshold for method size with CodeNarc.


[1] The ABC Metric specification – originally published in The C++ Report, June 1997.

[2] Function Point Analysis – definition and description.

[3] Function Points Are Fantasy Points – opinionated but compelling blog post about the deficiencies of function points.

[4] This blog post describes some guidelines for interpreting the ABC score. The post refers to the Flog tool for Ruby code, but the ABC score is calculated similarly (though adapted somewhat to account for language specifics) and the guidelines should be transferable.

[5] eXcentia Sonar ABC Metric Plugin – the web page includes a table of risk classifications for ABC scores for both methods and classes.

[6] GMetrics – provides calculation and reporting of several metrics for Groovy source code, including the ABC metric.

[7] CodeNarc – analyzes Groovy source code for defects, bad practices, unused code and more.