Groovy Code Metrics: Cyclomatic Complexity

(This article originally appeared in the May 2012 issue of GroovyMag)

The cyclomatic complexity metric can be used to assess the complexity and maintainability of Groovy code. In this article, we see how to measure cyclomatic complexity using GMetrics, and how to enforce an upper limit for complexity using CodeNarc.

There are many ways to evaluate software, and a large number of metrics that we can apply to source code. One particularly useful and popular metric is cyclomatic complexity, proposed by Thomas McCabe in 1976 [1].

Cyclomatic complexity is defined in terms of the control flow graph for a piece of code (e.g. a method). But in practice, it measures the number of distinct (linearly independent) paths through the code. It does this by counting the number of branching statements and operators within the source code. See [1] for a formal definition of the metric and a discussion of the mathematical foundations. Despite the theoretical underpinnings, the cyclomatic complexity metric is quite straightforward and intuitive.

Why should you care?

Cyclomatic complexity is by no means a silver bullet, but it has proven its value over several decades, and continues to be one of the most popular metrics applied to source code. It is language-agnostic – cyclomatic complexity applies just as well to Java and Groovy as it did to C or Fortran.

Here are some reasons why you should care:

  • There is a strong correlation with defects. Code with higher cyclomatic complexity tends to be more likely to contain defects and cause problems [3].
  • The cyclomatic complexity value can serve as a guideline for the minimum number of required tests to properly cover the branches within a method.
  • Lower-complexity methods tend to be easier to understand, test and maintain.
  • Measuring and monitoring cyclomatic complexity can be used as another layer of automated code review. It can flag new code that is overly complex, identifying candidates for refactoring and/or more intensive testing. Similarly, you can use it to assess an existing (legacy) code base and point toward potential trouble spots.

The formula

For Groovy code, the formula for calculating cyclomatic complexity is:

Start with a initial (default) value of one (1). Add one (1) for each occurrence of the following:

  • if statement
  • while statement
  • for statement
  • case statement
  • catch statement
  • && and || boolean operations
  • ?: ternary operator and ?: Elvis operator.
  • ?. null-check operator

Some examples

Listing 1 shows a simple Groovy method with a cyclomatic complexity of four (4). Based on the formula described above, we can see that the result is the sum of:

  • The initial/default value of one
  • Plus one for the if statement
  • Plus one for the && operator
  • Plus one for the ternary operator (?:)
private String checkForError() {
    if (isReady() && count > 1) {
        return count < 10 ? 'Too small' : 'Too big'
    }
    return 'None'
}

Listing 1: Sample Groovy method with a cyclomatic complexity value of 4

Now let us briefly turn to the dark side. Listing 2 shows a Groovy method with a much higher cyclomatic complexity value (21). It is a long method, with lots of branching, including nested if statements. The method contains several distinct sections, and is clearly doing a bunch of different things, recklessly violating the Single Responsibility Principle along the way. Oh, the humanity! Hopefully, you can agree that this is not the way we should be writing code. The method’s high cyclomatic complexity value is a clue that something is wrong, and indicates the potential for issues with quality and maintainability.

private int processIncomeRecords(List records) {
  int processedRecordCount = 0
  records?.each {IncomeRecord rec ->
    int errorCount = 0
    def isDataEmpty = false
    if (!isDataEmpty) {
      if ((isNullOrEmpty(rec.planId)) ||
          (isNullOrEmpty(rec.productId)) ||
          (isNullOrEmpty(rec.custId)) ||
          (isNullOrEmpty(rec.custType)) ||
          (isNullOrEmpty(rec.custAccount)) ||
          (isNullOrEmpty(rec.pptIdType)) ||
          (isNullOrEmpty(rec.withdrawalAmt)) ||
          (isNullOrEmpty(rec.benefitBase))) {
        errors.add("Empty cust: $rec.custId")
        errorCount++
      }
      if (rec.pptIdType == "S") {
        if (rec.custId?.trim().length() == 9) {
          def ssn =
            (rec.custId?.trim()).substring(0,3)+'-'+
            (rec.custId?.trim()).substring(3,5)+'-'+
            (rec.custId?.trim()).substring(5,9)
          rec.pid = incomeDao.getPid(ssn,rec.planId)
          if (rec.pid == null) {
            errors.add("No PID $rec.custId")
            errorCount++
          }
        } else {
          errorCount++
        }
      } else if (rec.pptIdType == "O") {
        if (rec.custId.isInteger()) {
          rec.pid = Integer.parseInt(rec.custId)
        } else {
          errors.add("Invalid PID;plan $rec.planId")
          errorCount++
        }
        if (incomeDao.validatePid(rec) == 0) {
          errors.add("No PID $rec.custId")
          errorCount++
        }
      } else {
        errors.add("No PID Cust $rec.custId")
        errorCount++
      }
      def productIdList = incomeDao.getProductIds(
          rec.planId?.trim(),rec.providerId?.trim())
      if (!productIdList.contains(rec.productId)) {
        errors.add("No match ProdID $rec.productId")
        errorCount++
      }
      if (hasDuplicateKeys(records, rec.custId,
          rec.productId, rec.custType)) {
        errors.add("Dup:$rec.custId;$rec.productId")
        errorCount++
      }
      IncomeRecord convertedRecord
      if (isValidDataRecord(rec)) {
        convertedRecord = convertRecord(rec)
      } else {
        isDataEmpty = true
        errors.add("Invalid Record $rec.custId")
      }
      if ((!isDataEmpty) && (errorCount == 0)) {
        processRecord(convertedRecord)
        processedRecordCount++
      }
    }
  }
  return processedRecordCount
}

Listing 2: Sample Groovy method with a high cyclomatic complexity value (21)

As another example, Listing 3 shows a method with an even higher cyclomatic complexity score (28). At first glance, this method may not look so bad, but it contains subtle complexity and duplication. The same convoluted and dense pattern of null-check, Elvis operators and calls to trim() is repeated eight times. One way to significantly reduce the cyclomatic complexity of this method (and the class) would be to abstract and consolidate that complicated property access into a helper method. As seen in this example, duplication is often a significant contributor to a high complexity score.

List<Participant> getParticipants(Date runDate) {
  final SQL = '{ call ReadParticipants (?) }'
  def sql = new Sql(dataSource)
  def participants=[]
  def list = sql.rows(SQL, [asSqlDate(runDate)])
  list.each { result ->
    Participant part = new Participant()
    part.recordType = 'D'
    part.iid = (result?.IID?:'')
    part.firstName = (result?.FName?:'')?.trim()
    part.lastName = (result?.LName?:'')?.trim()
    part.initials = (result?.MName?:'')?.trim()
    part.address1 = (result?.Addr1?:'')?.trim()
    part.address2 = (result?.Addr2?:'')?.trim()
    part.city = (result?.City?:'')?.trim()
    part.state = (result?.State?:'')?.trim()
    part.zip = (result?.Zip?:'')?.trim()
    participants << part
  }
  return participants
}

Listing 3: Groovy method with a high cyclomatic complexity value (28) from duplication

How much is too much?

The value of ten (10) is often considered as the threshold between acceptable (low risk) and unacceptably-complex (higher risk) code. As McCabe ([1]) says, “The particular upper bound that has been used for cyclomatic complexity is 10 which seems like a reasonable, but not magical, upper limit.”. Other sources cite 15 as a useful threshold, and/or draw further delineations between low/medium/high/unacceptable complexity values.

By pretty much everyone’s guidelines, the methods in Listing 2 and Listing 3 are too complex.

Under-promise and over-deliver

Keep in mind that cyclomatic complexity focuses on the conditional logic (branching) within the code. That is reasonable because much of the effort and potential for errors in code is typically centered around the decisions and branches within the logic. That means that cyclomatic complexity is not a measure of the size or amount of code. You can have a thousand-line method with little or no conditional logic and it will have a low cyclomatic complexity score. As with any metric, it can be a valuable data point, but it does not tell the entire story.

Measuring and minimizing complexity is in line with the preference for clean code. Long methods are  bad. Short methods are good. Deep nesting of conditional code is bad. Straightforward logic is good. It encourages small, simple methods and promotes the practice of refactoring to reduce complexity.

By definition, refactoring to “simpler” methods does not affect the external behavior of the class/method. The real benefit is in the improved clarity and simplicity of the code. Smaller chunks are typically easier to understand, and there are fewer dark corners for bugs and incorrect assumptions to hide.

Measurement: GMetrics

Fortunately, it is quite easy to calculate cyclomatic complexity, either in an ad hoc way, or as part of an automated build process, including continuous integration. There are tools for most popular languages. The GMetrics (http://gmetrics.sourceforge.net/) tool calculates cyclomatic complexity for Groovy source code (among other metrics).

Figure 1 shows an example of the standard GMetrics HTML report. It uses the default MetricSet, which includes three metrics: CyclomaticComplexity, ClassLineCount and MethodLineCount. The report includes columns for both the total and average metric values at the method, class and package levels.

Note that the cyclomatic complexity value is only calculated at the method level. The values for the class and package levels are aggregated from the method-level values. Incidentally, the sum (total) of the cyclomatic complexity for the methods within a class is a related metric called weighted methods per class, and not surprisingly, represents the complexity of the class as a whole.

Sample GMetrics report

Figure 1: GMetrics Standard HTML Report.

Figure 2 shows a GMetrics Single-Series HTML report of the methods with the highest cyclomatic complexity, with a value of at least 10.0, in descending order. This type of report requires that we configure a metric, level and function to specify a single series of data. In this case, we specified the “CyclomaticComplexity” metric, at the “method” level and the “total” function value. We also specified a sort value (“descending”) and a greaterThan value (“10.0”) and a maxResults (“10”) to further configure and customize the report. Listing 4 shows the corresponding Ant script that generates the GMetrics reports in Figures 1 and 2.

Highest complexity methods (GMetrics report)

Figure 2: GMetrics Single-Series HTML Report: highest-complexity methods

<taskdef name="gmetrics" classname="org.gmetrics.ant.GMetricsTask"/>
<target name="gmetrics-cc">
  <gmetrics>
     <report type="org.gmetrics.report.BasicHtmlReportWriter">
        <option name="outputFile"
           value="reports/GMetricsReport.html" />
        <option name="title" value="Sample" />
     </report>
     <report type="org.gmetrics.report.SingleSeriesHtmlReportWriter">
        <option name="outputFile"
           value="reports/HighestComplexityMethods.html" />
        <option name="title" value="Highest Complexity Methods" />
        <option name="metric" value="CyclomaticComplexity" />
        <option name="level" value="method" />
        <option name="function" value="total" />
        <option name="maxResults" value="10" />
        <option name="sort" value="descending" />
        <option name="greaterThan" value="10.0" />
     </report>
     <fileset dir="../src/main/java">
        <include name="**/*.groovy"/>
     </fileset>
  </gmetrics>
</target>

Listing 4: Ant script for GMetrics

The GMetrics Ant Task is included with the GMetrics distribution. There are also GMetrics plugins for Grails, Griffon and Sonar. See [4] for more information.

Enforcement: CodeNarc

The CodeNarc (http://codenarc.sourceforge.net) tool analyzes Groovy source code for defects, bad practices, inconsistencies, unused code, etc.. Like GMetrics, it comes with an Ant Task.

CodeNarc includes a CyclomaticComplexity rule, which uses the GMetrics CyclomaticComplexity metric under the covers. The CodeNarc rule enables specifying upper limits for method complexity as well as class-level average method complexity.

Listing 5 shows the CodeNarc ruleset containing the CyclomaticComplexity rule. The ruleset customizes the maxMethodComplexity rule property to a value of 10, rather than the overly-generous default of 20. It also sets the maxClassAverageMethodComplexity rule property to 30, so that we do not see any class-level violations for now.

ruleset {
    CyclomaticComplexity {
        maxMethodComplexity = 10
        maxClassAverageMethodComplexity = 30
    }
}

Listing 5: CodeNarc ruleset

Listing 6 shows the Ant build file that executes CodeNarc with that ruleset. Figure 3 shows the CodeNarc report (excerpt), including the violations of the CyclomaticComplexity rule. Note that we have configured the CodeNarc Ant Task to fail the build of there are any priority 1 or 2 violations. Thus, we are enforcing an upper limit on the complexity of the methods within our code base.

<taskdef name="codenarc" classname="org.codenarc.ant.CodeNarcTask"/>
<target name="codenarc-cc">
    <codenarc
            ruleSetFiles="codenarc-cc-ruleset.groovy"
            maxPriority1Violations="0"
            maxPriority2Violations="0">
        <report type="html" toFile="reports/CodeNarcReport.html"
           title="Sample"/>
        <fileset dir="../src/main/java">
        <include name="**/*.groovy"/>
        </fileset>
    </codenarc>
</target>

Listing 6: Ant script for CodeNarc

CodeNarc report

Figure 3: CodeNarc CyclomaticComplexity rule violations

The CodeNarc Ant Task is included with the CodeNarc distribution. There are also CodeNarc plugins for Grails, Griffon, Gradle, Maven, Hudon/Jenkins and Sonar, as well as IDE plugins for IntelliJ IDEA and Eclipse. See [5] for more information.

Conclusion

The cyclomatic complexity metric is a useful and easy way to assess a code base and get a valuable perspective on its maintainability. For Groovy code, we can measure complexity and identify potential trouble spots with GMetrics, and enforce an upper threshold for complexity with CodeNarc.

References

[1] “A Complexity Measure” – the original paper describing cyclomatic complexity in IEEE Transactions on Software Engineering Vol. 2, No. 4, p. 308 (1976).

[2] “In pursuit of code quality: Monitoring cyclomatic complexity” – Andrew Glover, developerWorks, 2006.

[3] “McCabe Cyclomatic Complexity: the proof in the pudding” – Rich Sharpe, ENERJY, 2008.

[4] GMetrics – provides calculation and reporting of several metrics for Groovy source code, including cyclomatic complexity.

[5] CodeNarc – analyzes Groovy source code for defects, bad practices, unused code and more.

Advertisements
2 comments
  1. May I translate this good article in Chinese?

    • That sounds great. Thanks for the offer. I’ll include the link here when available.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: