Write Simple Units of Code

Limit the number of branch points per unit to 4. Do this by splitting complex units into simpler ones and avoiding complex units altogether.

Robert C. Martin pointed out that "the ratio of time spent reading versus writing is well over 10 to 1". You don't want to increase the reading side of the ratio any further by producing unnecessarily complex code. Decrease unnecessary complexity, and you will decrease the time to production.

Motivation

A simple unit is easier to understand, and thus modify, than a complex one.
Simple units ease testing.

Every developer loves to work on code that is easy to read and understand. But achieving this level of code simplicity isn't always easy. Why? The complexity of the code is not obvious to developers or the development team until it has progressed through the levels.

Developers spend more time developing their code and even more, time maintaining it. Additionally, because project goals and functionality change over time, the codebase often expands and grows in the process.

Whenever a developer inspects code in a later stage, they will come across many unnecessary lines of code. At this point, the code often becomes so complex that the developers can't fix it.

There is a point where code becomes so complex that modifying it becomes an extremely risky and very time-consuming task, let alone testing the modifications afterward. To keep code maintainable, we must put a limit on complexity. Another reason to measure complexity is knowing the minimum number of tests we need to be sufficiently certain that the system acts predictably.

Before we can define such a code complexity limit, we must be able to measure complexity.

Measure Code Complexity

There are several metrics can help measure code complexity and help you identify potential areas for improvement within the codebase:

Cyclomatic Complexity
Lines of Source Code
Lines of Executable Code
Coupling/Depth of Inheritance
Maintainability Index
Cognitive Complexity
Halstead Volume
Rework Ratio

Cyclomatic Complexity

This measures how much control flow exists in a program. Since programs with more conditional logic are more complex, measuring the level of complexity tells the developers how much needs to be managed. This is done by directly measuring the number of paths through the code. Say a program is a graph of all possible operations. Then, complexity measures the number of unique paths through that graph. Every if, while, or for statement creates a new branch. It is even possible for one branch to double the total number of paths. But the results from this method can sometimes be very unreliable.

In 1976, Thomas McCabe Snr proposed a metric for calculating code complexity, called Cyclomatic Complexity or McCabe Complexity or simply Complexity. It’s defined as:

A quantitative measure of the number of linearly independent paths through a program’s source code … computed using the control flow graph of the program.
-- Thomas McCabe

This means fewer paths through a piece of code, and the less complicated those paths are.

The classical way to calculate cyclomatic complexity is through McCabe's Formula. I cite Wikipedia:

The complexity M is then defined as:
M = E − N + 2*P
where:
E = the number of edges of the graph.
N = the number of nodes of the graph.
P = the number of connected components.

Translating it into simpler language, it works like this: You transform your code into a graph. Each node in the graph is a statement of the code. The edges are what connects the nodes. Finally, P simply means the exit point of the program or routine.

Imagine a function with three consecutive statements and no structure or decision statement. Its graph would look like this:

[] -> [1st statement] -> [2nd] -> [3rd] -> [exit]

We have five nodes, four edges, and one connect component. Let's replace all of that in the formula:

M= 4 - 5 + (2 * 1)

Thus, the complexity of this function is one.

Lines of Source Code or Lines of Executable Code

It counts the number of lines in the source code. It is the most straightforward metric used to measure the size of the program. However, functionality and complexity do not relate that well as a skilled developer might be able to deliver the same functionality with a significantly smaller code.

Example:

for (int i = 1; i <= 10; i++) { // 1

    // Do something
    doSomething(); // 2
    
}

Here, you see close curly brackets on a separate line, and these shouldn't count as a source lines of code, similar for linespace, comments. Therefore, this for loop has 6 lines, but only has 2 lines of source code.

Coupling or Depth of Inheritance

This measures how intertwined and dependent a class or function is in relation to all others in the codebase.

Maintainability Index

This single value measures how easy it is to maintain the code. It's a combination of the four metrics above.

Cognitive Complexity

This measures the amount of cognitive effort required to understand the code's flow. You compute cognitive complexity similarly to cyclomatic complexity. However, it doesn't increment with if statements that have logical operators in them. Cognitive complexity only increments once with switch cases, but it does increment complexity with nested flow breaks.

Halstead Volume

This measures the amount of information in the source code. It counts the number of variables and how often they appear.

Rework Ratio

Your Rework ratio is the percentage of recently delivered code your team is already rewriting. While this isn't a direct measure of code complexity, it is a potential indicator of overly complex code you may want to keep an eye on.

Typically, rework occurs because there's an issue in the quality of your code review process, such as superficial reviews that don't look for code complexity indicators.

How to Apply the Guideline

Code complexity limit

A common way to objectively assess the complexity is to count the number of possible paths through a piece of code.

The idea is that the more paths can be distinguished, the more complex a piece of code is. We can determine the number of paths unambiguously by counting the number of branch points.

A branch point is a statement where execution can take more than one direction depending on a condition. Examples of branch points in Java code are `if-else`, `switch-case`, `while`, `for`, `do-while`, `catch`, `? :`, `&&`, `||` . Branch points can be counted for a complete codebase, a class, a package, or a unit.

The number of branch points of a unit is equal to the minimum number of paths needed to cover all branches created by all branch points of that unit. This is called branch coverage.

However, when you consider all paths through a unit from the first line of the unit to a final statement, combinatory effects are possible. The reason is that it may matter whether a branch follows another in a particular order. All possible combinations of branches are the execution paths of the unit—that is, the maximum number of paths through the unit.

The following example shows the difference between branch coverage and execution path coverage.

In summary, the number of branch points is the number of paths that cover all branches created by branch points. It is the minimum number of paths and can be zero (for a unit that has no branch points). The number of execution paths is a maximum, and can be very large due to combinatorial explosion.

The cyclomatic complexity or McCabe complexity is the number of branch points plus one.

Consequently, the guideline “limit the number of branch points per unit to 4” is equal to “limit code McCabe complexity to 5”. This is the minimum number of test cases that you need to cover a unit such that every path has a part not covered by the other paths. A unit with no branch points has one execution path, and needs at least one test.

Example 1

int sum(String str1, String str2) {
    int number1 = Integer.parseInt(str1);
    int number2 = Integer.parseInt(str2);
    int sum = number1 + number2;
    return sum;
}

This function has no branch point, thus the complexity of this function is one.

Example 2

boolean isAdult(int age) {
    int isAdult = false;
    if (age >= 18) {
        isAdult = true;
    }
    return isAdult;
}

The function now has a logical branch. If the age is equal to or greater than 18, returns trure, otherwise returns false. Therefore, the complexity of this function is two.

Example 3

void printNumberDayOfMonth(int month) {
        if (month == 1
                || month == 3
                || month == 5
                || month == 7
                || month == 8
                || month == 10
                || month == 12
        ) {
            System.out.println("Month=" + month + " has 31 days");
        } else if (month == 2) {
            System.out.println("Month=" + month + " has 28 days");
        } else if (month == 4
                || month == 6
                || month == 9
                || month == 11
        ) {
            System.out.println("Month=" + month + " has 30 days");
        } else {
            System.out.println("Invalid month");
        }
}

There are 12 possible cases and for the unknown/default case, the minimum number of isolated paths to be tested (control flow branches) is 13. The same complexity if using switch-case:

void printNumberDayOfMonth(int month) {
    switch (month) {
        case 1:
        case 3:
        case 5:
        case 7:
        case 8:
        case 10:
        case 12:
            System.out.println("Month=" + month + " has 31 days");
            break;
        case 2:
            System.out.println("Month=" + month + " has 28 or 29 days");
            break;
        case 4:
        case 6:
        case 9:
        case 11:
            System.out.println("Month=" + month + " has 30 days");
            break;
        default:
            System.out.println("Invalid month");
            break;
    }
}

Dealing with Nesting

To understand why complex code is such a problem for maintenance, imagine having to modify or test this method:

public static int calculateDepth(BinaryTreeNode<Integer> t, int n) {
    int depth = 0;
    if (t.getValue() == n) {
        return depth;
    } else if (n < t.getValue()) {
        BinaryTreeNode<Integer> left = t.getLeft();
        if (left == null) {
            throw new TreeException("Value not found in tree!");
        } else {
            return 1 + calculateDepth(left, n);
        }
    } else {
        BinaryTreeNode<Integer> right = t.getRight();
        if (right == null) {
            throw new TreeException("Value not found in tree!");
        } else {
            return 1 + calculateDepth(right, n);
        }
    }
}

To improve readability, we can get rid of the nested conditional by identifying the distinct cases and insert return statements for these. In terms of refactoring, this is called the Replace Nested Conditional with Guard Clauses pattern. The result will be the following method:

public static int calculateDepth(BinaryTreeNode<Integer> t, int n) {
    int depth = 0;
    if (t.getValue() == n)
        return depth;
    if (n < t.getValue() && t.getLeft() != null)
        return 1 + calculateDepth(t.getLeft(), n);
    if (n > t.getValue() && t.getRight() != null)
        return 1 + calculateDepth(t.getRight(), n);
    throw new TreeException("Value not found in tree!");
}

Although the unit is now easier to understand, its complexity has not decreased. In order to reduce the complexity, you should extract the nested conditionals to separate methods. The result will be as follows:

public static int calculateDepth(BinaryTreeNode<Integer> t, int n) {
    int depth = 0;
    if (t.getValue() == n)
        return depth;
    else
        return traverseByValue(t, n);
}

private static int traverseByValue(BinaryTreeNode<Integer> t, int n) {
    BinaryTreeNode<Integer> childNode = getChildNode(t, n);
    if (childNode == null) {
        throw new TreeException("Value not found in tree!");
    } else {
        return 1 + calculateDepth(childNode, n);
    }
}

private static BinaryTreeNode<Integer> getChildNode(
    BinaryTreeNode<Integer> t, int n) {
    if (n < t.getValue()) {
        return t.getLeft();
    } else {
        return t.getRight();
    }
}

This actually does decrease the complexity of the unit. Now we have achieved two things: the methods are easier to understand, and they are easier to test in isolation since we can now write unit tests for the distinct functionalities.

Dealing with Conditional Chains

A chain of if-then-else statements has to make a decision every time a conditional if is encountered. An easy-to-handle situation is the one in which the conditionals are mutually exclusive; that is, they each apply to a different situation. This is also the typical use case for a switch statement.

There are many ways to simplify this type of complexity, and selecting the best solution is a trade-off that depends on the specific situation.

For example:

public List<Color> getFlagColors(Nationality nationality) {
    List<Color> result;
    switch (nationality) {
        case DUTCH:
            result = Arrays.asList(Color.RED, Color.WHITE, Color.BLUE);
            break;
        case GERMAN:
            result = Arrays.asList(Color.BLACK, Color.RED, Color.YELLOW);
            break;
        case BELGIAN:
            result = Arrays.asList(Color.BLACK, Color.YELLOW, Color.RED);
            break;
        case FRENCH:
            result = Arrays.asList(Color.BLUE, Color.WHITE, Color.RED);
            break;
        case ITALIAN:
            result = Arrays.asList(Color.GREEN, Color.WHITE, Color.RED);
            break;
        default:
            result = Arrays.asList(Color.GRAY);
            break;
    }
    return result;
}

We have two alternatives to reduce complexity:

The first is the introduction of a Map data structure that maps nationalities to specific Flag objects. This refactoring reduces the complexity of the getFlagColors method from 6 to 2.

private static Map<Nationality, List<Color>> FLAGS =
    new HashMap<Nationality, List<Color>>();

static {
    FLAGS.put(DUTCH, Arrays.asList(Color.RED, Color.WHITE, Color.BLUE));
    FLAGS.put(GERMAN, Arrays.asList(Color.BLACK, Color.RED, Color.YELLOW));
    FLAGS.put(BELGIAN, Arrays.asList(Color.BLACK, Color.YELLOW, Color.RED));
    FLAGS.put(FRENCH, Arrays.asList(Color.BLUE, Color.WHITE, Color.RED));
    FLAGS.put(ITALIAN, Arrays.asList(Color.GREEN, Color.WHITE, Color.RED));
}

public List<Color> getFlagColors(Nationality nationality) {
    List<Color> colors = FLAGS.get(nationality);
    return colors != null ? colors : Arrays.asList(Color.GRAY);
}

A second, more advanced way to reduce the complexity is to apply a refactoring pattern that separates functionality for different flags in different flag types. You can do this by applying the Replace Conditional with Polymorphism pattern: each flag will get its own type that implements a general interface. The polymorphic behavior of the Java language will ensure that the right functionality is called during runtime.

public interface Flag {
    List<Color> getColors();
}

public class DutchFlag implements Flag {
    public List<Color> getColors() {
        return Arrays.asList(Color.RED, Color.WHITE, Color.BLUE);
    }
}

// ...

public class ItalianFlag implements Flag {
    public List<Color> getColors() {
        return Arrays.asList(Color.GREEN, Color.WHITE, Color.RED);
    }
}

public class DefaultFlag implements Flag {
    public List<Color> getColors() {
        return Arrays.asList(Color.GRAY);
    }
}

private static final Map<Nationality, Flag> FLAGS =
    new HashMap<Nationality, Flag>();

static {
    FLAGS.put(DUTCH, new DutchFlag());
    FLAGS.put(GERMAN, new GermanFlag());
    FLAGS.put(BELGIAN, new BelgianFlag());
    FLAGS.put(FRENCH, new FrenchFlag());
    FLAGS.put(ITALIAN, new ItalianFlag());
}

public List<Color> getFlagColors(Nationality nationality) {
    Flag flag = FLAGS.get(nationality);
    flag = flag != null ? flag : new DefaultFlag();
    return flag.getColors();
}

This refactoring offers the most flexible implementation. For example, it allows the flag type hierarchy to grow over time by implementing new flag types and testing these types in isolation. A drawback of this refactoring is that it introduces more code spread out over more classes. The developer much chooses between extensibility and conciseness.

Benefits of measuring code complexity

Measurement of software complexity based on defined algorithms provides a comprehensive assessment of the code. Regardless of the size of the code, measuring it can make your code objective, repeatable, consistent, and cost-effective.

Here are the most significant benefits of measuring software complexity:

Dependable Testing: measuring the complexity of the code tells the developer how many paths there are in the code. Therefore, the developer knows how many paths there are to test. This will help them calculate the minimum number of tests required to cover the entire code.
Reduced Risk of Defects: one of the most famous sayings in the IT industry is – “It’s harder to read code than to write it. Since code is read far more than it is written, this saying is true. By reducing code complexity, developers can reduce the risk of introducing more bugs. After all, a good developer is never assessed by the lines of code they have written, but the quality they have maintained.
Lower Maintenance Cost: by reducing complexity, we reduce the probability of introducing defects. When the risk of potential bugs is reduced, there are fewer defects to find. Therefore, maintenance cost significantly reduces.
Greater Predictability: measuring the complexity of code, helps developers say how long a section of code takes to complete. This knowledge allows organizations to predict better how long a release takes to ship. Hence, businesses dependent on such software can better set their goals and expectations. It also allows development teams to set more realistic forecasts and budgets.

Common Objections to Writing Simple Units of Code

Of course, when you are writing code, units can easily become complex. You may argue that high complexity is bound to arise or that reducing unit complexity in your codebase will not help to increase the maintainability of your system. Such objections are discussed next.

Objection: High Complexity Cannot Be Avoided

“Our domain is very complex, and therefore high code complexity is unavoidable.”

When you are working in a complex domain, it is natural to think that the domain’s complexity carries over to the implementation, and that this is an unavoidable fact of life.

We argue against this common interpretation. Complexity in the domain does not require the technical implementation to be complex as well. In fact, it is your responsibility as a developer to simplify problems such that they lead to simple code. Even if the system as a whole performs complex functionality, it does not mean that units on the lowest level should be complex as well. In cases where a system needs to process many conditions and exceptions (such as certain legislative requirements), one solution may be to implement a default, simple process and model the exceptions explicitly.

It is true that the more demanding a domain is, the more effort the developer must expend to build technically simple solutions. But it can be done! We have seen many highly maintainable systems solving complex business problems. In fact, we believe that the only way to solve complex business problems and keep them under control is through simple code.

Objection: Splitting Up Methods Does Not Reduce Complexity

“Replacing one method with McCabe 15 by three methods with McCabe 5 each means that overall McCabe is still 15 (and therefore, there are 15 control flow branches overall). So nothing is gained.”

Of course, you will not decrease the overall McCabe complexity of a system by refactoring a method into several new methods. But from a maintainability perspective, there is an advantage to doing so: it will become easier to test and understand the code that was written. So, as we already mentioned, newly written unit tests allow you to more easily identify the root cause of your failing tests.

PreviousWrite Short Units of Code NextWrite Code Once

Last updated 2 years ago