Fail Fast principle

When a problem occurs, the software should fail immediately and visibly.

Whenever an error occurs in a running software application there are typically three possible error-handling approaches:

  • The Ignore! approach: the error is ignored and the application continues execution.

  • The Fail fast! approach: the application stops immediately and reports an error.

  • The Fail safe! approach: the application acknowledges the error and continues execution in the best possible way.
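As a rough sketch of how the three approaches differ in code, assume a hypothetical method that parses a timeout value from text. The class name, method names and the fallback of 30 seconds are invented for illustration only.

```java
import java.util.logging.Logger;

// Hypothetical example: three ways to handle an unparseable timeout value.
final class ErrorHandlingApproaches {
    private static final Logger LOG = Logger.getLogger("ErrorHandlingApproaches");
    static final int SAFE_DEFAULT_SECONDS = 30; // assumed fallback, not from the text

    // Ignore: the error is swallowed and a meaningless value leaks out.
    static int parseIgnoring(String text) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException ex) {
            return 0; // the caller cannot tell that anything went wrong
        }
    }

    // Fail fast: stop the operation immediately and report the error.
    static int parseFailFast(String text) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException ex) {
            throw new IllegalArgumentException("Invalid timeout value: '" + text + "'", ex);
        }
    }

    // Fail safe: acknowledge the error, then continue with a known-safe value.
    static int parseFailSafe(String text) {
        try {
            return Integer.parseInt(text.trim());
        } catch (NumberFormatException ex) {
            LOG.warning("Invalid timeout value '" + text + "', using " + SAFE_DEFAULT_SECONDS);
            return SAFE_DEFAULT_SECONDS;
        }
    }
}
```

Only the first variant hides the problem from everyone; the other two differ in whether execution continues after the error has been made visible.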

Which approach is the best one? Which one should you apply in your application?

There are some general rules:

  • We should never “Ignore!” an error — unless there is a really good reason to do so.

  • During development we should apply the "Fail fast!" approach.

  • In critical applications the "Fail safe!" approach must be implemented in order to minimize damages.

Fail fast!

The fail fast principle means stopping the current operation as soon as an unexpected error occurs, so that the failure is immediate and visible.

The Fail fast! approach helps in debugging.

  • As soon as something goes wrong, the application stops and the error message helps to detect, diagnose and correct the error.

  • Therefore the Fail fast! approach leads to more reliable software, reduces development and maintenance costs, and prevents frustrations and catastrophes that might otherwise surface in production.

  • Even if a bug doesn’t lead to a severe failure, it is always best to detect it as soon as possible, because the cost of fixing a bug rises steeply the later it is found in the development cycle (compile time, test time, production).

The consequences of bugs appearing during development mode are generally not harmful. The customer doesn’t complain, money doesn’t go to the wrong account, and rockets don’t explode.

“‘Failing immediately and visibly’ sounds like it would make your software more fragile, but it actually makes it more robust. Bugs are easier to find and fix, so fewer go into production.”

-- Jim Shore / Martin Fowler — Fail Fast

Immediate and visible failure

The opposite of failing fast is failing silently:

try {
    doSomething();
} catch (Exception ex) {
    log.error(ex);
}

A common justification for wrapping everything in a generic try-catch block is that it makes the software feel more stable by hiding errors from end users. This is generally bad practice.

The problem with such an approach is that instead of revealing issues in the software, we mask them and thus lengthen the feedback loop. If something goes wrong with the application, it won’t be obvious. The incorrect behavior is hidden from developers and end users and might go unnoticed for a long time.

Moreover, the application’s persisted state may get corrupted if the code continues executing after an error has occurred.

The fix here is simple: add a "throw" statement to the catch block, so that the error still propagates after it has been logged:

try {
    doSomething();
} catch (Exception ex) {
    log.error(ex);
    throw ex;
}
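To make the risk of corrupted state concrete, here is a minimal sketch that assumes a hypothetical order workflow in which a list stands in for persistent storage. None of these names come from a real system.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: swallowing an error lets bad data reach storage.
final class RethrowDemo {
    // Stands in for a database table (assumption for illustration).
    static final List<String> savedOrders = new ArrayList<>();

    static int computeTotal(int quantity, int unitPrice) {
        if (quantity <= 0) {
            throw new IllegalArgumentException("quantity must be positive, got " + quantity);
        }
        return quantity * unitPrice;
    }

    // Fail silently: the error is printed, then a bogus total of 0 is saved.
    static void placeOrderSilently(int quantity, int unitPrice) {
        int total = 0;
        try {
            total = computeTotal(quantity, unitPrice);
        } catch (RuntimeException ex) {
            System.err.println("error: " + ex.getMessage());
        }
        savedOrders.add("total=" + total);
    }

    // Fail fast: the error is printed and rethrown, so the save never runs.
    static void placeOrderFailFast(int quantity, int unitPrice) {
        int total;
        try {
            total = computeTotal(quantity, unitPrice);
        } catch (RuntimeException ex) {
            System.err.println("error: " + ex.getMessage());
            throw ex;
        }
        savedOrders.add("total=" + total);
    }
}
```

Calling placeOrderSilently(0, 5) stores an order with a total of 0, whereas placeOrderFailFast(0, 5) throws and leaves the stored orders untouched.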

Consider a method that reads a property from a configuration file. What should happen when the property isn’t present? A common approach is to return null or a default value:

public int maxConnections() {
    String maxConnections = getProperty("maxConnections");
    if (maxConnections == null) {
        // Property missing: silently fall back to a default.
        return 10;
    } else {
        return Integer.parseInt(maxConnections);
    }
}

For the code that returns a default value, everything will seem fine. But if the property is missing or misspelled, the software silently runs with the default of 10 connections, and when customers start using it, they’ll encounter mysterious slowdowns.

The outcome is much different when we write the software to fail fast. Then the software stops functioning, and the developer slaps his or her forehead and spends a little time fixing the problem.

A program that fails fast will throw an exception:

public int maxConnections() {
    String maxConnections = getProperty("maxConnections");
    if (maxConnections == null) {
        // Property missing: fail immediately and say where to look.
        throw new IllegalStateException("maxConnections property not found in " + this.configFilePath);
    } else {
        return Integer.parseInt(maxConnections);
    }
}
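A self-contained variant of this idea might look as follows. It is only a sketch: it assumes the configuration has already been loaded into a java.util.Properties object, the class name ConnectionConfig is invented, and the choice of IllegalStateException is one reasonable option among several. It also fails fast when the value is present but not a number.

```java
import java.util.Properties;

// Hypothetical wrapper around already-loaded configuration properties.
final class ConnectionConfig {
    private final Properties properties;
    private final String configFilePath; // used only to give error messages context

    ConnectionConfig(Properties properties, String configFilePath) {
        this.properties = properties;
        this.configFilePath = configFilePath;
    }

    int maxConnections() {
        String raw = properties.getProperty("maxConnections");
        if (raw == null) {
            // Missing property: fail immediately and name the file to fix.
            throw new IllegalStateException(
                "maxConnections property not found in " + configFilePath);
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException ex) {
            // Malformed property: fail just as loudly, keeping the original cause.
            throw new IllegalStateException(
                "maxConnections in " + configFilePath + " is not an integer: '" + raw + "'", ex);
        }
    }
}
```

Calling maxConnections() once during startup, rather than on first use, moves the failure even earlier: a broken configuration then stops the application before it serves any request.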

Fail-fast fundamentals

Assertions are the key to failing fast. An assertion is a tiny piece of code that checks a condition and then fails if the condition isn’t met. So, when something starts to go wrong, an assertion detects the problem and makes it visible.

When writing an assertion, think about what kind of information you’ll need to fix the problem if the assertion fails. Include that information in the assertion message. Don’t just repeat the assertion’s condition; the stack trace will lead to that. Instead, put the error in context.
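The following sketch shows one way to write an assertion whose message puts the failure in context. The Check helper and the Account class are hypothetical. A small helper is used here because Java's built-in assert statement is disabled unless the JVM is started with the -ea flag.

```java
// Hypothetical assertion helper that is always enabled.
final class Check {
    private Check() {}

    static void that(boolean condition, String message) {
        if (!condition) {
            throw new IllegalStateException(message);
        }
    }
}

// Hypothetical domain class using the helper.
final class Account {
    private final String id;
    private long balanceCents;

    Account(String id, long balanceCents) {
        this.id = id;
        this.balanceCents = balanceCents;
    }

    void deposit(long amountCents) {
        // The message adds context (which account, which value) instead of
        // merely repeating the condition "amountCents > 0".
        Check.that(amountCents > 0,
            "Deposit to account " + id + " rejected: amount was " + amountCents
                + " cents, expected a positive value");
        balanceCents += amountCents;
    }

    long balanceCents() {
        return balanceCents;
    }
}
```

When the check fails, the stack trace already points at the failing line, so the message is free to carry the data needed to reproduce the problem.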

You can create a global exception handler to gracefully handle unexpected exceptions, such as assertion failures, and bring them to the developers’ attention.

If you use a global exception handler, avoid catch-all exception handlers in the rest of your application. They’ll prevent exceptions from reaching your global handler.
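In Java, one possible way to install such a handler is Thread.setDefaultUncaughtExceptionHandler. The sketch below only records and prints the error. A real handler would more likely notify developers through whatever reporting channel the project uses, which is left out here. The class and field names are hypothetical.

```java
// Hypothetical global handler: reports any exception that no other code caught.
final class GlobalHandlerDemo {
    // Last error seen by the handler; kept only so the effect can be observed.
    static volatile Throwable lastReported;

    static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            lastReported = error;
            System.err.println(
                "Unexpected error in thread '" + thread.getName() + "': " + error);
        });
    }

    public static void main(String[] args) throws InterruptedException {
        install();
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("invariant violated in worker");
        }, "worker-1");
        worker.start();
        worker.join(); // the handler runs in the worker thread before it terminates
        System.err.println("Handler saw: " + lastReported);
    }
}
```

The handler is only reached by exceptions that propagate all the way up, which is why catch-all blocks elsewhere in the application defeat it.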

Fail safe!

In development mode we should always apply the Fail fast! approach.

In production mode:

  • We should generally favor the Fail fast! approach by default.

  • Critical applications that risk leading to high damages in case of a malfunction need customized, context-specific and damage-eliminating (or at least damage-reducing) behavior.

  • Fault-tolerant systems must apply the "Fail safe!" approach and react appropriately to each error.

  • Repair what you can — but when you must fail, fail noisily and as soon as possible.

  • Errors should preferably be automatically detected at compile-time, or else as early as possible at run-time.
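As an illustration of "repair what you can, but fail noisily when you must", the sketch below assumes a hypothetical service that reads a value from an unreliable source. If the source fails and a previously fetched value exists, the service reports the problem and continues with that value. If there is nothing to fall back on, it fails immediately. All names are invented, and whether a stale value is an acceptable repair depends entirely on the application.

```java
import java.util.function.Supplier;

// Hypothetical fail-safe wrapper around an unreliable value source.
final class ExchangeRateService {
    private final Supplier<Double> remoteSource; // stands in for a remote call
    private Double lastKnownRate;                // null until the first success
    private int degradedCalls;                   // how often the fallback was used

    ExchangeRateService(Supplier<Double> remoteSource) {
        this.remoteSource = remoteSource;
    }

    double currentRate() {
        try {
            double rate = remoteSource.get();
            lastKnownRate = rate;
            return rate;
        } catch (RuntimeException ex) {
            if (lastKnownRate == null) {
                // Nothing to repair with: fail noisily and as soon as possible.
                throw new IllegalStateException(
                    "Exchange rate source failed and no cached rate is available", ex);
            }
            // Repair what we can, but keep the error visible.
            degradedCalls++;
            System.err.println("WARN: rate source failed (" + ex.getMessage()
                + "), using cached rate " + lastKnownRate);
            return lastKnownRate;
        }
    }

    int degradedCalls() {
        return degradedCalls;
    }
}
```

The error is never ignored in either branch: it is either thrown or reported and counted, so a monitoring check on degradedCalls() could still alert developers.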
