Errors as Values: Free Yourself From Unexpected Runtime Exceptions

Introduction

When I try to sell people on Functional Programming, I’ll say things like “Imagine a world with no null pointer exceptions”. That’s a bit misleading, as what I’m actually referring to is the power of sound types.

However, Functional Programming assumes that you do not have runtime Exceptions at all. Instead, functions that can fail return whether they worked or not. When referring to this, people will sometimes say “Errors as Values”, as opposed to errors being runtime exceptions that carry the error inside them. That belief system, rather than sound types, is what I want people to embrace: many developers use dynamic languages, so the belief has more impact in those type-less areas.

It’s quite an alien viewpoint, and it’s hard to visualize how you’d program this way if you’ve never been exposed to it. This is especially true in non-FP languages (Go and Lua excepted), where returning errors as values can look weird.

This is a bit nuanced, so I wanted to cover this core concept here so people clearly understand that you can live in a programming world without unexpected runtime exceptions. Keyword there: “unexpected”. You do this by returning errors from functions instead of intentionally raising them. Optionally, adding sound types can extend that guarantee to 100% of your code, though they still won’t save you from resource-exhaustion exceptions.

The benefit to you? Your code is more predictable, you can release to production with more confidence, and you can deliver more features, faster.

You do this by treating errors as values: just as you return a string, number, or discriminated union from a function, so too can you return an error instead of throwing/raising it.
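A minimal sketch of the idea in Python, the language most of this article’s examples use. The hypothetical parse_int below hands back a ValueError as its return value instead of raising it, and the caller checks which kind of value it got.

```python
from typing import Union


def parse_int(text: str) -> Union[int, ValueError]:
    """Hypothetical helper: return the parsed int, or the error as a value."""
    try:
        return int(text)
    except ValueError as error:
        # Hand the error back to the caller instead of letting it propagate.
        return error


result = parse_int("forty-two")
if isinstance(result, ValueError):
    print("could not parse:", result)
else:
    print("parsed:", result + 1)
```

The error is an ordinary object the caller can inspect, log, or pass along. Nothing unwinds the stack.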

Why Treat Errors as Values?

Your code gains 4 advantages when you do it this way.

2 Outcomes of Code vs. 3

All functions have only 2 possible outcomes: they work or they don’t. This is as opposed to 3: it works, it doesn’t, or it throws an unexpected error (as opposed to an intentional throw or raise).

2 Outcomes of Program vs. Exponentially Large

When you start combining these functions into a program, your program now either works or it does not. Runtime exceptions, by contrast, cause 2 horrible problems that grow exponentially as the program grows. First, they start occurring in unexpected areas of your code, making it hard, and in dynamic languages sometimes impossible, to track exactly where you need to put try/catches. Second, even in strongly typed languages you can still get uncaught null pointers, so your program now has 3 possible outcomes: it works, it fails, or it unexpectedly fails. The typical dynamic language approach here is to lean on the power of dynamic languages: run the code quickly to suss out all the unexpected paths, find them, then fix them.

It’s not technically correct to say “2 outcomes”, as you may get a Union type that has numerous possible states; I just mean your program always returns either “it worked” or “some deviation”.
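To make the “2 outcomes” idea concrete, here is a hedged Python sketch using hypothetical helpers parse_port and check_port_range. Each helper returns a (value, error) pair. The program that combines them still has exactly 2 outcomes: a value or an error.

```python
from typing import Optional, Tuple


def parse_port(text: str) -> Tuple[Optional[int], Optional[Exception]]:
    """Hypothetical step 1: turn text into an int, returning any error."""
    try:
        return int(text), None
    except ValueError as error:
        return None, error


def check_port_range(port: int) -> Tuple[Optional[int], Optional[Exception]]:
    """Hypothetical step 2: reject ports outside 1-65535."""
    if 1 <= port <= 65535:
        return port, None
    return None, ValueError(f"port {port} is out of range")


def load_port(text: str) -> Tuple[Optional[int], Optional[Exception]]:
    """The 'program': it either worked or it didn't, never a surprise raise."""
    port, err = parse_port(text)
    if err is not None:
        return None, err  # stop early and forward the error
    return check_port_range(port)


print(load_port("8080"))   # worked
print(load_port("99999"))  # range error, returned as a value
print(load_port("http"))   # parse error, returned as a value
```

However many steps you chain this way, the caller only ever handles the same 2 shapes of result.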

Slightly Less to Test

Your code is easier to test because there is only a true happy path and a true unhappy path; there is no “unexpected path”. You will still get logic errors, have trouble with concurrency, and run out of system resources.

Clear Intent

The intent of your code is clearer, especially in dynamic languages, which have no types to help.

What’s Wrong With Runtime Exceptions?

Beyond Tony Hoare, the inventor of the null reference, calling it his “billion-dollar mistake”, runtime exceptions remove all confidence that your code works 100% of the time, take time away from building features, and encourage creating complexity.

Let me show you some basic examples that illustrate the problem. I’m lumping “all runtime Exceptions” in with null pointers here, as this happens a lot more in dynamic languages than in strongly typed ones.

Here’s a basic Python AWS Lambda:

def handler(event):
  if event['methd'] == 'GET':
    return true
  return False

There are 3 things wrong with this function that will cause it to raise an Exception:

  1. The handler in AWS Lambda for Python requires 2 parameters; we’ve only provided 1: event. JavaScript doesn’t enforce function arity, so there you can safely ignore the 2nd parameter, context; not so in Python. This may work in unit tests, but not when deployed to AWS and invoked.
  2. The event is a JSON object (a Python dictionary) from an Application Load Balancer. It’ll have a method such as GET or POST, some headers, and possibly query string parameters and a body. However, we misspelled method without the “o” as methd, so once the first error is fixed, it’ll fail with a KeyError at runtime when the Lambda is invoked.
  3. Python Booleans are capital “T” True and capital “F” False. Our False at the bottom is correct, but our lowercase true is not, and it will raise a NameError… exactly when the request is successful.
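For comparison, here is one way the handler could look with all 3 problems fixed. It’s a sketch that keeps the article’s "method" key. AWS documents the ALB request field as httpMethod, so check the event shape for your trigger. Using dict.get means a missing key comes back as None instead of raising a KeyError.

```python
from typing import Any, Dict


def handler(event: Dict[str, Any], context: Any) -> bool:
    # 1. Accept both parameters AWS Lambda passes: event and context.
    # 2. Spell the key correctly, and use .get so a missing key yields None
    #    rather than raising KeyError.
    # 3. Use Python's capitalized True/False.
    if event.get("method") == "GET":
        return True
    return False


print(handler({"method": "GET"}, None))
print(handler({}, None))
```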

In Python, you don’t know about these problems unless you use Python 3’s optional type hints, run some sort of linter to find these common issues, or, as in most dynamic languages, “run the code”. Even then, the unit test might miss the arity bug. Running the code is a common practice in dynamic languages, and for good reason: fast feedback loops.

However, feedback loops eventually end. At some point your code needs to go to production, where a computer runs it instead of you. That doesn’t excuse a slow CI/CD process; you still want to be able to respond to issues in production and remediate them quickly. But you also want some assurance that you won’t have to. In dynamic languages, that assurance usually comes from a copious amount of automated and manual testing to suss out problems like the ones above.

In summary, we don’t know about the problems until we run the code, use add-on, non-standard tools to augment our language, and write lots of automated and manual tests. We’re not just referring to languages and their runtimes such as Python, JavaScript, Lua, Elixir, and Ruby. We’re also referring to languages that have strong typing but can still produce null pointer exceptions, such as Java, Kotlin, Go, C#, F#, and TypeScript, to name a few. The type systems in those languages do not guarantee the absence of runtime exceptions.

These problems matter because, despite these add-ons and tests, emergent errors can still occur in production, which is exactly where we do not want unknown errors. The results are unplanned reactions, unplanned UI issues, and general downtime for customers, along with stress for engineers and their teammates.

Mitigation Strategies

There are typically 5 mitigation strategies, used to varying degrees in non-FP languages, to avoid unexpected runtime exceptions in production systems.

Linters

Linters are used in both dynamic and typed languages, before you run or compile the code. They vary in purpose, but typically format code, help find common errors, and guide you toward language best practices. In typed languages, these tools work alongside the compiler and add quality checks it doesn’t provide natively. Examples include Pylint for Python, ESLint for JavaScript, go vet for Go, and PMD (originally for Java). These can prevent many runtime exceptions.

Try/Catch Blocks

The 2nd is try/catch blocks. In dynamic languages, these are placed around areas more likely to throw. In strongly typed languages, they also go around areas where you’re required to handle errors, such as Java’s checked exceptions.

// JavaScript
try {
  const result = await getDataFromTechnicalDebtFilledAPI()
} catch (error) {
  console.log("API broke again, surprise surprise:", error)
}

There is no guidance on what “more likely” means; you just go with your gut, and developer guts vary. In languages like Go and Lua, errors are instead return values from functions. Much like in a catch block, you choose whether to handle the error or give up and let the program crash.

-- Lua
status, dataOrError = pcall(getData, 1)
if status == false then
    print("failed:", dataOrError)
end
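Python has no pcall, but a similar wrapper is easy to sketch. The hypothetical pcall below mirrors Lua’s convention of returning a status flag first and then either the result or the error. It’s an illustration, not a standard-library function.

```python
from typing import Any, Callable, Tuple


def pcall(func: Callable[..., Any], *args: Any) -> Tuple[bool, Any]:
    """Hypothetical Lua-style protected call: (True, result) or (False, error)."""
    try:
        return True, func(*args)
    except Exception as error:  # Exception excludes KeyboardInterrupt/SystemExit
        return False, error


status, data_or_error = pcall(int, "not a number")
if status is False:
    print("failed:", data_or_error)
```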

In Erlang/Elixir, where the philosophy is to “let it crash”, you still have the opportunity to handle the error, or take some other mitigation strategy.

# Elixir
case result do
  {:ok, data} ->
    transform_data(data)
  _ ->
    log_result_failed()
end

These can handle most known and some unknown runtime exceptions, but will never catch them all, as you’d have to wrap every possible error in a try/catch. It’s a bit easier to do this in Go, and a bit easier to ignore it in Erlang/Elixir.

Types

Types are typically part of the language and help the compiler and/or the runtime understand what the programmer means. If the types make sense, such as adding 2 numbers together, the program will compile.

// TypeScript
const add = (a:number, b:number):number =>
    a + b

If you attempt to add a number to a string like “cow”, the compiler won’t compile the code, and it will tell you where the error is.

add(1, "cow") // <-- won't compile

However, types aren’t just for logic. They also catch simple things like the misspelling in our Python example above, ensure you’re handling potential errors in your code, and stop you from doing dangerous things like adding integers to floats or assuming an Array always has a value inside it.

Types come with 2 costs, though, and these are perceived differently depending on the engineer and language. First, you have to intentionally type things instead of assuming things as in dynamic languages; whether that feels like effort depends on the engineer. Second, the compiler has to compile the program instead of just running it as in dynamic languages, and this can cut deeply into the fast feedback loop.

Also, not all type systems are created equal. Most languages are strict, yet still allow unexpected runtime errors to occur. Some languages are sound, which means the program won’t compile unless all errors are handled. That still doesn’t make them immune to runtime exceptions. In Elm’s case, you can still exhaust the browser’s memory, and the Elm application will crash. In ReScript/OCaml, you can still run out of time or exhaust the CPU/memory cap of an AWS Lambda.

Sound types can also let incorrectness seep through, such as failing to ensure a number is within a particular range or is always even; that’s where dependent types may help.
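Without dependent types, a common workaround is a “smart constructor”: a function that checks the constraint once at the boundary and returns an error value when the check fails. A minimal Python sketch follows. The even-number rule and the names EvenNumber and make_even are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class EvenNumber:
    """Hypothetical wrapper: by convention, only make_even constructs this."""
    value: int


def make_even(n: int) -> Union[EvenNumber, ValueError]:
    # Validate once here; code that receives an EvenNumber can trust it.
    if n % 2 != 0:
        return ValueError(f"{n} is not even")
    return EvenNumber(n)


print(make_even(4))
print(make_even(3))
```

Python can’t stop someone from calling EvenNumber(3) directly, so this is a convention rather than a guarantee. Stronger type systems close exactly that gap.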

Bottom line: types help remove a large swath of potential runtime exceptions, often quickly and without having to run the code, and some can guarantee it. Underestimate their costs at your own peril, though: development time, compile time, and ongoing type maintenance (in TypeScript’s case, or Python 3’s with typing and mypy).

Testing

Once most of the code is written, or beforehand if you’re using Test Driven Development, a combination of unit, property, and functional tests is written and run in an automated fashion. Manual tests are also used, including “just running the app”. Combined, these either ensure no unexpected runtime exceptions occur or ensure that any that do occur are handled. Like linters and try/catch blocks, they cover all the possibilities you’ve accounted for, but not all possibilities.

# Python (pytest)
import pytest

assert add(1, 2) == 3
with pytest.raises(TypeError):
    add(1, "cow")

Let It Crash

The idea was first used (to my limited knowledge) in the Apollo Guidance Computer and later popularized by Erlang: rather than trying to avoid every crash with lots of work and still missing some, accept that crashes can happen. Many developers today do exactly that. In Erlang/Elixir and the Akka framework, it’s common to create a lightweight process whose sole job is to watch a child process. The child process runs the actual code. If the child process crashes, the parent just spawns another one. This philosophy has moved from software to hardware in the disposable hardware movement, and it’s now just assumed that if the software crashes, you spawn an entirely new server.

Examples include Docker containers running on Amazon’s Elastic Container Service (ECS) or Elastic Kubernetes Service (EKS), automatically assigned dynos on Heroku, or simple functions running in AWS Lambda / Azure Functions. In these situations, entire applications can run, and if even 1 has an unexpected runtime exception for whatever reason, that Docker container is shut down and a new one is spun up. Lambda is about the same: if your function runs and fails, whoever is listening for the response gets notified that it crashed. Both Docker and Lambda let you spawn thousands of these at the same time, quickly, with confidence that crashes stay contained. You can also control how often, and how many, are spun up in their place in case of an error.

This doesn’t prevent the errors from happening, and it in no way helps UI developers building web browser or mobile applications. It does, however, limit their blast radius and help your application scale, be resilient, and sometimes self-heal.
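To illustrate the supervisor idea, here is a deliberately simplified, in-process Python sketch. Real Erlang/Elixir supervisors and container orchestrators restart separate processes or containers. The hypothetical supervise function below just re-runs a worker function when it raises, up to a restart limit.

```python
from typing import Callable, Optional


def supervise(worker: Callable[[], str], max_restarts: int = 3) -> Optional[str]:
    """Run worker; if it crashes, log it and start it again, like a supervisor."""
    for attempt in range(1, max_restarts + 2):  # first run + max_restarts retries
        try:
            return worker()
        except Exception as crash:
            print(f"attempt {attempt} crashed: {crash!r}; restarting")
    return None  # Restart limit hit: give up and report failure as a value.


attempts = {"count": 0}


def flaky_worker() -> str:
    """Hypothetical worker that crashes twice before succeeding."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("transient failure")
    return "done"


print(supervise(flaky_worker))
```

The worker itself still crashes. The supervisor only decides what happens next, which is the “limit the blast radius” trade-off rather than prevention.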

Solution: Return Errors From Functions, Don’t Intentionally Throw/Raise

The only way to ensure your code doesn’t throw unexpected runtime errors is to not use exceptions. Instead, return errors from functions.

In most dynamic languages, even errors have a runtime type, such as Error in JavaScript and Exception in Python. You can create them without breaking or stopping your program, inspect them, and even return them from functions.

Most non-FP developers are comfortable handling them in try/catch blocks and, in some cases, throwing/raising them (or custom ones) in their own code.

# Python
def blow_up():
  raise Exception("b00m")

// JavaScript
const blowUp = () => {
  throw new Error("b00m")
}

However, you’ll almost never see them stored in variables and used later:

# Python
def show_error():
  my_boom = Exception("b00m")
  print("my_boom:", my_boom)

// JavaScript
const showError = () => {
  const myBoom = new Error("b00m")
  console.log("myBoom:", myBoom)
}

To a typical Python/JavaScript developer, that’s quite alien. Why would you keep an error around? The whole point is to let the entire program know something went wrong. You do that by throwing/raising the error, not by creating it and hanging onto it for a while.

Golang Method

However, that is exactly how Go works, and Lua can be much the same. Here’s a Go example:

file, err := os.Open("filename.ext")
if err != nil {
  return nil, err
}

3 things to pay attention to here.

First, notice how os.Open returns 2 values instead of 1: a file first, and then an error second. Go allows you to return multiple values from functions, and the convention is data first, error last. You don’t know which one you’re going to get, so if a function can possibly fail, you set up variables for both.

Second, notice how the code first checks to see if err is not a nil value. If it’s not nil, then that means it’s an actual error, and thus something went wrong. Instead of running further code, it’ll stop here.

Third, notice how it returns. This first stops any additional code in this block from running. It also follows the same “function could fail” convention of data first, error second. Since we have no data, we return nil and forward the original error back up the chain.

This convention isn’t used everywhere. Some functions are pure and can’t fail. Others can fail without it mattering, such as writing to a cache; in those cases you just log the error.

The Python Golang Method

Python also supports returning multiple values (as a tuple). This means you can mirror how Go works, and your Python code will look much like Go.

def open_file(filename):
    try:
        # 'with' closes the file even if reading fails
        with open(filename, "r") as f:
            return f.read(), None
    except Exception as e:
        return None, e

And now, to use it, you just mirror the same style:

def print_file():
    file, err = open_file("demofile.txt")
    if err is not None:
        return None, err
    print("file:", file)
    return file, None

Python 3 Result

In Python 3, there is a type called Union in the typing module. It does what it says and unites two or more types into one. With a Union, you don’t have to return multiple values from a function and then check which one is actually not None; you can just return 1 value. There is a rabbit hole of techniques for how you use that value, so we’ll just focus on updating our code above to return that single value.

from typing import Union

def open_file(filename: str) -> Union[str, Exception]:
    ...

Now, when you use it, you’ll either get a string or exception back as a single value.
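Filling in the body, a sketch of the Union version might look like this. Python doesn’t enforce the annotation at runtime; a type checker such as mypy does. The caller narrows the value with isinstance.

```python
from typing import Union


def open_file(filename: str) -> Union[str, Exception]:
    """Return the file's contents, or the error that stopped us, as one value."""
    try:
        with open(filename, "r") as f:  # 'with' closes the file even on error
            return f.read()
    except Exception as error:
        return error


contents = open_file("demofile.txt")
if isinstance(contents, Exception):
    print("error:", contents)
else:
    print("file:", contents)
```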

Promise / Future

While Python 3 Union types help enforce the concept of an “either or” value getting returned, it’s often easier to have a single type of value returned. For functions that can fail, this is extremely useful because it’s a situation where there are only 2 possible outcomes: either it worked or it didn’t. That type can then handle both situations in a common interface.

That’s how Promises or Futures work. JavaScript has Promises built in, Python ships Futures in its standard library (asyncio and concurrent.futures), and Lua has libraries that support them.

// JavaScript (Node.js): the promise-based fs API
const fs = require("fs/promises")
fs.readFile(filename, "utf8")
.then(data => console.log("file data:", data))
.catch(error => console.log("error:", error))

There are a few advantages to this approach. In dynamic languages, it forces you to use more functions instead of imperative code to inspect values. This reduces the risk that you’ll accidentally misspell something or write imperative code that accidentally triggers a runtime Exception.

Additionally, if you always return a Promise, callers always get a value back rather than an exception. If you embrace then/catch chaining instead of async/await with try/catch blocks, you get a built-in try/catch. An exception thrown inside a then callback is automatically turned into a rejection and routed to catch instead of crashing the program.

Finally, no matter what type you return inside the Promise, every function in your program knows how to work with the Promise in the common interface of then for the value, and catch for the Error with the ability to change what’s returned if need be.
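Python’s standard library has a comparable container in concurrent.futures. A Future holds either a result or the exception the call raised, and Future.exception() hands that exception back as a value instead of raising it. A small sketch, where divide is a hypothetical function:

```python
from concurrent.futures import ThreadPoolExecutor


def divide(a: float, b: float) -> float:
    """Hypothetical function that raises ZeroDivisionError when b == 0."""
    return a / b


with ThreadPoolExecutor(max_workers=2) as pool:
    ok_future = pool.submit(divide, 10, 2)
    bad_future = pool.submit(divide, 1, 0)

    # exception() waits for the call, then returns the raised error or None.
    for future in (ok_future, bad_future):
        error = future.exception()
        if error is None:
            print("value:", future.result())
        else:
            print("error:", error)
```

Calling future.result() on the failed future would re-raise the error. Checking exception() first keeps it a value.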

Development Cost

Now that you know how to return Errors as values from functions instead of throwing them, let’s talk about the development costs of this approach and what it affects in your workflow.

Returning vs Throwing

This is a huge change for developers accustomed to throwing exceptions or, at a bare minimum, handling them, often by type. Instead of throw or raise, they’ll use return. Instead of matching on types in catch/except blocks, they’ll pattern match or just use a catch method. Instead of asserting that a function throws some type of error in a unit test, they’ll assert on return values. Also, once you deviate from language norms, most search results for common problems won’t be written in this return-error style.

This has a pretty huge cost in languages that don’t natively support this style of development, such as Java. Languages like JavaScript and Python support basically all styles of programming, so they are more forgiving. Go, Lua, and functional programming languages embrace it, so it should feel natural there.

This is typically a personal or team decision, covering the implementation details and possibly the library choice in languages that do not natively support this style. It’s worth investing the time to try out implementations to ensure everyone is on board.

Debugging Methods

How you debug may change. In Python, for example, returned errors don’t print their stack trace automatically; you have to print it explicitly, such as with the standard traceback module. Letting an exception explode as normal prints the trace to the screen automatically, which is how dynamic-language developers usually deal with the unexpected.
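If an exception was actually raised and caught before being returned, its traceback is still attached to the object as __traceback__. Here is a sketch of printing it on demand with the standard traceback module; load_config is a hypothetical function.

```python
import traceback
from typing import Union


def load_config(text: str) -> Union[dict, Exception]:
    """Hypothetical parser that returns its error instead of raising it."""
    try:
        key, value = text.split("=")
        return {key: value}
    except ValueError as error:  # raised and caught, so a traceback is attached
        return error


result = load_config("no equals sign here")
if isinstance(result, Exception):
    # format_exception(type, value, tb) works on all Python 3 versions.
    print("".join(traceback.format_exception(type(result), result, result.__traceback__)))
```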

Normally, dynamic language programmers run the code and expect unexpected runtime exceptions. The whole methodology is to run the code, fix, test, and repeat in quick iterations. Now you no longer get either the result you expected or an Exception with a stack trace that you didn’t. You look at function return values logged to the screen instead. As this is more of a functional programming mentality, you’re looking for function output values, not variable values or stack traces.

You can still use print statements and debugging breakpoints. You’ll just spend less time wading through stack traces to find where errors occurred; instead, the errors should tell you in what function and module they occurred, and why. More importantly, you’ll have code handling those errors: expected code handling the unexpected. Sometimes a program doesn’t crash but still doesn’t produce what you expected, and there are some learnings here on how to find out why. If side effects are involved, you’ll have more logs, or more return values that indicate whether each side effect succeeded, or at least context to help you understand what might have occurred. If it’s just return values, you’ll learn how to shape your data to include the “was the program successful or not” context in the output value.

Testing

Although not exactly 100%, nearly all of your tests should be in the form of:

  1. a function takes an input
  2. the function returns a value
  3. you assert that value matches what you expect for that input

# Python
file, err = open_file("test.txt")
assert err is None

You can still use stubs and mocks, but there should be far fewer of them. There won’t be any “assert that this block of code eventually throws some type of error”. Now that errors are return values just like normal data, you just assert on the returned data. In class-based architectures, this can feel quite alien. Most classes have methods that do not return values and have lots of side effects, so you cannot easily test them this way. This style of development is not conducive to Object Oriented Programming, and notably, Go doesn’t have classes at all.
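Here is a sketch of what that looks like: a pure function that returns its error, plus plain assert-based tests for both paths. The function parse_age and its rules are hypothetical. There’s no pytest.raises or try/except. The unhappy path is just another return value to compare.

```python
from typing import Optional, Tuple


def parse_age(text: str) -> Tuple[Optional[int], Optional[str]]:
    """Hypothetical: return (age, None) on success or (None, reason) on failure."""
    if not text.isdigit():
        return None, f"'{text}' is not a whole number"
    age = int(text)
    if age > 150:
        return None, f"{age} is not a realistic age"
    return age, None


def test_happy_path() -> None:
    assert parse_age("42") == (42, None)


def test_unhappy_paths() -> None:
    assert parse_age("abc") == (None, "'abc' is not a whole number")
    assert parse_age("200") == (None, "200 is not a realistic age")


test_happy_path()
test_unhappy_paths()
print("all tests passed")
```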

Strict or Sound Types

If you’re using sound, or even strict, types, there is less need to unit test function outputs. Instead, use more property/fuzz tests to ensure that you always get a success result (the data you expect) for good inputs and errors for bad inputs. This ensures the types are doing their jobs.

The only real difference is you’re asserting on the output vs. attempting to try/catch all runs of a property test.
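A property test in that style asserts on outputs across many generated inputs rather than wrapping each run in try/except. The sketch below uses only the standard random module, though a library such as Hypothesis is the more common choice. The function safe_reciprocal is a hypothetical function under test.

```python
import random
from typing import Optional, Tuple


def safe_reciprocal(x: int) -> Tuple[Optional[float], Optional[str]]:
    """Hypothetical function under test: 1/x, with zero reported as an error."""
    if x == 0:
        return None, "cannot divide by zero"
    return 1 / x, None


def check_reciprocal_property(runs: int = 1000, seed: int = 0) -> None:
    rng = random.Random(seed)  # seeded so failures are reproducible
    inputs = [0] + [rng.randint(-1000, 1000) for _ in range(runs)]
    for x in inputs:
        value, err = safe_reciprocal(x)
        # Property: exactly one of value/err is set, and errors only for bad input.
        assert (value is None) != (err is None)
        assert (err is not None) == (x == 0)
        if value is not None:
            assert abs(value * x - 1) < 1e-9


check_reciprocal_property()
print("property held for all generated inputs")
```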

Let It Crash or Not?

This is a big one, and again, should be a team decision. In cloud providers such as AWS, Exceptions are a normal, expected contract in reactive architectures: code is expected to return a value or crash. Violating that contract goes against cloud best practices. AWS was built this way because the software development industry is built this way; not everything follows Go’s, Erlang’s, or Haskell’s error handling philosophies. I have a talk on the various strategies you can take, using Lambda and Step Functions as an example (video | slides).

AWS Lambda triggers often handle their own retries. For example, suppose a Lambda is supposed to process each message from a message queue such as SQS but fails; AWS will automatically retry. This isn’t an accident; it’s a wonderful feature of AWS. However, it flies in the face of the best practice this article suggests: don’t throw errors. If you have an error but don’t throw it, how do you tell AWS about it?

In server environments that use containers like Docker in Elastic Container Service or Elastic Kubernetes Service, it’s expected that if an unexpected runtime exception occurs, that the container will force crash itself so the servers can spin up a new healthy one. Again, crashes are expected and encouraged here.

One way to handle this is unwrapping. Rust and Python’s Returns library follow this technique. You can do all of your pure computations with no runtime exceptions, but as soon as you want to go back to the “imperative world”, you call unwrap. This will get you the value, or raise an Exception if there was an error instead. Think of it as a translator for your pure code to AWS who expects impure code.
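The mechanics are easier to see without depending on a library’s exact API, so here is a tiny hand-rolled Result in Python with bind and unwrap. It’s a sketch of the idea, not the Returns library itself, which names its containers Success and Failure. The classes Ok and Error and the function parse_sqs_body are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Union


@dataclass(frozen=True)
class Ok:
    value: Any

    def bind(self, func: Callable[[Any], "Result"]) -> "Result":
        return func(self.value)  # keep going with the successful value

    def unwrap(self) -> Any:
        return self.value


@dataclass(frozen=True)
class Error:
    reason: str

    def bind(self, func: Callable[[Any], "Result"]) -> "Result":
        return self  # skip remaining steps; the error flows to the end

    def unwrap(self) -> Any:
        # Translate back to the "imperative world" AWS expects: crash.
        raise RuntimeError(self.reason)


Result = Union[Ok, Error]


def parse_sqs_body(body: str) -> Result:
    """Hypothetical step: an empty message body is treated as an error."""
    return Ok(body) if body else Error("empty message body")


print(Ok("hello").bind(parse_sqs_body).unwrap())
try:
    Ok("").bind(parse_sqs_body).unwrap()
except RuntimeError as crash:
    print("unwrap raised:", crash)
```

Everything before unwrap stays exception-free. Only the final call converts an Error into the crash that Lambda’s retry machinery understands.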

For example, here’s some pure Python code that parses SQS messages from AWS:

def handler(event, _):
    # Parentheses let the method chain span multiple lines.
    return (
        verify_event(event)
        .bind(lambda _: parse_sqs_message(event))
        .bind(validate_message)
        .bind(process_message)
    )

If all 4 steps succeed, this Lambda returns Ok(True). That means the event is from SQS, the message is parsed off the event JSON dictionary, it’s validated to be a message type we expected, and it’s removed from the SQS queue. However, if any of those 4 steps fails, it returns an Error("reason"). AWS doesn’t know what an Error("reason") converted to a JSON dictionary means… it’ll just assume the Lambda successfully processed the message, which isn’t true. Simply calling unwrap at the end ensures you either get True or raise an Exception if it’s an Error. The slight catch: your Lambda’s unit test now has to check for an Exception 😜.

Sometimes, though, you want the ability to hand craft a response. Using API Gateway, or Application Load Balancers where your Lambda is a REST API, this is common. Successful? Cool:

{
  "statusCode": 200
}

Failed? Cool:

{
  "statusCode": 500
}

In that case, pattern matching is a better choice where you transform (or map) a Union type return value such as Result to an HTTP response. The example below shows how to do this assuming the Lambda is invoked by API Gateway or an ALB:

def handler(event, _):
    result = verify_event(event).bind(lambda _: do_work())
    # Match on the final Result so both Ok and Error become HTTP responses.
    return convert_to_http_response(result)

Now your convert_to_http_response function is responsible for converting an Ok(True) to { statusCode: 200 } and an Error("reason") to { statusCode: 500 }.

You’ll see a pattern here: each trigger expects its own kind of response (SQS doesn’t care, API Gateway/ALB have strict requirements, lambda.invoke or Step Functions expect JSON or nothing, etc.), yet ALL services follow the “if it crashes, it’s assumed to be a failure” mantra. The exact conversion varies case by case, but the good news is it’s almost always the last function in your Lambda’s chain, so you know where to find it.

Conclusions

Returning errors from functions instead of throwing them helps ensure more predictable code. More predictable code means fewer bugs and more confidence deploying to prod, with more features delivered faster. You can worry less about the dreaded unexpected runtime Exceptions and more about testing logic and concurrency: the really hard problems.

Ignoring unexpected runtime exceptions will continue to cost trillions, in both money and in stress to yourself.

You can avoid these by returning errors from functions, using types to help ensure they’re all handled correctly, while still retaining the ability to convert back in the case of working within infrastructure that expects crashes.

Optionally, you can use languages that support this style natively, so you never have to worry about it again. F# and Go (statically typed) and Lua (dynamically typed) can help you ease into this style once you’ve mastered it in your language of choice. Once you feel comfortable, soundly typed languages like Elm, ReScript, Rust, and Haskell can help you never worry about unexpected runtime exceptions again. Mostly.
