Large Step Function Data – Dealing With Eventual Consistency in S3


Step Functions limit the data passed between states to 32kb. I thought “just use S3” was simple, but there are some important details you need to know if you intend to update and/or delete the data and read it many times. Below, we’ll cover how you can use S3 to solve this problem, what bugs will occur if you’re not careful, and how to prevent them.

TL;DR: Updating an existing file on S3 and then reading it won’t always return what you just wrote; it may take a few seconds. Generate an MD5 hash of your file contents, and ensure any downstream services (Lambda, Batch, etc.) use that MD5 hash to confirm they have the latest version of the file. If they don’t, simply fail with a specific error and have the Step Function wait a second and retry.

The Problem

This 32kb data limit can quickly pose a problem, and is more likely to occur if you come from a functional programming background. Step Functions allow you to compose your infrastructure into a railway style of programming. The Burning Monk calls it “Orchestration”. Following pure function best practices, you get more predictable code. Using that philosophy in Step Functions, you get more predictable infrastructure.

However, while “a function that takes an input and returns a value” sounds the same as “a Lambda function takes an input and returns a value”, those values tend to be a lot larger in applications that use many Lambdas together. REST API responses and loop iterations that collect data in a reduce style both produce large JSON payloads that can quickly hit the Step Function limit.

AWS recommends you utilize S3 to handle this. However, while S3 is one of AWS’ oldest and easiest to use services, there are some seriously important nuances you need to know about if you’re going to use it to solve the Step Function data limit and intend to update that data over an execution’s lifetime. Specifically, they matter if you plan to update and/or delete that data and read it later. If you only want to write it once and read it, you’re fine.

Building a ZIP File

For context, we use Step Functions and a series of Lambdas + Batch to download PDF files and put them into a ZIP file. Given the sheer number of PDF files, and downstream concurrency concerns, we’re leveraging the Step Functions loop ability. This prevents a whole host of problems for us. Specifically…

Loop State in Failures

A Lambda timing out because of a slow downstream system is no longer a problem; if it does, we just retry. If we retry, we don’t need to keep track of “where we were when we last tried”. That state is handled by the Step Function. If you’ve ever done a Promise.all in a JavaScript loop, or a gather in a Python loop, you know how awesome the above is.

Controlled TPS

We can run many of these Step Functions in parallel using Map and still ensure we’re not inundating downstream services by setting the MaxConcurrency value very low. This allows us to create as many ZIPs as we need at the same time, while still operating within the performance limits of the services we’re using.

Performance Tests

However, once we started running performance tests, we quickly hit the 32kb limit. So we followed AWS best practice, and re-wrote our Lambdas + Batch to read and write their results to S3.

Then we hit the history quota limit. We fixed this by having Step Functions recursively call themselves when they started running low on remaining history events. The recursion would only spawn at most 51 total concurrently running Step Functions, well under our account limit for east and west. One caveat: while AWS recommends using a Lambda to spawn a new execution, you don’t need to do that anymore. Having a Step Function call another one, or recursively call itself, results in an easy to follow trail in the AWS console of “who started you, and who did you start?”. While Step Functions aren’t optimized for tail calls, they don’t mind waiting up to a year for their child/children to return a value.

Enter the new bug.

I’d occasionally see a mis-match between the number of PDFs uploaded to S3 and the list we’d give to Batch to download and make the ZIP. S3 would have 100 PDFs, but Batch would get a list of 99, or sometimes 98, or rarely 97. We intentionally have Batch fail here as yet another data integrity check. It would only happen sometimes. After 2 days of debugging, I found out that when I’d write data to S3 and read it back, it wouldn’t match what I just wrote.


S3 Eventual Consistency

Apparently this is a known thing and I’m just getting informed. Writing to S3 is pretty straightforward:

// AWS SDK for JavaScript v2 (assumed): .promise() turns the request into a Promise
await s3.putObject({ Bucket: "name", Key: "yo.txt", Body: "some data" }).promise()

And you can immediately read it back:

const data = await s3.getObject({ Bucket: "name", Key: "yo.txt" }).promise()
console.log(data.Body.toString()) // "some data"

… but that is NOT how updating an existing file or deleting a file works. From the docs:

Amazon S3 offers eventual consistency for overwrite PUTS and DELETES in all Regions.

Ok, but what does that mean in English? Further down, they give examples:

A process replaces an existing object and immediately tries to read it. Until the change is fully propagated, Amazon S3 might return the previous data.

OMG, this is my bug! Here’s how we fixed it.

MD5 Hash

For security (well, really data integrity) reasons you can create an MD5 hash of your file and put it in the put object call; S3 will then check that what it received matches the hash, and reject the upload if it doesn’t. This ensures your file wasn’t corrupted en route and that S3 saved what you expected. Check out the Content-MD5 header (ContentMD5 in the SDK) specifically in the docs.

However, for Step Functions, OTHERS need to know which “version” of the file to utilize. This MD5 hash is helpful for 2 reasons. First, it’s small, 22 to 24 characters. This won’t negatively contribute to the JSON size problem. Second, other Lambdas or services can simply use that as the input to them from the Step Function; it’s yet another value that comes into their Event.

Generating a Hash

This varies across languages, but in Node.js, before your Lambda writes to S3, you can use this hash function for the contents:

const crypto = require('crypto')
const md5Hash = string => crypto.createHash('md5').update(string).digest('base64')
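To sanity-check the size claim above, here is a minimal, self-contained sketch of the same hash function using Node's built-in crypto module. The sample inputs are only illustrative.

```javascript
const crypto = require('crypto')

// Same helper as above: MD5 digest of a string, base64 encoded.
const md5Hash = string =>
  crypto.createHash('md5').update(string).digest('base64')

// An MD5 digest is always 16 bytes, so its base64 form is always
// 24 characters (22 plus "==" padding), no matter how large the input is.
const small = md5Hash('some data')
const large = md5Hash(JSON.stringify({ files: new Array(10000).fill('a.pdf') }))

console.log(small.length, large.length) // 24 24
console.log(md5Hash('some data') === small) // true: same contents, same hash
```

Because the hash length is fixed, passing it through the Step Function costs the same few bytes whether the file on S3 is 1kb or 100mb.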

Putting The File

First, when you put the file, you generate your own hash using the function above. Second, you pass that hash back to the Step Function so other Lambdas/services can leverage it as the “version” of the file they want. I use “version” loosely here; I don’t mean S3 object versioning. I mean, “I wrote a file that has this hash; if you read a file and the contents don’t match this hash, then you need to wait longer or throw an error”.

First get your hash from the contents:

const localHash = md5Hash(JSON.stringify(myData))

Second, when you upload your file, include that hash in the SDK’s headers:

await s3.putObject({
  Bucket: "bucket name",
  Key: "some-uuidv4.json",
  Body: JSON.stringify(myData),
  ContentMD5: localHash
}).promise()

Third, when the call successfully finishes, pass that along:

const handler = (event, context) =>
  s3.putObject({ ... ContentMD5: localHash ... })
    .promise()
    .then(_ =>
      ({ ...event, localHash }))
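Putting the three steps together, a minimal sketch of a whole handler might look like the following. It assumes the AWS SDK for JavaScript v2 style of `putObject(...).promise()`. The `makeHandler` factory, the `loadData` callback, the `bucket` and `key` event fields, and the in-memory `fakeS3` are hypothetical names, used only so the example runs without AWS.

```javascript
const crypto = require('crypto')

const md5Hash = string =>
  crypto.createHash('md5').update(string).digest('base64')

// Hypothetical factory: takes an S3 client so a fake can be swapped in.
// In a real Lambda you would pass a v2 SDK client instead (assumption).
const makeHandler = (s3, loadData) => async event => {
  const body = JSON.stringify(await loadData(event))
  const localHash = md5Hash(body)
  await s3.putObject({
    Bucket: event.bucket, // hypothetical event field
    Key: event.key,       // hypothetical event field
    Body: body,
    ContentMD5: localHash // S3 rejects the upload if the digest does not match
  }).promise()
  // Only the small hash goes back to the Step Function, not the payload.
  return { ...event, localHash }
}

// In-memory stand-in for S3, only for running this sketch locally.
const fakeS3 = {
  objects: {},
  putObject(params) {
    return { promise: async () => { fakeS3.objects[params.Key] = params } }
  }
}

const handler = makeHandler(fakeS3, async () => ({ pdfs: ['a.pdf', 'b.pdf'] }))

handler({ bucket: 'bucket name', key: 'some-uuidv4.json' })
  .then(result => console.log(result))
// logs the original event plus a 24-character localHash
```

Injecting the client is a design choice for testability; the hash-then-put-then-return order is the part that matters.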

Getting The File

Now any other step of the Step Function will have that localHash variable in its event, and can use it to verify the file it reads is the latest:

const data = await s3.getObject({ Bucket: "name", Key: "some-uuidv4.json" }).promise()
const remoteHash = md5Hash(data.Body.toString())
if(event.localHash !== remoteHash) {
  throw new S3MD5MisMatchBruh() // extends Error class
}

Handling Mis-matches

In most cases the mis-match is because S3 is still propagating your updates to the various servers that represent S3. So if you “just wait longer” it’ll fix itself. How you wait longer, however, is up to you; latency is a common problem in eventually consistent architectures.
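One option, sketched here, is to wait inside the reader itself: re-read the object a bounded number of times and only throw once the attempts run out. This is an illustrative alternative to the Step Function approach described next. The names `readVerified`, `attempts`, `delayMs`, and `makeStaleS3` are hypothetical, and the code again assumes the v2 SDK's `getObject(...).promise()` shape.

```javascript
const crypto = require('crypto')

const md5Hash = string =>
  crypto.createHash('md5').update(string).digest('base64')

// A class that extends Error inherits the name "Error", so set it explicitly.
// Assumption: the error name is what a Step Function's ErrorEquals matches.
class S3MD5MisMatchBruh extends Error {
  constructor(message) {
    super(message)
    this.name = 'S3MD5MisMatchBruh'
  }
}

const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))

// Re-read until the contents match the expected hash, or give up.
const readVerified = async (s3, params, expectedHash, attempts = 5, delayMs = 1000) => {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    const data = await s3.getObject(params).promise()
    const contents = data.Body.toString()
    if (md5Hash(contents) === expectedHash) return contents
    if (attempt < attempts) await sleep(delayMs)
  }
  throw new S3MD5MisMatchBruh(`hash mismatch after ${attempts} reads`)
}

// Stand-in that serves the old body for the first `staleReads` reads.
const makeStaleS3 = (oldBody, newBody, staleReads) => {
  let reads = 0
  return {
    getObject: () => ({
      promise: async () => ({
        Body: Buffer.from(reads++ < staleReads ? oldBody : newBody)
      })
    })
  }
}

readVerified(makeStaleS3('old', 'new', 2), { Bucket: 'name', Key: 'k' }, md5Hash('new'), 5, 10)
  .then(contents => console.log(contents)) // "new", on the third read
```

The trade-off is that the Lambda is billed while it sleeps, which is one reason to let the Step Function do the waiting instead.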

For us, we know about this error, so we can simply retry: if the reading step fails with this particular Error, a Catch sends the Step Function to a Wait for 1 second, which then goes back to Read File:

"Catch": [ {
  "ErrorEquals": [ "S3MD5MisMatchBruh" ],
  "Next": "Wait 1 Second"
} ]

"Wait 1 Second": {
  "Type": "Wait",
  "Seconds": 1,
  "Next": "Read File"
}
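The Catch-and-Wait loop above keeps looping for as long as the hashes disagree. A hedged alternative is the Task state's built-in Retry field, which waits between attempts and caps how many are made. The sketch below builds such a state as a plain JavaScript object and prints it as JSON; the Lambda ARN and the MaxAttempts value are placeholders, not values from our system.

```javascript
// Sketch of a "Read File" Task state using Retry instead of Catch + Wait.
const readFileState = {
  Type: 'Task',
  // Placeholder ARN: substitute your own reader Lambda.
  Resource: 'arn:aws:lambda:us-east-1:123456789012:function:read-file',
  Retry: [
    {
      ErrorEquals: ['S3MD5MisMatchBruh'], // same error name the reader throws
      IntervalSeconds: 1, // wait 1 second before the first retry
      MaxAttempts: 5,     // assumed cap; once exhausted, the state fails
      BackoffRate: 1      // keep the wait constant at 1 second
    }
  ],
  End: true
}

// Serialize to the JSON that would go into the state machine definition.
console.log(JSON.stringify({ 'Read File': readFileState }, null, 2))
```

A cap turns a file that never becomes consistent into a visible failure instead of an execution that loops until it runs out of history events.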


While MD5 hashes have a bad reputation for cryptography, they’re great for checksums like this. This ensures you can utilize S3 as a datastore for your Step Function execution well over the 32kb limit. To ensure your Step Function executions can run concurrently, include some form of uuid v4 in the file name; that way there are no name collisions with other files in S3.

To learn more about hashes, I’ve got a YouTube video explaining them in more detail.