Introduction
Step Functions have a 32 KB limit on the data passed between steps. I thought “just use S3” was simple, but there are some important details you need to know if you intend to update or delete the data and read it many times. Below, we’ll cover how you can use S3 to work around the limit, which bugs will occur if you’re not careful, and how to prevent them.
TL;DR: Updating an existing file on S3 and then reading it won’t always return what you just wrote; it may take a few seconds. Generate an MD5 hash of your file contents, and pass that hash to any downstream services (Lambda, Batch, etc.) so they can confirm they have the latest version of the file. If they don’t, simply fail with a specific error and have the Step Function wait a second and retry.
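To make the TL;DR concrete, here is a minimal sketch of the hash check in Python, using only the standard library. The names `md5_hex`, `verify_latest`, and `StaleS3ObjectError` are hypothetical and chosen for illustration; the sketch assumes the writer computes the hash before uploading and hands it to the downstream service through the Step Function's state, and that the downstream service has already read the object's bytes from S3.

```python
import hashlib


class StaleS3ObjectError(Exception):
    """Hypothetical error: the bytes read do not match the expected MD5."""


def md5_hex(data: bytes) -> str:
    # Hex digest of the file contents; the writer would pass this
    # downstream alongside the S3 key.
    return hashlib.md5(data).hexdigest()


def verify_latest(data: bytes, expected_md5: str) -> bytes:
    # Fail with a specific error if we appear to have read an older
    # version, so the caller can wait and retry.
    actual = md5_hex(data)
    if actual != expected_md5:
        raise StaleS3ObjectError(f"expected {expected_md5}, got {actual}")
    return data
```

A downstream Lambda or Batch job could call `verify_latest` on whatever it read from S3 and let the error propagate; the Step Function would then catch that specific error, wait a second, and retry the step.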