Friends have asked that I report a from-the-trenches update on my thoughts on Lambdas. I fell in love about a year ago, so how do I feel now? I’ll cover the pros and cons I faced, what I learned, and what I still don’t know.
If you read my previous article on what Lambdas are, and how you build an automation workflow around them, you’ll see why I like them. In short, a front-end developer with a bit of back-end Node experience can use continuous deployment to get working REST services (or non-REST ones) up quickly, without worrying about maintenance or sudden spikes in traffic. I’m a coder, and with the AWS Node and Python SDKs, I can code my way onto the cloud vs. learning Unix, various weirdness with Terraform/Chef/Ansible/Docker, etc.
I have Amazon manage my infrastructure so I can focus on coding. I love it.
If you utilize multiple AWS accounts and/or environments, the gotchas are basically your permissions. Specifically, IAM roles and what permissions those IAM roles have. If you follow the whole dev/qa/staging/prod etc. life cycle of testing your code in each environment and then “moving” to a higher environment once the code passes all testing, you’ll find you do the same thing here. In my experience, you’re just testing two things: can my Lambda use its permissions the same way (log to CloudWatch, access a particular S3 bucket, etc.), and is connectivity intact, or did you fat-finger something somewhere (e.g. the subnet/security group in the Terraform that’s creating your Lambda isn’t correct for the prod environment)? Beyond that, everything seems to operate the same between environments, which I loved.
Lesson: As long as you avoid configuration drift, you’re fine.
No API Gateway?
Lambdas are built for burst traffic or the occasional code run for simple services. However, they do NOT require an API Gateway. Meaning, you do not need a URL to have your code run. A Lambda truly is a “function”, or a bunch of code. How that code is run doesn’t have to be a URL being put in the web browser.
AWS is flexible. Lambdas are actually functions, not actual “APIs”. Meaning, anything can trigger them; they ARE functions. Triggers include files put on an S3 bucket (hard drive), some log messages in CloudWatch that match a string, and even periodic cron jobs managed by AWS. I can personally attest that during the S3 outage last year, when it came back up, not 1 of my messages over a 2 hour period was lost in my CloudWatch cro
More important, though, is that Lambdas can invoke other Lambdas.
Lesson: Even with AWS, you can code yourself out of any negative situation. :: flexes bicep :: Take a look at the triggers that AWS provides for Lambdas. You truly can react to a variety of events.
Lambdas Calling Lambdas
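To make the "it's just a function" point concrete, here is a minimal sketch of a Python Lambda reacting to an S3 upload instead of a URL. The name `handler` and the returned dictionary are my own choices for illustration; the `Records` / `s3` / `bucket` / `object` layout is the shape S3 notifications use.

```python
import urllib.parse


def handler(event, context):
    """Entry point Lambda calls; `event` carries whatever the trigger sent."""
    uploaded = []
    for record in event.get("Records", []):
        # S3 notifications nest the bucket and object key under "s3".
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces become "+"), so decode them.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        uploaded.append(f"s3://{bucket}/{key}")
    print(f"Received {len(uploaded)} object(s): {uploaded}")
    return {"uploaded": uploaded}
```

The same function body would work behind any other trigger; only the shape of `event` changes.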
Another subtle benefit is that you can call them either as a request / response for a more REST type of feel, or as a fire and forget for longer running processes. While I haven’t had the need yet, I loved playing with Step Functions for when you have a bunch of microservices you’re trying to coordinate in a concurrent fashion.
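The two calling styles map onto the `InvocationType` parameter of the SDK's `invoke` call. The sketch below is one way to wrap it, not the only one; the helper name `invoke_lambda` and its `wait` flag are hypothetical, and the client is passed in so the function can be exercised without AWS.

```python
import json


def invoke_lambda(client, function_name, payload, wait=True):
    """Invoke another Lambda.

    wait=True  -> "RequestResponse": block until the callee returns.
    wait=False -> "Event": queue the call and return immediately.
    """
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse" if wait else "Event",
        Payload=json.dumps(payload).encode("utf-8"),
    )
    if not wait:
        # Async invocations only acknowledge receipt (HTTP 202), no result body.
        return response["StatusCode"]
    # Synchronous invocations return the callee's result as a readable stream.
    return json.loads(response["Payload"].read())
```

With boto3 installed, `client` would be `boto3.client("lambda")`, and the calling Lambda's IAM role needs permission to invoke the target function.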
Errors and Logging
Be aware that some connection errors aren’t verbose. This may be for security reasons. 99% of my “Lambda can’t connect to something” errors had to do with a misconfigured security group, subnet, or IAM permission. That’s good to know since those things can be easily automated using Terraform or things like the SDK.
For logging, while Lambdas typically log to CloudWatch by default, make sure you log to a single log stream so you can utilize the console or SDK to more easily find logs in a time window. Searching various log streams with dates in the titles over time to find a particular log is harder if it’s not all in the same stream.
Years ago, Lambdas were limited to 1 minute. Then they got 5 minutes. 5 minutes is still not enough once you have so much power at your fingertips, heh.
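As a rough sketch of the "find logs in a time window" part through the SDK: the CloudWatch Logs client has a `filter_log_events` call that takes a log group, a filter pattern, and start/end times in milliseconds. The helper name `find_logs` and its parameters are assumptions of mine, and the client is injected so the paging logic can be tried without AWS.

```python
from datetime import datetime, timedelta, timezone


def find_logs(client, log_group, pattern, minutes_back=15, now=None):
    """Return messages in `log_group` matching `pattern` from the last N minutes."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(minutes=minutes_back)
    kwargs = {
        "logGroupName": log_group,
        "filterPattern": pattern,
        # CloudWatch Logs expects timestamps in milliseconds since the epoch.
        "startTime": int(start.timestamp() * 1000),
        "endTime": int(now.timestamp() * 1000),
    }
    messages = []
    while True:
        page = client.filter_log_events(**kwargs)
        messages.extend(e["message"] for e in page.get("events", []))
        token = page.get("nextToken")
        if not token:
            return messages
        # More results remain: ask for the next page with the same filters.
        kwargs["nextToken"] = token
```

With boto3, `client` would be `boto3.client("logs")`. If you do keep everything in one stream, the same call also accepts a `logStreamNames` list to narrow the search.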
If you run into scenarios where 5 minutes isn’t enough, you have 2 options:
- Use Step Functions. This can be challenging for some developers because it inverts the responsibility: those doing the work poll for jobs, vs. you being responsible for telling them what to do. This helps solve concurrency issues, too.
- Use an audit log + CloudWatch cron job. Either use a NoSQL database like Dynamo or an S3 bucket folder. Your Lambdas will do their work, then audit that they “started and stopped” with a UUID. This could be a timestamped data entry in Dynamo, or a file in an S3 bucket. You then have that long running process make an entry with the same UUID/correlation ID in the same audit log (Dynamo or S3). Finally, you have a CloudWatch cron job launch a Lambda every minute or so to check on the status. When the Lambda finds a stopped job audit entry that matches a start, it can mark the job as successful, error’d, or timed out based on your criteria. These CloudWatch cron jobs are beasts; yes, they can be trusted.
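The status check the cron-triggered Lambda performs in the second option can be sketched as plain Python. Everything here is an assumption for illustration: the entry fields (`job_id`, `event`, `at`, `ok`), the status names, and the 30 minute timeout. In practice the entries would be read from Dynamo or S3 rather than passed in as a list.

```python
from datetime import datetime, timedelta


def job_status(entries, job_id, now, timeout=timedelta(minutes=30)):
    """Classify one job from its audit entries.

    Each entry is a dict: {"job_id", "event": "started" | "stopped",
    "at": datetime, "ok": bool (only on "stopped")}.
    """
    mine = [e for e in entries if e["job_id"] == job_id]
    started = next((e for e in mine if e["event"] == "started"), None)
    stopped = next((e for e in mine if e["event"] == "stopped"), None)
    if started is None:
        return "unknown"
    if stopped is not None:
        return "successful" if stopped.get("ok", False) else "errored"
    # Started but never stopped: still running, or past the deadline.
    return "timed_out" if now - started["at"] > timeout else "running"
```

Matching on the shared UUID is what lets the short-lived checker reason about work that outlives any single Lambda run.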
Deployment is a huge point of contention amongst many developers. I’m a huge proponent of automating everything with the AWS SDK. Others like to use Terraform/Chef/Ansible, etc. Others like to upload ZIP files through the console.
Whatever method you use, I still encourage the use of existing integration testing solutions. AWS’s update process is super easy; you’re literally uploading code to the Lambda directly or an S3 bucket, and “poof”, your Lambda is updated. You can use the environment & versioning if you wish. I personally didn’t like that, though, and instead liked using a blue/green deployment model where you have 2 Lambdas with a blue/green suffix in the name. When you upload new code, you simply switch the trigger as the last step once all your integration tests have been run. If you screw up, no big deal, it’s near instant to switch it back without uploading new code.
Things like ping and health checks are still extremely important, as are dry runs. Pings are simply a way of ensuring your Lambda function is there. If it’s API Gateway, that’s a URL you can hit to ensure it works. If it’s an S3 bucket trigger, you can drop a particular filename in a particular place, and your Lambda can respond either there or in CloudWatch. Health checks are the most important; they ensure your Lambda can talk to all the required services it needs and report back. This goes hand in hand with a dry run, which ensures it can talk to those things but doesn’t affect them, like making accidental database entries. Instead, it’s just ensuring your connectivity (i.e. IAM role permissions, subnet/security groups, VPCs) is all correct.
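The order of operations in that blue/green flow can be sketched like this. The three callables (`upload`, `run_tests`, `switch_trigger`) are hypothetical stand-ins for whatever your SDK or Terraform automation actually does; only the sequencing is the point.

```python
def blue_green_deploy(active, upload, run_tests, switch_trigger):
    """Deploy to the idle color, test it, then point the trigger at it.

    Returns the color that is live afterwards.
    """
    idle = "green" if active == "blue" else "blue"
    upload(idle)               # push new code to the Lambda with the idle suffix
    if not run_tests(idle):    # integration tests hit the idle Lambda only
        return active          # tests failed: the trigger never moved
    switch_trigger(idle)       # last step: live traffic moves to the new code
    return idle
```

Rolling back is then just `switch_trigger(previous_color)`, with no new upload involved.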
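A health check along these lines might look like the sketch below. The function name, the shape of the report, and the idea of passing in a dictionary of checks are all assumptions for illustration; each check should be a read-only call (the dry run part), such as listing a bucket or opening a database connection without writing.

```python
def health_check(checks):
    """Run each named connectivity check and report pass/fail per dependency.

    `checks` maps a dependency name to a zero-argument callable that raises
    if the dependency is unreachable. Checks should be read-only (dry run).
    """
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as error:  # report the failure instead of crashing
            results[name] = f"failed: {error}"
    healthy = all(status == "ok" for status in results.values())
    return {"healthy": healthy, "checks": results}
```

A Lambda handler could return this report when the incoming event carries some agreed-upon flag, so the same function serves both real work and health checks.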
Where Do We Go From Here?
A guy reached out to me on Slack asking for help in Python. He’d never coded in his life, is about my age (38), and manages databases that recently moved from an on-prem situation to AWS. He basically wanted to have an alarm (an AWS log event) trigger a Lambda, and if its cause was the database running out of hard drive space, to allocate 50 megs more. I gave him a crash course in Python and what I had learned of AWS and Lambdas, and he was up and running in about a month in prod with a 30 line Python Lambda.
That’s so awesome.
The people using AWS Batch and other short lived EC2s for minutes to seconds worth of work are moving some of that to Lambdas for price reasons. Various Lambdas are in the background, facilitating various logging, monitoring, or reactionary/reactive roles to help functionality in Ops or for developers. It’s become “just another tool in the toolbox”, not some new, shiny thing to not be trusted by the cynics.
We’re still at the beginning of serverless. It is growing, and more and more people from a variety of skill sets want to use it. That’s telling. If you’re not doing a lot of server-side development and just build frontend websites, I encourage you to take a look at hosting your files on S3 and simply putting CloudFront in front of it. AWS has a good wizard for this.