First Attempt at Gathering DORA Metrics

I attempted to get all 4 DORA metrics today from our AWS pipeline, and wow, it was a lot harder than I thought to both get them AND do what I thought were simple calculations (some of the extra math ended up being fun).

Deployments per day ended up just being a recursive call to list the executions from CodePipeline. However, parsing the data went from just 12 lines of JavaScript to hundreds very quickly: defaulting all days to 0, grouping executions by day with Lodash groupBy and taking each group's Array length… insanity. Time to TDD it tomorrow after work. I should probably use a real programming language at this point, given how much work this side project is.
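
A minimal sketch of what that pagination-plus-grouping looks like, assuming the AWS SDK for JavaScript v3 and a placeholder pipeline name (the "Succeeded means a deployment" filter is my assumption too):

```js
// Sketch: count CodePipeline executions per day, defaulting missing days to 0.
// Assumes AWS SDK for JavaScript v3 and Lodash; "my-pipeline" is a placeholder name.
const { CodePipelineClient, ListPipelineExecutionsCommand } = require("@aws-sdk/client-codepipeline");
const { groupBy } = require("lodash");

const client = new CodePipelineClient({});

// Recursively page through every execution for the pipeline.
const listAllExecutions = async (pipelineName, nextToken, acc = []) => {
  const { pipelineExecutionSummaries, nextToken: token } = await client.send(
    new ListPipelineExecutionsCommand({ pipelineName, nextToken })
  );
  const all = [...acc, ...pipelineExecutionSummaries];
  return token ? listAllExecutions(pipelineName, token, all) : all;
};

const deploymentsPerDay = async (pipelineName, daysBack = 30) => {
  const executions = await listAllExecutions(pipelineName);
  const succeeded = executions.filter((e) => e.status === "Succeeded");
  const byDay = groupBy(succeeded, (e) => e.startTime.toISOString().slice(0, 10));

  // Default every day in the window to 0 so quiet days still show up.
  const counts = {};
  for (let i = 0; i < daysBack; i += 1) {
    const day = new Date(Date.now() - i * 86400000).toISOString().slice(0, 10);
    counts[day] = (byDay[day] || []).length;
  }
  return counts;
};

deploymentsPerDay("my-pipeline").then(console.log);
```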

I realized our deployment numbers on dev are great, but QA is so bad that I'll have to run the math to express the frequency per week and then per month instead of per day. I'm not doing 6 months. We don't ever go to prod (if we do, I'm not aware of it), so I have to get something.
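
Rolling the daily counts up is simple math; a quick sketch, assuming the counts map from the snippet above and a rough 30-day month:

```js
// Sketch: roll daily counts up into average deployments per week / per month.
// Assumes `counts` is the { "YYYY-MM-DD": number } map from the snippet above.
const rollUp = (counts) => {
  const days = Object.keys(counts).length;
  const total = Object.values(counts).reduce((sum, n) => sum + n, 0);
  return {
    perDay: total / days,
    perWeek: (total / days) * 7,
    perMonth: (total / days) * 30, // 30-day month as a rough approximation
  };
};
```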

I haven’t actually worked on the other 3, just doing recon on what it would take to get ’em. I think Mean Lead Time for Change is a bit rough because I’ll have to either run some git command or hit the Bitbucket API to get all the develop and main hashes for the past 5 months, then correlate those to the CodeBuild build that contains each one, THEN find the artifact it built via S3’s list-object-versions (the two aren’t correlated, so I’ll sadly have to match on dates). Once I have 1 to X of those, I can easily figure out which CodePipeline execution each belongs to. The CodePipeline execution finish time minus the Bitbucket commit time should give me “how much time passed”. There may be a way to have CodeBuild output the commit hash so it’s easier to correlate, but that doesn’t fix CodePipeline being triggered by S3 with no correlation.
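
Here's a rough sketch of how I'm picturing the date-based correlation, assuming AWS SDK v3 clients and placeholder project/pipeline names (the git shell-out is just one option; the Bitbucket API would work too, and pagination is omitted for brevity):

```js
// Sketch of the date-based lead time correlation: commit time -> CodeBuild build
// for that hash -> first CodePipeline execution that finished afterwards.
// "my-build-project" / "my-pipeline" are placeholder names.
const { execSync } = require("child_process");
const { CodeBuildClient, ListBuildsForProjectCommand, BatchGetBuildsCommand } = require("@aws-sdk/client-codebuild");
const { CodePipelineClient, ListPipelineExecutionsCommand } = require("@aws-sdk/client-codepipeline");

const codebuild = new CodeBuildClient({});
const codepipeline = new CodePipelineClient({});

// Commit timestamp straight from the local clone.
const getCommitTime = (hash) =>
  new Date(execSync(`git log -1 --format=%cI ${hash}`).toString().trim());

// Find the CodeBuild build whose resolved source version matches the commit.
// (Pagination omitted; 5 months of builds would need the nextToken loop.)
const findBuildForCommit = async (projectName, hash) => {
  const { ids } = await codebuild.send(new ListBuildsForProjectCommand({ projectName }));
  const { builds } = await codebuild.send(new BatchGetBuildsCommand({ ids }));
  return builds.find((b) => b.resolvedSourceVersion === hash);
};

// No hard link from the S3-triggered pipeline back to the commit, so fall back
// to dates: take the first execution that finished after the build ended.
const leadTimeHours = async (pipelineName, projectName, hash) => {
  const committedAt = getCommitTime(hash);
  const build = await findBuildForCommit(projectName, hash);
  const { pipelineExecutionSummaries } = await codepipeline.send(
    new ListPipelineExecutionsCommand({ pipelineName })
  );
  const execution = pipelineExecutionSummaries
    .filter((e) => e.status === "Succeeded" && e.lastUpdateTime > build.endTime)
    .sort((a, b) => a.lastUpdateTime - b.lastUpdateTime)[0];
  return (execution.lastUpdateTime - committedAt) / 3600000;
};
```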

Mean Time To Recover: I only have 2 incidents I was in where I can find the logs on Microsoft Teams (🤢) and correlate from when the breaking commit was reported to when we actually resolved it. Very manual, need a better way in the future.
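
The math itself is trivial once the timestamps are copied out of Teams by hand; something like this, with placeholder times standing in for the real incidents:

```js
// Sketch: MTTR from the two incidents, timestamps copied by hand out of Teams.
// The incident times below are placeholders, not real data.
const incidents = [
  { reportedAt: new Date("2024-01-10T14:00:00Z"), resolvedAt: new Date("2024-01-10T17:30:00Z") },
  { reportedAt: new Date("2024-02-02T09:15:00Z"), resolvedAt: new Date("2024-02-02T11:00:00Z") },
];

const mttrHours =
  incidents.reduce((sum, i) => sum + (i.resolvedAt - i.reportedAt), 0) /
  incidents.length /
  3600000;

console.log(`MTTR: ${mttrHours.toFixed(1)} hours`);
```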

Change Failure Rate, beyond the 2 incidents, is just listing all the CodeBuild and CodePipeline executions, filtering out the failures, and dividing those by the total.
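
Something like this for the CodePipeline side, reusing the listAllExecutions helper from the first snippet (the CodeBuild side would look the same with ListBuildsForProject/BatchGetBuilds):

```js
// Sketch: change failure rate from CodePipeline execution statuses.
const changeFailureRate = async (pipelineName) => {
  const executions = await listAllExecutions(pipelineName);
  const failed = executions.filter((e) => e.status === "Failed").length;
  return failed / executions.length;
};
```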

So yeah, tracking how long a commit takes to get to dev/QA is the only super hard, super miserable-not-taking-advantage-of-my-skills one. Still haven’t figured out how I’ll track these over time, but at least the code, when I’m done, outputs a CSV which is easy to copy-pasta into Google Sheets. I started doing some of the math in Excel, but it was easier to test and verify it in code, generate the CSV, and just use Excel to make things pretty.
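
The CSV bit is about as small as it sounds; roughly this, reusing deploymentsPerDay from the first snippet:

```js
// Sketch: dump the day -> count map as CSV for pasting into Google Sheets.
const fs = require("fs");

const toCsv = (counts) =>
  ["date,deployments", ...Object.entries(counts).map(([day, n]) => `${day},${n}`)].join("\n");

deploymentsPerDay("my-pipeline").then((counts) => fs.writeFileSync("deployments.csv", toCsv(counts)));
```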

What a pain. I sure hope these help.