Speaking the Same Language: Mocks, e2e, and Good Code

There were a lot of discussions this week on X and LinkedIn about mocks, e2e, Go vs. C#, and good vs. badly written code. I watched the threads from afar, and if you squint, most people in them have different definitions for those words.

I find it difficult to participate because it’s often 2 people talking about 2 completely different things. Debating a point of view doesn’t make any sense when the people in the thread don’t see that point of view the same way.

I can help with Mocks and Stubs. I can help a little with e2e. I can’t help much with “good code”, and I can’t help you at all comparing language ecosystems. So I’ll just cover the first 3.

As usual, beware of synonyms. In programming, people use many words for the same thing, and sometimes the same word for things that aren’t actually the same, depending on context (e.g. programming language, ecosystem culture, library & function names, or how far someone is in their career).

Mocks

If you find yourself debating whether to use Mocks or not, make sure you and the others in the discussion are talking about the same thing. Ask “What does a ‘Mock’ mean to you?”, then immediately follow up with “Thank you, that’s helpful; now what does a Stub mean to you?”

Gerard Meszaros defined these a while ago in his book xUnit Test Patterns.

Wikipedia summarizes them succinctly.

However, not all of the definitions make sense to me. They also have a heavy Object Oriented Programming bias.

Martin Fowler gave a lot of context around their usage in different styles of testing, which helps clarify. More importantly, beyond his examples, he restated them in more approachable terms in his “Mocks Aren’t Stubs” article.

I’ll put the 2 definitions here:

Mocks: objects pre-programmed with expectations which form a specification of the calls they are expected to receive.

For example, declaring the calls you expect up front (Fowler’s jMock example):

```java
// jMock 1 style, from Fowler's "Mocks Aren't Stubs"
Mock warehouseMock = new Mock(Warehouse.class);
warehouseMock.expects(once()).method("hasInventory")
    .with(eq(TALISKER), eq(50))
    .will(returnValue(true));
```

Or setting up those expectations and verifying them later:

```java
warehouseMock.hasInventory(TALISKER, 50);

// ... later in test ...

warehouseMock.verify();
```

Stubs: provide canned answers to calls made during the test, usually not responding at all to anything outside what’s programmed in for the test.
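To make the difference concrete without depending on any mocking library, here’s a hand-rolled sketch loosely modeled on Fowler’s Order/Warehouse example. The class names, the string-based call log, and the `expect`/`verify` methods are assumptions for illustration, not jMock’s or EasyMock’s API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collaborator, loosely modeled on Fowler's Order/Warehouse example.
interface Warehouse {
    boolean hasInventory(String product, int quantity);
    void remove(String product, int quantity);
}

// System under test: fills the order only if the warehouse has stock.
class Order {
    private final String product;
    private final int quantity;
    private boolean filled = false;

    Order(String product, int quantity) {
        this.product = product;
        this.quantity = quantity;
    }

    void fill(Warehouse warehouse) {
        if (warehouse.hasInventory(product, quantity)) {
            warehouse.remove(product, quantity);
            filled = true;
        }
    }

    boolean isFilled() { return filled; }
}

// Stub: canned answers only; it never complains about how it was called.
class StubWarehouse implements Warehouse {
    private final boolean inStock;
    StubWarehouse(boolean inStock) { this.inStock = inStock; }
    public boolean hasInventory(String product, int quantity) { return inStock; }
    public void remove(String product, int quantity) { /* ignored */ }
}

// Mock: pre-programmed with the calls it expects; verify() fails if they differ.
class MockWarehouse implements Warehouse {
    private final List<String> expected = new ArrayList<>();
    private final List<String> actual = new ArrayList<>();
    private final boolean inStock;

    MockWarehouse(boolean inStock) { this.inStock = inStock; }

    MockWarehouse expect(String call) {
        expected.add(call);
        return this;
    }

    public boolean hasInventory(String product, int quantity) {
        actual.add("hasInventory(" + product + "," + quantity + ")");
        return inStock;
    }

    public void remove(String product, int quantity) {
        actual.add("remove(" + product + "," + quantity + ")");
    }

    // Compares the recorded calls, in order, against the expectations.
    void verify() {
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }
}
```

A stub-based test can only check the Order’s state afterwards; the mock also fails when the Order talks to the Warehouse differently than expected, which is the state versus behavior verification split Fowler describes.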

I’ve seen, just this week, Mocks defined as Dummy Objects/Fixtures, Stubs, Fakes, Spies, and even if/then statements.

we at Microsoft also mock the api calls to reduce flakyness in e2e tests
— Gaurav (@gaurav_compiles) December 13, 2023

In essence, a mock consists of implementing your production code as

if currentlyTesting then doX else doY

Which means you end up testing exactly the thing you don't run in prod.
It's worse than pointless.
— Tom Sydney Kerckhove (@kerckhove_ts) December 13, 2023

“Mocks are bad!”
“So you think Stubs are bad?”
“That’s not what I said.”
“Yes you did.”
OH LOOK AN IMPASSE, FINALLY

I’m not here to tell you to avoid or use Mocks. I just want to make sure y’all are talking about the same thing, and have easy-to-link-to definitions when you have these discussions. You’ll need the same thing when you read blog articles arguing the pros or cons of Mocks: they define the term, you quickly see it doesn’t line up with the definitions above, and now you have to translate what their actual gripe is.

End to End Tests (aka e2e)

e2e is harder to pin down. Hillel Wayne has a succinct definition, and it helps because you can read his definitions of the other kinds of tests alongside it for context.

The great feature, and problem, with Hillel’s definition is that what he calls “Acceptance Tests” comes with the synonyms “End-to-end tests, feature tests”. A great many people make a clear delineation between e2e and Acceptance Tests, Dave Farley in particular. Dave’s gripe with e2e tests is that they test too much and don’t help us understand what failed, a point he makes in his videos.

The spirit of Hillel’s definition (you know we’re already in deep trouble when we’re using the word “spirit” to talk about engineering) does seem to be generally accepted: testing the UI or API as a user would. Sadly, it quickly breaks down once you look at how various devs implement those tests for their systems: none of the implementation details match up.

For example, I only Whitebox test web applications using Cypress. This ensures the tests are deterministic, fast, and repeatable. The trade-off is that we won’t find out an API change broke the app until we deploy to a different environment, typically after the tests have run. The temptation is to use Blackbox tests instead, but it’s easier and more reliable to do exploratory testing first, then find a way to ensure you don’t ship unversioned API changes, or at the very least surface that error more clearly and handle it more gracefully.

Hillel covers some of these nuances, which makes the definition not very helpful in practice. It’s like asking your friend, “Hey, you want to go camping with me?” They say yes, then wonder why they’re suddenly hiking 5 miles up a mountain with just a tarp and a sleeping pad. “Where’s the tent, mattress, heater, and fire pit… and what’s with all this walking!?”

This is why Dave recommends Contract Testing instead: it heads off those API contracts breaking, and if they do break, you know exactly where the problem is and have some strategies to fix it, such as schemas or API versioning. Hillel associates Contract Testing with Integration testing, but if you’ve ever talked to OOP programmers, most have a completely different view of what Integration means; to them, it’s more than 1 class interacting, often without a test double. A completely different thing.

Hillel references Design by Contract.
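Design by Contract, Bertrand Meyer’s idea from Eiffel, writes a routine’s obligations into the code itself as preconditions, postconditions, and invariants, checked on every call. That’s a different kind of “contract” than the one between services that Pact checks. A minimal Java sketch, with a hypothetical `Account` class and hand-rolled `require`/`ensure` helpers standing in for language-level contract support:

```java
// Hypothetical class illustrating Design by Contract with runtime checks.
class Account {
    private long balanceCents;

    Account(long openingCents) {
        require(openingCents >= 0, "opening balance must be non-negative");
        this.balanceCents = openingCents;
    }

    void withdraw(long amountCents) {
        require(amountCents > 0, "amount must be positive");          // precondition
        require(amountCents <= balanceCents, "insufficient funds");   // precondition
        long before = balanceCents;
        balanceCents -= amountCents;
        ensure(balanceCents == before - amountCents, "balance must drop by the amount"); // postcondition
        ensure(balanceCents >= 0, "balance must never go negative");                     // invariant
    }

    long balance() { return balanceCents; }

    // Caller broke the contract: reject before any state changes.
    private static void require(boolean condition, String message) {
        if (!condition) throw new IllegalArgumentException("precondition failed: " + message);
    }

    // Implementation broke its own promise: a bug in this class.
    private static void ensure(boolean condition, String message) {
        if (!condition) throw new IllegalStateException("postcondition failed: " + message);
    }
}
```

Note who gets blamed: a failed `require` points at the caller, a failed `ensure` points at `Account` itself.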

Pact, on the other hand, is a Contract Testing tool (usable for Node.js APIs, among other languages) that attempts to nail down the inputs and outputs Dave is getting at with his push to contract test the isolated parts of your system.
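The consumer-driven idea behind Pact can be sketched without Pact itself: the consumer writes down the response fields it actually relies on, and the provider’s build checks its real responses against that list. The `Contract` class, the map-based response, and the `violations` method below are hypothetical and are not Pact’s API; a real tool also handles HTTP, matching rules, and sharing contracts between teams.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical consumer-driven contract: for one request path, the response
// fields (and their types) that the consumer actually reads.
final class Contract {
    private final String path;
    private final Map<String, Class<?>> requiredFields;

    Contract(String path, Map<String, Class<?>> requiredFields) {
        this.path = path;
        this.requiredFields = new LinkedHashMap<>(requiredFields); // keep declaration order for stable messages
    }

    // Provider-side check: call the provider for this path and report every
    // field the consumer needs that is missing or has the wrong type.
    List<String> violations(Function<String, Map<String, Object>> provider) {
        Map<String, Object> response = provider.apply(path);
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Class<?>> field : requiredFields.entrySet()) {
            Object value = response.get(field.getKey());
            if (value == null) {
                problems.add(path + ": missing field '" + field.getKey() + "'");
            } else if (!field.getValue().isInstance(value)) {
                problems.add(path + ": field '" + field.getKey() + "' is "
                        + value.getClass().getSimpleName() + ", expected "
                        + field.getValue().getSimpleName());
            }
        }
        return problems; // empty list means the provider still honors the contract
    }
}
```

Because each violation names the path and the field, a failing provider build points at exactly which promise broke, and extra fields the consumer never reads don’t cause failures.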

Bryan Finster has a more pragmatic approach, but it comes with the caveat that there’s someone senior around to help the team avoid icebergs later on. He repeatedly cites that Approval Tests are about ensuring “You know the code works”. This is a lot more approachable, and each team can decide how they write the automated approval tests that ensure code is only released when it passes. He encourages a BDD (Behavior Driven Development) approach for those new to Acceptance Testing. However, this only works while people are learning; once they start hitting the footguns and challenges, the tests start earning the bad reputation e2e tests have: slow, non-deterministic/flaky, hard to maintain, etc. This is where you need someone more senior to help them avoid those.
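For readers new to BDD, the core habit is phrasing each test as Given/When/Then in the user’s terms. A minimal sketch in plain Java with no BDD framework, using a hypothetical `Cart` and a made-up $50 free-shipping rule; a real acceptance test would drive the deployed app through its UI or API rather than call the domain class directly.

```java
// Hypothetical domain object for the sketch: a cart with a free-shipping threshold.
class Cart {
    private static final int FREE_SHIPPING_CENTS = 5_000; // assumed rule: $50.00 or more ships free
    private int totalCents = 0;

    void add(int priceCents) {
        totalCents += priceCents;
    }

    boolean qualifiesForFreeShipping() {
        return totalCents >= FREE_SHIPPING_CENTS;
    }
}

// Each scenario reads as Given / When / Then, named after the behavior a user cares about.
class FreeShippingScenarios {
    static void customerSpendingFiftyDollarsGetsFreeShipping() {
        // Given a customer with an empty cart
        Cart cart = new Cart();
        // When they add items totalling $50.00
        cart.add(3_000);
        cart.add(2_000);
        // Then the order ships free
        if (!cart.qualifiesForFreeShipping()) {
            throw new AssertionError("expected free shipping at $50.00");
        }
    }

    static void customerSpendingLessPaysForShipping() {
        // Given a customer with an empty cart
        Cart cart = new Cart();
        // When they add a single $49.99 item
        cart.add(4_999);
        // Then the order does not ship free
        if (cart.qualifiesForFreeShipping()) {
            throw new AssertionError("expected paid shipping under $50.00");
        }
    }
}
```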

Good Code

I have no idea what good code is. Dave Farley defines it as “How easy it is to change”, but my next question is “What does ‘easy’ mean?” Easy for someone of your skill? Your experience level? I have a lot of experience, but it’d be hard for me to change a well-tested Java code base. I believe (another red flag: using “believe” in an engineering post) that to Dave, ‘easy’ means you quickly (sub 1 second) get feedback from your automated unit tests clearly indicating you broke something. He’d probably (probably? Real confidence-inspiring definition here, Jesse) expound on that, saying that when you deploy, you know as quickly as possible what part broke, if anything.

He has a plethora of wonderful videos about code quality and how to write better code, showing clearly bad and good examples; a few of them deliberately do NOT cover unit tests or Test Driven Development, to get at the real basics.

However, he talks about methods. I haven’t used OOP in 9 years, and I code just about every day. Is my code not good? Is there some underlying concept of “long methods bad, short methods good” that I could apply to functions, which are what I primarily use in my work? What if those functions use composition, which gets inherently long once your automated code formatter wraps the lines? How long is too long?

Kent Beck of XP and TDD fame talks about code being Tidy, and in part of defining what Tidy means, he defines Coupling and Cohesion.

The key takeaway is that the Cost of Software is the Coupling. Software Design means investing effort to reduce that coupling so the code is easy to change. Cohesion is when your sub-elements are coupled; Cohesion is good, but more of it is bad because it increases coupling, but it’s good because it’s Cohesive, but… (repeat). Putting code in more files increases Coupling and increases the cost of change.
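Beck’s sense of coupling is relative to a change: two elements are coupled if changing one forces a change in the other. A hypothetical Java sketch (the receipt classes and price format are made up, and amounts are assumed non-negative): the printer and parser below are coupled through an unwritten price format, while `PriceFormat` is the cohesive version that keeps the coupled pieces together in one place.

```java
// Coupled: if the printed price format changes, the parser must change too,
// yet nothing in the code says so.
class ReceiptPrinter {
    static String line(String item, long cents) {
        return item + " $" + (cents / 100) + "." + String.format("%02d", cents % 100);
    }
}

class ReceiptParser {
    static long cents(String line) {
        String price = line.substring(line.lastIndexOf('$') + 1); // silently assumes the printer's format
        String[] parts = price.split("\\.");
        return Long.parseLong(parts[0]) * 100 + Long.parseLong(parts[1]);
    }
}

// Cohesive alternative: the format and its inverse live in one class,
// so a format change touches one place instead of two.
class PriceFormat {
    static String format(long cents) {
        return "$" + (cents / 100) + "." + String.format("%02d", cents % 100);
    }

    static long parse(String text) {
        String[] parts = text.substring(1).split("\\."); // drop the leading '$'
        return Long.parseLong(parts[0]) * 100 + Long.parseLong(parts[1]);
    }
}
```

Switching to, say, a comma decimal separator means editing two classes in the first version and one in the second; whether that’s worth the extra class is exactly the kind of trade-off Beck caveats.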

However, he caveats all of it. Spending too much time on decoupling can be more expensive than dealing with the coupling, and the reverse is also true. There are also Lumpers, who like to see all the things in 1 place (1 Elm file for an entire app), and Splitters, who like many small things across many files and folders (many files over many folders for a Next.js app). More importantly, the emergence of complexity, understanding new Coupling you weren’t aware of, and iterating on ideas based on your current structure all help you reduce the cost of change. These are some of the design processes that help you arrive at Tidy code.

Is Tidy code Good code? Does that mean good code has a low cost of change, thus low coupling, but “enough” cohesion that you understand it? What is low enough coupling and high enough cohesion? Is there some metric, or is it contextual to the software, the team, and the tooling?

There have also been many rejections of DRY code (Don’t Repeat Yourself) lately. Many of us don’t want 3 implementations of something in our code that each work slightly differently, because if we find a bug in 1, we probably have a bug in the other 2 as well, but a slightly different one. Best to turn all 3 into just 1 by DRY’ing your code, right?

Well, maybe not. Sometimes the abstraction we build to make a single implementation work in 3 different places is painful to work with. Sometimes it’s easier to just have the 3 implementations. The Coupling cost of figuring out how the abstraction works means you can no longer see what the code does at a glance; you have to open the abstraction, either in a different file or in a split view if it’s in the same file. Sometimes the abstraction is great initially but doesn’t work later, so some of your code is DRY using the abstraction and some of it isn’t, and now you have half-DRY code.
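A hypothetical sketch of how that plays out: three slightly different greeting builders, then a DRY’d version that folds their differences into parameters. The names and formats are made up; the point is that each difference becomes a flag, and a reader has to open `build` to know what a given combination of flags produces.

```java
// Three "almost the same" implementations, as they might appear in three places.
class Greetings {
    static String email(String first, String last)  { return "Dear " + first + " " + last + ","; }
    static String sms(String first, String last)    { return "Hi " + first + "!"; }
    static String letter(String first, String last) { return "Dear " + last.toUpperCase() + ", " + first + ","; }
}

// The DRY'd version: one implementation, but every difference became a knob.
class Greeting {
    static String build(String salutation, String first, String last,
                        boolean includeLast, boolean lastFirst, boolean upperLast, String ending) {
        String lastName = upperLast ? last.toUpperCase() : last;
        String name;
        if (!includeLast) {
            name = first;
        } else if (lastFirst) {
            name = lastName + ", " + first;
        } else {
            name = first + " " + lastName;
        }
        return salutation + " " + name + ending;
    }
}
```

Both versions produce identical output; the argument is only about which one is cheaper to read and change later.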

This brings in the time argument. When you made the DRY abstraction, it was “the right call”. However, now that the abstraction is painful to work with, making the code DRY was “the wrong call”. Does that retroactively make the original DRY’ing wrong? Should you not DRY in case future you suffers the Too Early Abstraction Cost? Should you DRY because now is the right call, knowing you can ignore or modify the Abstraction in the future because “it’ll be the right call then”?

Some want comments. Some, like Dave, have repeatedly seen comments done poorly, which makes the code worse: readers learn that the comments are either worthless or wrong and should be ignored. That’s bad when some of the comments are actually helpful, because your first impression was so bad that you ignore them anyway.

This back and forth is the pattern: for every code quality metric, you WILL find a smart individual with a counterpoint. Function too long? One person thinks it’s completely clear because it’s very procedural: it’s clear what is happening, in what order, and where, “because it’s all right there”. Others think it should be broken into smaller, clearer methods. The code works and passes tests in both scenarios, mind you. And while developers spend more of their time reading code than writing it, so we should optimize for reading… whose reading are we optimizing for?

Code coverage: should be high. Should be ignored because if you’re using TDD, it tests all the things. Should be ignored, except in React, because only component tests actually test the JSX, but 90% of your logic isn’t in JSX, so only ignore the low component coverage. Should be taken seriously because if you’re not using TypeScript, you won’t know if your component handles the default state of its props correctly. Should be taken seriously because if you’re using TypeScript with noImplicitAny turned off, some props may not be fully typed, which means the compiler will miss cases that only a component test will find. Code coverage and automated testing are pointless for UI developers; we have a separate QA team, and most of the state is on the server anyway. 🙃

The code is slow. A dev makes it faster. Another dev cannot read the optimized code. Is that OK if the other dev just maintains that specific function? What if the project is due in 3 days, and after that the client maintains it? What if the code is owned by the team forever?
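A small, made-up illustration of that tension: two ways to count the 1 bits in an `int`. The second uses a well-known bit trick that loops once per set bit instead of 32 times; it does less work, but you have to know the trick to read it. (In real Java code, `Integer.bitCount` already does this.)

```java
class BitCounting {
    // Readable: check each of the 32 bits in turn.
    static int countOnesReadable(int value) {
        int count = 0;
        for (int bit = 0; bit < 32; bit++) {
            if ((value & (1 << bit)) != 0) {
                count++;
            }
        }
        return count;
    }

    // Optimized: value & (value - 1) clears the lowest set bit,
    // so the loop runs once per 1 bit rather than once per bit position.
    static int countOnesFast(int value) {
        int count = 0;
        while (value != 0) {
            value &= value - 1;
            count++;
        }
        return count;
    }
}
```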

Some like ternary ifs. Some don’t. Is the code good if the other dev compromises and writes a clearer if statement? Is writing code for the least common denominator good? What are the costs of doing so?
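The same decision written both ways, as a hypothetical shipping-tier example with made-up thresholds. Both return identical results for every input; which one is “good” is exactly what’s in dispute.

```java
class ShippingTier {
    // Ternary version: one expression, compact once you're used to chained ternaries.
    static String ternary(int totalCents) {
        return totalCents >= 5000 ? "free" : totalCents >= 2000 ? "flat" : "standard";
    }

    // If-statement version: longer, but each branch sits on its own line.
    static String ifElse(int totalCents) {
        if (totalCents >= 5000) {
            return "free";
        }
        if (totalCents >= 2000) {
            return "flat";
        }
        return "standard";
    }
}
```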

What about the reverse? Why aren’t we all using a Lisp?

Then what do we do with the code we no longer want because, in our journey to Blub languages, we realize the old language we still maintain code in is no good? How do you write good code in a no-good language?

Jeff Atwood of Coding Horror (and Stack Overflow fame) writes that the Best Code is No Code At All, or at least Brief Code.

This is a reaction to the observation that, regardless of code quality, lots of code, even good code, becomes bad code simply because there is so much of it. How much is too much? How little is too little?

I don’t know what good code is. I know code I don’t enjoy working with. I know code other people hate. It’s clearly unique to each individual: some people agree on some things and disagree on others, they have varying tolerances for how far they’ll compromise, and all of them can change their minds over time.
