Let’s talk about test generation. Because ChatGPT or Copilot, or whatever you’re using, will help you generate lots of them in the blink of an eye. And someone needs to ruin the party.

I’ve seen test generation in all sorts of ways for more than 20 years. Yes, 20 years ago, when I sat in my cave, there was already a button on the cave wall, and I could create a test from the code I chipped on the wall. A bit primitive but it worked.

In the olden days, I was using Visual Studio (the black and white version), and when I right-clicked on the method it looked like this:

Test generation for method in Visual Studio


Ok, the picture is a bit more modern, and in color. It’s still there today.

Back then, it wasn’t a very smart wizard (remember those?). I right-clicked on the method, and based on the signature of the method I wanted to test, it would generate a test called… wait for it…

“AddTest”. Genius.

The generated test code created an instance of the class and called the method. That’s all. Obviously, you’d need to add the inputs you actually wanted to send to the method.

Test generation for method - resulting test

This type of test generation means two things. First, sometimes the test wouldn’t even compile (you’d need to supply better input, and maybe even add the call to the method yourself). Second, if it did compile, it would always fail.

See that Assert.Fail()? Yes, I know it’s by design, but think what you expect from a generated test.

The other option (dropping the assumed failure and removing the assert) is that the test would almost always pass. That’s what happens when you have tests without asserts. You’ll need to add those yourself.
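If you translate that scaffold into today’s terms, it looks something like this. It’s a rough TypeScript/Jest-flavored sketch of the idea, not the actual C#/MSTest output, and the Calculator class and add() method are made-up names:

```typescript
import { Calculator } from './calculator'; // hypothetical class under test

test('AddTest', () => {
  // The generator only knows the signature, so it creates the object...
  const calculator = new Calculator();

  // ...calls the method with placeholder arguments you still need to fill in...
  const result = calculator.add(0, 0);

  // ...and fails on purpose (the equivalent of Assert.Fail()),
  // because it has no idea what result to expect.
  // Delete this line without adding a real assert, and the test always passes.
  throw new Error('TODO: verify ' + result);
});
```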

Ok, these are obviously not very good tests. Based on the generated result, I usually skipped the generation altogether and wrote the tests myself. Another issue, for me at least, was that test generation doesn’t work with TDD. You know, test-first? Can’t generate a test for non-existing code.

The API Era

This IDE capability (Microsoft wasn’t the only kid on the block with code generation magic) hasn’t changed much, at least in the unit testing world. By then, we’d gotten really into APIs, and test generation moved with the times. We could now generate tests for API calls.

With a few caveats, of course.

Instead of a “test”, we now have a “request”. Instead of “generation”, we edit parameters, fill in the blanks, send the request, and record both the request and the response. Imagine a UI tool, like Postman, doing it, and even generating the wrapping code for you, in whatever language you need. Don’t imagine, it actually does it.

But it’s not complete. I’d have to rename the request, just like with “AddTest” from before. And even leaving the code as-is (which I never do, gotta rename stuff), even if the test recorded everything, inputs and outputs, we’re still faced with the same problem: what is the expected result?

Is it just a 200 status? The whole of the response body? Maybe parts of it, without the auto-generated IDs on the back-end? And what about the headers?

Shazbat! Once again I need to step in. Tell the test what’s important to me. Depending on the tool, that would be either a big addition, some editing, or a rewrite from scratch.
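To make the “what do I assert” question concrete, here’s a small sketch in TypeScript using Playwright’s request fixture. The endpoint, fields, and values are hypothetical; the point is deciding what the recorded request/response pair should actually check:

```typescript
import { test, expect } from '@playwright/test';

// Hypothetical endpoint and payload - the interesting part is choosing what to assert.
test('creating a user returns the data we sent', async ({ request }) => {
  const response = await request.post('https://api.example.com/users', {
    data: { name: 'Gil', role: 'tester' },
  });

  // Is a successful status enough?
  expect(response.ok()).toBeTruthy();

  // Or the parts of the body we care about, ignoring back-end-generated noise?
  const body = await response.json();
  expect(body).toMatchObject({ name: 'Gil', role: 'tester' });
  // deliberately NOT asserting body.id or body.createdAt - the server owns those

  // And what about the headers?
  expect(response.headers()['content-type']).toContain('application/json');
});
```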

Tales of the UI Test Generation Age

Then we got to the UI age. Now we’ve got test generators for the UI – web or mobile – that generate the test code, depending on your framework and stack, recording your workflow and letting you create asserts from the generator itself.

Playwright Test generation recorder

That’s Playwright’s web recorder, by the way.

Is that good enough for you, Gil? See, that’s real magic!
Well, not really. You see, the tool can indeed find elements, but not in the way I want to use them in the test. In the picture, it locates the textbox, but it suggests locating it by its placeholder (see the tooltip suggestion), and that’s how it will appear in the generated test.

Is that the right way to find a textbox? Would I write it like that? I mean, why doesn’t it locate it by its label? Wouldn’t the next guy or girl who looks at the test think – why did Gil write it like that?

Yes, yes, they would. Can’t let that happen.
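In Playwright terms, the difference is a single locator call. Here’s a minimal sketch – the page URL and field are made up, but getByPlaceholder and getByLabel are the actual locator methods:

```typescript
import { test } from '@playwright/test';

test('placeholder vs label locators', async ({ page }) => {
  await page.goto('https://app.example.com/signup'); // hypothetical page

  // What the recorder suggests: find the textbox by its placeholder text
  await page.getByPlaceholder('name@example.com').fill('gil@example.com');

  // What I would rather read in the test: find it by its visible label
  await page.getByLabel('Email').fill('gil@example.com');
});
```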

Generate Once, Maintain Always

What about maintainability? Those tests are very writable, but are they readable?
Let’s look at the generated test for that UI (and that’s really a small, simple one, taken from my Playwright webinar).

Playwright Test generation - generated code.

Look at all the duplication. Look at all the hard coded values. Just look.
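Generated code in that style tends to look something like the first test below – a made-up fragment in the same spirit, not the webinar’s actual output. The second test is the kind of rewrite the generator leaves to us: the same flow, named once, with the data pulled out of the steps:

```typescript
import { test, expect, Page } from '@playwright/test';

// What a recorder typically spits out: every test repeats the same steps
// with the same hard-coded values inline (the shop site is hypothetical).
test('raw generated flow', async ({ page }) => {
  await page.goto('https://shop.example.com/');
  await page.getByPlaceholder('Search').fill('Blue shirt');
  await page.getByRole('button', { name: 'Search' }).click();
  await page.getByRole('link', { name: 'Blue shirt' }).click();
  await page.getByRole('button', { name: 'Add to cart' }).click();
  await expect(page.getByTestId('cart-count')).toHaveText('1');
});

// The human rewrite: name the workflow once and reuse it.
async function addToCart(page: Page, product: string) {
  await page.getByPlaceholder('Search').fill(product);
  await page.getByRole('button', { name: 'Search' }).click();
  await page.getByRole('link', { name: product }).click();
  await page.getByRole('button', { name: 'Add to cart' }).click();
}

test('adding a product updates the cart count', async ({ page }) => {
  await page.goto('https://shop.example.com/');
  await addToCart(page, 'Blue shirt');
  await expect(page.getByTestId('cart-count')).toHaveText('1');
});
```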

And now we welcome our AI-generated test overlords. They know even more about what the code does. They “understand” it. Therefore the tests are better. The names are better. I don’t need to do anything!

Nope, I still don’t like the results. They’re never how I would write them.

Ok, let’s take a break from all this ranting. I think that these issues will improve with time. I really hope so.

It’s not that I don’t like tools, I really do. I just think that tools are built for a purpose, and unfortunately, we, the users, have more than one goal at a time.

Sure, we’d like to cut time on writing our tests. Hell, if we could find a way to create quality code that works, without all these pesky tests, we’d use it. But we’re stuck with those tests. So we need to write them. And test generators are there to generate tests.

But we have another goal for the tests – when the tests fail, we’d like to understand what exactly they’re trying to do, so we can find the problem. And the reading and maintaining goals – where real humans are involved – that’s where our mechanical overlords fail.

What test generators lack is context. In their view, there’s the interface, and everything else is up to the test writer. The writers need to build the right input and specify the correct expected results. They need to explain how cases differ from one another and what the common setup is, and names matter a lot.

Since they don’t understand the application, the business process, or, of course, the human developer, test generation tools get stuck at that point and leave the rest of the work to the authors.
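Here’s roughly what I mean by context. The names and the shared setup are exactly the parts a generator has no way to produce for you – this is a sketch against a hypothetical login page, not any real application:

```typescript
import { test, expect } from '@playwright/test';

test.describe('login', () => {
  // Common setup, stated once, so every test reads as "given a logged-out user"
  test.beforeEach(async ({ page }) => {
    await page.goto('https://app.example.com/login'); // hypothetical URL
  });

  // The name explains intent, so a failure tells you what actually broke
  test('wrong password shows an error and keeps the user on the login page', async ({ page }) => {
    await page.getByLabel('Email').fill('gil@example.com');
    await page.getByLabel('Password').fill('not-the-password');
    await page.getByRole('button', { name: 'Sign in' }).click();

    await expect(page.getByRole('alert')).toContainText('Invalid credentials');
    await expect(page).toHaveURL(/\/login/);
  });
});
```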

And that would be great, if we could just continue from where they stopped. Tools that get you halfway, or even a third of the way, are great. But if we need to replace half of the generated code, it’s not just waste, it’s annoying.

The Future’s So Bright

It’s not like we’re not making progress. Some AI tools analyze the tested code and try to come up with a probable scenario for the test. Their “understanding” of the context depends on the code, and even more – the system. The more they “understand” the code, the more useful the generated tests will be.

It’s easy (ok, not that easy, but still) to generate tests for small bits of code. Understanding workflows inside big systems – that’s a big problem. Unfortunately, we’re not there yet, and we see that in the generated tests.

Let’s be real here – understanding software from just its code is hard. Which paths are important? Which parts are going to be used more often? You’ll need a lot of context to create the right tests. And don’t forget – what happens “outside” the code? Or in gen-speak – if all the generator knows is my repo, and the calling or called code lives in another repo, the level of understanding drops. Tests will not reflect reality.

But they will pass, and give us false confidence.

I’m going to remain a skeptic for a while, and write my own tests. Sure, I’ll use the tools to get ahead a bit, but relying completely on the generated ones?

Not today.

If you want to write tests yourself, and you should, check out my unit testing and TDD workshop. You’ll work for those puppies!
