Fragile tests are a scourge on technology. And on testing technology. Sometimes they pass, sometimes they fail, and to make life interesting, sometimes they just stop, and on the odd occasion, come up with a very unusual result.

“Fragile tests” is what we call tests that we can’t trust. And we very much would like to trust our tests.

It all comes down to time and attention. When a test passes, and we trust it, that means we can ignore the specific functionality we just tested, and instead focus on that new failure over there.

Fragile tests, on the other hand, are not to be trusted. They spread doubt. When are they right? When they pass, or when they fail?

That doubt now translates, again, into time and attention. We go and debug, and try to reproduce the faulty behavior. And when that doesn’t work, we just say “some voodoo must have happened” and move on.

But that doesn’t help, of course, because if we can’t trust that test (now that we have proof!), can we trust the other tests that always (seem to) pass?

It is possible that the tests fail because of bugs in the tests themselves. The more complex the system we’re testing, the more complex the setup for both the environment and the tests. And complexity breeds bugs.

If that’s the case, we need to fix these tests, replace them, or get rid of them. But while we’re focusing on the tests, are they really the major cause?

Fragile Tests: Mystery Solved

While we blame the tests, we turn our focus away from the real culprit. Most of the time, it is not the tests that break. It’s the code they test.

If your tests are complex, I guarantee the code is even more so. After all, we fix a lot more bugs in production code than in tests.

Code that hasn’t been tested enough by the developers will be buggy. And not just with those easy-to-find bugs, but with the more elusive ones. Those that throw the blame on the so-called “fragile tests”.

So what do we do about this? First we need to measure. We need to know if we’re blaming the right piece of code. Because if those tests break because of the code, it’s not that the tests are fragile; it’s that the code is not robust enough.
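One simple way to measure, as a sketch (the `classify` helper and its category names are my own invention, not from the post): rerun a suspect test many times and see whether it fails deterministically, never, or only intermittently. Only the last case is genuine flakiness; a deterministic failure points at the code or the test, not at fragility.

```python
import random

def classify(test_fn, runs=20):
    """Run a test function repeatedly and count passes vs. failures.

    A test that fails every run points at a deterministic bug (in the
    code or in the test). A test that fails only some of the time is
    flaky, and the cause is worth hunting down before blaming the test.
    """
    passes = failures = 0
    for _ in range(runs):
        try:
            test_fn()
            passes += 1
        except AssertionError:
            failures += 1
    if failures == 0:
        return "stable-pass"
    if passes == 0:
        return "stable-fail"
    return "flaky"

# A deterministic failure and a simulated flaky test:
def always_broken():
    assert 1 + 1 == 3      # fails on every run

def sometimes_broken():
    assert random.random() < 0.5  # fails roughly half the time

print(classify(always_broken))    # stable-fail
print(classify(sometimes_broken))
```

In real suites, plugins such as pytest-rerunfailures automate this kind of rerun-and-compare; the point here is only that the measurement is cheap enough to do before assigning blame.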

And to get “robust” code, we don’t need to just “unit test” it. As a team, we need to make sure that the devs know the expected quality of their output. How they are expected to test it, and how the testers are planning to test it.

The more developers know, in advance, about how they and the testers are going to test, the more we minimize the risk of bugs. And with it, the risk of having fragile tests.

That means collaboration in test planning and design. Because quality is a team effort.

Next time you see a test you don’t trust, ask not how you can fix it. Ask how code of that quality came through the pipes, and plug that leak.

Want to make your tests, and code, more stable? Check out the “test automation” workshop for developers.


2 Comments

Aaron · August 6, 2024 at 4:25 pm

I disagree with both the premise of this post and the solution.

While sometimes tests appear flaky due to buggy or complex code, that is not the case the majority of the time.

Tests are flaky most often because of limitations in automation — timing issues, etc. Secondarily, tests are flaky due to incorrectly written test code. Together these account for the vast majority of cases. So much so that it is usually justified to “blame the test”, and understandable why all other causes get lost in the noise.
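The timing issues Aaron mentions usually come from hard-coded delays. A common mitigation, sketched below with an invented `wait_until` helper (the names and defaults are mine, not from the comment), is to replace a fixed sleep with a bounded poll, so the test tolerates a slow system without waiting longer than necessary:

```python
import time

def wait_until(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns True or the timeout expires.

    Replacing a fixed `time.sleep(2)` with a bounded poll removes the
    most common timing race: the test no longer fails just because the
    system happened to be slower than the hard-coded delay.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()  # one final check at the deadline

# Simulated slow operation: a value that becomes "ready" after ~0.2s
start = time.monotonic()
ready = lambda: time.monotonic() - start > 0.2

assert wait_until(ready, timeout=2.0)
```

UI frameworks ship the same idea built in (for example, Selenium's explicit waits); the sketch just shows why the pattern kills one whole class of flakiness.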

The third most common source of test flakiness is environment-related issues — either the test infrastructure environment or the system under test. This is an addressable problem, but since it is often not under the control of either the testers or the developers, it is often neglected. Typically this is an operations issue, but devops are cranky, and we should excuse them for being dismissive after having encountered problems of the first and second type so often.

Finally, system complexity (not code complexity) is the real issue with creating test complexity. Having to automate a complex workflow — that need not be so complex — is a real problem, and exacerbates the problems of inherently brittle automation and poor quality test automation code.

One way to alleviate this is to simplify tests, for example, by testing functionality at the API layer instead of the UI layer where possible. Or by using lower layers (API or DB) for data setup and validation.
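The “lower layers for data setup” idea can be made concrete. As a minimal sketch (the schema, table name, and `find_user` function are invented for illustration; a real suite would use its own fixtures), the test seeds data directly in the database instead of clicking through a registration form, then exercises only the behavior under test:

```python
import sqlite3

# Seed test data directly in the database instead of driving the UI.
# An in-memory SQLite database stands in for the real one here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO users (email) VALUES (?)", ("test@example.com",))
conn.commit()

def find_user(conn, email):
    """The behavior under test: look up a user by email."""
    return conn.execute(
        "SELECT id, email FROM users WHERE email = ?", (email,)
    ).fetchone()

assert find_user(conn, "test@example.com") is not None
```

The test now depends on one query instead of a browser, a rendering engine, and a form workflow, which is exactly the reduction in moving parts Aaron is arguing for.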

But my main complaint here is with putting the cart before the horse. If you have flaky tests that fail because of defective code, the problem isn’t that the developers have written defective code, and the solution isn’t that they need to test it better — that’s what testing is for!

If your tests are finding bugs in code, that is their intended purpose. A test that does not find a defect is wasted effort. We do not always know which tests will find bugs, so this waste is expected.

The theory is that QA provides value by being able to look at the result of developer code and more efficiently find defects than the developers themselves. If that is not the case, then the problem is QA, but I don’t believe that. I believe that dedicated testers provide a fresh perspective, and have specific goals and incentive to find defects in a way that developers cannot — and that it can be done cheaper.

That doesn’t mean that developer tests are not also valuable. They are, but should not be expected to catch everything.

The issue is, I think, that tests that are themselves flaky, slow, or uninformative make the effort of finding real defects too costly, and that is something that should be addressed.

Simplifying systems reduces the opportunity for defects, obviously, but is really outside the scope of the problem.

    Gil Zilberfeld · August 8, 2024 at 5:04 pm

    Thanks for the descriptive comment, Aaron.
    We both agree (your points 1, 2, 4) that the system and the tested code account for a lot of the flakiness. Note that it’s not just code: system complexity, which covers both code and architecture, is in the developer realm. Of course, if the testers are involved in making these decisions, they have the opportunity to make the system not only more testable, but also more stable, and therefore improve the trust in tests.

    I don’t agree with “tests need to find bugs”. Tests are written with some expectation in mind of “how the system should work”. When they fail, it’s because the system does not work today as it did yesterday. They are the flag. It could be because of a bug. It could be a transitional state. It could be looking at the wrong signal. How we interpret that flag is on us.

    If there’s good communication between testers and developers, the resulting tests will carry that inherent interpretation.

    And when the flag is raised without any pattern, we stop trusting it. The flakiness may be in the code (or system), or in the tests. But what we see is the flag.
    If there’s no or low communication between the devs and testers, the flag bearers will be blamed. If they work together, there’s a good chance the flakiness will be removed in the right place.

    As for system complexity – I agree we don’t do enough to simplify. I also think that complexity is growing because of system voodoo (other people’s libraries, server build processes) – the code developers write is minuscule compared to the size of the running system. Since that is not going away, we should acknowledge it, and try to simplify design, architecture, and finally code. But that also means that testing without understanding this voodoo will result in that flakiness.

    After all, flakiness is the name we give to things we don’t understand.

    I think that deserves a separate post.
