Mon 21 July 2025

Strategic Testing

The "Software Testing" series:

      1: Assert
      2: (here) Strategic Testing

Are our tests good? Are they bad? Do we have enough tests?

Beyond writing isolated and targeted unit tests, there are methods that help ensure our tests are appropriate. This article covers some strategies that answer these questions: using test coverage metrics, applying mutation testing, writing fuzz tests and, finally, using test fixtures.

Test Coverage

We can measure the number of lines executed when running our tests. For example, the test in the following snippet never executes the fourth line, 'return False'.

def is_odd(n: int) -> bool:
    if n % 2:
        return True
    return False

def test_is_odd():
    assert is_odd(7)

The ratio of lines executed to total lines gives us the test coverage metric.
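To close the gap in the example above, we could add a second test that exercises the even branch; a minimal sketch:

def test_is_odd_with_even_number():
    # Executes the 'return False' line, bringing it under coverage.
    assert not is_odd(8)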

Combined with context about how important each line of our code is, test coverage is helpful. As a performance metric or a blind target, however, it's quite useless.

If you've got 20% coverage and the code is critical to your business, then getting that to 80% is crucial. Trying to eke out an extra 0.1% of coverage when you're at 98% is fruitless, and a goal of going from 98% to 99% is a poor man's KPI.

Unless they're easy to add, you'll be chasing very minor coverage gains for edge cases that are rarely hit. At that point there is likely something more impactful to focus on.

It might be an interesting exercise to work out whether the code you're not covering is even reachable during the program's lifetime. If not, the dead code should simply be removed rather than tested.

Mutation Testing

We rely on tests to ensure our code is correct and works as expected, but how do we ensure our tests are correct and work as expected? What tests our tests? Mutation testing aims to fill this gap.

How often have you written a passing test and then purposely made it fail, just to ensure that the test catches the case you intended it to catch? This forms the basis of mutation testing.

When applied, a mutation testing tool goes through the code under test and makes subtle changes to it, producing 'mutants'. Using is_odd as an example, it might bump the 2 to a 3. Sometimes it might take a string and remove a character. It will also change operators, such as swapping <= for <. The tests then run as normal, under the expectation that they should fail against the mutated code. If test_is_odd still passes against a mutant of is_odd then something is wrong: the test isn't working as expected, which might indicate that we are mocking too many dependencies, we aren't being specific enough, or nothing is really being tested.
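As a hand-written sketch of what a tool such as mutmut might generate, consider a mutant where the modulus operand is bumped from 2 to 3:

def is_odd(n: int) -> bool:
    if n % 3:  # mutated from 'n % 2'
        return True
    return False

def test_is_odd():
    assert is_odd(7)  # still passes, since 7 % 3 == 1: the mutant survives

A surviving mutant tells us the suite is too weak. Adding 'assert not is_odd(8)' would kill this one, because 8 % 3 is truthy and the mutant wrongly reports 8 as odd.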

Fuzz Tests

Some software receives user-provided or malformed data, and in these cases you don't want the system to behave irregularly. A developer might not know upfront all the funky data that could be provided to a method; in these cases they may rely on writing fuzz tests.

As an example, if we had a method that expects a user-provided string, we can define a fuzz test that enumerates a data bank of known edge cases for strings, such as an emoji, an empty string or a large string of zero-width characters.
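A minimal sketch of such a data bank, assuming a hypothetical normalise_username method under test:

import pytest

FUNKY_STRINGS = [
    "",                 # empty string
    "🙂",               # emoji
    "\u200b" * 10_000,  # a large string of zero-width characters
]

@pytest.mark.parametrize("value", FUNKY_STRINGS)
def test_normalise_username_never_misbehaves(value):
    # We only assert that the method behaves regularly:
    # no exception raised, and a string handed back.
    assert isinstance(normalise_username(value), str)

Property-based tools such as hypothesis take this further by generating the funky inputs for you.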

Test Fixtures

As the code base grows, you might notice that we are writing repeated lines of code to set up a user object or prepare data before passing it to the method we are testing.

Large projects get around this by defining test fixtures. These can be passed as parameters to our tests so that we know the setup a test requires before it runs. The benefit of keeping the fixture separate from the test is that it reduces the amount of code duplicated across tests, and if the setup for the user changes then only the fixture should require changing.
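In pytest, for example, a fixture is a decorated function whose name the tests take as a parameter; a sketch, assuming a hypothetical User class:

import pytest

@pytest.fixture
def user():
    # The one place to change if the setup for a user ever changes.
    return User(name="Ada", email="ada@example.com")

def test_display_name(user):
    assert user.display_name() == "Ada"

def test_email_domain(user):
    assert user.email.endswith("@example.com")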

Tests should focus on asserting one thing, and fewer lines in a test make it easier to see what's going wrong when something breaks.

Finally

If you're printing it, maybe you should assert it.


S Williams-Wynn at 12:04

Mon 14 July 2025

Assert

The "Software Testing" series:

      1: (here) Assert
      2: Strategic Testing

Natural language is context-dependent and ambiguous. Do you think you can one-shot a solid business idea? It took Twitch seven years to pivot into gaming, and it wasn't seven years of accumulating stacks of code that helped them stick that landing.

We are prompting a machine, using an ambiguous and context-dependent natural language, to create precise and detailed machine instructions. It is no wonder we are finding that those with coding experience are at an advantage when it comes to commanding the machine. The vibe coder is overlooking the techniques and the vocabulary the profession has developed over several decades.

We've learnt that in order to generate the best response from an LLM, we need more precision and less ambiguity in our prompts. If only we could develop a language that gives us a precise, unambiguous way of creating machine instructions; perhaps we could call it a programming language?

Fingers crossed

Nothing is built on stone; all is built on sand, but we must build as if the sand were stone.

Jorge Luis Borges (From "Software Engineering at Google")

Most software is built on hope. I write a function that multiplies two integers together and hope that it works. We can also write a test to assert that given inputs produce the correct output, but are you going to write a test for every combination of all numbers?
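In practice the best we can do is assert a handful of representative cases and hope they generalise:

def multiply(a: int, b: int) -> int:
    return a * b

def test_multiply():
    # Three hand-picked cases out of an effectively infinite input space.
    assert multiply(3, 4) == 12
    assert multiply(-2, 5) == -10
    assert multiply(0, 7) == 0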

Once we put lines of code into production, the function may or may not be run with the exact inputs we expected when we wrote the code.

We are required to create programs without knowledge of the concrete values that will be passed into them; to think of a result in terms of its name.

double_n = add(n, n)

For every computation we rely on hope.

Program testing can be used to show the presence of bugs, but never to show their absence!

EWD-249 (1970)

Staying Organised

How we ensure our programs are correct also tends to relate to how we scale a project. We've recognised the limitations of a single mind in containing the details of an entire program.

It's the core responsibility of a software engineer to watch and manage this complexity.

The art of programming is the art of organizing complexity, of mastering multitude and avoiding its bastard chaos as effectively as possible.

EWD-249 (1970)

Since then we've had multiple attempts at growing projects. There's a link between how we structure code and how we test it: tests enable us to offload the checking of our functionality, and well-structured code tends to be easier to test.

This line of thinking led to the practice of Test Driven Development (TDD), where it's thought that writing out the tests as a first step leads the programmer to write more cohesive and well-structured code.
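An illustrative sketch, assuming a hypothetical slugify function: the test is written first and fails, then the implementation is written to make it pass.

def test_slugify():
    assert slugify("Strategic Testing") == "strategic-testing"

def slugify(title: str) -> str:
    # Written after the test, shaped by what the test demands.
    return title.lower().replace(" ", "-")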

Describing Tests

If our tests are determining how we structure the code, what's determining how we structure the tests?

First, let's address one of the biggest issues in software engineering: the way we teach and introduce how to test is vague and ambiguous, using abstract examples of unrealistic classes and functions. The worst-offending term is the "Unit Test", as the definitive boundary of a unit can always be argued.

We have a better understanding of what is not a unit test than of what a unit test is.

The second offender is the testing pyramid. Vehement advocates will disagree on the boundaries of each layer, and the layers won't apply to all projects. Setting out to define them at the beginning of a project just wastes our time. Often we can only determine where areas of a project will grow with hindsight, and since we are already building software on a foundation of hope, we should stick to just enough testing.

We shouldn't let the question "Where should we test it?" get in the way of testing it.

Managing Tests

We should start thinking more about how we manage tests.

The first thing to address is test duplication. It is all too easy to see a test, make a copy and change it slightly. This can lead to the same thing being tested across multiple tests. We can reduce the amount of code we are maintaining if our tests are targeted. If small changes lead to an unexpected number of tests breaking, we have too much assert duplication.
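A sketch of the smell, assuming a hypothetical make_user helper: both tests below assert the email, so a small change to email handling breaks two tests instead of one.

def test_display_name():
    user = make_user()
    assert user.display_name() == "Ada"
    assert user.email == "ada@example.com"  # duplicated: already asserted below

def test_email():
    user = make_user()
    assert user.email == "ada@example.com"  # the targeted home for this assert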

I compare testing to a climber scaling a mountain with a limited number of pegs. If you are too cautious and nail in a peg after every metre, you'll find it tougher to change direction: the climber's rope is limited by the distance between pegs, and each peg needs removing whenever the direction changes by more than a metre. However, if you nail in a peg every ten metres, the climber is flexible to direction changes, at the risk of taking a battering when they fall.

Techniques that balance being defensive with being flexible give us a better test suite. Reducing test duplication is one example of this: if we hammer three pegs into the same spot, we aren't providing a greater level of safety, and we risk unnecessary changes in the future.

S Williams-Wynn at 12:07