Mon 14 July 2025

Assert

The "Software Testing" series:

      1: (here) Assert

Natural language is context-dependent and ambiguous. Do you think you can one-shot a solid business idea? It took Twitch seven years to pivot into gaming. It wasn't seven years of accumulating stacks of code that helped them stick the landing.

We are prompting a machine, using an ambiguous and context-dependent natural language, to create precise and detailed machine instructions. It is no wonder we find that those with coding experience are at an advantage when it comes to commanding the machine. The vibe coder is overlooking the techniques and the vocabulary the profession has developed over several decades.

We've learnt that in order to generate the best response from an LLM we need more precision and less ambiguity in our prompts. If only we could develop a language that helps us create precise machine instructions and eliminates ambiguity; perhaps we could call this a programming language?

Fingers crossed

Nothing is built on stone; all is built on sand, but we must build as if the sand were stone.

Jorge Luis Borges (From "Software Engineering at Google")

Most software is built on hope. I write a function that multiplies two integers together and hope that it works. We can also write a test to assert that given inputs produce the correct output, but are you going to write a test for every combination of inputs?
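As a concrete sketch: the multiply function and the handful of asserts we typically settle for, out of an effectively infinite input space.

def multiply(a: int, b: int) -> int:
    return a * b

# A few hand-picked cases; hope covers the rest.
assert multiply(3, 4) == 12
assert multiply(-3, 4) == -12
assert multiply(0, 7) == 0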

Once we put lines of code into production, the function may or may not be run with the exact input we expected when we wrote the code.

We are required to create programs without knowledge of the concrete values that will be passed into them; to think of a result in terms of its name.

double_n = add(n, n)

For every computation we rely on hope.

Program testing can be used to show the presence of bugs, but never to show their absence!

EWD-249 (1970)

Staying Organised

How we ensure our programs are correct also tends to relate to how we scale a project. We've recognised the limitations of a single mind to contain the details of an entire program.

It's the core responsibility of a software engineer to watch and manage this complexity.

The art of programming is the art of organizing complexity, of mastering multitude and avoiding its bastard chaos as effectively as possible.

EWD-249 (1970)

Since then we've had multiple attempts at growing a project. There's a link between how we structure code and how we test it. Tests enable us to offload the checking of our functionality, and well-structured code tends to be easier to test.

This line of thinking led to the practice of Test Driven Development (TDD) where it's thought that writing out the tests as a first step leads the programmer to write more cohesive and well structured code.
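As a minimal sketch of that rhythm, reusing the add function from earlier: the test is written first and fails, then the implementation makes it pass.

# Step 1 (red): write the test before the implementation exists.
def test_add():
    assert add(2, 3) == 5
    assert add(0, 0) == 0

# Step 2 (green): write just enough code to make the test pass.
def add(a: int, b: int) -> int:
    return a + b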

Describing Tests

If our tests are determining how we structure the code, what's determining how we structure the tests?

First let's address one of the biggest issues in software engineering: the way we teach and introduce how-to-test is vague and ambiguous, using abstract examples of unrealistic classes and functions. The worst offending term is the "Unit Test", as the definitive boundary of a unit can always be argued.

We have a better understanding of what is not a unit test than of what a unit test is.

The second offender is the testing pyramid. Vehement advocates will disagree on the boundaries of each layer, and these layers won't apply to all projects. Setting out to define them at the beginning of a project just wastes our time. Often we can only determine where areas of a project will grow with hindsight, and since we are already building software on a foundation of hope, we should stick to just enough testing.

We shouldn't let the question "Where should we test it?" get in the way of testing it.

Managing Tests

We should start thinking more about how we manage tests.

The first thing to address is test duplication. It is all too easy to see a test, make a copy and change it slightly. This can lead to the same thing being tested across multiple tests. We can reduce the amount of code we maintain by keeping tests targeted. If small changes cause an unexpected number of tests to break, we have too much assert duplication.
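A hypothetical example of the copy-paste trap: both tests pin down exactly the same behaviour, so one small change to the greeting breaks two tests where one targeted test would do.

def greet(name: str) -> str:
    return f"Hello, {name}!"

def test_greet():
    assert greet("Ada") == "Hello, Ada!"

def test_greet_copy():
    # Copied from test_greet and tweaked; the assert duplicates the one above.
    assert greet("Ada") == "Hello, Ada!"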

I compare testing to a climber scaling a mountain with a limited number of pegs. If you are too cautious and nail in a peg after every metre, you'll find it tougher to change direction, as the climber's rope is limited by the distance between pegs, and each peg needs removing whenever the direction changes by more than a metre. However, if you nail in a peg every 10 metres, the climber is flexible to direction changes, at the risk of taking a battering when they fall.

Techniques that balance being defensive with being flexible give us a better test suite. Reducing test duplication is one example of this. If we place 3 pegs in the same location we aren't providing a greater level of safety, and we risk unnecessary changes in the future.

S Williams-Wynn at 12:07

Mon 07 July 2025

Database Indexes

Often the most crucial part of a system is the database. Databases manage and handle the storage of our business data, and an inefficient database can impact the overall performance of our system. There are, however, a few ways we can improve database performance; one such way is defining appropriate indexes.

This article describes indexes in Postgres and some of the lesser known indexes it uses, such as GIN and BRIN indexes.

Why do we use indexes?

We use indexes in Postgres to optimise how we scan the data we have stored in the database, similar to how binary search optimises how quickly we find things in a dictionary (see my post on binary search).

We can use an index to inform the database on how to structure data in a way that allows us to find what we are looking for effectively, i.e. fewer operations, less I/O and fewer comparisons, leading to lower CPU usage and better performance.

What is an index?

An index is a data structure which allows us to minimise the number of instructions performed by the database in order to look up table rows, or to improve the speed at which a result is sorted.

In the case of sorting: if there's an index on a column that you require results to be ordered by, the query can scan the index to return the results in sorted order. If there is no index on the column, the data must first be fetched and then sorted in a separate step.

Indexes aren't always good

Overdoing the number of indexes on a table can lead to lower write performance, as every update or insert into the table requires updating the index. Indexes also require disk space, so having unnecessary indexes will contribute to the rate at which your disk usage grows.

B-Tree

The default index type Postgres uses is a B-Tree. This data structure is similar to a binary tree, except instead of having a node with only two pointers, a node can contain many pointers. The number of pointers is determined by Postgres's default block size, which is 8kB. A single node will store as many sorted keys as it can until it reaches this size limit. Postgres refers to these nodes as pages.

Having smaller index keys can improve the performance of your index, as you'll be able to store more keys within the page limit, resulting in less I/O as fewer pages need to be read from disk.

In Postgres, leaf nodes point to the physical location of the row on disk, and the intermediary nodes (or internal pages) point to the nodes on the next level down the tree.

There's a neat animation that can help visualise b-trees: https://btree.app/

Partial Indexes

We can optimise the indexes we create by understanding usage patterns and having some intuition about the system and the business. One such optimisation is the use of partial indexes; these are handy when you have a large table but only a subset of the data is used frequently.

As an example, imagine an online order system. It's quite likely that once an order is fulfilled its status transitions to "complete", and the order isn't accessed as frequently as the orders still in progress. We can restrict our index so that it contains only unfulfilled orders, which will be a significantly smaller portion of our overall orders.

CREATE INDEX idx_uncomplete_orders_customer ON
orders(customer_id) WHERE status != 'complete';

We also have to include this WHERE filter in our queries if we wish to make use of this index.
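For example, a query sketch that repeats the predicate and so can be served by the partial index (the customer id is made up):

SELECT * FROM orders
WHERE customer_id = 42 AND status != 'complete';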

Covering Index

A covering index allows us to add columns to the index's leaf nodes, avoiding the lookup and read of the entire table row from disk. Essentially everything the query needs is available on the index, reducing the number of operations required to complete the query.

As an example if we typically request a user's first name and last name with an email we can write an index on the email that includes the first and last name.

CREATE INDEX idx_user_email_include
ON users (email) INCLUDE (firstname, lastname);
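A sketch of a query this index can answer with an index-only scan, never touching the table row (the email value is made up):

SELECT firstname, lastname FROM users
WHERE email = 'ada@example.com';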

We have to bear in mind that we should only include columns which change as frequently as, or less frequently than, the key we are indexing; otherwise we are duplicating data and increasing the write overhead. This isn't an issue for columns that rarely change.

For more on covering indexes, see the INCLUDE clause in the Postgres documentation.

Gin Index

GIN stands for Generalised Inverted Index. It is similar to the inverted index that Lucene is built on, the same Lucene that Elasticsearch is built on. The slight difference is that this inverted index is generalised: it expects the items being indexed to be composed of multiple values, the same way a sentence is composed of multiple words, with the goal of supporting full-text search.

An index entry is created for each element in the composite column, and the entry points to a list of matching locations. As an example, a row that has a column with the value "Here is a blog post" will create an index entry for each word in the value (e.g. "blog"), and each entry will point to the rows that contain that word inside the composite column.1

ALTER TABLE blogposts
ADD COLUMN search_vector tsvector;

CREATE INDEX idx_blogposts_search ON blogposts
USING GIN (search_vector);
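Assuming the search_vector column is kept populated with to_tsvector output, a query sketch that can use this index:

SELECT * FROM blogposts
WHERE search_vector @@ to_tsquery('english', 'blog');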

GIN indexes aren't limited to just text; they are generalised, so they can be used with any composite type, JSONB for example.

BRIN Index

BRIN stands for Block Range Index. These are indexes that specialise in handling very large tables in which certain columns have some natural correlation to their physical storage location. Essentially, if the rows in the table can be grouped in some manner into the blocks on disk, then we have a potential use-case for a BRIN index.

GPS tracking points and log tables are good examples of data with this natural correlation. If your table is small, or data is updated or inserted out of chronological order, it won't be a good fit for a BRIN index.

Instead of storing pointers to entry locations in leaf nodes as a B-Tree does, BRIN indexes store summary information (such as the minimum and maximum values) for each range of blocks on disk. This allows for a much smaller overhead compared to the default B-Tree, which stores every entry and its corresponding location.

CREATE INDEX idx_transactions_created_at_brin
ON transactions USING BRIN (created_at);

This can be used to optimise queries that rely on fetching rows between certain ranges.

SELECT * FROM transactions WHERE
created_at BETWEEN '2024-01-01' AND '2024-12-31';

  1. I'm describing a reverse index. 

S Williams-Wynn at 12:32

Mon 30 June 2025

Low Latency Computation

Anything that loads in under 200ms feels instantaneous to humans; anything slower and we perceive the delay. Some systems are built to respond far quicker than this: business costs can be sensitive to latency, and every millisecond can make a difference.

How do we develop highly performant systems? I happened to share a train ride with Mark Shannon, who at the time was leading a team at Microsoft that had one goal: Make Python Fast.

I asked him: how does one make code more performant? To which he responded:

Make it do less.

Here's what doing less looks like:

Loop Invariant

A simple example of doing less in order to achieve better performance is with loop invariant conditions.

We loop over arrays as we code and make calls to other functions while looping. During a refactor, or while we move code around, we might not realise that some of the variables we are computing are constant.

def expensive(trades: list[Trade], n: int):
    rolling_sum = 0
    for trade in trades:
        beta = compute_beta(n)
        rolling_sum += beta * trade.risk

    return rolling_sum / len(trades)

You'll notice in the example that I am looping over the list trades and computing a beta value, but this beta value is computed from the integer n, which is constant within the scope of this function. If we move the computation of beta to a point before we enter the loop, we avoid having to compute beta for each trade in trades.

def expensive(trades: list[Trade], n: int):
    rolling_sum = 0
    beta = compute_beta(n)
    for trade in trades:
        rolling_sum += beta * trade.risk

    return rolling_sum / len(trades)

Now beta is computed at most once within this method's scope; we have essentially achieved doing less. If the compute_beta method incurs an expensive overhead and we are looping over a lengthy trades list, this subtle oversight would have a significant impact on our latency.

Python

There are several layers of abstraction in Python, and there are ways we can code that utilise computation in C more effectively. One such example of this: using a list comprehension over a vanilla for-loop.

If we wish to understand the number of steps the machine performs for a method, Python allows us to disassemble bytecode using import dis. You might notice that there are additional LOAD_ATTR instructions in a vanilla for-loop (the repeated lookup of append on each iteration), so defaulting to a list comprehension avoids this overhead and allows us to do less.
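A minimal way to see this for yourself; the exact instruction names vary across CPython versions, but the loop version performs an attribute lookup for append on every iteration:

import dis

def with_loop(xs):
    out = []
    for x in xs:
        out.append(x * 2)  # attribute lookup inside the loop
    return out

def with_comprehension(xs):
    return [x * 2 for x in xs]

dis.dis(with_loop)
dis.dis(with_comprehension)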

Databases

Relying on external systems can be expensive; in some cases we might query the same system several times in order to complete an operation. If we have a distributed system, or intend to scale our server horizontally, we are increasing the number of things that open and close connections to the database. Each time we open or close a connection the CPU has to process these instructions, and this overhead can be significant when we are working in low latency environments. To get around this we introduce a connection pool between the database and our server; the connection pool maintains long-lived connections to the database and frees up the database to focus on database things.

A common anti-pattern involving a database is having your ORM make N+1 queries. For this to occur we have a one-to-many entity relationship. If we ask the database to return a group of N countries, and then for each country we ask for all cities in that country, we are making N (the number of countries in our initial query) + 1 (the initial query) total requests to the database. These can be tricky to spot, since each individual query can be performant, but the cumulative overhead and latency of all these queries can cause problems.

We get around this by either asking for all countries and then asking for all cities given a set of country ids, performing the mapping in memory on the server, or by increasing the complexity of the query with a JOIN and allowing the database to do the mapping. Either way we avoid the back and forth overhead of making N+1 queries, effectively doing less.
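A sketch of the first approach, with the two database round trips stubbed out as plain functions (the data and helpers are made up; the point is two queries instead of N+1):

# Hypothetical stand-ins for the two database round trips.
def fetch_countries() -> list[dict]:
    return [{"id": 1, "name": "France"}, {"id": 2, "name": "Japan"}]

def fetch_cities(country_ids: list[int]) -> list[dict]:
    cities = [
        {"country_id": 1, "name": "Paris"},
        {"country_id": 1, "name": "Lyon"},
        {"country_id": 2, "name": "Tokyo"},
    ]
    return [c for c in cities if c["country_id"] in country_ids]

countries = fetch_countries()                        # query 1
cities = fetch_cities([c["id"] for c in countries])  # query 2, not N queries

# Perform the mapping in memory on the server.
cities_by_country: dict[int, list[str]] = {}
for city in cities:
    cities_by_country.setdefault(city["country_id"], []).append(city["name"])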

When we can't do less

Sometimes we can't do less, but we can decide when the work is done. There are situations that allow us to precompute results and provide performant computation at the cost of a slower start time.

Chess programming is an area which relies on high performance: the quicker our program, the further into the future we can look on each move during the game. One way we speed up a chess engine is by precomputing the movesets for every piece on the board, for every possible position they might be in, before the game starts. This allows us to look up all possible moves for a piece during the game instead of computing them only when we need them.1
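A sketch of this idea for the knight: build the full lookup table once at start-up, then move generation during the game is a list lookup.

# Precompute every knight move for all 64 squares, once, before the game starts.
KNIGHT_OFFSETS = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

def knight_targets(square: int) -> list[int]:
    rank, file = divmod(square, 8)
    targets = []
    for dr, df in KNIGHT_OFFSETS:
        r, f = rank + dr, file + df
        if 0 <= r < 8 and 0 <= f < 8:
            targets.append(r * 8 + f)
    return targets

# Built once at start-up; during the game a moveset is a single lookup.
KNIGHT_MOVES = [knight_targets(sq) for sq in range(64)]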

In 3b1b's Wordle solver, he actually computes a pattern matrix for each word and saves it to a file, in order to avoid incurring that overhead on each execution of the solver.

Constant Folding

Lastly, there's an interesting optimisation that compilers make, which is also utilised in Python, called constant folding.

When Python is interpreted it parses the AST before executing. During this AST construction it finds expressions whose outcome is constant and computes the constant at that point, instead of computing the expression every time it is referenced. As an example, the expression x = 24 * 60 will be converted to x = 1440 before compiling the bytecode instructions for the machine.
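We can watch the folding happen with dis; the exact output varies by CPython version, but the bytecode loads the already-folded constant and performs no multiplication at runtime:

import dis

# LOAD_CONST 1440 appears in the output; the multiply is gone.
dis.dis("x = 24 * 60")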

So in all cases, if we want a computer to be highly performant we just have to make it do less.

Further Reading

  • perflint: try running this on your Python codebase to check for performance optimisations.

  1. More on this will be explained when I write my second post on chess programming. 

S Williams-Wynn at 12:05

Mon 23 June 2025

Leadership Patterns

Low-performing teams have a lot in common. These teams often lack some of the basics: low context, no direction and no understanding of purpose. We can summarise these basics into one thing: a lack of leadership.

Reflecting on failing teams, I've identified how I would nudge (and how I've nudged) a group of underperformers into a team that delivers impact. Often there's a single missing ingredient and the team needs a catalyst for change. This can come from within the team; it doesn't have to come from a leader.

Most failure patterns occur in companies that rely on organic organisation, as they've determined that autonomy is at odds with any intentional organisation design.

Things that are done with intent will always lead to a better outcome; otherwise you're just leaving it up to chance. And we can have both autonomy and organisation.

Failure case 1.

Everyone's a leader.

Communication is good; avoiding a decision is bad. In teams where everyone thinks/tries to be a leader, the team behaves as if it were leaderless.

Individual contributors will state what they believe to be the best direction, but it ends without consensus. These meetings are pointless, as members stick to the idea they've presented, and a team with 5 members will have 5 directions.1

This pattern is hard to notice, since it usually comes to light when you're in the middle of a stressful situation and everyone is sharing an opinion as a means to turn the ship around. It's also easy to be absorbed into giving your own bit of advice in order to feign contribution.

The members present a large amount of information and set no goal, or the goals are highly individualistic. There's no point in calling yourself a team if there's no underlying vision or goal to achieve. You could say this is hardly even a team, and you would have saved time by not having a meeting at all.

The best way to address this situation is to take a step back and be honest: "I'm not going to remember half of this meeting." Then direct the attention to someone who is meant to be taking a leadership role and say: "Maybe x can summarize these thoughts into one or two points and we can focus on those, so we leave this meeting with the same aim." Whatever is said next will certainly be remembered.

If you are the leader, don't let your team leave more uncertain about how they should address the situation than when they arrived. A mature leader might even recognise that there's a teammate who is probably the most knowledgeable, and use the opportunity to direct them to state the focus.

It's similar to a half-time talk: you can't have 11 people in the team telling you what you should do better in the next half. That's not how a rally works.

Agreeing to commit to a direction and a goal is all part of being a team. Someone needs to stand up in moments when there's dysfunction.

Failure Case 2

No one is a leader.

Communication is good. Some teams are just sitting around waiting to be told what to do, and if there's a sit-around culture then nothing impactful gets done.

It's quite easy to tell if you're in a team that lacks leadership, since no one will know what work should be done or how their own work fits into the bigger picture of the organisation.

Teams without a leader will lack context and purpose, traits that lead to lower motivation. Assuming that people will just know what to do, without forming some consensus, is naive and might lead the team to do pointless work.

Turning these kinds of teams around requires a little more diagnosis of the no-leader symptom.

  1. Why is no one grabbing initiative?
  2. What are the risks involved in grabbing initiative?
  3. Is fear involved?
  4. Do we provide positive examples that initiative is rewarded?

You might come to realise that no one wants to take on the risk that comes with taking initiative, especially if there's only downside and no upside. Initiative should always be rewarded in organisations, even if the wrong initiative was taken. Learning from mistakes is always better than sitting and doing nothing.2

Diagnosing why the team is behaving this way can lead you to notice that your org is not incentivising the correct behaviours.

There can be other reasons, like a lack of experience in taking ownership. There could also be a lack of hiring with intent, leading to a skill misalignment. Whatever it is, if there's a team that appears leaderless then there's a problem that needs fixing.

Leaders

In truth, leaders are in a team not to be dictators, but to hear the concerns of everyone and distill them down to a direction and focus. Getting the whole team aiming at the same target can shift a low-performing team to a high-performing team overnight.

Having a team aiming all over the place will result in weak performance.

Motivating teams is about context and purpose: you can leave people to do what they want as long as they have both. Without them, they're directionless and will not meet expectations.

Some situations are prime ground for someone to take ownership; just make sure you're rewarding this behaviour.


  1. Not always the case: if the team has been like this for quite some time, you'll have members state something in the discussion and not stick to it themselves, because making points for the sake of it has become the culture. 

  2. Within reason. 

S Williams-Wynn at 12:02

Mon 09 June 2025

Reframe Binary Search

Binary search is quite a simple algorithm to explain, and thus it is often one of the first algorithms computer scientists learn. It's also occasionally brought up in software engineering interviews, as it can be applied when solving optimisation problems.

The traditional explanation of binary search is flawed and acts as a barrier to its application. Binary search is not an algorithm that believes a set can be divided into three groups, e.g. these numbers are too cold, these numbers are too hot and this number is just right.

This Goldilocks framing of binary search forces us to look for situations where the problem contains an array of porridge ordered by temperature, and we are in search of the one that falls into our preferred consumption temperature.

Binary search is an algorithm that is optimal at finding state transitions, and framing it as such opens up its application. We no longer approach problems like Goldilocks; instead, finding the best porridge becomes finding the point at which the ordered bowls of porridge transition from tepid to scalding.1

The Basics

Commonly, binary search is introduced with the example of finding a word in a dictionary. As an example, say we are tasked with finding the word "Hippo":

The brute force approach is to start at A and flick through the pages until you get to the page that contains "Hippo". This approach works; however, the time complexity of this algorithm is O(n), where n is the number of pages in the dictionary. So in the worst case, "Zebra" or something, you're looking at all n pages.

Pretend H is Hippo:

[Figure: BruteForceSearch]

n operations is not inherently bad. If you have a list of 10 words, n operations isn't going to harm you. It starts to get a bit much when you have a billion words, and it's not practical when you have n words to find among a billion words.

Here's where we can apply binary search. If we want to find the word "Hippo", we start by jumping to a random page, or we open the book in the middle.2 If the dictionary is 260 pages, the middle will land us on page 130, which will be M. We know that H is before M, so we can throw away half the dictionary (M->Z); with a single operation we've already cut the worst case down to n/2. We don't stop here: now we go to page 65 (half of 130), which lands us somewhere between F and G. We know H is after G, so now we pick something between G -> M = J. We continue doing this, narrowing down the pages, until we land on the page that contains Hippo.

[Figure: BinarySearch]

In the worst-case scenario, when we are looking in a dictionary of 260 pages, we do 9 operations. If we relied on the previous algorithm we would do 260 operations in the worst case.
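Here's the same idea as a minimal sketch over a toy sorted word list:

def binary_search(words: list[str], target: str) -> int:
    lo, hi = 0, len(words) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if words[mid] == target:
            return mid
        if words[mid] < target:
            lo = mid + 1  # target is after mid: discard the lower half
        else:
            hi = mid - 1  # target is before mid: discard the upper half
    return -1  # not present

words = ["ant", "bee", "cat", "dog", "hippo", "lion", "zebra"]
assert binary_search(words, "hippo") == 4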

The Weaknesses

This explanation does a good job of helping us understand the algorithm, but it overcomplicates binary search, as it breaks the operations binary search is doing into three parts.

  1. Is the current word the word we are looking for?
  2. Is the word we are looking for ahead of the current word?
  3. Is the word we are looking for behind the current word?

These questions may help inform us as to how the algorithm behaves, but we can simplify them to a single binary question.

When does the word we are looking for no longer appear after our current word?

Instead of imagining the sequence as the words that might appear on each page, we can picture the sequence as the outcome of the statement: "The word I'm looking for is after the current word."

[Figure: StateTransition]

As a result, this frames binary search as a solution for finding a state transition. It is at the point when our word no longer appears after the current word that we find the word we are looking for.

Git Bisect

Binary search is also used to find bugs. If you're familiar with git then you might have heard of git bisect. For those unfamiliar with git: imagine a timeline, with each point in that timeline being a change to a file. Every time the file is saved, a new node is added to the timeline.

Now instead of a single file, git works across a whole software project, with each save being noted down on the timeline. A change is what is added and what is removed, so the accumulated sum of all these changes is the current state of the project.

On large teams and complex projects, bugs can appear after several changes have been made to the code. If we know that the code is broken in its current state but aren't sure which change caused the bug, we can use git bisect to figure out which change is the culprit.

When we run git bisect, we provide it with a point in time at which we are certain no bug existed. git bisect will then present us with a change between that time and now.

  1. No bug
  2. ?
  3. ?
  4. broken?
  5. ?
  6. ?
  7. Bug

We then have to answer: is the software at point 4 broken? If yes, the bug was introduced between 1 and 4, ruling out all points after 4 and cutting the points we have to analyse in half. git bisect continues, running a binary search over the changes in the project, until we find the change on the timeline that introduced the bug.

Essentially we are using binary search to evaluate the statement "The bug exists", because we are interested in when the state transitions from "The bug doesn't exist" to "The bug exists".

The Boolean Function

We can now separate binary search into two distinct parts: the algorithm part, which moves a pointer either up or down the timeline, and the function that computes a boolean response. With this we can ask more interesting questions for which binary search provides an answer, for example: "When did spend increase?" or "When did the user go into debt?"
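A sketch of that separation: the iterator below knows nothing about the data; it only narrows in on where a monotone boolean statement flips from False to True (the balances are made up).

def find_transition(lo: int, hi: int, holds) -> int:
    # Assumes `holds` is monotone over [lo, hi): False...False True...True.
    # Returns the first index where it holds (hi if it never does).
    while lo < hi:
        mid = (lo + hi) // 2
        if holds(mid):
            hi = mid      # the transition is at mid or before it
        else:
            lo = mid + 1  # the transition is strictly after mid
    return lo

# e.g. the first day the user went into debt:
balances = [120, 80, 45, 10, -5, -30, -70]
first_in_debt = find_transition(0, len(balances), lambda i: balances[i] < 0)
assert first_in_debt == 4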

We can even provide two arrays and find the answer to "When did trend A overtake trend B?", without computing the delta for all observations on the timeline.

[Figure: DeltaSearch]

The main underlying assumption taught with the traditional explanation is that the set you're operating on must already be sorted for the search to work. The example above shows that this isn't strictly true: binary search only requires the underlying data to be bisect-able.

Binary search is an effective tool for these state-transition problems. It is interesting that the algorithm on its own is easy to understand, yet the breadth of its application grows when you separate the algorithm's iterator from the statement it evaluates.

I'd be interested in finding out if the application of other algorithms grows when they're decomposed in a similar way.



  1. I realise that using an analogy before explaining binary search might have lost some people. 

  2. We assume that words are evenly distributed in a dictionary to keep things simple. 

S Williams-Wynn at 12:15