Mon 28 July 2025
Software Is Planning
Software is planning, and as the adage goes, failing to plan is planning to fail. This article looks at the development of Git and UNIX with the goal of dispelling the myth of the lone genius software engineer: when we hear that something was written entirely overnight, key information is being omitted.
Everyone likes the story of the programmer who disappears and resurfaces after a week with some revolutionary software. We all enjoy a compelling narrative, but in reality great software takes time, and for most great projects the code is actually the smallest part of the project.
Planning
Projects don't go wrong, they start wrong
How Big Things Get Done (2023)
We shouldn't view software creation from the point when coding starts. The process should include the upfront planning. In 1975 Brooks advocated that the coding portion of a software project should amount to about 17% of the overall time, and it should be noted that in the 1970s the dominant programming languages were FORTRAN and COBOL.
If a software requirements error is detected and corrected during the plans and requirements phase, its correction is a relatively simple matter of updating the requirements specification.
Software Engineering Economics (1981)
When we think of planning we often think of planning permission, regulation and endless bureaucracy. We don't have time for that; we are trying to move fast and break things. However, in software, the planning stage is usually when most of the thinking happens and the hard problems get ironed out. Thinking is hard and most people avoid it; make thinking easier and you can stand out from other developers.
The planning phase of a project is also the point in time when large changes are the cheapest.1
UNIX
One fable within software is that UNIX was developed in just a few weeks by Ken Thompson. Ken is quite the mythical figure in the software world; Brian Kernighan attests to this in multiple interviews.
We must keep in mind, though, that Ken worked for three years on Multics, from which he carried many features forward to UNIX. This isn't to say Ken isn't a great programmer, but we shouldn't discount that his mind had been focused on an operating system for quite some time, which surely helped him formulate a better system.
Imagine your best programmer said to you, "Sure, let me work on the project for two to three years, then we will scrap it and start again. You are bound to have best-in-class software." Instead, businesses want every project to have a fabled programmer and to continue the trope.
Ken Thompson developed UNIX over three weeks; there's a video where Brian Kernighan remarks that modern software engineers aren't as productive.2
Git
At this point Git is the most popular form of version control for software; it's also famously known to have been written in a fairly short amount of time. Linus states that he wrote Git in 10 days, and we mostly assume the timeline for the project was: I have an idea and ten days later we have Git.
However, this is not the case. Linus wrote Git in 10 days, but it took him four months of thinking about the problem until he had a solution he was satisfied with. Ten days after that we had Git.
Being a Creator
When it comes to creation, sometimes we are so excited by the answer that we fail to think about the question. A lot of software gets written without getting to the root of the problem and thus fails to materialise any value.
Large projects fail when creation comes first and planning comes never. Even after you have complete knowledge of the problem, solutions may have challenges which require addressing upfront; addressing them when you are midway through a project is going to cost both time and money.
For some problems you can have an attitude of "we will work it out when we get there", but you don't want this attitude towards every problem: when you get there you might be 95% of the way through your time and budget, past the point of no return, and getting over the hurdle means going over budget.
Use your planning to categorise problems into "it won't work unless we know this" and "we can figure this out when we get there".
Board Games
"Is it fun" is the most important question in board game design. Ideally you should find this out before you hire an artist, write storyboards or even print and design cards. In board game creation circles they advocate tearing up pieces of paper and scribbling out a prototype to answer this question. The same thing applies to software.
Determine if the question you are answering is the correct one and determine if your software will actual solve the problem, do this; Wizard of Oz style, before putting a shovel into the ground.
Assumptions
All projects start with assumptions. These are typically the first things that need to be addressed before you start coding. Sometimes asking someone if a solution exists or how something works can knock off some unknowns and you'll be in a better position to start than had you remained silent.
Design it Twice
A Philosophy of Software Design (2018)
John Ousterhout advocates that designing software once is not enough and when you are planning something out you should make multiple attempts at it before you move ahead. He's found this leads to better design.
Every project comes with its unknowns and challenges, and we are probably never going to build the exact same solution twice. So packing in as much learning as you can upfront about the challenges ahead will leave you as prepared as you could have been.
The beginning of a project is the cheapest time to learn, so we should maximise and front-load learning. Learning midway through a project is an expensive way to learn. That's fine if we can afford those learnings, but on big and critical projects you don't want to be the one pointing out that the work of the last six months was all for nothing, especially if we could have learnt it earlier on.
Break things down, explore, figure out how each piece of the project will work and confirm that we are actually providing the right answer to the correct question; this is the only way to deliver on time and on budget.
If we jump straight into code our design strategy is hope.
-
Whenever I hear "prompt engineering" I roll my eyes, mainly because it's a term that offers zero value. If instead we referred to it as "prompt planning" I think we would get people onto the correct page. Having a clear idea of the answer, having thought about what you want to create and providing a clear unambiguous prompt is essentially doing the upfront planning, which I hope you're doing when writing your software. ↩
-
and I took that personally. ↩
Mon 21 July 2025
Strategic Testing
The "Software Testing" series:
-
1: Assert
-
2: (here) Strategic Testing
Are our tests good? Are they bad? Do we have enough tests?
Beyond writing isolated and targeted unit tests, there are methods that ensure our tests are appropriate. This article covers some strategies that answer these questions: using test coverage metrics, mutation testing, fuzzy tests and finally test fixtures.
Test Coverage
We can measure the number of lines executed when running our tests. For example, in the following code snippet the fourth line, return False, is never executed by the test suite.
def is_odd(n: int) -> bool:
    if n % 2:
        return True
    return False

def test_is_odd():
    assert is_odd(7)
The ratio of lines executed to total lines gives us the test coverage metric.
Along with context about how important each line of our code is, test coverage is helpful; however, as a performance metric or a blind target it's quite useless.
If you've got 20% coverage and the code is critical to your business, then getting that to 80% is crucial. Trying to eke out an extra 0.1% of coverage when you're at 98% is fruitless, and a goal of going from 98% to 99% is a poor man's KPI.
Unless they're easy to add, you'll be making very minor coverage gains for edge-cases that might rarely be hit. At this point there is likely something more impactful to focus on.
It might be an interesting exercise to understand whether the code you're not covering is even reachable during the program's lifetime. If not, the dead code should just be removed instead of tested.
Mutation Testing
We rely on tests to ensure our code is correct and works as expected, but how do we ensure our tests are correct and work as expected? What tests our tests? Mutation testing aims to fill this gap.
How often have you written a passing test and then purposely made it fail, just to ensure the test is catching the case you intended it to catch? This forms the basis of mutation testing.
When applied, a mutation testing tool goes through the code under test and makes subtle changes to it, producing "mutants". Using the is_odd method as an example, it might change n % 2 to n % 3, bump a numeric constant from 7 to 8, remove a character from a string literal, or swap operators such as changing <= to <.
The test suite then runs as normal, but with the expectation that it should fail against each mutant. If our example still passes when return False has been mutated to return True, then something is wrong: the test isn't working as expected, which might indicate that we are mocking too many dependencies, we aren't being specific enough or nothing is really being tested.
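A hand-rolled sketch of the idea (tools such as mutmut automate the mutation step; the flipped return value below is just one mutation such a tool might try):
def is_odd(n: int) -> bool:
    if n % 2:
        return True
    return True  # mutant: the original "return False" has been flipped

def test_is_odd():
    assert is_odd(7)  # still passes against the mutant, so the mutant survives
A surviving mutant like this points back at the coverage gap from the previous section: return False is never exercised, so nothing notices when it changes.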
Fuzzy Tests
Some software may receive user-provided or malformed data, and in these cases you don't want the system to behave irregularly. A developer might not know upfront all the funky data that could be provided to a method, so they may rely on writing fuzzy tests.
As an example, if we had a method that expects a user-provided string, we can define a fuzzy test which enumerates a data bank of known edge-cases for strings, such as an emoji, an empty string or a large string of zero-width characters.
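A minimal sketch of this with pytest parametrisation (the slugify function and the particular edge-cases are illustrative; libraries such as Hypothesis can also generate inputs like these for you):
import pytest

# An illustrative data bank of awkward strings
EDGE_CASES = [
    "",                 # empty string
    "🙂",                # an emoji
    "\u200b" * 10_000,  # a large string of zero-width spaces
    " surrounding whitespace ",
]

def slugify(value: str) -> str:
    # Hypothetical function under test
    return "-".join(value.lower().split())

@pytest.mark.parametrize("value", EDGE_CASES)
def test_slugify_never_misbehaves(value):
    # We don't know the "right" answer for every input;
    # we only assert that the system doesn't behave irregularly.
    result = slugify(value)
    assert isinstance(result, str)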
Test Fixtures
As the code base grows you might notice that we are writing repeated lines of code in order to set up a user object or prepare data before passing it to the method we are testing.
Large projects get around this by defining test fixtures. These can be passed as parameters to our tests so that we know the setup a test requires before it runs. The benefit of keeping the fixture separate from the test is that it reduces the amount of code duplicated across tests, and if the setup for the user changes then only the fixture requires changing.
Tests should be focused on asserting one thing, and fewer lines in a test make it easier to see what's going wrong when something breaks.
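A sketch of what this might look like with pytest fixtures (the User model and its fields are illustrative):
import pytest
from dataclasses import dataclass

@dataclass
class User:  # illustrative stand-in for the real model
    name: str
    email: str
    is_active: bool

@pytest.fixture
def active_user() -> User:
    # Shared setup lives here; if the User model changes,
    # only this fixture needs updating.
    return User(name="Ada", email="ada@example.com", is_active=True)

def test_active_user_is_active(active_user):
    assert active_user.is_active

def test_active_user_has_an_email(active_user):
    assert "@" in active_user.email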
Finally
If you're printing it, maybe you should assert it.
Mon 14 July 2025
Assert
The "Software Testing" series:
-
1: (here) Assert
Natural language is context-dependent and ambiguous. Do you think you can one-shot a solid business idea? It took Twitch seven years to pivot into gaming. It wasn't seven years of accumulating stacks of code that helped them stick this landing.
We are prompting a machine using an ambiguous and context-dependent natural language to create precise and detailed machine instructions. It is no wonder that those with coding experience are at an advantage when it comes to commanding the machine. The vibe coder is overlooking the techniques and the vocabulary the profession has developed over several decades.
We've learnt that in order to generate the best response from an LLM we need more precision and less ambiguity in our prompts. If only we could develop a language that gives us a precise way of creating machine instructions and eliminates ambiguity; perhaps we could call it a programming language?
Fingers crossed
Nothing is built on stone; all is built on sand, but we must build as if the sand were stone.
Jorge Luis Borges (From "Software Engineering at Google")
Most software is built on hope. I write a function that multiplies two integers together and hope that it works. We can also write a test asserting that certain inputs produce the correct output, but are you going to write a test for every combination of numbers?
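A sketch of what that hope looks like (the multiply function and the handful of spot checks are illustrative):
def multiply(a: int, b: int) -> int:
    return a * b

def test_multiply():
    # A few spot checks; for every other combination of inputs we rely on hope.
    assert multiply(3, 4) == 12
    assert multiply(-2, 5) == -10
    assert multiply(0, 7) == 0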
Once we put lines of code into production the function may or may not be run with the exact input that we expected when we wrote the code.
We are required to create programs without knowledge of the concrete values that will be passed into them; to think of a result in terms of its name.
double_n = add(n, n)
For every computation we rely on hope.
Program testing can be used to show the presence of bugs, but never to show their absence!
EWD-249 (1970)
Staying Organised
How we ensure our programs are correct also tends to relate to how we scale a project. We've recognised the limitations of a single mind when it comes to containing the details of an entire program.
It's the core responsibility of a software engineer to watch and manage this complexity.
The art of programming is the art of organizing complexity, of mastering multitude and avoiding its bastard chaos as effectively as possible.
EWD-249 (1970)
Since then we've had multiple attempts at working out how to grow a project. There's a link between how we structure code and how we test it. Tests enable us to offload the checking of our functionality, and well-structured code tends to be easier to test.
This line of thinking led to the practice of Test Driven Development (TDD) where it's thought that writing out the tests as a first step leads the programmer to write more cohesive and well structured code.
Describing Tests
If our tests are determining how we structure the code, what's determining how we structure the tests?
First let's address one of the biggest issues in software engineering. The way we teach and introduce how-to-test is vague and ambiguous, using abstract examples of unrealistic classes and functions. The worst offending term is the "Unit Test", as the definitive boundary for a unit can always be argued.
We have a better understanding of what is not a unit test than of what a unit test is.
The second offender is the testing pyramid. Vehement advocates will disagree on the boundaries of each layer, and these layers won't apply to all projects. Setting out to define them at the beginning of a project just wastes our time. Often we can only determine where areas of a project will grow with hindsight, and we are already building software on a foundation of hope, so we should stick to just enough testing.
We shouldn't let the question "Where should we test it" get in the way of testing it.
Managing Tests
We should start thinking more about how we manage tests.
The first thing to address is test duplication. It is all too easy to see a test, make a copy and change it slightly. This can lead to the same thing being tested across multiple tests. We can reduce the amount of code we are maintaining if our tests are targeted. If small changes lead to an unexpected number of tests breaking, we have too much assert duplication.
I compare testing to a climber scaling a mountain with a limited number of pegs. If you are too cautious and nail in a peg after every metre, you'll find it tougher to make changes when you change direction, as the climber's rope is limited by the distance between each peg. Each peg also needs removing whenever the direction changes by more than a metre. However, if you nail in a peg every 10 metres the climber is flexible to direction changes, at the risk of taking a battering when they fall.
Techniques that balance being defensive with being flexible give us a better test suite. Reducing test duplication is one example of this. If we are using three pegs in the same location we aren't providing a greater level of safety, and we risk unnecessary changes in the future.
Mon 07 July 2025
Database Indexes
Often the most crucial part of a system is the database. Databases manage the storage of our business data, and an inefficient database can impact the overall performance of our system. There are, however, a few ways we can improve database performance; one such way is defining appropriate indexes.
This article describes indexes in Postgres and some of the lesser-known index types it offers, such as GIN and BRIN indexes.
Why do we use indexes?
We use indexes in Postgres to optimise how we scan the data we have stored in the database, similar to how binary search can optimise how quickly we find things in a dictionary (see my post on binary search).
We can use an index to inform the database how to structure data in a way that allows us to find what we are looking for efficiently, i.e. fewer operations, less I/O and fewer comparisons, leading to lower CPU usage and better performance.
What is an index?
An index is a data structure which allows us to minimise the number of instructions performed by the database in order to lookup table rows or improve the speed at which a result is sorted.
In the case of sorting, if there's an index on a column that you require results to be ordered by, the query can scan the index to return the results in sorted order; if there is no index on the column, the data must first be fetched and then sorted in a separate step.
Indexes aren't always good
Overdoing the number of indexes on a table can lead to lower write performance, as every update or insert into the table requires updating the indexes. Indexes also require disk space, so having unnecessary indexes will contribute to the rate at which your disk usage grows.
B-Tree
The default index type Postgres uses is a B-Tree. This data structure is similar to a binary tree, except instead of a node having only two pointers, a node can contain many pointers. The number of pointers is determined by Postgres's default block size, which is 8kB. A single node will store as many sorted keys as it can until it reaches this size limit. Postgres refers to these nodes as pages.
Having smaller index keys can improve the performance of your index, as you'll be able to store more keys within the page limit, resulting in less I/O as fewer pages need to be read from disk.
In Postgres, leaf nodes point to the physical location of the row on disk and the intermediary nodes (or internal pages) point to the nodes on the next level down the tree.
There's a neat animation that can help visualise b-trees: https://btree.app/
Partial Indexes
We can optimise the indexes we create by understanding usage patterns and having some intuition about the system and the business. One such optimisation is the use of partial indexes; these are handy when you have a large table but only a subset of the data is used frequently.
As an example, imagine an online order system. It's quite likely that once an order is fulfilled its status transitions to "complete", and that order isn't likely to be accessed as frequently as the orders still in progress. We can restrict our index so that it contains only unfulfilled orders, which will be a significantly smaller portion of our overall orders.
CREATE INDEX idx_uncomplete_orders_customer ON
orders(customer_id) WHERE status != 'complete';
We also have to include this WHERE filter in our queries if we wish to make use of this index.
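For example, a query whose predicate implies the index's WHERE clause can use it (the customer id here is illustrative):
-- Can use idx_uncomplete_orders_customer
SELECT * FROM orders
WHERE customer_id = 42 AND status != 'complete';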
Covering Index
A covering index allows us to add columns to the index's leaf nodes and thus avoid looking up and reading the entire table row from disk. Essentially, everything that the query needs is available on the index, reducing the number of operations required to complete the query.
As an example, if we typically request a user's first name and last name alongside their email, we can create an index on the email that includes the first and last name.
CREATE INDEX idx_user_email_include
ON users (email) INCLUDE (firstname, lastname);
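With this index in place, a query such as the following can typically be answered by an index-only scan, without reading the table row at all (the email value is illustrative):
SELECT firstname, lastname
FROM users
WHERE email = 'ada@example.com';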
We have to bear in mind that we should only include columns which change as frequently or less frequently than the key we are indexing, otherwise we are duplicating data and increasing the write overhead. This isn't an issue for columns that rarely change.
For more on covering indexes, see "INCLUDE".
Gin Index
GIN stands for Generalised Inverted Index, similar to the inverted index that Lucene is built on, the same Lucene that Elasticsearch is built on. The difference is that this inverted index is generalised: it expects the items being indexed to be composed of multiple values, the same way a sentence is composed of multiple words, with the goal of supporting full-text search.
An index entry is created for each element in the composite column, and the entry points to a list of matching locations. As an example, a row with a column containing the value "Here is a blog post" will create an index entry for each word in the value (e.g. "blog"), and each entry will point to the rows that contain "blog" in the composite column.1
ALTER TABLE blogposts
ADD COLUMN search_vector tsvector;
CREATE INDEX idx_blogposts_search ON blogposts
USING GIN (search_vector);
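Assuming the search_vector column is kept populated, for example from a title and body column (both illustrative here), a full-text query can then be answered via the GIN index:
-- Populate the column (title and body are illustrative)
UPDATE blogposts
SET search_vector = to_tsvector('english', title || ' ' || body);

-- This predicate can be answered using the GIN index
SELECT title FROM blogposts
WHERE search_vector @@ to_tsquery('english', 'blog');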
GIN indexes aren't limited to just text, they are generalised so they can be used with any composite type, JSONB for example.
BRIN Index
BRIN stands for Block Range Index. These are indexes that specialise in handling very large tables in which certain columns have some natural correlation to their physical storage location. Essentially if each row in the table can be grouped in some manner to the blocks on disk then we have a potential use-case for the BRIN Index.
GPS tracking points and log tables are good examples of data with this natural correlation; if your table is small, or data is updated or inserted out of chronological order, then it won't be a good fit for a BRIN index.
Instead of storing pointers to entry locations as a B-Tree's leaf nodes do, BRIN indexes store summary information (such as the minimum and maximum value) for each range of blocks on disk. This allows for a much smaller overhead compared to the default B-Tree, which stores all entries and their corresponding locations.
CREATE INDEX idx_transactions_created_at_brin
ON transactions USING BRIN (created_at);
This can be used to optimise queries that fetch rows within a certain range.
SELECT * FROM transactions WHERE
created_at BETWEEN '2024-01-01' AND '2024-12-31';
-
I'm describing a reverse index. ↩
Mon 30 June 2025
Low Latency Computation
Anything that loads faster than 200ms feels instantaneous to humans; anything slower and we perceive the delay. Some systems are built to respond much quicker than this; business costs can be sensitive to latency, and every millisecond can make a difference.
How do we develop highly performant systems? I happened to share a train ride with Mark Shannon, who at the time was leading a team at Microsoft that had one goal: Make Python Fast.
I asked him, how does one make code more performant? To which he responded:
Make it do less.
Here's what doing less looks like:
Loop Invariant
A simple example of doing less in order to achieve better performance is with loop invariant conditions.
We loop over arrays as we code and make calls to other functions while looping. During a refactor, or while we move code around, we might not realise that some of the values we are computing are constant within the loop.
def expensive(trades: list[Trade], n: int):
    rolling_sum = 0
    for trade in trades:
        beta = compute_beta(n)
        rolling_sum += beta * trade.risk
    return rolling_sum / len(trades)
You'll notice in the example that I am looping over the list trades and computing a beta value, but this beta value is computed with the integer n, which is constant within the scope of this function. If we move the computation of beta to a point before we've entered the loop, we avoid having to compute beta for each trade in trades.
def expensive(trades: list[Trade], n: int):
    rolling_sum = 0
    beta = compute_beta(n)
    for trade in trades:
        rolling_sum += beta * trade.risk
    return rolling_sum / len(trades)
Now beta is computed at most once within this method's scope; we have essentially achieved doing less. If the compute_beta method incurs an expensive overhead and we are looking at a lengthy trades list, this subtle oversight would have a significant impact on our latency.
Python
There are several layers of abstraction in Python, and there are ways of writing code that utilise the underlying C implementation more effectively. One such example: using a list comprehension over a vanilla for-loop.
If we wish to understand the number of steps the machine is doing for a method, Python allows us to disassemble bytecode using import dis. You might notice that there are additional LOAD_ATTR instructions in a vanilla for-loop, so defaulting to a list comprehension avoids this overhead and allows us to do less.
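A sketch of how you might compare the two forms (the exact opcodes vary between Python versions):
import dis

def squares_loop(numbers):
    result = []
    for n in numbers:
        result.append(n * n)  # attribute/method lookup on result every iteration
    return result

def squares_comprehension(numbers):
    return [n * n for n in numbers]

# Compare the bytecode of both; the loop version performs an extra
# attribute lookup per iteration that the comprehension avoids.
dis.dis(squares_loop)
dis.dis(squares_comprehension)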
Databases
Relying on external systems can be expensive; in some cases we might query the same system several times in order to complete an operation. If we have a distributed system, or intend to scale our server horizontally, we are increasing the number of things that open and close connections to the database. Each time we open or close a connection the CPU has to process these instructions, and this overhead can be significant when we are working in low latency environments. To get around this we introduce a connection pool between the database and our server; the connection pool manages long-lived connections, freeing up the database to focus on database things.
A common anti-pattern involving a database is having your ORM make N+1 queries. For this to occur we have a one-to-many entity relationship. If we ask the database to return a group of n countries and then, for each country, we ask for all cities in that country, we are making N (the number of countries in our initial query) + 1 (the initial query) total requests to the database. These can be tricky to spot since the individual query can be performant, but the accumulated overhead and latency of all these queries can cause problems.
We get around this by either asking for all countries and then asking for all cities given a set of country ids, performing the mapping in memory on the server, or by increasing the complexity of the query with a JOIN and letting the database do the mapping. Either way we avoid the back-and-forth overhead of making N+1 queries, effectively doing less.
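As a rough sketch using the standard library's sqlite3 module (the countries and cities tables are assumed to exist), the two approaches look like this:
import sqlite3

conn = sqlite3.connect("example.db")  # assumes countries and cities tables exist

# N+1: one query for the countries, then one query per country for its cities
countries = conn.execute("SELECT id, name FROM countries").fetchall()
cities_by_country = {
    country_id: conn.execute(
        "SELECT name FROM cities WHERE country_id = ?", (country_id,)
    ).fetchall()
    for country_id, _name in countries
}

# Doing less: a single round trip, letting the database perform the mapping
rows = conn.execute(
    "SELECT countries.name, cities.name"
    " FROM countries JOIN cities ON cities.country_id = countries.id"
).fetchall()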
When we can't do less
Sometimes we can't do less, but we can decide when the work is done. There are situations that allow us to precompute results and provide performant computation at the cost of a slower start time.
Chess programming is an area which relies on high performance; the quicker our program, the further into the future we can look on each move during the game. One way we speed up a chess engine is by precomputing the move sets for every piece on the board, for every possible position they might be in, before the game starts. This allows us to look up all possible moves for a piece during the game instead of computing them only when we need them.1
In 3b1b's Wordle solver he computes a pattern matrix for each word and saves it to a file, in order to avoid incurring that overhead on every execution of the solver.
Constant Folding
Lastly, there's an interesting optimisation that compilers make, which is also utilised in Python, called constant folding.
When Python is interpreted it parses the AST before executing. During this AST construction it finds expressions whose outcome is constant and computes the constant at this point, instead of computing the expression every time it is referenced. As an example, the variable x = 24 * 60 will be converted to 1440 before the bytecode instructions for the machine are compiled.
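You can see the folding with dis; the multiplication never appears in the bytecode (output differs slightly between Python versions):
import dis

def minutes_in_day():
    return 24 * 60

# The disassembly shows LOAD_CONST 1440: the expression was folded
# at compile time rather than evaluated on every call.
dis.dis(minutes_in_day)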
So in all cases, if we want a computer to be highly performant we just have to make it do less.
Further Reading
- perflint: try running this on your Python codebase to check for performance optimisations.
-
More on this will be explained when I write my second post on chess programming. ↩