Mon 02 June 2025
Understanding Time and Space
On May 24th 2000 the Clay Mathematics Institute announced a $1,000,000 prize for solving any one of seven maths problems. Since then only one has been solved, and the mathematician who solved it rejected the money.1
One of those problems is the P vs NP question which relates to how we estimate time and space complexity for algorithms in computer science and software engineering.
Complexity
Complexity is an estimate of the growth in either time or space given an increase in the size of the input. Some algorithms have a linear relationship between time and input; we signify this complexity as O(n).
Constant complexity is signified as O(1); this means that as we increase n (the size of the input) the time and space stay consistent: the algorithm doesn't take any longer and doesn't require more RAM.
An example of an algorithm that takes constant O(1) space is finding the largest number in an unordered list of n numbers. You'll notice below, as we move through the list, there's only one element we need to keep track of. This element is the largest item we've seen up to that point. When we see a larger number we replace the tracked number with this new larger number.
No matter how many items we include in this list we would still only need to track a single number, so the space complexity of this algorithm is O(1). We might apply this algorithm to find the largest file on a computer.
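In Python, a rough sketch of that linear scan might look like this (the values and function name are my own, invented for illustration):

def largest(items):
    # Track only a single value while scanning: O(1) space, O(n) time.
    largest_so_far = None
    for item in items:
        if largest_so_far is None or item > largest_so_far:
            largest_so_far = item
    return largest_so_far

print(largest([512, 10_240, 2_048, 98_304]))  # 98304, e.g. file sizes in bytes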
In terms of time complexity, if we were to add more files or more items to this list the amount of time it takes would increase proportionally to the number of items we add. We therefore say this algorithm has an O(n) time complexity.
If we wish to improve the time complexity of finding the largest file, we can do so by having something track the largest file in the system, checking and updating it whenever a file is written. This would provide us with a constant time O(1) algorithm, as we avoid having to check all the file sizes every time we request the largest file. The trade off here is added complexity when updating any file on our system. If we were to reduce the size of the largest file, the second largest might now be the largest and we'd have to find which file that might be.
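A minimal sketch of that trade-off, assuming a hypothetical hook that gets called on every file write (the class and method names are invented for illustration):

class LargestFileTracker:
    def __init__(self):
        self.sizes = {}            # path -> size, kept so we can rescan
        self.largest_path = None

    def on_write(self, path, size):
        # Called on every write: reads stay O(1), writes pay a small extra cost.
        was_largest = path == self.largest_path
        self.sizes[path] = size
        if self.largest_path is None or size > self.sizes[self.largest_path]:
            self.largest_path = path
        elif was_largest:
            # The largest file may have shrunk: fall back to an O(n) rescan.
            self.largest_path = max(self.sizes, key=self.sizes.get)

    def largest(self):
        return self.largest_path   # O(1) lookup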
We don't often need to know what the largest file on the system is, so this added complexity is probably a waste of resources; additionally, the overhead of creating, updating and deleting files has now increased. An O(n) search is probably enough for most machines.
Sorting
Sorting is often used to explain time complexity as there are a number of ways to sort items, each with their own complexity. It's also useful because we can visualise sorting. Below I have provided an example of bubble sort, which has a time complexity of O(n²).
This isn't the most efficient way to sort a list of items, but it's a good representation of an O(n²) algorithm.
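As a textual stand-in for that example, a minimal bubble sort in Python:

def bubble_sort(items):
    # Each pass bubbles the largest remaining value to the end of the list,
    # so in the worst case we make ~n passes over ~n elements: O(n^2) time.
    items = list(items)  # sort a copy
    for end in range(len(items) - 1, 0, -1):
        for i in range(end):
            if items[i] > items[i + 1]:
                items[i], items[i + 1] = items[i + 1], items[i]
    return items

print(bubble_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]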
Categorising complexity
So far we've been discussing polynomial time algorithms, but there exist algorithms which take exponential time to produce a solution. As an example, in the case where we have 2 items this would take 4 operations; if we increase to 3 items it would take us 8 operations. Such an algorithm is said to have O(2ⁿ) time complexity.
Another example is the N-Queens II problem. The time complexity for a brute-force algorithm to solve this problem is O(n!), which grows even faster: 3 items is 6 operations, 4 items is 24 operations. These are algorithms which a computer would struggle to solve, in a sensible time, as n gets larger.
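One way to feel the factorial growth is a brute-force N-Queens counter that tries every permutation of column positions; this is only a sketch to illustrate the O(n!) search space, not an efficient solver:

from itertools import permutations

def count_n_queens(n):
    # One queen per row, with columns given by a permutation, so rows and
    # columns never clash; only the two diagonal directions need checking.
    # There are n! permutations to try, hence the factorial growth.
    solutions = 0
    for cols in permutations(range(n)):
        if (len({row + col for row, col in enumerate(cols)}) == n
                and len({row - col for row, col in enumerate(cols)}) == n):
            solutions += 1
    return solutions

print(count_n_queens(6))  # 4 solutions, found by checking 6! = 720 permutations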
Within the realm of these exponential algorithms exist problems which we can solve in O(n!) time but validate in polynomial time. So even if n is exceptionally large, given a solution, we are able to validate its correctness relatively quickly.
P vs NP
This class of algorithms forms the basis for one of the millennium problems, known as P vs NP. The problem asks if there's a relationship between the complexity of validating a solution and the complexity of solving the problem. If P = NP then problems that can be verified in polynomial time can also be solved in polynomial time. However, P ≠ NP implies that some problems are inherently harder to solve than they are to verify.
There are a few problems for which no polynomial solution is known, and finding one would determine that P = NP: problems such as the travelling salesman problem (TSP), the boolean satisfiability problem and the vertex cover problem. Proving that there exists a polynomial solution to these problems will net you $1 million.
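To illustrate the verification side, checking a proposed vertex cover only takes a single pass over the edges, even though finding a small cover is hard; a sketch:

def is_vertex_cover(edges, cover):
    # Polynomial-time check: every edge must touch at least one covered vertex.
    cover = set(cover)
    return all(u in cover or v in cover for u, v in edges)

edges = [("a", "b"), ("b", "c"), ("c", "d")]
print(is_vertex_cover(edges, {"b", "c"}))  # True
print(is_vertex_cover(edges, {"a", "d"}))  # False: edge (b, c) is uncovered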
In Practice
Solving these problems has practical applications. Solving TSP would provide us optimal solutions for delivery routing, and vertex cover has applications in DNA sequencing. So how do we produce solutions in these areas when attempting to find the best solution can take a computer as long as the lifetime of the universe?
We tend to rely on algorithms that are fast on average. These algorithms may involve backtracking with a large amount of pruning; this approach is used in chess. There are also approximation algorithms and heuristic-based approaches such as simulated annealing.
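As a flavour of the heuristic approach, here's a nearest-neighbour tour for TSP; it runs quickly but makes no promise of optimality (the coordinates are invented):

from math import dist

def nearest_neighbour_tour(points):
    # Greedy heuristic: always travel to the closest unvisited point.
    # O(n^2) time, but the tour can be noticeably longer than the optimum.
    unvisited = list(points)
    tour = [unvisited.pop(0)]
    while unvisited:
        nearest = min(unvisited, key=lambda p: dist(tour[-1], p))
        unvisited.remove(nearest)
        tour.append(nearest)
    return tour

cities = [(0, 0), (2, 1), (1, 5), (5, 2)]
print(nearest_neighbour_tour(cities))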
Trade Offs
Understanding time and space allows us to make trade offs between strategies. Certain algorithms can provide us an efficient approach to solving technical problems, but there is a limit to how much we can do with algorithms, and sometimes even problems that sound simple on the surface could have a $1 million bounty standing for 25 years without you realising it.
- I wonder if this will ever be adjusted for inflation, otherwise the prize might not be that significant by the time the problem is solved. ↩
Mon 26 May 2025
Conventional Wisdom
There are two things that are impressive when looking at a software project: 1. the simplicity and 2. the consistency.
It can be overwhelming jumping into an organisation or project where the conventions are all over the place. Sometimes it's better if there's just one way to do things.
You're going to be battling with product and business problems; on top of that, you're now having to deal with one team expecting XML and another team expecting JSON.
The more consistency and standardisation in a workplace, the easier it is to dive into a project and get familiar with the hard parts. Working across teams or on more than one code base can be a lot easier once the ways of working become familiar.
We can compare this to picking up a new board game, there's an initial overhead of learning the instructions and how the game plays out. Moving to a new team should feel like you're changing the board but the rest plays out the same way. Going between teams shouldn't feel like jumping between Dune Imperium and Risk1.
There are many decisions that need to be made in software, and not having a generally agreed set of conventions adds more overhead to decisions. The downside is that there's a bit of a cold start issue as we get familiar with new guidelines2, but the payoff is worth it.
Having code that is consistent across the codebase improves comprehension for all of engineering
Software Engineering at Google (pg 173)
Having guidelines and rules helps us lift the standard of engineering within an organisation and speeds up decision making on minor things like naming, provided we have defined our expectations.
How Stripe Builds APIs
When building APIs, internally or externally, there are a few things it would be good to agree on upfront.
The Stripe API is generally considered to be good; after all, their API is their business. Many documentation tools even use "stripe-like" in their marketing. Stripe has rules and guidelines, which they follow when constructing APIs and they're usually backed up by solid reasoning.
Here are a few of their suggestions:
Avoid Jargon
The example they give is using an industry-specific term for an API property, like card.pan instead of card.number. Most people are familiar with a card "number"; fewer people are familiar with a card "pan".
Accessible vocabulary can allow you to reach more users; you shouldn't gatekeep your product and fence your services off for those with insider knowledge.
Abbreviations are another example of this and should be called out, e.g. GTM: you probably thought of Go To Market, but I could have been talking about Google Tag Manager.
Your engineers might not come from the same industry and should ideally have diverse backgrounds. This can be played as a strength; if you notice someone asking for clarity on a phrase or word, perhaps you can find phrasing in your answer that is a more suitable term for the API.
Nested Structure
There will always be surprises with any integration. I've witnessed a 200 status code returning the message "Server Error". Having properties such as account_number and account_created_at is another one I've seen. Stripe avoids this by opting for nested structures, so in this case they would be returning:
account: {
  created_at: <timestamp>,
  number: 10
}
Properties as Enums
This one prepares us for the future, since having a property like canceled being either true or false can get in the way when you introduce more state. We can avoid filling up our objects with a myriad of redundant booleans by sticking to an enum from the beginning.
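A small Python sketch of the idea (the states and field names here are made up, not Stripe's):

from enum import Enum

class PaymentStatus(Enum):
    PENDING = "pending"
    SUCCEEDED = "succeeded"
    CANCELED = "canceled"
    REFUNDED = "refunded"  # a new state slots in without another boolean

# Instead of canceled=True/False plus refunded=True/False piling up:
payment = {"id": "pay_123", "status": PaymentStatus.CANCELED.value}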
Express Changes with Verbs
Stripe also tends to use clear verbs if a state change will occur when you hit that endpoint, for example: /payment/:id/capture.
Timestamps
All properties related to timestamps are suffixed with *_at. This allows you to distinguish the type, which is harder had you gone with "created", as that could be confused for a boolean.
Currencies
These should be represented in the lowest denomination; for example, the pound is 100 pence, so we should pass £10 as the integer 1000. This also helps against pesky floating point arithmetic. When providing a monetary amount we should also provide an indication as to which currency it is, e.g. "GBP" or "USD".
Metrics
When you're only worried about a single service it seems like defining metric conventions is a wasted effort. Metrics are often an afterthought and almost always end up in a mess.
Defining a naming convention for your metrics, tags, and services is crucial to have a clean, readable, and maintainable telemetry data.
DataDog
DataDog has some good insight into how to start providing guidelines for metric names.
Avoid abbreviations
For the same reason mentioned above, these might have multiple meanings and in the metric world you don't want to confuse things like Status Code and Service Charge.
Namespaces
Namespaces are one honking great idea -- let's do more of those!
python -c 'import this'
They recommend having the metrics prefixed with the service or application that's generating those metrics.
Unified Tagging
Using standard tags like env, service and version can help your new service get off the ground quicker, with dashboards that are already written to aggregate your metrics around those tags.
Avoid Overly Specific Metrics
If you have a metric for the number of requests you can start tagging your metrics with "method", which can be POST, GET et al.; you shouldn't need to create a new metric called post_requests to specifically capture POST requests.
Cardinality
The other thing to keep in mind when creating metrics, which is useful for a guideline doc, is reminding ourselves that each unique key-value pair represents a new time series, so we should ensure we don't have tags with high cardinality. Avoid using tags like user_id as you'll be storing n time series; having a bounded set is better, for example request methods have at most ~5 values.
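Pulling those guidelines together, a hypothetical sketch using the datadog Python client (the metric name, tag values and service name are invented):

from datadog import statsd

# Namespaced metric, unified tags, and bounded tag values (no user_id).
statsd.increment(
    "checkout_service.http.requests",
    tags=["env:prod", "service:checkout-service", "version:1.4.2", "method:POST"],
)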
Databases
Software engineers are often required to give names to tables and columns in databases. You can generally take a look at the other tables and columns to get an idea of the existing naming conventions, but I'd lean more cautiously here, as all you need is a single column out of line and the conventions go to shit.
Naming Tables
Should they be plural or singular? Well, according to Stack Overflow, the second answer with 388 votes says "Plural table names" and the first answer with 388 votes says "Singular names for tables". So it beats me... Just pick one.3
Naming Columns
Some people want to have the table name in the primary key, like user.user_id; these people shouldn't write software. Having all tables use the convention of id as the primary key makes the most sense to me; having this in common across all tables keeps things simple.
Timestamps
Like with an API, the tables are also going to contain timestamps, and having a mix of column conventions to identify these can be a pain. Typically I've gone with the suffix of *_at, e.g. created_at, completed_at. I've also seen created_date or date_completed. Similarly with foreign keys: these should all be tablename_id, where tablename is the other table you're referencing.
Another convention that's typically followed with databases is ensuring all tables have a created_at and updated_at column from the start. Some people even push for having deleted_at as one of those default columns.
Indexes
I've noticed an unspoken convention of naming indexes like so: user_account_id_idx, which would be an index on the account_id column in the user table. You can also suffix it with the type of index: _include, _gin, _brin.
Guidelines
Some of these can seem pretty straightforward if you've been writing code for a while, but you have to remember that not everyone will have been exposed to these rules. There also exist engineers who believe they're better than any convention or guideline.
At the outset, code reviews are the place to help form these conventions; they're certainly not the place for architectural design changes, the time for that has long passed.
Setting out some initial guidelines can help you scale and onboard. Having to choose between request.statusCode and request.status_code because you're naming things differently across services will only get worse as the company gets older (it doesn't even need to scale, we're battling with time).
Some of these guidelines might follow just simple stylistic reasoning, but some of them allow you to get dashboards with very little effort or they help you avoid potential hurdles as the project grows.
Guidelines aren't here to hold developers back from self-expression; they're in place to help us work together and grow organisational learnings, as well as provide a baseline for engineering standards.
How do you ensure your organisation is learning if you're not adding to or moulding your guidelines? Does every new project start from zero?
Mon 19 May 2025
Scaling teams with CAP theorem
There's no hack to how you hold meetings, it all hinges on your organisation structure.
If you've spent enough time writing software you will have run into the smug lead developer who looks back on projects with the benefit of hindsight and explains Conway's Law.
Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations.
Mr Conway 1967
This motivated me to look at the structure of an organisation as a system with its own trade-offs and limitations. If we think about design and compartmentalisation in software, shouldn't we also be applying this to how we structure the teams within an organisation?
Time to bring out the neoclassical economist in me and start using maths as a metaphor in order to explain how learning CAP theorem will allow you to scale software teams. Most of the motivation has come from my observations while working at scaling startups, where I've seen teams go from being one person to a department and, in some cases, a team staying as one person while the organisation grows around them.
The Theorem
Generally CAP theorem is brought up in interviews when discussing trade-offs within a distributed system. The idea simplifies a system into three attributes, of which you're constrained to pick only two. After making your selection there's a follow-up discussion of the pros and cons.
We generalise that databases operate somewhere on these lines and understanding these trade-offs can help you decide the best solution to fit the system you're designing.
Partition tolerance
The first attribute is partition tolerance, which is typically a given, since you're trying to scale a system beyond a single computer or server or database and segmentation across multiple machines is needed. This is one of the choices that is made for you. Now it's up to you to decide between Consistency and Availability.
Consistency
Consistency boils down to all systems "agreeing" or "seeing the same data", even in the presence of concurrent updates. Among databases this involves using distributed transactions or a consensus algorithm to ensure a level of consistency.
Without consistency the system will not be able to agree on the appropriate order of each update. If you're updating the profile image of a user, other users seeing an older profile image temporarily doesn't matter, but if you're updating a bank balance you'd best be sure the system agrees on the order of each transaction, otherwise you might have parts of your system computing different balances.
Availability
If a client is waiting to see the bank balance because it's in the process of being updated this wait time is a detriment to availability. In the example of fetching an old profile image this makes very little difference to the service you're providing so you can forgo consistency in favour of availability.
Essentially availability gives the client access to some version of the data at all times, without wait. Ensuring every request to the system results in some positive response is a prioritisation of availability.
Modern CAP Theorem
In more contemporary software engineering, and in practical terms, there are more things we can talk about related to CAP. One can dig further into each attribute and get a slightly more technical discussion around eventual consistency. There are also some who argue against a discussion of CAP, since databases have come far enough that they can deal with both availability and consistency in a manner that's good enough for most systems.
I'm not here to get into these weeds, I'd like to offer a different application of CAP and apply it to teams.
Organisational CAP
Applying systems thinking to teams isn't new; there's an entire book called Team Topologies1 that defines team structural archetypes and how you can use these to structure an organisation for optimal output.
The theory I go into below is more about how a team should consider scaling as workload increases and the team is required, in a sense, to become distributed instead of relying on a single person to handle operations.
As with CAP, I use the same three attributes but provide new definitions for them, since they're being applied in the context of a team. Remember: we will need to pick only two out of the three.
Partition Tolerance
As in software this is a given. If we are scaling an organisation we can't rely on a single person or a single team to become a bottleneck to our production. There's a chance this person will become overloaded with work and will no longer be able to operate at max capacity. Much like a database under significant load.
We can also consider this as the number of teams you can support and still produce output.
Availability
This is much like a system being able to take requests and respond without waiting on prior work being completed, which is something you'd very much enjoy if this were work being given to a team. Alternatively, you can consider it as the number of things that can be worked on at one time; if you've got spare capacity then you have someone waiting to pick up new work as and when it comes in. This team would be considered highly available.
In short this is when they can work (or how much work they can achieve).
Consistency
Consistency in a system is consensus between machines. Consistency in an organisation is an agreement on how things should be done, or why something should be done. In a one person team, one person makes this decision, in a small business it doesn't take much for everyone to get up to speed and chip in on how something should proceed. However things start to get tricky at scale. The more people/parts and teams you introduce into the organisation the harder it is to find agreement on direction or decisions.
This is why we have meetings, and when we scale it is sometimes important to make sure that there's consensus at large, across multiple teams instead of just individuals.
Everyone needs to have the same context, the same why. Unfortunately as you scale an organisation you also need to figure out how to propagate context. You can throw money at the problem by hiring a specialist for each team, however not all companies can afford to do this and so a specialist's time needs to be divided between teams in order for them to provide their insight.
We can consider this specialist as a much larger machine: the best SSD drives on the market and maxed-out memory. In reality this could be someone with a ton of experience, who knows what needs to get done and how to do it. We don't have this luxury in cutting-edge tech with a lot of unknowns, and usually you won't find someone who can cover many topics deeply, which is why there's value in a diverse skill set within a team. In most cases we rule out the specialist per team.
The application of CAP
As with a software system we are limited to picking between Consistency and Availability, since we want to scale the organisation by bringing in more teams so we can ship more product out the door.
Choosing between availability and consistency within a team is the same as choosing between workload and context. We can increase the context of the team by improving communication and introducing more meetings but this comes at a cost of availability which will reduce the amount of work they're able to output.
The opposite is also true: you can increase the amount of workload they can get through, but you sacrifice context. This means you're getting through a lot of work, but the work lacks context, so we'd find ourselves doing more repeated work across teams and work that is misaligned or doesn't meet the requirement, because the teams haven't a clue why they're doing it.
I've seen both extremes: work grinding to a halt as you spend more time in meetings than you have for work, and busy work being done for no purpose at all but to look busy.
I understand there's a sentiment at large about not liking meetings; however, meetings should serve a purpose. In order to deliver the best work possible you need context of the bigger picture, context of where the solution fits in, context of who the end user is, context of what everyone else is doing and, lastly, consensus on how work should be done.
Scaling an Organisation
Typically as a company scales you begin to notice that existing solutions or people become bottlenecks. If a single team owns or executes a solution they can become inundated with requests from other teams. This typically happens when the context for executing a task is isolated to that one team and they've got no capacity to share it. Either that, or the capability of solving the problem exists only within that team. What can happen, and what I've seen, is other teams get fed up with waiting for their request to be fulfilled and decide to solve the problem on their own, leading to a second system which does the same work.
Team Topologies defines four fundamental team types; however, I think it can be simplified to just two. They mention Stream-Aligned teams, which are responsible for delivery and are generally high business context teams, and Platform teams, which act as an enabling service for the stream-aligned teams.
I believe you can have more than one platform team. These should be teams responsible for enabling how work gets done. To some extent all engineers that build internal tooling are actually defining how work gets done; they do their best job when they enable other teams to get things done faster, independently and without the need to grab context from this team or its engineers.
The best way I believe we can solve the Availability vs Consistency balance within teams is by shifting the purpose of the team from one that just does the work, to being responsible for defining how that work should be done.
With proper instructions or with a self service system you enable other teams that are closer to the problem and thus have the most context to address the business requirements.
If a single team is a bottleneck to other teams, this might be an indication that they need to shift from doing the work to enabling the other teams to get that work done.
I believe this thinking requires having teams with clear purpose and clear context domains; when the domain starts getting blurred it's trickier to scale teams, as no one person can hold the entire context of a large organisation. You need to define those context boundaries and define what the purpose of each team should be.
The Specialist
Expertise scales better
Software Engineering at Google
Instead of requiring a specialist per team we can have a team of specialists that focus on how our stream-aligned teams serve themselves. This avoids having business-focused product teams communicating their needs and their context to a team that is focused on serving an internal problem. If these specialists had to listen to all product teams they would quickly burn out from all the meetings they're attending. This is why we'd need to draw the context boundaries around the specialists and have them focus on a self-service system that's highly available to enable the product teams.2
Further Reading
- Which I've read. ↩
- I think businesses get team balance wrong all the time; most of the time this is caused by the assumption that the organic formation of the company will be most efficient, but it takes a level of bravery to call out a larger structural issue in an org. It also takes some buy-in from the rest of the company. ↩
Mon 12 May 2025
Software Localisation
American websites format the date as MM/DD/YYYY and this can be confusing for Europeans. If I see the date 05/03/2025, I can't be sure if we're dealing with March or May.
Localisation extends further than just the format of dates. There are many things that require localisation; the most obvious is language. If your site does not support the dominant language within a geography you're creating a language barrier between you and your customers. In Typeform's case, their customer's customer.
Translating and localising your software opens your business to new markets where relying on English won't cut it. Providing your system in a locale that's familiar to the user allows your system to feel natural and trustworthy. Luckily for us, the internet has been around for more than 40 years and this is sort of an old problem. There's been a good effort put towards enabling multilingual support.
ID or string?
When translating software, the first thing you'll need to determine is how to identify text that requires translation.
There are two ways you can do this:
- Mapping a key to the text. This key will be used to look up the correct message given the user's preferred language. Something like this: message_key: "MISSING_NAME_TEXT"
- Alternatively, provide the text as is and use that as the message key: message: "You're missing your first name"
Systems have been written using both styles so there's no consensus on which one you should pick (I wish you luck in driving consensus in your own place of work). Here are some things to consider.
Subtle punctuation can change the whole meaning of a sentence. This is why systems tend to favour the entire sentence as the key for translation. Updating the sentence, even if you're just adding punctuation, should invalidate the translation or at least flag the translation so that it can be double checked.
It's also useful to keep the full text within the context of where it's being used. This way the developer or engineer can determine for themselves if it makes sense. It is harder to determine if you're using the correct message if you're relying on message keys like "missing.name_text" and "missing.text_name"; the full text provides a clearer indication of the output.
Scaling the message keys can also be tricky, as you'll need to avoid name clashes. The best thing to do is use them with some sort of namespacing, e.g. "signup.error.missing_name", and redefine the key for every use-case; even if the full text ends up being the same, this allows you to change each text independently.
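To make the two styles concrete, here's a hypothetical in-memory catalogue lookup for each; the keys, messages and translations are invented for illustration:

# Style 1: namespaced message keys
catalogue_by_key = {
    "no_NO": {"signup.error.missing_name": "Fornavn mangler"},
}

# Style 2: the source text itself is the key
catalogue_by_text = {
    "no_NO": {"You're missing your first name": "Fornavn mangler"},
}

def translate(catalogue, locale, key):
    # Fall back to the key itself when no translation exists
    return catalogue.get(locale, {}).get(key, key)

print(translate(catalogue_by_key, "no_NO", "signup.error.missing_name"))
print(translate(catalogue_by_text, "en_GB", "You're missing your first name"))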
Localisation built in
For those of us gifted enough to be using a Unix-based system, you might have access to gettext and xgettext on the command line. These are tools used to translate "natural language messages into the user's language, by looking up the translation in a message catalog".1
Python has some built-in libraries which allow you to manage internationalisation and localisation. Unless you've dealt with localisation, I think very few people are aware of the existence of gettext.
Localization for Python Applications
The Python gettext library provides an interface which allows you to define your program in a core language and use a separate message catalog to look up message translations. As an example, we can define a message that requires localisation like so:
from gettext import gettext as _
_("Welcome!")
Using xgettext we can construct a .pot file, which will be used as a template for our language catalogues.
xgettext -o messages.pot --language=Python src/*.py
The .pot file should look like this after running xgettext:
#: main.py:3
msgid "Welcome!"
msgstr ""
It's pretty neat that it has provided us the file name and the line number for the text; although more useful in larger codebases, we can use this to track redundant translation strings. You'll also notice that it's using the full string as the msgid instead of assigning it to a code or number.
From this we create .po files (unrelated to the Teletubby)2. These are the concrete versions of the .pot file which contain the translations. If we were to make a .po file for Norwegian it would look like:
#: main.py:4
msgid "Welcome!"
msgstr "Velkomst"
Now that we have a localised form of our language catalogue, we can use msgfmt to compile a binary version of our .po file, like so:
msgfmt -o messages.mo no_NO/messages.po
This command takes our no_NO (Norwegian) messages, compiles a precomputed hash table of msgid -> msgstr and outputs it to the .mo file. These files are stored as binary so they're not human readable, but they're efficient to load into the application at start up.
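Loading the compiled catalogue back into a program looks roughly like this, assuming the standard locale/no_NO/LC_MESSAGES/messages.mo directory layout that gettext expects:

import gettext

no = gettext.translation("messages", localedir="locale", languages=["no_NO"])
no.install()  # binds _() to this catalogue as a builtin

print(_("Welcome!"))  # -> "Velkomst" once the Norwegian catalogue is loaded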
When you start accumulating a lot of these catalogues they might require their own system to manage. These systems are also useful interfaces for the person providing the translation; as an example you can see PoEditor or Lokalise.
People that work in localisation and translation will be familiar with .po files, since they're often the file format used with translation software.
Translation within context
If our app supports more than one localisation we have to indicate which localisation should be returned to a user. For an API we can set the user's locale within the context of a request.
Flask offers a library called Flask-Babel which allows you to set this locale.
So if a Norwegian user were to hit our API, we'd have the client set a header on the request, Content-Language: no_NO3; on returning the response, all the strings instantiated with gettext will be translated into Norwegian.
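A rough sketch of wiring that up; the selector registration differs between Flask-Babel versions, and this follows the older decorator style, so treat it as an assumption rather than a recipe:

from flask import Flask, request
from flask_babel import Babel

app = Flask(__name__)
babel = Babel(app)

@babel.localeselector
def get_locale():
    # Fall back to English when the client sends no Content-Language header
    return request.headers.get("Content-Language", "en")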
There are some cases where you'll need to switch the locale context mid request or mid process, for example, a Norwegian user triggers an alert to an English user. We can instantiate a context manager with Flask-Babel, which will translate the strings to a specified locale:
from flask_babel import force_locale

def handler():
    # <norwegian scope>
    with force_locale(to_user.locale):
        # <english scope>
        send_email(to_user)
    # <norwegian scope>
Plural Forms
Language is weird and there's nearly an edge-case for everything. One of these cases that gettext supports is defining rules for plural forms. For example, in English we might say "one apple" and "two apples"; however, in a language like Hebrew the plural form for two apples can't be used for three apples, so to account for this gettext provides ngettext, which is used like so:
from gettext import ngettext as n_
n_("%(num)d apple", "%(num)d apples", 3) % {"num": 3}
This allows gettext to pull the correct plural form given the int 3 and then format the returned string, replacing %(num)d with 3.
Lazy strings
If you're reusing the same string across your application and defining it at module level, this string will be translated as soon as the module is instantiated. The module will always fall back to your app's default locale and your strings will not be translated. To get around this we use something called lazy_gettext. This allows us to define the string and reuse it across the application, as lazy_gettext will keep a reference to the msgid and defer translation until the text is needed.
You can see support for lazy_gettext in the Django documentation.
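With Flask-Babel the equivalent sketch looks like this (the constant and function are my own naming):

from flask_babel import lazy_gettext as _l

# Defined at import time, but translation is deferred until the string is
# actually rendered, so the per-request locale is respected.
MISSING_NAME_ERROR = _l("You're missing your first name")

def validation_error():
    return {"error": str(MISSING_NAME_ERROR)}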
Wikipedia
Wikipedia manages content in over 300 languages. There are numerous volunteers who help to translate the wiki into other languages, and they do this through an interface called translatewiki.net. There's an entire team managing the infra and tools used in localisation.
Similar to Wikipedia, I've seen interfaces used to manage and update translation files, as well as a process that can be triggered automatically or on a schedule to update the .mo files that a service references. After updating the .mo file you can automatically roll out a deployment. The new deployment should then load the new .mo files into memory when the service is instantiated.
You don't need to be working on a system the scale of Wikipedia to include translations. You can rely on a user's system locale to translate CLI tools. My fiancée's system is set to Norwegian; if I ever write a CLI for her I think it would be fun to provide a Norwegian interface.
- $ man gettext ↩
- I'd like to draw attention to the fact that I stole this joke from myself. I don't want to draw attention to the poorly performed lightning talk I did. It was my first time and I tried to fit this entire post into 5 minutes. Link for posterity ↩
Mon 05 May 2025
A Piecemeal Approach
The "Technical Debt" series:
-
1: (here) A Piecemeal Approach
The piecemeal engineer knows, like Socrates, how little he knows
Karl Popper's reflections on totalitarianism have had one of the largest impacts on my approach to software engineering.
Utopias exist in business, engineering and societal contexts. There are always fervent believers that have a do-or-die attitude to process. This often gets in the way of pragmatism.
Popper
In his reflections, KP makes the argument that if we are to progress as a society we should not attempt large-scale shifts of policy in pursuit of largely frivolous utopias. There exist people with high levels of confidence in their ability to understand how the world and society work. We should be wary of those that are uncompromising on their ideals.
Utopian views are often stated as goals in startups and engineering, partially due to the need to sell the dream or idea before it's realised, and because they're presented to potential investors and stakeholders. "I can solve all your problems with my solution" sounds more valuable than "I can solve half of an existing problem, maybe we can think of solving the rest later?"
Within a totalitarian regime, similar promises are made to the governed by painting a picture of a utopian society, a dream world envisioned by a leader sold at the price of handing over control and power. In these cases the marketing strategy is to stoke fear and shift blame.
I have no bother with utopias if they form part of the ideation or they're used as a perspective from which to view a problem. It's when they're used as a justification to keep heading down a failing path that I find them to be dangerous.
If you find yourself hearing leadership or a colleague making unfalsifiable claims or using "well, it doesn't apply in this situation" instead of conceding that perhaps they were wrong, you've found the charlatan. A fear of being wrong and an aversion to pivoting leads projects and businesses into failure. If you know something isn't going to work, the sooner you know and respond the better.
Sometimes, the best thing you can do is just say "I don't know".
Software Engineering at Google (pg. 40).
Ceteris paribus
The business world is run on pragmatism. If it were plagued with "too much unscientific thought"1 it would be brought down by complexity and mess. Dijkstra attempted to rein in software complexity in business by advocating for writing systems that allow an engineer to focus on one single concern at a time, lest they be overwhelmed by all the moving pieces.
Similar to Popper, this is a focus on changing one thing at a time in order to determine the effect of that action. Modern-day vampire Bryan Johnson, founder of Braintree, attempts to live forever by running hundreds of tests on himself. One of the largest criticisms of his approach is how the doctors can measure causality when he consumes ~106 pills every morning.
Startups and businesses that aim to solve everything are at risk of not being able to measure what's working and what's failing. They also risk avoiding their core business issues until it's too late and they're out of runway. Startups have limited time, so finding and tackling the areas of highest value to the business should be a priority, also known as finding product-market fit.
Many successful businesses started out tackling issues by focusing on a niche market. Targeting a small user base that struggles the most with an issue allows them to focus on a core problem and refine their product without being distracted by the myriad of different people and their individual needs. You can look at PayPal targeting people with thousands of transactions on eBay in order to refine making payments online. Revolut focused on problems that travellers faced, starting specifically with currency exchange. Nintendo got its start selling playing cards in 1889; at that point in time I can't imagine the founder envisioned an Italian plumber eating mushrooms and rescuing princesses. The key is to move one step at a time, and gaining some initial traction can get your ear to the ground.
Don't be perfect
Utopias are a constant threat to getting us into better positions. If my team is flying a burning plane and we need to land ASAP, I understand landing 10 metres from the office or your home might be ideal, but right now landing anywhere will do.
Perfect is the enemy of good. If we are constantly striving for a form of perfection, we should acknowledge that we are delaying or forgoing getting to places that are good enough. And since utopias are often unrelated to anyone's lived experience, there's no proof that this vision of perfect is indeed a great place to be. This is why we need some resemblance of validation at each step of the process.
There are a number of successful companies, and they are equally running numerous processes and styles of business. You might be able to find support for every methodology; if the self-help expert says that eating carrots makes you see in the dark, try it. But if it doesn't work, ditch it. If you're a team of one, perhaps doing daily stand-ups will look different to a team of six.
Don't let the utopian process get in the way of driving value.
Lastly
Be wary of anyone that speaks with confidence and doesn't read.
- Dijkstra in EWD-447 (1974) ↩