Tue 21 April 2026

Probabilistic Data Structures

Keeping track of a fixed set of items is fairly straightforward. However, when the number of items starts to grow larger than the capacity of a single machine, things get expensive. One way around this is to rely on approximations instead of exact numbers. There are two probabilistic data structures I'd like to cover in this post: Bloom Filters and Count-Min Sketch.

Generally these are applied when space is a constraint and you need predictable and consistent size. If you're counting or caching at scale there might be a chance that your database is relying on probability instead of certainty.

Hash Tables

A fundamental data structure in computer science is the hash table, useful in caching and counting, as well as representing objects in software.

A large hash table is often useful when you need a key-value store, such as one that maps a user ID to a profile picture, so that every request for a profile picture is speedy: hash table lookups run in O(1) time on average.

This is done by having a number of buckets and a function that consistently converts a key to the index of a bucket. This is called the hash function. A cache or a key value store only requires computing the hash to locate the data.

If we were to hash the key "fox" (hash("fox")) and the resulting output was 5 we would know our data is in bucket 5.
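As a rough sketch of the idea, the snippet below maps a key to a bucket index. The function name bucket_index and the choice of SHA-256 are assumptions made for illustration; real hash tables normally use much cheaper hash functions. Python's built-in hash() is avoided here because its output for strings changes between processes by default.

```python
import hashlib

def bucket_index(key: str, num_buckets: int) -> int:
    """Consistently map a key to a bucket index in [0, num_buckets)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer, then wrap into the bucket range.
    return int.from_bytes(digest[:8], "big") % num_buckets

index = bucket_index("fox", 8)
print(index)  # the same bucket every time "fox" is hashed
```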

[Illustration: Hash Tables]

Hash functions don't guarantee unique hashes, so occasionally a new key collides with a key already stored in the hash table. Both hash("fox") and hash("cat") might end up pointing to bucket 5. We can reduce the chance of this happening by increasing the number of buckets. With 10 keys and 1 million buckets the chance of a collision becomes extremely small.
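To put a number on "extremely small", here is a small calculation. It assumes an idealised hash function that spreads keys uniformly and independently across buckets, which real hash functions only approximate. The function name is made up for this example.

```python
def collision_probability(num_keys: int, num_buckets: int) -> float:
    """Chance that at least two of num_keys keys share a bucket (uniform hashing assumed)."""
    p_no_collision = 1.0
    for already_placed in range(num_keys):
        # The next key must avoid every bucket that is already occupied.
        p_no_collision *= (num_buckets - already_placed) / num_buckets
    return 1.0 - p_no_collision

p = collision_probability(10, 1_000_000)
print(f"{p:.6%}")  # well under a hundredth of a percent
```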

In practice hash tables often store a linked list at each bucket location. When a collision occurs, the lookup iterates through the list until it finds the key. When storing a new key, the table appends it to the end of the list if it's not there already. Redis uses this technique, and also resizes the number of buckets dynamically in order to keep these lists short.
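A minimal sketch of this chaining approach is shown below. It uses Python lists in place of linked lists and leaves out resizing, so it illustrates the idea rather than how Redis implements it. The class name ChainedHashTable is an assumption for the example.

```python
class ChainedHashTable:
    def __init__(self, num_buckets=8):
        # Each bucket holds a chain of (key, value) pairs.
        self.buckets = [[] for _ in range(num_buckets)]

    def _chain(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        chain = self._chain(key)
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:
                chain[i] = (key, value)  # key already stored: overwrite
                return
        chain.append((key, value))  # new key: append to the end of the chain

    def get(self, key, default=None):
        # Walk the chain until the key is found.
        for existing_key, value in self._chain(key):
            if existing_key == key:
                return value
        return default

# A single bucket forces "fox" and "cat" to collide.
table = ChainedHashTable(num_buckets=1)
table.put("fox", "red")
table.put("cat", "black")
print(table.get("fox"), table.get("cat"))
```

Even with every key landing in the same bucket, both values are still retrievable; the cost is that lookups degrade to a linear scan of the chain.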

We can use hash tables to determine if we've seen something before by storing keys as we see them. If the key exists in the hash table, we know that this isn't the first time we've seen the item.

Bloom Filters

When we start dealing with data streams or billions of users, storing everything in memory can be expensive. Instead we can reduce the total memory consumed by using a probabilistic data structure: the bloom filter.

Bloom filters rely on approximate set inclusion. So instead of "yes, this item is in the set" or "no, it isn't", we get one of the following outcomes:

  1. This item is not in the set.
  2. This item might be in the set.

This is done by having a fixed number of buckets and using more than one hash function. As you can see in the illustration below, the key is hashed three times and each resulting bucket is set to 1.

[Illustration: Bloom Filters]

When an item is queried we know for sure that we haven't seen it before if any of its buckets returns a 0. However, if all of its buckets are already set to 1, we only know that we might have seen it before, because other keys could have set those same buckets.
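The add and query steps described above can be sketched as follows. This is a minimal illustration, not a production filter: the class name, the default sizes and the trick of deriving several hash functions by prefixing a seed to the key before hashing with SHA-256 are all assumptions made for the example.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_buckets=1024, num_hashes=3):
        self.num_buckets = num_buckets
        self.num_hashes = num_hashes
        self.bits = [0] * num_buckets

    def _indexes(self, key):
        # Simulate several hash functions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_buckets

    def add(self, key):
        for index in self._indexes(key):
            self.bits[index] = 1

    def might_contain(self, key):
        # Any 0 means "definitely not seen"; all 1s means "might have been seen".
        return all(self.bits[index] for index in self._indexes(key))

seen = BloomFilter()
seen.add("fox")
print(seen.might_contain("fox"))  # True: an added key is never reported as absent
```

Note that the filter never stores the keys themselves, which is where the space saving comes from.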

Both the number of buckets and the number of hashes can be configured, which allows us to trade more space for a reduction in the probability that we return false positives (saying we might have seen an item when we haven't). A bloom filter never returns a false negative.
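To get a feel for this trade-off, the commonly used approximation for the false positive rate is (1 - e^(-kn/m))^k, where m is the number of buckets, k the number of hash functions and n the number of items added. It assumes the hash functions behave independently and uniformly. The function name below is made up for the example.

```python
import math

def false_positive_rate(num_buckets: int, num_hashes: int, num_items: int) -> float:
    """Approximate chance that an unseen key finds all of its buckets set to 1."""
    fraction_set = 1 - math.exp(-num_hashes * num_items / num_buckets)
    return fraction_set ** num_hashes

# Same 1,000 items and 3 hashes; only the space changes.
small = false_positive_rate(10_000, 3, 1_000)
large = false_positive_rate(100_000, 3, 1_000)
print(f"{small:.4%} vs {large:.4%}")
```

Ten times the buckets brings the rate down by several orders of magnitude in this case, which is the space-for-accuracy trade described above.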

The bloom filter is applied in situations where space is limited and keeping track of every element isn't an option. If we wish to avoid making expensive queries for data that doesn't exist, a bloom filter can help us reduce the number of those queries.

Browsers have used bloom filters in the past by shipping a preset filter of malicious URLs. When we visit a URL that is not included in the filter we can proceed. If it might be in the filter, we can query a server to help determine whether it's safe or not. We avoid this query for the majority of URLs, as most URLs are safe.

Count-Min Sketch

The last probabilistic data structure I'd like to cover is the Count-Min Sketch. Like a bloom filter it has multiple hash functions. Unlike the bloom filter, it tracks the number of times a key lands in a bucket.

When queried it hashes the key and returns the minimum from the counts stored in the corresponding buckets. This allows us to determine an upper bound estimate for the number of times we've seen a key.
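A minimal sketch of the structure is shown below. It follows the usual layout of one row of counters per hash function, which is an assumption beyond what is described above, as are the class name, the default sizes and the seeded SHA-256 hashing.

```python
import hashlib

class CountMinSketch:
    def __init__(self, width=1024, depth=3):
        self.width = width  # counters per row
        self.depth = depth  # number of hash functions, one row each
        self.rows = [[0] * width for _ in range(depth)]

    def _indexes(self, key):
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, key, count=1):
        for row, index in self._indexes(key):
            self.rows[row][index] += count

    def estimate(self, key):
        # Collisions can only inflate a counter, so the minimum is the tightest
        # upper bound on how often the key was really added.
        return min(self.rows[row][index] for row, index in self._indexes(key))

sketch = CountMinSketch()
for word in ["fox", "fox", "fox", "cat"]:
    sketch.add(word)
print(sketch.estimate("fox"))  # at least 3, never less
```

The memory used is width times depth counters regardless of how many distinct keys pass through.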

[Illustration: Count-Min Sketch]

Count-Min Sketch is useful in large scale data processing. For example, if we are interested in tracking the top-k searches we would normally do this with a heap. If we have size restrictions and need to use constant space instead of O(n) space, we can put the sketch in front of the heap.

Heap inserts are done in O(log n) time, which we can avoid if we know the item shouldn't be in the heap. Items that appear infrequently are then discarded before even making it to the heap. We do this by querying the sketch for an upper bound on the new item's count, for example 3. If the kth item in our heap has 12 appearances, we can avoid adding the new item to the heap.
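One way this could look is sketched below. It is an illustration under several assumptions: the compact CountMinSketch class, the top_k function and its parameters are invented for the example, and when a key that is already in the heap is seen again the heap is simply rebuilt, which is fine for small k but not tuned for speed.

```python
import hashlib
import heapq

class CountMinSketch:
    def __init__(self, width=1024, depth=3):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _indexes(self, key):
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{key}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, key):
        for row, index in self._indexes(key):
            self.rows[row][index] += 1

    def estimate(self, key):
        return min(self.rows[row][index] for row, index in self._indexes(key))

def top_k(stream, k, width=1024, depth=3):
    sketch = CountMinSketch(width, depth)
    heap = []     # min-heap of [estimate, key]; heap[0] is the kth item
    entries = {}  # key -> its [estimate, key] entry inside the heap
    for key in stream:
        sketch.add(key)
        estimate = sketch.estimate(key)
        if key in entries:
            entries[key][0] = estimate
            heapq.heapify(heap)  # restore heap order after the update
        elif len(heap) < k:
            entries[key] = [estimate, key]
            heapq.heappush(heap, entries[key])
        elif estimate > heap[0][0]:
            entry = [estimate, key]
            evicted = heapq.heapreplace(heap, entry)
            del entries[evicted[1]]
            entries[key] = entry
        # Otherwise the upper bound is too low and the heap is never touched.
    return sorted(((key, est) for est, key in heap), key=lambda pair: -pair[1])

searches = ["fox"] * 5 + ["cat"] * 3 + ["owl"]
print(top_k(searches, k=2))
```

The heap and the lookup dictionary never hold more than k entries, and the sketch has a fixed size, so the total memory stays constant however many distinct searches appear in the stream.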

I have found it interesting that at scale we can use probability to optimise our systems but it also requires an understanding of how the data is distributed. These data structures work well on long tail distributions but when all items are as frequent as each other these become less useful. It would be interesting to discover how systems can map to different distributions of data and how these structures are set up to solve the given problems.

S Williams-Wynn at 12:00 | Comments() |

Mon 13 April 2026

Setting up success

I have a frightening memory of joining a company and my first Pull Request required me to SSH into a compute instance and do a git pull in order to release my change into production. On that occasion I found that I was also deploying more than just my change as the branch was not entirely up to date.

There's a battle of trade-offs in software where things that sound good on paper are often placed in the backlog and forgotten about. Something about the process being good enough, or being the way things have always been done, kicks valid improvements down the priority list. It is with luck that some are granted the privilege to determine what they get to work on, and companies will give that chance to people that have agency and a strong conviction that they know better.

How it began

The other pattern I noticed at that company was a lack of dev and prod environments. Everyone with access to the project had access to everything. At this stage the company was on-boarding more engineers and data scientists and each of them was being granted full access to this monolithic environment.

It was even more exciting to find legacy projects without owners and services being deployed into production using the developer's container orchestrator of choice: Docker Compose, Mesos, self-hosted Kubernetes or GKE.

The company was growing quickly and it felt like with every new developer there was a new way to release changes into production. Every fire I was pulled into appeared to be running on its own tech stack and required learning how things operated from the ground up. Nothing seemed transferable from one project to the next.

These were the problems I was determined to solve.

Taking on the problem

Over a week I mustered together a team to tackle this complexity. The first thing was a meeting with the CTO to propose setting up two new project environments; one for dev and one for prod. This was an easy request as the CTO was horrified to learn that things were being deployed straight to production.

The next thing to set up was a CI/CD pipeline generic enough to be used across any service. This meant that if you wanted to use our new dev/prod environments your services had to be deployed through our automated pipelines. To help the other teams we wrote a service template and helm chart that would play nicely with CI/CD. This also restricted deployments to a single hosted container orchestrator and allowed us to consolidate all the different styles of deployment. As a consequence we were able to help more teams as they ran into issues, because we were becoming more familiar with Kubernetes and no longer had to understand an entirely new workflow or orchestrator each time.

The Fun Didn't Stop

We had an outage at around 2pm when the company lost connection to all the servers in the production environment. At this point we had around 5 teams deploying code on our stack. An urgent message in one of the Slack channels asking if anyone had changed something revealed that someone had assigned their service a new IP range which masked our network bridge to the rest of the company.

This is when we introduced Terraform. Any network changes or changes to the infrastructure could now be reviewed, reverted and audited since they were defined in code. If something went wrong we could investigate the changes, and our infra was committed to version control. We saw the introduction of Terraform improve the adoption of our stack, as new engineers were frightened to work in the other cowboy projects that existed on infra configured by hand through the web UI.

We started to notice that other teams were adopting our tech stack as they no longer needed to spend time on defining the release process as we had a template that they could clone and get started almost immediately. Now they could tackle business problems instead of weighing up the trade-offs of each container orchestrator.

Our helm chart also helped: we got rid of the mountains of bespoke YAML used for several k8s deployments. We could also serve engineers with features such as specifying cron jobs with very little configuration in their service spec. Most teams were data intensive and relied on scheduled batch processes, so in some cases they had only adopted these cron jobs.

We also developed a library for new services that set up logging, Sentry integration, trace IDs and database connections, and introduced a standard for running database migrations in the service template. At this point it was quite important to encourage shared ownership of these codebases. The more open we were to changes from other teams, the more likely they were to use the library and the more other teams benefited from those changes. This broke down the silos that had existed previously, as improvements were no longer isolated to a single service but could now be utilised across the company. It also meant these libraries were improving without needing someone on my team working on them full-time.

Clear Sailing

My team of 5 was serving 80 engineers and data scientists across 12 teams. Upon reflection not everything went smoothly. There were people that wanted to maintain control over their entire stack. Perhaps we were a bit stretched and couldn't focus on features they required at the time. They might have also had disagreements with choices we had made in the service template.

There were also features we developed that didn't get adopted, which makes sense to me now. These were things that generally slowed people down for very little benefit. Contract testing is an example of this. On paper, having clear contracts between services and a means to define and test these contracts sounds like a great idea. The nature of the services at the time, being cron jobs or at most two endpoints, meant that their interfaces weren't growing in complexity, and introducing this step in the build process wasn't a big enough bang for their buck.

Service to service authentication was another feature that we didn't need at that time. Our services were internal and in a VPC. Obviously if we were compromised, an attacker not being able to send requests to the servers in that network would be a great thing to have, but I think it would have been better to sink time into features that would speed up the adoption of our workflow instead of features that added friction. That's not to say we didn't need this auth layer, but we could have addressed it at a later point.

Improvements

There are improvements to the workflows that I would have enjoyed introducing. For example, using a CI/CD server that didn't rely on defining our workflows in Kotlin. I feel that Kotlin added a barrier to contributions and we didn't need this. There are other CI/CD tools like gocd.org and concourse-ci.org which have an easier way of defining workflows. Nowadays, though, we can get a lot done with GitHub workflows and reduce the reliance on having a CI server.

We attempted to introduce Istio, but at the time it remained an aspirational feature. Had we succeeded, our teams could have run canary deployments: diverting a small amount of traffic to a new version of a service so that, if anything goes wrong, the broken version is never rolled out to all customers.

I still seem to come across companies that only give leadership the ability to deploy to production, where changes go out once mid-week. When this is the practice many changes tend to accumulate, and when they're released everything goes out as a big-bang deployment. When something breaks in these scenarios it's often harder to pinpoint what went wrong. Smaller, frequent deployments grant autonomy and shorten the feedback loop, which speeds up a developer finding out what went wrong. Releases also tend to be less disruptive and engineers have higher confidence in the changes they release into the world.

The biggest takeaway from my experience in leading an infra team is that we learn by making mistakes, so we shouldn't try to reduce the number of mistakes we make. We must focus on reducing the cost of the mistake through incremental change, rollbacks and observability. Many companies and teams try to make no mistakes at all and in doing so they cost themselves growth.

S Williams-Wynn at 12:00 | Comments() |

Mon 16 March 2026

Learning from Sales

It is not only the job of sales to find customers with problems that we have solved, but also to find the problems that we can potentially solve. The role is one part customer relations and one part business development.

The challenge with this role is that features fall on a spectrum from nice-to-have to critical-for-business; being able to tell the difference is a key skill for anyone in sales. If we filter all the nice-to-have functionality out of the developer's backlog what remains is the high value and impactful functions that allow us to sign more deals.

Engineering teams often miss this business context and lack the skill to differentiate these features. Part of the problem stems from being a few steps removed from the customer and nice-to-have features can appear more exciting to work on but don't end up creating value.

Engineers that wish to be highly autonomous and are charged with seeking impactful work need to exercise their inner salesperson. Those that don't embrace these skills are at risk of wasting their time and working on the wrong thing.

Buy Signals

Those in sales and business dev look for hints that inform them of the project's potential to get to the deal signing stage. Focusing on projects with high potential allows us to avoid chasing cases that won't move forward or are likely to end without being closed.

If more than one customer is asking for a feature this can be an obvious indication that focusing on this can provide more impact. If you're working on internal software infrastructure, are you getting a feature request from more than one team, or is it from the esoteric data scientist that seems to have their own bespoke setup?

Sales rely on buy signals. A buy signal can indicate when a client is excited enough by a solution that they are willing to consider a purchase. The most obvious way they signal this is by giving you money. If they're trying to give you money before the solution is built, you'll know the solution is somewhat important to them.

We don't have to rely on the client putting down money for us to recognise a buy signal. Buy signals can appear from any skin a customer is willing to risk for a solution. If a customer is willing to vouch for the product to a superior this is a political stake. This allows us to move up the decision chain and provides a positive signal that a deal can make it to close. We also use this as a way of making progress in conversations without explicitly asking for money.

Software engineers should want to get closer to the projects that have the largest impact. We can use buy signals to determine the projects with the most buy-in from leadership and create the most business.

Discovery Questions

Developers can find themselves working on the right project but can end up developing the wrong thing without an accurate idea of the pain-point. We need to peel back the layers that surround a customer's pain to determine how we may best deliver.

Sales do this with discovery sessions or by hearing it directly from the customer. Their goal is to avoid making incorrect assumptions about the problem, which may lead to building the wrong solution. Software engineers could also benefit from this skill. An engineer that is given a spec should ensure they're delivering the appropriate fix, otherwise they risk missing the target and having to do revisions or start again from scratch.

Sales do this with open ended questions. The more space they provide a customer to explore the problem, the higher the chance we expose insight. Getting the customer to dig deeper on specific points can expose issues they might be having, and asking how a solution might affect their business can help sales qualify the lead.

A classic technique for getting your customer to talk is called Mirroring, covered in "Never Split the Difference" by Chris Voss.

Are your developers asking enough questions to expose misaligned assumptions?

Touch Points

Progress updates with customers allow you to show that you are keeping them in mind and their requests haven't been forgotten about. They also provide a heads up when new features are near release. They are used to build trust and allow the customer to feel like they are helping guide the process.

If your wins are also their wins they will feel like they have more skin in the project's success and become an internal champion across the industry and in their own business.

How often do you provide a task to a developer and only hear about it three months later when there are just two weeks until the deadline?

Regular touch points don't just serve the relationship you are building with your customers but also allow you to validate the project closer to real-time. Ensuring that you are headed in the correct direction is critical for success. If you're heading down the wrong path you'll want to know as soon as possible.

It's a shame that developers seem to be stereotyped as introverts lacking people skills that wish to be isolated from any call with the customer. The industry appears to lean into this idea by adding more barriers between the client and developer. The truth is the companies that have their developers closest to users and customers are the ones that succeed and the best way for an engineer to level up is to care more about the customer.

Determine Timelines

Software can rely on external processes and teams. Figure out what you're building and predict where you might need approvals or contracts. This will allow you to get the ball rolling in those departments. Sales already do this on their discovery calls by probing whether the customer will need to bump other departments. Bumping them now can lead to less delay.

Within a company we might require a third party tool for the new feature we're developing, so the sooner we loop in finance or the security team the better. We can have them work on their specifics in the background while we work on the feature. Before we start the project, we can flag to our PM that we might need to loop in these other teams.

We should also determine when something will be needed and understand the timeline. They could be aiming to deploy their MVP in a month which could affect how we prioritise building the solution. If the deadline is tight, perhaps we can determine if there's only a subset of features required for the MVP and avoid sinking time into the features that are expected at a later time.

Alignment

Developers can level themselves up by becoming more aligned with the business and not less. One way of doing this is by moving closer to the customer or learning how other roles operate. Software will continue to be a competitive industry and engineers can't afford to waste time on things that don't delight their users and/or provide impact.

S Williams-Wynn at 12:05 | Comments() |

Mon 09 March 2026

The Moderate Take

Economics and politics are determined by the most compelling stories and occasionally they are hit by reality. This is one of the challenges when following reactionary discourse on platforms like LinkedIn, Reddit and Twitter.

When advice comes from board members and VCs I am reminded that they scroll the same threads on Reddit that I do. Their opinion is often shaped by the highest voted comment in these forums.

Unfortunately we aren't gripped by stories that are filled with the context and the caveats that exist in the real world, and this has shifted us into more extreme political leanings. Why should we fill our content with details when this jeopardises the opportunity of going viral?

Good Work

False narratives get the clicks and impact stocks because they're entertaining and persuasive. We are willing to sit still and listen to the stories that are novel and gripping. 14 years ago Elon said we would land people on Mars in 10 years (15-20 years in the worst case).

including caveats all the time makes articles too awkward to read and buries your actual point

No one finds being moderate sexy.

The boring parts of software

The engineers that have the largest impact are the ones that read the most documentation. Getting through software specs is tough. They're not filled with fluffy prose and they're often dense and technical, but nothing delivers value like saying: "We don't need to do that work, that feature already exists in the API".

Similarly, the largest mistakes I've seen made in software come from conclusions being drawn too quickly after reading a single page of documentation. Engineers also avoid reading documentation by jumping on a bandwagon of agreement without verification.

This extends to how reactionary takes tend to mold their opinions. Life is easier when someone does the work for you. If things go belly up, hey that wasn't my misinterpretation.

Where's it shifting?

This doesn't just apply to engineers. Steve Eisman had this to say about the finance industry in 2008, but it applies to everyone.

I think one of the hardest things for all human beings, me too, to deal with our paradigm shifts. You know, you exist in a paradigm. It's been around from a very, very long time. Your whole career is based on that paradigm. you've made a lot of money in that paradigm and then it turns out that the paradigm is either changing because of technology or maybe the paradigm was actually wrong because it was based on continuously increasing leverage which is what the financial services industry's paradigm was based on human beings have tremendously difficult time dealing with paradigm shifts. Tremendous.

It's like a nightmare. They don't want to deal with it.

r/ExperiencedDevs is an echo chamber denigrating AI slop and crafting existential dread in the industry. They assert that AI can never be as detailed or accurate as they are when it comes to writing code, yet at the same time they're shaking in their boots about the future of their industry. When there's commentary on the benefits of AI, these comments tend to be downvoted or deleted and the blame is put on automated bots making up pro-AI agitprop.

Software engineers tend to be defensive when it comes to generative code and state that most of their job wasn't code to begin with.1 These engineers aren't out of the "AI makes mistakes" phase or they're moving the goalposts to take the target off their back.

The latest cohort of uni students are winning hackathons without knowing how their applications work. r/ExperiencedDevs needs to come to terms with the unknown. For as long as this industry has been around we've never actually known how projects work in their entirety. We've never read every line of code in our dependencies. Not knowing how something works and still being able to make contributions is not a new phenomenon. Those that have succeeded in this industry can wade through the unknown and that will continue to be the case.

It's normal to have strong feelings about the grads picking something up in a short amount of time that may have taken you longer to hone. It's the classic "back in my day" reaction.

Extremes

Unfortunately avoiding nuance leads us to extremes.

The truth is that software engineering looks different and requires different skills at different stages of the business and stack. These complaints might just be a misalignment with the business.

In his book "A Philosophy of Software Design", Ousterhout describes the difference between strategic coding and tactical coding. Tactical coding relies on hacks to get the job done, while strategic coding takes a more long term view of the code base. He argues that software should be written strategically, but I believe we need to pick our battles.

There are some entities and functions that are core to engineering companies, but they become core at different times and we should make an attempt to recognise when this change occurs.

It is nice to pretend that your code is an integral part of the business's existence. The truth is a project is always initially an experiment and over time it is rewritten to a more accurate specification as we learn and the business learns. Importance is discovered. Your first attempt will be done when you know the least about the topic.

Eventually it might become a core service to many teams within the company and at that point it's worth getting serious about engineering practices. Front loading your engineering standards is making your experimentation more expensive.

If you're treating every project like a personal flower garden you'll struggle to recognise when code is dead weight. Thank it for its cycles, praise it for the outage it caused, the war story and what you have learnt. Then delete it.

Software is about discovery. Code generation enables us to prototype and discover how things work. Prototype for your own education not just for the customer. The pace of learning has increased and we can discover and experiment quicker than ever.


  1. Like I did in "Software is planning" 

S Williams-Wynn at 12:12 | Comments() |

Mon 02 March 2026

Get more out of 1:1s

There isn't a course or career training when it comes to the one to ones with a line-manager. Every manager conducts them in their own way, and what happens in them can sometimes feel secretive. So here's a look under the curtain on how to get more from these meetings.

When you realise that your manager is human these meetings become easier. Once you start giving more accurate descriptions of where you'd like to grow and what you want to work on, and sharing what you've achieved, the meeting becomes more productive.

Unfortunately these meetings rely on how well you know yourself, since managers don't possess an innate ability to know what makes you happy or what your goals and aspirations are. Opening up can feel like a sign of weakness or make you feel more vulnerable, but the more you share the easier you make it for them to help you achieve your goals.

So what does a good one to one look like?1

You set the agenda

We should approach setting the agenda pragmatically. If you are short on time, come up with a few questions that will lead to a productive outcome and that you can ask in every catch up. The one question I typically ask is: How does the rest of the team feel?

This allows me to determine how confident others might be about a project. It helps me become aware of colleagues that are feeling stretched and could use my help or perhaps they wish me to help out in different ways. I've found this to be an effective way to find the gaps that need filling.

The areas you should be creating questions around are your general feeling about work, work-life balance, your growth, your interactions with others on your team and in the company, and the progress you've made on tasks and goals. You don't need to fit this all into one catch up, but theming your next catch up on one of these topics can help you prepare and get you started.

Achievements should be brought up. This is emphasised in a remote company, where your achievements can be missed if you're telling no one about them. A good manager will pass your achievements around the office, because typically their performance is tied to your own and you're trying to achieve as a team.

Sharing achievements builds trust. Once those big ticket items land on the team's plate and they need someone to lead them, the rapport you've built up with your manager might just land you these challenges.

Coach your manager

If you lay out sensitive scenarios for your manager this can help you broach a topic or help you handle those situations if you are ever in that position. It can also establish expectations and intent. As an example, you could ask how they would let you know if you were underperforming or what an early sign might be.

Your manager is a sounding board. If you have someone in your team that isn't meeting expectations, asking your manager for help is ok, but be sure to prepare how you'll handle the situation so it doesn't look like you're trying to avoid responsibility.

Align yourself with your manager's priorities. This can help you be more aligned with the business objectives and become more valuable to your manager. If you help them they'll probably want you to stick around and if their scope increases so might yours.

Give your managers work to do. If you find yourself working on something that is not challenging or can be tedious try giving it to your manager. If you pitch them something more important that you could be delivering they'll take the tedious blockers away from you.2

Sharpen your ideas

Only call the vote once you know you're going to win

There's an overhead to collaboration, and if you had to listen to every idea in the business you wouldn't have time to do any work. Use your manager to sharpen your ideas and convince them before convincing anyone else. If you slowly get people onto your side through catch ups, then when you present the idea to the business you'll already know how to answer the probing questions.

Use the time to develop your relationship and your ideas. Your manager has insight into who is working on what and they can direct you to the people that are excited to talk to you about your ideas and these people might know the challenges you are heading into.

Know Yourself

The classic "where do you want to be in 5 years" question isn't asked to determine if you will have a future at the company. The question is used to determine how well you know yourself. If you are managing someone that is unsure about what they enjoy or what makes them happy, how are you going to put them on the work they are most passionate about?

It takes work to understand yourself, so mentioning this to your manager can allow them to throw all sorts of tasks at you to see what you're best at. They can help you identify your weaknesses and strengths. However, they won't do this unless you're comfortable with the challenge, and the way to signal that you're comfortable is to ask.

Your manager will also provide you with work you enjoy and having a history of catch ups in which you've said "I really enjoyed working on x" can increase the chance you will work on those things.

Use your manager to polish your strengths and strengthen your weaknesses. Much of this comes with knowing yourself.

Discuss your weaknesses

The best way to deal with a weakness is by opening up to your manager. Then you can work to find situations that will allow you to improve and make mistakes. Don't expect them to provide you with a magical solution to your weakness. Some weaknesses take time to develop into a strength and skills take practice.

They can provide you with some structure and ideas for actions, but they can also provide you with work that allows you to stretch yourself. The best way to learn is to make note of mistakes, so getting the opportunity to make more of them is worthwhile.

Not just your manager

When you're in a company you have access to people with all sorts of skills. People that know stuff that you don't, and you can utilise this for your own growth and understanding.

Try to catch up with someone in sales and ask questions like, "What about sales would you like more software engineers to know?"

Or set one up with someone in product and ask how they ensure we are working on the best thing.

Finally

To summarise, a lot of these meetings depend on what you want to get out of them, and this article touches on some of the topics you might wish to dive into during these catch ups, although they might be biased towards what I try to get out of them.

Perhaps managers could ask you to set up questions for the next meeting but they probably wish to avoid putting any unnecessary pressure on you as there can be personal reasons why someone doesn't wish to gun for more responsibility at this time.

The key is having more empathy with your manager. They probably have their own goals, so use them as an example.


  1. I don't know, ask your manager. 

  2. The chances are they'll give it to someone else to do. 

S Williams-Wynn at 12:10 | Comments() |