Mon 09 June 2025

Reframe Binary Search

Binary search is quite a simple algorithm to explain and thus it is often one of the first algorithms computer scientists learn. It's also occasionally brought up in software engineering interviews as it can be applied when solving optimisation problems.

The traditional explanation of binary search is flawed and acts as a barrier to its application. Binary search is not an algorithm that believes a set can be divided into three groups, e.g. these numbers are too cold, these numbers are too hot and this number is just right.

This Goldilocks framing of binary search forces us to look for situations where the problem contains an array of porridge ordered by temperature and we are in search of the one that falls into our preferred consumption temperature.

Binary search is an algorithm that is optimal at finding state transitions, and framing it as such opens up the application of binary search. We no longer approach problems like Goldilocks; instead, finding the best porridge becomes finding the point at which the ordered bowls of porridge transition from being tepid to scalding.1

The Basics

Commonly, binary search is introduced with the example of finding a word in a dictionary. Say, as an example, we are tasked with finding the word "Hippo":

The brute force approach is to start at A and flick through the pages until you get to the page that contains "Hippo". This approach works; however, the time complexity of this algorithm is O(n), where n is the number of pages in the dictionary. So in the worst case, "Zebra" or something, you're looking at all n pages.

Pretend H is Hippo:

BruteForceSearch

n operations is not inherently bad. If you have a list of 10 words, n operations isn't going to harm you. It starts to get a bit much when you have a billion words, and it's not practical when you have many words to find among those billion.

Here's where we can apply binary search. If we want to find the word "Hippo", we start by jumping to a random page, or we open the book in the middle.2 If the dictionary is 260 pages the middle will land us on page 130, which will be M. We know that H is before M, so we can throw away half the dictionary, M->Z; from a single operation we've already chopped the worst case down to n/2. We don't stop here: now we go to page 65 (half of 130), which lands us somewhere between F and G. We know H is after G, so now we can pick something in between G -> M, which is J. We continue doing this, narrowing down the pages, until we land on the page that contains Hippo.

BinarySearch

In the worst-case scenario, when we are looking in a dictionary of 260 pages, we do 9 operations. If we relied on the previous algorithm we would do 260 operations in the worst-case.
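
Here's a minimal sketch of that search in Python, assuming the dictionary is modelled as a sorted list of words rather than physical pages:

def binary_search(words, target):
    # Return the index of target in a sorted list, or -1 if it's absent.
    low, high = 0, len(words) - 1
    while low <= high:
        mid = (low + high) // 2
        if words[mid] == target:
            return mid
        elif words[mid] < target:
            low = mid + 1    # target is after the middle word
        else:
            high = mid - 1   # target is before the middle word
    return -1

words = ["ant", "bee", "cat", "dog", "hippo", "newt", "zebra"]
print(binary_search(words, "hippo"))  # 4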

The Weaknesses

This explanation does a good job of helping us understand the algorithm, but it overcomplicates binary search by breaking the operations binary search performs into three parts.

  1. Is the current word the word we are looking for?
  2. Is the word we are looking for ahead of the current word?
  3. Is the word we are looking for behind the current word?

These questions may help inform us as to how the algorithm behaves, but we can simplify them to a single binary question:

When does the word we are looking for no longer appear after our current word?

Instead of imagining the sequence as the words that might appear on each page, we can picture the sequence as the outcome of the statement: The word I'm looking for is after the current word.

StateTransition

As a result, this frames binary search as a solution to finding a state transition. It is at the point when our word no longer appears after the current word that we find the word we are looking for.
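
Here's a rough sketch of binary search written around that statement, assuming the statement is True for a prefix of the range and False from the transition point onwards (the names are mine, not from any library):

def find_transition(statement, low, high):
    # Return the first index in [low, high] where the statement becomes False.
    while low < high:
        mid = (low + high) // 2
        if statement(mid):
            low = mid + 1   # still in the True state, look later
        else:
            high = mid      # the transition is at mid or earlier
    return low

words = ["ant", "bee", "cat", "dog", "hippo", "newt", "zebra"]
# "The word I'm looking for is after the current word"
print(find_transition(lambda i: words[i] < "hippo", 0, len(words) - 1))  # 4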

Git Bisect

Binary search is also used to find bugs. If you're familiar with git then you might have heard of git bisect. For those unfamiliar with git, imagine a timeline with each point in that timeline being a change to a file: every time the file is saved, a new node is added to the timeline.

Now instead of a single file, git works across a whole software project, and each save is noted down on the timeline. A change is what is being added and what is being removed, so the accumulated sum of all these changes is the current state of the project.

On large teams and complex projects bugs can occur after several changes have been made to the code. If we know that the code is broken in its current state but aren't sure what change caused the bug we can use git bisect to figure out which change is the culprit.

When we run git bisect we provide it with a point in time at which we are certain no bug existed. git bisect will then present us with a change between that time and now.

  1. No bug
  2. ?
  3. ?
  4. broken?
  5. ?
  6. ?
  7. Bug

We then have to answer: is the software at point 4 broken? If yes, the bug was introduced between 1 and 4. This rules out all points after 4, cutting the points we have to analyse in half. git bisect continues, running a binary search over the changes in the project until we find the change on the timeline that introduced the bug.

Essentially we are using binary search to evaluate the statement "The bug exists", because we are interested in when the state transitions from "The bug doesn't exist" to "The bug exists".
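
As a toy illustration (not how git implements it), the find_transition sketch from earlier works directly over a list of commits, given some hypothetical is_broken check:

commits = ["a1f", "b27", "c93", "d04", "e7b", "f55", "0ab"]   # oldest first, ids made up
broken = {"d04", "e7b", "f55", "0ab"}                         # pretend the bug arrived at d04

def is_broken(commit):
    return commit in broken   # in reality: check out the commit and run the tests

# "The bug doesn't exist yet" is True for a prefix of the timeline, then False
culprit = find_transition(lambda i: not is_broken(commits[i]), 0, len(commits) - 1)
print(commits[culprit])   # d04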

The Boolean Function

We can now separate binary search into two distinct parts: the algorithm part, which moves a pointer either up or down the timeline, and the function that computes a boolean response. With this we can ask more interesting questions for which binary search provides an answer, for example "When did spend increase?" or "When did the user go into debt?".

We can even provide two arrays and find the answer to "When did trend A overtake trend B?" without computing the delta for all observations on the timeline.

DeltaSearch
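
A sketch of that, assuming trend A overtakes trend B exactly once; the statement only looks at the observations binary search asks about, so we never compute the delta for the whole timeline:

trend_a = [1, 2, 4, 7, 11, 16, 22]
trend_b = [5, 6, 7, 8, 9, 10, 11]

# "Trend A has not yet overtaken trend B"
crossover = find_transition(lambda i: trend_a[i] <= trend_b[i], 0, len(trend_a) - 1)
print(crossover)   # 4, the first observation where A is ahead of B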

The main underlying assumption taught with the traditional explanation is that the set you're operating on should already be sorted for the search to work. The example above shows that this isn't strictly true: binary search only requires the underlying data to be bisect-able.

Binary search is an effective tool for these state transition problems. It is interesting that the algorithm on its own is easy to understand, yet both its depth and its range of application seem to grow when you separate the algorithm's iterator from the statement.

I'd be interested in finding out if the application of other algorithms grows when they're decomposed in a similar way.

Further reading


  1. I realise that using an analogy before explaining binary search might have lost some people. 

  2. We assume that words are evenly distributed in a dictionary to keep things simple. 

S Williams-Wynn at 12:15

Mon 02 June 2025

Understanding Time and Space

On May 24th 2000 the Clay Mathematics Institute announced a $1,000,000 prize for solving any one of 7 maths problems. Since then only 1 has been solved, and the mathematician who solved it rejected the money.1

One of those problems is the P vs NP question which relates to how we estimate time and space complexity for algorithms in computer science and software engineering.

Complexity

Complexity is an estimate of the growth in either time or space given an increase in the size of the input. Some algorithms have a linear relationship between time and input size; we signify this complexity as O(n).

Constant complexity is signified as O(1). This means that as we increase n (the size of the input) the time and space stay the same: the algorithm doesn't take any longer and doesn't require more RAM.

An example of an algorithm that takes constant O(1) space is finding the largest number in an unordered list of n numbers. You'll notice below that, as we move through the list, there's only one element we need to keep track of: the largest item we've seen up to that point. When we see a larger number we replace the tracked number with this new larger number.

MaxAlgo

No matter how many items we include in this list we would still only need to track a single number, so the space complexity of this algorithm is O(1). We might apply this algorithm to find the largest file on a computer.
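
A minimal sketch of that single pass, using a plain list of numbers as a stand-in for file sizes:

def largest(items):
    # One pass, tracking a single value: O(n) time, O(1) extra space.
    biggest = items[0]
    for item in items[1:]:
        if item > biggest:
            biggest = item   # replace the tracked number with the larger one
    return biggest

print(largest([7, 3, 19, 4, 11]))   # 19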

In terms of time complexity if we were to add more files or more items to this list the amount of time it takes would increase proportionally to the number of items we add. We therefore say this algorithm has an O(n) time complexity.

If we wish to improve the time complexity of finding the largest file, we can do so by having something track the largest file in the system, checking and updating it whenever a file is written. This would provide us with a constant time O(1) algorithm, as we avoid having to check all the file sizes every time we request the largest file. The trade-off here is added complexity when updating any file on our system. If we were to reduce the size of the largest file, the second largest might now be the largest and we'd have to find which file that might be.

We don't often need to know the largest file on the system, so this added complexity is probably a waste of resources; additionally, the overhead of creating, updating and deleting files has now increased. An O(n) search is probably enough for most machines.

Sorting

Sorting is often used to explain time complexity as there are a number of ways to sort items, each with their own complexity. It's also useful because we can visualise sorting. Below I have provided an example of bubble sort, which has a time complexity of O(n²).

Sorting

This isn't the most efficient way to sort a list of items, but it's a good representation of an O(n²) algorithm.
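
For reference, a short bubble sort sketch; the nested passes over the list are where the O(n²) comes from:

def bubble_sort(items):
    # Repeatedly swap adjacent out-of-order pairs until the list is sorted.
    items = list(items)   # work on a copy
    for i in range(len(items)):
        for j in range(len(items) - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
    return items

print(bubble_sort([5, 1, 4, 2, 8]))   # [1, 2, 4, 5, 8]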

Categorising complexity

So far we've been discussing polynomial time algorithms; there also exist algorithms which take exponential time to produce a solution. As an example, in the case where we have 2 items this would take 4 operations, and if we increase to 3 items it would take us 8 operations. Such an algorithm is said to have O(2ⁿ) time complexity.

An example of such a problem is the N-Queens II problem. The time complexity for a brute-force algorithm to solve it is O(n!), so 3 items is 6 operations and 4 items is 24 operations. These are algorithms which a computer would struggle to solve in a sensible time as n gets larger.

Within the realm of these exponential problems exist some which we can solve in O(n!) time but whose solutions we can validate in polynomial time. So even if n is exceptionally large, given a solution, we are able to validate its correctness relatively quickly.

P vs NP

This class of problems forms the basis for one of the millennium problems, known as P vs NP. The problem asks if there's a relationship between the complexity of validating a solution and the complexity of solving the problem. If P = NP then problems that can be verified in polynomial time can also be solved in polynomial time. However, P ≠ NP implies that some problems are inherently harder to solve than they are to verify.

There are a few problems which have no known polynomial solution, and finding one for any of them would prove that P = NP: problems such as the travelling salesman problem (TSP), the Boolean satisfiability problem and the vertex cover problem. Proving that there exists a polynomial solution to one of these problems will net you $1 million.

In Practice

Solving these problems has practical applications. Solving TSP would provide us with optimal solutions for delivery routing, and vertex cover has an application in DNA sequencing. So how do we produce solutions in these areas when attempting to find the best solution can take a computer as long as the lifetime of the universe?

We tend to rely on algorithms that are fast on average. These algorithms may involve backtracking with a large amount of pruning, an approach used in chess. There are also approximation algorithms and heuristic-based approaches such as simulated annealing.
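
As a flavour of the heuristic approach, here's a rough nearest-neighbour sketch for TSP: it runs in polynomial time and produces a reasonable tour, with no guarantee of the optimal one (the city coordinates are made up):

import math

def nearest_neighbour_tour(points):
    # Greedy heuristic: always visit the closest unvisited point next.
    unvisited = list(range(1, len(points)))
    tour = [0]
    while unvisited:
        last = points[tour[-1]]
        nearest = min(unvisited, key=lambda i: math.dist(last, points[i]))
        tour.append(nearest)
        unvisited.remove(nearest)
    return tour

cities = [(0, 0), (2, 1), (5, 0), (1, 4)]
print(nearest_neighbour_tour(cities))   # [0, 1, 2, 3]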

Trade Offs

Understanding time and space allows us to make trade-offs between strategies. Certain algorithms can provide us with an efficient approach to solving technical problems. There is a limit to how much we can do with algorithms, and sometimes even problems that sound simple on the surface can have a $1 million bounty standing for 25 years without you realising it.


  1. I wonder if this will ever be adjusted for inflation otherwise the prize might not be that significant by the time the problem is solved. 

S Williams-Wynn at 12:01

Mon 21 April 2025

Gray Code

A modern laptop can run ~3.8 billion cycles per second. The cycle is determined by the oscillation frequency of the electrical signal that hits the CPU. Contemporary CPUs manage synchronisation using all sorts of error correction tricks.

In mechanical systems, such as those used in medical equipment and robotics, the binary numbers that we are most familiar with can cause errors if they're read during state transition.

Decimal and Binary

We are most familiar with decimal, a base 10 counting notation where each position in the number represents a different power of ten, e.g. 10, 100, 1000.

The computer relies on binary as this takes advantage of the fundamental on/off state within an electronic circuit. Binary is base 2, so each position represents a power of 2. E.g. 2, 4, 8, 16.

Reading States

Binary numbers can cause errors if they're read during transitions. The more positions that require toggling while switching between numbers, the higher the chance we introduce errors into the system. This is shown clearly as we transition between the numbers 3 and 4, which requires changing three bit positions: 011 -> 100.

BinaryCounting

If these bits aren't switched instantly we can read any of the numbers 0, 1, 2, 5, 6 or 7 instead of 3 or 4. Not great if you're working with a critical system and need precision.

Gray Code

To get around this we use an alternative ordering of the binary system in which successive numbers are separated by a single bit. Incrementing a number only relies on switching one position and removes the chance of reading the wrong number during state transitions.

This ordering is called Gray code, and an animation of the bit positions, for an incrementing number, is shown below:

GrayCounting

Decimal  Binary  Gray
0        0000    0000
1        0001    0001
2        0010    0011
3        0011    0010
4        0100    0110
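
The Gray column above can be generated from the binary one with a single bitwise operation, n XOR (n >> 1); a quick sketch:

def to_gray(n):
    # Convert an ordinary binary number to its Gray code equivalent.
    return n ^ (n >> 1)

for n in range(5):
    # prints the decimal, binary and Gray columns of the table above
    print(n, format(n, '04b'), format(to_gray(n), '04b'))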

The Application

In addition to reducing read errors, relying on only a single toggle to move up the number scale consumes less energy than traditional binary due to the fewer toggled bits.

Some systems require testing every position of multiple switches or toggles. Gray code can improve the efficiency of these tests. If we had to iterate through all 16 combinations of 4 switches, ordinary binary would need us to flip 1, 2, 1 and then 3 toggles as we count from 0 up to 4, while Gray code only ever needs a single toggle flipped to eventually test all switch combinations.

One of the most common uses of Gray code is in rotary encoders, also known as knobs. These convert angular position to an analog or digital signal. If we had to rely on a normal binary scale, rotating the knob could end up sending the intermediary numbers between each angle, which would make it pretty useless.

S Williams-Wynn at 12:03

Mon 17 March 2025

Chess Programming - Bitboards

The "Chess Programming" series:

      1: (here) Chess Programming - Bitboards

While exploring an interesting game engine which compiles Python code to WASM, I thought there's no better way to have fun than to write a chess engine. This was the start of a month-long project which led me to create my own chess AI.

Chess programming is a topic that has been written about extensively. It was a proving ground for some of the first AI researchers, and even today it is still used as a benchmark for techniques in applied research.

This article is the first part in a series of three in which I attempt to distill some insight into the application of algorithms in software, namely chess programming, in the hope that it may inspire how you approach solving similar problems in the future.

Bit Operations

If you've ever done some leetcoding or any other live-algorithm-style interview prep in software engineering, you might have come across the odd question on bit operations and wondered if you'll ever need to actually do these in your day-to-day job. I'm not here to answer that, but I am here to show you how they apply to chess programming.

Firstly what are bit operations:

a bitwise operation operates on a bit string, a bit array or a binary numeral at the level of its individual bits. It is a fast and simple action, basic to the higher-level arithmetic operations and directly supported by the processor.

Wikipedia

Essentially, bitwise operations are physical operations embedded into the processor; they consist of your logical AND, OR, XOR and bit shifts. The mathematical operations that you're familiar with (such as add, subtract etc.) are all built upon bitwise operations.

Here's an example of a NOT operation:

NOT 0111  (decimal 7)
  = 1000  (decimal 8)

And an example of an AND operation:

    1010 (decimal 10)
AND 1100 (decimal 12)
  = 1000 (decimal 8)

We can see these operations supported in python:

>>> 0b1100 & 0b1010
8
>>> bin(0b1100 & 0b1010)
'0b1000'

There are two ways we can shift bits: left and right. Here's an example in Python:

>>> # Left
>>> bin(0b0010 << 1)
'0b100'

>>> # Right
>>> bin(0b0010 >> 1)
'0b1'

Explaining how the processor does addition is out of scope for this blog post, but put together some of these bit operators and you've got yourself a function that can do addition.
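
As a taste of how that works, here's a sketch that adds two non-negative integers using only AND, XOR and a shift; the XOR adds without carrying and the AND picks out the carries:

def add(a, b):
    # Add two non-negative integers using only bitwise operations.
    while b:
        carry = (a & b) << 1   # positions where both bits are 1 produce a carry
        a = a ^ b              # add each bit position, ignoring carries
        b = carry              # then add the carries in the next pass
    return a

print(add(5, 7))   # 12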

Bitboards

Back to the chess:

We are doing all of this essentially to get as close to the processor as possible, since we want to squeeze as much performance out of the computer as we can by avoiding any unnecessary operations.

In chess programming we will use a single number to keep track of all the positions of pieces on the board. This is what we call a "bitboard". We will have a few bitboards to maintain the game's state. The first one will be 'board', which tracks all the pieces. At the start of the game this number will be 0xFFFF00000000FFFF, which is hexadecimal for 18446462598732906495. If we print the binary form of this number and format it over 8 rows we get the following:

 Bitboard:

8  1 1 1 1 1 1 1 1
7  1 1 1 1 1 1 1 1
6  0 0 0 0 0 0 0 0
5  0 0 0 0 0 0 0 0
4  0 0 0 0 0 0 0 0
3  0 0 0 0 0 0 0 0
2  1 1 1 1 1 1 1 1
1  1 1 1 1 1 1 1 1
   a b c d e f g h

There are a few other bitboards to track during a game of chess. There will be a board for each colour (white and black) and a bitboard for each piece type (6). Together with 'board' that gives us a total of 9 bitboards, or 9 numbers to cover the entire game state of chess. Here are two more examples of bitboards, one for pawns and one for white pieces:

>>> pawns = 0x00FF00000000FF00
>>> white = 0x000000000000FFFF

Now we can use these three numbers to give us the answer to some questions, for example: where are all the white pawns?

>>> board & pawns & white
65280

 Bitboard (65280):

8  0 0 0 0 0 0 0 0
7  0 0 0 0 0 0 0 0
6  0 0 0 0 0 0 0 0
5  0 0 0 0 0 0 0 0
4  0 0 0 0 0 0 0 0
3  0 0 0 0 0 0 0 0
2  1 1 1 1 1 1 1 1
1  0 0 0 0 0 0 0 0
   a b c d e f g h
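
The board diagrams in this post can be printed with a small helper along these lines (a sketch, assuming bit 0 is a1 and bit 63 is h8):

def print_bitboard(bb):
    # Print a 64-bit bitboard as an 8x8 grid, rank 8 at the top.
    for rank in range(8, 0, -1):
        bits = [(bb >> ((rank - 1) * 8 + file)) & 1 for file in range(8)]
        print(rank, ' '.join(str(bit) for bit in bits))
    print('  a b c d e f g h')

print_bitboard(0x00FF00000000FF00 & 0x000000000000FFFF)   # the white pawns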

During a game of chess you'll use these to calculate which pieces are under attack (e.g. given knightsmove & theirpieces, the result will be 0 if there's no piece under threat). We can also use these bitboards to work out where the valid moves are with moves & ~board (moves AND NOT board).

A chess AI that's looking 4 moves ahead will have ~5 million positions to consider, so having these operations done at a low level can help boost the performance of the software. The more performance you squeeze out, the more accurate your chess AI will be, because this improves its ability to look into the future.

Beyond Chess

In summary, learning about how bitboards are applied to chess programming has allowed me to understand how bit operations can be applied practically in software. Before I dived into learning how chess is programmed I had only imagined keeping track of the board state using several objects and an array; however, I now see there's more than one way to crack an egg: the board state egg.

I hope the article helps to provide insight into practical applications of bit operators and leaves them to be considered as valuable tools for solving technical problems, or at least reminds software engineers that we have them in our toolbelt.

Further Reading

S Williams-Wynn at 21:34

Tue 09 October 2018

Foolscap List Ordering

Foolscap was my answer to losing all my notes and forgetting the exact file I saved them in.

Was it on Desktop or next to the project?

Thus I decided to write an application that took care of the saving and finding of my notes.

Unordered Lists

One thing I had taken for granted is the ordering of my files in any file system. I only realised how important it was after unordered dictionaries in Python allocated my notes random positions on the list.

The random ordering meant that if I had note "M" in mind, it could appear anywhere on the list. So my eyes would track down n positions in the list to find "M". An O(n) operation for my eyes.

After growing tired (a whole day of O(n) eye operations), I wrote the rule to provide me the list in alphabetical order. This was great: my brain automatically switched to doing a binary search for note "M". O(log n).

Ordered Lists

Now that finding note "M" was optimised for my eyes, my finger had to press the down key until my cursor was on the note of choice. This wasn't good enough: I was referring to note "M" several times a day, and the other notes I'd reference once a month were adding computation time to my finger.

This is where I introduced two of the most powerful rules of Foolscap. Since I tracked the number of views and the time each note was last viewed or modified, I could implement the following ordering:

1: LastView/Modified/RecentlyCreated
2: MostViewed
3: 2ndMostViewed
4: 3rdMostViewed
5: A
6: Z

This was neat!

I had significantly cut down the amount of time it took for me to get to the notes I was using most frequently.

Search Lists

Searching needed a slightly different ordering, since applying the rules above to your search results would likely be meaningless. The meaninglessness comes from your inability to predict what will appear in the list to begin with.

Python has a neat method for its strings.

note_title.startswith("text")

I used this as the initial search feature: a search by the prefix of my note title. To me this made sense, since my notes are titled by their subject matter, e.g. "linux", "elastic".

The next iteration of this search functionality was to include the conditional:

if "text" in note_title:

I still wanted to provide the user with the result of the initial prefix search as the first result in the list.

This is when I fathomed this solution:

search = [title for title, _ in notes.items()
          if query in title]

# Sort by the position the query appears in the title
return sorted(search, key=lambda title: title.index(query))

Pretty neat!
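
With some hypothetical note titles, a query of "linux" surfaces the prefix match first:

notes = {"linux_commands": {}, "arch_linux": {}, "elastic": {}}
query = "linux"

search = [title for title, _ in notes.items() if query in title]
print(sorted(search, key=lambda title: title.index(query)))
# ['linux_commands', 'arch_linux']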

S Williams-Wynn at 22:37