Mon 30 June 2025

Low Latency Computation

Anything that loads faster than 200ms is instantaneous for humans, anything slower and we would perceive the delay. Some systems are built to respond quicker than this, business cost can be sensitive to latency and every ms can make a difference.

How do we develop highly performant systems? I happened to share a train ride with Mark Shannon, who at the time was leading a team at Microsoft that had one goal: Make Python Fast.

I asked him, how does one make code more performant? To which he responded.

Make it do less.

Here's what doing less looks like:

Loop Invariant

A simple example of doing less in order to achieve better performance is with loop invariant conditions.

We loop over arrays while we code and make calls to other functions while looping. During a refactor or while we move code around we might not realise that some of the variables we are computing are constant.

def expensive(trades: list[Trade], n: int):
    rolling_sum = 0
    for trade in trades:
        beta = compute_beta(n)
        rolling_sum += beta * trade.risk

    return rolling_sum / len(trades)

You'll notice in the example that I am looping over the list trades and computing a beta value, but this beta value is computed with the integer n which is consistent within the scope of this function. If we move the computation of beta to a point before we've entered loop we avoid having to compute beta for each trade in trades.

def expensive(trades: list[Trade], n: int):
    rolling_sum = 0
    beta = compute_beta(n)
    for trade in trades:
        rolling_sum += beta * trade.risk

    return rolling_sum / len(trades)

Now beta is computed at most once within this method scope, we have essentially achieved doing less. If the compute_beta method incurs an expensive overhead and we are looking at a lengthy trades list this subtle oversight would have a significant impact on our latency.

Python

There are several layers of abstraction in Python and there are ways we can code that allow us to utilise computation in C more effectively. One such example of this: using list comprehension over a vanilla for-loop.

If we wish to understand the number of steps the machine is doing for a method, Python allows us to disassemble bytecode using import dis. You might notice that there are additional LOAD_ATTR instructions in a vanilla for-loop so defaulting to a list comprehension avoids this overhead and allows us to do less.

Databases

Relying on external systems can be expensive, in some cases we might query the same system several times in order to complete an operation. If we have a distributed system or intend to scale our server horizontally we are increasing the number of things that open and close connections to the database. Each time we open/close a connection the CPU has to process these instructions, this overhead can be significant when we are working in low latency environments. To get around this we introduce a connection pool between the database and our server, the connection pool manages long lived connections on the database and frees up the database to focus on database things.

A common anti-pattern involving a database is having your ORM make N+1 queries. For this to occur we have a 1-many entity relationship. If we ask the database to return a group of n countries and then for each country we ask for all cities in that country we are making N (number of countries in our initial query) + 1 (the initial query) total requests to the database. These can be tricky to spot since the individual query could be performant, but the accumulative overhead and latency of all these queries can cause problems.

We get around this by either asking for all countries and then ask for all cities given a set of country ids and perform the mapping in memory on the server, or we can increase the complexity of the query using a JOIN and allow the database to make the mapping. Either way we avoid the back and forth overhead of making N+1 queries, effectively doing less.

When we can't do less

Sometimes we can't do less, but we can decide when work is done. There are situations that allow us to pre compute results and provide performant computation at the cost of a slower start time.

Chess programming is an area which relies on high performance, the quick our program the further into the future we can look on each move during the game. One way we speed up a chess engine is by precomputing the movesets for every piece on the board for every possible position they might be in before the game starts. This allows us to look up all possible board positions for a piece during the game instead of computing only when we need it.1

In 3b1b's wordle solver he actually computes a pattern matrix for each word and saves it to a file in order to avoid having to incur the overhead on each execution of his wordle solver.

Constant Folding

Lastly there's an interesting optimisation that compilers make which is also utilised in python called Constant Folding.

When python is interpreted it parses the AST before executing. During this AST construction it finds expressions who's outcome is constant and computes the constant at this point instead of computing the expression every time it is referenced. As an example the variable x = 24 * 60 will be converted to 1440 before compiling the bytecode instructs for the machine.

So in all cases, if we want a computer to be highly performant we just have to make it do less.

Further Reading

  • perflint try running this on your python codebase to check for performance optimisations.

  1. More on this will be explained when I write my second post on chess programming. 

Mon 31 August 2020

Python Deque

This is now my third article on lists. As someone that uses the built-in python list on a fairly regular basis, I might have built up a false sense of security. I'm pretty familiar with these listy-boys. However, recently I found out that I was not thinking about them correctly. Readers might smack themselves if they're familiar with data-structures but don't know how lists are implemented internally. The built-in lists are dynamic arrays.

How else could they optimise a sweet O(1) lookup time on indexing: mylist[4]. Especially when analysts are trying to avoid the built-in iterator and cursing their code with: for i in len(mylist): mylist[i].

Another trait an established data-structurer with be familiar with when it comes to dynamic arrays is that the append and pop methods are an amortised O(1). Amortised; because occasionally you have to suffer a cost of realloc(ating) memory across larger arrays.

Where the list starts to suffer is from poping and inserting at arbitrary positions.

Linked-List

The data-structurer will have had the linked-list slammed into their head often enough that it will pain them to hear about it again. So theory aside, I'll give you that sweet O(1) append and last item pop that you expect from a performant Stack.

Python deque provides a comparatively larger performance hit on initialisation to list and has poor O(n) performance when you want any arbitrary item somewhere in the middle. It does, however, have O(1); popleft, pop, append and appendleft. Due to being a doubly-linked list (or double-ended queue to get the abbreviation deque)

Deque in the wild

I saw a nice little quote from an enginneer on Quora:

In 8 years of getting paid to write computer programs, this post is the only time I’ve typed ‘deque.’

There are many places deque is used in the stdlib, most commonly whenever someone needs a queue or stack such as constructing a traceback, parsing python's sytax tree and keeping track of context scope.

My little run-in with deque was using it instead of a recursive function to avoid python's

maximum recursion depth exceeded

This limit happens to be set to 10^4. The solution was to add child nodes to a deque and when you were done with analysing the current node, popleft the next node.

Python Queue

You might be tempted to ask, well if deque is for queues. What on earth is from queue import Queue.

These queues are different (although, still using deque under the hood). They are optimised for communication across threads, which need to involve locking mechanisms and support methods like put_nowait() and join(). These are not intended to be used as a collective data-structure, hence the lack of support for the in operator.

More information

There is some neat documentation in the cpython repo which contains more data-structures and other alternatives to the standard built-in list. Tools for working with lists

References

Thu 06 December 2018

Chain of Responsibility

Easily one of my favourite patterns from the gang of four is the chain of responsibility pattern. It aims to avoid coupling the object that sends a request to the handlers of the request. This is especially useful if the structure to our handlers follows a sense of hierarchy.

I've had very few opportunities to implement design patterns, but gaming has really helped me to envision a context where several designs can be applied as handy solutions to managing complexity.

So we will apply the chain of responsibility to a situation where we have gathered four of the most powerful wizards in a room, each of whom will test their power by casting a spell.

For this example we have our wizards as the objects which send requests.

class Wizard:
    """Create a wizard."""
    def __init__(self, name: str, intelligence: int):
        #: Identify the wizard by name
        self.name = name
        #: Intelligence as a proxy of a wizard's power.
        self.intelligence = intelligence

    def cast_spell(self, spell: Spell):
        """Have the wizard cast a spell."""
        print(f"{self.name} casts {spell.name} spell.")
        spell.cast(self)

We will make each behaviour of the spell, which is determined by the power of the wizard casting it, as an object handling a request. If the wizard does not meet the handler's requirements it passes the request on, to it's successor. This is where we have applied a sense of hierarchy to the handler.

To begin we have an Abstract Handler:

from abc import ABC
from abc import abstractmethod


class AbstractHandler(ABC):
    def __init__(self, successor=None):
        self._successor = successor

    def handle(self, creature):
        reaction = self._handle(creature)
        if not reaction:
            self._successor.handle(create)

    @abstractmethod
    def _handle(self, spell):
        raise NotImplementedError("Must provide implementation in subclass.")

Now we can define our handler hierarchy:

class LowPowerFireSpell(AbstractHandler):
    def _handle(self, creature):
        if creature.intelligence < 10:
            print("Spell backfires.")
            return True


class MediumPowerFireSpell(AbstractHandler):
    def _handle(self, creature):
        if creature.intelligence < 20:
            print("Small fire ball is cast.")
            return True


class HighPowerFireSpell(AbstractHandler):
    def _handle(self, creature):
        if creature.intelligence < 30:
            print("A fire ball blazes across the room")
            return True


class GodlikePowerFireSpell(AbstractHandler):
    def _handle(self, creature):
        print("A Massive column of fire burns through the room!")
        return True

Finally a single object to identify the spell chain.

class Spell:
    def __init__(self, name: str):
        #: Identifying the spell by a name
        self.name = name
        #: How the spell behaves at differing levels of user's power
        self.chain = LowPowerFireSpell(
            MediumPowerFireSpell(
                HighPowerFireSpell(
                    GodlikePowerFireSpell()
                )
            )
        )

    def cast(self, creature):
        self.chain.handle(creature)

To test their skill, we have a spell which casts a "Fire Ball". We have the following wizards present:

if __name__ == '__main__':
    fire_spell = Spell('Fire Ball')
    merlin = Wizard("Merlin", 8)
    albus = Wizard("Albus", 18)
    howl = Wizard("Howl", 28)
    gandalf = Wizard("Gandalf", 38)

They are all gathered in a room, and take turns casting the same spell.

room = [merlin, albus, howl, gandalf]
for wizard in room:
    wizard.cast_spell(fire_spell)
    print("")

To which we should see the spell behave according to their intelligence:

(myenv) pc-name ~ $ python script.py
Merlin casts Fire Ball spell.
Spell backfires.

Albus casts Fire Ball spell.
Small fire ball is cast.

Howl casts Fire Ball spell.
A fire ball blazes across the room

Gandalf casts Fire Ball spell.
A Massive column of fire burns through the room!

(myenv) pc-name ~ $

Implementation can be found here

Thu 11 October 2018

Iterator Design Pattern

Before I started reading the gang of 4 ["Go4"], I was convinced I would not need an iterator. After all, in python, they are already implemented:

x = ['a', 'b', 'c']

for i in x:
    print(i)

In the frame of python lists as aggregate objects, the intent of an iterator is satisfied.

Provide a way to access the elements of an aggregate object sequentially without exposing its underlying representations.

Displaying notes

Here I explain, how the iterator became my favourite design pattern.

I have written about representation of lists before, and I don't think this will be my last time. When implementing the drop-downs in foolscap to allow a user to see sections contained in their note, I ran into an issue of complexity where I had blocks of conditionals dictating what should be displayed to a user. I wanted the user to have an indication that a note contains sections, and for them to be able to toggle a drop-down to see each section. Similar to:

 (+) | circleci  |
     | docker    |

Where my "circleci" note would contain sections and the "docker" note does not. Then expanding:

 <-> | circleci  |
  └─ | -workflow |
     | docker    |

This is where I realised the abstraction power of an iterator, And I could hide the collapsed sections behind an "if" conditional in an iterator.

class Menu:

    def __init__(self, items):
        self.items = items

    def visible(self):
        """Yield the next appropriate item for display."""
        for item in self.items:
            yield item.title
            if hasattr(item, 'expand') and item.expand:
                for sub_item in item.sub_items:
                    yield sub_item.title

Supporting further traversal policies in my Menu class is straightforward, and drawing becomes absurdly simplified:

def draw(menu):
    """Draw all viewable menu items."""
    for item in menu.visible():
        draw_item(item)

After this I thought of a more complex aggregate structure that I could traverse, like a tree depth first.

Realising the abstraction strength of the iterator, one finds its definition of intent far more compelling.

Provide a way to access the elements of an aggregate object sequentially without exposing its underlying representations.

Tue 09 October 2018

Procedural Villages

My inspiration for any tool or project has always been the thought that it could be done better, either that or I'd feel like I would benefit a lot from a missing feature.

In my first embarked I attempted to create a dungeon crawler, full of all the creatures and characters I thought were missing from traditional games.

It was through brute-force of the Roguebasin tutorial where I learnt how to code.

I must have created the same game 4 or 5 times before I decided to scrap the libtcod library and create a game without an interface.

Armed with stdout and turn-based printing, I implemented feature after feature. One of the features I was quite proud of was the randomly generated villages I'd spawn my player into.

Objects

I was aiming for something simple, just populate the map with houses that the player can interact with, for this I deconstructed a house into a rectangle.

class Rect:
    def __init__(self, x, y, h, v):
        self.x1 = x
        self.y1 = y
        self.x2 = x + h
        self.y2 = y + v

    def center(self):
        center_x = (self.x1 + self.x2) / 2
        center_y = (self.y1 + self.y2) / 2
        return (center_x, center_y)

    def internal(self, x, y):
        """
        [ ][ ][ ][ ]
        [ ][X][X][ ]
        [ ][X][X][ ]
        [ ][X][X][ ]
        [ ][ ][ ][ ]
        """
        ...
        return bool()

    def edges(self, x, y):
        """
        [X][X][X][X]
        [X][ ][ ][X]
        [X][ ][ ][X]
        [X][ ][ ][X]
        [X][X][X][X]
        """
        ...
        return bool()

    def sides(self, x, y):
        """
        [ ][X][X][ ]
        [X][ ][ ][X]
        [X][ ][ ][X]
        [X][ ][ ][X]
        [ ][X][X][ ]
        """
        ...
        return bool()

The poorly defined object above could now provide boolean confirmation to a co-ordinates existence in appropriate sections of a rectangle.

From the above Rect class I can place a door on a Rect.sides() and I can fill the area defined by Rect.internal() with items.

Empty space: | . |
Wall:        | # |
Monster:     | m |
Door:        | + |

| . | . | . | . | . | . |
| . | # | # | + | # | . |
| . | # | m | . | # | . |
| . | # | . | . | # | . |
| . | # | . | . | # | . |
| . | # | # | # | # | . |
| . | . | . | . | . | . |

Throwing in some size variations and randomising the house position on the map, (making sure there are no houses intersecting each other). Allowed me to generate maps that looked like these:

Example 1

| . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| . | # | # | # | # | # | # | . | # | # | # | # | . | . | . |
| . | # | u | . | . | @ | # | . | # | . | m | # | . | . | . |
| . | + | . | . | . | . | # | . | # | . | . | + | . | . | . |
| . | # | # | # | # | # | # | . | # | # | # | # | . | . | . |
| . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . | . | . | . | . | > | . | . |
| . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| . | # | # | + | # | # | # | . | . | . | . | . | . | . | . |
| . | # | . | . | r | . | # | . | . | . | . | . | . | . | . |
| . | # | . | . | . | . | # | . | . | . | . | . | . | . | . |
| . | # | # | # | # | # | # | . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |

Example 2

| . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | # | # | + | # | . | . | . | . |
| . | . | . | . | . | . | . | # | . | . | # | . | . | . | . |
| . | . | . | . | . | . | . | # | . | . | # | . | . | . | . |
| . | . | . | . | . | . | . | # | m | . | # | . | . | . | . |
| . | . | . | . | . | . | . | # | # | # | # | . | . | . | . |
| . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |
| . | . | . | . | . | . | . | . | . | . | . | # | # | # | # |
| . | # | # | # | # | # | # | # | . | . | . | # | . | . | # |
| . | # | . | . | . | . | u | # | . | . | . | + | . | . | # |
| . | + | . | . | . | . | . | # | . | . | . | # | > | . | # |
| . | # | X | . | . | . | . | # | . | . | . | # | . | r | # |
| . | # | # | # | # | # | # | # | . | . | . | # | # | # | # |
| . | . | . | . | . | . | . | . | . | . | . | . | . | . | . |

Unfortunately my code was not beautiful back then. I still think it was a neat idea

Socials
Friends
Subscribe