Mon 31 August 2020

Python Deque

This is now my third article on lists. As someone that uses the built-in python list on a fairly regular basis, I might have built up a false sense of security. I'm pretty familiar with these listy-boys. However, recently I found out that I was not thinking about them correctly. Readers might smack themselves if they're familiar with data-structures but don't know how lists are implemented internally. The built-in lists are dynamic arrays.

How else could they optimise a sweet O(1) lookup time on indexing: mylist[4]. Especially when analysts are trying to avoid the built-in iterator and cursing their code with: for i in len(mylist): mylist[i].

Another trait an established data-structurer with be familiar with when it comes to dynamic arrays is that the append and pop methods are an amortised O(1). Amortised; because occasionally you have to suffer a cost of realloc(ating) memory across larger arrays.

Where the list starts to suffer is from poping and inserting at arbitrary positions.

Linked-List

The data-structurer will have had the linked-list slammed into their head often enough that it will pain them to hear about it again. So theory aside, I'll give you that sweet O(1) append and last item pop that you expect from a performant Stack.

Python deque provides a comparatively larger performance hit on initialisation to list and has poor O(n) performance when you want any arbitrary item somewhere in the middle. It does, however, have O(1); popleft, pop, append and appendleft. Due to being a doubly-linked list (or double-ended queue to get the abbreviation deque)

Deque in the wild

I saw a nice little quote from an enginneer on Quora:

In 8 years of getting paid to write computer programs, this post is the only time I’ve typed ‘deque.’

There are many places deque is used in the stdlib, most commonly whenever someone needs a queue or stack such as constructing a traceback, parsing python's sytax tree and keeping track of context scope.

My little run-in with deque was using it instead of a recursive function to avoid python's

maximum recursion depth exceeded

This limit happens to be set to 10^4. The solution was to add child nodes to a deque and when you were done with analysing the current node, popleft the next node.

Python Queue

You might be tempted to ask, well if deque is for queues. What on earth is from queue import Queue.

These queues are different (although, still using deque under the hood). They are optimised for communication across threads, which need to involve locking mechanisms and support methods like put_nowait() and join(). These are not intended to be used as a collective data-structure, hence the lack of support for the in operator.

More information

There is some neat documentation in the cpython repo which contains more data-structures and other alternatives to the standard built-in list. Tools for working with lists

References