An unfinished draft of Linearly Probed Robin Hood Hash Tables

I was working on a blog article a few months ago but I never finished it. Here’s the first draft, nonetheless.

Open Addressing: To store elements in the table itself rather than in nodes pointed-to by the table.

Linear Probing:  When collisions occur, scan rightwards until finding an unoccupied bucket.

An Explanation

So what is Robin Hood hashing anyway? Well, Wikipedia gives the following quote:

One interesting variation on double-hashing collision resolution is Robin Hood hashing. The idea is that a new key may displace a key already inserted, if its probe count is larger than that of the key at the current position. The net effect of this is that it reduces worst case search times in the table.

But what the hell does that mean? How does that apply to linear probing?

To get a better idea of what’s happening, let’s try visualizing the insertion algorithm.

A Microsoft Paint Visualization

Lots of diagrams use numbers to explain things, but I find numbers to be confusing so let’s use colors instead.

Our hash table will contain 12 buckets, of which we will focus on the four leftmost: the red bucket, the green bucket, the yellow bucket, and the blue bucket. I’ve listed these in ascending order, such that red < green < yellow < blue.

To start, let's try inserting a bunch of elements that hash to these four buckets, first using plain old linear probing: linearprobegif

Once the first four buckets fill up, each additional insertion appends an element onto the leftmost unoccupied bucket. This results in a long chain of elements matching the order of insertion.

That’s without Robin Hood. Here’s with it:robin

Notice something?

The inserted elements are sorted! In fact, they’re sorted by the bucket that they hash to. If you were to iterate a linearly probed Robin Hood hash table from the first bucket to the last bucket, you’d be iterating elements that hash to an ascending bucket order. With linear probing, Robin Hood hash tables are nothing more than sorted arrays interspersed with unoccupied buckets.

Faster Failed Searches

The primary advantage of Robin Hood hashing is that it enable failed searches to terminate early.

Consider the case of searching for a green element that doesn’t exist in a regular linearly probed hash table:linearlookup

With regular linear probing, the only way to determine whether an element doesn’t exist is to probe the table until reaching an unoccupied bucket.

With Robin Hood hashing, the search is allowed to terminate early due to the properties of sorted arrays. The search will terminate as soon as it reaches an element that hashes to a bucket higher than the bucket being searched for:robinlookup.png

This will, on average, reduce the number of iterations by half.

Non-faster, Non-failed Searches

Unfortunately, Robin Hood hashing provides no speedup for searches that succeed in finding the requested element. On average, the successful probe count of regular linear probing and Robin Hood hashing is exactly the same.

This can be illustrated by showing the probe count of each element in the table. First, a diagram illustrating regular linear probing:linearprobe

The probe count of each element is equal to the number of elements that must be iterated in order to find the element in a search.

Here’s Robin Hood:robinprobe.png

There’s a bit less variance in the probe counts of this table. Note that there’s only one probe count marked ‘1’, and that the maximum probe count is ‘9’, compared to ’12’ on the other table. This is an oft-cited advantage of Robin Hood hash tables: their probe counts have low variance.

Anyway, let’s try adding up the total probe counts of both tables, as that will be proportional to the average:

Regular Linear = 1+1+3+1+3+5+6+5+6+8+9+12 = 60 probes
    Robin Hood = 1+2+3+3+4+5+5+6+7+7+8+9  = 60 probes

Hmm, they’re both 60. That confirms that these two tables have the same average probe count, and thus the same average performance for successful searches. Note that there’s no way to improve these results by swapping elements around. Any gains achieved by swapping one element will be canceled-out by losses of the element it swapped with. Any configuration of this table will always have a total probe count of 60.

Keep this in mind when using Robin Hood hashing.

I’m bolding this point because too often it’s forgotten.

For example, take a look at Malte Skarupke’s recent blog post called I Wrote The Fastest Hashtable. In it, Skarupke implements and benchmarks an optimized version of linearly probed Robin Hood hashing, but unfortunately neglects to test regular linear probing alongside it. One such benchmark is the K-Nucleotides test; a benchmark where all searches succeed and none can possibly fail.

Testing K-Nucleotides, I get the following results:

Skarupke's "Fastest Hashtable Ever"        : 6248.724800 task-clock:u (msec)
A trivial implementation of linear probing : 5609.471389 task-clock:u (msec)

It should be clear from these numbers that Robin Hood is not the “Fastest Ever” for situations where searches always succeed; it needs searches to fail in order to pull ahead in performance.

Insertion and Erasure

Inserting and erasing from a linearly probed Robin Hood hash table require shifting elements around to preserve the sorted order. This is no different than inserting an element into a sorted array.

Example: Inserting a third red element into the table requires shifting elements to the right to make space.robininsert1.png

Once space is made the new element can be inserted:robininsert2.png

It’s worth noting that not every element needs to be shifted. Rather, only the first element of each color region needs shifting:


The shifting stops once it reaches an unoccupied bucket. Elements on the other side of the table (such as the purple ones) will not be affected:



Erasure works the same way as insertion, though elements will be shifted left rather than right:


The first shifted element will either be to the left of an unoccupied bucket, or to the left of an element with probe count of ‘1’:robinerase2.png

The above erasure results in the following:robinerase3.png

Sadly, because inserting and erasing from Robin Hood hash tables requires shifting elements around, the performance of these operations, should they succeed, are worse than that of regular linearly probed hash tables.

Circular Tables

A common implementation of open addressed hash tables is to implement the table circularly. That is, if a collision occurs on the very last, rightmost bucket, the probe sequence will “wrap around” to the first, leftmost bucket and continue on.

Without circular tables, a different strategy is needed: Allocate extra space at the end of the table, and if a collision occurs on the last, rightmost bucket, continue probing into the extra allocated space. In the unlikely chance that the extra space gets filled, additional space will be allocated.

In practice, the second strategy performs better when the table fits in the cache. Otherwise, the first strategy holds a small advantage for using less memory.

Unfortunately, the idea of Robin Hood tables being sorted arrays is slightly complicated when circular tables are involved. This makes it is necessary to go back to the Wikipedia definition for the implementation.


One interesting variation on double-hashing collision resolution is Robin Hood hashing. The idea is that a new key may displace a key already inserted, if its probe count is larger than that of the key at the current position. The net effect of this is that it reduces worst case search times in the table.

When implementing circular tables, one must compare probe counts rather than hashed-to bucket positions to maintain sortedness. This is because probe counts are a relative measurement, while bucket positions are not. Bucket positions rely on a global ordering of buckets (red < green < yellow < blue), which does not exist in a circular table.

Example: An incorrect circular table created by comparing bucket positions. Note the grey seam separating the lowest bucket in memory from the highest, and how it splits the purple elements in two.

Example: A correct circular table created by comparing probe counts. Purple elements remain contiguous.


Part 1: Bookkeeping

Implementing an efficient linearly probed Robin Hood hash table requires solving two problems:

  1. How to distinguish occupied buckets from unoccupied buckets.
  2. How to efficiently sort and compare the order elements without having to rehash elements.

Idea: Store hashes

One possible solution is to store a copy of the element’s hash in the table alongside the element. It’s easy to maintain Robin Hood sortedness this way by calculating buckets from the stored hashes.

To distinguish unoccupied buckets from occupied buckets, a hash of ‘0’ can be used to mark unoccupied buckets. This of course requires that hash function never actually return ‘0’.

Below is C++ code illustrating how searching works with such tables:

key_type* hash_table::find(key_type key)
	unsigned int initial_bucket = hash(element) % table_size;
	for(unsigned int b = initial_bucket;; ++b)
		if(table[b].stored_hash == 0)
			return nullptr;       // Unoccupied (failed search)
		if(initial_bucket < table[b].stored_hash % table_size)
			return nullptr;       // Early exit (failed search)
		if(key == table[b].key)
			return &table[b].key; // Successful match

Storing hashes comes with other benefits, too. One such benefit is that hash values do not need to be recalculated every time the table is enlarged; a costly operation if the hash function is expensive. Another benefit is that stored hashes provide a way of comparing elements without comparing the elements themselves. If the hashes of two elements differ then the elements differ and you do not need to bother comparing them at all.

Example Implementation: Rust’s HashMap

Idea: Store probe counts

An alternative to storing hashes is to store probe counts in the table instead.
To represent unoccupied buckets, a probe count of ‘0’ can be used. This is possible because the minimum probe count of any element is ‘1’.

Interestingly, using ‘0’ to mark unoccupied elements makes it possible to combine the check for unoccupied elements with the check for early exit into a single conditional statement; an obvious boon for performance.

key_type* hash_table::find(key_type key)
	unsigned int initial_bucket = hash(element) % table_size;
	for(unsigned int b = initial_bucket, probes_counted = 1;; ++b, ++probes_counted)
		if(table[b].probe_count < probes_counted)
			return nullptr;       // Handles both unoccupied and early exit (failed search)
		if(key == table[b].key) TODO TODO TODO TODO TODO
			return &table[b].key; // Successful match

Storing probe counts is especially nice because it doesn’t require changing the hash function. Sadly, stored probe counts lack the auxiliary benefits that stored hashes provide, though they do allow hash functions to return ‘0’.

Example Implementation: Malte Skarupke’s “World’s Fastest Hashtable”

Idea: Combine the two previous methods

There are two properties of probe counts worth thinking about:

  1. If the table isn’t 100% full, a probe count will always be smaller than table size.
  2. Given a probe count and an element’s position, it is possible to calculate the element’s hash modulo the table size using simple arithmetic.

So what the hell does that mean? Well, things get interesting when focusing on table sizes that are powers of two.

For such sizes the above properties can be rewritten:

  1. A table of size 2^N will require probe counts of N bits.
  2. A probe count of N bits can be used to calculate the lower N bits of the element’s hash using simple arithmetic.

Still confused? Well here’s where this is going:

The high bits of an element’s hash can be bitwise OR’d to an element’s probe count to result in a value that contains the information of both.

And it’s only a little bit awkward to calculate!

// First use the table's size (2^N) to compute
// an N bit mask:
table_size = 2048;      // 2 ^ 11
mask = table_size - 1;  // An 11 bit mask

// Then use the mask to remove the lower N bits
// from an element's hash:
hash        = 0xDEADBEEF;   // A 32-bit hash.
masked_hash = hash & ~mask; // The hash's upper 21 bits.

// Finally calculate the element's probe count
// (which will be N bits or less)
// and bitwise OR it with masked_hash.
resulting_value = masked_hash | probe_count;

Recalculating the complete hash using this value is not hard, but you’ll need to know the element’s position in the table:

unsigned_int hash;
hash = position - (table[position].key & mask) + 1;
hash |= table[position].key & ~mask;

Example Implementation: Amble and Knuth’s Sorted Hash Tables

Zero Memory Overhead

Do you know what the hash function for integers is in most languages?

The hash function for integers is just the identity function!

unsigned int hash(unsigned int i)
    return i;

Now, if an integer and its hash are exactly the same value, doesn’t it seem a little wasteful to store both a copy of the integer, and a copy of its hash in the table? That’s duplication! It’s better off to store just one.

This idea can be combined with the bitwise trick of the previous section to result in a space-efficient hash table. Such tables are nice for using less memory, but also for being capable of holding all integers representable in the integer’s type. This is unlike other linearly probed tables that require specifying an integer (usually ‘0’ or ‘-1’) to mark unoccupied buckets.

It’s worth mentioning that this trick is a bit more general than tables with integer keys. To be precise, this technique will work whenever the hash function is perfect.

Posted in Programming | Leave a comment

How I’m doing terminal/console UI

This post is going to cover the UI for my roguelike project. It’s fairly simple stuff, but still interesting enough to write about. Perhaps it will be useful for those developing a console-based game of their own.

I had known since I started the project that I was going to use the ncurses library for rendering, but I also wanted to have an SDL version for portability reasons. To allow use of both of these libraries, I created a very simple portability layer, terminal.hpp, which abstracted away both libraries into a small set of functions. I think most people would be better off skipping this step and instead only use one library.

Let’s talk about drawing to the screen. First off, it’s a terrible idea to use global coords for everything, as that leads to brittle layouts. To avoid this issue, I created a class called term_view which translates global coords into coords relative to a specified sub-rectangle. In other words, it adds an offset to the position of every drawing command. One of the most important features of term_view is that you can define new views in terms of others. This leads to the very useful functions, hsplit and vsplit, which split one term_view into two.

class term_view
    // The default view is the entire terminal, which means global coords.

    // Define a term_view in terms of another
    term_view(term_view const& other, rect subrect);

    // Size of our sub-rectangle
    dimen dimensions() const;
    // Convert relative coords into global ones.
    auto to_global(coord crd) const;

    // All drawing primitives are issued through term_view
    void draw_char(char32_t ch, coord pos, color_type color) const;

// A useful function to split one term_view into two.
std::pair<term_view, term_view> hsplit(term_view const& view, int left_width)
    int const right_width = view.dimensions().w - left_width;
    int const height = view.dimensions().h;
    return std::make_pair(
        term_view(view, rect{{0,0},{left_width, height}}),
        term_view(view, rect{{left_width,0},{right_width, height}}));

An alternative to term_view is to have a canvas abstraction (also know as framebuffers). When drawing to relative coordinates this way, you first create a “virtual canvas”, draw to it, and then blit the canvas onto the actual console’s canvas. I believe the libtcod library uses this technique, although for my purposes it was unnecessary.

With term_view done, the next step was to create a screen/window/widget system. I created a base for all of these elements called ui_element:

// Base class that every widget, screen, and dialog inherits.
class ui_element
    ui_element() = delete;
    ui_element(ui_key&&) {}
    virtual ~ui_element() = default;

    // If is_overlay is false, the ui drawer will only refresh the topmost
    // element. If true, it will refresh underlying elements, which allows
    // windows to exist without graphical artifacts.
    // Have this function return false for the best performance.
    virtual bool is_overlay() const = 0;

    // Draws this element to the term_view. Simple.
    // Note that the term_view is not cleared before this function is called;
    // you may need to clear it yourself.
    virtual void draw(term_view const& view) = 0;

    // wants_command will get called multiple times each keypress for each
    // command mapped to that key.
    // If wants_command returns true, then process_command or
    // process_raw_input will be called, depending on the command_type.
    virtual bool wants_command(command_type cmd) const = 0;
    virtual bool process_command(command_type) { return false; }
    virtual bool process_raw_input(char32_t) { return false; }

    // When a newer element gets popped from the stack, this function gets
    // called immediately, and if it returns true, then this element will
    // get popped too.
    // Example use: Closing a element based on outcome of y/n prompt.
    virtual bool resume_check_destroy() { return false; }

ui_elements are created and destroyed using a stack data structure. The nice thing about a stack is that it greatly simplifies object lifetimes; if you push two ui_elements onto the stack then you are guaranteed that the second one will get popped first. I use this property to allow ui_elements to “return” values upon destruction. Originally, this was done in a blocking manner: the push_ui_element function would block on input until the pushed element was destroyed, then return a value. This style is very easy to code with, but unfortunately didn’t work when compiling for emscripten, and so in the end I made it non-blocking with callbacks for return values.

Here’s a list of some ui_elements I used a lot:

  • yesno_popup – A popup window with a custom message that returned a bool.
  • paginator_widget – Formats a long string of text into a scrolling page.
  • select_one_menu – Displays a custom menu of choices and returns the value you select.
  • Additionally, I created some generic ui_elements that transform elements into new ones:

  • closeable_element<T> – Overloads the escape key input of T to destroy it.
  • centered_element<T> – Draws T inside of a sub-rectangle centered on the screen.
  • outlined_element<T> – Draws a 1-character width boarder around element T.
  • Here’s an example of how all of that is used:

                [&](main_menu::selection_type choice)
                        case main_menu::PLAY:
                            return true;
                        case main_menu::HELP:
                            std::ifstream reader("help.txt", std::ios::in);
                            std::string read_str((std::istreambuf_iterator<char>(reader)),
                            element_push<closeable_element<popup_element<paginator_widget> > >(
                                dimen{ 50, 20 }, box_data::style_data{}, read_str, false);
                            return false;
                        case main_menu::QUIT:
                            return true;

    Posted in Uncategorized | Leave a comment


    I’ve been coding a little roguelike this past week and I just got around to looking at FoV algorithms. Now, many of these algorithms I found hard to understand, but after some trial and error I figured out an easy way (in my mind!) for it to work. Now I’m curious, has anyone else used this technique too? I didn’t see it listed on roguebasin.

    Conceptually, it is an optimization over a method that traces hundreds of rays from the player outward. Instead of tracing the rays while the game is running, the work is done beforehand and stored in a static lookup table. The in-game algorithm does a few bitwise operations on the table to determine which rays have been blocked.

    The set of rays used to generate the lookup table determine how your FoV looks. Originally, my rays were straight lines from the center to perimeter coordinates. While this may seem sufficient, the results were actually rather ugly and contained artifacts. There were too many squares being marked unseen in unreasonable places. What I needed was permissive FoV. To do this, I used a modified version of Bresenham’s line algorithm that iterated every possible line and sub-line that I cared about. This turned out to work perfectly.

    Here’s a few comments I wrote in the source file. Sorry if my explanations are bad; if you want to see the source code, perhaps I’ll make it publicly available on Bitbucket.

    // This file generates a lookup table for a fast permissive field of view
    // algorithm.
    // The table is in the octant shaded by '#':
    // +-x------>
    // y\########
    // | \#######
    // |  \######
    // |   \#####
    // v    \####
    // A higher-level description of the algorithm is as follows:
    // We begin by finding every unique "ray" in the octant.
    // The rays start at the origin and emanate outwards.
    // We give each ray a unique index number to identify it.
    // Then, for every coordinate that the ray passes through, we add the ray's
    // index to a set belonging to that coordinate.
    // Finding the LOS is easy with this array of sets: Iterate every coordinate
    // in the the octant starting from the center and moving outwards.
    // If the coordinate contains a wall, look at what rays belong in its set.
    // Since you are iterating outwards, you now know that every ray in that
    // set will be blocked in future iterations too. Thus, every ray in that
    // set can be considered "dead". Now, for future iterations, the "live"
    // rays of a coordinate will be the set belonging to the coordinate _minus_
    // the set of previously iterated "dead" rays. If a coordinate has zero "live"
    // rays, then it will not be visible.
    // In practice, this algorithm can be done efficiently using bitwise
    // operations. Each ray is given a unique bit flag, and each coordinate set
    // is represented by the bitwise OR of every passing ray's flag.
    // Calculating the LOS uses bitwise 'AND's to compare "dead" rays to "live"
    // rays.

    Note: I looked at DCSS’s FoV code quite a bit before doing this, but I couldn’t grok it. I do know it involves lots of precomputation though. Perhaps it’s the same as my version?

    Posted in Programming | Leave a comment

    Multi-step iterators using coroutines.

    I wanted to turn this multi-step iteration function into an iterator:

    template<typename Func>
    void iterate_edges(rect r, Func func = Func())
        coord current = r.c;
        for(;current.x < r.rx(); ++current.x)
        for(;current.y < r.ry(); ++current.y)
        for(;current.x > r.c.x; --current.x)
        for(;current.y > r.c.y + 1; --current.y)

    It turns out that it can be done very simply using boost::asio::coroutine:

    rect_edge_iterator& rect_edge_iterator::operator++()
            while(m_current.x < m_rect.rx())
                yield return *this;
            while(m_current.y < m_rect.ry())
                yield return *this;
            while(m_current.x > m_rect.c.x)
                yield return *this;
            while(m_current.y > m_rect.c.y + 1)
                yield return *this;
            --m_current.x; // change to an 'end' position
        return *this;

    I thought that was kind of neat. It’s not very often I find a practical use of stackless coroutines in C++.

    Note: This only works cleanly for forward iterators.

    Aside | Posted on by | Tagged | Leave a comment

    The words “Design” and “Art” are suffixes.

    Take a quick look at these definitions of “design” and “art”:

    DesignThe process before creation.
    ArtSomething intended to be aesthetically pleasing.

    Notice how generic and nondescript these words are. They’re intended to group several similar things, rather than describe a single action.

    Unfortunately, not everyone understands this. It’s quite often I see people use the words “design” and “art” to make sweeping generalizations about very specific branches of those topics. Even worse, the more vague one of these statements is, the more acclaim it receives!


    Please, if you want to discuss “art” and “design” objectively then qualify them with another word. Be explicit and spell out what you mean, such as “graphic design”, “web design”, “software design”, etc.

    If you want to linkbait then continue using the words ambiguously. For example, “10 tips for design”, “Why design is hard”, and “Video games considered art” are some great titles for your next blog post. You’ll come off as being deep and knowledgeable while saying nothing of meaning!

    Posted in Uncategorized | Leave a comment

    Stack-based template metaprogramming in C++ (11)

    Many months ago right after G++ 4.7 was released I discovered a nifty syntax for writing metaprograms using a stack language similiar to Forth and Factor. I started writing a small library around it, but got sidetracked and never finished it. The following is an explanation and code.

    Preliminary reading is a great place to learn stack language programming and can be used as a reference when implementing words.


    Here’s a short, contrived example of a traditional metafunction (such as from Boost MPL) that computes the number of elements in a container that satisfy a predicate, squares the result, and then adds 5:

    template<typename Predicate, typename List>
    struct foo
    : add<
              typename length<
                  typename filter<
                      Predicate, List

    Yeah, that sucks.

    Now compare it to a stack-based approach which was working code in my library:

    struct foo : word<filter, length, square, int_<5>, add> {};
    // alternate syntax:
    using foo = word<filter, length, square, int_<5>, add>;

    The difference is light and day! And if you make a mistake the errors are nicer too! I recall getting it down to a static assert message that contained the type of error and current stack which was infinitely more readable than the 10000 page garbage standard TMP spits out.

    So how is this implemented?

    The Stack

    First off, let’s start out with implementing a stack using TMP. Stack languages use the stack for storing data.

    The simplest way is to use variadic templates. A stack with ‘foo’ on top and ‘qux’ on the bottom can be represented as the following using variadic templates:

    stack<foo, bar, qux>

    In addition, we need to be able to push, pull, and get the top of the stack in order to use it.

    Here is the full implementation:

    struct stack;
    template<typename X, typename... Xs>
    struct stack<X, Xs...> {
      using type = stack;
      using top = X;
      template<typename T>
      using push = stack<T, X, Xs...>;
      using pop = stack<Xs...>;
    struct stack<> {
      using type = stack;
      template<typename T>
      using push = stack<T>;

    (Note that templates aliases are used for the push/pop/top functions. This is because I find it to be more convenient than templated structs with a type member.)

    Unfortunately, this stack has a minor problem notation-wise; languages like Factor position the top of the stack on the right while our notation positions it on the left. While this has no effect on the semantics of the program, it can be confusing to use. Luckily, the evaluator can use any type of TMP stack that supports push, pop, and top and so it is possible to write TMP stacks that notate the top of the stack on the right, although that is beyond the scope of this article.

    Let’s also write a few helper functions that operate on our stacks as they’ll be needed later:

    // pops N times from Stack
    template<typename Stack, int N>
    struct stack_drop {
      using type = typename stack_drop<typename Stack::pop, N - 1>::type;
    template<typename Stack>
    struct stack_drop<Stack, 0> {
      using type = Stack;
    // returns top after popping N times from Stack
    template<typename Stack, int N>
    struct stack_index {
      using type = typename stack_index<typename Stack::pop, N - 1>::type;
    template<typename Stack>
    struct stack_index<Stack, 0> {
      using type = typename Stack::top;

    Evaluation Groundwork

    Implementing evaluation is where things get tricky. First off, let’s talk about words.

    A ‘word’ in stack languages is essentially the functions that make up the program, and thus the input words compose the program code. In fact, the way words are implemented is just as metafunctions – they input a stack and output stack.

    Words will be represented abstractly – a type that contains an apply metafunction can be used as a word. Here’s the pattern you’ll see:

    struct {
      template<typename Stack>
      struct apply { // apply the stack to this word
        // ??

    One way of constructing words is through word composition. Given a list of words, composing them involves piping the output stack of the previous word as the input to the next, thus creating a new word which sends its input to the first word of the composition, and returns the result of the final word.

    Here’s the skeleton of the implementation:

    // word composition
    template<typename... Words>
    struct word {
      template<typename Stack>
      struct apply {
        // ??

    Word composition is crucial to creating stack programs. In fact, a stack program is just a composition of words!

    Because of this, our eval function will have a word and a stack as parameters, with another stack being returned. Here’s the skeleton:

    template<typename Word, typename Stack = stack<>>
    struct eval {
      using stack = // ??

    The stack defaults to an empty stack for convenience.

    Here’s how it’s used:

    eval<foo_word, stack<>>::stack

    Complete Evaluator

    Our groundwork is done, now onto the actual implementation.

    Let’s first talk about implementing word composition.

    A simple method of composing a list (which I did not use) involves realizing that words and composition form a monoid. The binary operator simply composes two words, and the identity word is just a word that returns its stack unchanged. Because of this, we can compose a list of words by folding the binary composition over the list.

    Accidentally, however, I took a different path of implementation which I now realize is more powerful. The basis requires redefining word application to include another parameter – the list of words that come after the word being evaluated. For instance, if you’re evaluating this list:

    [foo, bar, qux]

    Then the list [bar, qux] would be applied to foo, which then applies [qux] to bar, and then [] to qux.

    Those familiar with continuation passing style will see the similarity!

    Here’s the new skeleton:

    template<typename... Words>
    struct {
      template<typename Stack>
      struct apply {
        template<typename... Remaining>
        struct continuation {
          // ...

    I’m using the term continuation and remaining as those are the closest terms I can think of. Keep in mind that it’s referring to the remaining words of the composed list, not of the entire program. Thus, it forms what I’ll call (probably mistakenly) a scoped continuation.

    Let’s start out implementing the simplest word we can think of: the identity word. As said early, the identity word returns its stack unchanged. While this may seem useless, we can actually use it as a starting point to create other words.

    Here’s the code:

    struct id {
      template<typename Stack>
      struct apply {
        // This only catches the empty list - specialization will catch others
        struct continuation {
          using stack = Stack;
        template<typename Word, typename... Remaining>
        struct continuation<Word, Remaining...> {
          using stack =
            typename Word::template apply<Stack>
            ::template continuation<Remaining...>::stack;

    This has two behaviors:
    – The resulting stack with an empty continuation/remaining list is simply the stack passed as an argument
    – The resulting stack with a non-empty continuation/remaining list is the result of applying the stack to the first word of the continuation list and passing the tail of the list along.

    Notice that id is actually doing word composition – it’s composing itself with the head of the continuation/remaining list.

    With that done, here’s the implementation of the word composition struct:

    template<typename... Words>
    struct word {
      template<typename Stack>
      struct apply {
        template<typename... Remaining>
        struct continuation {
          using stack = typename id::apply<
            typename id::apply<Stack>::template continuation<Words...>::stack
          >::template continuation<Remaining...>::stack;

    So what’s going on here? Well, id is used to set a starting stack value and then pass it through a list of words. The inner use of id is constructing a single word out of an inner list, and the outer use is composing it with the remaining outer list. It’s quite hard to describe in English!

    Finally, here’s the eval metafunction:

    template<typename Word, typename Stack = stack<>>
    struct eval {
      using stack = typename id::apply<Stack>::template continuation<Word>::stack;

    Again, id is used to set a starting stack value and then pass it through a word.


    With the evaluator done, let’s create a few primitive words so it can be tested out.

    Here’s a primitive which pushes a type value onto the stack:

    template<typename T>
    struct push {
      template<typename Stack>
      struct apply {
        template<typename... Remaining>
        struct continuation {
          using stack = typename id::apply<
            typename Stack
            ::template push<T>
          >::template continuation<Remaining...>::stack;

    Here are some integer primitives:

    template<int I>
    struct int_wrapper {
      static int const value = I;
    struct add {
      template<typename Stack>
      struct apply {
        template<typename... Remaining>
        struct continuation {
          using stack = typename id::apply<
            typename stack_drop<Stack, 2>::stack
            ::template push<intc<
              stack_index<Stack, 0>::stack::value
              stack_index<Stack, 1>::stack::value
          >::template continuation<Remaining...>::stack;
    template<int N>
    struct int_ : push<int_wrapper<N>> {};

    And finally a primitive which swaps the top two values of the stack:

    struct swap {
      template<typename Stack>
      struct apply {
        template<typename... Remaining>
        struct continuation {
          using stack = typename id::apply<
            typename stack_drop<Stack, 2>::stack
            ::template push<typename Stack::top>
            ::template push<typename stack_index<Stack, 1>::stack>
          >::template continuation<Remaining...>::stack;

    Let’s test it out!

    #include <iostream>
    struct ten_and_five : word<int_<5>, int_<10>> {};
    struct foo : word<ten_and_five, swap, int_<20>, add> {};
    int main() {
      std::cout << eval<foo>::stack::top::value;
      foo x;

    Improved Primitive Definitions

    Once you have a sufficient number of primitives you can simply define other words using them instead of writing complex code involving apply and continuation. However, to get to that point involves lots of boilerplate.

    To improve things, you can define metafunctions to automate this process. For example:

    struct add  : func2_to_word<boost::mpl::add> {};
    struct minus :func2_to_word<boost::mpl::minus> {};

    I also recall using X-macros for more complicated words.

    These will be left as excessive to the reader.

    Interesting Things #1 – Word order

    An interesting thing about stack languages is that you can easily factor out words without changing the program.

    Here’s a modified version of the test program:

    struct five : int_<5> {};
    struct ten_and_five : word<five, int_<10>> {};
    struct foo : word<ten_and_five, word<swap, word<int_<20>>>, word<word<add>>> {};

    While the words are named differently their order is the same and thus the output is the same. This property will not hold if you do things with the “scoped continuation” as explaining below.

    Interesting Things #2 – Quotations

    Quotations are essentially a list of words which exist on the stack. They are analogous to first-class functions in traditional programming.

    Two primitives which you will want for dealing with quotations are call and if_then_else, which you can learn about in the Factor manual.

    Here’s how you could use them:

    template<typename... Words>
    struct quot : push<word<Words...>> {}
    struct foo : word< quot<a, b>, call > {};
    struct bar : word< quot<a>, quot<b>, bool_<true>, if_then_else > {};

    Interesting Things #3 – Continuations/Remaining

    Recall earlier I mentioned that implementing words using continuation/remaining lists is more powerful than the alternative. Well, part of the reason is because you can make use of that list when defining words. Program manipulation and introspection become possible!

    For instance, our push word could use the head of the list as it’s argument:

    struct foo : word<push2, bar> {};
    // would be same as:
    struct foo : word<push<bar>> {};

    Or we could add (basic) if/then/else syntax:

    struct foo : word<if, pred, then, quot<bar>, else, quot<qux>> {};
    // else is optional
    struct foo : word<if, pred, then, quot<bar>> {};
    // pred is optional
    struct foo : word<if, then, quot<bar>> {};

    Or we could push that list onto the stack as a quotation which could be used to model exceptions or goto:

    struct foo : word<word<push_rem, foo, bar>, call> {};

    These are just a few examples – there are tons of different things you can do!

    Full code

    My original library code is a mess but here’s the code used in this post:

    Posted in Programming | Tagged , , , | Leave a comment

    Why don’t we take all the naming problems and push them somewhere else?

    Naming class members can be a mess in C++.

    The simplest idea is to name them like any identifier:

    class foo {
        int size;

    That’s nice, but what if we want to create a size member function? Unfortunately, you can’t have a a member with the same name as a member function, and so we’ll have to change things up:

    class foo {
        int size(); // we could have also used get_size() and left the member alone
        int m_size;

    This isn’t too bad (although I much prefer the first version), but it allows subtle bugs to arise (as seen here )

    class foo {
        foo(int size) : m_size(m_size) {}
        int size(); // we could have also used get_size() and left the member alone
        int m_size;

    m_size isn’t being properly initialized in the constructor! Not a huge problem anymore as newer compilers should be able to catch this.

    So, is there a solution to this silliness? Well, sort-of:

    namespace actual_class {
      class foo {
          foo(int size) :size(size) {}
          int size;
    struct foo : actual_class::foo {
      typedef actual_class::foo get;
      foo(int size) : actual_class::foo(size) {} // in C++11 use constructor inheritance instead
      int size() { return get::size; }

    This takes advantage of derived classes shadowing their base classes. The possible constructor bug can be eliminated with shadowing, and you don’t have to add extra characters to get unique identifiers.

    Is it a good solution? Well, it’s probably better to stick to m_, but it’s worthwhile to know.

    Posted in Programming | Leave a comment