Category: Data Structures

Parallel Bitmap Aggregation

A bitmap index represents predicates over records as sets consisting of the integer identities of each record satisfying each predicate. This representation is actually a few decades out of date, and systems like Pilosa use much more sophisticated data structures, and Sybase had even more on offer back in the 90s. But the chances are,

Iterating Over a Bitset in Java

How fast can you iterate over a bitset? Daniel Lemire published a benchmark recently in support of a strategy using the number of trailing zeroes to skip over empty bits. I have used the same technique in Java several times in my hobby project SplitMap and this is something I am keen to optimise. I

Building RoaringBitmaps from Streams

RoaringBitmap is a fast compressed bitset format. In the Java implementation of Roaring, it was until recently preferential to build a bitset in one go from sorted data; there were performance penalties of varying magnitude for incremental or unordered insertions. In a recent pull request, I wanted to improve incremental monotonic insertion so I could

Blocked Signatures

The interesting thing about BitFunnel, the search architecture used by Bing, is its unorthodoxy; it revisits many ideas ignored by contemporary search technologies. The paper’s references go back to the 80s, when one million records was considered an enormous database. Something about this appeals to me, that there might be long forgotten ideas waiting to

Bit-Sliced Signatures and Bloom Filters

While the inverted index is a familiar indexing structure for many with a casual interest in information retrieval, the concept of a document signature may not be. The idea seems to have a long history, having been popular until it lost favour in the late 90s, owing to file sizes and performance empirically inferior to