Author Archives: Richard Startin

Building RoaringBitmaps from Streams

RoaringBitmap is a fast compressed bitset format. In the Java implementation of Roaring, it was until recently preferential to build a bitset in one go from sorted data; there were performance penalties of varying magnitude for incremental or unordered insertions. In a recent pull request, I wanted to improve incremental monotonic insertion so I could

Matrix Multiplication Revisited

In a recent post, I took a look at matrix multiplication in pure Java, to see if it can go faster than reported in SIMD Intrinsics on Managed Language Runtimes. I found faster implementations than the paper’s benchmarks implied was possible. Nevertheless, I found that there were some limitations in Hotspot’s autovectoriser that I didn’t

Sum of Squares

Streams and lambdas, especially the limited support offered for primitive types, are a fantastic addition to the Java language. They’re not supposed to be fast, but how do these features compare to a good old for loop? For a simple calculation amenable to instruction level parallelism, I compare modern and traditional implementations and observe the

The Much Aligned Garbage Collector

A power of two is often a good choice for the size of an array. Sometimes you might see this being exploited to replace an integer division with a bitwise intersection. You can see why with a toy benchmark of a bloom filter, which deliberately folds in a representative cost of a hash function and

Multiplying Matrices, Fast and Slow

I recently read a very interesting blog post about exposing Intel SIMD intrinsics via a fork of the Scala compiler (scala-virtualized), which reports multiplicative improvements in throughput over HotSpot JIT compiled code. The academic paper (SIMD Intrinsics on Managed Language Runtimes), which has been accepted at CGO 2018, proposes a powerful alternative to the traditional