Iterating Over a Bitset in Java

How fast can you iterate over a bitset? Daniel Lemire published a benchmark recently in support of a strategy using the number of trailing zeroes to skip over empty bits. I have used the same technique in Java several times in my hobby project SplitMap and this is something I am keen to optimise. I […]

Faster Floating Point Reductions

At the moment, I am working on a hobby project called SplitMap, which aims to evaluate aggregations over complex boolean expressions as fast as possible using the same high level constructs of the streams API. It’s already capable of performing logic that takes vanilla parallel streams 20ms in under 300μs, but I think sub 100μs […]

Sum of Squares

Streams and lambdas, especially the limited support offered for primitive types, are a fantastic addition to the Java language. They’re not supposed to be fast, but how do these features compare to a good old for loop? For a simple calculation amenable to instruction level parallelism, I compare modern and traditional implementations and observe the […]

Multiplying Matrices, Fast and Slow

I recently read a very interesting blog post about exposing Intel SIMD intrinsics via a fork of the Scala compiler (scala-virtualized), which reports multiplicative improvements in throughput over HotSpot JIT compiled code. The academic paper (SIMD Intrinsics on Managed Language Runtimes), which has been accepted at CGO 2018, proposes a powerful alternative to the traditional […]