Java is very strict about floating point arithmetic. There’s even a keyword, strictfp, which allows you to make it stricter, ensuring you’ll get a potentially less precise but identical result wherever you run your program. There’s actually a JEP to make this the only behaviour. JLS 15.18.2 states clearly that floating point addition is not

Continue reading Floating Point: Manual Unrolling or Autovectorisation?

# Category: Benchmarking

## Iterating Over a Bitset in Java

How fast can you iterate over a bitset? Daniel Lemire published a benchmark recently in support of a strategy using the number of trailing zeroes to skip over empty bits. I have used the same technique in Java several times in my hobby project SplitMap and this is something I am keen to optimise. I

Continue reading Iterating Over a Bitset in Java

## Faster Floating Point Reductions

At the moment, I am working on a hobby project called SplitMap, which aims to evaluate aggregations over complex boolean expressions as fast as possible using the same high level constructs of the streams API. It’s already capable of performing logic that takes vanilla parallel streams 20ms in under 300μs, but I think sub 100μs

Continue reading Faster Floating Point Reductions

## Matrix Multiplication Revisited

In a recent post, I took a look at matrix multiplication in pure Java, to see if it can go faster than reported in SIMD Intrinsics on Managed Language Runtimes. I found faster implementations than the paper’s benchmarks implied was possible. Nevertheless, I found that there were some limitations in Hotspot’s autovectoriser that I didn’t

Continue reading Matrix Multiplication Revisited

## Sum of Squares

Streams and lambdas, especially the limited support offered for primitive types, are a fantastic addition to the Java language. They’re not supposed to be fast, but how do these features compare to a good old for loop? For a simple calculation amenable to instruction level parallelism, I compare modern and traditional implementations and observe the

Continue reading Sum of Squares