Multiplying Matrices, Fast and Slow

I recently read a very interesting blog post about exposing Intel SIMD intrinsics via a fork of the Scala compiler (scala-virtualized), which reports multiplicative improvements in throughput over HotSpot JIT compiled code. The academic paper (SIMD Intrinsics on Managed Language Runtimes), which has been accepted at CGO 2018, proposes a powerful alternative to the traditional […]

Autovectorised FMA in JDK10

Fused-multiply-add (FMA) allows floating point expressions of the form a * x + b to be evaluated in a single instruction, which is useful for numerical linear algebra. Despite the obvious appeal of FMA, JVM implementors are rather constrained when it comes to floating point arithmetic because Java programs are expected to be reproducible across […]

Spliterator Characteristics and Performance

The streams API has been around for a while now, and I’m a big fan of it. It allows for a clean declarative programming style, which permits various optimisations to occur, and keeps the pastafarians at bay. I also think the Stream is the perfect abstraction for data interchange across API boundaries. This is partly […]

Don’t Fear the Builder

Writing a nontrivial piece of code requires a programmer to make countless tiny decisions, and to produce good quality code quickly, many of these decisions must be made subconciously. But what happens if your instinct is mostly superstitious bullshit? I instinctively avoid allocating objects when I don’t have to, so would not voluntarily resort to […]