Does Inlined Mean Streamlined? Part 1: Escape Analysis

There’s a lot of folklore about the importance of inlining in the JVM. Undoubtedly, inlining can improve performance by removing the overhead of function calls, but, more importantly, various optimisations are disabled or reduced in scope when it can’t happen. However, I think the importance of inlining is often overstated, especially considering the trade-off […]

Observing Memory Level Parallelism with JMH

Quite some time ago I observed an effect where breaking a cache-inefficient shuffle algorithm into short stages could improve throughput: when cache misses were likely, an improvement could be seen in throughput as a function of stage length. The implementations benchmarked were as follows, where op is either precomputed (a closure over an array of […]

Mixing Vector and Scalar Instructions

I saw an interesting tweet from one of the developers of Pilosa this week, reporting performance improvements from unrolling a bitwise reduction in Go. This surprised me because Go seems to enjoy a reputation for being a high-performance language, and it certainly has great support for concurrency, but compilers should unroll loops as standard […]

Limiting Factors in a Dot Product Calculation

The dot product is a simple calculation which reduces two vectors to the sum of their element-wise products. The calculation has a variety of applications and is used heavily in neural networks, linear regression and search. What are the constraints on its computational performance? The combination of the computational simplicity and its streaming nature […]