avx2

Benchmarking

Mixing Vector and Scalar Instructions

I saw an interesting tweet from one of the developers of Pilosa this week, reporting performance improvements from unrolling a bitwise reduction in Go. This surprised me because Go seems to enjoy a reputation for being a high performance language, and it certainly has great support for concurrency, but compilers should unroll loops as standard […]

Read More
Java

Base64 Encoding

Base64 encoding is the standard mechanism of converting binary data to text, and is used extensively in web technologies. You will encounter it wherever binary data such as authentication tokens or compressed BLOBs meet JSON. There are two common Base64 encoding formats: a standard format and an URL-safe format which replaces ‘/’ with ‘_’ and […]

Read More
Benchmarking
Java

Limiting Factors in a Dot Product Calculation

The dot product is a simple calculation which reduces two vectors to the sum of their element-wise products. The calculation has a variety of applications and is used heavily in neural networks, linear regression and in search. What are the constraints on its computational performance? The combination of the computational simplicity and its streaming nature […]

Read More
Benchmarking

Matrix Multiplication Revisited

In a recent post, I took a look at matrix multiplication in pure Java, to see if it can go faster than reported in SIMD Intrinsics on Managed Language Runtimes. I found faster implementations than the paper’s benchmarks implied was possible. Nevertheless, I found that there were some limitations in Hotspot’s autovectoriser that I didn’t […]

Read More
Benchmarking
Java

The Much Aligned Garbage Collector

A power of two is often a good choice for the size of an array. Sometimes you might see this being exploited to replace an integer division with a bitwise intersection. You can see why with a toy benchmark of a bloom filter, which deliberately folds in a representative cost of a hash function and […]

Read More