Data

UUIDs and Compressibility

Universally unique identifiers, or UUIDs, are often used for database primary keys in scenarios where coordination of persistence is either impossible or impractical. UUIDs offer very good probabilistic guarantees of collision avoidance, at the cost of 128 bits per key. 128 bits for a key is quite problematic in key scans and joins: with appropriately […]

Read More
Benchmarking
Java

Limiting Factors in a Dot Product Calculation

The dot product is a simple calculation which reduces two vectors to the sum of their element-wise products. The calculation has a variety of applications and is used heavily in neural networks, linear regression and in search. What are the constraints on its computational performance? The combination of the computational simplicity and its streaming nature […]

Read More
Java

Vectorised Algorithms in Java

There has been a Cambrian explosion of JVM data technologies in recent years. It’s all very exciting, but is the JVM really competitive with C in this area? I would argue that there is a reason Apache Arrow is polyglot, and it’s not just interoperability with Python. To pick on one project impressive enough to […]

Read More
Benchmarking

Stages

The conventional wisdom is that the stages of a task should be pipelined, so you don’t need to wait for the completion of one stage before the next is started. It surprises me that it seems you can sometimes do better when performing each stage of a pipeline in a short batch. Useful optimisation opportunities […]

Read More
Data Structures

Data Driven Logic

I really don’t like reading or writing blocks of if-else statements. They make my eyes glaze over. Rumour has it that processors don’t like executing them either, though that’s less true now than it once was. There are two problems with these blocks of statements, and neither one of them is performance: They are hard […]

Read More