Author: Richard Startin

Data Structures
Java

A Quick Look at RoaringBitmap

This article is an introduction to the data structures found in the RoaringBitmap library, which I have been making extensive use of recently. I wrote some time ago about the basic idea of bitmap indices, which are used in various databases and search engines, with the caveat that no traditional implementation is optimal across all data scenarios (in terms of […]

Read More
Data Structures

How a Bitmap Index Works

Bitmap indices are used in various data technologies for efficient query processing. At a high level, a bitmap index can be thought of as a physical materialisation of a set of predicates over a data set, is naturally columnar and particularly good for multidimensional boolean query processing. PostgreSQL materialises a bitmap index on the fly from query […]

Read More
Java

Advanced AOP with Guice Type Listeners

There are cross-cutting concerns, or aspects, in any non-trivial program. These blocks of code tend to be repetitive, unrelated to business logic, and don’t lend themselves to being factored out. If you have ever added the same statement at the start of several methods, you have encountered an aspect. For instance, audit, instrumentation, authentication, authorisation could all be […]

Read More
Java

Lifecycle Management with Guice Provision Listeners

Typically in a Java web application you will have services with resources which need lifecycle management – at the very least closing gracefully at shutdown. If you’d use a sledgehammer to crack a walnut, there’s Spring, which will do this for you with init and destroy methods. I’ll explain why I dislike Spring in another post. You […]

Read More
Big Data

Tuning Spark Back Pressure by Simulation

Spark back pressure, which can be enabled by setting spark.streaming.backpressure.enabled=true, will dynamically resize batches so as to avoid queue build up. It is implemented using a Proportional Integral Derivative (PID) algorithm. This algorithm has some interesting properties, including the lack of guarantee of a stable fixed point. This can manifest itself not just in transient overshoot, but in […]

Read More