Eliminating Bounds Checks with Intrinsics in Java 9

Open source projects are starting to capitalise on fast new intrinsics available in Java 9. For instance, Lucene is already experimenting with Arrays.mismatch, which I wrote about a few months ago, and Objects.checkIndex, which I didn’t know existed. A quick google search returned nothing on this method except the javadoc and a Baeldung blog post, which reiterates the javadoc. The intent behind these methods is obvious, and their signatures are simple enough to use blind, but why bother? What do you get, and when do you get it? This post analyses the range and index check methods to illustrate why it would be worthwhile for a project like Lucene, known for its performance, potentially supporting two versions of the same code. Edit: Robert Muir from Lucene points out in the comments here that his primary motivation for using these methods is actually consistency and correctness, rather than performance.

Objects.checkIndex, Arrays, and RangeCheckElimination

Suppose you have a double[] and want to perform a summation on the prefix of the array, expressed by a method parameter. You need to check that the parameter actually specifies a prefix of the array, but there also need to be bounds checks accessing the array. You might write code like this:


    @Benchmark
    public double RangePrefixSum_ManualCheck(BoundsCheckState state) {
        double[] data = state.data;
        int limit = state.index;
        if (limit < 0 || limit >= data.length) {
            throw new ArrayIndexOutOfBoundsException(limit + " >= " + data.length);
        }
        double result = 0D;
        for (int i = 0; i <= limit; ++i) {
            result += data[i];
        }
        return result;
    }

Or you could use Objects.checkIndex which takes an index and a length and will throw an ArrayIndexOutOfBoundsException if the index is not less than the length. The idea behind using it is that the JIT compiler can fuse this bounds check with any checks after the call, resulting in their elimination. You can see that this intrinsic functions as a “hint” rather than a handle to an assembly snippet in the source code. Using the new API you could write:


    @Benchmark
    public double PrefixRangeSum_CheckIndex(BoundsCheckState state) {
        double[] data = state.data;
        int limit = Objects.checkIndex(state.index, data.length);
        double result = 0D;
        for (int i = 0; i <= limit; ++i) {
            result += data[i];
        }
        return result;
    }

But does it make any difference? Or will both loops profit from RangeCheckElimination anyway, rendering any difference moot? As a point of comparison, I benchmark against the same code without any kind of precondition check so we can isolate the cost of performing the actual check.


    @Benchmark
    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public double RangePrefixSum_NoCheck(BoundsCheckState state) {
        double[] data = state.data;
        int limit = state.index;
        double result = 0D;
        for (int i = 0; i <= limit; ++i) {
            result += data[i];
        }
        return result;
    }

Running the benchmark, it turns out there is virtually no difference for shorter arrays, but there may be a small benefit when the arrays get longer. The assembly emitted is identical for each loop in any case. In short, bounds checks are probably being optimised away anyway; it is RangeCheckElimination, not Objects.checkIndex, taking effect here.

Benchmark Mode Threads Samples Score Score Error (99.9%) Unit Param: size
PrefixRangeSum_CheckIndex thrpt 1 10 8040.641837 371.264782 ops/ms 100
PrefixRangeSum_CheckIndex thrpt 1 10 716.093965 35.701464 ops/ms 1000
PrefixRangeSum_CheckIndex thrpt 1 10 73.321275 2.693158 ops/ms 10000
RangePrefixSum_ManualCheck thrpt 1 10 10433.584914 429.149725 ops/ms 100
RangePrefixSum_ManualCheck thrpt 1 10 676.266935 43.792141 ops/ms 1000
RangePrefixSum_ManualCheck thrpt 1 10 69.937342 5.215848 ops/ms 10000
RangePrefixSum_NoCheck thrpt 1 10 8754.536246 548.945401 ops/ms 100
RangePrefixSum_NoCheck thrpt 1 10 687.851103 37.487361 ops/ms 1000
RangePrefixSum_NoCheck thrpt 1 10 69.145729 2.783780 ops/ms 10000

For reference, here is the assembly used by the “naive” code:

com/openkappa/simd/boundscheck/CheckIndex.RangePrefixSum_ManualCheck(Lcom/openkappa/simd/boundscheck/BoundsCheckState;)D  [0x00000293bb8abe20, 0x00000293bb8abf38]  280 bytes
Argument 0 is unknown.RIP: 0x293bb8abe20 Code size: 0x00000118
[Entry Point]
[Verified Entry Point]
[Constants]
  # {method} {0x00000293d0b32228} 'RangePrefixSum_ManualCheck' '(Lcom/openkappa/simd/boundscheck/BoundsCheckState;)D' in 'com/openkappa/simd/boundscheck/CheckIndex'
  0x00000293bb8abe20: int3                      ;...cc

  0x00000293bb8abe21: nop     word ptr [rax+rax+0h]  ;...66
                                                ;...66
                                                ;...66
                                                ;...0f
                                                ;...1f
                                                ;...84
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;...00

  0x00000293bb8abe2c: nop                       ;...66
                                                ;...66
                                                ;...66
                                                ;...90

  0x00000293bb8abe30: mov     dword ptr [rsp+0ffffffffffff9000h],eax
                                                ;...89
                                                ;...84
                                                ;...24
                                                ;...00
                                                ;...90
                                                ;...ff
                                                ;...ff

  0x00000293bb8abe37: push    rbp               ;...55

  0x00000293bb8abe38: sub     rsp,50h           ;...48
                                                ;...83
                                                ;...ec
                                                ;...50

  0x00000293bb8abe3c: mov     r14,qword ptr [rdx+20h]  ;...4c
                                                ;...8b
                                                ;...72
                                                ;...20

  0x00000293bb8abe40: mov     ebx,dword ptr [rdx+18h]  ;...8b
                                                ;...5a
                                                ;...18

  0x00000293bb8abe43: mov     r13d,dword ptr [rdx]  ;...44
                                                ;...8b
                                                ;...2a

  0x00000293bb8abe46: vmovsd  xmm0,qword ptr [rdx+8h]  ;...c5
                                                ;...fb
                                                ;...10
                                                ;...42
                                                ;...08

  0x00000293bb8abe4b: vmovq   rbp,xmm0          ;...c4
                                                ;...e1
                                                ;...f9
                                                ;...7e
                                                ;...c5

  0x00000293bb8abe50: mov     rcx,rdx           ;...48
                                                ;...8b
                                                ;...ca

  0x00000293bb8abe53: mov     r10,5b517c90h     ;...49
                                                ;...ba
                                                ;...90
                                                ;...7c
                                                ;...51
                                                ;...5b
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;...00

  0x00000293bb8abe5d: call indirect r10         ;...41
                                                ;...ff
                                                ;...d2

  0x00000293bb8abe60: test    r14,r14           ;...4d
                                                ;...85
                                                ;...f6

  0x00000293bb8abe63: je      293bb8abecdh      ;...74
                                                ;...68

  0x00000293bb8abe65: mov     r11d,dword ptr [r14+8h]  ;...45
                                                ;...8b
                                                ;...5e
                                                ;...08

  0x00000293bb8abe69: cmp     r11d,0f80000b9h   ;...41
                                                ;...81
                                                ;...fb
                                                ;...b9
                                                ;...00
                                                ;...00
                                                ;...f8
                                                ;   {metadata({type array double})}
  0x00000293bb8abe70: jne     293bb8abef1h      ;...0f
                                                ;...85
                                                ;...7b
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;*iload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::RangePrefixSum_ManualCheck@42 (line 22)

  0x00000293bb8abe76: cmp     r13d,ebx          ;...44
                                                ;...3b
                                                ;...eb

  0x00000293bb8abe79: jnle    293bb8abec6h      ;...7f
                                                ;...4b
                                                ;*if_icmpgt {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::RangePrefixSum_ManualCheck@45 (line 22)

  0x00000293bb8abe7b: mov     r11d,dword ptr [r14+0ch]
                                                ;...45
                                                ;...8b
                                                ;...5e
                                                ;...0c
                                                ;*daload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::RangePrefixSum_ManualCheck@53 (line 23)
                                                ; implicit exception: dispatches to 0x00000293bb8abf0d
  0x00000293bb8abe7f: cmp     r13d,r11d         ;...45
                                                ;...3b
                                                ;...eb

  0x00000293bb8abe82: jnb     293bb8abed7h      ;...73
                                                ;...53

  0x00000293bb8abe84: vmovq   xmm0,rbp          ;...c4
                                                ;...e1
                                                ;...f9
                                                ;...6e
                                                ;...c5

  0x00000293bb8abe89: vaddsd  xmm0,xmm0,mmword ptr [r14+r13*8+10h]
                                                ;...c4
                                                ;...81
                                                ;...7b
                                                ;...58
                                                ;...44
                                                ;...ee
                                                ;...10
                                                ;*dadd {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::RangePrefixSum_ManualCheck@54 (line 23)

  0x00000293bb8abe90: inc     r13d              ;...41
                                                ;...ff
                                                ;...c5
                                                ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::RangePrefixSum_ManualCheck@57 (line 22)

  0x00000293bb8abe93: cmp     r13d,ebx          ;...44
                                                ;...3b
                                                ;...eb

  0x00000293bb8abe96: jnle    293bb8abebah      ;...7f
                                                ;...22
                                                ;*if_icmpgt {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::RangePrefixSum_ManualCheck@45 (line 22)

  0x00000293bb8abe98: nop     dword ptr [rax+rax+0h]  ;...0f
                                                ;...1f
                                                ;...84
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;*daload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::RangePrefixSum_ManualCheck@53 (line 23)

  0x00000293bb8abea0: cmp     r13d,r11d         ;...45
                                                ;...3b
                                                ;...eb

  0x00000293bb8abea3: jnb     293bb8abed2h      ;...73
                                                ;...2d

  0x00000293bb8abea5: vaddsd  xmm0,xmm0,mmword ptr [r14+r13*8+10h]
                                                ;...c4
                                                ;...81
                                                ;...7b
                                                ;...58
                                                ;...44
                                                ;...ee
                                                ;...10
                                                ;*dadd {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::RangePrefixSum_ManualCheck@54 (line 23)

  0x00000293bb8abeac: inc     r13d              ;...41
                                                ;...ff
                                                ;...c5
                                                ; ImmutableOopMap{r14=Oop }
                                                ;*goto {reexecute=1 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::RangePrefixSum_ManualCheck@60 (line 22)

  0x00000293bb8abeaf: test    dword ptr [293a7e50000h],eax
                                                ;...85
                                                ;...05
                                                ;...4b
                                                ;...41
                                                ;...5a
                                                ;...ec
                                                ;*goto {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::RangePrefixSum_ManualCheck@60 (line 22)
                                                ;   {poll}
  0x00000293bb8abeb5: cmp     r13d,ebx          ;...44
                                                ;...3b
                                                ;...eb

  0x00000293bb8abeb8: jle     293bb8abea0h      ;...7e
                                                ;...e6
                                                ;*iload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::RangePrefixSum_ManualCheck@42 (line 22)

  0x00000293bb8abeba: add     rsp,50h           ;...48
                                                ;...83
                                                ;...c4
                                                ;...50

  0x00000293bb8abebe: pop     rbp               ;...5d

  0x00000293bb8abebf: test    dword ptr [293a7e50000h],eax
                                                ;...85
                                                ;...05
                                                ;...3b
                                                ;...41
                                                ;...5a
                                                ;...ec
                                                ;   {poll_return}
  0x00000293bb8abec5: ret                       ;...c3

And using Objects.checkIndex we see essentially the same thing.

com/openkappa/simd/boundscheck/CheckIndex.PrefixRangeSum_CheckIndex(Lcom/openkappa/simd/boundscheck/BoundsCheckState;)D  [0x0000028c1a51d4a0, 0x0000028c1a51d5f8]  344 bytes
Argument 0 is unknown.RIP: 0x28c1a51d4a0 Code size: 0x00000158
[Entry Point]
[Constants]
  # {method} {0x0000028c2f7a1dd0} 'PrefixRangeSum_CheckIndex' '(Lcom/openkappa/simd/boundscheck/BoundsCheckState;)D' in 'com/openkappa/simd/boundscheck/CheckIndex'
  # this:     rdx:rdx   = 'com/openkappa/simd/boundscheck/CheckIndex'
  # parm0:    r8:r8     = 'com/openkappa/simd/boundscheck/BoundsCheckState'
  #           [sp+0x30]  (sp of caller)
  0x0000028c1a51d4a0: mov     r10d,dword ptr [rdx+8h]  ;...44
                                                ;...8b
                                                ;...52
                                                ;...08

  0x0000028c1a51d4a4: shl     r10,3h            ;...49
                                                ;...c1
                                                ;...e2
                                                ;...03

  0x0000028c1a51d4a8: cmp     rax,r10           ;...49
                                                ;...3b
                                                ;...c2

  0x0000028c1a51d4ab: jne     28c12a7c200h      ;...0f
                                                ;...85
                                                ;...4f
                                                ;...ed
                                                ;...55
                                                ;...f8
                                                ;   {runtime_call ic_miss_stub}
  0x0000028c1a51d4b1: nop                       ;...66
                                                ;...66
                                                ;...90

  0x0000028c1a51d4b4: nop     dword ptr [rax+rax+0h]  ;...0f
                                                ;...1f
                                                ;...84
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;...00

  0x0000028c1a51d4bc: nop                       ;...66
                                                ;...66
                                                ;...66
                                                ;...90

[Verified Entry Point]
  0x0000028c1a51d4c0: mov     dword ptr [rsp+0ffffffffffff9000h],eax
                                                ;...89
                                                ;...84
                                                ;...24
                                                ;...00
                                                ;...90
                                                ;...ff
                                                ;...ff

  0x0000028c1a51d4c7: push    rbp               ;...55

  0x0000028c1a51d4c8: sub     rsp,20h           ;...48
                                                ;...83
                                                ;...ec
                                                ;...20
                                                ;*synchronization entry
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::PrefixRangeSum_CheckIndex@-1 (line 43)

  0x0000028c1a51d4cc: mov     r10d,dword ptr [r8+14h]  ;...45
                                                ;...8b
                                                ;...50
                                                ;...14
                                                ;*getfield data {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::PrefixRangeSum_CheckIndex@1 (line 43)
                                                ; implicit exception: dispatches to 0x0000028c1a51d5bd
  0x0000028c1a51d4d0: mov     r11d,dword ptr [r12+r10*8+0ch]
                                                ;...47
                                                ;...8b
                                                ;...5c
                                                ;...d4
                                                ;...0c
                                                ;*arraylength {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::PrefixRangeSum_CheckIndex@10 (line 44)
                                                ; implicit exception: dispatches to 0x0000028c1a51d5c9
  0x0000028c1a51d4d5: mov     ebp,dword ptr [r8+10h]  ;...41
                                                ;...8b
                                                ;...68
                                                ;...10
                                                ;*getfield index {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::PrefixRangeSum_CheckIndex@6 (line 44)

  0x0000028c1a51d4d9: cmp     ebp,r11d          ;...41
                                                ;...3b
                                                ;...eb

  0x0000028c1a51d4dc: jnb     28c1a51d587h      ;...0f
                                                ;...83
                                                ;...a5
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;*invokestatic checkIndex {reexecute=0 rethrow=0 return_oop=0}
                                                ; - java.util.Objects::checkIndex@3 (line 372)
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::PrefixRangeSum_CheckIndex@11 (line 44)

  0x0000028c1a51d4e2: test    r11d,r11d         ;...45
                                                ;...85
                                                ;...db

  0x0000028c1a51d4e5: jbe     28c1a51d59dh      ;...0f
                                                ;...86
                                                ;...b2
                                                ;...00
                                                ;...00
                                                ;...00

  0x0000028c1a51d4eb: movsxd  r11,r11d          ;...4d
                                                ;...63
                                                ;...db

  0x0000028c1a51d4ee: mov     r8d,ebp           ;...44
                                                ;...8b
                                                ;...c5

  0x0000028c1a51d4f1: inc     r8d               ;...41
                                                ;...ff
                                                ;...c0

  0x0000028c1a51d4f4: movsxd  r9,r8d            ;...4d
                                                ;...63
                                                ;...c8

  0x0000028c1a51d4f7: dec     r9                ;...49
                                                ;...ff
                                                ;...c9

  0x0000028c1a51d4fa: cmp     r9,r11            ;...4d
                                                ;...3b
                                                ;...cb

  0x0000028c1a51d4fd: jnb     28c1a51d59dh      ;...0f
                                                ;...83
                                                ;...9a
                                                ;...00
                                                ;...00
                                                ;...00

  0x0000028c1a51d503: cmp     ebp,7ffffffeh     ;...81
                                                ;...fd
                                                ;...fe
                                                ;...ff
                                                ;...ff
                                                ;...7f

  0x0000028c1a51d509: jnle    28c1a51d5adh      ;...0f
                                                ;...8f
                                                ;...9e
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;*dload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::PrefixRangeSum_CheckIndex@27 (line 47)

  0x0000028c1a51d50f: add     ebp,0fffffffeh    ;...83
                                                ;...c5
                                                ;...fe

  0x0000028c1a51d512: lea     r11,[r12+r10*8]   ;...4f
                                                ;...8d
                                                ;...1c
                                                ;...d4

  0x0000028c1a51d516: vxorpd  xmm0,xmm0,xmm0    ;...c5
                                                ;...f9
                                                ;...57
                                                ;...c0

  0x0000028c1a51d51a: vaddsd  xmm0,xmm0,mmword ptr [r12+r10*8+10h]
                                                ;...c4
                                                ;...81
                                                ;...7b
                                                ;...58
                                                ;...44
                                                ;...d4
                                                ;...10
                                                ;*dadd {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::PrefixRangeSum_CheckIndex@33 (line 47)

  0x0000028c1a51d521: mov     r9d,80000000h     ;...41
                                                ;...b9
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;...80

  0x0000028c1a51d527: cmp     r8d,ebp           ;...44
                                                ;...3b
                                                ;...c5

  0x0000028c1a51d52a: cmovl   ebp,r9d           ;...41
                                                ;...0f
                                                ;...4c
                                                ;...e9

  0x0000028c1a51d52e: mov     r9d,1h            ;...41
                                                ;...b9
                                                ;...01
                                                ;...00
                                                ;...00
                                                ;...00

  0x0000028c1a51d534: cmp     ebp,1h            ;...83
                                                ;...fd
                                                ;...01

  0x0000028c1a51d537: jle     28c1a51d565h      ;...7e
                                                ;...2c

  0x0000028c1a51d539: nop     dword ptr [rax+0h]  ;...0f
                                                ;...1f
                                                ;...80
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;...00
                                                ;*dload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::PrefixRangeSum_CheckIndex@27 (line 47)

  0x0000028c1a51d540: vaddsd  xmm0,xmm0,mmword ptr [r11+r9*8+10h]
                                                ;...c4
                                                ;...81
                                                ;...7b
                                                ;...58
                                                ;...44
                                                ;...cb
                                                ;...10

  0x0000028c1a51d547: vaddsd  xmm0,xmm0,mmword ptr [r11+r9*8+18h]
                                                ;...c4
                                                ;...81
                                                ;...7b
                                                ;...58
                                                ;...44
                                                ;...cb
                                                ;...18

  0x0000028c1a51d54e: vaddsd  xmm0,xmm0,mmword ptr [r11+r9*8+20h]
                                                ;...c4
                                                ;...81
                                                ;...7b
                                                ;...58
                                                ;...44
                                                ;...cb
                                                ;...20

  0x0000028c1a51d555: vaddsd  xmm0,xmm0,mmword ptr [r11+r9*8+28h]
                                                ;...c4
                                                ;...81
                                                ;...7b
                                                ;...58
                                                ;...44
                                                ;...cb
                                                ;...28
                                                ;*dadd {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::PrefixRangeSum_CheckIndex@33 (line 47)

  0x0000028c1a51d55c: add     r9d,4h            ;...41
                                                ;...83
                                                ;...c1
                                                ;...04
                                                ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::PrefixRangeSum_CheckIndex@36 (line 46)

  0x0000028c1a51d560: cmp     r9d,ebp           ;...44
                                                ;...3b
                                                ;...cd

  0x0000028c1a51d563: jl      28c1a51d540h      ;...7c
                                                ;...db
                                                ;*if_icmpgt {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::PrefixRangeSum_CheckIndex@24 (line 46)

  0x0000028c1a51d565: cmp     r9d,r8d           ;...45
                                                ;...3b
                                                ;...c8

  0x0000028c1a51d568: jnl     28c1a51d57bh      ;...7d
                                                ;...11

  0x0000028c1a51d56a: nop                       ;...66
                                                ;...90
                                                ;*dload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::PrefixRangeSum_CheckIndex@27 (line 47)

  0x0000028c1a51d56c: vaddsd  xmm0,xmm0,mmword ptr [r11+r9*8+10h]
                                                ;...c4
                                                ;...81
                                                ;...7b
                                                ;...58
                                                ;...44
                                                ;...cb
                                                ;...10
                                                ;*dadd {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::PrefixRangeSum_CheckIndex@33 (line 47)

  0x0000028c1a51d573: inc     r9d               ;...41
                                                ;...ff
                                                ;...c1
                                                ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::PrefixRangeSum_CheckIndex@36 (line 46)

  0x0000028c1a51d576: cmp     r9d,r8d           ;...45
                                                ;...3b
                                                ;...c8

  0x0000028c1a51d579: jl      28c1a51d56ch      ;...7c
                                                ;...f1
                                                ;*if_icmpgt {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::PrefixRangeSum_CheckIndex@24 (line 46)

  0x0000028c1a51d57b: add     rsp,20h           ;...48
                                                ;...83
                                                ;...c4
                                                ;...20

  0x0000028c1a51d57f: pop     rbp               ;...5d

  0x0000028c1a51d580: test    dword ptr [28c08310000h],eax
                                                ;...85
                                                ;...05
                                                ;...7a
                                                ;...2a
                                                ;...df
                                                ;...ed
                                                ;   {poll_return}
  0x0000028c1a51d586: ret                       ;...c3

  0x0000028c1a51d587: mov     edx,0ffffffe4h    ;...ba
                                                ;...e4
                                                ;...ff
                                                ;...ff
                                                ;...ff

  0x0000028c1a51d58c: mov     dword ptr [rsp],r10d  ;...44
                                                ;...89
                                                ;...14
                                                ;...24

  0x0000028c1a51d590: mov     dword ptr [rsp+4h],r11d  ;...44
                                                ;...89
                                                ;...5c
                                                ;...24
                                                ;...04

  0x0000028c1a51d595: nop                       ;...66
                                                ;...90

  0x0000028c1a51d597: call    28c12a7de80h      ;...e8
                                                ;...e4
                                                ;...08
                                                ;...56
                                                ;...f8
                                                ; ImmutableOopMap{[0]=NarrowOop }
                                                ;*invokestatic checkIndex {reexecute=0 rethrow=0 return_oop=0}
                                                ; - java.util.Objects::checkIndex@3 (line 372)
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::PrefixRangeSum_CheckIndex@11 (line 44)
                                                ;   {runtime_call UncommonTrapBlob}
  0x0000028c1a51d59c: int3                      ;...cc
                                                ;*invokestatic checkIndex {reexecute=0 rethrow=0 return_oop=0}
                                                ; - java.util.Objects::checkIndex@3 (line 372)
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::PrefixRangeSum_CheckIndex@11 (line 44)

  0x0000028c1a51d59d: mov     edx,0ffffff86h    ;...ba
                                                ;...86
                                                ;...ff
                                                ;...ff
                                                ;...ff

  0x0000028c1a51d5a2: mov     dword ptr [rsp],r10d  ;...44
                                                ;...89
                                                ;...14
                                                ;...24

  0x0000028c1a51d5a6: nop                       ;...90

  0x0000028c1a51d5a7: call    28c12a7de80h      ;...e8
                                                ;...d4
                                                ;...08
                                                ;...56
                                                ;...f8
                                                ; ImmutableOopMap{[0]=NarrowOop }
                                                ;*dload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::PrefixRangeSum_CheckIndex@27 (line 47)
                                                ;   {runtime_call UncommonTrapBlob}
  0x0000028c1a51d5ac: int3                      ;...cc

  0x0000028c1a51d5ad: mov     edx,0ffffff7eh    ;...ba
                                                ;...7e
                                                ;...ff
                                                ;...ff
                                                ;...ff

  0x0000028c1a51d5b2: mov     dword ptr [rsp],r10d  ;...44
                                                ;...89
                                                ;...14
                                                ;...24

  0x0000028c1a51d5b6: nop                       ;...90

  0x0000028c1a51d5b7: call    28c12a7de80h      ;...e8
                                                ;...c4
                                                ;...08
                                                ;...56
                                                ;...f8
                                                ; ImmutableOopMap{[0]=NarrowOop }
                                                ;*dload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::PrefixRangeSum_CheckIndex@27 (line 47)
                                                ;   {runtime_call UncommonTrapBlob}
  0x0000028c1a51d5bc: int3                      ;...cc
                                                ;*dload {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::PrefixRangeSum_CheckIndex@27 (line 47)

  0x0000028c1a51d5bd: mov     edx,0fffffff6h    ;...ba
                                                ;...f6
                                                ;...ff
                                                ;...ff
                                                ;...ff

  0x0000028c1a51d5c2: nop                       ;...90

  0x0000028c1a51d5c3: call    28c12a7de80h      ;...e8
                                                ;...b8
                                                ;...08
                                                ;...56
                                                ;...f8
                                                ; ImmutableOopMap{}
                                                ;*getfield data {reexecute=1 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::PrefixRangeSum_CheckIndex@1 (line 43)
                                                ;   {runtime_call UncommonTrapBlob}
  0x0000028c1a51d5c8: int3                      ;...cc
                                                ;*getfield data {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::PrefixRangeSum_CheckIndex@1 (line 43)

  0x0000028c1a51d5c9: mov     edx,0fffffff6h    ;...ba
                                                ;...f6
                                                ;...ff
                                                ;...ff
                                                ;...ff

  0x0000028c1a51d5ce: nop                       ;...90

  0x0000028c1a51d5cf: call    28c12a7de80h      ;...e8
                                                ;...ac
                                                ;...08
                                                ;...56
                                                ;...f8
                                                ; ImmutableOopMap{}
                                                ;*arraylength {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::PrefixRangeSum_CheckIndex@10 (line 44)
                                                ;   {runtime_call UncommonTrapBlob}
  0x0000028c1a51d5d4: int3                      ;...cc
                                                ;*arraylength {reexecute=0 rethrow=0 return_oop=0}
                                                ; - com.openkappa.simd.boundscheck.CheckIndex::PrefixRangeSum_CheckIndex@10 (line 44)

  0x0000028c1a51d5d5: hlt                       ;...f4

  0x0000028c1a51d5d6: hlt                       ;...f4

  0x0000028c1a51d5d7: hlt                       ;...f4

  0x0000028c1a51d5d8: hlt                       ;...f4

  0x0000028c1a51d5d9: hlt                       ;...f4

  0x0000028c1a51d5da: hlt                       ;...f4

  0x0000028c1a51d5db: hlt                       ;...f4

  0x0000028c1a51d5dc: hlt                       ;...f4

  0x0000028c1a51d5dd: hlt                       ;...f4

  0x0000028c1a51d5de: hlt                       ;...f4

  0x0000028c1a51d5df: hlt                       ;...f4

Point Access Range Check Elimination

If there’s no gain rolling up an array, because the optimisations applied to loops over arrays are so good, what if you need to provide random access to some kind of collection? You need to perform checks. We can put it to the test by comparing these two pieces of code:


    @Benchmark
    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public double PointAccess_ManualCheck(BoundsCheckState state) {
        double[] data = state.data;
        int index = state.index;
        if (index < 0 || index >= data.length) {
            throw new ArrayIndexOutOfBoundsException(index + " >= " + data.length);
        }
        return data[index];
    }

    @Benchmark
    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public double PointAccess_CheckIndex(BoundsCheckState state) {
        double[] data = state.data;
        int index = Objects.checkIndex(state.index, data.length);
        return data[index];
    }

Again, disappointingly, there’s virtually no difference, and this time it’s in the naive approach’s favour.

Benchmark Mode Threads Samples Score Score Error (99.9%) Unit Param: size
PointAccess_CheckIndex thrpt 1 10 79259.682293 10737.451258 ops/ms 100
PointAccess_CheckIndex thrpt 1 10 59719.861827 15039.268226 ops/ms 1000
PointAccess_CheckIndex thrpt 1 10 25541.582169 18540.720843 ops/ms 10000
PointAccess_ManualCheck thrpt 1 10 81280.935102 7639.676934 ops/ms 100
PointAccess_ManualCheck thrpt 1 10 61888.048023 19194.480590 ops/ms 1000
PointAccess_ManualCheck thrpt 1 10 27379.437571 17218.454466 ops/ms 10000

The nice thing about not using the intrinsic is you can control the error message, and while I’m hesitant to make negative proclamations (perhaps my benchmark code is flawed), I am in no rush to use this method.

4 comments on “Eliminating Bounds Checks with Intrinsics in Java 9

  • Thanks for taking the time to run this benchmark and post results! I think you benchmarked a case where things already “work” without using this method, see the discussion/motivation here: https://bugs.openjdk.java.net/browse/JDK-8042997

    In the case of Lucene, the motivation behind using the method is instead correctness, relates to comments about “difficult to get right” and so on. This recent bug motivated me to want to clean this up: https://issues.apache.org/jira/browse/LUCENE-7946

    None of the stuff here in Lucene is really a serious hotspot, so having consistent checks is important to prevent confusing problems like that (e.g. where an incomplete check causes chaos later). In this case it probably wasted time for a new contributor trying to add Bengali language support.

    Same goes with the motivation behind using the new Arrays.mismatch/compare/equals methods, lucene has a lot of inconsistent stuff in its arrays code, its really problematic. In that case the intrinsics might really be faster because of vectorization, but thats secondary.

    Reply
    • Sure, I was just motivated to find when/if there’s a performance improvement from using the method, and there’s an absence of evidence for that in this post. There was some interest on twitter, triggered by your ticket, about the rationale for the change and bounds checking elimination. Like I say, arrays are optimised very well already, so it was perhaps not the best place to look for performance gains. On the other hand, the benefits of Arrays.mismatch are much clearer cut.

      It’s useful to clarify, especially given some of the interest on twitter, that what you hope to gain from these new intrinsics is consistency.

      Reply
      • If you are more interested in performance, maybe you will find an improvement if using Unsafe or similar like the JDK ticket example?

        Yes for me the goal is definitely consistency. Only one of the Objects.check* methods is even intrinsified, the other two are not: but they are useful!

        Arrays.mismatch/compareUnsigned/ brings similar benefits from my perspective, they all have good checks, where the current lucene methods do not.

        Reply
        • Yes, I think you’re right. It’s useful to know exactly what this feature made it in to the API for. I’m unlikely to write code using Unsafe directly any time soon so am happy to be ignorant about the performance trade off, but might investigate byte buffers and update the post.

          Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>