Update 05/09/2017: I have recently discovered that this phenomenon was first described by Alexey Shipilev on an OpenJDK ticket in 2014. As Alexey notes, this is due to the strength reduction of a multiplication by 31 to a shift and a subtraction, which creates a data dependency preventing an unroll. You can see the assembly emitted, confirming this data dependency, in the comments of this post.
One of the things to keep in mind with Java is that the best performance advice for one version may not apply to the next. The JVM has an optimiser which works by detecting intent in byte-code; it does a better job when the programmer is skilled enough to make that intent clear. There have been times when the optimiser has done a bad job, either through bugs or feature gaps, and compensatory idioms emerge. The value of these idioms degrades over time as the optimiser improves; a stroke of ingenuity in one version can become ritualistic nonsense in the next. It’s important not to be that guy who fears adding strings together because it was costly a decade ago, but does it always get better, even with low hanging fruit?
I reproduced the results of an extremely astute optimisation presented in 2015 by Daniel Lemire. I was hoping to see that an improved optimiser in Java 9, having observed several cases of automatic vectorisation, would render this optimisation null.
Hash Codes for Arrays
I encourage you to read the original blog post because it is informative, but for the sake of coherence I will summarise the key point here. Arrays.hashCode implements the following hash code:
public static int hashCode(int[] a) {
if (a == null)
return 0;
int result = 1;
for (int element : a)
result = 31 * result + element;
return result;
}
This results in a good hash code, but a scalar internal representation of this code has a problem: a data dependency on the multiplication, which is slower than moving data into registers. A significant and reproducible speed up can be observed when unrolling the loop, which enforces a prefetch of 128 bits of the array into registers before doing the slower multiplications:
public static int hashCode(int[] a) {
if (a == null)
return 0;
int result = 1;
int i = 0;
for (; i + 3 < a.length; i += 4) {
result = 31 * 31 * 31 * 31 * result
+ 31 * 31 * 31 * a[i]
+ 31 * 31 * a[i + 1]
+ 31 * a[i + 2]
+ a[i + 3];
}
for (; i < a.length; i++) {
result = 31 * result + a[i];
}
return result;
}
The improvement in performance from this simple change can be confirmed with Java 8 by running a simple benchmark.
| Benchmark | Mode | Threads | Samples | Score | Score Error (99.9%) | Unit | Param: size |
|---|---|---|---|---|---|---|---|
| BuiltIn | thrpt | 1 | 10 | 9.537736 | 0.382617 | ops/us | 100 |
| BuiltIn | thrpt | 1 | 10 | 0.804620 | 0.103037 | ops/us | 1000 |
| BuiltIn | thrpt | 1 | 10 | 0.092297 | 0.003947 | ops/us | 10000 |
| Unrolled | thrpt | 1 | 10 | 14.974398 | 0.628522 | ops/us | 100 |
| Unrolled | thrpt | 1 | 10 | 1.514986 | 0.035759 | ops/us | 1000 |
| Unrolled | thrpt | 1 | 10 | 0.137408 | 0.010200 | ops/us | 10000 |
The performance improvement is so obvious, and the change so easy to make, that one wonders why JVM vendors didn’t make the change themselves.
Java 9: Universally Better Automatic Optimisations?
As the comments on the blog post suggest, this is a prime candidate for vectorisation. Auto-vectorisation is a thing in Java 9. Using intrinsics or code clean enough to express intent clearly, you can really expect to see good usage of SIMD in Java 9. I was really expecting the situation to reverse in Java 9; but it doesn’t.
| Benchmark | Mode | Threads | Samples | Score | Score Error (99.9%) | Unit | Param: size |
|---|---|---|---|---|---|---|---|
| BuiltIn | thrpt | 1 | 10 | 9.822568 | 0.381087 | ops/us | 100 |
| BuiltIn | thrpt | 1 | 10 | 0.949273 | 0.024021 | ops/us | 1000 |
| BuiltIn | thrpt | 1 | 10 | 0.093171 | 0.004502 | ops/us | 10000 |
| Unrolled | thrpt | 1 | 10 | 13.762617 | 0.440135 | ops/us | 100 |
| Unrolled | thrpt | 1 | 10 | 1.501106 | 0.094897 | ops/us | 1000 |
| Unrolled | thrpt | 1 | 10 | 0.139963 | 0.011487 | ops/us | 10000 |
This is a still a smart optimisation two years later, but it offends my sensibilities in the same way hints do in SQL – a layer of abstraction has been punctured. I will have to try again in Java 10.
Have you had a look at how Java compiles this? It is possible (as hinted in my original blog post) that it does not use a multiplication, but rather a shift followed by a subtraction.
The String hashing is not the easiest code to autovectorize. You are receiving 8-bit values, that you must somehow upsample to 32-bit values and then multiply. However, what you are looking at is hashing arrays of 32-bit integers to a 32-bit integer. Something much easier to vectorize. Still, I can see how it might fall off from targetted autovectorization routines.
I’ve looked at this and am going to write about it, and will post the emitted assembly, soon in a post about removing false dependencies.
Here is the assembly code for each of these routines for reference:
Arrays.hashCode:
java/util/Arrays.hashCode([I)I [0x0000011b510aa4a0, 0x0000011b510aa698] 504 bytes Argument 0 is unknown.RIP: 0x11b510aa4a0 Code size: 0x000001f8 [Entry Point] [Verified Entry Point] [Constants] # {method} {0x0000011b65cb3448} 'hashCode' '([I)I' in 'java/util/Arrays' 0x0000011b510aa4a0: int3 ;...cc 0x0000011b510aa4a1: nop word ptr [rax+rax+0h] ;...66 ;...66 ;...66 ;...0f ;...1f ;...84 ;...00 ;...00 ;...00 ;...00 ;...00 0x0000011b510aa4ac: nop ;...66 ;...66 ;...66 ;...90 0x0000011b510aa4b0: mov dword ptr [rsp+0ffffffffffff9000h],eax ;...89 ;...84 ;...24 ;...00 ;...90 ;...ff ;...ff 0x0000011b510aa4b7: push rbp ;...55 0x0000011b510aa4b8: sub rsp,50h ;...48 ;...83 ;...ec ;...50 0x0000011b510aa4bc: mov r14,qword ptr [rdx+18h] ;...4c ;...8b ;...72 ;...18 0x0000011b510aa4c0: mov ebx,dword ptr [rdx+20h] ;...8b ;...5a ;...20 0x0000011b510aa4c3: mov r13d,dword ptr [rdx+8h] ;...44 ;...8b ;...6a ;...08 0x0000011b510aa4c7: mov ebp,dword ptr [rdx+10h] ;...8b ;...6a ;...10 0x0000011b510aa4ca: mov rcx,rdx ;...48 ;...8b ;...ca 0x0000011b510aa4cd: mov r10,51da8d20h ;...49 ;...ba ;...20 ;...8d ;...da ;...51 ;...00 ;...00 ;...00 ;...00 0x0000011b510aa4d7: call indirect r10 ;...41 ;...ff ;...d2 0x0000011b510aa4da: test r14,r14 ;...4d ;...85 ;...f6 0x0000011b510aa4dd: je 11b510aa619h ;...0f ;...84 ;...36 ;...01 ;...00 ;...00 0x0000011b510aa4e3: mov r10d,dword ptr [r14+8h] ;...45 ;...8b ;...56 ;...08 0x0000011b510aa4e7: cmp r10d,0f800016dh ;...41 ;...81 ;...fa ;...6d ;...01 ;...00 ;...f8 ; {metadata({type array int})} 0x0000011b510aa4ee: jne 11b510aa63dh ;...0f ;...85 ;...49 ;...01 ;...00 ;...00 ;*iload {reexecute=0 rethrow=0 return_oop=0} ; - java.util.Arrays::hashCode@16 (line 4487) 0x0000011b510aa4f4: cmp r13d,ebp ;...44 ;...3b ;...ed 0x0000011b510aa4f7: jnl 11b510aa60bh ;...0f ;...8d ;...0e ;...01 ;...00 ;...00 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0} ; - java.util.Arrays::hashCode@19 (line 4487) 0x0000011b510aa4fd: mov r8d,r13d ;...45 ;...8b ;...c5 0x0000011b510aa500: inc r8d ;...41 ;...ff ;...c0 0x0000011b510aa503: xor r11d,r11d ;...45 ;...33 ;...db 0x0000011b510aa506: cmp r8d,r11d ;...45 ;...3b ;...c3 0x0000011b510aa509: cmovl r8d,r11d ;...45 ;...0f ;...4c ;...c3 0x0000011b510aa50d: cmp r8d,ebp ;...44 ;...3b ;...c5 0x0000011b510aa510: cmovnle r8d,ebp ;...44 ;...0f ;...4f ;...c5 0x0000011b510aa514: mov ecx,dword ptr [r14+0ch] ;...41 ;...8b ;...4e ;...0c ;*iaload {reexecute=0 rethrow=0 return_oop=0} ; - java.util.Arrays::hashCode@25 (line 4487) ; implicit exception: dispatches to 0x0000011b510aa659 0x0000011b510aa518: cmp r13d,ecx ;...44 ;...3b ;...e9 0x0000011b510aa51b: jnb 11b510aa621h ;...0f ;...83 ;...00 ;...01 ;...00 ;...00 0x0000011b510aa521: mov edi,ebx ;...8b ;...fb 0x0000011b510aa523: shl edi,5h ;...c1 ;...e7 ;...05 0x0000011b510aa526: sub edi,ebx ;...2b ;...fb 0x0000011b510aa528: add edi,dword ptr [r14+r13*4+10h] ;...43 ;...03 ;...7c ;...ae ;...10 ;*iadd {reexecute=0 rethrow=0 return_oop=0} ; - java.util.Arrays::hashCode@34 (line 4488) 0x0000011b510aa52d: inc r13d ;...41 ;...ff ;...c5 ;*iinc {reexecute=0 rethrow=0 return_oop=0} ; - java.util.Arrays::hashCode@36 (line 4487) 0x0000011b510aa530: cmp r13d,r8d ;...45 ;...3b ;...e8 0x0000011b510aa533: jnl 11b510aa539h ;...7d ;...04 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0} ; - java.util.Arrays::hashCode@19 (line 4487) 0x0000011b510aa535: mov ebx,edi ;...8b ;...df 0x0000011b510aa537: jmp 11b510aa514h ;...eb ;...db 0x0000011b510aa539: cmp ebp,ecx ;...3b ;...e9 0x0000011b510aa53b: mov r10d,ebp ;...44 ;...8b ;...d5 0x0000011b510aa53e: cmovnle r10d,ecx ;...44 ;...0f ;...4f ;...d1 0x0000011b510aa542: mov r9d,r10d ;...45 ;...8b ;...ca 0x0000011b510aa545: add r9d,0fffffff9h ;...41 ;...83 ;...c1 ;...f9 0x0000011b510aa549: mov r11d,80000000h ;...41 ;...bb ;...00 ;...00 ;...00 ;...80 0x0000011b510aa54f: cmp r10d,r9d ;...45 ;...3b ;...d1 0x0000011b510aa552: cmovl r9d,r11d ;...45 ;...0f ;...4c ;...cb 0x0000011b510aa556: cmp r13d,r9d ;...45 ;...3b ;...e9 0x0000011b510aa559: jnl 11b510aa5e7h ;...0f ;...8d ;...88 ;...00 ;...00 ;...00 0x0000011b510aa55f: nop ;...90 ;*iaload {reexecute=0 rethrow=0 return_oop=0} ; - java.util.Arrays::hashCode@25 (line 4487) 0x0000011b510aa560: movsxd rbx,r13d ;...49 ;...63 ;...dd 0x0000011b510aa563: mov r11d,edi ;...44 ;...8b ;...df 0x0000011b510aa566: shl r11d,5h ;...41 ;...c1 ;...e3 ;...05 0x0000011b510aa56a: sub r11d,edi ;...44 ;...2b ;...df 0x0000011b510aa56d: add r11d,dword ptr [r14+r13*4+10h] ;...47 ;...03 ;...5c ;...ae ;...10 0x0000011b510aa572: mov r8d,r11d ;...45 ;...8b ;...c3 0x0000011b510aa575: shl r8d,5h ;...41 ;...c1 ;...e0 ;...05 0x0000011b510aa579: sub r8d,r11d ;...45 ;...2b ;...c3 0x0000011b510aa57c: add r8d,dword ptr [r14+rbx*4+14h] ;...45 ;...03 ;...44 ;...9e ;...14 0x0000011b510aa581: mov r10d,r8d ;...45 ;...8b ;...d0 0x0000011b510aa584: shl r10d,5h ;...41 ;...c1 ;...e2 ;...05 0x0000011b510aa588: sub r10d,r8d ;...45 ;...2b ;...d0 0x0000011b510aa58b: add r10d,dword ptr [r14+rbx*4+18h] ;...45 ;...03 ;...54 ;...9e ;...18 0x0000011b510aa590: mov r11d,r10d ;...45 ;...8b ;...da 0x0000011b510aa593: shl r11d,5h ;...41 ;...c1 ;...e3 ;...05 0x0000011b510aa597: sub r11d,r10d ;...45 ;...2b ;...da 0x0000011b510aa59a: add r11d,dword ptr [r14+rbx*4+1ch] ;...45 ;...03 ;...5c ;...9e ;...1c 0x0000011b510aa59f: mov r8d,r11d ;...45 ;...8b ;...c3 0x0000011b510aa5a2: shl r8d,5h ;...41 ;...c1 ;...e0 ;...05 0x0000011b510aa5a6: sub r8d,r11d ;...45 ;...2b ;...c3 0x0000011b510aa5a9: add r8d,dword ptr [r14+rbx*4+20h] ;...45 ;...03 ;...44 ;...9e ;...20 0x0000011b510aa5ae: mov r11d,r8d ;...45 ;...8b ;...d8 0x0000011b510aa5b1: shl r11d,5h ;...41 ;...c1 ;...e3 ;...05 0x0000011b510aa5b5: sub r11d,r8d ;...45 ;...2b ;...d8 0x0000011b510aa5b8: add r11d,dword ptr [r14+rbx*4+24h] ;...45 ;...03 ;...5c ;...9e ;...24 0x0000011b510aa5bd: mov r8d,r11d ;...45 ;...8b ;...c3 0x0000011b510aa5c0: shl r8d,5h ;...41 ;...c1 ;...e0 ;...05 0x0000011b510aa5c4: sub r8d,r11d ;...45 ;...2b ;...c3 0x0000011b510aa5c7: add r8d,dword ptr [r14+rbx*4+28h] ;...45 ;...03 ;...44 ;...9e ;...28 0x0000011b510aa5cc: mov edi,r8d ;...41 ;...8b ;...f8 0x0000011b510aa5cf: shl edi,5h ;...c1 ;...e7 ;...05 0x0000011b510aa5d2: sub edi,r8d ;...41 ;...2b ;...f8 0x0000011b510aa5d5: add edi,dword ptr [r14+rbx*4+2ch] ;...41 ;...03 ;...7c ;...9e ;...2c ;*iadd {reexecute=0 rethrow=0 return_oop=0} ; - java.util.Arrays::hashCode@34 (line 4488) 0x0000011b510aa5da: add r13d,8h ;...41 ;...83 ;...c5 ;...08 ;*iinc {reexecute=0 rethrow=0 return_oop=0} ; - java.util.Arrays::hashCode@36 (line 4487) 0x0000011b510aa5de: cmp r13d,r9d ;...45 ;...3b ;...e9 0x0000011b510aa5e1: jl 11b510aa560h ;...0f ;...8c ;...79 ;...ff ;...ff ;...ff ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0} ; - java.util.Arrays::hashCode@19 (line 4487) 0x0000011b510aa5e7: cmp r13d,ebp ;...44 ;...3b ;...ed 0x0000011b510aa5ea: jnl 11b510aa609h ;...7d ;...1d ;*iaload {reexecute=0 rethrow=0 return_oop=0} ; - java.util.Arrays::hashCode@25 (line 4487) 0x0000011b510aa5ec: cmp r13d,ecx ;...44 ;...3b ;...e9 0x0000011b510aa5ef: jnb 11b510aa623h ;...73 ;...32 0x0000011b510aa5f1: mov ebx,edi ;...8b ;...df 0x0000011b510aa5f3: shl ebx,5h ;...c1 ;...e3 ;...05 0x0000011b510aa5f6: sub ebx,edi ;...2b ;...df 0x0000011b510aa5f8: add ebx,dword ptr [r14+r13*4+10h] ;...43 ;...03 ;...5c ;...ae ;...10 ;*iadd {reexecute=0 rethrow=0 return_oop=0} ; - java.util.Arrays::hashCode@34 (line 4488) 0x0000011b510aa5fd: inc r13d ;...41 ;...ff ;...c5 ;*iinc {reexecute=0 rethrow=0 return_oop=0} ; - java.util.Arrays::hashCode@36 (line 4487) 0x0000011b510aa600: cmp r13d,ebp ;...44 ;...3b ;...ed 0x0000011b510aa603: jnl 11b510aa60bh ;...7d ;...06 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0} ; - java.util.Arrays::hashCode@19 (line 4487) 0x0000011b510aa605: mov edi,ebx ;...8b ;...fb 0x0000011b510aa607: jmp 11b510aa5ech ;...eb ;...e3 0x0000011b510aa609: mov ebx,edi ;...8b ;...df ;*iload {reexecute=0 rethrow=0 return_oop=0} ; - java.util.Arrays::hashCode@16 (line 4487) 0x0000011b510aa60b: mov eax,ebx ;...8b ;...c3 0x0000011b510aa60d: add rsp,50h ;...48 ;...83 ;...c4 ;...50 0x0000011b510aa611: pop rbp ;...5d 0x0000011b510aa612: test dword ptr [11b3d560000h],eax ;...85 ;...05 ;...e8 ;...59 ;...4b ;...ec ; {poll_return} 0x0000011b510aa618: ret ;...c3Your code:
com/openkappa/simd/hashcode/HashCode.Unrolled(Lcom/openkappa/simd/state/HashCodeData;)I [0x0000026c2ec7e160, 0x0000026c2ec7e478] 792 bytes Argument 0 is unknown.RIP: 0x26c2ec7e160 Code size: 0x00000318 [Entry Point] [Constants] # {method} {0x0000026c4b3108a0} 'Unrolled' '(Lcom/openkappa/simd/state/HashCodeData;)I' in 'com/openkappa/simd/hashcode/HashCode' # this: rdx:rdx = 'com/openkappa/simd/hashcode/HashCode' # parm0: r8:r8 = 'com/openkappa/simd/state/HashCodeData' # [sp+0x50] (sp of caller) 0x0000026c2ec7e160: mov r10d,dword ptr [rdx+8h] ;...44 ;...8b ;...52 ;...08 0x0000026c2ec7e164: shl r10,3h ;...49 ;...c1 ;...e2 ;...03 0x0000026c2ec7e168: cmp r10,rax ;...4c ;...3b ;...d0 0x0000026c2ec7e16b: jne 26c2e6fc200h ;...0f ;...85 ;...8f ;...e0 ;...a7 ;...ff ; {runtime_call ic_miss_stub} 0x0000026c2ec7e171: nop word ptr [rax+rax+0h] ;...66 ;...66 ;...66 ;...0f ;...1f ;...84 ;...00 ;...00 ;...00 ;...00 ;...00 0x0000026c2ec7e17c: nop ;...66 ;...66 ;...66 ;...90 [Verified Entry Point] 0x0000026c2ec7e180: mov dword ptr [rsp+0ffffffffffff9000h],eax ;...89 ;...84 ;...24 ;...00 ;...90 ;...ff ;...ff 0x0000026c2ec7e187: push rbp ;...55 0x0000026c2ec7e188: sub rsp,40h ;...48 ;...83 ;...ec ;...40 0x0000026c2ec7e18c: mov rax,26c4b3357a8h ;...48 ;...b8 ;...a8 ;...57 ;...33 ;...4b ;...6c ;...02 ;...00 ;...00 0x0000026c2ec7e196: mov esi,dword ptr [rax+10h] ;...8b ;...70 ;...10 0x0000026c2ec7e199: add esi,8h ;...83 ;...c6 ;...08 0x0000026c2ec7e19c: mov dword ptr [rax+10h],esi ;...89 ;...70 ;...10 0x0000026c2ec7e19f: and esi,3ff8h ;...81 ;...e6 ;...f8 ;...3f ;...00 ;...00 0x0000026c2ec7e1a5: cmp esi,0h ;...83 ;...fe ;...00 0x0000026c2ec7e1a8: je 26c2ec7e2f8h ;...0f ;...84 ;...4a ;...01 ;...00 ;...00 ;*aload_1 {reexecute=0 rethrow=0 return_oop=0} ; - com.openkappa.simd.hashcode.HashCode::Unrolled@0 (line 17) 0x0000026c2ec7e1ae: mov eax,dword ptr [r8+14h] ;...41 ;...8b ;...40 ;...14 ; implicit exception: dispatches to 0x0000026c2ec7e319 0x0000026c2ec7e1b2: shl rax,3h ;...48 ;...c1 ;...e0 ;...03 ;*getfield data {reexecute=0 rethrow=0 return_oop=0} ; - com.openkappa.simd.hashcode.HashCode::Unrolled@1 (line 17) 0x0000026c2ec7e1b6: cmp rax,0h ;...48 ;...83 ;...f8 ;...00 0x0000026c2ec7e1ba: je 26c2ec7e2e7h ;...0f ;...84 ;...27 ;...01 ;...00 ;...00 ;*ifnonnull {reexecute=0 rethrow=0 return_oop=0} ; - com.openkappa.simd.hashcode.HashCode::Unrolled@6 (line 18) 0x0000026c2ec7e1c0: mov esi,0h ;...be ;...00 ;...00 ;...00 ;...00 0x0000026c2ec7e1c5: mov edi,1h ;...bf ;...01 ;...00 ;...00 ;...00 0x0000026c2ec7e1ca: jmp 26c2ec7e27ah ;...e9 ;...ab ;...00 ;...00 ;...00 ;*iload {reexecute=0 rethrow=0 return_oop=0} ; - com.openkappa.simd.hashcode.HashCode::Unrolled@16 (line 23) 0x0000026c2ec7e1cf: nop ;...90 0x0000026c2ec7e1d0: movsxd rdx,esi ;...48 ;...63 ;...d6 0x0000026c2ec7e1d3: cmp esi,dword ptr [rax+0ch] ;...3b ;...70 ;...0c 0x0000026c2ec7e1d6: jnb 26c2ec7e31eh ;...0f ;...83 ;...42 ;...01 ;...00 ;...00 0x0000026c2ec7e1dc: mov edx,dword ptr [rax+rdx*4+10h] ;...8b ;...54 ;...90 ;...10 ;*iaload {reexecute=0 rethrow=0 return_oop=0} ; - com.openkappa.simd.hashcode.HashCode::Unrolled@35 (line 24) 0x0000026c2ec7e1e0: mov rcx,rsi ;...48 ;...8b ;...ce 0x0000026c2ec7e1e3: inc ecx ;...ff ;...c1 0x0000026c2ec7e1e5: movsxd r8,ecx ;...4c ;...63 ;...c1 0x0000026c2ec7e1e8: cmp ecx,dword ptr [rax+0ch] ;...3b ;...48 ;...0c 0x0000026c2ec7e1eb: jnb 26c2ec7e327h ;...0f ;...83 ;...36 ;...01 ;...00 ;...00 0x0000026c2ec7e1f1: mov ecx,dword ptr [rax+r8*4+10h] ;...42 ;...8b ;...4c ;...80 ;...10 ;*iaload {reexecute=0 rethrow=0 return_oop=0} ; - com.openkappa.simd.hashcode.HashCode::Unrolled@46 (line 24) 0x0000026c2ec7e1f6: mov r8,rsi ;...4c ;...8b ;...c6 0x0000026c2ec7e1f9: add r8d,2h ;...41 ;...83 ;...c0 ;...02 0x0000026c2ec7e1fd: movsxd r9,r8d ;...4d ;...63 ;...c8 0x0000026c2ec7e200: cmp r8d,dword ptr [rax+0ch] ;...44 ;...3b ;...40 ;...0c 0x0000026c2ec7e204: jnb 26c2ec7e330h ;...0f ;...83 ;...26 ;...01 ;...00 ;...00 0x0000026c2ec7e20a: mov r8d,dword ptr [rax+r9*4+10h] ;...46 ;...8b ;...44 ;...88 ;...10 ;*iaload {reexecute=0 rethrow=0 return_oop=0} ; - com.openkappa.simd.hashcode.HashCode::Unrolled@56 (line 24) 0x0000026c2ec7e20f: movsxd r9,ebx ;...4c ;...63 ;...cb 0x0000026c2ec7e212: cmp ebx,dword ptr [rax+0ch] ;...3b ;...58 ;...0c 0x0000026c2ec7e215: jnb 26c2ec7e339h ;...0f ;...83 ;...1e ;...01 ;...00 ;...00 0x0000026c2ec7e21b: mov ebx,dword ptr [rax+r9*4+10h] ;...42 ;...8b ;...5c ;...88 ;...10 ;*iaload {reexecute=0 rethrow=0 return_oop=0} ; - com.openkappa.simd.hashcode.HashCode::Unrolled@64 (line 24) 0x0000026c2ec7e220: mov r9d,0e1781h ;...41 ;...b9 ;...81 ;...17 ;...0e ;...00 0x0000026c2ec7e226: imul edi,r9d ;...41 ;...0f ;...af ;...f9 0x0000026c2ec7e22a: mov r9d,745fh ;...41 ;...b9 ;...5f ;...74 ;...00 ;...00 0x0000026c2ec7e230: imul edx,r9d ;...41 ;...0f ;...af ;...d1 0x0000026c2ec7e234: add edi,edx ;...03 ;...fa 0x0000026c2ec7e236: mov edx,3c1h ;...ba ;...c1 ;...03 ;...00 ;...00 0x0000026c2ec7e23b: imul ecx,edx ;...0f ;...af ;...ca 0x0000026c2ec7e23e: add edi,ecx ;...03 ;...f9 0x0000026c2ec7e240: mov rdx,r8 ;...49 ;...8b ;...d0 0x0000026c2ec7e243: shl r8d,5h ;...41 ;...c1 ;...e0 ;...05 0x0000026c2ec7e247: sub r8d,edx ;...44 ;...2b ;...c2 0x0000026c2ec7e24a: add edi,r8d ;...41 ;...03 ;...f8 0x0000026c2ec7e24d: add edi,ebx ;...03 ;...fb 0x0000026c2ec7e24f: add esi,4h ;...83 ;...c6 ;...04 0x0000026c2ec7e252: mov rbx,26c4b3357a8h ;...48 ;...bb ;...a8 ;...57 ;...33 ;...4b ;...6c ;...02 ;...00 ;...00 0x0000026c2ec7e25c: mov edx,dword ptr [rbx+14h] ;...8b ;...53 ;...14 0x0000026c2ec7e25f: add edx,8h ;...83 ;...c2 ;...08 0x0000026c2ec7e262: mov dword ptr [rbx+14h],edx ;...89 ;...53 ;...14 0x0000026c2ec7e265: and edx,1fff8h ;...81 ;...e2 ;...f8 ;...ff ;...01 ;...00 0x0000026c2ec7e26b: cmp edx,0h ;...83 ;...fa ;...00 0x0000026c2ec7e26e: je 26c2ec7e342h ;...0f ;...84 ;...ce ;...00 ;...00 ;...00 ; ImmutableOopMap{rax=Oop } ;*goto {reexecute=1 rethrow=0 return_oop=0} ; - com.openkappa.simd.hashcode.HashCode::Unrolled@70 (line 23) 0x0000026c2ec7e274: test dword ptr [26c22620000h],eax ;...85 ;...05 ;...86 ;...1d ;...9a ;...f3 ;*goto {reexecute=0 rethrow=0 return_oop=0} ; - com.openkappa.simd.hashcode.HashCode::Unrolled@70 (line 23) ; {poll} 0x0000026c2ec7e27a: mov rbx,rsi ;...48 ;...8b ;...de 0x0000026c2ec7e27d: add ebx,3h ;...83 ;...c3 ;...03 0x0000026c2ec7e280: mov edx,dword ptr [rax+0ch] ;...8b ;...50 ;...0c ;*arraylength {reexecute=0 rethrow=0 return_oop=0} ; - com.openkappa.simd.hashcode.HashCode::Unrolled@21 (line 23) ; implicit exception: dispatches to 0x0000026c2ec7e363 0x0000026c2ec7e283: cmp ebx,edx ;...3b ;...da 0x0000026c2ec7e285: jnl 26c2ec7e2d4h ;...0f ;...8d ;...49 ;...00 ;...00 ;...00 0x0000026c2ec7e28b: jmp 26c2ec7e1d0h ;...e9 ;...40 ;...ff ;...ff ;...ff ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0} ; - com.openkappa.simd.hashcode.HashCode::Unrolled@22 (line 23) 0x0000026c2ec7e290: movsxd rbx,esi ;...48 ;...63 ;...de 0x0000026c2ec7e293: cmp esi,dword ptr [rax+0ch] ;...3b ;...70 ;...0c 0x0000026c2ec7e296: jnb 26c2ec7e368h ;...0f ;...83 ;...cc ;...00 ;...00 ;...00 0x0000026c2ec7e29c: mov ebx,dword ptr [rax+rbx*4+10h] ;...8b ;...5c ;...98 ;...10 ;*iaload {reexecute=0 rethrow=0 return_oop=0} ; - com.openkappa.simd.hashcode.HashCode::Unrolled@87 (line 31) 0x0000026c2ec7e2a0: mov rcx,rdi ;...48 ;...8b ;...cf 0x0000026c2ec7e2a3: shl edi,5h ;...c1 ;...e7 ;...05 0x0000026c2ec7e2a6: sub edi,ecx ;...2b ;...f9 0x0000026c2ec7e2a8: add edi,ebx ;...03 ;...fb 0x0000026c2ec7e2aa: inc esi ;...ff ;...c6 0x0000026c2ec7e2ac: mov rbx,26c4b3357a8h ;...48 ;...bb ;...a8 ;...57 ;...33 ;...4b ;...6c ;...02 ;...00 ;...00 0x0000026c2ec7e2b6: mov ecx,dword ptr [rbx+14h] ;...8b ;...4b ;...14 0x0000026c2ec7e2b9: add ecx,8h ;...83 ;...c1 ;...08 0x0000026c2ec7e2bc: mov dword ptr [rbx+14h],ecx ;...89 ;...4b ;...14 0x0000026c2ec7e2bf: and ecx,1fff8h ;...81 ;...e1 ;...f8 ;...ff ;...01 ;...00 0x0000026c2ec7e2c5: cmp ecx,0h ;...83 ;...f9 ;...00 0x0000026c2ec7e2c8: je 26c2ec7e371h ;...0f ;...84 ;...a3 ;...00 ;...00 ;...00 ; ImmutableOopMap{rax=Oop } ;*goto {reexecute=1 rethrow=0 return_oop=0} ; - com.openkappa.simd.hashcode.HashCode::Unrolled@93 (line 30) 0x0000026c2ec7e2ce: test dword ptr [26c22620000h],eax ;...85 ;...05 ;...2c ;...1d ;...9a ;...f3 ;*goto {reexecute=0 rethrow=0 return_oop=0} ; - com.openkappa.simd.hashcode.HashCode::Unrolled@93 (line 30) ; {poll} 0x0000026c2ec7e2d4: cmp esi,edx ;...3b ;...f2 0x0000026c2ec7e2d6: jl 26c2ec7e290h ;...7c ;...b8 ;*if_icmpge {reexecute=0 rethrow=0 return_oop=0} ; - com.openkappa.simd.hashcode.HashCode::Unrolled@77 (line 30) 0x0000026c2ec7e2d8: mov rax,rdi ;...48 ;...8b ;...c7 0x0000026c2ec7e2db: add rsp,40h ;...48 ;...83 ;...c4 ;...40 0x0000026c2ec7e2df: pop rbp ;...5d 0x0000026c2ec7e2e0: test dword ptr [26c22620000h],eax ;...85 ;...05 ;...1a ;...1d ;...9a ;...f3 ; {poll_return} 0x0000026c2ec7e2e6: ret ;...c3 ;*ireturn {reexecute=0 rethrow=0 return_oop=0} ; - com.openkappa.simd.hashcode.HashCode::Unrolled@97 (line 33)Your code does multiply, whereas Arrays.hashCode shifts and subtracts. I haven’t thought about why multiplication would be better, maybe there’s more pipelining (Arrays.hashCode doesn’t get unrolled, look at the control statements that have not been elided.)
Thanks for posting this result. I’ve created a JDK RFE for further investigation (https://bugs.openjdk.java.net/browse/JDK-8186203).
Thanks! The precise wording there makes me sound like a bit of a smart Alec… this wasn’t the intent, which was more: “look at this crop circle.” I’m a big fan of the JVM and the great optimisations which have gone into version 9.
Understood — I didn’t mean to ascribe any particular tone to you, so I’ve edited the wording to be less provocative. Thanks again!