# How much Algebra does C2 Know? Part 2: Distributivity

In part one of this series of posts, I looked at how important associativity and independence are for fast loops. C2 seems to utilise these properties to generate unrolled and pipelined machine code for loops, achieving higher throughput even in cases where the kernel of the loop is 3x slower according to vendor-advertised instruction throughputs. C2 has a weird and wonderful relationship with distributivity, and hints from the programmer can both help and hinder the generation of good quality machine code.

### Viability and Correctness

Distributivity is the simple notion of factoring out brackets. Is exploiting it, in general, a viable loop rewrite strategy? It can be used to transform the method `Scale` into `FactoredScale`, both of which perform floating point arithmetic:

``````
@CompilerControl(CompilerControl.Mode.DONT_INLINE)
@Benchmark
public double Scale(DoubleData state) {
    double value = 0D;
    double[] data = state.data1;
    // multiply every element before accumulating
    for (int i = 0; i < data.length; ++i) {
        value += 3.14159 * data[i];
    }
    return value;
}

@CompilerControl(CompilerControl.Mode.DONT_INLINE)
@Benchmark
public double FactoredScale(DoubleData state) {
    double value = 0D;
    double[] data = state.data1;
    // accumulate first, multiply once after the loop
    for (int i = 0; i < data.length; ++i) {
        value += data[i];
    }
    return 3.14159 * value;
}
``````

Running the project on GitHub with the argument `--include .*scale.*` suggests there may be a performance gain to be had from this rewrite, but it isn’t clear cut:

| Benchmark | Mode | Threads | Samples | Score | Score Error (99.9%) | Unit | Param: size |
|---|---|---|---|---|---|---|---|
| FactoredScale | thrpt | 1 | 10 | 7.011606 | 0.274742 | ops/ms | 100000 |
| FactoredScale | thrpt | 1 | 10 | 0.621515 | 0.026853 | ops/ms | 1000000 |
| Scale | thrpt | 1 | 10 | 6.962434 | 0.240180 | ops/ms | 100000 |
| Scale | thrpt | 1 | 10 | 0.671042 | 0.011686 | ops/ms | 1000000 |

Over the real numbers the rewrite would be completely valid, but floating point addition is not associative. Joseph Darcy explains why in this deep dive on floating point semantics. Broken associativity of addition entails broken distributivity of any operation over it, so the two loops are not equivalent, and they give different outputs (e.g. 15662.513298516365 vs 15662.51329851632 for one sample input). The rewrite isn’t correct for floating point data, so it isn’t an optimisation the compiler could apply in good faith, except in a very small number of cases. You have to rewrite the loop yourself and decide whether the small but inevitable differences are acceptable.
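To make the non-equivalence concrete, here is a minimal sketch that mirrors the two benchmark methods outside of JMH. The class name, the random seed, and the helper `sumsAgree` are hypothetical and exist only for illustration; the printed difference depends on the input data, so no particular value is claimed.

```java
import java.util.Random;

final class FloatingPointDistributivity {

    // Mirrors Scale: multiply every element before accumulating.
    static double scale(double[] data, double factor) {
        double value = 0D;
        for (int i = 0; i < data.length; ++i) {
            value += factor * data[i];
        }
        return value;
    }

    // Mirrors FactoredScale: accumulate first, multiply once at the end.
    static double factoredScale(double[] data, double factor) {
        double value = 0D;
        for (int i = 0; i < data.length; ++i) {
            value += data[i];
        }
        return factor * value;
    }

    // True only if both groupings of a + b + c round to the same double.
    static boolean sumsAgree(double a, double b, double c) {
        return (a + b) + c == a + (b + c);
    }

    public static void main(String[] args) {
        // Regrouping a sum changes the rounding, so this prints false.
        System.out.println(sumsAgree(0.1, 0.2, 0.3));

        // Arbitrary sample data; the seed is an assumption for repeatability.
        double[] data = new Random(42).doubles(100_000).toArray();
        double a = scale(data, 3.14159);
        double b = factoredScale(data, 3.14159);
        System.out.println(a + " vs " + b + ", difference " + Math.abs(a - b));
    }
}
```

The two results should agree to many significant digits but need not be bit-for-bit identical, which is why a compiler cannot apply the rewrite on your behalf.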

### Counterintuitive Performance

Integer multiplication is distributive over addition, and we can check if C2 does this rewrite by running the same code with 32 bit integer values, for now fixing a scale factor of 10 (which seems like an innocuous value, no?):

``````
@CompilerControl(CompilerControl.Mode.DONT_INLINE)
@Benchmark
public int Scale_Int(IntData state) {
    int value = 0;
    int[] data = state.data1;
    // multiply every element by the constant before accumulating
    for (int i = 0; i < data.length; ++i) {
        value += 10 * data[i];
    }
    return value;
}

@CompilerControl(CompilerControl.Mode.DONT_INLINE)
@Benchmark
public int FactoredScale_Int(IntData state) {
    int value = 0;
    int[] data = state.data1;
    // accumulate first, multiply once after the loop
    for (int i = 0; i < data.length; ++i) {
        value += data[i];
    }
    return 10 * value;
}
``````
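Unlike the floating point case, the integer rewrite is exact: Java `int` arithmetic wraps modulo 2^32, and multiplication still distributes over addition there, so the two loops agree even when the running sum overflows. The following is a small sketch of that check, assuming a hypothetical class name and arbitrary test data rather than the benchmark's `IntData` state:

```java
import java.util.Random;

final class IntDistributivityCheck {

    // Mirrors Scale_Int with the factor passed in as a parameter.
    static int scale(int[] data, int factor) {
        int value = 0;
        for (int i = 0; i < data.length; ++i) {
            value += factor * data[i];
        }
        return value;
    }

    // Mirrors FactoredScale_Int with the factor passed in as a parameter.
    static int factoredScale(int[] data, int factor) {
        int value = 0;
        for (int i = 0; i < data.length; ++i) {
            value += data[i];
        }
        return factor * value;
    }

    public static void main(String[] args) {
        // Full-range random ints, so the sums overflow many times over.
        int[] data = new Random(0).ints(100_000).toArray();
        for (int factor : new int[] {10, -7, Integer.MAX_VALUE}) {
            boolean same = scale(data, factor) == factoredScale(data, factor);
            System.out.println("factor " + factor + ": equal = " + same);
        }
    }
}
```

So any difference between the two integer benchmarks is purely a matter of the code C2 generates, not of the values computed.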

The results are fascinating:

| Benchmark | Mode | Threads | Samples | Score | Score Error (99.9%) | Unit | Param: size |
|---|---|---|---|---|---|---|---|
| FactoredScale_Int | thrpt | 1 | 10 | 28.339699 | 0.608075 | ops/ms | 100000 |
| FactoredScale_Int | thrpt | 1 | 10 | 2.392579 | 0.506413 | ops/ms | 1000000 |
| Scale_Int | thrpt | 1 | 10 | 33.335721 | 0.295334 | ops/ms | 100000 |
| Scale_Int | thrpt | 1 | 10 | 2.838242 | 0.448213 | ops/ms | 1000000 |

The code is doing thousands more multiplications in less time when the multiplication is not factored out of the loop. So what the devil is going on? Inspecting the assembly for the faster loop is revealing:

```
com/openkappa/simd/scale/Scale.Scale_Int(Lcom/openkappa/simd/state/IntData;)I  [0x000001c89e499320, 0x000001c89e4996f8]  984 bytes
Argument 0 is unknown.RIP: 0x1c89e499320 Code size: 0x000003d8
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} {0x000001c8b3701b10} 'Scale_Int' '(Lcom/openkappa/simd/state/IntData;)I' in 'com/openkappa/simd/scale/Scale'
0x000001c89e499320: int3                      ;...cc

0x000001c89e499321: nop     word ptr [rax+rax+0h]  ;...66
;...66
;...66
;...0f
;...1f
;...84
;...00
;...00
;...00
;...00
;...00

0x000001c89e49932c: nop                       ;...66
;...66
;...66
;...90

0x000001c89e499330: mov     dword ptr [rsp+0ffffffffffff9000h],eax
;...89
;...84
;...24
;...00
;...90
;...ff
;...ff

0x000001c89e499337: push    rbp               ;...55

0x000001c89e499338: sub     rsp,40h           ;...48
;...83
;...ec
;...40

0x000001c89e49933c: mov     rbp,qword ptr [rdx+8h]  ;...48
;...8b
;...6a
;...08

0x000001c89e499340: mov     ebx,dword ptr [rdx+10h]  ;...8b
;...5a
;...10

0x000001c89e499343: mov     r13d,dword ptr [rdx]  ;...44
;...8b
;...2a

0x000001c89e499346: mov     rcx,rdx           ;...48
;...8b
;...ca

0x000001c89e499349: vzeroupper                ;...c5
;...f8
;...77

0x000001c89e49934c: mov     r10,51da8d20h     ;...49
;...ba
;...20
;...8d
;...da
;...51
;...00
;...00
;...00
;...00

0x000001c89e499356: call indirect r10         ;...41
;...ff
;...d2

0x000001c89e499359: mov     r11d,dword ptr [rbp+8h]  ;...44
;...8b
;...5d
;...08
; implicit exception: dispatches to 0x000001c89e4996c1
0x000001c89e49935d: cmp     r11d,0f800016dh   ;...41
;...81
;...fb
;...6d
;...01
;...00
;...f8
0x000001c89e499364: jne     1c89e4996a9h      ;...0f
;...85
;...3f
;...03
;...00
;...00
; - com.openkappa.simd.scale.Scale::Scale_Int@10 (line 41)

0x000001c89e49936a: mov     edi,dword ptr [rbp+0ch]  ;...8b
;...7d
;...0c
;*arraylength {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::Scale_Int@13 (line 41)

0x000001c89e49936d: cmp     r13d,edi          ;...44
;...3b
;...ef

0x000001c89e499370: jnl     1c89e49967ah      ;...0f
;...8d
;...04
;...03
;...00
;...00
;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::Scale_Int@14 (line 41)

0x000001c89e499376: mov     r10d,ebp          ;...44
;...8b
;...d5

0x000001c89e499379: mov     r8d,r13d          ;...45
;...8b
;...c5

0x000001c89e49937c: inc     r8d               ;...41
;...ff
;...c0

0x000001c89e49937f: shr     r10d,2h           ;...41
;...c1
;...ea
;...02

0x000001c89e499383: and     r10d,7h           ;...41
;...83
;...e2
;...07

0x000001c89e499387: xor     r9d,r9d           ;...45
;...33
;...c9

0x000001c89e49938a: cmp     r8d,r9d           ;...45
;...3b
;...c1

0x000001c89e49938d: cmovl   r8d,r9d           ;...45
;...0f
;...4c
;...c1

0x000001c89e499391: cmp     r8d,edi           ;...44
;...3b
;...c7

0x000001c89e499394: cmovnle r8d,edi           ;...44
;...0f
;...4f
;...c7

;...03
;...d0

0x000001c89e49939b: mov     r11d,4h           ;...41
;...bb
;...04
;...00
;...00
;...00

0x000001c89e4993a1: sub     r11d,r10d         ;...45
;...2b
;...da

0x000001c89e4993a4: and     r11d,7h           ;...41
;...83
;...e3
;...07

;...03
;...d8

0x000001c89e4993ab: cmp     r11d,edi          ;...44
;...3b
;...df

0x000001c89e4993ae: cmovnle r11d,edi          ;...44
;...0f
;...4f
;...df

0x000001c89e4993b2: nop                       ;...66
;...90

0x000001c89e4993b4: cmp     r13d,edi          ;...44
;...3b
;...ef

0x000001c89e4993b7: jnb     1c89e49968bh      ;...0f
;...83
;...ce
;...02
;...00
;...00

0x000001c89e4993bd: mov     r10d,dword ptr [rbp+r13*4+10h]
;...46
;...8b
;...54
;...10
; - com.openkappa.simd.scale.Scale::Scale_Int@23 (line 42)

0x000001c89e4993c2: mov     r9d,r10d          ;...45
;...8b
;...ca

0x000001c89e4993c5: shl     r9d,3h            ;...41
;...c1
;...e1
;...03

0x000001c89e4993c9: shl     r10d,1h           ;...41
;...d1
;...e2

;...03
;...ca

;...03
;...d9
; - com.openkappa.simd.scale.Scale::Scale_Int@25 (line 42)

0x000001c89e4993d2: inc     r13d              ;...41
;...ff
;...c5
;*iinc {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::Scale_Int@27 (line 41)

0x000001c89e4993d5: cmp     r13d,r11d         ;...45
;...3b
;...eb

0x000001c89e4993d8: jl      1c89e4993b4h      ;...7c
;...da
;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::Scale_Int@14 (line 41)

0x000001c89e4993da: mov     r8d,edi           ;...44
;...8b
;...c7

;...83
;...c0
;...c1

0x000001c89e4993e1: mov     ecx,80000000h     ;...b9
;...00
;...00
;...00
;...80

0x000001c89e4993e6: cmp     edi,r8d           ;...41
;...3b
;...f8

0x000001c89e4993e9: cmovl   r8d,ecx           ;...44
;...0f
;...4c
;...c1

0x000001c89e4993ed: cmp     r13d,r8d          ;...45
;...3b
;...e8

0x000001c89e4993f0: jnl     1c89e499651h      ;...0f
;...8d
;...5b
;...02
;...00
;...00

0x000001c89e4993f6: nop     word ptr [rax+rax+0h]  ;...66
;...66
;...0f
;...1f
;...84
;...00
;...00
;...00
;...00
;...00

0x000001c89e499400: vmovdqu ymm8,ymmword ptr [rbp+r13*4+10h]
;...c4
;...21
;...7e
;...6f
;...44
;...10

0x000001c89e499407: movsxd  r10,r13d          ;...4d
;...63
;...d5

0x000001c89e49940a: vmovdqu ymm9,ymmword ptr [rbp+r10*4+30h]
;...c4
;...21
;...7e
;...6f
;...4c
;...95
;...30

0x000001c89e499411: vmovdqu ymm13,ymmword ptr [rbp+r10*4+0f0h]
;...c4
;...21
;...7e
;...6f
;...ac
;...95
;...f0
;...00
;...00
;...00

0x000001c89e49941b: vmovdqu ymm12,ymmword ptr [rbp+r10*4+50h]
;...c4
;...21
;...7e
;...6f
;...64
;...95
;...50

0x000001c89e499422: vmovdqu ymm4,ymmword ptr [rbp+r10*4+70h]
;...c4
;...a1
;...7e
;...6f
;...64
;...95
;...70

0x000001c89e499429: vmovdqu ymm3,ymmword ptr [rbp+r10*4+90h]
;...c4
;...a1
;...7e
;...6f
;...9c
;...95
;...90
;...00
;...00
;...00

0x000001c89e499433: vmovdqu ymm2,ymmword ptr [rbp+r10*4+0b0h]
;...c4
;...a1
;...7e
;...6f
;...94
;...95
;...b0
;...00
;...00
;...00

0x000001c89e49943d: vmovdqu ymm0,ymmword ptr [rbp+r10*4+0d0h]
;...c4
;...a1
;...7e
;...6f
;...84
;...95
;...d0
;...00
;...00
;...00
; - com.openkappa.simd.scale.Scale::Scale_Int@23 (line 42)

0x000001c89e499447: vpslld  ymm11,ymm8,1h     ;...c4
;...c1
;...25
;...72
;...f0
;...01

0x000001c89e49944d: vpslld  ymm1,ymm0,1h      ;...c5
;...f5
;...72
;...f0
;...01

0x000001c89e499452: vpslld  ymm0,ymm0,3h      ;...c5
;...fd
;...72
;...f0
;...03

;...fd
;...fe
;...e9

0x000001c89e49945b: vpslld  ymm0,ymm2,3h      ;...c5
;...fd
;...72
;...f2
;...03

0x000001c89e499460: vpslld  ymm7,ymm3,3h      ;...c5
;...c5
;...72
;...f3
;...03

0x000001c89e499465: vpslld  ymm10,ymm4,3h     ;...c5
;...72
;...f4
;...03

0x000001c89e49946a: vpslld  ymm15,ymm12,3h    ;...c4
;...c1
;...05
;...72
;...f4
;...03

0x000001c89e499470: vpslld  ymm14,ymm13,3h    ;...c4
;...c1
;...0d
;...72
;...f5
;...03

0x000001c89e499476: vpslld  ymm1,ymm9,3h      ;...c4
;...c1
;...75
;...72
;...f1
;...03

0x000001c89e49947c: vpslld  ymm2,ymm2,1h      ;...c5
;...ed
;...72
;...f2
;...01

;...fd
;...fe
;...f2

0x000001c89e499485: vpslld  ymm0,ymm3,1h      ;...c5
;...fd
;...72
;...f3
;...01

;...c5
;...fe
;...f8

0x000001c89e49948e: vpslld  ymm0,ymm4,1h      ;...c5
;...fd
;...72
;...f4
;...01

;...2d
;...fe
;...d0

0x000001c89e499497: vpslld  ymm0,ymm12,1h     ;...c4
;...c1
;...7d
;...72
;...f4
;...01

;...05
;...fe
;...e0

0x000001c89e4994a1: vpslld  ymm0,ymm13,1h     ;...c4
;...c1
;...7d
;...72
;...f5
;...01

;...8d
;...fe
;...e0

0x000001c89e4994ab: vpslld  ymm0,ymm9,1h      ;...c4
;...c1
;...7d
;...72
;...f1
;...01

;...f5
;...fe
;...d0

0x000001c89e4994b5: vpslld  ymm0,ymm8,3h      ;...c4
;...c1
;...7d
;...72
;...f0
;...03

;...41
;...7d
;...fe
;...c3

;...c2
;...3d
;...02
;...c0

;...e2
;...7d
;...02
;...c3

0x000001c89e4994ca: vextracti128 xmm3,ymm0,1h  ;...c4
;...e3
;...7d
;...39
;...c3
;...01

;...f9
;...fe
;...c3

0x000001c89e4994d4: vmovd   xmm3,ebx          ;...c5
;...f9
;...6e
;...db

;...e1
;...fe
;...d8

0x000001c89e4994dc: vmovd   r10d,xmm3         ;...c4
;...c1
;...79
;...7e
;...da

;...e2
;...6d
;...02
;...c2

;...e2
;...7d
;...02
;...c3

0x000001c89e4994eb: vextracti128 xmm3,ymm0,1h  ;...c4
;...e3
;...7d
;...39
;...c3
;...01

;...f9
;...fe
;...c3

0x000001c89e4994f5: vmovd   xmm3,r10d         ;...c4
;...c1
;...79
;...6e
;...da

;...e1
;...fe
;...d8

0x000001c89e4994fe: vmovd   r11d,xmm3         ;...c4
;...c1
;...79
;...7e
;...db

;...c2
;...1d
;...02
;...d4

;...e2
;...6d
;...02
;...d0

0x000001c89e49950d: vextracti128 xmm0,ymm2,1h  ;...c4
;...e3
;...7d
;...39
;...d0
;...01

;...e9
;...fe
;...d0

0x000001c89e499517: vmovd   xmm0,r11d         ;...c4
;...c1
;...79
;...6e
;...c3

;...f9
;...fe
;...c2

0x000001c89e499520: vmovd   r10d,xmm0         ;...c4
;...c1
;...79
;...7e
;...c2

;...c2
;...2d
;...02
;...c2

;...e2
;...7d
;...02
;...c3

0x000001c89e49952f: vextracti128 xmm3,ymm0,1h  ;...c4
;...e3
;...7d
;...39
;...c3
;...01

;...f9
;...fe
;...c3

0x000001c89e499539: vmovd   xmm3,r10d         ;...c4
;...c1
;...79
;...6e
;...da

;...e1
;...fe
;...d8

0x000001c89e499542: vmovd   r11d,xmm3         ;...c4
;...c1
;...79
;...7e
;...db

;...e2
;...45
;...02
;...d7

;...e2
;...6d
;...02
;...d0

0x000001c89e499551: vextracti128 xmm0,ymm2,1h  ;...c4
;...e3
;...7d
;...39
;...d0
;...01

;...e9
;...fe
;...d0

0x000001c89e49955b: vmovd   xmm0,r11d         ;...c4
;...c1
;...79
;...6e
;...c3

;...f9
;...fe
;...c2

0x000001c89e499564: vmovd   r10d,xmm0         ;...c4
;...c1
;...79
;...7e
;...c2

;...e2
;...4d
;...02
;...c6

;...e2
;...7d
;...02
;...c3

0x000001c89e499573: vextracti128 xmm3,ymm0,1h  ;...c4
;...e3
;...7d
;...39
;...c3
;...01

;...f9
;...fe
;...c3

0x000001c89e49957d: vmovd   xmm3,r10d         ;...c4
;...c1
;...79
;...6e
;...da

;...e1
;...fe
;...d8

0x000001c89e499586: vmovd   r11d,xmm3         ;...c4
;...c1
;...79
;...7e
;...db

;...e2
;...55
;...02
;...d5

;...e2
;...6d
;...02
;...d0

0x000001c89e499595: vextracti128 xmm0,ymm2,1h  ;...c4
;...e3
;...7d
;...39
;...d0
;...01

;...e9
;...fe
;...d0

0x000001c89e49959f: vmovd   xmm0,r11d         ;...c4
;...c1
;...79
;...6e
;...c3

;...f9
;...fe
;...c2

0x000001c89e4995a8: vmovd   r10d,xmm0         ;...c4
;...c1
;...79
;...7e
;...c2

;...e2
;...5d
;...02
;...d4

;...e2
;...6d
;...02
;...d1

0x000001c89e4995b7: vextracti128 xmm1,ymm2,1h  ;...c4
;...e3
;...7d
;...39
;...d1
;...01

;...e9
;...fe
;...d1

0x000001c89e4995c1: vmovd   xmm1,r10d         ;...c4
;...c1
;...79
;...6e
;...ca

;...f1
;...fe
;...ca

0x000001c89e4995ca: vmovd   ebx,xmm1          ;...c5
;...f9
;...7e
;...cb
; - com.openkappa.simd.scale.Scale::Scale_Int@25 (line 42)

;...83
;...c5
;...40
;*iinc {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::Scale_Int@27 (line 41)

0x000001c89e4995d2: cmp     r13d,r8d          ;...45
;...3b
;...e8

0x000001c89e4995d5: jl      1c89e499400h      ;...0f
;...8c
;...25
;...fe
;...ff
;...ff
;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::Scale_Int@14 (line 41)

0x000001c89e4995db: mov     r10d,edi          ;...44
;...8b
;...d7

;...83
;...c2
;...f9

0x000001c89e4995e2: cmp     edi,r10d          ;...41
;...3b
;...fa

0x000001c89e4995e5: cmovl   r10d,ecx          ;...44
;...0f
;...4c
;...d1

0x000001c89e4995e9: cmp     r13d,r10d         ;...45
;...3b
;...ea

0x000001c89e4995ec: jnl     1c89e49962eh      ;...7d
;...40

0x000001c89e4995ee: nop                       ;...66
;...90

0x000001c89e4995f0: vmovdqu ymm0,ymmword ptr [rbp+r13*4+10h]
;...c4
;...a1
;...7e
;...6f
;...44
;...10
; - com.openkappa.simd.scale.Scale::Scale_Int@23 (line 42)

0x000001c89e4995f7: vpslld  ymm1,ymm0,3h      ;...c5
;...f5
;...72
;...f0
;...03

0x000001c89e4995fc: vpslld  ymm0,ymm0,1h      ;...c5
;...fd
;...72
;...f0
;...01

;...f5
;...fe
;...c0

;...e2
;...7d
;...02
;...d8

;...e2
;...65
;...02
;...d9

0x000001c89e49960f: vextracti128 xmm1,ymm3,1h  ;...c4
;...e3
;...7d
;...39
;...d9
;...01

;...e1
;...fe
;...d9

0x000001c89e499619: vmovd   xmm1,ebx          ;...c5
;...f9
;...6e
;...cb

;...f1
;...fe
;...cb

0x000001c89e499621: vmovd   ebx,xmm1          ;...c5
;...f9
;...7e
;...cb
; - com.openkappa.simd.scale.Scale::Scale_Int@25 (line 42)

;...83
;...c5
;...08
;*iinc {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::Scale_Int@27 (line 41)

0x000001c89e499629: cmp     r13d,r10d         ;...45
;...3b
;...ea

0x000001c89e49962c: jl      1c89e4995f0h      ;...7c
;...c2
;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::Scale_Int@14 (line 41)

0x000001c89e49962e: cmp     r13d,edi          ;...44
;...3b
;...ef

0x000001c89e499631: jnl     1c89e499651h      ;...7d
;...1e

0x000001c89e499633: nop                       ;...90

0x000001c89e499634: mov     r11d,dword ptr [rbp+r13*4+10h]
;...46
;...8b
;...5c
;...10
; - com.openkappa.simd.scale.Scale::Scale_Int@23 (line 42)

0x000001c89e499639: mov     r10d,r11d         ;...45
;...8b
;...d3

0x000001c89e49963c: shl     r10d,3h           ;...41
;...c1
;...e2
;...03

0x000001c89e499640: shl     r11d,1h           ;...41
;...d1
;...e3

;...03
;...d3

;...03
;...da
; - com.openkappa.simd.scale.Scale::Scale_Int@25 (line 42)

0x000001c89e499649: inc     r13d              ;...41
;...ff
;...c5
;*iinc {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::Scale_Int@27 (line 41)

0x000001c89e49964c: cmp     r13d,edi          ;...44
;...3b
;...ef

0x000001c89e49964f: jl      1c89e499634h      ;...7c
;...e3
;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::Scale_Int@14 (line 41)

0x000001c89e499651: cmp     r13d,edi          ;...44
;...3b
;...ef

0x000001c89e499654: jnl     1c89e49967ah      ;...7d
;...24

0x000001c89e499656: nop                       ;...66
;...90

0x000001c89e499658: cmp     r13d,edi          ;...44
;...3b
;...ef

0x000001c89e49965b: jnb     1c89e499691h      ;...73
;...34

0x000001c89e49965d: mov     r11d,dword ptr [rbp+r13*4+10h]
;...46
;...8b
;...5c
;...10
; - com.openkappa.simd.scale.Scale::Scale_Int@23 (line 42)

0x000001c89e499662: mov     r10d,r11d         ;...45
;...8b
;...d3

0x000001c89e499665: shl     r10d,3h           ;...41
;...c1
;...e2
;...03

0x000001c89e499669: shl     r11d,1h           ;...41
;...d1
;...e3

;...03
;...d3

;...03
;...da
; - com.openkappa.simd.scale.Scale::Scale_Int@25 (line 42)

0x000001c89e499672: inc     r13d              ;...41
;...ff
;...c5
;*iinc {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::Scale_Int@27 (line 41)

0x000001c89e499675: cmp     r13d,edi          ;...44
;...3b
;...ef

0x000001c89e499678: jl      1c89e499658h      ;...7c
;...de
; - com.openkappa.simd.scale.Scale::Scale_Int@10 (line 41)

0x000001c89e49967a: mov     eax,ebx           ;...8b
;...c3

0x000001c89e49967c: vzeroupper                ;...c5
;...f8
;...77

;...83
;...c4
;...40

0x000001c89e499683: pop     rbp               ;...5d

0x000001c89e499684: test    dword ptr [1c88a9e0000h],eax
;...85
;...05
;...76
;...69
;...54
;...ec
;   {poll_return}
0x000001c89e49968a: ret
```

The loop is aggressively unrolled, pipelined, and vectorised. Moreover, the multiplication by ten results not in a multiplication but in two left shifts (see `VPSLLD`) and an addition. Note that `(x << 1) + (x << 3) = x * 10` and C2 seems to know it; this rewrite can be applied because it can be proven statically that the factor is always 10. The “optimised” loop, by contrast, doesn’t vectorise at all. I have no idea why not; isn’t this a bug? Yes it is.

```
com/openkappa/simd/scale/Scale.FactoredScale_Int(Lcom/openkappa/simd/state/IntData;)I  [0x000002bbebeda320, 0x000002bbebeda4b8]  408 bytes
Argument 0 is unknown.RIP: 0x2bbebeda320 Code size: 0x00000198
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} {0x000002bb81241148} 'FactoredScale_Int' '(Lcom/openkappa/simd/state/IntData;)I' in 'com/openkappa/simd/scale/Scale'
0x000002bbebeda320: int3                      ;...cc

0x000002bbebeda321: nop     word ptr [rax+rax+0h]  ;...66
;...66
;...66
;...0f
;...1f
;...84
;...00
;...00
;...00
;...00
;...00

0x000002bbebeda32c: nop                       ;...66
;...66
;...66
;...90

0x000002bbebeda330: mov     dword ptr [rsp+0ffffffffffff9000h],eax
;...89
;...84
;...24
;...00
;...90
;...ff
;...ff

0x000002bbebeda337: push    rbp               ;...55

0x000002bbebeda338: sub     rsp,40h           ;...48
;...83
;...ec
;...40

0x000002bbebeda33c: mov     rbp,qword ptr [rdx+8h]  ;...48
;...8b
;...6a
;...08

0x000002bbebeda340: mov     ebx,dword ptr [rdx+10h]  ;...8b
;...5a
;...10

0x000002bbebeda343: mov     r13d,dword ptr [rdx]  ;...44
;...8b
;...2a

0x000002bbebeda346: mov     rcx,rdx           ;...48
;...8b
;...ca

0x000002bbebeda349: mov     r10,51da8d20h     ;...49
;...ba
;...20
;...8d
;...da
;...51
;...00
;...00
;...00
;...00

0x000002bbebeda353: call indirect r10         ;...41
;...ff
;...d2

0x000002bbebeda356: mov     r10d,dword ptr [rbp+8h]  ;...44
;...8b
;...55
;...08
; implicit exception: dispatches to 0x000002bbebeda46d
0x000002bbebeda35a: cmp     r10d,0f800016dh   ;...41
;...81
;...fa
;...6d
;...01
;...00
;...f8
0x000002bbebeda361: jne     2bbebeda459h      ;...0f
;...85
;...f2
;...00
;...00
;...00
; - com.openkappa.simd.scale.Scale::FactoredScale_Int@10 (line 52)

0x000002bbebeda367: mov     r10d,dword ptr [rbp+0ch]
;...44
;...8b
;...55
;...0c
;*arraylength {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::FactoredScale_Int@13 (line 52)

0x000002bbebeda36b: cmp     r13d,r10d         ;...45
;...3b
;...ea

0x000002bbebeda36e: jnl     2bbebeda422h      ;...0f
;...8d
;...ae
;...00
;...00
;...00
;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::FactoredScale_Int@14 (line 52)

0x000002bbebeda374: mov     r11d,r13d         ;...45
;...8b
;...dd

0x000002bbebeda377: inc     r11d              ;...41
;...ff
;...c3

0x000002bbebeda37a: xor     r8d,r8d           ;...45
;...33
;...c0

0x000002bbebeda37d: cmp     r11d,r8d          ;...45
;...3b
;...d8

0x000002bbebeda380: cmovl   r11d,r8d          ;...45
;...0f
;...4c
;...d8

0x000002bbebeda384: cmp     r11d,r10d         ;...45
;...3b
;...da

0x000002bbebeda387: cmovnle r11d,r10d         ;...45
;...0f
;...4f
;...da

0x000002bbebeda38b: nop                       ;...90
; - com.openkappa.simd.scale.Scale::FactoredScale_Int@21 (line 53)

0x000002bbebeda38c: cmp     r13d,r10d         ;...45
;...3b
;...ea

0x000002bbebeda38f: jnb     2bbebeda43ch      ;...0f
;...83
;...a7
;...00
;...00
;...00

;...42
;...03
;...5c
;...10
; - com.openkappa.simd.scale.Scale::FactoredScale_Int@22 (line 53)

0x000002bbebeda39a: inc     r13d              ;...41
;...ff
;...c5
;*iinc {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::FactoredScale_Int@24 (line 52)

0x000002bbebeda39d: cmp     r13d,r11d         ;...45
;...3b
;...eb

0x000002bbebeda3a0: jl      2bbebeda38ch      ;...7c
;...ea
;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::FactoredScale_Int@14 (line 52)

0x000002bbebeda3a2: mov     r11d,r10d         ;...45
;...8b
;...da

;...83
;...c3
;...f9

0x000002bbebeda3a9: mov     r8d,80000000h     ;...41
;...b8
;...00
;...00
;...00
;...80

0x000002bbebeda3af: cmp     r10d,r11d         ;...45
;...3b
;...d3

0x000002bbebeda3b2: cmovl   r11d,r8d          ;...45
;...0f
;...4c
;...d8

0x000002bbebeda3b6: cmp     r13d,r11d         ;...45
;...3b
;...eb

0x000002bbebeda3b9: jnl     2bbebeda409h      ;...7d
;...4e

0x000002bbebeda3bb: nop     dword ptr [rax+rax+0h]  ;...0f
;...1f
;...44
;...00
;...00
; - com.openkappa.simd.scale.Scale::FactoredScale_Int@21 (line 53)

;...42
;...03
;...5c
;...10

0x000002bbebeda3c5: movsxd  r8,r13d           ;...4d
;...63
;...c5

;...42
;...03
;...5c
;...85
;...14

;...42
;...03
;...5c
;...85
;...18

;...42
;...03
;...5c
;...85
;...1c

;...42
;...03
;...5c
;...85
;...20

;...42
;...03
;...5c
;...85
;...24

;...42
;...03
;...5c
;...85
;...28

;...42
;...03
;...5c
;...85
;...2c
; - com.openkappa.simd.scale.Scale::FactoredScale_Int@22 (line 53)

;...83
;...c5
;...08
;*iinc {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::FactoredScale_Int@24 (line 52)

0x000002bbebeda3ef: cmp     r13d,r11d         ;...45
;...3b
;...eb

0x000002bbebeda3f2: jl      2bbebeda3c0h      ;...7c
;...cc
;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::FactoredScale_Int@14 (line 52)

0x000002bbebeda3f4: cmp     r13d,r10d         ;...45
;...3b
;...ea

0x000002bbebeda3f7: jnl     2bbebeda409h      ;...7d
;...10

0x000002bbebeda3f9: nop                       ;...66
;...66
;...90
; - com.openkappa.simd.scale.Scale::FactoredScale_Int@21 (line 53)

;...42
;...03
;...5c
;...10
; - com.openkappa.simd.scale.Scale::FactoredScale_Int@22 (line 53)

0x000002bbebeda401: inc     r13d              ;...41
;...ff
;...c5
;*iinc {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::FactoredScale_Int@24 (line 52)

0x000002bbebeda404: cmp     r13d,r10d         ;...45
;...3b
;...ea

0x000002bbebeda407: jl      2bbebeda3fch      ;...7c
;...f3
;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::FactoredScale_Int@14 (line 52)

0x000002bbebeda409: cmp     r13d,r10d         ;...45
;...3b
;...ea

0x000002bbebeda40c: jnl     2bbebeda422h      ;...7d
;...14

0x000002bbebeda40e: nop                       ;...66
;...90
; - com.openkappa.simd.scale.Scale::FactoredScale_Int@21 (line 53)

0x000002bbebeda410: cmp     r13d,r10d         ;...45
;...3b
;...ea

0x000002bbebeda413: jnb     2bbebeda442h      ;...73
;...2d

;...42
;...03
;...5c
;...10
; - com.openkappa.simd.scale.Scale::FactoredScale_Int@22 (line 53)

0x000002bbebeda41a: inc     r13d              ;...41
;...ff
;...c5
;*iinc {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::FactoredScale_Int@24 (line 52)

0x000002bbebeda41d: cmp     r13d,r10d         ;...45
;...3b
;...ea

0x000002bbebeda420: jl      2bbebeda410h      ;...7c
;...ee
; - com.openkappa.simd.scale.Scale::FactoredScale_Int@10 (line 52)

0x000002bbebeda422: mov     r11d,ebx          ;...44
;...8b
;...db

0x000002bbebeda425: shl     r11d,3h           ;...41
;...c1
;...e3
;...03

0x000002bbebeda429: shl     ebx,1h            ;...d1
;...e3

;...03
;...db
;*imul {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::FactoredScale_Int@33 (line 55)

0x000002bbebeda42e: mov     eax,ebx           ;...8b
;...c3

;...83
;...c4
;...40

0x000002bbebeda434: pop     rbp               ;...5d

0x000002bbebeda435: test    dword ptr [2bbd8590000h],eax
;...85
;...05
;...c5
;...5b
;...6b
;...ec
;   {poll_return}
0x000002bbebeda43b: ret                       ;...c3
```
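Both listings replace the multiplication by ten with a shift by one, a shift by three, and an addition. The identity is easy to check in plain Java, as in this minimal sketch (the class and method names are hypothetical). Note that the parentheses matter in source code: `+` binds more tightly than `<<` in Java, so the unparenthesised form means something else entirely.

```java
final class ShiftMultiply {

    // 2x + 8x = 10x; exact for every int, since shifts and adds wrap the same way as imul.
    static int timesTenByShifts(int x) {
        return (x << 1) + (x << 3);
    }

    // Without parentheses this parses as (x << (1 + x)) << 3, which is NOT x * 10.
    static int unparenthesised(int x) {
        return x << 1 + x << 3;
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, -1, 12345, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int x : samples) {
            System.out.println(x + ": " + (timesTenByShifts(x) == x * 10));
        }
        // For x = 1 the unparenthesised form is (1 << 2) << 3 = 32, not 10.
        System.out.println(unparenthesised(1));
    }
}
```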

This is a special case: data is usually dynamic and variable, so the loop cannot always be proven to be equivalent to a linear combination of bit shifts. The routine is compiled for all possible parameters, not just statically contrived cases like the one above, so you may never see this assembly in the wild. However, even with random factors, the slow-looking loop is aggressively optimised in a way the hand “optimised” code is not:

``````
@CompilerControl(CompilerControl.Mode.DONT_INLINE)
@Benchmark
public int Scale_Int_Dynamic(ScaleState state) {
    int value = 0;
    int[] data = state.data;
    // the factor is only known at runtime
    int factor = state.randomFactor();
    for (int i = 0; i < data.length; ++i) {
        value += factor * data[i];
    }
    return value;
}

@CompilerControl(CompilerControl.Mode.DONT_INLINE)
@Benchmark
public int FactoredScale_Int_Dynamic(ScaleState state) {
    int value = 0;
    int[] data = state.data;
    // the factor is only known at runtime
    int factor = state.randomFactor();
    for (int i = 0; i < data.length; ++i) {
        value += data[i];
    }
    return factor * value;
}
``````

| Benchmark | Mode | Threads | Samples | Score | Score Error (99.9%) | Unit | Param: size |
|---|---|---|---|---|---|---|---|
| FactoredScale_Int_Dynamic | thrpt | 1 | 10 | 26.100439 | 0.340069 | ops/ms | 100000 |
| FactoredScale_Int_Dynamic | thrpt | 1 | 10 | 1.918011 | 0.297925 | ops/ms | 1000000 |
| Scale_Int_Dynamic | thrpt | 1 | 10 | 30.219809 | 2.977389 | ops/ms | 100000 |
| Scale_Int_Dynamic | thrpt | 1 | 10 | 2.314159 | 0.378442 | ops/ms | 1000000 |

Far from seeking to exploit distributivity to reduce the number of multiplication instructions, C2 seems to almost embrace the extraneous operations as metadata to drive optimisations. The assembly for `Scale_Int_Dynamic` confirms this (it shows vectorised multiplication, not shifts, within the loop):

```
com/openkappa/simd/scale/Scale.Scale_Int_Dynamic(Lcom/openkappa/simd/scale/ScaleState;)I  [0x000001f5ca2fa120, 0x000001f5ca2fa498]  888 bytes
Argument 0 is unknown.RIP: 0x1f5ca2fa120 Code size: 0x00000378
[Entry Point]
[Verified Entry Point]
[Constants]
# {method} {0x000001f5df561a20} 'Scale_Int_Dynamic' '(Lcom/openkappa/simd/scale/ScaleState;)I' in 'com/openkappa/simd/scale/Scale'
0x000001f5ca2fa120: int3                      ;...cc

0x000001f5ca2fa121: nop     word ptr [rax+rax+0h]  ;...66
;...66
;...66
;...0f
;...1f
;...84
;...00
;...00
;...00
;...00
;...00

0x000001f5ca2fa12c: nop                       ;...66
;...66
;...66
;...90

0x000001f5ca2fa130: mov     dword ptr [rsp+0ffffffffffff9000h],eax
;...89
;...84
;...24
;...00
;...90
;...ff
;...ff

0x000001f5ca2fa137: push    rbp               ;...55

0x000001f5ca2fa138: sub     rsp,50h           ;...48
;...83
;...ec
;...50

0x000001f5ca2fa13c: mov     r13,qword ptr [rdx+10h]  ;...4c
;...8b
;...6a
;...10

0x000001f5ca2fa140: mov     ebx,dword ptr [rdx+18h]  ;...8b
;...5a
;...18

0x000001f5ca2fa143: mov     ebp,dword ptr [rdx+8h]  ;...8b
;...6a
;...08

0x000001f5ca2fa146: mov     r14d,dword ptr [rdx]  ;...44
;...8b
;...32

0x000001f5ca2fa149: mov     rcx,rdx           ;...48
;...8b
;...ca

0x000001f5ca2fa14c: vzeroupper                ;...c5
;...f8
;...77

0x000001f5ca2fa14f: mov     r10,51da8d20h     ;...49
;...ba
;...20
;...8d
;...da
;...51
;...00
;...00
;...00
;...00

0x000001f5ca2fa159: call indirect r10         ;...41
;...ff
;...d2

0x000001f5ca2fa15c: mov     r10d,dword ptr [r13+8h]  ;...45
;...8b
;...55
;...08
; implicit exception: dispatches to 0x000001f5ca2fa461
0x000001f5ca2fa160: cmp     r10d,0f800016dh   ;...41
;...81
;...fa
;...6d
;...01
;...00
;...f8
0x000001f5ca2fa167: jne     1f5ca2fa445h      ;...0f
;...85
;...d8
;...02
;...00
;...00
; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@16 (line 64)

0x000001f5ca2fa16d: mov     edi,dword ptr [r13+0ch]  ;...41
;...8b
;...7d
;...0c
;*arraylength {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@19 (line 64)

0x000001f5ca2fa171: cmp     r14d,edi          ;...44
;...3b
;...f7

0x000001f5ca2fa174: jnl     1f5ca2fa411h      ;...0f
;...8d
;...97
;...02
;...00
;...00
;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@20 (line 64)

0x000001f5ca2fa17a: mov     r11d,r13d         ;...45
;...8b
;...dd

0x000001f5ca2fa17d: mov     r10d,r14d         ;...45
;...8b
;...d6

0x000001f5ca2fa180: inc     r10d              ;...41
;...ff
;...c2

0x000001f5ca2fa183: shr     r11d,2h           ;...41
;...c1
;...eb
;...02

0x000001f5ca2fa187: and     r11d,7h           ;...41
;...83
;...e3
;...07

0x000001f5ca2fa18b: xor     r8d,r8d           ;...45
;...33
;...c0

0x000001f5ca2fa18e: cmp     r10d,r8d          ;...45
;...3b
;...d0

0x000001f5ca2fa191: cmovl   r10d,r8d          ;...45
;...0f
;...4c
;...d0

0x000001f5ca2fa195: cmp     r10d,edi          ;...44
;...3b
;...d7

0x000001f5ca2fa198: cmovnle r10d,edi          ;...44
;...0f
;...4f
;...d7

;...03
;...da

0x000001f5ca2fa19f: mov     r9d,4h            ;...41
;...b9
;...04
;...00
;...00
;...00

0x000001f5ca2fa1a5: sub     r9d,r11d          ;...45
;...2b
;...cb

0x000001f5ca2fa1a8: and     r9d,7h            ;...41
;...83
;...e1
;...07

;...03
;...ca

0x000001f5ca2fa1af: cmp     r9d,edi           ;...44
;...3b
;...cf

0x000001f5ca2fa1b2: cmovnle r9d,edi           ;...44
;...0f
;...4f
;...cf
; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@29 (line 65)

0x000001f5ca2fa1b6: cmp     r14d,edi          ;...44
;...3b
;...f7

0x000001f5ca2fa1b9: jnb     1f5ca2fa422h      ;...0f
;...83
;...63
;...02
;...00
;...00

0x000001f5ca2fa1bf: mov     r11d,ebp          ;...44
;...8b
;...dd

0x000001f5ca2fa1c2: imul    r11d,dword ptr [r13+r14*4+10h]
;...47
;...0f
;...af
;...5c
;...b5
;...10

;...03
;...db
; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@31 (line 65)

0x000001f5ca2fa1cb: inc     r14d              ;...41
;...ff
;...c6
;*iinc {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@33 (line 64)

0x000001f5ca2fa1ce: cmp     r14d,r9d          ;...45
;...3b
;...f1

0x000001f5ca2fa1d1: jl      1f5ca2fa1b6h      ;...7c
;...e3
;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@20 (line 64)

0x000001f5ca2fa1d3: mov     r9d,edi           ;...44
;...8b
;...cf

;...83
;...c1
;...c1

0x000001f5ca2fa1da: mov     r8d,80000000h     ;...41
;...b8
;...00
;...00
;...00
;...80

0x000001f5ca2fa1e0: cmp     edi,r9d           ;...41
;...3b
;...f9

0x000001f5ca2fa1e3: cmovl   r9d,r8d           ;...45
;...0f
;...4c
;...c8

0x000001f5ca2fa1e7: cmp     r14d,r9d          ;...45
;...3b
;...f1

0x000001f5ca2fa1ea: jnl     1f5ca2fa3f0h      ;...0f
;...8d
;...00
;...02
;...00
;...00

0x000001f5ca2fa1f0: vmovd   xmm2,ebp          ;...c5
;...f9
;...6e
;...d5

0x000001f5ca2fa1f4: vpshufd xmm2,xmm2,0h      ;...c5
;...f9
;...70
;...d2
;...00

0x000001f5ca2fa1f9: vinserti128 ymm2,ymm2,xmm2,1h  ;...c4
;...e3
;...6d
;...38
;...d2
;...01

0x000001f5ca2fa1ff: nop                       ;...90
; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@29 (line 65)

0x000001f5ca2fa200: vmovdqu ymm0,ymmword ptr [r13+r14*4+10h]
;...c4
;...81
;...7e
;...6f
;...44
;...b5
;...10

0x000001f5ca2fa207: vpmulld ymm11,ymm0,ymm2   ;...c4
;...62
;...7d
;...40
;...da

0x000001f5ca2fa20c: movsxd  r10,r14d          ;...4d
;...63
;...d6

0x000001f5ca2fa20f: vmovdqu ymm0,ymmword ptr [r13+r10*4+30h]
;...c4
;...81
;...7e
;...6f
;...44
;...95
;...30

0x000001f5ca2fa216: vmovdqu ymm1,ymmword ptr [r13+r10*4+0f0h]
;...c4
;...81
;...7e
;...6f
;...8c
;...95
;...f0
;...00
;...00
;...00

0x000001f5ca2fa220: vmovdqu ymm3,ymmword ptr [r13+r10*4+50h]
;...c4
;...81
;...7e
;...6f
;...5c
;...95
;...50

0x000001f5ca2fa227: vmovdqu ymm7,ymmword ptr [r13+r10*4+70h]
;...c4
;...81
;...7e
;...6f
;...7c
;...95
;...70

0x000001f5ca2fa22e: vmovdqu ymm6,ymmword ptr [r13+r10*4+90h]
;...c4
;...81
;...7e
;...6f
;...b4
;...95
;...90
;...00
;...00
;...00

0x000001f5ca2fa238: vmovdqu ymm5,ymmword ptr [r13+r10*4+0b0h]
;...c4
;...81
;...7e
;...6f
;...ac
;...95
;...b0
;...00
;...00
;...00

0x000001f5ca2fa242: vmovdqu ymm4,ymmword ptr [r13+r10*4+0d0h]
;...c4
;...81
;...7e
;...6f
;...a4
;...95
;...d0
;...00
;...00
;...00

0x000001f5ca2fa24c: vpmulld ymm9,ymm0,ymm2    ;...c4
;...62
;...7d
;...40
;...ca

0x000001f5ca2fa251: vpmulld ymm4,ymm4,ymm2    ;...c4
;...e2
;...5d
;...40
;...e2

0x000001f5ca2fa256: vpmulld ymm5,ymm5,ymm2    ;...c4
;...e2
;...55
;...40
;...ea

0x000001f5ca2fa25b: vpmulld ymm6,ymm6,ymm2    ;...c4
;...e2
;...4d
;...40
;...f2

0x000001f5ca2fa260: vpmulld ymm8,ymm7,ymm2    ;...c4
;...62
;...45
;...40
;...c2

0x000001f5ca2fa265: vpmulld ymm10,ymm3,ymm2   ;...c4
;...62
;...65
;...40
;...d2

0x000001f5ca2fa26a: vpmulld ymm3,ymm1,ymm2    ;...c4
;...e2
;...75
;...40
;...da

;...c2
;...25
;...02
;...cb

;...e2
;...75
;...02
;...c8

0x000001f5ca2fa279: vextracti128 xmm0,ymm1,1h  ;...c4
;...e3
;...7d
;...39
;...c8
;...01

;...f1
;...fe
;...c8

0x000001f5ca2fa283: vmovd   xmm0,ebx          ;...c5
;...f9
;...6e
;...c3

;...f9
;...fe
;...c1

0x000001f5ca2fa28b: vmovd   r10d,xmm0         ;...c4
;...c1
;...79
;...7e
;...c2

;...c2
;...35
;...02
;...c9

;...e2
;...75
;...02
;...c8

0x000001f5ca2fa29a: vextracti128 xmm0,ymm1,1h  ;...c4
;...e3
;...7d
;...39
;...c8
;...01

;...f1
;...fe
;...c8

0x000001f5ca2fa2a4: vmovd   xmm0,r10d         ;...c4
;...c1
;...79
;...6e
;...c2

;...f9
;...fe
;...c1

;...c1
;...79
;...7e
;...c3

;...c2
;...2d
;...02
;...c2

;...e2
;...7d
;...02
;...c1

0x000001f5ca2fa2bc: vextracti128 xmm1,ymm0,1h  ;...c4
;...e3
;...7d
;...39
;...c1
;...01

;...f9
;...fe
;...c1

0x000001f5ca2fa2c6: vmovd   xmm1,r11d         ;...c4
;...c1
;...79
;...6e
;...cb

;...f1
;...fe
;...c8

0x000001f5ca2fa2cf: vmovd   r10d,xmm1         ;...c4
;...c1
;...79
;...7e
;...ca

;...c2
;...3d
;...02
;...c8

;...e2
;...75
;...02
;...c8

0x000001f5ca2fa2de: vextracti128 xmm0,ymm1,1h  ;...c4
;...e3
;...7d
;...39
;...c8
;...01

;...f1
;...fe
;...c8

0x000001f5ca2fa2e8: vmovd   xmm0,r10d         ;...c4
;...c1
;...79
;...6e
;...c2

;...f9
;...fe
;...c1

0x000001f5ca2fa2f1: vmovd   r11d,xmm0         ;...c4
;...c1
;...79
;...7e
;...c3

;...e2
;...4d
;...02
;...c6

;...e2
;...7d
;...02
;...c1

0x000001f5ca2fa300: vextracti128 xmm1,ymm0,1h  ;...c4
;...e3
;...7d
;...39
;...c1
;...01

;...f9
;...fe
;...c1

0x000001f5ca2fa30a: vmovd   xmm1,r11d         ;...c4
;...c1
;...79
;...6e
;...cb

;...f1
;...fe
;...c8

0x000001f5ca2fa313: vmovd   r10d,xmm1         ;...c4
;...c1
;...79
;...7e
;...ca

;...e2
;...55
;...02
;...cd

;...e2
;...75
;...02
;...c8

0x000001f5ca2fa322: vextracti128 xmm0,ymm1,1h  ;...c4
;...e3
;...7d
;...39
;...c8
;...01

;...f1
;...fe
;...c8

0x000001f5ca2fa32c: vmovd   xmm0,r10d         ;...c4
;...c1
;...79
;...6e
;...c2

;...f9
;...fe
;...c1

0x000001f5ca2fa335: vmovd   r11d,xmm0         ;...c4
;...c1
;...79
;...7e
;...c3

;...e2
;...5d
;...02
;...c4

;...e2
;...7d
;...02
;...c1

0x000001f5ca2fa344: vextracti128 xmm1,ymm0,1h  ;...c4
;...e3
;...7d
;...39
;...c1
;...01

;...f9
;...fe
;...c1

0x000001f5ca2fa34e: vmovd   xmm1,r11d         ;...c4
;...c1
;...79
;...6e
;...cb

;...f1
;...fe
;...c8

0x000001f5ca2fa357: vmovd   r10d,xmm1         ;...c4
;...c1
;...79
;...7e
;...ca

;...e2
;...65
;...02
;...cb

;...e2
;...75
;...02
;...cf

0x000001f5ca2fa366: vextracti128 xmm7,ymm1,1h  ;...c4
;...e3
;...7d
;...39
;...cf
;...01

;...f1
;...fe
;...cf

0x000001f5ca2fa370: vmovd   xmm7,r10d         ;...c4
;...c1
;...79
;...6e
;...fa

;...c1
;...fe
;...f9

0x000001f5ca2fa379: vmovd   ebx,xmm7          ;...c5
;...f9
;...7e
;...fb
; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@31 (line 65)

;...83
;...c6
;...40
;*iinc {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@33 (line 64)

0x000001f5ca2fa381: cmp     r14d,r9d          ;...45
;...3b
;...f1

0x000001f5ca2fa384: jl      1f5ca2fa200h      ;...0f
;...8c
;...76
;...fe
;...ff
;...ff
;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@20 (line 64)

0x000001f5ca2fa38a: mov     r11d,edi          ;...44
;...8b
;...df

;...83
;...c3
;...f9

0x000001f5ca2fa391: cmp     edi,r11d          ;...41
;...3b
;...fb

0x000001f5ca2fa394: cmovl   r11d,r8d          ;...45
;...0f
;...4c
;...d8

0x000001f5ca2fa398: cmp     r14d,r11d         ;...45
;...3b
;...f3

0x000001f5ca2fa39b: jnl     1f5ca2fa3d5h      ;...7d
;...38

0x000001f5ca2fa39d: nop                       ;...66
;...66
;...90
; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@29 (line 65)

0x000001f5ca2fa3a0: vmovdqu ymm0,ymmword ptr [r13+r14*4+10h]
;...c4
;...81
;...7e
;...6f
;...44
;...b5
;...10

0x000001f5ca2fa3a7: vpmulld ymm1,ymm0,ymm2    ;...c4
;...e2
;...7d
;...40
;...ca

;...e2
;...75
;...02
;...d9

;...e2
;...65
;...02
;...d8

0x000001f5ca2fa3b6: vextracti128 xmm0,ymm3,1h  ;...c4
;...e3
;...7d
;...39
;...d8
;...01

;...e1
;...fe
;...d8

0x000001f5ca2fa3c0: vmovd   xmm0,ebx          ;...c5
;...f9
;...6e
;...c3

;...f9
;...fe
;...c3

0x000001f5ca2fa3c8: vmovd   ebx,xmm0          ;...c5
;...f9
;...7e
;...c3
; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@31 (line 65)

;...83
;...c6
;...08
;*iinc {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@33 (line 64)

0x000001f5ca2fa3d0: cmp     r14d,r11d         ;...45
;...3b
;...f3

0x000001f5ca2fa3d3: jl      1f5ca2fa3a0h      ;...7c
;...cb
;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@20 (line 64)

0x000001f5ca2fa3d5: cmp     r14d,edi          ;...44
;...3b
;...f7

0x000001f5ca2fa3d8: jnl     1f5ca2fa3f0h      ;...7d
;...16

0x000001f5ca2fa3da: nop                       ;...66
;...90
; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@29 (line 65)

0x000001f5ca2fa3dc: mov     r10d,ebp          ;...44
;...8b
;...d5

0x000001f5ca2fa3df: imul    r10d,dword ptr [r13+r14*4+10h]
;...47
;...0f
;...af
;...54
;...b5
;...10

;...03
;...da
; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@31 (line 65)

0x000001f5ca2fa3e8: inc     r14d              ;...41
;...ff
;...c6
;*iinc {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@33 (line 64)

0x000001f5ca2fa3eb: cmp     r14d,edi          ;...44
;...3b
;...f7

0x000001f5ca2fa3ee: jl      1f5ca2fa3dch      ;...7c
;...ec
;*if_icmpge {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@20 (line 64)

0x000001f5ca2fa3f0: cmp     r14d,edi          ;...44
;...3b
;...f7

0x000001f5ca2fa3f3: jnl     1f5ca2fa411h      ;...7d
;...1c

0x000001f5ca2fa3f5: nop                       ;...66
;...66
;...90
; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@29 (line 65)

0x000001f5ca2fa3f8: cmp     r14d,edi          ;...44
;...3b
;...f7

0x000001f5ca2fa3fb: jnb     1f5ca2fa428h      ;...73
;...2b

0x000001f5ca2fa3fd: mov     r11d,ebp          ;...44
;...8b
;...dd

0x000001f5ca2fa400: imul    r11d,dword ptr [r13+r14*4+10h]
;...47
;...0f
;...af
;...5c
;...b5
;...10

;...03
;...db
; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@31 (line 65)

0x000001f5ca2fa409: inc     r14d              ;...41
;...ff
;...c6
;*iinc {reexecute=0 rethrow=0 return_oop=0}
; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@33 (line 64)

0x000001f5ca2fa40c: cmp     r14d,edi          ;...44
;...3b
;...f7

0x000001f5ca2fa40f: jl      1f5ca2fa3f8h      ;...7c
;...e7
; - com.openkappa.simd.scale.Scale::Scale_Int_Dynamic@16 (line 64)

0x000001f5ca2fa411: mov     eax,ebx           ;...8b
;...c3

0x000001f5ca2fa413: vzeroupper                ;...c5
;...f8
;...77

;...83
;...c4
;...50

0x000001f5ca2fa41a: pop     rbp               ;...5d

0x000001f5ca2fa41b: test    dword ptr [1f5b68c0000h],eax
;...85
;...05
;...df
;...5b
;...5c
;...ec
;   {poll_return}
0x000001f5ca2fa421: ret                       ;...c3
```
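The listing is easier to follow once its stages are named. Below is a minimal Java sketch of the shape it appears to have: a scalar pre-loop, a main loop that consumes 64 ints per iteration (eight 256-bit loads, each multiplied by the broadcast multiplier), a smaller vector loop that consumes 8 ints at a time, and a scalar tail. The class name, method names and the `preLoopLimit` parameter are hypothetical, and the source shape of `Scale_Int_Dynamic` is assumed; in the real code the pre-loop limit is computed by C2 itself. This imitates the structure only; it is not what C2 emits and is not expected to be faster.

```java
class ScaleIntShape {

    // The loop as it might be written in source (assumed shape of Scale_Int_Dynamic).
    static int scale(int multiplier, int[] data) {
        int value = 0;
        for (int i = 0; i < data.length; ++i) {
            value += multiplier * data[i];
        }
        return value;
    }

    // Hand-written imitation of the staged structure seen in the listing.
    // preLoopLimit is a hypothetical stand-in for the limit C2 computes itself.
    static int scaleStaged(int multiplier, int[] data, int preLoopLimit) {
        int n = data.length;
        int value = 0;
        int i = 0;

        // Stage 1: scalar pre-loop, clamped to [0, n] like the cmov sequence.
        int pre = Math.max(0, Math.min(preLoopLimit, n));
        for (; i < pre; ++i) {
            value += multiplier * data[i];
        }

        // Stage 2: main loop, 64 ints per iteration, as eight blocks of 8 lanes.
        for (; i <= n - 64; i += 64) {
            for (int block = 0; block < 64; block += 8) {
                value += scaledBlockSum(multiplier, data, i + block);
            }
        }

        // Stage 3: one 8-lane block per iteration.
        for (; i <= n - 8; i += 8) {
            value += scaledBlockSum(multiplier, data, i);
        }

        // Stage 4: scalar tail for whatever is left.
        for (; i < n; ++i) {
            value += multiplier * data[i];
        }
        return value;
    }

    // Stands in for one vector multiply followed by a horizontal reduction.
    private static int scaledBlockSum(int multiplier, int[] data, int offset) {
        int partial = 0;
        for (int lane = 0; lane < 8; ++lane) {
            partial += multiplier * data[offset + lane];
        }
        return partial;
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; ++i) {
            data[i] = i * 31 - 7;
        }
        System.out.println(scale(3, data) == scaleStaged(3, data, 5));
    }
}
```

Regrouping the additions into per-block partial sums is safe here because `int` addition wraps and is associative, so both methods return the same value for any input; the same regrouping applied to `double` data would not be exact.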

There are two lessons to be learnt here. The first is that what you see is not what you get: the machine code C2 emits can bear little resemblance to the loop you wrote. The second is about the correctness of asymptotic analysis. If hierarchical caches render asymptotic analysis bullshit (linear time but cache friendly algorithms can, and do, outperform logarithmic algorithms with cache misses), optimising compilers render the field practically irrelevant.