cmd/compile: improve interface dispatch performance

Reviewing the following code:

![image](https://user-images.githubusercontent.com/4958833/50028158-5597ad80-ffb4-11e8-97f3-8223c4bb1e9a.png)

and the generated assembly:

<details><summary>
<pre>
varint_bench_test.go:14       0x10f07ee               488b4c2430              MOVQ 0x30(SP), CX
varint_bench_test.go:14       0x10f07f3               488b5118                MOVQ 0x18(CX), DX
...
varint_bench_test.go:14       0x10f0807               ffd2                    CALL DX
 </pre>
</summary>
<pre>
TEXT command-line-arguments.WriteUvarint(SB) /Users/robertengels/gotest/gotest/varint_bench_test.go
  varint_bench_test.go:12       0x10f07b0               65488b0c2530000000      MOVQ GS:0x30, CX
  varint_bench_test.go:12       0x10f07b9               483b6110                CMPQ 0x10(CX), SP
  varint_bench_test.go:12       0x10f07bd               0f869f000000            JBE 0x10f0862
  varint_bench_test.go:12       0x10f07c3               4883ec28                SUBQ $0x28, SP
  varint_bench_test.go:12       0x10f07c7               48896c2420              MOVQ BP, 0x20(SP)
  varint_bench_test.go:12       0x10f07cc               488d6c2420              LEAQ 0x20(SP), BP
  varint_bench_test.go:13       0x10f07d1               488b442440              MOVQ 0x40(SP), AX
  varint_bench_test.go:13       0x10f07d6               eb09                    JMP 0x10f07e1
  varint_bench_test.go:18       0x10f07d8               488b442440              MOVQ 0x40(SP), AX
  varint_bench_test.go:18       0x10f07dd               48c1e807                SHRQ $0x7, AX
  varint_bench_test.go:13       0x10f07e1               483d80000000            CMPQ $0x80, AX
  varint_bench_test.go:13       0x10f07e7               7243                    JB 0x10f082c
  varint_bench_test.go:13       0x10f07e9               4889442440              MOVQ AX, 0x40(SP)
  varint_bench_test.go:14       0x10f07ee               488b4c2430              MOVQ 0x30(SP), CX
  varint_bench_test.go:14       0x10f07f3               488b5118                MOVQ 0x18(CX), DX
  varint_bench_test.go:14       0x10f07f7               83c880                  ORL $-0x80, AX
  varint_bench_test.go:14       0x10f07fa               88442408                MOVB AL, 0x8(SP)
  varint_bench_test.go:14       0x10f07fe               488b442438              MOVQ 0x38(SP), AX
  varint_bench_test.go:14       0x10f0803               48890424                MOVQ AX, 0(SP)
  varint_bench_test.go:14       0x10f0807               ffd2                    CALL DX
  varint_bench_test.go:14       0x10f0809               488b442418              MOVQ 0x18(SP), AX
  varint_bench_test.go:14       0x10f080e               488b4c2410              MOVQ 0x10(SP), CX
  varint_bench_test.go:15       0x10f0813               4885c9                  TESTQ CX, CX
  varint_bench_test.go:15       0x10f0816               74c0                    JE 0x10f07d8
  varint_bench_test.go:16       0x10f0818               48894c2448              MOVQ CX, 0x48(SP)
  varint_bench_test.go:16       0x10f081d               4889442450              MOVQ AX, 0x50(SP)
  varint_bench_test.go:16       0x10f0822               488b6c2420              MOVQ 0x20(SP), BP
  varint_bench_test.go:16       0x10f0827               4883c428                ADDQ $0x28, SP
  varint_bench_test.go:16       0x10f082b               c3                      RET
  varint_bench_test.go:20       0x10f082c               488b4c2430              MOVQ 0x30(SP), CX
  varint_bench_test.go:20       0x10f0831               488b4918                MOVQ 0x18(CX), CX
  varint_bench_test.go:20       0x10f0835               88442408                MOVB AL, 0x8(SP)
  varint_bench_test.go:20       0x10f0839               488b442438              MOVQ 0x38(SP), AX
  varint_bench_test.go:20       0x10f083e               48890424                MOVQ AX, 0(SP)
  varint_bench_test.go:20       0x10f0842               ffd1                    CALL CX
  varint_bench_test.go:20       0x10f0844               488b442418              MOVQ 0x18(SP), AX
  varint_bench_test.go:20       0x10f0849               488b4c2410              MOVQ 0x10(SP), CX
  varint_bench_test.go:20       0x10f084e               48894c2448              MOVQ CX, 0x48(SP)
  varint_bench_test.go:20       0x10f0853               4889442450              MOVQ AX, 0x50(SP)
  varint_bench_test.go:20       0x10f0858               488b6c2420              MOVQ 0x20(SP), BP
  varint_bench_test.go:20       0x10f085d               4883c428                ADDQ $0x28, SP
  varint_bench_test.go:20       0x10f0861               c3                      RET
  varint_bench_test.go:12       0x10f0862               e89948f6ff              CALL runtime.morestack_noctxt(SB)
  varint_bench_test.go:12       0x10f0867               e944ffffff              JMP command-line-arguments.WriteUvarint(SB)
  :-1                           0x10f086c               cc                      INT $0x3
  :-1                           0x10f086d               cc                      INT $0x3
  :-1                           0x10f086e               cc                      INT $0x3
  :-1                           0x10f086f               cc                      INT $0x3
</pre>
</details>

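The generated code reloads the interface method address through two dependent loads (the itab pointer from `0x30(SP)`, then the function pointer at `0x18(CX)`) on every loop iteration and before every call (source lines 14 and 20).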

The compiler could easily generate optimized code in which DX is loaded once and reused for every interface call, since `w` is never reassigned in the function.
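Because the source above is only available as a screenshot, here is a minimal sketch of what the function probably looks like, inferred from the line numbers and instructions in the disassembly. The `io.ByteWriter` parameter type and the exact loop body are assumptions, not the original code. The second function, `writeUvarintHoisted` (a hypothetical name), binds the method value once before the loop as a source-level approximation of the requested optimization. It is not claimed to produce better machine code, since a method value is called through a closure and may itself allocate or add an indirection.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// WriteUvarint is a reconstruction inferred from the disassembly,
// not the original source. Assumption: w is an io.ByteWriter.
func WriteUvarint(w io.ByteWriter, x uint64) error {
	for x >= 0x80 {
		// Interface call inside the loop (line 14 in the listing).
		if err := w.WriteByte(byte(x) | 0x80); err != nil {
			return err
		}
		x >>= 7
	}
	// Final interface call (line 20 in the listing).
	return w.WriteByte(byte(x))
}

// writeUvarintHoisted (hypothetical) binds the method value once, so the
// interface method lookup happens a single time at the source level.
func writeUvarintHoisted(w io.ByteWriter, x uint64) error {
	writeByte := w.WriteByte
	for x >= 0x80 {
		if err := writeByte(byte(x) | 0x80); err != nil {
			return err
		}
		x >>= 7
	}
	return writeByte(byte(x))
}

// encodeUvarint and encodeUvarintHoisted are small helpers for comparing
// the two variants; bytes.Buffer never returns an error from WriteByte.
func encodeUvarint(x uint64) []byte {
	var buf bytes.Buffer
	_ = WriteUvarint(&buf, x)
	return buf.Bytes()
}

func encodeUvarintHoisted(x uint64) []byte {
	var buf bytes.Buffer
	_ = writeUvarintHoisted(&buf, x)
	return buf.Bytes()
}

func main() {
	fmt.Printf("% x\n", encodeUvarint(300))        // prints "ac 02"
	fmt.Printf("% x\n", encodeUvarintHoisted(300)) // prints "ac 02"
}
```

Benchmarking the two variants and comparing their `go tool objdump` output would show whether the manual hoist changes the generated code on a given Go version.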

In my opinion, loops like this are very common in typical Go code and deserve more optimization attention. As an example, issue #29010 specifically mentions avoiding interfaces at call sites because of their inefficiency.

At a minimum, the call address could be kept in a stack local, avoiding one indirection.

A more advanced change might be to reserve two general-purpose registers (r10/r11) for the hot interface call address and the object reference, and to push/pop r10/r11 on entry/exit in the routines that use the optimization.

Issue #18597 has some overlap with this. 

cmd/compile: improve interface dispatch performance #29276
