Description
Reviewing the following code:
and the generated assembly:
varint_bench_test.go:14 0x10f07ee 488b4c2430 MOVQ 0x30(SP), CX
varint_bench_test.go:14 0x10f07f3 488b5118 MOVQ 0x18(CX), DX
...
varint_bench_test.go:14 0x10f0807 ffd2 CALL DX
TEXT command-line-arguments.WriteUvarint(SB) /Users/robertengels/gotest/gotest/varint_bench_test.go varint_bench_test.go:12 0x10f07b0 65488b0c2530000000 MOVQ GS:0x30, CX varint_bench_test.go:12 0x10f07b9 483b6110 CMPQ 0x10(CX), SP varint_bench_test.go:12 0x10f07bd 0f869f000000 JBE 0x10f0862 varint_bench_test.go:12 0x10f07c3 4883ec28 SUBQ $0x28, SP varint_bench_test.go:12 0x10f07c7 48896c2420 MOVQ BP, 0x20(SP) varint_bench_test.go:12 0x10f07cc 488d6c2420 LEAQ 0x20(SP), BP varint_bench_test.go:13 0x10f07d1 488b442440 MOVQ 0x40(SP), AX varint_bench_test.go:13 0x10f07d6 eb09 JMP 0x10f07e1 varint_bench_test.go:18 0x10f07d8 488b442440 MOVQ 0x40(SP), AX varint_bench_test.go:18 0x10f07dd 48c1e807 SHRQ $0x7, AX varint_bench_test.go:13 0x10f07e1 483d80000000 CMPQ $0x80, AX varint_bench_test.go:13 0x10f07e7 7243 JB 0x10f082c varint_bench_test.go:13 0x10f07e9 4889442440 MOVQ AX, 0x40(SP) varint_bench_test.go:14 0x10f07ee 488b4c2430 MOVQ 0x30(SP), CX varint_bench_test.go:14 0x10f07f3 488b5118 MOVQ 0x18(CX), DX varint_bench_test.go:14 0x10f07f7 83c880 ORL $-0x80, AX varint_bench_test.go:14 0x10f07fa 88442408 MOVB AL, 0x8(SP) varint_bench_test.go:14 0x10f07fe 488b442438 MOVQ 0x38(SP), AX varint_bench_test.go:14 0x10f0803 48890424 MOVQ AX, 0(SP) varint_bench_test.go:14 0x10f0807 ffd2 CALL DX varint_bench_test.go:14 0x10f0809 488b442418 MOVQ 0x18(SP), AX varint_bench_test.go:14 0x10f080e 488b4c2410 MOVQ 0x10(SP), CX varint_bench_test.go:15 0x10f0813 4885c9 TESTQ CX, CX varint_bench_test.go:15 0x10f0816 74c0 JE 0x10f07d8 varint_bench_test.go:16 0x10f0818 48894c2448 MOVQ CX, 0x48(SP) varint_bench_test.go:16 0x10f081d 4889442450 MOVQ AX, 0x50(SP) varint_bench_test.go:16 0x10f0822 488b6c2420 MOVQ 0x20(SP), BP varint_bench_test.go:16 0x10f0827 4883c428 ADDQ $0x28, SP varint_bench_test.go:16 0x10f082b c3 RET varint_bench_test.go:20 0x10f082c 488b4c2430 MOVQ 0x30(SP), CX varint_bench_test.go:20 0x10f0831 488b4918 MOVQ 0x18(CX), CX varint_bench_test.go:20 0x10f0835 88442408 MOVB AL, 0x8(SP) varint_bench_test.go:20 0x10f0839 488b442438 MOVQ 0x38(SP), AX varint_bench_test.go:20 0x10f083e 48890424 MOVQ AX, 0(SP) varint_bench_test.go:20 0x10f0842 ffd1 CALL CX varint_bench_test.go:20 0x10f0844 488b442418 MOVQ 0x18(SP), AX varint_bench_test.go:20 0x10f0849 488b4c2410 MOVQ 0x10(SP), CX varint_bench_test.go:20 0x10f084e 48894c2448 MOVQ CX, 0x48(SP) varint_bench_test.go:20 0x10f0853 4889442450 MOVQ AX, 0x50(SP) varint_bench_test.go:20 0x10f0858 488b6c2420 MOVQ 0x20(SP), BP varint_bench_test.go:20 0x10f085d 4883c428 ADDQ $0x28, SP varint_bench_test.go:20 0x10f0861 c3 RET varint_bench_test.go:12 0x10f0862 e89948f6ff CALL runtime.morestack_noctxt(SB) varint_bench_test.go:12 0x10f0867 e944ffffff JMP command-line-arguments.WriteUvarint(SB) :-1 0x10f086c cc INT $0x3 :-1 0x10f086d cc INT $0x3 :-1 0x10f086e cc INT $0x3 :-1 0x10f086f cc INT $0x3
The generated code loads the interface address using double indirection in every loop invocation, and every call (line 14 & 20).
The compiler could easily generate optimized code where the DX is loaded once and used for every interface call, as w is constant in the method.
It is my opinion that loops like this are very common in typical Go code and deserve more optimization attention. As an example, issue #29010 makes specific reference to not using interfaces as call sites due to their inefficiency.
At a minimum the call address could be placed in a stack local avoiding one indirection.
A more advanced change might be to reserve a couple of general purpose registers for the hot interface call address and object reference (r10/r11) and so push/pop r10/r11 on entry/exit for those routines using the optimization.
Issue #18597 has some overlap with this.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status