
cmd/compile: inefficient CALL setup when more than 32 bytes of args #23377

Open
@ALTree

Description

$ gotip version
go version devel +a62071a209 Sat Jan 6 04:52:00 2018 +0000 linux/amd64

package p

type T struct {
	s1, s2 string
}

//go:noinline
func foo(t T) { _ = t }

func bar() {
	var t T
	foo(t)
}

generates

0x0020 00032 (test.go:14)	MOVUPS	X0, (SP)
0x0024 00036 (test.go:14)	MOVUPS	X0, 16(SP)
0x0029 00041 (test.go:14)	CALL	"".foo(SB)

but when T has one more field:

type T struct {
	s1, s2, s3 string // one more string
}

the call setup becomes:
0x001d 00029 (test.go:13)	XORPS	X0, X0
0x0020 00032 (test.go:13)	MOVUPS	X0, "".t+48(SP)
0x0025 00037 (test.go:13)	MOVUPS	X0, "".t+64(SP)
0x002a 00042 (test.go:13)	MOVUPS	X0, "".t+80(SP)
0x002f 00047 (test.go:13)	MOVQ	SP, DI
0x0032 00050 (test.go:14)	LEAQ	"".t+48(SP), SI
0x0037 00055 (test.go:14)	DUFFCOPY	$854
0x004a 00074 (test.go:14)	CALL	"".foo(SB)

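The stack frame is bigger: first three MOVUPS store zeros to a temporary copy of t at 48(SP), 64(SP) and 80(SP), then DUFFCOPY copies those 48 bytes again to the argument area at (SP). This seems wasteful. Even if the argument size crosses the threshold at which the compiler switches from a sequence of MOVs to a Duff's device, it should be possible to zero the arguments directly at (SP) (for example with DUFFZERO), which is essentially what the first snippet does.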
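A small sketch of the arithmetic behind the two listings, assuming a 64-bit platform and the amd64 duffcopy layout of the time (64 identical blocks, each 14 bytes of code that copy 16 bytes of data; see runtime/duff_amd64.s). A string header is a pointer plus a length (16 bytes), so the two-string T is 32 bytes and the three-string T is 48 bytes. Entering duffcopy at 14*(64 - 48/16) = 854 runs only the last three blocks, which matches the DUFFCOPY $854 in the second listing. The names T2, T3, duffcopyOffset and argSizes are hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

// T2 and T3 are hypothetical names for the two- and three-string versions of T.
type T2 struct{ s1, s2 string }
type T3 struct{ s1, s2, s3 string }

// Assumed layout of duffcopy on amd64: 64 identical blocks, each one
//	MOVUPS (SI), X0; ADDQ $16, SI; MOVUPS X0, (DI); ADDQ $16, DI
// which encodes to 14 bytes of machine code and copies 16 bytes of data.
const (
	duffBlocks    = 64
	duffCodeBytes = 14
	duffDataBytes = 16
)

// duffcopyOffset returns the entry offset into duffcopy that copies n bytes:
// skipping the first 64 - n/16 blocks leaves exactly n/16 blocks to execute.
func duffcopyOffset(n int) int {
	if n <= 0 || n%duffDataBytes != 0 || n > duffBlocks*duffDataBytes {
		panic(fmt.Sprintf("duffcopyOffset: unsupported size %d", n))
	}
	return duffCodeBytes * (duffBlocks - n/duffDataBytes)
}

// argSizes returns the sizes in bytes of the two- and three-string structs.
func argSizes() (uintptr, uintptr) {
	return unsafe.Sizeof(T2{}), unsafe.Sizeof(T3{})
}

func main() {
	s2, s3 := argSizes()
	fmt.Println("sizeof(T2) =", s2) // 32 on 64-bit platforms
	fmt.Println("sizeof(T3) =", s3) // 48 on 64-bit platforms
	fmt.Println("DUFFCOPY offset for T3:", duffcopyOffset(int(s3)))
}
```

Only multiples of 16 are accepted, since each duffcopy block moves exactly 16 bytes.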

This also happens when no zeroing is involved. For example, for struct { a, b, c, d int64 } initialized as t = {1, 2, 3, 4}, the values are stored directly to (SP); but for struct { a, b, c, d, e int64 }, which is larger than 32 bytes (40 bytes), they are not: five MOVs store the values to a temporary higher up in the frame, and then a DUFFCOPY call copies them to (SP).
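The non-zero case can be reproduced with a program along these lines, assuming a 64-bit platform; S4, S5, use4, use5, call4, call5 and argSizes are hypothetical names. S4 is 32 bytes and S5 is 40 bytes, so only S5 crosses the 32-byte limit. The //go:noinline directives keep the calls in place so their argument setup stays visible. Building it with `go build -gcflags=-S` and comparing call4 with call5 should show the difference described above, though the exact listing depends on the Go version.

```go
package main

import (
	"fmt"
	"unsafe"
)

// S4 fits within 32 bytes; S5 is one int64 (8 bytes) larger.
type S4 struct{ a, b, c, d int64 }
type S5 struct{ a, b, c, d, e int64 }

//go:noinline
func use4(s S4) { _ = s }

//go:noinline
func use5(s S5) { _ = s }

// call4 passes a 32-byte argument built from constants.
func call4() {
	t := S4{1, 2, 3, 4}
	use4(t)
}

// call5 passes a 40-byte argument built from constants.
func call5() {
	t := S5{1, 2, 3, 4, 5}
	use5(t)
}

// argSizes returns the sizes of S4 and S5 in bytes.
func argSizes() (uintptr, uintptr) {
	return unsafe.Sizeof(S4{}), unsafe.Sizeof(S5{})
}

func main() {
	call4()
	call5()
	s4, s5 := argSizes()
	fmt.Println(s4, s5) // 32 40 on 64-bit platforms
}
```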
