
cmd/compile: poor register allocator behavior in compression code #16122

Open
@flanglet


Please answer these questions before submitting your issue. Thanks!

  1. What version of Go are you using (go version)?
    go1.6 windows/amd64 and go1.7beta1 windows/amd64
  2. What operating system and processor architecture are you using (go env)?
    set GOARCH=amd64
    set GOBIN=
    set GOEXE=.exe
    set GOHOSTARCH=amd64
    set GOHOSTOS=windows
    set GOOS=windows
    set GOPATH=E:\Users\fred\Documents\Prog\kanzi\go
    set GORACE=
    set GOROOT=E:\Program Files\go
    set GOTOOLDIR=E:\Program Files\go\pkg\tool\windows_amd64
    set GO15VENDOREXPERIMENT=1
    set CC=gcc
    set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0
    set CXX=g++
    set CGO_ENABLED=1
  3. What did you do?
    I ran "go build TestZRLT.go" then "TestZRLT.exe" both for Go 1.6 and 1.7 beta1
    The source code is very simple: https://github.com/flanglet/kanzi/blob/master/go/src/kanzi/test/TestZRLT.go.
    It runs a correctness and a performance test for the Zero Run Length Transform:
    https://github.com/flanglet/kanzi/blob/master/go/src/kanzi/function/ZRLT.go.
  4. What did you expect to see?
    I expected to see no performance regression from 1.6 to 1.7beta1
  5. What did you see instead?
    ZRLT encoding is much faster with 1.7beta1 but decoding is much slower.

Output for 1.6:
Speed test
Iterations: 50000

ZRLT encoding [ms]: 10694
Throughput [MB/s]: 222
ZRLT decoding [ms]: 7419
Throughput [MB/s]: 321

ZRLT encoding [ms]: 10753
Throughput [MB/s]: 221
ZRLT decoding [ms]: 7472
Throughput [MB/s]: 319

ZRLT encoding [ms]: 10724
Throughput [MB/s]: 222
ZRLT decoding [ms]: 7393
Throughput [MB/s]: 322

Output for 1.7beta1:
Speed test
Iterations: 50000

ZRLT encoding [ms]: 6834
Throughput [MB/s]: 348
ZRLT decoding [ms]: 11560
Throughput [MB/s]: 206

ZRLT encoding [ms]: 6828
Throughput [MB/s]: 349
ZRLT decoding [ms]: 11589
Throughput [MB/s]: 205

ZRLT encoding [ms]: 6790
Throughput [MB/s]: 351
ZRLT decoding [ms]: 11558
Throughput [MB/s]: 206
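To put numbers on the change, the relative difference in elapsed time can be computed from the first run of each output above. This is only a minimal sketch; the helper name pctChange is hypothetical and not part of the test program.

```go
package main

import "fmt"

// pctChange returns the relative change from oldMs to newMs in percent.
// Negative values mean the newer build took less time.
// (Hypothetical helper, used only for this illustration.)
func pctChange(oldMs, newMs float64) float64 {
	return (newMs - oldMs) / oldMs * 100
}

func main() {
	// Elapsed times in ms, copied from the first run of each output above.
	fmt.Printf("encoding: %+.1f%%\n", pctChange(10694, 6834))
	fmt.Printf("decoding: %+.1f%%\n", pctChange(7419, 11560))
}
```

With these figures, encoding time drops by roughly 36% while decoding time grows by roughly 56%.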

I narrowed down the issue to the run length decoding loop:

for val <= 1 {
    runLength = (runLength << 1) | int(val)
    srcIdx++

    if srcIdx >= srcEnd {
        break
    }

    val = src[srcIdx]
}

If I replace 'for val <= 1 {' with 'for val&1 == val {', decoding becomes much faster (although still not as fast as with Go 1.6).
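The two conditions select the same values when val is a byte: only 0 and 1 satisfy either one. The sketch below wraps the loop in two standalone functions so the variants can be compared side by side. The function names, the signature, and the seed parameter are assumptions made for illustration and are not taken from ZRLT.go.

```go
package main

import "fmt"

// runLE wraps the loop from the issue with the original condition
// `val <= 1`. seed is the initial run length and srcIdx must be a
// valid index into src. (Hypothetical wrapper, not the ZRLT.go code.)
func runLE(src []byte, srcIdx, seed int) (runLength, next int) {
	srcEnd := len(src)
	runLength = seed
	val := src[srcIdx]
	for val <= 1 {
		runLength = (runLength << 1) | int(val)
		srcIdx++
		if srcIdx >= srcEnd {
			break
		}
		val = src[srcIdx]
	}
	return runLength, srcIdx
}

// runMask is the same loop with the rewritten condition `val&1 == val`.
func runMask(src []byte, srcIdx, seed int) (runLength, next int) {
	srcEnd := len(src)
	runLength = seed
	val := src[srcIdx]
	for val&1 == val {
		runLength = (runLength << 1) | int(val)
		srcIdx++
		if srcIdx >= srcEnd {
			break
		}
		val = src[srcIdx]
	}
	return runLength, srcIdx
}

func main() {
	// Check that both conditions agree for every possible byte value.
	for v := 0; v < 256; v++ {
		b := byte(v)
		if (b <= 1) != (b&1 == b) {
			fmt.Println("conditions differ for", v)
			return
		}
	}
	fmt.Println("conditions agree for all byte values")

	// Run both variants on a small made-up input.
	src := []byte{1, 0, 1, 7, 0}
	fmt.Println(runLE(src, 0, 1))
	fmt.Println(runMask(src, 0, 1))
}
```

Since both variants compute the same result, any difference in speed comes from the code the compiler generates for each condition, not from the algorithm.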

Output for 1.7beta1 with code change:

Speed test
Iterations: 50000

ZRLT encoding [ms]: 6800
Throughput [MB/s]: 350
ZRLT decoding [ms]: 7669
Throughput [MB/s]: 310

ZRLT encoding [ms]: 6813
Throughput [MB/s]: 349
ZRLT decoding [ms]: 7689
Throughput [MB/s]: 310

ZRLT encoding [ms]: 6775
Throughput [MB/s]: 351
ZRLT decoding [ms]: 7662
Throughput [MB/s]: 311

Labels

    NeedsFix: The path to resolution is known, but the work has not been done.
    Performance
    compiler/runtime: Issues related to the Go compiler and/or runtime.
