Description
Please answer these questions before submitting your issue. Thanks!
- What version of Go are you using (go version)?
go1.6 windows/amd64 and go1.7beta1 windows/amd64
- What operating system and processor architecture are you using (go env)?
set GOARCH=amd64
set GOBIN=
set GOEXE=.exe
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
set GOPATH=E:\Users\fred\Documents\Prog\kanzi\go
set GORACE=
set GOROOT=E:\Program Files\go
set GOTOOLDIR=E:\Program Files\go\pkg\tool\windows_amd64
set GO15VENDOREXPERIMENT=1
set CC=gcc
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0
set CXX=g++
set CGO_ENABLED=1
- What did you do?
I ran "go build TestZRLT.go" and then "TestZRLT.exe", once with Go 1.6 and once with Go 1.7beta1.
The source code is very simple: https://github.com/flanglet/kanzi/blob/master/go/src/kanzi/test/TestZRLT.go.
It runs a correctness test and a performance test for the Zero Run Length Transform:
https://github.com/flanglet/kanzi/blob/master/go/src/kanzi/function/ZRLT.go.
- What did you expect to see?
I expected to see no performance regression from Go 1.6 to Go 1.7beta1.
- What did you see instead?
ZRLT encoding is much faster with 1.7beta1 (about 6.8 s instead of 10.7 s for 50000 iterations), but decoding is much slower (about 11.6 s instead of 7.4 s).
Output for 1.6:
Speed test
Iterations: 50000
ZRLT encoding [ms]: 10694
Throughput [MB/s]: 222
ZRLT decoding [ms]: 7419
Throughput [MB/s]: 321
ZRLT encoding [ms]: 10753
Throughput [MB/s]: 221
ZRLT decoding [ms]: 7472
Throughput [MB/s]: 319
ZRLT encoding [ms]: 10724
Throughput [MB/s]: 222
ZRLT decoding [ms]: 7393
Throughput [MB/s]: 322
Output for 1.7beta1:
Speed test
Iterations: 50000
ZRLT encoding [ms]: 6834
Throughput [MB/s]: 348
ZRLT decoding [ms]: 11560
Throughput [MB/s]: 206
ZRLT encoding [ms]: 6828
Throughput [MB/s]: 349
ZRLT decoding [ms]: 11589
Throughput [MB/s]: 205
ZRLT encoding [ms]: 6790
Throughput [MB/s]: 351
ZRLT decoding [ms]: 11558
Throughput [MB/s]: 206
I narrowed down the issue to the run length decoding loop:
```go
for val <= 1 {
	runLength = (runLength << 1) | int(val)
	srcIdx++
	if srcIdx >= srcEnd {
		break
	}
	val = src[srcIdx]
}
```
If I replace 'for val <= 1 {' with 'for val&1 == val {', decoding becomes much faster, although still not as fast as with Go 1.6 (about 7.7 s instead of 7.4 s).
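Assuming val is a byte (it is assigned from src[srcIdx] and converted with int(val), which suggests src is a []byte), both conditions accept exactly the values 0 and 1, so the rewrite should not change what the loop computes; presumably only the code the compiler generates for the comparison differs. An exhaustive check over all 256 byte values (countMismatches is an illustrative name):

```go
package main

import "fmt"

// countMismatches compares the two loop conditions from the issue for every
// possible byte value and returns how many values they disagree on.
func countMismatches() int {
	mismatches := 0
	for i := 0; i < 256; i++ {
		val := byte(i)
		// In Go, & binds tighter than ==, so this is (val&1) == val.
		if (val <= 1) != (val&1 == val) {
			mismatches++
		}
	}
	return mismatches
}

func main() {
	fmt.Println("mismatches:", countMismatches()) // mismatches: 0
}
```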
Output for 1.7beta1 with the code change:
Speed test
Iterations: 50000
ZRLT encoding [ms]: 6800
Throughput [MB/s]: 350
ZRLT decoding [ms]: 7669
Throughput [MB/s]: 310
ZRLT encoding [ms]: 6813
Throughput [MB/s]: 349
ZRLT decoding [ms]: 7689
Throughput [MB/s]: 310
ZRLT encoding [ms]: 6775
Throughput [MB/s]: 351
ZRLT decoding [ms]: 7662
Throughput [MB/s]: 311
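To compare the two loop forms under each toolchain (for example go1.6 and go1.7beta1) without the rest of kanzi, a small stand-alone timing harness could help. This is a sketch only: the input pattern, buffer size, iteration count and helper names (sumRunsLE, sumRunsMask, makeInput, timeIt) are made up for illustration, it times only the run length scanning loop rather than the full ZRLT decoder, and it ignores any literal escaping. Timings will vary by Go version and CPU.

```go
package main

import (
	"fmt"
	"time"
)

// sumRunsLE scans src with the original condition (val <= 1) and returns the
// sum of decoded run lengths; bytes > 1 are treated as literals and skipped.
func sumRunsLE(src []byte) int {
	total := 0
	srcIdx, srcEnd := 0, len(src)
	for srcIdx < srcEnd {
		val := src[srcIdx]
		runLength := 1
		for val <= 1 {
			runLength = (runLength << 1) | int(val)
			srcIdx++
			if srcIdx >= srcEnd {
				break
			}
			val = src[srcIdx]
		}
		total += runLength - 1
		srcIdx++ // skip the literal that ended the run (or step past the end)
	}
	return total
}

// sumRunsMask is identical except for the rewritten condition (val&1 == val).
func sumRunsMask(src []byte) int {
	total := 0
	srcIdx, srcEnd := 0, len(src)
	for srcIdx < srcEnd {
		val := src[srcIdx]
		runLength := 1
		for val&1 == val {
			runLength = (runLength << 1) | int(val)
			srcIdx++
			if srcIdx >= srcEnd {
				break
			}
			val = src[srcIdx]
		}
		total += runLength - 1
		srcIdx++
	}
	return total
}

// makeInput builds a hypothetical test buffer by repeating {0, 1, 1, 7}:
// three run digits (a run of 10 zeros) followed by one literal.
func makeInput(n int) []byte {
	pattern := []byte{0, 1, 1, 7}
	buf := make([]byte, n)
	for i := range buf {
		buf[i] = pattern[i%len(pattern)]
	}
	return buf
}

// timeIt runs f over data iters times and prints the elapsed wall time.
// The checksum is printed so the work cannot be optimized away.
func timeIt(name string, f func([]byte) int, data []byte, iters int) {
	start := time.Now()
	sink := 0
	for i := 0; i < iters; i++ {
		sink += f(data)
	}
	fmt.Printf("%-14s %v (checksum %d)\n", name, time.Since(start), sink)
}

func main() {
	data := makeInput(1 << 16)
	timeIt("val <= 1", sumRunsLE, data, 2000)
	timeIt("val&1 == val", sumRunsMask, data, 2000)
}
```

Both functions should report the same checksum; only the elapsed times are expected to differ between conditions and compiler versions.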