Description
Go version
go version go1.23.0 darwin/arm64
Output of go env in your module/workspace:
Nics-MacBook-Pro:teleport nic$ go env
GO111MODULE=''
GOARCH='arm64'
GOBIN='/Users/nic/go/bin'
GOCACHE='/Users/nic/Library/Caches/go-build'
GOENV='/Users/nic/Library/Application Support/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='arm64'
GOHOSTOS='darwin'
GOINSECURE=''
GOMODCACHE='/Users/nic/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='darwin'
GOPATH='/Users/nic/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/Users/nic/go/pkg/mod/golang.org/[email protected]'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/Users/nic/go/pkg/mod/golang.org/[email protected]/pkg/tool/darwin_arm64'
GOVCS=''
GOVERSION='go1.23.0'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/Users/nic/Library/Application Support/go/telemetry'
GCCGO='gccgo'
GOARM64='v8.0'
AR='ar'
CC='clang'
CXX='clang++'
CGO_ENABLED='1'
GOMOD='/Users/nic/src/teleport/go.mod'
GOWORK='/Users/nic/src/teleport/go.work'
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -arch arm64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -ffile-prefix-map=/var/folders/x_/33kp49n52rg8hs97k5m96rnw0000gn/T/go-build289495391=/tmp/go-build -gno-record-gcc-switches -fno-common'
What did you do?
One of our unit tests seems to start a transaction concurrently with closing the database; this sometimes causes a panic in database/sql.(*connRequestSet).deleteIndex (src/database/sql/sql.go line 3644 at commit a4cb37d).
I haven't been able to reproduce this with db.Conn and db.Close calls yet, but I do have a unit test in #68953 using database/sql.(*connRequestSet).CloseAndRemoveAll and database/sql.(*connRequestSet).Delete that reliably reproduces the panic.
Update: reproduced with exported methods on database/sql.DB here: https://go.dev/play/p/fHZFq6zgAV4
What did you see happen?
This looks like a race condition in database/sql.(*DB).conn:

- In (*DB).conn, delHandle is set to the result of db.connRequests.Add(req), and then db.mu is unlocked.
- A concurrent call to db.Close calls db.connRequests.CloseAndRemoveAll.
- (*DB).conn re-locks db.mu and calls db.connRequests.Delete(delHandle).
- delHandle.idx is positive, and no bounds checking is done before trying to index into the slice, which was set to nil by (*connRequestSet).CloseAndRemoveAll, resulting in the panic.
Here's a full stack trace (from https://github.com/gravitational/teleport/actions/runs/10427126860/job/28881284822):
panic: runtime error: index out of range [0] with length 0
goroutine 5494 [running]:
database/sql.(*connRequestSet).deleteIndex(...)
/opt/go/src/database/sql/sql.go:3644
database/sql.(*connRequestSet).Delete(...)
/opt/go/src/database/sql/sql.go:3638
database/sql.(*DB).conn(0xc001569380, {0x15fa19c0, 0xc0031d4000}, 0x1)
/opt/go/src/database/sql/sql.go:1366 +0x1625
database/sql.(*DB).begin(0xc001569380, {0x15fa19c0, 0xc0031d4000}, 0x0, 0x1)
/opt/go/src/database/sql/sql.go:1883 +0x57
database/sql.(*DB).BeginTx.func1(0x1)
/opt/go/src/database/sql/sql.go:1866 +0x7a
database/sql.(*DB).retry(0x7fa4fef225c8?, 0xc002c0b2c8)
/opt/go/src/database/sql/sql.go:1568 +0x4b
database/sql.(*DB).BeginTx(0xc001569380, {0x15fa19c0, 0xc0031d4000}, 0x0)
/opt/go/src/database/sql/sql.go:1865 +0x108
github.com/gravitational/teleport/lib/backend/lite.(*Backend).inTransaction(0xc0033cde40, {0x15fa19c0, 0xc0031d4000}, 0xc002c0b498)
/__w/teleport/teleport/lib/backend/lite/lite.go:949 +0x155
github.com/gravitational/teleport/lib/backend/lite.(*Backend).Get(0xc0033cde40, {0x15fa19c0, 0xc0031d4000}, {0xc003a9d440, 0x13, 0x18})
/__w/teleport/teleport/lib/backend/lite/lite.go:598 +0x1e9
github.com/gravitational/teleport/lib/auth/storage.(*ProcessStorage).GetState(0xc0000575c0, {0x15fa19c0, 0xc0031d4000}, {0x14ac87f5, 0x5})
/__w/teleport/teleport/lib/auth/storage/storage.go:91 +0x264
github.com/gravitational/teleport/lib/service.(*TeleportProcess).initAuthService.func5()
/__w/teleport/teleport/lib/service/service.go:2486 +0x2db
github.com/gravitational/teleport/lib/srv.(*Heartbeat).fetch(0xc002010ea0)
/__w/teleport/teleport/lib/srv/heartbeat.go:339 +0x56
github.com/gravitational/teleport/lib/srv.(*Heartbeat).fetchAndAnnounce(0xc002010ea0)
/__w/teleport/teleport/lib/srv/heartbeat.go:624 +0x2f
github.com/gravitational/teleport/lib/srv.(*Heartbeat).Run(0xc002010ea0)
/__w/teleport/teleport/lib/srv/heartbeat.go:282 +0xe8
github.com/gravitational/teleport/lib/service.(*LocalService).Serve(0xc0022c37c0)
/__w/teleport/teleport/lib/service/supervisor.go:588 +0x36
github.com/gravitational/teleport/lib/service.(*LocalSupervisor).serve.func1()
/__w/teleport/teleport/lib/service/supervisor.go:313 +0x489
created by github.com/gravitational/teleport/lib/service.(*LocalSupervisor).serve in goroutine 4831
/__w/teleport/teleport/lib/service/supervisor.go:302 +0x105
What did you expect to see?
I expected no panic. sql.DB is advertised as "safe for concurrent use by multiple goroutines".