runtime: SIGSEGV on nil pointer in mheap.freeManual #73628
Labels
compiler/runtime: Issues related to the Go compiler and/or runtime.
WaitingForInfo: Issue is not actionable because of missing required information, which needs to be provided.
Go version
1.23.8
Output of go env in your module/workspace:
What did you do?
We have been seeing this SIGSEGV happen sporadically across a large number of high-load services. It is rare enough that I have very low confidence I will be able to narrow it down to a good repro.
We have probably 40+ instances of our backend service running across 10+ large AMD EPYC servers with lots of ECC RAM. The services run in KVM on top of a fully updated Proxmox environment. Instances of the crashing backend service range from 4 vCPUs with 16 GB of RAM to 16 vCPUs with 128 GB of RAM.
We saw the crash start happening when we transitioned to the 1.23 runtime. It has happened on most of our physical hosts. None of the hosts are reporting memory problems, and prior to deployment we ran a memtest and a stress test on all of them. The service typically runs for over a month before we see this SIGSEGV, and we never see it on our low-load instances, only on high-load ones (more GC activity, so that isn't unexpected).
The backing service makes extensive use of mmap for large disk-backed data structures.
We just moved our builds over to the 1.24.X runtime, but we haven't run them in production for long enough to see if the crash goes away. I can see that the mheap code is wildly different in 1.24 vs 1.23.
What did you see happen?
Crash with the following backtrace (happens in exactly the same spot every time):
Digging in, it is pretty clear that we are getting a nil pointer from runtime.spanOf, which then causes the crash when runtime.freeManual attempts to assign s.needzero = 1.
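To make the suspected failure mode concrete, here is a minimal sketch of it in ordinary Go. It is not the runtime's code: the `span` type, the `spans` table, and `tryFree` are hypothetical stand-ins, and the only thing taken from the analysis above is the shape of the bug (a lookup that returns nil, followed by an unchecked write to `needzero`). In user code the nil dereference surfaces as a recoverable panic, whereas inside the runtime the same fault is a fatal SIGSEGV.

```go
package main

import "fmt"

// span is a hypothetical stand-in for the runtime's span metadata.
type span struct {
	base     uintptr
	needzero uint8
}

// spans is a toy lookup table keyed by base address (an assumption made
// for illustration; the real runtime does not use a map here).
var spans = map[uintptr]*span{
	0x1000: {base: 0x1000},
}

// spanOf models a lookup that returns nil when no span covers addr.
func spanOf(addr uintptr) *span {
	return spans[addr]
}

// freeManual models the failing write: it dereferences s with no nil check.
func freeManual(s *span) {
	s.needzero = 1
}

// tryFree reports whether freeing the span at addr completed without a
// nil dereference. The recover is only possible because this is user code.
func tryFree(addr uintptr) (ok bool) {
	defer func() {
		if r := recover(); r != nil {
			ok = false
		}
	}()
	freeManual(spanOf(addr))
	return true
}

func main() {
	fmt.Println(tryFree(0x1000)) // address with a span
	fmt.Println(tryFree(0x0))    // address with no span
}
```

The sketch only shows why a nil result from the lookup is enough to produce the crash; it says nothing about why the lookup would return nil in the first place.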
Across all our crashes, the arguments to runtime.freeManual are always (0x0, 0x2) according to the backtraces.
What did you expect to see?
No SIGSEGV