Problem
The default setting of runtime.GOMAXPROCS() (the number of os-apparent processors) can be greatly misaligned with a container's cpu quota (e.g. as implemented through cfs bandwidth control by docker).
This can lead to large latency artifacts in programs, especially under peak load, or when saturating all processors during background GC phases.
The smaller the container and the larger the machine, the worse this effect becomes: say you deploy a fleet of microservice workers, each container having a cpu quota of 4, on a fleet of 32-processor[1] machines.
To understand why, you really have to understand the CFS quota mechanism; this blog post explains it well (with pictures), and this kubernetes issue explores the topic further (especially as it relates to a recently resolved kernel cpu-accounting bug). To summarize it briefly for this issue:
- there is a quota period, say 100ms
- there is then a quota, say 400ms to effect a 4-processor quota
- within any period, once the process group exceeds its quota, it is throttled
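To make the arithmetic concrete, here is a minimal sketch in Go that reads the CFS settings and computes the effective processor quota. It assumes a cgroup v1 mount at /sys/fs/cgroup/cpu; cgroup v2 exposes a combined cpu.max file instead.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readCFSValue reads a single integer from a cgroup v1 cpu control file.
func readCFSValue(path string) (int64, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.ParseInt(strings.TrimSpace(string(data)), 10, 64)
}

func main() {
	// Assumed cgroup v1 paths; adjust for your mount point (cgroup v2 differs).
	quota, err := readCFSValue("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
	if err != nil || quota <= 0 {
		fmt.Println("no cfs quota in effect") // quota is -1 when unlimited
		return
	}
	period, err := readCFSValue("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
	if err != nil || period <= 0 {
		fmt.Println("cannot determine cfs period")
		return
	}
	// e.g. quota=400000us over period=100000us => 4 processors' worth of time.
	fmt.Printf("effective cpu quota: %.2f processors\n", float64(quota)/float64(period))
}
```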
Running an application workload at a reasonable level of cpu efficiency makes it quite likely that you'll be spiking up to your full quota and getting throttled.
Background workloads, like concurrent GC[2], are especially likely to cause quota exhaustion.
I hesitate to even call this a "tail latency" problem; the artifacts are visible in the main body of the latency distribution and can shift the entire distribution.
Solution
If you care about latency, reliability, predictability (... insert more *ilities to taste), then the correct thing to do is to never exceed your cpu quota, by setting GOMAXPROCS=max(1, floor(cpu_quota)).
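In Go terms, that formula might look like the sketch below; maxProcsForQuota is a hypothetical helper, and the fractional quota would come from the cgroup files shown earlier.

```go
package main

import (
	"fmt"
	"math"
	"runtime"
)

// maxProcsForQuota implements GOMAXPROCS = max(1, floor(cpu_quota)).
// cpuQuota is the fractional processor quota, i.e. quota_us / period_us.
func maxProcsForQuota(cpuQuota float64) int {
	procs := int(math.Floor(cpuQuota))
	if procs < 1 {
		return 1 // never drop below one processor
	}
	return procs
}

func main() {
	// Suppose a 2.5-processor quota (e.g. 250ms of cpu time per 100ms period).
	prev := runtime.GOMAXPROCS(maxProcsForQuota(2.5))
	fmt.Printf("GOMAXPROCS: %d -> %d\n", prev, runtime.GOMAXPROCS(0))
}
```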
Using this as a default for GOMAXPROCS makes the world safe again, which is why we use uber-go/automaxprocs in all of our microservices.
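For reference, adopting automaxprocs is a blank import; per its README, the package adjusts GOMAXPROCS at init time:

```go
package main

import (
	"fmt"
	"runtime"

	_ "go.uber.org/automaxprocs" // matches GOMAXPROCS to the container's cpu quota at init
)

func main() {
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
}
```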
Notes
[1] intentionally avoiding use of the word "core"; the matter of hyper-threading and virtual-vs-physical cores is another topic
[2] /digression: can't not mention userspace scheduler pressure induced by background GC; where are we at with goroutine preemption again?