Description
Per-CPU sharded values are a useful and common way to reduce contention on shared write-mostly values. However, this technique is currently difficult or impossible to use in Go (though there have been attempts, such as @jonhoo's https://github.com/jonhoo/drwmutex and @bcmills' https://go-review.googlesource.com/#/c/35676/).
We propose providing an API for creating and working with sharded values. Sharding would be encapsulated in a type, say `sync.Sharded`, that would have `Get() interface{}`, `Put(interface{})`, and `Do(func(interface{}))` methods. `Get` and `Put` would always have to be paired to make `Do` possible. (This is actually the same API that was proposed in #8281 (comment) and rejected, but perhaps we have a better understanding of the issues now.) This idea came out of off-and-on discussions between at least @rsc, @hyangah, @RLH, @bcmills, @Sajmani, and myself.
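For concreteness, here is a rough, non-normative sketch of the surface described above; the struct layout and the stub bodies are placeholders, not an implementation:

```go
package sync

// Sharded holds one value slot per shard (in practice, per P).
// The internal representation here is purely illustrative.
type Sharded struct {
	shards []interface{}
}

// Get returns the current shard's value; whether it blocks is one of the
// design dimensions discussed below.
func (s *Sharded) Get() interface{} { panic("sketch only") }

// Put stores v back into the current shard.
func (s *Sharded) Put(v interface{}) { panic("sketch only") }

// Do invokes f on every shard's value.
func (s *Sharded) Do(f func(interface{})) { panic("sketch only") }
```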
This is a counter-proposal to various proposals to expose the current thread/P ID as a way to implement sharded values (#8281, #18590). These have been turned down as exposing low-level implementation details, tying Go to an API that may be inappropriate or difficult to support in the future, being difficult to use correctly (since the ID may change at any time), being difficult to specify, and as being broadly susceptible to abuse.
There are several dimensions to the design of such an API.
`Get` and `Put` can be blocking or non-blocking:
- With non-blocking `Get` and `Put`, `sync.Sharded` behaves like a collection (a usage sketch follows this list). `Get` returns immediately with the current shard's value, or nil if the shard is empty. `Put` stores a value for the current shard if the shard's slot is empty (which may be different from where `Get` was called, but would often be the same). If the shard's slot is not empty, `Put` could either put to some overflow list (in which case the state is potentially unbounded), or run some user-provided combiner (which would bound the state).
- With blocking `Get` and `Put`, `sync.Sharded` behaves more like a lock. `Get` returns and locks the current shard's value, blocking further `Get`s from that shard. `Put` sets the shard's value and unlocks it. In this case, `Put` has to know which shard the value came from, so `Get` can either return a put function (though that would require allocating a closure) or some opaque value that must be passed to `Put` and internally identifies the shard.
- It would also be possible to combine these behaviors by using an overflow list with a bounded size. Specifying 0 would yield lock-like behavior, while specifying a larger value would give some slack where `Get` and `Put` remain non-blocking without allowing the state to become completely unbounded.
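As a usage sketch (not part of the proposal itself), here is how a per-shard counter might look under the non-blocking, collection-like semantics; the `Sharded` interface below simply restates the assumed method set so the example is self-contained:

```go
package shardedcounter

// Sharded restates the proposed method set so this sketch is self-contained.
type Sharded interface {
	Get() interface{}
	Put(interface{})
	Do(func(interface{}))
}

// incr bumps a per-shard counter. With the non-blocking semantics, Get
// removes the current shard's value (or returns nil), so the *int64 is held
// exclusively until Put stores it back into whichever shard the goroutine is
// running on at that point.
func incr(s Sharded) {
	c, _ := s.Get().(*int64) // c is nil if the shard was empty
	if c == nil {
		c = new(int64)
	}
	*c++
	s.Put(c)
}
```

If `Put` finds the slot already occupied because the goroutine migrated, a user-provided combiner that adds the two counts would keep the state bounded.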
`Do` could be consistent or inconsistent:
- If it's consistent, then it passes the callback a snapshot at a single instant. I can think of two ways to do this: block until all outstanding values are `Put` and also block further `Get`s until the `Do` can complete; or use the "current" value of each shard even if it's checked out. The latter requires that shard values be immutable, but it makes `Do` non-blocking.
- If it's inconsistent, then it can wait on each shard independently. This is faster and doesn't affect `Get` and `Put`, but the caller can only get a rough idea of the combined value. This is fine for uses like approximate statistics counters.
It may be that we can't make this decision at the API level and have to provide both forms of `Do`.
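Continuing the counter sketch above (same assumed `Sharded` interface), an approximate total only needs the inconsistent form of `Do`:

```go
// total sums every shard's counter using Do. With an inconsistent Do the
// shards may be observed at slightly different instants, which is acceptable
// for an approximate statistics counter.
func total(s Sharded) int64 {
	var sum int64
	s.Do(func(v interface{}) {
		if c, ok := v.(*int64); ok {
			sum += *c
		}
	})
	return sum
}
```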
I think this is a good base API, but I can think of a few reasonable extensions:
- Provide `Peek` and `CompareAndSwap`. If a user of the API can be written in terms of these, then `Do` would always be able to get an immediate consistent snapshot.
- Provide a `Value` operation that uses the user-provided combiner (if we go down that API route) to get the combined value of the `sync.Sharded` (see the sketch below).
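As a hedged illustration of the second extension, a combiner-based `Value` could be expressed in terms of `Do`; the `Combine` type and this free-standing `Value` helper (which reuses the `Sharded` interface from the counter example) are assumptions for the sake of the sketch, not settled API:

```go
// Combine merges two shard values into one, e.g. adding two counters.
type Combine func(a, b interface{}) interface{}

// Value folds every shard's value into a single combined result. A real
// implementation inside sync.Sharded could be more efficient; this version
// only illustrates the intended semantics in terms of Do.
func Value(s Sharded, combine Combine) interface{} {
	var acc interface{}
	s.Do(func(v interface{}) {
		if acc == nil {
			acc = v
		} else {
			acc = combine(acc, v)
		}
	})
	return acc
}
```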