reduce timer object creation to reduce gc overhead #4175
Conversation
```go
for {
	raftIdleTimer.Reset(raftIdleTimeout)
```
I'm not sure if this is functionally equivalent. The old code would restart the timer on every iteration, but this new pattern introduces a small race where the timer can fire just as a request is received. The timer internally sends to `raftIdleTimer.C`, and on the next iteration we call `Reset`, but that doesn't clear the channel. I'm not familiar enough with this code to know whether that race is innocuous.
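A minimal sketch of the loop shape being discussed, assuming a `processQueue`-style goroutine; the names `processLoop`, `reqs`, and `idleTimeout` are illustrative, not the actual code:

```go
package sketch

import "time"

// processLoop reuses a single timer across iterations. If the timer fires
// between receiving a request and the next Reset, Reset does not drain the
// value already parked on timer.C, so a later select can observe a spurious
// "idle" timeout even though requests are still arriving.
func processLoop(reqs <-chan int, idleTimeout time.Duration) {
	timer := time.NewTimer(idleTimeout)
	defer timer.Stop()
	for {
		timer.Reset(idleTimeout) // does not clear an already-fired timer's channel
		select {
		case <-timer.C:
			return // goroutine shuts down when idle
		case req := <-reqs:
			_ = req // process the request
		}
	}
}
```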
Good catch; that does seem like an issue. Now that I look closely, there are several other problems with the shutdown-on-idle behavior. A new request could be written to the channel after the decision has been made to shut down the goroutine, and even without any races the goroutine could stop while there are still outstanding requests that will try to write to the `done` channel.

We have several instances of this pattern where we want a goroutine that goes away when it is idle (here, `gossip.infoStore.runCallbacks`, and my upcoming work on parallelizing raft execution). I'm going to try to refactor the `infoStore` pattern into something reusable.
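For reference, one common shape for such a reusable "exit when idle" worker keeps the queue and the exit decision under a single mutex, so a task can never be enqueued after the worker has decided to exit without a replacement worker being started. This is only a sketch of the general pattern, not the `infoStore` or gossip code:

```go
package sketch

import "sync"

// idleWorker runs queued tasks on a goroutine that exits when the queue is
// empty. Because both enqueueing and the exit decision happen under mu,
// there is no window in which a task is added after the worker has decided
// to stop but before a new worker is started to pick it up.
type idleWorker struct {
	mu      sync.Mutex
	queue   []func()
	running bool
}

func (w *idleWorker) enqueue(task func()) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.queue = append(w.queue, task)
	if !w.running {
		w.running = true
		go w.run()
	}
}

func (w *idleWorker) run() {
	for {
		w.mu.Lock()
		if len(w.queue) == 0 {
			w.running = false // exit decision made while holding the lock
			w.mu.Unlock()
			return
		}
		task := w.queue[0]
		w.queue = w.queue[1:]
		w.mu.Unlock()
		task()
	}
}
```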
`Reset` returns `false` if the timer had expired (in other words, it returns `false` if there's a value parked on the channel). Seems easy enough to mitigate:

```go
if !raftIdleTimer.Reset(raftIdleTimeout) {
	<-raftIdleTimer.C
}
```
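A small standalone demonstration of the behavior this mitigation relies on, under the timer semantics current at the time of this discussion (before the Go 1.23 timer changes): once a timer has expired without being received, `Reset` reports `false` and the stale expiration stays parked on the channel until drained. The drain below is written non-blocking so the example also stays safe on newer Go versions, where `Reset` discards a pending, unreceived expiration:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	t := time.NewTimer(time.Millisecond)
	time.Sleep(10 * time.Millisecond) // let the timer expire; its value may sit on t.C

	if !t.Reset(time.Hour) { // Reset reports false: the timer had already fired
		select {
		case <-t.C:
			fmt.Println("drained stale expiration")
		default:
			fmt.Println("no stale value parked on the channel")
		}
	}
	t.Stop()
}
```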
LGTM. While there are some possible races, I don't think this change makes things significantly worse than before, so we might as well make the change.
There are currently 8 places in CockroachDB non-test code that create a `time.Timer` using `time.NewTimer` during every iteration of a loop. cockroachdb#4175 proposed a fix for the worst instance of this issue within `*rpcTransport.processQueue`, which resulted in upwards of **400,000** timers in use at a given time on a single node. The second biggest offender was in `kv.send`, which resulted in about **30,000** timers in use at a given time on a single node. I diagnosed that together these two issues were responsible for the memory "leak" seen in cockroachdb#4346; after making these fixes, the issue appears to be gone, as memory no longer grows without bound. I've gone ahead and fixed all 8 occurrences of this anti-pattern across our codebase, using the strategy @tamird brought up in [this](https://github.com/cockroachdb/cockroach/pull/4175/files#r52558817) comment to avoid a race condition between iterations with the timers. A few of the changes might be a little over-aggressive, as the loops are not as "tight" as the ones causing issues, but I still think it's important to make this change now and avoid these issues in the future.
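The shape of the change, sketched with illustrative names (this is not the actual `kv.send` or `*rpcTransport.processQueue` code): the anti-pattern allocates a fresh timer on every iteration, while the fix allocates once, Resets each time through the loop, and drains a stale expiration when `Reset` reports the timer had already fired.

```go
package sketch

import "time"

// antiPattern allocates a fresh timer (and its runtime-internal state) on
// every iteration; in a tight loop this is what piles up hundreds of
// thousands of live timer objects in a heap profile.
func antiPattern(work <-chan struct{}, timeout time.Duration) {
	for {
		timer := time.NewTimer(timeout)
		select {
		case <-timer.C:
			return
		case <-work:
			timer.Stop()
		}
	}
}

// reuseTimer allocates one timer for the lifetime of the loop, Resets it
// each iteration, and drains a stale expiration when Reset reports that
// the timer had already fired.
func reuseTimer(work <-chan struct{}, timeout time.Duration) {
	timer := time.NewTimer(timeout)
	defer timer.Stop()
	for {
		if !timer.Reset(timeout) {
			select {
			case <-timer.C:
			default:
			}
		}
		select {
		case <-timer.C:
			return
		case <-work:
		}
	}
}
```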
Closing, as #4367 will fix the same issue.
By starting 3 nodes with GODEBUG=memprofilerate=1 and profiling memory, you will find heavy timer object creation. This creation can be done only once.
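A hedged sketch of one way to capture such a profile (generic Go, not CockroachDB-specific code): setting `runtime.MemProfileRate` to 1 in-process corresponds to launching with GODEBUG=memprofilerate=1, and the written profile can then be examined with `go tool pprof`.

```go
package main

import (
	"os"
	"runtime"
	"runtime/pprof"
)

func main() {
	// Equivalent to launching the process with GODEBUG=memprofilerate=1:
	// sample every allocation instead of the default 1-in-512KiB rate.
	// (It should be set as early as possible in the process lifetime.)
	runtime.MemProfileRate = 1

	// ... run the workload that exercises the timer-heavy loops ...

	f, err := os.Create("heap.prof")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	runtime.GC() // flush recent allocations into the profile statistics
	if err := pprof.WriteHeapProfile(f); err != nil {
		panic(err)
	}
	// Inspect with: go tool pprof heap.prof
}
```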