Data race when reading / writing join handle waker #2087
TSAN is prone to false positives, which is one reason why I moved away from it. In this case, we are doing a read as "uninitialized memory" but not assuming it is initialized until we can validate that we are able to read that memory. You can see that the actual observation of the data only happens if Unfortunately, TSAN is unable to validate this pattern. Now, the question is whether or not this pattern is a data race. I discussed this some with @RalfJung. What I do know is that established and studied algorithms, such as the chase-lev deque, use such a pattern. We could guard It seems that you have an interest in the concurrency details in Tokio. Why don't you ping me in discord (https://discord.gg/tokio) and we can talk more. I would love to get more help w/ this & loom.
AFAIK this is the pattern where I told you that yes, this is a data race and thus UB under the rules of C/C++, and thus also under the rules of Rust?
This is arguing with LLVM's rules (where read-write races make the read return You are not alone with this though, crossbeam's
Indeed, there are established and studied algorithms that cannot be implemented in C/C++; see for example this paper.
@RalfJung How do you suggest handling this case then?
@RalfJung I guess, are you suggesting that, in Rust, we avoid the "speculative racing load" pattern?
My experience with thread sanitizer is quite different. Apart from custom In the context of tokio, I have seen only one additional data race. It is not There are memory models where data race merely results in an undefined value,
If you want to make your code match the Rust spec as it stands right now, the only option is to adjust loom to consider such things data races, and adjust the code to avoid them. If you are fine with causing spec-UB that we know current compilers don't actually turn into miscompilations (and want to be added to rust-lang/unsafe-code-guidelines#158), you can continue as before. (Though as a precautionary measure, I recommend using a
Yes. Whenever possible, I suggest following the spec, while also arguing for improving the spec. Trying to "force" spec changes by acting as if they already happened puts the lang team in a bad place (see for example the entire unwind-through-extern disaster). This is the equivalent of using a nightly feature on stable before the design is finished; there just isn't a way to have feature flags for UB. But I also understand that sometimes there just isn't a way to comply with the spec, and then it's not like you have much choice -- but that makes it even more important to seek dialogue with the lang team to make them aware that there is a problem here and to start working towards a solution.
@tmiasko yes, fences are not supported... and they are used (both in tokio and std), which results in false positives and a brittle whack-a-mole whitelisting process (when the backtrace changes, the whitelist needs to be updated).
@tmiasko and also the usage of "speculative racy loads" (which, as you can see, is a grey area).
There is another case of speculative racy load in the queue. I'm not sure if there is a good way to avoid this one: https://github.com/tokio-rs/tokio/blob/master/tokio/src/runtime/thread_pool/queue/local.rs#L233-L287
I have an implementation of I don't really see fences as a big issue, though it does prevent thread
Well, it's not really a "grey area" in the sense of "the spec is unclear". The spec is pretty clear. But it is a well-known deficiency of the spec.
The previous implementation would perform a load that might be part of a data race. The value read would be used only when the race did not occur. This would be well defined in a memory model where a load that is part of a race merely returns an undefined value; the Rust memory model, on the other hand, defines it to be undefined behaviour.

Perform the read conditionally to avoid the data race.

Covered by existing loom tests after changing the causality check to be immediate rather than deferred.

Fixes: #2087
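The difference between the speculative and the conditional read can be sketched in isolation. This is a minimal illustration, not the tokio code: `Guarded`, `publish`, and `read` are hypothetical names, and the sketch assumes a single writer that calls `publish` at most once.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical one-shot cell: a plain value guarded by an atomic flag.
pub struct Guarded {
    ready: AtomicBool,
    value: UnsafeCell<u64>,
}

// Safety (assumption of this sketch): `value` is written at most once, by a
// single writer, before `ready` is set, and read only after `ready` is seen.
unsafe impl Sync for Guarded {}

impl Guarded {
    pub fn new() -> Self {
        Guarded {
            ready: AtomicBool::new(false),
            value: UnsafeCell::new(0),
        }
    }

    /// Writer side: fill the slot, then publish it with a Release store.
    /// Must be called at most once, from one thread.
    pub fn publish(&self, v: u64) {
        unsafe { *self.value.get() = v };
        self.ready.store(true, Ordering::Release);
    }

    // Speculative shape, shown only as a comment because the plain read can
    // race with `publish`, which is undefined behaviour under Rust's rules:
    //
    //     let v = unsafe { *self.value.get() };   // may race with the writer
    //     if self.ready.load(Ordering::Acquire) { Some(v) } else { None }

    /// Conditional shape: check the flag first and touch the slot only when
    /// the Acquire load proves that the write happened before this read.
    pub fn read(&self) -> Option<u64> {
        if self.ready.load(Ordering::Acquire) {
            Some(unsafe { *self.value.get() })
        } else {
            None
        }
    }
}
```

Both shapes return the same result whenever a value is returned; the conditional one simply never performs the non-atomic load while the writer may still be active.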
Reproducible when running tests, especially those from fs_dir.rs.

So far, my understanding of who has access to the waker field is as follows:

The join handle attempts to obtain a write lock on the waker field by removing the JOIN_WAKER flag. If the removal happened before the task was marked as COMPLETE or CANCELLED, the attempt succeeded; otherwise it failed.

The task obtains a read lock on the waker field by transitioning to the COMPLETE or CANCELLED state. If the JOIN_WAKER flag was set in the state just before the transition, then it holds the read lock; otherwise it does not.

If the above is correct, then in tokio/tokio/src/task/harness.rs (lines 467 to 485 in cfd9b36), self.read_join_waker() should be conditional on res1.has_join_waker(). With a change along those lines I can no longer reproduce the issue.
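The locking discipline described above can be sketched with a single atomic state word. This is a simplified model under stated assumptions, not the code in harness.rs: the `Slot` type, its methods, the two-bit layout, and the use of a `String` in place of a real waker are all hypothetical, and CANCELLED is folded into COMPLETE for brevity.

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical bit layout; the real state word carries more flags.
const COMPLETE: usize = 0b01;
const JOIN_WAKER: usize = 0b10;

/// Simplified stand-in for a task header: state word plus waker slot.
pub struct Slot {
    state: AtomicUsize,
    waker: UnsafeCell<Option<String>>, // String stands in for a Waker
}

// Safety (assumption of this sketch): access to `waker` is arbitrated by
// `state`, with one join handle and one task as the only participants.
unsafe impl Sync for Slot {}

impl Slot {
    pub fn new() -> Self {
        Slot { state: AtomicUsize::new(0), waker: UnsafeCell::new(None) }
    }

    /// Join-handle side: store a waker, then publish it by setting JOIN_WAKER.
    /// Requires JOIN_WAKER to be unset. Returns false if the task completed.
    pub fn set_join_waker(&self, w: String) -> bool {
        let cur = self.state.load(Ordering::Acquire);
        assert_eq!(cur & JOIN_WAKER, 0, "caller must hold the write lock");
        if cur & COMPLETE != 0 {
            return false;
        }
        unsafe { *self.waker.get() = Some(w) };
        // The only concurrent change possible here is the task setting
        // COMPLETE, so a failed exchange means the task finished first.
        self.state
            .compare_exchange(cur, cur | JOIN_WAKER, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    /// Join-handle side: take the write lock by removing JOIN_WAKER.
    /// Fails once the task has transitioned to COMPLETE.
    pub fn unset_join_waker(&self) -> bool {
        let mut cur = self.state.load(Ordering::Acquire);
        loop {
            if cur & COMPLETE != 0 {
                return false;
            }
            match self.state.compare_exchange_weak(
                cur, cur & !JOIN_WAKER, Ordering::AcqRel, Ordering::Acquire,
            ) {
                Ok(_) => return true,
                Err(actual) => cur = actual,
            }
        }
    }

    /// Task side: transition to COMPLETE and read the waker only if
    /// JOIN_WAKER was set in the state just before the transition.
    pub fn complete(&self) -> Option<String> {
        let prev = self.state.fetch_or(COMPLETE, Ordering::AcqRel);
        if prev & JOIN_WAKER != 0 {
            // Read lock held: the join handle can no longer clear the flag.
            unsafe { (*self.waker.get()).clone() }
        } else {
            None // the join handle may still be writing; do not touch the slot
        }
    }
}
```

The branch in `complete` corresponds to making the waker read conditional on the flag observed at the transition, which is the change proposed in this report.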