-
Notifications
You must be signed in to change notification settings - Fork 5.9k
8282475: SafeFetch should not rely on existence of Thread::current #7727
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Welcome back parttimenerd! A progress list of the required criteria for merging this PR into |
@parttimenerd The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
Webrevs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Johannes,
The general idea seems good (pity it touches so many files, but then I've never liked any of this WX support precisely because it is so invasive of shared code). I agree that safeFetch should not have become dependent on Thread::current existing, but I have to wonder whether we can just skip the WX code if there is no current thread? If the thread is not attached to the VM then what does it even mean to manipulate the WX state of an unknown thread?
That aside, with this change I think we can move the conditional WX code out of the shared os.hpp and bury it down in os_bsd_aarch64.hpp where it actually belongs.
I'd even like to see threadWXSetters.inline.hpp moved to being in src/os_cpu/bsd_aarch64/ if feasible - I'm not sure what include would be needed for the callsites to function - os.hpp I presume?
Thanks,
David
The OS thread is always known. The WXMode is unrelated to Thread object. The WXMode is set for an OS thread to allow pages to be either writable or executable (needed for code generation).
May I ask how that would affect the code that uses the methods (includes, ...)?
I don't know whether this is enough. |
Hi David,
We need to change the wx state of the current pthread in order to be able to execute stub routines. Otherwise, we would crash right away when trying to execute the SafeFetch stub. And that is a valid requirement. Let's say we crash in a native thread, unrelated to and completely oblivious of the JVM it shares the process with. We'd still want to see e.g. native crash information, stack frames, maybe register region information etc - all that stuff that may require SafeFetch. In fact, this patch is related to Johannes other PR where he modified stack frame walking to check that the registers point into valid memory.
Oh yes!
I agree, all that wx stuff should be limited to os/bsd or os/bsd_aarch. We could have generic wrappers like:
(Macro does not yet exist, but MACOS_AARCH64_ONLY does) -- Side note, I think we have reached a point where it would be cleaner to split xxxBSD and MacOS sources. E.g. this wx stuff should be limited to MacOS too, and we have more and more Cheers, Thomas |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Johannes, just some drive-by comments, not a full review. Also please see my comment toward David, proposing a more generic interface in os instead.
Cheers, Thomas
Regarding the names of the new methods: Most of the usages for ThreadWXEnable use it to set the WXMode to WXWrite. The suggested names are therefore a bit misleading (when used in this context). One could add another two methods:
But one would still have the problem of nesting (e.g. when code generating code calls code generating code). |
Oh I see - that is unfortunate. I don't like messing with other people's threads.
@parttimenerd there would be no change - they continue to include os.hpp, which will include the os/cpu specific header files.
@tstuefe I think this is going a little too far in this fix. I'm looking for simplicity. All the WX related code should be buried in the os/cpu file for BSD/Aarch64 and the callsites all using MACOS_AARCH64_ONLY. Splitting BSD from macOS would also be a future RFE. Thanks. |
Thanks.
I'm currently finishing a fix that exactly does that :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking good. A few additional comments below.
Thanks,
David
I don't know why the Linux x86 build fails. I tested the current version with code related to #7591 and it seems to fix the remaining problems (I tested it also with NMT enabled). |
The Linux x86 build failure is not related to this and has already been fixed, so you should re-sync with master branch. |
0204830
to
21dd004
Compare
@parttimenerd please never force-push in an active review as it completely destroys the review history and comment context! |
But we don't need to speculate. If thread-local variables are cheap on MacOS, and there is no reason why they should be expensive, then we can stop worrying and just use a thread-local variable for WX state. We can measure how long it takes, and we only have to care about one platform, MacOS/AArch64. We could also redefine SafeFetch on MacOS/AArch64 to not need WX. We could do this by statically generating SafeFetch on that platform, and it wouldn't be in the JIT region at all. Why not just do that? |
According to https://forums.swift.org/t/concurrencys-use-of-thread-local-variables/48654: "these accesses are just a move from a system register plus a load/store at a constant offset." |
Do you mean using inline assembly? |
On 3/11/22 10:12, Thomas Stuefe wrote:
We could also redefine SafeFetch on MacOS/AArch64 to not need WX. We could do this by statically generating SafeFetch on that platform, and it wouldn't be in the JIT region at all. Why not just do that?
Do you mean using inline assembly?
I'd use out-of-line assembly, as I do for atomic compare-and-swap
on linux:
https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S
But I guess inline would work.
…--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
|
Oh, this is neat. It would work on all platforms too, or on all we care to implement it for. And it would nicely solve the initialization window problem since it would work before stub routines are generated. We could throw It seems we already do static assembly on bsd aarch. So there is already a path to follow. But this could also be done as a follow up enhancement. I still like the OS TLS variable idea. |
Ideally you'd still benchmark that. Some AArch64 implementations have really, really slow moves from the system register used as the thread pointer. Hopefully Apple's implementation isn't in that category. |
This UB looks reasonable. My point is that a native thread would run fine with SIGSEGV blocked. But then JVM decides it can do SafeFetch, and things gets nasty.
I mean, some way to verify the issue is fixed, e.g. a test that does not fail anymore. I see AsyncGetCallTrace to assume the JavaThread very soon, or do I look at the wrong place? https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/forte.cpp#L569
OK, thanks. I think we also handle recursive segfaults recover after interpretation of the corrupted VM state. Otherwise, implementing the printing functions would be too tedious and hard with SafeFetch alone. But I see it's used in printing register content, at least. |
Then the OS TLS way is not better since when the signal handler and SafeFetch start, the state is unknown and is only assumed to be Write (in initialization of TLS variable). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked on the patch again from the perspective of a pure refactoring. It looks fine except we lost one of the asserts.
@@ -276,7 +275,7 @@ Thread::Thread() { | |||
assert(Thread::current_or_null() == NULL, "creating thread before barrier set"); | |||
} | |||
|
|||
MACOS_AARCH64_ONLY(DEBUG_ONLY(_wx_init = false)); | |||
MACOS_AARCH64_ONLY(DEBUG_ONLY(os::ThreadWX::init();)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line meant the WX state is not initialized at this point (as a part of Thread constructor). Since there are a several places where the state is initialized and it was easy to miss one, I would like to preserve some assert that the state is initialized.
Blocking synchronous error signals makes zero sense even for normal programs, since you lose the ability to get cores. For the JVM in particular, it also blocks facilities like polling pages, or dynamically querying CPU abilities. So a JVM would not even start with synchronous error signals blocked.
No, tests do not exist. Unfortunately, otherwise this regression would have been detected right away and we would not need this PR. We have a test though that tests SafeFetch during error handling. That test can be tweaked for this purpose. So, test does not exist yet, but can be easily written.
Secondary error handling is a very coarse-grained tool. If an error reporting step crashes out, we continue with the next step. Has disadvantages though. The total number of retries is very limited. And a faulting error reporting step still hurts, because its report is compromised. E.g. if the call stack printing crashes out, we have no call stack. This is not an abstract problem. Its a very concrete and typical problem. I spend a large part of my work with hs-err reports. They are of very high importance to us. We (SAP) have invested a lot of time and effort in hardening out OpenJDK error reporting, and SafeFetch is an important part of that. For example, we provided the facility that made SafeFetch usable in signal handling. It would be nice if our work was not compromised. Please let us find a way forward here. |
I spent some time doing a static implementation of SafeFetch on Linux x64, and its not super trivial. The problem is that we need to know addresses of instructions inside that function. I can set labels in assembly, and I can export them, but so far I have been unable to use them as addresses in C++ code. I will research some more. |
There are basically two ways (easy) to do it. Put global symbols like .globl address_of_label
address_of_label: into the assembler sources and use extern char address_of_label[] __attribute__ ((visibility ("hidden"))); from the C++ side. Or use a local label, and export the difference to the function start to a local label in a global data symbol from the assembler side: .globl SafeFetch // Real function name goes here.
SafeFetch:
// …
.Llabel:
// …
.section .rodata
.globl SafeFetch_label_offset
.p2align 3
SafeFetch_label_offset:
.quad .Llabel - SafeFetch
.type SafeFetch_label_offset, @object
.size SafeFetch_label_offset, 8 And use extern uintptr_t SafeFetch_label_offset __attribute__ ((__visibility ("hidden"))); and the expression If you have a PR, please Cc: me on it, I will have a look. |
In a tight loop, loads from __thread variables take 1ns. It's this:
... which looks the same as what glibc does. Not bad, but quite a lot more to do than a simple load. I'd still use a static SafeFetch, with no W^X fiddling. It just seems to me much more reasonable. |
ITYM
|
It doesn't hurt, but the Itanium ABI does not mangle such global data symbols, so it's not strictly needed. |
That's an interesting point of view. I guess I never thought about it, but I'd always put symbols for an asm file in an |
Incidentally, there must be a lot of speculation and bypassing going on there. I can see 15 cycles of latency, probably 20, so that'd be more like 5ns start to finish. M1 is a remarkable thing. |
Hi Florian,
Thanks a lot, Florian! I got it to work under Linux x64. My error was that I had declared the label in C++ as
I don't understand this remark, what does Itanium have to do with this? |
Great!
Your approach might have worked as well, but you would have to use Anyway, from what I've seen, the array is more idiomatic.
The C++ ABI definition is probably Itanium's most lasting contribution to computing. I think it's used on most non-Windows systems these days, not just on Linux, and of course on all kinds of CPUs. |
Ah, that makes sense. I wondered why the address did not look like a code pointer in C++. Anyway, got Linux x86_32 working too. Now I am working on aarch64.
Interesting to know. Thanks! |
We're looking into solutions and create a new PR if necessary. |
Mailing list message from David Holmes on hotspot-dev: On 12/03/2022 2:37 am, Anton Kozlov wrote:
It is up to the agent setting things up for AGCT to only actually call David |
This is not the point: It comes down to API design. If we use SafeFetch in os::is_first_C_frame (and thereby in frame::link_or_null) and not just in ASGCT, then it depends on when the other methods can be called. These methods are e.g. used whenever an error happens and a hs_err file is generated. We cannot guarantee that a JavaThread is always present there. |
Mailing list message from David Holmes on hotspot-dev: On 18/03/2022 5:21 pm, Johannes Bechberger wrote:
My comment was specifically in response to your statement:
But AGCT is only intended to ever be called on JavaThreads. David |
Sorry, it was my question. It looked for me this way as well (and that ACGT will return shortly if called on non-Java thread; AFAICS SafeFetch in not involved), and I wanted to confirm. The AGCT on non-Java thread was declared to be one of the two major reasons for this patch. I would support this patch to move W^X management out from Thread to OS-specific code, after the problem with the assert "is initialized" is fixed. |
I'm currently implementing Andrews proposal for a static safefetch (#7865, still in draft, but almost done). That will be more generic solution since we don't have to deal with thread wx state at all. That's why we closed this PR. |
The conversation here is some what hard to follow. I do see that "foreign threads" was mentioned by @tstuefe in the context of AGCT but I have to assume he misspoke there (assuming a foreign thread is one not attached to the VM) as AGCT only works for attached JavaThreads. The signal handler that will call AGCT has to be prepared to find any kind of thread in any state, but AGCT should only be called on the right kinds of thread in the right state. |
Sure, AGCT can be limited to VM threads - or maybe already is. But tracking non-VM threads could be a valid use case. We have downstream in the SapMachine a facility where we track callstacks from malloc sites - independently from NMT or the VM. With the explicit purpose of catching mallocs from non-VM threads too. For collecting the stack trace, we use some VM utilities, SafeFetch among them. That is a very useful facility. I could argue a similar case for the Async Profiler: why should profiling be limited to Java threads? In the end, if it eats performance, it hurts, regardless whether its a java thread or a non-VM-attached thread. Could be a concurrent native thread burning CPU, why would that not be interesting. Our concern was with SafeFetch, and AGCT is only one example. SafeFetch should be as safe as possible. Error reporting alone is a sufficient reason. |
The WXMode for the current thread (on MacOS aarch64) is currently stored in the thread class which is unnecessary as the WXMode is bound to the current OS thread, not the current instance of the thread class.
This pull request moves the storage of the current WXMode into a thread local global variable in
os
and changes all related code. SafeFetch depended on the existence of a thread object only because of the WXMode. This pull request therefore removes the dependency, making SafeFetch usable in more contexts.Progress
Issue
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/7727/head:pull/7727
$ git checkout pull/7727
Update a local copy of the PR:
$ git checkout pull/7727
$ git pull https://git.openjdk.java.net/jdk pull/7727/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 7727
View PR using the GUI difftool:
$ git pr show -t 7727
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/7727.diff