8282475: SafeFetch should not rely on existence of Thread::current #7727

parttimenerd · 2022-03-07T11:29:08Z

The WXMode for the current thread (on MacOS aarch64) is currently stored in the thread class which is unnecessary as the WXMode is bound to the current OS thread, not the current instance of the thread class.
This pull request moves the storage of the current WXMode into a thread local global variable in os and changes all related code. SafeFetch depended on the existence of a thread object only because of the WXMode. This pull request therefore removes the dependency, making SafeFetch usable in more contexts.
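The idea described above can be sketched roughly as follows. This is a minimal, portable illustration and not the code from the patch: `os_sketch`, `current_wx_mode`, `set_wx_mode` and `platform_set_wx` are hypothetical names, and the platform call is replaced by a no-op stand-in. The initial value `WXWrite` is an assumption taken from the discussion further down.

```cpp
#include <cassert>

// Hypothetical names; the identifiers used by the real patch may differ.
enum WXMode { WXWrite, WXExec };

namespace os_sketch {
  // One copy per OS thread. It exists even if no Thread object was ever
  // created for that OS thread. Assumed to start out as WXWrite.
  thread_local WXMode current_wx_mode = WXWrite;

  // Stand-in for the platform call that flips the per-thread JIT
  // write protection on macOS/AArch64. A no-op in this portable sketch.
  inline void platform_set_wx(WXMode) {}

  // Switches the calling OS thread to new_mode and returns the previous mode.
  inline WXMode set_wx_mode(WXMode new_mode) {
    WXMode old_mode = current_wx_mode;
    if (old_mode != new_mode) {
      platform_set_wx(new_mode);
      current_wx_mode = new_mode;
    }
    return old_mode;
  }
}
```

Because the state lives in a `thread_local` variable rather than in a `Thread` object, code such as SafeFetch could query and switch it without calling `Thread::current()`.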

Progress

Change must not contain extraneous whitespace
Commit message must refer to an issue
Change must be properly reviewed

Issue

JDK-8282475: SafeFetch should not rely on existence of Thread::current

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/7727/head:pull/7727
$ git checkout pull/7727

Update a local copy of the PR:
$ git checkout pull/7727
$ git pull https://git.openjdk.java.net/jdk pull/7727/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 7727

View PR using the GUI difftool:
$ git pr show -t 7727

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/7727.diff

bridgekeeper · 2022-03-07T11:30:47Z

👋 Welcome back parttimenerd! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2022-03-07T11:32:42Z

@parttimenerd The following labels will be automatically applied to this pull request:

hotspot
serviceability
shenandoah

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

dholmes-ora

Hi Johannes,

The general idea seems good (pity it touches so many files, but then I've never liked any of this WX support precisely because it is so invasive of shared code). I agree that safeFetch should not have become dependent on Thread::current existing, but I have to wonder whether we can just skip the WX code if there is no current thread? If the thread is not attached to the VM then what does it even mean to manipulate the WX state of an unknown thread?

That aside, with this change I think we can move the conditional WX code out of the shared os.hpp and bury it down in os_bsd_aarch64.hpp where it actually belongs.

I'd even like to see threadWXSetters.inline.hpp moved to being in src/os_cpu/bsd_aarch64/ if feasible - I'm not sure what include would be needed for the callsites to function - os.hpp I presume?

Thanks,
David

src/hotspot/share/runtime/threadWXSetters.inline.hpp

parttimenerd · 2022-03-07T12:30:23Z

I agree that safeFetch should not have become dependent on Thread::current existing, but I have to wonder whether we can just skip the WX code if there is no current thread? If the thread is not attached to the VM then what does it even mean to manipulate the WX state of an unknown thread?

The OS thread is always known. The WXMode is unrelated to Thread object. The WXMode is set for an OS thread to allow pages to be either writable or executable (needed for code generation).

That aside, with this change I think we can move the conditional WX code out of the shared os.hpp and bury it down in os_bsd_aarch64.hpp where it actually belongs.

May I ask how that would affect the code that uses the methods (includes, ...)?

I'd even like to see threadWXSetters.inline.hpp moved to being in src/os_cpu/bsd_aarch64/ if feasible - I'm not sure what include would be needed for the callsites to function - os.hpp I presume?

I don't know whether this is enough.

tstuefe · 2022-03-07T14:24:01Z

Hi David,

The general idea seems good (pity it touches so many files, but then I've never liked any of this WX support precisely because it is so invasive of shared code). I agree that safeFetch should not have become dependent on Thread::current existing, but I have to wonder whether we can just skip the WX code if there is no current thread? If the thread is not attached to the VM then what does it even mean to manipulate the WX state of an unknown thread?

We need to change the wx state of the current pthread in order to be able to execute stub routines. Otherwise, we would crash right away when trying to execute the SafeFetch stub.

And that is a valid requirement. Let's say we crash in a native thread, unrelated to and completely oblivious to the JVM it shares the process with. We'd still want to see e.g. native crash information, stack frames, maybe register region information etc - all that stuff that may require SafeFetch. In fact, this patch is related to Johannes' other PR where he modified stack frame walking to check that the registers point into valid memory.

That aside, with this change I think we can move the conditional WX code out of the shared os.hpp and bury it down in os_bsd_aarch64.hpp where it actually belongs.

Oh yes!

I'd even like to see threadWXSetters.inline.hpp moved to being in src/os_cpu/bsd_aarch64/ if feasible - I'm not sure what include would be needed for the callsites to function - os.hpp I presume?

I agree, all that wx stuff should be limited to os/bsd or os/bsd_aarch.

We could have generic wrappers like:

class os {
...
// Platform does whatever needed to prepare for execution of generated code inside the current thread
os::pre_current_thread_jit_call() NOT_MACOS_AARCH64({})
// Platform does whatever needed to clean up after executing generated code inside the current thread
os::post_current_thread_jit_call() NOT_MACOS_AARCH64({})

(Macro does not yet exist, but MACOS_AARCH64_ONLY does)
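For illustration, a macro of this kind could be defined alongside the existing one roughly as below. This is only a sketch: it tests the compiler-predefined `__APPLE__` and `__aarch64__` macros, which is an assumption and may differ from how HotSpot defines `MACOS_AARCH64_ONLY`, and `os_sketch` is a hypothetical stand-in for the real `os` class.

```cpp
// Sketch only: the platform test is an assumption.
#if defined(__APPLE__) && defined(__aarch64__)
#define MACOS_AARCH64_ONLY(code) code
#define NOT_MACOS_AARCH64(code)
#else
#define MACOS_AARCH64_ONLY(code)
#define NOT_MACOS_AARCH64(code) code
#endif

class os_sketch {
 public:
  // Everywhere except macOS/AArch64 these expand to empty inline bodies.
  // On macOS/AArch64 they are plain declarations, defined below.
  static void pre_current_thread_jit_call() NOT_MACOS_AARCH64({});
  static void post_current_thread_jit_call() NOT_MACOS_AARCH64({});
};

#if defined(__APPLE__) && defined(__aarch64__)
// The platform file would switch the WX state here; empty in this sketch.
void os_sketch::pre_current_thread_jit_call() {}
void os_sketch::post_current_thread_jit_call() {}
#endif
```

Call sites then need no `#ifdef`: on other platforms the calls compile down to nothing.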

--

Side note, I think we have reached a point where it would be cleaner to split xxxBSD and MacOS sources. E.g. this wx stuff should be limited to MacOS too, and we have more and more __APPLE__-only sections.

Cheers, Thomas

tstuefe

Hi Johannes, just some drive-by comments, not a full review. Also please see my comment toward David, proposing a more generic interface in os instead.

Cheers, Thomas

src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp

src/hotspot/share/runtime/os.hpp

src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp

parttimenerd · 2022-03-07T15:13:27Z

Regarding the names of the new methods: Most of the usages for ThreadWXEnable use it to set the WXMode to WXWrite. The suggested names are therefore a bit misleading (when used in this context).

One could add another two methods:

class os {
...
// Platform does whatever needed to prepare for generating code inside the current thread
os::pre_current_thread_jit_code_gen() NOT_MACOS_AARCH64({})
// Platform does whatever needed to clean up after generating code inside the current thread
os::post_current_thread_jit_code_gen() NOT_MACOS_AARCH64({})

But one would still have the problem of nesting (e.g. when code generating code calls code generating code).
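One way to cope with nesting is a scoped guard that remembers whichever mode was active on entry and restores it on exit, instead of a fixed pre/post pair. The following is a minimal sketch under that assumption; `ScopedWXMode` and `g_wx_mode` are hypothetical names and no real platform call is made.

```cpp
#include <cassert>

enum WXMode { WXWrite, WXExec };

// Hypothetical per-thread state, standing in for the real WX mode.
thread_local WXMode g_wx_mode = WXWrite;

// Switches to the requested mode and restores the previous one on scope exit.
class ScopedWXMode {
  WXMode _saved;
 public:
  explicit ScopedWXMode(WXMode mode) : _saved(g_wx_mode) { g_wx_mode = mode; }
  ~ScopedWXMode() { g_wx_mode = _saved; }
  ScopedWXMode(const ScopedWXMode&) = delete;
  ScopedWXMode& operator=(const ScopedWXMode&) = delete;
};

// Inner code that needs to execute generated code.
WXMode mode_seen_by_inner_call() {
  ScopedWXMode exec(WXExec);
  return g_wx_mode;
}

// Outer code that generates code and, while doing so, calls the inner code.
WXMode mode_after_nested_call() {
  ScopedWXMode write(WXWrite);
  mode_seen_by_inner_call();
  return g_wx_mode;  // the inner guard has restored the outer mode
}
```

With such a guard the inner scope cannot leave the outer scope in the wrong mode, whatever the outer mode was.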

dholmes-ora · 2022-03-07T21:32:36Z

We need to change the wx state of the current pthread in order to be able to execute stub routines. Otherwise, we would crash right away when trying to execute the SafeFetch stub.

Oh I see - that is unfortunate. I don't like messing with other people's threads.

May I ask how that would affect the code that uses the methods (includes, ...)?

@parttimenerd there would be no change - they continue to include os.hpp, which will include the os/cpu specific header files.

We could have generic wrappers like: ...

@tstuefe I think this is going a little too far in this fix. I'm looking for simplicity. All the WX related code should be buried in the os/cpu file for BSD/Aarch64 and the callsites all using MACOS_AARCH64_ONLY.

Splitting BSD from macOS would also be a future RFE.

Thanks.

parttimenerd · 2022-03-07T21:38:20Z

Thanks.

All the WX related code should be buried in the os/cpu file for BSD/Aarch64 and the callsites all using MACOS_AARCH64_ONLY.

I'm currently finishing a fix that does exactly that :)

dholmes-ora

This is looking good. A few additional comments below.

Thanks,
David

src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.cpp

src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.hpp

src/hotspot/share/prims/jni.cpp

src/hotspot/share/runtime/safefetch.inline.hpp

parttimenerd · 2022-03-08T12:07:08Z

I don't know why the Linux x86 build fails.

I tested the current version with code related to #7591 and it seems to fix the remaining problems (I tested it also with NMT enabled).

src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.hpp

dholmes-ora · 2022-03-08T12:35:38Z

The Linux x86 build failure is not related to this and has already been fixed, so you should re-sync with master branch.

dholmes-ora · 2022-03-09T07:30:43Z

@parttimenerd please never force-push in an active review as it completely destroys the review history and comment context!

theRealAph · 2022-03-11T09:33:40Z

Depending on what the pthread library call does, and if it's a real function call into a library, it would be more expensive than that.

Yes, unfortunately we need something like this.

But we don't need to speculate. If thread-local variables are cheap on MacOS, and there is no reason why they should be expensive, then we can stop worrying and just use a thread-local variable for WX state. We can measure how long it takes, and we only have to care about one platform, MacOS/AArch64.

We could also redefine SafeFetch on MacOS/AArch64 to not need WX. We could do this by statically generating SafeFetch on that platform, and it wouldn't be in the JIT region at all. Why not just do that?

parttimenerd · 2022-03-11T09:50:22Z

But we don't need to speculate. If thread-local variables are cheap on MacOS, and there is no reason why they should be expensive, then we can stop worrying and just use a thread-local variable for WX state. We can measure how long it takes, and we only have to care about one platform, MacOS/AArch64.

According to https://forums.swift.org/t/concurrencys-use-of-thread-local-variables/48654: "these accesses are just a move from a system register plus a load/store at a constant offset."

tstuefe · 2022-03-11T10:11:55Z

We could also redefine SafeFetch on MacOS/AArch64 to not need WX. We could do this by statically generating SafeFetch on that platform, and it wouldn't be in the JIT region at all. Why not just do that?

Do you mean using inline assembly?

theRealAph · 2022-03-11T10:27:25Z

On 3/11/22 10:12, Thomas Stuefe wrote: We could also redefine SafeFetch on MacOS/AArch64 to not need WX. We could do this by statically generating SafeFetch on that platform, and it wouldn't be in the JIT region at all. Why not just do that? Do you mean using inline assembly?

I'd use out-of-line assembly, as I do for atomic compare-and-swap on linux: https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S But I guess inline would work.

…

-- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. <https://www.redhat.com> https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

tstuefe · 2022-03-11T10:44:25Z

On 3/11/22 10:12, Thomas Stuefe wrote: We could also redefine SafeFetch on MacOS/AArch64 to not need WX. We could do this by statically generating SafeFetch on that platform, and it wouldn't be in the JIT region at all. Why not just do that? Do you mean using inline assembly?
I'd use out-of-line assembly, as I do for atomic compare-and-swap on linux: https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S But I guess inline would work.

Oh, this is neat. It would work on all platforms too, or on all we care to implement it for. And it would nicely solve the initialization window problem since it would work before stub routines are generated. We could throw CanUseSafeFetch away.

It seems we already do static assembly on bsd aarch. So there is already a path to follow.

But this could also be done as a follow up enhancement. I still like the OS TLS variable idea.

fweimer-rh · 2022-03-11T12:18:36Z

According to https://forums.swift.org/t/concurrencys-use-of-thread-local-variables/48654: "these accesses are just a move from a system register plus a load/store at a constant offset."

Ideally you'd still benchmark that. Some AArch64 implementations have really, really slow moves from the system register used as the thread pointer. Hopefully Apple's implementation isn't in that category.
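A rough way to take such a measurement is a tight loop over a thread-local read, timed with a monotonic clock. The sketch below is an assumption about how one might do it, not a benchmark from this PR; `tls_mode` and `time_tls_loads` are hypothetical names.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the thread-local WX state. volatile keeps the
// compiler from removing the loads from the loop.
thread_local volatile int tls_mode = 1;

struct BenchResult {
  std::uint64_t sum;   // checksum, so the loop has an observable result
  double ns_per_load;  // average wall-clock time per iteration
};

BenchResult time_tls_loads(std::uint64_t iterations) {
  using clock = std::chrono::steady_clock;
  std::uint64_t sum = 0;
  const auto start = clock::now();
  for (std::uint64_t i = 0; i < iterations; i++) {
    sum += tls_mode;  // one volatile read of the thread-local per iteration
  }
  const auto end = clock::now();
  const double total_ns =
      std::chrono::duration<double, std::nano>(end - start).count();
  return { sum, total_ns / static_cast<double>(iterations) };
}
```

Note that a compiler may compute the thread-local's address once outside the loop, so a loop like this can understate the cost of a single isolated access. Inspecting the generated code is advisable before trusting the number.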

AntonKozlov · 2022-03-11T16:34:29Z

blocking SIGSEGV and SIGBUS - or other synchronous error signals like SIGFPE - and then triggering said signal is UB. What happens is OS-dependent. I saw processes vanishing, or hang, or core. It makes sense, since what is the kernel supposed to do. It cannot deliver the signal, and deferring it would require returning to the faulting instruction, that would just re-fault.
For some more details see e.g. https://bugs.openjdk.java.net/browse/JDK-8252533

This UB looks reasonable. My point is that a native thread would run fine with SIGSEGV blocked. But then the JVM decides it can do SafeFetch, and things get nasty.

Is there a crash that is fixed by the change? I just spotted it is an enhancement, not a bug. Just trying to understand the problem.

Yes, this issue is a breakout from https://bugs.openjdk.java.net/browse/JDK-8282306, where we'd like to use SafeFetch to make stack walking in AsyncGetCallTrace more robust. AGCT is called from the signal handler, and it may run in any number of situations (e.g. in foreign threads, or threads which are in the process of getting dismantled, etc).

I mean, some way to verify the issue is fixed, e.g. a test that does not fail anymore.

I see AsyncGetCallTrace to assume the JavaThread very soon, or do I look at the wrong place? https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/forte.cpp#L569

Another situation is error handling itself. When writing an hs-err file, we use SafeFetch to carefully tiptoe around the possibly corrupt VM state. If the original crash happened in a foreign thread, we still want some of these reports to work (e.g. dumping register content or printing stacks). So SafeFetch should be as robust as possible.

OK, thanks. I think we also handle recursive segfaults, to recover after interpreting corrupted VM state. Otherwise, implementing the printing functions would be too tedious and hard with SafeFetch alone. But I see it's used in printing register content, at least.

AntonKozlov · 2022-03-11T16:38:28Z

Is it possible to change SafeFetch only? Switch to WXExec before calling the stub and switch WXWrite back unconditionally? We won't need to provide assert in ThreadWXEnable. But SafeFetch can check the assumption with assert via Thread, if it exists.

But SafeFetch could be used from outside code as well as VM code. In case of the latter, prior state can either be WXWrite or WXExec. It needs to restore the prior state after the call.

I'm not sure I understand what "outside code" is. SafeFetch is a private HotSpot function; it cannot be linked with non-JVM code, can it?

Sorry for being imprecise. I meant SafeFetch is triggered from within a signal handler that runs on a foreign thread. E.g. AGCT or error handling.

Then the OS TLS way is not better since when the signal handler and SafeFetch start, the state is unknown and is only assumed to be Write (in initialization of TLS variable).

AntonKozlov

I looked at the patch again from the perspective of a pure refactoring. It looks fine except we lost one of the asserts.

AntonKozlov · 2022-03-11T16:41:24Z

src/hotspot/share/runtime/thread.cpp

@@ -276,7 +275,7 @@ Thread::Thread() {
    assert(Thread::current_or_null() == NULL, "creating thread before barrier set");
  }

-  MACOS_AARCH64_ONLY(DEBUG_ONLY(_wx_init = false));
+  MACOS_AARCH64_ONLY(DEBUG_ONLY(os::ThreadWX::init();))


This line meant the WX state is not initialized at this point (as a part of the Thread constructor). Since there are several places where the state is initialized and it was easy to miss one, I would like to preserve some assert that the state is initialized.
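The kind of check asked for here could look roughly like the sketch below, which keeps an "initialized" flag next to the thread-local mode and asserts on it before every switch. `ThreadWXSketch` and its members are hypothetical names, not the patch's. In HotSpot such a flag would presumably exist in debug builds only; it is kept unconditionally here for simplicity.

```cpp
#include <cassert>

enum WXMode { WXWrite, WXExec };

// Hypothetical per-thread WX bookkeeping with an initialization check.
class ThreadWXSketch {
  static thread_local WXMode _mode;
  static thread_local bool _initialized;
 public:
  // Must be called once on each thread before the mode is switched.
  static void init(WXMode initial) {
    _mode = initial;
    _initialized = true;
  }
  static bool is_initialized() { return _initialized; }

  // Switches the mode and returns the previous one.
  static WXMode enable(WXMode new_mode) {
    assert(_initialized && "WX state used before initialization");
    WXMode old_mode = _mode;
    _mode = new_mode;
    return old_mode;
  }
};

thread_local WXMode ThreadWXSketch::_mode = WXWrite;
thread_local bool ThreadWXSketch::_initialized = false;
```

A thread that forgets to call `init` then fails the assert on its first switch rather than silently running with a default.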

tstuefe · 2022-03-11T23:34:36Z

blocking SIGSEGV and SIGBUS - or other synchronous error signals like SIGFPE - and then triggering said signal is UB. What happens is OS-dependent. I saw processes vanishing, or hang, or core. It makes sense, since what is the kernel supposed to do. It cannot deliver the signal, and deferring it would require returning to the faulting instruction, that would just re-fault.
For some more details see e.g. https://bugs.openjdk.java.net/browse/JDK-8252533

This UB looks reasonable. My point is that a native thread would run fine with SIGSEGV blocked. But then the JVM decides it can do SafeFetch, and things get nasty.

Blocking synchronous error signals makes zero sense even for normal programs, since you lose the ability to get cores. For the JVM in particular, it also blocks facilities like polling pages, or dynamically querying CPU abilities. So a JVM would not even start with synchronous error signals blocked.

Is there a crash that is fixed by the change? I just spotted it is an enhancement, not a bug. Just trying to understand the problem.

Yes, this issue is a breakout from https://bugs.openjdk.java.net/browse/JDK-8282306, where we'd like to use SafeFetch to make stack walking in AsyncGetCallTrace more robust. AGCT is called from the signal handler, and it may run in any number of situations (e.g. in foreign threads, or threads that are in the process of getting dismantled, etc).

I mean, some way to verify the issue is fixed, e.g. a test that does not fail anymore.

No, tests do not exist. Unfortunately, otherwise this regression would have been detected right away and we would not need this PR.

We have a test, though, that tests SafeFetch during error handling. That test can be tweaked for this purpose. So a test does not exist yet, but one can easily be written.

I see AsyncGetCallTrace to assume the JavaThread very soon, or do I look at the wrong place? https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/forte.cpp#L569

Another situation is error handling itself. When writing an hs-err file, we use SafeFetch to carefully tiptoe around the possibly corrupt VM state. If the original crash happened in a foreign thread, we still want some of these reports to work (e.g. dumping register content or printing stacks). So SafeFetch should be as robust as possible.

OK, thanks. I think we also handle recursive segfaults, to recover after interpreting corrupted VM state. Otherwise, implementing the printing functions would be too tedious and hard with SafeFetch alone. But I see it's used in printing register content, at least.

Secondary error handling is a very coarse-grained tool. If an error reporting step crashes out, we continue with the next step. It has disadvantages, though. The total number of retries is very limited. And a faulting error reporting step still hurts, because its report is compromised. E.g. if the call stack printing crashes out, we have no call stack. This is not an abstract problem. It's a very concrete and typical problem.

I spend a large part of my work with hs-err reports. They are of very high importance to us. We (SAP) have invested a lot of time and effort in hardening OpenJDK error reporting, and SafeFetch is an important part of that. For example, we provided the facility that made SafeFetch usable in signal handling. It would be nice if our work was not compromised. Please let us find a way forward here.

tstuefe · 2022-03-11T23:40:36Z

On 3/11/22 10:12, Thomas Stuefe wrote: We could also redefine SafeFetch on MacOS/AArch64 to not need WX. We could do this by statically generating SafeFetch on that platform, and it wouldn't be in the JIT region at all. Why not just do that? Do you mean using inline assembly?
I'd use out-of-line assembly, as I do for atomic compare-and-swap on linux: https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S But I guess inline would work.

Oh, this is neat. It would work on all platforms too, or on all we care to implement it for. And it would nicely solve the initialization window problem since it would work before stub routines are generated. We could throw CanUseSafeFetch away.

It seems we already do static assembly on bsd aarch. So there is already a path to follow.

But this could also be done as a follow up enhancement. I still like the OS TLS variable idea.

I spent some time doing a static implementation of SafeFetch on Linux x64, and it's not super trivial. The problem is that we need to know addresses of instructions inside that function. I can set labels in assembly, and I can export them, but so far I have been unable to use them as addresses in C++ code. I will research some more.

fweimer-rh · 2022-03-12T07:45:57Z

I spent some time doing a static implementation of SafeFetch on Linux x64, and it's not super trivial. The problem is that we need to know addresses of instructions inside that function. I can set labels in assembly, and I can export them, but so far I have been unable to use them as addresses in C++ code. I will research some more.

There are basically two (easy) ways to do it. Put global symbols like

        .globl address_of_label
address_of_label:

into the assembler sources and use

        extern char address_of_label[] __attribute__ ((visibility ("hidden")));

from the C++ side.

Or use a local label, and export the difference between that local label and the function start in a global data symbol from the assembler side:

        .globl SafeFetch // Real function name goes here.
SafeFetch:
        // …
.Llabel:
        // …

        .section .rodata
        .globl SafeFetch_label_offset
        .p2align 3
SafeFetch_label_offset:
        .quad .Llabel - SafeFetch
        .type SafeFetch_label_offset, @object
        .size SafeFetch_label_offset, 8

And use

extern uintptr_t SafeFetch_label_offset __attribute__ ((__visibility__ ("hidden")));

and the expression (uintptr_t) &SafeFetch + SafeFetch_label_offset to compute the final address. The second approach is friendlier to tools (which may get confused by symbols in the middle of functions).
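Once such addresses are available on the C++ side, a signal handler can presumably compare the faulting program counter against the address of the load and resume at a recovery label. The sketch below shows only that address arithmetic with made-up numbers. `StaticSafeFetchLayout`, `FakeSignalContext` and `redirect_if_safefetch_fault` are hypothetical names, and a real handler would read and write the pc in the platform's signal context instead.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of a statically assembled SafeFetch. In a real
// build these values would come from the assembler symbols shown above.
struct StaticSafeFetchLayout {
  std::uintptr_t function_start;       // (uintptr_t) &SafeFetch
  std::uintptr_t fault_offset;         // offset of the load that may fault
  std::uintptr_t continuation_offset;  // offset of the recovery path

  std::uintptr_t fault_pc() const { return function_start + fault_offset; }
  std::uintptr_t continuation_pc() const {
    return function_start + continuation_offset;
  }
};

// Stand-in for the program counter stored in a signal context.
struct FakeSignalContext {
  std::uintptr_t pc;
};

// If the fault happened at the SafeFetch load, resume at the continuation
// and report the fault as handled. Otherwise leave the context untouched.
bool redirect_if_safefetch_fault(FakeSignalContext& ctx,
                                 const StaticSafeFetchLayout& layout) {
  if (ctx.pc != layout.fault_pc()) {
    return false;
  }
  ctx.pc = layout.continuation_pc();
  return true;
}
```

For example, with a function start of 0x1000, a fault offset of 0x4 and a continuation offset of 0xc, a fault at 0x1004 would be redirected to 0x100c, while a fault at any other address is left for the regular crash handling.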

If you have a PR, please Cc: me on it, I will have a look.

theRealAph · 2022-03-12T12:30:39Z

According to https://forums.swift.org/t/concurrencys-use-of-thread-local-variables/48654: "these accesses are just a move from a system register plus a load/store at a constant offset."

Ideally you'd still benchmark that. Some AArch64 implementations have really, really slow moves from the system register used as the thread pointer. Hopefully Apple's implementation isn't in that category.

In a tight loop, loads from __thread variables take 1ns. It's this:

    0x18ea1c530 <+0>:   ldr    x16, [x0, #0x8]
    0x18ea1c534 <+4>:   mrs    x17, TPIDRRO_EL0 
    0x18ea1c538 <+8>:   and    x17, x17, #0xfffffffffffffff8
    0x18ea1c53c <+12>:  ldr    x17, [x17, x16, lsl #3]
    0x18ea1c540 <+16>:  cbz    x17, 0x18ea1c550          ; only executed first time
    0x18ea1c544 <+20>:  ldr    x16, [x0, #0x10]
    0x18ea1c548 <+24>:  add    x0, x17, x16
    0x18ea1c54c <+28>:  ret

... which looks the same as what glibc does. Not bad, but quite a lot more to do than a simple load.

I'd still use a static SafeFetch, with no W^X fiddling. It just seems to me much more reasonable.

theRealAph · 2022-03-12T12:32:38Z

into the assembler sources and use

        extern char address_of_label[] __attribute__ ((visibility ("hidden")));

ITYM

extern "C" char address_of_label[] __attribute__ ((visibility ("hidden")));

fweimer-rh · 2022-03-12T14:21:13Z

into the assembler sources and use

        extern char address_of_label[] __attribute__ ((visibility ("hidden")));

ITYM

extern "C" char address_of_label[] __attribute__ ((visibility ("hidden")));

It doesn't hurt, but the Itanium ABI does not mangle such global data symbols, so it's not strictly needed.

theRealAph · 2022-03-12T17:39:14Z

extern "C" char address_of_label[] __attribute__ ((visibility ("hidden")));
It doesn't hurt, but the Itanium ABI does not mangle such global data symbols, so it's not strictly needed.

That's an interesting point of view. I guess I never thought about it, but I'd always put symbols for an asm file in an extern "C" section anyway. But yeah, OK.

theRealAph · 2022-03-12T17:45:31Z

1ns

Incidentally, there must be a lot of speculation and bypassing going on there. I can see 15 cycles of latency, probably 20, so that'd be more like 5ns start to finish. M1 is a remarkable thing.

tstuefe · 2022-03-14T08:03:39Z

Hi Florian,

If you have a PR, please Cc: me on it, I will have a look.

Thanks a lot, Florian! I got it to work under Linux x64.

My error was that I had declared the label in C++ as extern void* SafeFetch_continuation. Declaring it as extern char _SafeFetch32_continuation[] __attribute__ ((visibility ("hidden"))); as you suggested does the trick. I'm not sure I understand the difference.

extern "C"

It doesn't hurt, but the Itanium ABI does not mangle such global data symbols, so it's not strictly needed.

I don't understand this remark, what does Itanium have to do with this?

fweimer-rh · 2022-03-14T08:19:41Z

Thanks a lot, Florian! I got it to work under Linux x64.

Great!

My error was that I had declared the label in C++ as extern void* SafeFetch_continuation. Declaring it as extern char _SafeFetch32_continuation[] __attribute__ ((visibility ("hidden"))); as you suggested does the trick. I'm not sure I understand the difference.

Your approach might have worked as well, but you would have to use &SafeFetch_continuation on the C++ side. Arrays work directly because of pointer decay. The actual type does not matter because you just want to create a code address from that, so there's no corresponding object (in the C++ standard sense) at the address anyway.

Anyway, from what I've seen, the array is more idiomatic.
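The difference between the two declarations can be imitated in a single file. In the sketch below a byte array stands in for the machine code at the label (the byte values are arbitrary), and the function names are hypothetical. Using the array name yields the address of the label, whereas treating the same location as a pointer object yields whatever bytes happen to be stored there.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-in for machine code at an assembler label. The bytes are arbitrary.
static unsigned char bytes_at_label[16] = {
  0x8b, 0x07, 0xc3, 0x90, 0x90, 0x90, 0x90, 0x90,
  0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90, 0x90
};

// What `extern char label[]` gives: the name decays to the label's address.
std::uintptr_t address_via_array() {
  return reinterpret_cast<std::uintptr_t>(bytes_at_label);
}

// What using `extern void* label` as a value gives: a pointer-sized load
// from the label, i.e. the first instruction bytes reinterpreted as a value.
std::uintptr_t value_via_pointer_object() {
  std::uintptr_t value = 0;
  std::memcpy(&value, bytes_at_label, sizeof value);
  return value;
}
```

This would explain why the value obtained through the `void*` declaration did not look like a code pointer: it was made of instruction bytes.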

It doesn't hurt, but the Itanium ABI does not mangle such global data symbols, so it's not strictly needed.

I don't understand this remark, what does Itanium have to do with this?

The C++ ABI definition is probably Itanium's most lasting contribution to computing. I think it's used on most non-Windows systems these days, not just on Linux, and of course on all kinds of CPUs.

tstuefe · 2022-03-14T09:00:45Z

Thanks a lot, Florian! I got it to work under Linux x64.

Great!

My error was that I had declared the label in C++ as extern void* SafeFetch_continuation. Declaring it as extern char _SafeFetch32_continuation[] __attribute__ ((visibility ("hidden"))); as you suggested does the trick. I'm not sure I understand the difference.

Your approach might have worked as well, but you would have to use &SafeFetch_continuation on the C++ side. Arrays work directly because of pointer decay.

Ah, that makes sense. I wondered why the address did not look like a code pointer in C++.

Anyway, got Linux x86_32 working too. Now I am working on aarch64.

Anyway, from what I've seen, the array is more idiomatic.

It doesn't hurt, but the Itanium ABI does not mangle such global data symbols, so it's not strictly needed.

I don't understand this remark, what does Itanium have to do with this?

The C++ ABI definition is probably Itanium's most lasting contribution to computing. I think it's used on most non-Windows systems these days, not just on Linux, and of course on all kinds of CPUs.

Interesting to know. Thanks!

parttimenerd · 2022-03-14T10:16:29Z

We're looking into solutions and will create a new PR if necessary.

mlbridge · 2022-03-17T23:30:48Z

Mailing list message from David Holmes on hotspot-dev:

On 12/03/2022 2:37 am, Anton Kozlov wrote:

On Thu, 10 Mar 2022 18:04:50 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

blocking SIGSEGV and SIGBUS - or other synchronous error signals like SIGFPE - and then triggering said signal is UB. What happens is OS-dependent. I saw processes vanishing, or hang, or core. It makes sense, since what is the kernel supposed to do. It cannot deliver the signal, and deferring it would require returning to the faulting instruction, that would just re-fault.
For some more details see e.g. https://bugs.openjdk.java.net/browse/JDK-8252533

This UB looks reasonable. My point is that a native thread would run fine with SIGSEGV blocked. But then the JVM decides it can do SafeFetch, and things get nasty.

Is there a crash that is fixed by the change? I just spotted it is an enhancement, not a bug. Just trying to understand the problem.

Yes, this issue is a breakout from https://bugs.openjdk.java.net/browse/JDK-8282306, where we'd like to use SafeFetch to make stack walking in AsyncGetCallTrace more robust. AGCT is called from the signal handler, and it may run in any number of situations (e.g. in foreign threads, or threads which are in the process of getting dismantled, etc).

I mean, some way to verify the issue is fixed, e.g. a test that does not fail anymore.

I see AsyncGetCallTrace to assume the JavaThread very soon, or do I look at the wrong place? https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/forte.cpp#L569

It is up to the agent setting things up for AGCT to only actually call
it for JavaThreads.

David
-----

parttimenerd · 2022-03-18T07:18:23Z

This is not the point: It comes down to API design. If we use SafeFetch in os::is_first_C_frame (and thereby in frame::link_or_null) and not just in ASGCT, then it depends on when the other methods can be called. These methods are e.g. used whenever an error happens and a hs_err file is generated. We cannot guarantee that a JavaThread is always present there.

mlbridge · 2022-03-20T22:49:49Z

Mailing list message from David Holmes on hotspot-dev:

On 18/03/2022 5:21 pm, Johannes Bechberger wrote:

On Fri, 11 Mar 2022 07:52:16 GMT, Johannes Bechberger <duke at openjdk.java.net> wrote:

The WXMode for the current thread (on MacOS aarch64) is currently stored in the thread class which is unnecessary as the WXMode is bound to the current OS thread, not the current instance of the thread class.
This pull request moves the storage of the current WXMode into a thread local global variable in `os` and changes all related code. SafeFetch depended on the existence of a thread object only because of the WXMode. This pull request therefore removes the dependency, making SafeFetch usable in more contexts.

Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision:

Remove two unnecessary lines

This is not the point: It comes down to API design. If we use SafeFetch in os::is_first_C_frame (and thereby in frame::link_or_null) and not just in ASGCT, then it depends on when the other methods can be called. These methods are e.g. used whenever an error happens and a hs_err file is generated. We cannot guarantee that a JavaThread is always present there.

My comment was specifically in response to your statement:

I see AsyncGetCallTrace to assume the JavaThread very soon

But AGCT is only intended to ever be called on JavaThreads.

David

AntonKozlov · 2022-03-21T12:26:44Z

My comment was specifically in response to your statement:

I see AsyncGetCallTrace to assume the JavaThread very soon

But AGCT is only intended to ever be called on JavaThreads.

Sorry, that was my question. It looked that way to me as well (and that AGCT will return early if called on a non-Java thread; AFAICS SafeFetch is not involved), and I wanted to confirm. AGCT on a non-Java thread was stated to be one of the two major reasons for this patch.

I would support this patch to move W^X management out from Thread to OS-specific code, after the problem with the assert "is initialized" is fixed.

tstuefe · 2022-03-21T12:44:58Z

I'm currently implementing Andrew's proposal for a static SafeFetch (#7865, still in draft, but almost done). That will be a more generic solution, since we won't have to deal with thread WX state at all. That's why we closed this PR.

dholmes-ora · 2022-03-21T12:48:31Z

The conversation here is somewhat hard to follow. I do see that "foreign threads" was mentioned by @tstuefe in the context of AGCT but I have to assume he misspoke there (assuming a foreign thread is one not attached to the VM) as AGCT only works for attached JavaThreads. The signal handler that will call AGCT has to be prepared to find any kind of thread in any state, but AGCT should only be called on the right kinds of thread in the right state.

tstuefe · 2022-03-21T13:17:51Z

The conversation here is somewhat hard to follow. I do see that "foreign threads" was mentioned by @tstuefe in the context of AGCT but I have to assume he misspoke there (assuming a foreign thread is one not attached to the VM) as AGCT only works for attached JavaThreads. The signal handler that will call AGCT has to be prepared to find any kind of thread in any state, but AGCT should only be called on the right kinds of thread in the right state.

Sure, AGCT can be limited to VM threads - or maybe already is. But tracking non-VM threads could be a valid use case.

Downstream, in SapMachine, we have a facility that tracks call stacks from malloc sites, independently of NMT or the VM, with the explicit purpose of catching mallocs from non-VM threads too. For collecting the stack trace, we use some VM utilities, SafeFetch among them. That is a very useful facility. I could argue a similar case for the Async Profiler: why should profiling be limited to Java threads? In the end, if it eats performance, it hurts, regardless of whether it's a Java thread or a non-VM-attached thread. It could be a concurrent native thread burning CPU; why would that not be interesting?

Our concern was with SafeFetch, and AGCT is only one example. SafeFetch should be as safe as possible. Error reporting alone is a sufficient reason.

openjdk bot added the rfr label (pull request is ready for review) Mar 7, 2022

openjdk bot added the serviceability, hotspot, and shenandoah labels Mar 7, 2022

parttimenerd mentioned this pull request Mar 7, 2022

8282306: os::is_first_C_frame(frame*) crashes on invalid link access #7591

Closed

3 tasks

dholmes-ora reviewed Mar 7, 2022

src/hotspot/share/runtime/threadWXSetters.inline.hpp (outdated, resolved)

tstuefe suggested changes Mar 7, 2022

dholmes-ora reviewed Mar 8, 2022

src/hotspot/os_cpu/bsd_aarch64/os_bsd_aarch64.hpp (outdated, resolved)

parttimenerd added 9 commits March 8, 2022 14:13

37e302d: Use os::current_thread_change_wx instead of thread methods
2aa9f18: Remove wx_init and current thread assert in safefetch
5664daf: Remove thread parameter from os methods
d3959e7: Remove thread parameter from ThreadWXEnable
d87b00d: Fix include for threadWXSetters.inline.hpp
478ec1a: Minor fixes
3515fb9: Move WX functionality into os specific files
2af4f01: Small fixes
21dd004: Move code to os::current_thread_wx

parttimenerd force-pushed the parttimenerd_wx_enable branch from 0204830 to 21dd004 March 8, 2022 13:13

AntonKozlov suggested changes Mar 11, 2022

parttimenerd closed this Mar 14, 2022

tstuefe mentioned this pull request Mar 18, 2022

JDK-8283326: Implement SafeFetch statically #7865

Closed

3 tasks
