Skip to content

8282475: SafeFetch should not rely on existence of Thread::current #7727

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

parttimenerd
Copy link
Contributor

@parttimenerd parttimenerd commented Mar 7, 2022

The WXMode for the current thread (on MacOS aarch64) is currently stored in the thread class which is unnecessary as the WXMode is bound to the current OS thread, not the current instance of the thread class.
This pull request moves the storage of the current WXMode into a thread local global variable in os and changes all related code. SafeFetch depended on the existence of a thread object only because of the WXMode. This pull request therefore removes the dependency, making SafeFetch usable in more contexts.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8282475: SafeFetch should not rely on existence of Thread::current

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/7727/head:pull/7727
$ git checkout pull/7727

Update a local copy of the PR:
$ git checkout pull/7727
$ git pull https://git.openjdk.java.net/jdk pull/7727/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 7727

View PR using the GUI difftool:
$ git pr show -t 7727

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/7727.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Mar 7, 2022

👋 Welcome back parttimenerd! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 7, 2022
@openjdk
Copy link

openjdk bot commented Mar 7, 2022

@parttimenerd The following labels will be automatically applied to this pull request:

  • hotspot
  • serviceability
  • shenandoah

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@mlbridge
Copy link

mlbridge bot commented Mar 7, 2022

Webrevs

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Johannes,

The general idea seems good (pity it touches so many files, but then I've never liked any of this WX support precisely because it is so invasive of shared code). I agree that safeFetch should not have become dependent on Thread::current existing, but I have to wonder whether we can just skip the WX code if there is no current thread? If the thread is not attached to the VM then what does it even mean to manipulate the WX state of an unknown thread?

That aside, with this change I think we can move the conditional WX code out of the shared os.hpp and bury it down in os_bsd_aarch64.hpp where it actually belongs.

I'd even like to see threadWXSetters.inline.hpp moved to being in src/os_cpu/bsd_aarch64/ if feasible - I'm not sure what include would be needed for the callsites to function - os.hpp I presume?

Thanks,
David

@parttimenerd
Copy link
Contributor Author

I agree that safeFetch should not have become dependent on Thread::current existing, but I have to wonder whether we can just skip the WX code if there is no current thread? If the thread is not attached to the VM then what does it even mean to manipulate the WX state of an unknown thread?

The OS thread is always known. The WXMode is unrelated to Thread object. The WXMode is set for an OS thread to allow pages to be either writable or executable (needed for code generation).

That aside, with this change I think we can move the conditional WX code out of the shared os.hpp and bury it down in os_bsd_aarch64.hpp where it actually belongs.

May I ask how that would affect the code that uses the methods (includes, ...)?

I'd even like to see threadWXSetters.inline.hpp moved to being in src/os_cpu/bsd_aarch64/ if feasible - I'm not sure what include would be needed for the callsites to function - os.hpp I presume?

I don't know whether this is enough.

@tstuefe
Copy link
Member

tstuefe commented Mar 7, 2022

Hi David,

The general idea seems good (pity it touches so many files, but then I've never liked any of this WX support precisely because it is so invasive of shared code). I agree that safeFetch should not have become dependent on Thread::current existing, but I have to wonder whether we can just skip the WX code if there is no current thread? If the thread is not attached to the VM then what does it even mean to manipulate the WX state of an unknown thread?

We need to change the wx state of the current pthread in order to be able to execute stub routines. Otherwise, we would crash right away when trying to execute the SafeFetch stub.

And that is a valid requirement. Let's say we crash in a native thread, unrelated to and completely oblivious of the JVM it shares the process with. We'd still want to see e.g. native crash information, stack frames, maybe register region information etc - all that stuff that may require SafeFetch. In fact, this patch is related to Johannes other PR where he modified stack frame walking to check that the registers point into valid memory.

That aside, with this change I think we can move the conditional WX code out of the shared os.hpp and bury it down in os_bsd_aarch64.hpp where it actually belongs.

Oh yes!

I'd even like to see threadWXSetters.inline.hpp moved to being in src/os_cpu/bsd_aarch64/ if feasible - I'm not sure what include would be needed for the callsites to function - os.hpp I presume?

I agree, all that wx stuff should be limited to os/bsd or os/bsd_aarch.

We could have generic wrappers like:

class os {
...
// Platform does whatever needed to prepare for execution of generated code inside the current thread
os::pre_current_thread_jit_call() NOT_MACOS_AARCH64({})
// Platform does whatever needed to clean up after executing generated code inside the current thread
os::post_current_thread_jit_call() NOT_MACOS_AARCH64({})

(Macro does not yet exist, but MACOS_AARCH64_ONLY does)

--

Side note, I think we have reached a point where it would be cleaner to split xxxBSD and MacOS sources. E.g. this wx stuff should be limited to MacOS too, and we have more and more __APPLE_ only sections.

Cheers, Thomas

Copy link
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Johannes, just some drive-by comments, not a full review. Also please see my comment toward David, proposing a more generic interface in os instead.

Cheers, Thomas

@parttimenerd
Copy link
Contributor Author

Regarding the names of the new methods: Most of the usages for ThreadWXEnable use it to set the WXMode to WXWrite. The suggested names are therefore a bit misleading (when used in this context).

One could add another two methods:

class os {
...
// Platform does whatever needed to prepare for execution of generated code inside the current thread
os::pre_current_thread_jit_code_gen() NOT_MACOS_AARCH64({})
// Platform does whatever needed to clean up after executing generated code inside the current thread
os::post_current_thread_jit_code_gen() NOT_MACOS_AARCH64({})

But one would still have the problem of nesting (e.g. when code generating code calls code generating code).

@dholmes-ora
Copy link
Member

We need to change the wx state of the current pthread in order to be able to execute stub routines. Otherwise, we would crash right away when trying to execute the SafeFetch stub.

Oh I see - that is unfortunate. I don't like messing with other people's threads.

May I ask how that would affect the code that uses the methods (includes, ...)?

@parttimenerd there would be no change - they continue to include os.hpp, which will include the os/cpu specific header files.

We could have generic wrappers like: ...

@tstuefe I think this is going a little too far in this fix. I'm looking for simplicity. All the WX related code should be buried in the os/cpu file for BSD/Aarch64 and the callsites all using MACOS_AARCH64_ONLY.

Splitting BSD from macOS would also be a future RFE.

Thanks.

@parttimenerd
Copy link
Contributor Author

Thanks.

All the WX related code should be buried in the os/cpu file for BSD/Aarch64 and the callsites all using MACOS_AARCH64_ONLY.

I'm currently finishing a fix that exactly does that :)

Copy link
Member

@dholmes-ora dholmes-ora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good. A few additional comments below.

Thanks,
David

@parttimenerd
Copy link
Contributor Author

I don't know why the Linux x86 build fails.

I tested the current version with code related to #7591 and it seems to fix the remaining problems (I tested it also with NMT enabled).

@dholmes-ora
Copy link
Member

The Linux x86 build failure is not related to this and has already been fixed, so you should re-sync with master branch.

@parttimenerd parttimenerd force-pushed the parttimenerd_wx_enable branch from 0204830 to 21dd004 Compare March 8, 2022 13:13
@dholmes-ora
Copy link
Member

@parttimenerd please never force-push in an active review as it completely destroys the review history and comment context!

@theRealAph
Copy link
Contributor

Depending on what the pthread library call does, and if it's a real function call into a library, it would be more expensive than that.

Yes, unfortunately we need something like this.

But we don't need to speculate. If thread-local variables are cheap on MacOS, and there is no reason why they should be expensive, then we can stop worrying and just use a thread-local variable for WX state. We can measure how long it takes, and we only have to care about one platform, MacOS/AArch64.

We could also redefine SafeFetch on MacOS/AArch64 to not need WX. We could do this by statically generating SafeFetch on that platform, and it wouldn't be in the JIT region at all. Why not just do that?

@parttimenerd
Copy link
Contributor Author

But we don't need to speculate. If thread-local variables are cheap on MacOS, and there is no reason why they should be expensive, then we can stop worrying and just use a thread-local variable for WX state. We can measure how long it takes, and we only have to care about one platform, MacOS/AArch64.

According to https://forums.swift.org/t/concurrencys-use-of-thread-local-variables/48654: "these accesses are just a move from a system register plus a load/store at a constant offset."

@tstuefe
Copy link
Member

tstuefe commented Mar 11, 2022

We could also redefine SafeFetch on MacOS/AArch64 to not need WX. We could do this by statically generating SafeFetch on that platform, and it wouldn't be in the JIT region at all. Why not just do that?

Do you mean using inline assembly?

@theRealAph
Copy link
Contributor

theRealAph commented Mar 11, 2022 via email

@tstuefe
Copy link
Member

tstuefe commented Mar 11, 2022

On 3/11/22 10:12, Thomas Stuefe wrote: We could also redefine SafeFetch on MacOS/AArch64 to not need WX. We could do this by statically generating SafeFetch on that platform, and it wouldn't be in the JIT region at all. Why not just do that? Do you mean using inline assembly?
I'd use out-of-line assembly, as I do for atomic compare-and-swap on linux: https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S But I guess inline would work.

Oh, this is neat. It would work on all platforms too, or on all we care to implement it for. And it would nicely solve the initialization window problem since it would work before stub routines are generated. We could throw CanUseSafeFetch away.

It seems we already do static assembly on bsd aarch. So there is already a path to follow.

But this could also be done as a follow up enhancement. I still like the OS TLS variable idea.

@fweimer-rh
Copy link

According to https://forums.swift.org/t/concurrencys-use-of-thread-local-variables/48654: "these accesses are just a move from a system register plus a load/store at a constant offset."

Ideally you'd still benchmark that. Some AArch64 implementations have really, really slow moves from the system register used as the thread pointer. Hopefully Apple's implementation isn't in that category.

@AntonKozlov
Copy link
Member

blocking SIGSEGV and SIGBUS - or other synchronous error signals like SIGFPE - and then triggering said signal is UB. What happens is OS-dependent. I saw processes vanishing, or hang, or core. It makes sense, since what is the kernel supposed to do. It cannot deliver the signal, and deferring it would require returning to the faulting instruction, that would just re-fault.
For some more details see e.g. https://bugs.openjdk.java.net/browse/JDK-8252533

This UB looks reasonable. My point is that a native thread would run fine with SIGSEGV blocked. But then JVM decides it can do SafeFetch, and things gets nasty.

Is there a crash that is fixed by the change? I just spotted it is an enhancement, not a bug. Just trying to understand the problem.

Yes, this issue is a breakout from https://bugs.openjdk.java.net/browse/JDK-8282306, where we'd like to use SafeFetch to make stack walking in AsyncGetCallTrace more robust. AGCT is called from the signal handler, and it may run in any number of situations (e.g. in foreign threads, or threads which are in the process of getting dismantled, etc).

I mean, some way to verify the issue is fixed, e.g. a test that does not fail anymore.

I see AsyncGetCallTrace to assume the JavaThread very soon, or do I look at the wrong place? https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/forte.cpp#L569

Another situation is error handling itself. When writing an hs-err file, we use SafeFetch to do carefully tiptoe around the possibly corrupt VM state. If the original crash happened in a foreign thread, we still want some of these reports to work (e.g. dumping register content or printing stacks). So SafeFetch should be as robust as possible.

OK, thanks. I think we also handle recursive segfaults recover after interpretation of the corrupted VM state. Otherwise, implementing the printing functions would be too tedious and hard with SafeFetch alone. But I see it's used in printing register content, at least.

@AntonKozlov
Copy link
Member

Is it possible to change SafeFetch only? Switch to WXExec before calling the stub and switch WXWrite back unconditionally? We won't need to provide assert in ThreadWXEnable. But SafeFetch can check the assumption with assert via Thread, if it exists.

But SafeFetch could be used from outside code as well as VM code. In case of the latter, prior state can either be WXWrite or WXExec. It needs to restore the prior state after the call.

I'm not sure I understand what is the "outside code". The SafeFetch is the private hotspot function, it cannot be linked with non-JVM code, isn't it?

Sorry for being imprecise. I meant SafeFetch is triggered from within a signal handler that runs on a foreign thread. E.g. AGCT or error handling.

Then the OS TLS way is not better since when the signal handler and SafeFetch start, the state is unknown and is only assumed to be Write (in initialization of TLS variable).

Copy link
Member

@AntonKozlov AntonKozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked on the patch again from the perspective of a pure refactoring. It looks fine except we lost one of the asserts.

@@ -276,7 +275,7 @@ Thread::Thread() {
assert(Thread::current_or_null() == NULL, "creating thread before barrier set");
}

MACOS_AARCH64_ONLY(DEBUG_ONLY(_wx_init = false));
MACOS_AARCH64_ONLY(DEBUG_ONLY(os::ThreadWX::init();))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line meant the WX state is not initialized at this point (as a part of Thread constructor). Since there are a several places where the state is initialized and it was easy to miss one, I would like to preserve some assert that the state is initialized.

@tstuefe
Copy link
Member

tstuefe commented Mar 11, 2022

blocking SIGSEGV and SIGBUS - or other synchronous error signals like SIGFPE - and then triggering said signal is UB. What happens is OS-dependent. I saw processes vanishing, or hang, or core. It makes sense, since what is the kernel supposed to do. It cannot deliver the signal, and deferring it would require returning to the faulting instruction, that would just re-fault.
For some more details see e.g. https://bugs.openjdk.java.net/browse/JDK-8252533

This UB looks reasonable. My point is that a native thread would run fine with SIGSEGV blocked. But then JVM decides it can do SafeFetch, and things gets nasty.

Blocking synchronous error signals makes zero sense even for normal programs, since you lose the ability to get cores. For the JVM in particular, it also blocks facilities like polling pages, or dynamically querying CPU abilities. So a JVM would not even start with synchronous error signals blocked.

Is there a crash that is fixed by the change? I just spotted it is an enhancement, not a bug. Just trying to understand the problem.

Yes, this issue is a breakout from https://bugs.openjdk.java.net/browse/JDK-8282306, where we'd like to use SafeFetch to make stack walking in AsyncGetCallTrace more robust. AGCT is called from the signal handler, and it may run in any number of situations (e.g. in foreign threads, or threads that are in the process of getting dismantled, etc).

I mean, some way to verify the issue is fixed, e.g. a test that does not fail anymore.

No, tests do not exist. Unfortunately, otherwise this regression would have been detected right away and we would not need this PR.

We have a test though that tests SafeFetch during error handling. That test can be tweaked for this purpose. So, test does not exist yet, but can be easily written.

I see AsyncGetCallTrace to assume the JavaThread very soon, or do I look at the wrong place? https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/forte.cpp#L569

Another situation is error handling itself. When writing an hs-err file, we use SafeFetch to do carefully tiptoe around the possibly corrupt VM state. If the original crash happened in a foreign thread, we still want some of these reports to work (e.g. dumping register content or printing stacks). So SafeFetch should be as robust as possible.

OK, thanks. I think we also handle recursive segfaults recover after interpretation of the corrupted VM state. Otherwise, implementing the printing functions would be too tedious and hard with SafeFetch alone. But I see it's used in printing register content, at least.

Secondary error handling is a very coarse-grained tool. If an error reporting step crashes out, we continue with the next step. Has disadvantages though. The total number of retries is very limited. And a faulting error reporting step still hurts, because its report is compromised. E.g. if the call stack printing crashes out, we have no call stack. This is not an abstract problem. Its a very concrete and typical problem.

I spend a large part of my work with hs-err reports. They are of very high importance to us. We (SAP) have invested a lot of time and effort in hardening out OpenJDK error reporting, and SafeFetch is an important part of that. For example, we provided the facility that made SafeFetch usable in signal handling. It would be nice if our work was not compromised. Please let us find a way forward here.

@tstuefe
Copy link
Member

tstuefe commented Mar 11, 2022

On 3/11/22 10:12, Thomas Stuefe wrote: We could also redefine SafeFetch on MacOS/AArch64 to not need WX. We could do this by statically generating SafeFetch on that platform, and it wouldn't be in the JIT region at all. Why not just do that? Do you mean using inline assembly?
I'd use out-of-line assembly, as I do for atomic compare-and-swap on linux: https://github.com/openjdk/jdk/blob/master/src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S But I guess inline would work.

Oh, this is neat. It would work on all platforms too, or on all we care to implement it for. And it would nicely solve the initialization window problem since it would work before stub routines are generated. We could throw CanUseSafeFetch away.

It seems we already do static assembly on bsd aarch. So there is already a path to follow.

But this could also be done as a follow up enhancement. I still like the OS TLS variable idea.

I spent some time doing a static implementation of SafeFetch on Linux x64, and its not super trivial. The problem is that we need to know addresses of instructions inside that function. I can set labels in assembly, and I can export them, but so far I have been unable to use them as addresses in C++ code. I will research some more.

@fweimer-rh
Copy link

I spent some time doing a static implementation of SafeFetch on Linux x64, and its not super trivial. The problem is that we need to know addresses of instructions inside that function. I can set labels in assembly, and I can export them, but so far I have been unable to use them as addresses in C++ code. I will research some more.

There are basically two ways (easy) to do it. Put global symbols like

        .globl address_of_label
address_of_label:

into the assembler sources and use

        extern char address_of_label[] __attribute__ ((visibility ("hidden")));

from the C++ side.

Or use a local label, and export the difference to the function start to a local label in a global data symbol from the assembler side:

        .globl SafeFetch // Real function name goes here.
SafeFetch:
        // …
.Llabel:
        // …

        .section .rodata
        .globl SafeFetch_label_offset
        .p2align 3
SafeFetch_label_offset:
        .quad .Llabel - SafeFetch
	.type SafeFetch_label_offset, @object
	.size SafeFetch_label_offset, 8

And use

extern uintptr_t SafeFetch_label_offset __attribute__ ((__visibility ("hidden")));

and the expression (uintptr_t) &SafeFetch + SafeFetch_label_offset to compute the final address. The second approach is friendlier to tools (which may get confused by symbols in the middle of functions).

If you have a PR, please Cc: me on it, I will have a look.

@theRealAph
Copy link
Contributor

According to https://forums.swift.org/t/concurrencys-use-of-thread-local-variables/48654: "these accesses are just a move from a system register plus a load/store at a constant offset."

Ideally you'd still benchmark that. Some AArch64 implementations have really, really slow moves from the system register used as the thread pointer. Hopefully Apple's implementation isn't in that category.

In a tight loop, loads from __thread variables take 1ns. It's this:

    0x18ea1c530 <+0>:   ldr    x16, [x0, #0x8]
    0x18ea1c534 <+4>:   mrs    x17, TPIDRRO_EL0 
    0x18ea1c538 <+8>:   and    x17, x17, #0xfffffffffffffff8
    0x18ea1c53c <+12>:  ldr    x17, [x17, x16, lsl #3]
    0x18ea1c540 <+16>:  cbz    x17, 0x18ea1c550          ; only executed first time
    0x18ea1c544 <+20>:  ldr    x16, [x0, #0x10]
    0x18ea1c548 <+24>:  add    x0, x17, x16
    0x18ea1c54c <+28>:  ret    

... which looks the same as what glibc does. Not bad, but quite a lot more to do than a simple load.

I'd still use a static SafeFetch, with no W^X fiddling. It just seems to me much more reasonable.

@theRealAph
Copy link
Contributor

into the assembler sources and use

        extern char address_of_label[] __attribute__ ((visibility ("hidden")));

ITYM

extern "C" char address_of_label[] __attribute__ ((visibility ("hidden")));

@fweimer-rh
Copy link

into the assembler sources and use

        extern char address_of_label[] __attribute__ ((visibility ("hidden")));

ITYM

extern "C" char address_of_label[] __attribute__ ((visibility ("hidden")));

It doesn't hurt, but the Itanium ABI does not mangle such global data symbols, so it's not strictly needed.

@theRealAph
Copy link
Contributor

extern "C" char address_of_label[] __attribute__ ((visibility ("hidden")));

It doesn't hurt, but the Itanium ABI does not mangle such global data symbols, so it's not strictly needed.

That's an interesting point of view. I guess I never thought about it, but I'd always put symbols for an asm file in an extern "C" section anyway. But yeah, OK.

@theRealAph
Copy link
Contributor

1ns

Incidentally, there must be a lot of speculation and bypassing going on there. I can see 15 cycles of latency, probably 20, so that'd be more like 5ns start to finish. M1 is a remarkable thing.

@tstuefe
Copy link
Member

tstuefe commented Mar 14, 2022

Hi Florian,

If you have a PR, please Cc: me on it, I will have a look.

Thanks a lot, Florian! I got it to work under Linux x64.

My error was that I had declared the label in C++ as extern void* SafeFetch_continuation. Declaring it as extern char _SafeFetch32_continuation[] __attribute__ ((visibility ("hidden"))); as you suggested does the trick. I'm not sure I understand the difference.

extern "C"

It doesn't hurt, but the Itanium ABI does not mangle such global data symbols, so it's not strictly needed.

I don't understand this remark, what does Itanium have to do with this?

@fweimer-rh
Copy link

Thanks a lot, Florian! I got it to work under Linux x64.

Great!

My error was that I had declared the label in C++ as extern void* SafeFetch_continuation. Declaring it as extern char _SafeFetch32_continuation[] __attribute__ ((visibility ("hidden"))); as you suggested does the trick. I'm not sure I understand the difference.

Your approach might have worked as well, but you would have to use &SafeFetch_continuation on the C++ side. Arrays work directly because of pointer decay. The actual type does not matter because you just want to create a code address from that, so there's no corresponding object (in the C++ standard sense) at the address anyway.

Anyway, from what I've seen, the array is more idiomatic.

It doesn't hurt, but the Itanium ABI does not mangle such global data symbols, so it's not strictly needed.

I don't understand this remark, what does Itanium have to do with this?

The C++ ABI definition is probably Itanium's most lasting contribution to computing. I think it's used on most non-Windows systems these days, not just on Linux, and of course on all kinds of CPUs.

@tstuefe
Copy link
Member

tstuefe commented Mar 14, 2022

Thanks a lot, Florian! I got it to work under Linux x64.

Great!

My error was that I had declared the label in C++ as extern void* SafeFetch_continuation. Declaring it as extern char _SafeFetch32_continuation[] __attribute__ ((visibility ("hidden"))); as you suggested does the trick. I'm not sure I understand the difference.

Your approach might have worked as well, but you would have to use &SafeFetch_continuation on the C++ side. Arrays work directly because of pointer decay.

Ah, that makes sense. I wondered why the address did not look like a code pointer in C++.

Anyway, got Linux x86_32 working too. Now I am working on aarch64.

Anyway, from what I've seen, the array is more idiomatic.

It doesn't hurt, but the Itanium ABI does not mangle such global data symbols, so it's not strictly needed.

I don't understand this remark, what does Itanium have to do with this?

The C++ ABI definition is probably Itanium's most lasting contribution to computing. I think it's used on most non-Windows systems these days, not just on Linux, and of course on all kinds of CPUs.

Interesting to know. Thanks!

@parttimenerd
Copy link
Contributor Author

We're looking into solutions and create a new PR if necessary.

@mlbridge
Copy link

mlbridge bot commented Mar 17, 2022

Mailing list message from David Holmes on hotspot-dev:

On 12/03/2022 2:37 am, Anton Kozlov wrote:

On Thu, 10 Mar 2022 18:04:50 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

blocking SIGSEGV and SIGBUS - or other synchronous error signals like SIGFPE - and then triggering said signal is UB. What happens is OS-dependent. I saw processes vanishing, or hang, or core. It makes sense, since what is the kernel supposed to do. It cannot deliver the signal, and deferring it would require returning to the faulting instruction, that would just re-fault.
For some more details see e.g. https://bugs.openjdk.java.net/browse/JDK-8252533

This UB looks reasonable. My point is that a native thread would run fine with SIGSEGV blocked. But then JVM decides it can do SafeFetch, and things gets nasty.

Is there a crash that is fixed by the change? I just spotted it is an enhancement, not a bug. Just trying to understand the problem.

Yes, this issue is a breakout from https://bugs.openjdk.java.net/browse/JDK-8282306, where we'd like to use SafeFetch to make stack walking in AsyncGetCallTrace more robust. AGCT is called from the signal handler, and it may run in any number of situations (e.g. in foreign threads, or threads which are in the process of getting dismantled, etc).

I mean, some way to verify the issue is fixed, e.g. a test that does not fail anymore.

I see AsyncGetCallTrace to assume the JavaThread very soon, or do I look at the wrong place? https://github.com/openjdk/jdk/blob/master/src/hotspot/share/prims/forte.cpp#L569

It is up to the agent setting things up for AGCT to only actually call
it for JavaThreads.

David
-----

@parttimenerd
Copy link
Contributor Author

This is not the point: It comes down to API design. If we use SafeFetch in os::is_first_C_frame (and thereby in frame::link_or_null) and not just in ASGCT, then it depends on when the other methods can be called. These methods are e.g. used whenever an error happens and a hs_err file is generated. We cannot guarantee that a JavaThread is always present there.

@mlbridge
Copy link

mlbridge bot commented Mar 20, 2022

Mailing list message from David Holmes on hotspot-dev:

On 18/03/2022 5:21 pm, Johannes Bechberger wrote:

On Fri, 11 Mar 2022 07:52:16 GMT, Johannes Bechberger <duke at openjdk.java.net> wrote:

The WXMode for the current thread (on MacOS aarch64) is currently stored in the thread class which is unnecessary as the WXMode is bound to the current OS thread, not the current instance of the thread class.
This pull request moves the storage of the current WXMode into a thread local global variable in `os` and changes all related code. SafeFetch depended on the existence of a thread object only because of the WXMode. This pull request therefore removes the dependency, making SafeFetch usable in more contexts.

Johannes Bechberger has updated the pull request incrementally with one additional commit since the last revision:

Remove two unnecessary lines

This is not the point: It comes down to API design. If we use SafeFetch in os::is_first_C_frame (and thereby in frame::link_or_null) and not just in ASGCT, then it depends on when the other methods can be called. These methods are e.g. used whenever an error happens and a hs_err file is generated. We cannot guarantee that a JavaThread is always present there.

My comment was specifically in response to your statement:

I see AsyncGetCallTrace to assume the JavaThread very soon

But AGCT is only intended to ever be called on JavaThreads.

David

@AntonKozlov
Copy link
Member

My comment was specifically in response to your statement:

I see AsyncGetCallTrace to assume the JavaThread very soon

But AGCT is only intended to ever be called on JavaThreads.

Sorry, it was my question. It looked for me this way as well (and that ACGT will return shortly if called on non-Java thread; AFAICS SafeFetch in not involved), and I wanted to confirm. The AGCT on non-Java thread was declared to be one of the two major reasons for this patch.

I would support this patch to move W^X management out from Thread to OS-specific code, after the problem with the assert "is initialized" is fixed.

@tstuefe
Copy link
Member

tstuefe commented Mar 21, 2022

I'm currently implementing Andrews proposal for a static safefetch (#7865, still in draft, but almost done). That will be more generic solution since we don't have to deal with thread wx state at all. That's why we closed this PR.

@dholmes-ora
Copy link
Member

The conversation here is some what hard to follow. I do see that "foreign threads" was mentioned by @tstuefe in the context of AGCT but I have to assume he misspoke there (assuming a foreign thread is one not attached to the VM) as AGCT only works for attached JavaThreads. The signal handler that will call AGCT has to be prepared to find any kind of thread in any state, but AGCT should only be called on the right kinds of thread in the right state.

@tstuefe
Copy link
Member

tstuefe commented Mar 21, 2022

The conversation here is some what hard to follow. I do see that "foreign threads" was mentioned by @tstuefe in the context of AGCT but I have to assume he misspoke there (assuming a foreign thread is one not attached to the VM) as AGCT only works for attached JavaThreads. The signal handler that will call AGCT has to be prepared to find any kind of thread in any state, but AGCT should only be called on the right kinds of thread in the right state.

Sure, AGCT can be limited to VM threads - or maybe already is. But tracking non-VM threads could be a valid use case.

We have downstream in the SapMachine a facility where we track callstacks from malloc sites - independently from NMT or the VM. With the explicit purpose of catching mallocs from non-VM threads too. For collecting the stack trace, we use some VM utilities, SafeFetch among them. That is a very useful facility. I could argue a similar case for the Async Profiler: why should profiling be limited to Java threads? In the end, if it eats performance, it hurts, regardless whether its a java thread or a non-VM-attached thread. Could be a concurrent native thread burning CPU, why would that not be interesting.

Our concern was with SafeFetch, and AGCT is only one example. SafeFetch should be as safe as possible. Error reporting alone is a sufficient reason.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

6 participants