```rust
.emit(
    r#"
        // leaq @tlsld(%rip), %rdi
        sink.put1(0b0100_1000);
```
Can this get a const, like for LEA? Like:

```diff
-sink.put1(0b0100_1000);
+const REX_W: u8 = 0x48;
+sink.put1(REX_W);
```
I think one of those `rex` functions would be better, but I don't know which one to use.
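A minimal sketch of the kind of `rex` helper discussed here, assuming the standard x86-64 REX layout `0100WRXB`. The names `rex` and `REX_W` are illustrative and not necessarily the helpers that exist in Cranelift's x86 backend; the point is that `0b0100_1000` is simply a REX prefix with only the W bit set.

```rust
/// Build an x86-64 REX prefix byte: 0b0100_WRXB.
/// `w` selects 64-bit operand size; `r`, `x` and `b` extend the ModRM.reg,
/// SIB.index and ModRM.rm/SIB.base fields so they can reach %r8..%r15.
const fn rex(w: bool, r: bool, x: bool, b: bool) -> u8 {
    0b0100_0000 | ((w as u8) << 3) | ((r as u8) << 2) | ((x as u8) << 1) | (b as u8)
}

/// REX.W alone: 64-bit operand size, no register-number extensions.
/// This is all `leaq ...(%rip), %rdi` needs, since %rdi (7) fits in 3 bits.
const REX_W: u8 = rex(true, false, false, false);
```

With this, `sink.put1(REX_W)` emits the same byte as `sink.put1(0b0100_1000)` while documenting the intent.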
```rust
        // output %rax

        // leaq @dtpoff32(%rax), %rax
        sink.put1(0b0100_1000);
```
Same as above re. naming this constant.
We should probably start with the "General Dynamic TLS Model", and only use the "Local Dynamic TLS Model" if the required conditions are met (the variable is in the same object, and ideally only if there are multiple TLS variable accesses that can share the … Also, if there are multiple TLS variable accesses, will Cranelift be able to optimize away the unneeded …
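The selection rule proposed above can be sketched as a small decision function. This is a hypothetical heuristic, not code from this PR: the parameter names are invented, and "multiple accesses" is approximated as a simple per-function count. Local-dynamic pays off because a single `__tls_get_addr` call for the module's TLS block base can be shared by all accesses, each of which then adds its own `@dtpoff` offset.

```rust
/// The two dynamic TLS access models described in Drepper's TLS document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TlsModel {
    GeneralDynamic,
    LocalDynamic,
}

/// Hypothetical heuristic: default to general-dynamic, and only switch to
/// local-dynamic when the variable is defined in the object being produced
/// and more than one access in the function could share the module-base call.
fn choose_tls_model(defined_in_same_object: bool, tls_accesses_in_function: usize) -> TlsModel {
    if defined_in_same_object && tls_accesses_in_function > 1 {
        TlsModel::LocalDynamic
    } else {
        TlsModel::GeneralDynamic
    }
}
```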
Yeah, I didn't fully read the docs, but just followed LLVM's example.
That would require a new optimization pass. It probably wouldn't help much for cg_clif, because of the lack of inlining when MIR opts are not set to max.
Can you detail in the issue the problem you're trying to solve, and what possible implementations exist, please? The mentioned issue doesn't contain any text but a link to another issue in cg_clif. I'll comment in there.
@bjorn3, I would like to understand this at a higher level; what are you using as reference? Perhaps "ELF Handling for Thread-Local Storage"?
I use https://akkadia.org/drepper/tls.pdf + LLVM output as reference.
Switched from the local-dynamic (LD) TLS model to the general-dynamic (GD) TLS model.
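For reference, the x86-64 ELF general-dynamic sequence from Drepper's "ELF Handling for Thread-Local Storage" can be sketched as a byte emitter. The `Reloc` and `RelocKind` types and `emit_tls_gd` are hypothetical stand-ins, not Cranelift's actual API. The otherwise redundant `data16`/`rex64` prefixes pad the sequence to a fixed 16 bytes, which that document describes as the shape a linker recognizes when relaxing GD to a cheaper model.

```rust
/// Relocation kinds used by this sketch (hypothetical; not Cranelift's `Reloc`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RelocKind {
    /// R_X86_64_TLSGD: PC-relative offset to the symbol's GOT entry pair.
    TlsGd,
    /// R_X86_64_PLT32: PC-relative call through the PLT.
    Plt32,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct Reloc {
    offset: usize,
    kind: RelocKind,
    symbol: String,
    addend: i64,
}

const DATA16: u8 = 0x66;
const REX_W: u8 = 0x48;

/// Append the general-dynamic access sequence for `symbol`:
///     data16 leaq symbol@tlsgd(%rip), %rdi
///     data16 data16 rex64 call __tls_get_addr@PLT
/// At run time %rax then holds the address of this thread's copy of `symbol`.
fn emit_tls_gd(code: &mut Vec<u8>, relocs: &mut Vec<Reloc>, symbol: &str) {
    // data16 leaq symbol@tlsgd(%rip), %rdi
    // ModRM 0x3d = mod 00, reg 111 (%rdi), rm 101 (RIP-relative disp32).
    code.extend_from_slice(&[DATA16, REX_W, 0x8d, 0x3d]);
    relocs.push(Reloc {
        offset: code.len(),
        kind: RelocKind::TlsGd,
        symbol: symbol.to_string(),
        addend: -4, // disp32 is relative to the end of the 4-byte field
    });
    code.extend_from_slice(&[0; 4]);

    // data16 data16 rex64 call __tls_get_addr@PLT
    code.extend_from_slice(&[DATA16, DATA16, REX_W, 0xe8]);
    relocs.push(Reloc {
        offset: code.len(),
        kind: RelocKind::Plt32,
        symbol: "__tls_get_addr".to_string(),
        addend: -4, // rel32 is relative to the end of the call instruction
    });
    code.extend_from_slice(&[0; 4]);
}
```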
Implemented Mach-O support. Requires gimli-rs/object#142.
Note to self: I will need to write
I thought the Mach-O relocations were designed such that the implicit -4 meant the actual addend would be 0, so you wouldn't need to write an addend to the location, and I think the disassembly you did agrees with that:
I actually meant subtract -4. The encoding for `x86_64_macho_tls_symbol` already uses -4 as the addend, so subtracting -4 would indeed give 0.
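The addend arithmetic can be checked in a few lines. This sketch assumes ELF's usual `S + A - P` formula for PC-relative relocations and the Mach-O convention described above, where the 4-byte adjustment is implicit; `pc_rel_value`, `rip_relative_disp` and `macho_field_addend` are hypothetical helper names.

```rust
/// Value an ELF-style PC-relative relocation stores: S + A - P, where S is the
/// symbol address, A the addend and P the address of the 4-byte field.
fn pc_rel_value(s: i64, a: i64, p: i64) -> i64 {
    s + a - p
}

/// What the CPU expects in a RIP-relative disp32: target minus the address of
/// the next instruction, which here is the end of the 4-byte field.
fn rip_relative_disp(target: i64, field_addr: i64) -> i64 {
    target - (field_addr + 4)
}

/// Mach-O x86-64 PC-relative relocations already account for the 4-byte field,
/// so the -4 recorded as the addend is subtracted back out before it is
/// written into the field as the implicit addend.
fn macho_field_addend(cranelift_addend: i64) -> i64 {
    cranelift_addend - (-4)
}
```

So an ELF addend of -4 yields exactly the displacement the instruction needs, and the same -4 turns into a stored addend of 0 for Mach-O.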
Branch force-pushed from 956e6a5 to 5dcfa8c.
Can somebody please review this? Both ELF and Mach-O have been tested using cg_clif.
It looks like I reviewed what I understood above; perhaps @bnjbvr or @julian-seward1?
Ping?
@bjorn3 I'm fine with this once the conflicts are resolved and tests are added, but that is because I'm not familiar enough with other ways of implementing TLS to have much comment. There's a Cranelift meeting on Monday mornings, and someone there might have more insight into Cranelift's future threading support; I mention that because the whole threading thing might require some back-and-forth discussion.
While there are other ways to implement TLS support, this matches the native TLS handling of ELF and Mach-O respectively. This way, it is possible to import/export TLS variables from/to native objects. While this is not really necessary for cg_clif, as Rust generates a getter for the TLS variable anyway and doesn't export the raw TLS variable, rcc will likely need it to interoperate with native code.
I rebased this and added tests for legalization of
For cg_clif, this is technically the only thing that prevents multithreading at the moment. I have made atomics atomic by using a global lock. While this is slow, at least it works. I would love to see atomic support in Cranelift, but for now TLS is more important for me.
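A minimal illustration of the global-lock approach described above, assuming every emulated atomic goes through the same lock. This is not cg_clif's actual implementation; `ATOMIC_LOCK`, `locked_fetch_add` and `locked_compare_exchange` are hypothetical names. It is only correct if no code touches the location without taking the lock, and it serializes all atomics process-wide, which is why it is slow.

```rust
use std::sync::Mutex;

/// One process-wide lock that serializes every emulated atomic operation.
static ATOMIC_LOCK: Mutex<()> = Mutex::new(());

/// Emulated atomic fetch-add: returns the old value and stores `old + val`.
///
/// # Safety
/// `ptr` must be valid for reads and writes, and every other access to `*ptr`
/// must also go through `ATOMIC_LOCK`, otherwise the emulation is not atomic.
unsafe fn locked_fetch_add(ptr: *mut u64, val: u64) -> u64 {
    // Holding the guard makes the read-modify-write indivisible with respect
    // to all other emulated atomics.
    let _guard = ATOMIC_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    unsafe {
        let old = ptr.read();
        ptr.write(old.wrapping_add(val));
        old
    }
}

/// Emulated compare-and-swap under the same lock: `Ok(old)` on success,
/// `Err(current)` if the value did not match `expected`.
///
/// # Safety
/// Same requirements as `locked_fetch_add`.
unsafe fn locked_compare_exchange(ptr: *mut u64, expected: u64, new: u64) -> Result<u64, u64> {
    let _guard = ATOMIC_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    unsafe {
        let current = ptr.read();
        if current == expected {
            ptr.write(new);
            Ok(current)
        } else {
            Err(current)
        }
    }
}
```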
@abrown Has this been discussed?
I think the Cranelift meeting yesterday was cancelled due to a US holiday (at least I didn't attend), but I posted in a Zulip chat thread and I will bring it up next Monday.
Ok, thanks!
Overall this approach looks reasonable to me. TLS makes sense as a GlobalValue flag, and starting with global-dynamic sounds like a good way to get started. Just a few questions below:
```rust
    func.dfg.replace(inst).tls_value(ptr_ty, gv);
} else {
    func.dfg.replace(inst).symbol_value(ptr_ty, gv);
}
```
Instead of having a `tls_value` instruction, would it work to just continue to use `symbol_value` here, and use the `tls` flag of `gv` to determine whether it's TLS or not? You could add an `is_tls_data` predicate function that works the same way as `is_colocated_data`, `FormatPredicateKind::IsColocatedData`, and so on, and then `define_entity_ref` in cranelift-codegen/meta/src/isa/x86/encodings.rs could use that to decide when to apply the TLS legalizations.
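The suggestion can be modeled as predicate-driven selection on a single `symbol_value` instruction. Everything here is a simplified, hypothetical model: `SymbolGlobalValue`, `Lowering` and `select_lowering` do not exist in Cranelift, and only `is_tls_data` mirrors the name proposed above.

```rust
/// Hypothetical, simplified stand-in for a symbol global value's flags.
#[derive(Debug, Clone, Copy)]
struct SymbolGlobalValue {
    colocated: bool,
    tls: bool,
}

/// Predicate in the spirit of `is_colocated_data`: true when the global value
/// refers to thread-local data.
fn is_tls_data(gv: &SymbolGlobalValue) -> bool {
    gv.tls
}

/// How a `symbol_value` would be lowered in this model.
#[derive(Debug, PartialEq, Eq)]
enum Lowering {
    /// Ordinary symbol address, loaded through the GOT when not colocated.
    SymbolAddress { via_got: bool },
    /// Object-format-specific TLS access sequence.
    TlsAccess,
}

/// Pick the lowering by consulting predicates on the global value, instead of
/// introducing a separate `tls_value` instruction.
fn select_lowering(gv: &SymbolGlobalValue) -> Lowering {
    if is_tls_data(gv) {
        Lowering::TlsAccess
    } else {
        Lowering::SymbolAddress { via_got: !gv.colocated }
    }
}
```

The reply below explains why this did not map onto Cranelift at the time: encodings can be predicated, but legalizations could not.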
`symbol_value` can be encoded directly, while `tls_value` needs to be legalized to the correct instruction sequence for the respective object format. See the comment below for why. As far as I know, unlike an encoding, there is no way to predicate a legalization.
I decided to look a bit more at the … Clobbering all necessary registers will mean that I have to add even more clobber registers.
I tried to add more output registers for the clobbering, but the maximum number of outputs is 8, which is far fewer than the number of registers that are clobbered. I also tried to use the same spill logic as normal calls, but I can't find where it is determined which registers are caller-saved and should be spilled to the stack.
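To put numbers on this, here is a sketch counting the registers that the System V x86-64 ABI treats as caller-saved, which a call to `__tls_get_addr` may therefore clobber. It only counts general-purpose and SSE registers (x87, flags and AVX-512 state are ignored); the constant names are illustrative, and `MAX_INST_OUTPUTS` mirrors the limit of 8 mentioned above.

```rust
/// General-purpose registers that are caller-saved in the System V x86-64 ABI.
const CALLER_SAVED_GPRS: [&str; 9] = ["rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11"];

/// All 16 SSE registers (xmm0..xmm15) are caller-saved as well.
const CALLER_SAVED_XMMS: usize = 16;

/// The per-instruction output limit described in the comment above.
const MAX_INST_OUTPUTS: usize = 8;

/// Number of registers a call may clobber under this (GPR + SSE only) count.
fn clobbered_register_count() -> usize {
    CALLER_SAVED_GPRS.len() + CALLER_SAVED_XMMS
}
```

That is 25 registers against 8 outputs, so modeling the clobbers as extra outputs cannot work; reusing the call spill logic is the natural alternative.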
I think you might not have been able to find it because all register values that live through a call are spilled; callee-save/caller-save ABI guarantees are disregarded at the moment. There's a note in
Thanks @iximeow! It now correctly spills all caller-saved registers.
🎉
I'm getting a compile error when building for aarch64-unknown-linux-gnu on current master, and found this pull request to be the culprit when bisecting the issue. Here's the error:
It's probably worth noting that this only happens when executing … P.S.: the regression is caused by bytecodealliance/wasmtime@0a1bb3b; I hope this thread is the right place to notify the author of that regression.
The problem is probably that Cranelift was built without x86 support, but I added a special case to regalloc for an x86-only instruction without checking for x86 support. I think the solution is to add an instruction attribute, just like
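One way to read the proposed fix, sketched with hypothetical types: register allocation queries a target-independent instruction attribute instead of naming an x86-only opcode, so the regalloc code still compiles when x86 support is disabled. `Opcode`, `clobbers_all_caller_saved` and `must_spill_live_values` are invented for illustration and are not Cranelift's API.

```rust
/// Hypothetical opcode set for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Opcode {
    Iadd,
    Call,
    /// Stand-in for the x86-only TLS instruction that calls `__tls_get_addr`.
    X86ElfTlsGetAddr,
}

impl Opcode {
    /// Hypothetical attribute: true if the instruction clobbers every
    /// caller-saved register, like a call. Only the instruction definitions
    /// set it; generic code never has to name a target-specific opcode.
    fn clobbers_all_caller_saved(self) -> bool {
        matches!(self, Opcode::Call | Opcode::X86ElfTlsGetAddr)
    }
}

/// Regalloc side of the sketch: values live across such an instruction must
/// be spilled, decided purely from the attribute.
fn must_spill_live_values(op: Opcode) -> bool {
    op.clobbers_all_caller_saved()
}
```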
This adds TLS (thread-local storage) support for ELF targets to Cranelift.

The `tls_example.clif` file has been verified to match the ASM emitted by LLVM at https://github.com/bjorn3/rustc_codegen_cranelift/issues/388#issuecomment-531466812

If you don't know who could review this, please indicate so and/or ping `bnjbvr`. The list of suggested reviewers on the right can help you.

No list of suggested reviewers is shown on the right side.

Fixes #963
cc @philipc https://github.com/bjorn3/rustc_codegen_cranelift/issues/388#issuecomment-526909856
cc @abrown: because you implemented SIMD support for x86, I think you know enough about x86 encodings to verify that the encodings are correct.