[MachineLICM] Work-around Incomplete RegUnits #95926
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5 | ||
# RUN: llc -mtriple=aarch64-unknown-linux-gnu -run-pass=greedy,machinelicm -verify-machineinstrs -debug -o - %s | FileCheck %s | ||
|
||
# FIXME: Running RA is needed otherwise it runs pre-RA LICM. | ||
but machinelicm should be the post-RA one? early-machinelicm is pre-RA

For some reason it only triggers if I run RA first, otherwise it seems

I think you are working around the MIR parser bug where it assumes SSA until it finds something obviously not SSA. The pass needs to clear the properties and I think we need an explicit SSA property
||
--- | ||
name: test | ||
tracksRegLiveness: true | ||
body: | | ||
; CHECK-LABEL: name: test | ||
; CHECK: bb.0: | ||
; CHECK-NEXT: successors: %bb.1(0x80000000) | ||
; CHECK-NEXT: liveins: $x0, $w1, $x2 | ||
; CHECK-NEXT: {{ $}} | ||
; CHECK-NEXT: B %bb.1 | ||
; CHECK-NEXT: {{ $}} | ||
; CHECK-NEXT: bb.1: | ||
; CHECK-NEXT: successors: %bb.1(0x40000000), %bb.2(0x40000000) | ||
; CHECK-NEXT: liveins: $x0, $w1, $x2 | ||
; CHECK-NEXT: {{ $}} | ||
; CHECK-NEXT: renamable $q11 = MOVIv4i32 2, 8 | ||
; CHECK-NEXT: BL &memset, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $w1, implicit $x2, implicit-def $sp, implicit-def $x0 | ||
; CHECK-NEXT: renamable $q10 = MVNIv4i32 4, 0 | ||
What does this test actually demonstrate? Without this patch, I see that the def of
I don't understand why this happened.

The D20 D21 being subregs of Q10 is an ARM thing that wasn't carried over into AArch64.

D'oh - I thought this test case was ARM! Let me take another look...

I was trying to get my head around what was going on. There is a rule in the AArch64 PCS which talks about the top bits of a Q register needing to be preserved across a call:

OK so AArch64 registers work like this: My suspicion is that TableGen only creates one regunit for all of these

What is that flag that we already have? Anyway I think this kind of approach sounds reasonable, but should probably get some wider discussion. Possible alternative ideas:

Opened #96146 to continue the discussion

Thanks. In the mean time I guess it's OK to commit the current patch to fix the AArch64 breakage, with a comment explaining that it's overly conservative (hence the AMDGPU test regression).

Hello. Sorry, all the subtleties of RegUnits are not something I know a lot about (and I may be a bit unresponsive today, sorry if I am). Perhaps @efriedma-quic or @TNorthover know more about how it should all work. We would usually revert back to a known-good state or fix quickly, and leaving it broken for any length of time isn't the llvm way of doing things. So I'm all in favour of this fix, thanks. We have scalable registers in AArch64 aliasing the same registers too, to throw another potential spanner in the works. I don't believe those registers actually get marked as scalable at the moment though. If we go with option 3 then we should try and make sure that works OK too.

Sorry all, I'm just seeing this thread now.
@jayfoad is right this is how the regunits work and the issue with register mask is well known (well at least @MatzeB and I know :P). We shouldn't use regunits for regmask operands because they are indeed not precise enough until we go with option 2. (And by the way changing regunits shouldn't change regalloc. I'm curious to see why you @Pierre-vh saw this.)
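For readers following the regunit discussion, here is a rough toy model of why a register mask cannot be applied precisely through regunits when a preserved sub-register and its non-preserved super-register share one unit. It is only a sketch, not LLVM's code: the dictionaries, the unit names and all three function names are hypothetical, and it assumes (as suspected in the thread) that D10 and Q10 map to a single regunit and that the call preserves only D10.

```python
# Toy model only: register names follow the test above, but the data
# structures and function names are hypothetical, not LLVM's API.

# Assumption from the thread: D10 and Q10 share a single register unit.
REG_UNITS = {
    "D10": {"u10"},
    "Q10": {"u10"},
    "X0": {"x0"},
}

# Registers that a hypothetical call-preserved mask keeps intact.
# Only the low 64 bits (D10) survive the call; the full Q10 does not.
PRESERVED = {"D10"}


def clobbered_units_preserved_first(reg_units, preserved):
    """A unit is clobbered only if no register containing it is preserved."""
    all_units = set().union(*reg_units.values())
    kept = set()
    for reg in preserved:
        kept |= reg_units[reg]
    return all_units - kept


def clobbered_units_clobbered_first(reg_units, preserved):
    """A unit is clobbered if any register containing it is not preserved."""
    clobbered = set()
    for reg, units in reg_units.items():
        if reg not in preserved:
            clobbered |= units
    return clobbered


def is_clobbered(reg, clobbered_units, reg_units):
    """A register is treated as clobbered if any of its units is clobbered."""
    return bool(reg_units[reg] & clobbered_units)


optimistic = clobbered_units_preserved_first(REG_UNITS, PRESERVED)
conservative = clobbered_units_clobbered_first(REG_UNITS, PRESERVED)

# Preserved-first: Q10 looks untouched by the call, which is unsafe.
print(is_clobbered("Q10", optimistic, REG_UNITS))
# Clobbered-first: D10 looks clobbered, which is safe but pessimistic.
print(is_clobbered("D10", conservative, REG_UNITS))
```

In this model neither strategy is exact: the first would let a def of `$q10` be treated as unaffected by the `BL`, and the second gives up on `D10` even though the call preserves it, which is the "overly conservative" trade-off mentioned above.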
||
; CHECK-NEXT: $xzr = SUBSXri $x0, 1, 0, implicit-def $nzcv | ||
; CHECK-NEXT: Bcc 11, %bb.1, implicit $nzcv | ||
; CHECK-NEXT: B %bb.2 | ||
; CHECK-NEXT: {{ $}} | ||
; CHECK-NEXT: bb.2: | ||
; CHECK-NEXT: liveins: $q10, $q11 | ||
; CHECK-NEXT: {{ $}} | ||
; CHECK-NEXT: $q0 = COPY $q10 | ||
; CHECK-NEXT: $q1 = COPY $q11 | ||
bb.0: | ||
liveins: $x0, $w1, $x2 | ||
B %bb.1 | ||
|
||
bb.1: | ||
liveins: $x0, $w1, $x2 | ||
renamable $q11 = MOVIv4i32 2, 8 | ||
BL &memset, csr_aarch64_aapcs, implicit-def dead $lr, implicit $sp, implicit $x0, implicit $w1, implicit $x2, implicit-def $sp, implicit-def $x0 | ||
renamable $q10 = MVNIv4i32 4, 0 | ||
$xzr = SUBSXri $x0, 1, 0, implicit-def $nzcv | ||
Bcc 11, %bb.1, implicit $nzcv | ||
B %bb.2 | ||
|
||
bb.2: | ||
liveins: $q10, $q11 | ||
$q0 = COPY $q10 | ||
$q1 = COPY $q11 | ||
... |
Don't see why this is using -debug, but that requires asserts