Skip to content

[AutoBump] Merge with 2a8c12b2 (Jan 21) (11) #549

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 99 commits into
base: bump_to_977d744b
Choose a base branch
from

Conversation

jorickert
Copy link

No description provided.

ddcc and others added 30 commits January 20, 2025 08:57
)

In https://reviews.llvm.org/D136765 / https://reviews.llvm.org/D144155,
the asan annotations for `std::vector` were modified to unpoison freed
backing memory on destruction, instead of leaving it poisoned. However,
calling `__clear()` instead of `clear()` skips informing the asan runtime
of this decrease in the accessible container size, which breaks the
invariant that the value of `old_mid` should match the value of `new_mid`
from the previous call to `__sanitizer_annotate_contiguous_container`, which
can trip the sanity checks for the partial poison between [d1, d2) and the
container redzone between [d2, c), if enabled. To fix this, ensure that
`clear()` is called instead, as is already done by `__vdeallocate()`.
Also remove `__clear()`, since it is no longer called.
The function getPartialReductionCost is already quite large and
is likely to grow in size as we add support for more cases in
future. Therefore, I think it's best to move this into the cpp
file.
llvm::FixedPoint is not trivially copyable.
…lvm#123611)

Summary:
We used this globally scoped `ext_no_call_asm` as a sort of hack around
the compiler that allowed the attributor to optimize out inline assembly
calls to PTX instructions. Quite some time ago I got rid of every inline
assembly call and replaced it with a builitin, so this can just be
deleted.

Furthermore, I use the `[[omp::assume]]` attribute directly for the
aligned barrier usage. This prints an unknown assumption warning (even
though it isn't) so I'm just silencing that for now until I fix it
later.

---------

Co-authored-by: Michael Kruse <[email protected]>
After a30e50f, AMDGPUAAResult is being called in more situations where
BasicAA isn't sure. This exposed some regressions where NoAlias is being
incorrectly returned for two identical pointers.

The fix is to check the underlying objects for equality before returning
NoAlias.
… option private-headers (llvm#121226)

[llvm-objdump] Print out xcoff load section of xcoff object file with
option private-headers
…#123076)

1. Remove `%c0 = arith.constant 0 : index` from testt functions. This
   extra Op is not needed (the index can be passed as an argument), so
   this is just noise.
2. Replaced `%cst_0` with `%pad` to communicate what the underlying SSA
   value is intended for.
3. Unified some comments.
Summary:
This is spelled `ompx_aligned_barrier` when used directly, but wasn't
included in the list of known assumptions. Fix that so now th test
works.
)

For the following variable

DenseMap<const Instruction *, std::pair<PartialReductionChain,
unsigned>>
  ScaledReductionExitInstrs;

we never actually need the PartialReductionChain when using the map.
I've cleaned this up so that this now becomes

  DenseMap<const Instruction *, unsigned> ScaledReductionMap;
The indices of SGPR register pairs need to be 2-aligned and SGPR
quadruplets need to be 4-aligned. With this patch, we report an error
when inline asm register constraints specify a misaligned register
index, instead of silently dropping the specified index.

Fixes llvm#123208

---------

Co-authored-by: Matt Arsenault <[email protected]>
…es (llvm#121079)

This is a follow-up for llvm#119110
and a fix for llvm#118450

RemoveDeadValues used to delete Values and analyzing the IR at the same
time, because of that, `isMemoryEffectFree` got invalid IR with
half-deleted linalg.generic operation. This PR separates analysis and
cleanup to prevent such situation.

Thank you!

---------

Co-authored-by: Renat Idrisov <[email protected]>
Co-authored-by: Andrzej Warzyński <[email protected]>
…us. (llvm#122807)

For similar reasons for fixed-width being prefered to scalable for
Neoverse V2, this patch enables the UseFixedOverScalableIfEqualCost
feature when using -mcpu=cortex-x2, x3, x4 and x925 that are similar to
Neoverse V2.
…_mul(x, C0)) (llvm#123468)

This PR introduces the following transformations:

- If C0 is not 0:  
umax(nuw_shl(x, C0), x + 1) -> x == 0 ? 1 : nuw_shl(x, C0)  
- If C0 is not 0 or 1:  
umax(nuw_mul(x, C0), x + 1) -> x == 0 ? 1 : nuw_mul(x, C0)  

Fixes llvm#122388.
Alive2 proof: https://alive2.llvm.org/ce/z/rkp_8U
…m#123617)

In accordance with llvm#123569

In order to keep the patch at reasonable size, this PR only covers for
the llvm subproject, unittests excluded.
…auses (llvm#121356)

No functional change.

(Also, tried to filter out all ALLOCATOR modifiers, but that makes some
other tests fail).
…tom flags (llvm#123577)

The test was failing in the case where a `multilib.yaml` file was
present in the installation. This is because the presence of a multilib
YAML file leads to the diagnosing of validity of the multilib custom
flags.

This patch fixes the test by creating a new YAML file with multilib
custom flags to be used by the test.
All targets build `__clc_mad` -- even SPIR-V targets -- since it
compiles to the optimal `llvm.fmuladd` intrinsic. There is no change to
the bytecode generated for non-SPIR-V targets.

The `mix` builtin, which is implemented as a wrapper around `mad`, is
left as an OpenCL-layer wrapper of `__clc_mad`. I don't know if it's
worth having a specific CLC version of `mix`.

The changes to the other CLC files/functions are moving uses of `mad` to
`__clc_mad`, and reformatting. There is an additional instance of
`trunc` becoming `__clc_trunc`, which was missed before.
…m#123629)

These instructions return a 64-bit result and a 1-bit carry, unlike
smul_lohi and umul_lohi which return a pair of 32-bit results.

This does not appear to make any difference in practice because the DAG
types are not used for anything before these nodes are converted to
MachineInstrs.
…123619)

But keep evaluating. This is what the current interpreter does as well.
Failed to notice them when landing that patch - apologies!
In gcc-15, explicit includes of `<cstdint>` are required when fixed-size
integers are used. In this file, this include only happened as a side
effect of including SmallVector.h

Although llvm compiles fine, the root-project would benefit from
explicitly including it here, so we can backport the patch.

Maybe interesting for @hahnjo and @vgvassilev
)

Currently we're using quite different internal names for the
`std::invoke` family of type traits. This adds a layer around the
current implementation to make it easier to understand when it is used
and makes it easier to define multiple implementations of it.
Correct CSE in SelectionDAG can make DAG combining more effective and
reduces the size of the DAG and thus should improve compile time.
ilogb libcall was not being constant folded correctly. This patch adds 
ilogb case in isMathLibCallNoop with correct error condition.

Fixes llvm#101873
Before this change InstrSet in SPIRVEmitIntrinsics was uninitialized
before running runOnFunction. This change adds a new function
getPreferredInstructionSet in SPIRVSubtarget.
…m#123477)

Use `mlir_target_link_libraries()` to link dependencies of libraries
that are not included in libMLIR, to ensure that they link to the dylib
when they are used in Flang. Otherwise, they implicitly pull in all
their static dependencies, effectively causing Flang binaries to
simultaneously link to the dylib and to static libraries, which is never
a good idea.

I have only covered the libraries that are used by Flang. If you wish, I
can extend this approach to all non-libMLIR libraries in MLIR, making
MLIR itself also link to the dylib consistently.
…on (llvm#120771)

Replace `static void exitWithError(Twine Message, std::string Whence =
"", std::string Hint = "")` std::string with StringRef to remove
constructing Strings on every call or passing by value

Fixes: llvm#100065
fzou1 and others added 30 commits January 21, 2025 10:11
Reland the patch after fixing the lit test.
Add some generated tests with every shuffle permutation
for relevant vector element types and sizes. Not sure if this
is going overboard with the number of tests. I pruned out the largest
cases (16 and 32-bit cases are impractically large), and there's
redundancy when testing the pointer cases (at least for SelectionDAG).

This uses inline assembly to produce sample values because of how the
ABI is lowered when using a function argument. Since we break all
arguments into 32-bit pieces, a shuffle never ends up forming. We
need separate handling to reconstruct shuffles in contexts involving
physical registers in ABI contexts.

I wrote a small tool to generate these, so I can easily change the
exact test body. Not sure if it's worth posting anywhere.

This is in preparation for making better use of v_pk_mov_b32,
v_mov_b64 and s_mov_b64 in shuffles.
Primarily around uses of getSubReg/getSuperReg.
)

Implementation details:
The UNTIED clause is recognized by setting the flag=0 for the default
case or performing logical OR to flag if other clauses are specified,
and this flag is passed as an argument to the `__kmpc_omp_task_alloc`
runtime call.


Resubmitting the PR with fix for the failure, as it was reverted here:
927a70d
and previously merged here: llvm#115283
These passed prechecks but failed after cc5eba1
Summary:
We used to avoid a lot of this stuff because we didn't properly handle
variadics in device code. That's been solved for now, so we can just
make an internal printf handler that forwards to the external `vprintf`
function. This is either provided by NVIDIA's SDK or by the GPU libc
implementation.

The main reason for doing this is because it prevents the stupid AMDGPU
printf pass from mangling our beautiful printfs!
…122786)

Summary:
Right now we just default to device for each type, and mix an ad-hoc
scope with the one used by the compiler's builtins. Unify this can make
each version take the scope optionally.

For @ronlieb, this will remove the need for `add_system` in the fork as
well as the extra `cas` with system scope, just pass `system`.
It appears that omp_lib is not correctly (or maybe not at all?) found
from the build directory. This made a few buildbots break after
[PR#121356](llvm#121356) landed.
This is a workaround to unblock the buildbots.

https://lab.llvm.org/staging/#/builders/130/builds/12654
https://lab.llvm.org/buildbot/#/builders/140/builds/15102
https://lab.llvm.org/staging/#/builders/105/builds/13855
)

In the current state of the code, the transform computes entries for the
dependency matrix until `MaxMemInstrCount` which is 100. After 99th
entry, it terminates and thus overall wastes compile-time.

It would be nice if we can compute total number of entries upfront and
early exit if the number of entries > 100. However, computing the number
of entries is not always possible as it depends on two factors:
1. Number of load-store pairs in a loop.
2. Number of common loop levels for each of the pair.

This patch constrains the whole computation on the number of loads and
stores instructions in the loop.

In another approach, I experimented with computing 1 and constraining
the number of pairs, but that did not lead to any additional benefit in
terms of compile time. However, when other issues are fixed, I can
revisit this approach.
)

This is preparation for extending ReachingDefAnalysis to stack slots. We
should use `Register`, not `MCRegister` for something that can be a
physical register or a stack slot.
…CallOperatorInstantiationRAII (llvm#123687)

Now that the RAII object has a dedicate logic for handling nested
lambdas, where the inner lambda could reference any
captures/variables/parameters from the outer lambda, we can shift the
responsibility for managing lambdas away from SetupConstraintScope().

I think this also makes the structure clearer.

Fixes llvm#123441
…form (llvm#123575)

Enable ELFNixPlatform support for loongarch64. These are few simple
changes, but it allows us to use the orc runtime in ELF/LoongArch64
backend.

This change adds test cases targeting the LoongArch64 Linux platform to
the ORC runtime integration test suite. Since jitlink for loongarch64 is
ready for general use, and ELF-based platforms support defining multiple
static initializer table sections with differing priorities, some
relevant test cases in compiler-rt for ELFNixPlatform support can be
enabled.
…3465)

Turn free-standing `MemRefType`-related helper functions in
`BuiltinTypes.h` into member functions.
…FFLE` (llvm#123555)

This PR fixes operand order of `ILVOD.df` when lowering
`VECTOR_SHUFFLE`, the result was `<y[1], x[1]>` while it should be
`<x[1], y[1]>`.

* This PR is split from llvm#123040.
…20912)

On Windows, imported symbols must be searched with '__imp_' prefix.
Support imported global variables and imported functions.
PR llvm#122344 adds intrinsics for Bulk Async Copy
(non-tensor variants) using TMA. This patch
adds the corresponding NVVM Dialect Ops.

lit tests are added to verify the lowering to all
variants of the intrinsics.

Signed-off-by: Durgadoss R <[email protected]>
…123344)

Commit 1eed469 added logic to
reassociate a (add (add x y) -c) operand to a CSEL instruction with a
comparison involving x and c (or a similar constant) in order to obtain
a common (SUBS x c) instruction.
    
This commit extends this logic to non-constants. In this way, we also
reassociate a (sub (add x y) z) operand of a CSEL instruction to
(add (sub x z) y) if the CSEL compares x and z, for example.
    
Alive proof: https://alive2.llvm.org/ce/z/SEVpR
…vm#120363)

This started out as trying to combine bf16 fpround to BFCVT2
instructions, but ended up removing the aarch64.neon.nfcvt intrinsics in
favour of generating fpround instructions directly. This simplifies the
patterns and can lead to other optimizations. The BFCVT2 instruction is
adjusted to makes sure the types are valid, and a bfcvt2 is now
generated in more place. The old intrinsics are auto-upgraded to fptrunc
instructions too.
This will be sent by Arm's Guarded Control Stack extension when an
invalid return is executed.

The signal does have an address we could show, but it's the PC at which
the fault occured. The debugger has plenty of ways to show you that
already, so I've left it out.

```
(lldb) c
Process 460 resuming
Process 460 stopped
* thread #1, name = 'test', stop reason = signal SIGSEGV: control protection fault
    frame #0: 0x0000000000400784 test`main at main.c:57:1
   54  	  afunc();
   55  	  printf("return from main\n");
   56  	  return 0;
-> 57  	}
(lldb) dis
<...>
->  0x400784 <+100>: ret
```

The new test case generates the signal by corrupting the link register
then attempting to return. This will work whether we manually enable GCS
or the C library does it for us.

(in the former case you could just return from main and it would fault)
…rray is indexed by const evaluatable expressions (llvm#119340)"" (llvm#123713)

This reverts commit 7dd34ba.

Fixed the assertion violation reported by
7dd34ba

Co-authored-by: MalavikaSamak <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.