[AutoBump] Merge with 2a8c12b2 (Jan 21) (11) #549

jorickert · 2025-05-20T14:38:44Z

No description provided.

In https://reviews.llvm.org/D136765 / https://reviews.llvm.org/D144155, the asan annotations for `std::vector` were modified to unpoison freed backing memory on destruction, instead of leaving it poisoned. However, calling `__clear()` instead of `clear()` skips informing the asan runtime of this decrease in the accessible container size. This breaks the invariant that the value of `old_mid` should match the value of `new_mid` from the previous call to `__sanitizer_annotate_contiguous_container`, which can trip the sanity checks for the partial poison between [d1, d2) and the container redzone between [d2, c), if enabled. To fix this, ensure that `clear()` is called instead, as `__vdeallocate()` already does. Also remove `__clear()`, since it is no longer called.
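
The invariant can be modelled with a small Python simulation. This is a hypothetical sketch, not libc++ or the ASan runtime: `ToyShadow` stands in for the runtime's bookkeeping and only checks that each call's `old_mid` equals the previous call's `new_mid`, while `clear_silently()` models the old `__clear()` path that shrinks the size without annotating.

```python
class AnnotationError(Exception):
    pass


class ToyShadow:
    """Hypothetical model of the contiguous-container invariant (not the real runtime)."""

    def __init__(self, capacity: int) -> None:
        self.mid = capacity  # freshly allocated storage starts fully accessible

    def annotate(self, old_mid: int, new_mid: int) -> None:
        # The runtime expects old_mid to match the previous call's new_mid.
        if old_mid != self.mid:
            raise AnnotationError(f"old_mid={old_mid}, runtime has mid={self.mid}")
        self.mid = new_mid


class ToyVector:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.size = 0
        self.shadow = ToyShadow(capacity)
        self.shadow.annotate(old_mid=capacity, new_mid=0)  # poison [0, capacity)

    def push_back(self) -> None:
        assert self.size < self.capacity
        self.shadow.annotate(old_mid=self.size, new_mid=self.size + 1)
        self.size += 1

    def clear(self) -> None:
        # Tells the runtime that the accessible size shrinks to zero.
        self.shadow.annotate(old_mid=self.size, new_mid=0)
        self.size = 0

    def clear_silently(self) -> None:
        # Models the old __clear() path: the size drops without an annotation.
        self.size = 0

    def destroy(self) -> None:
        # Unpoison the whole buffer before it is freed.
        self.shadow.annotate(old_mid=self.size, new_mid=self.capacity)


v = ToyVector(capacity=4)
v.push_back()
v.clear()
v.destroy()  # ok: every old_mid matched the previous new_mid
```

Calling `destroy()` after `clear_silently()` instead raises `AnnotationError`, since the runtime still believes the old size is accessible.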

The function getPartialReductionCost is already quite large and is likely to grow in size as we add support for more cases in future. Therefore, I think it's best to move this into the cpp file.

llvm::FixedPoint is not trivially copyable.

…lvm#123611) Summary: We used this globally scoped `ext_no_call_asm` as a sort of hack around the compiler that allowed the attributor to optimize out inline assembly calls to PTX instructions. Quite some time ago I got rid of every inline assembly call and replaced it with a builtin, so this can just be deleted. Furthermore, I use the `[[omp::assume]]` attribute directly for the aligned barrier usage. This prints an unknown assumption warning (even though it isn't), so I'm just silencing that for now until I fix it later. --------- Co-authored-by: Michael Kruse <[email protected]>

After a30e50f, AMDGPUAAResult is being called in more situations where BasicAA isn't sure. This exposed some regressions where NoAlias is being incorrectly returned for two identical pointers. The fix is to check the underlying objects for equality before returning NoAlias.

… option private-headers (llvm#121226) [llvm-objdump] Print out xcoff load section of xcoff object file with option private-headers

…#123076) 1. Remove `%c0 = arith.constant 0 : index` from test functions. This extra Op is not needed (the index can be passed as an argument), so this is just noise. 2. Replaced `%cst_0` with `%pad` to communicate what the underlying SSA value is intended for. 3. Unified some comments.

Summary: This is spelled `ompx_aligned_barrier` when used directly, but wasn't included in the list of known assumptions. Fix that so that the test now works.

For the following variable `DenseMap<const Instruction *, std::pair<PartialReductionChain, unsigned>> ScaledReductionExitInstrs;` we never actually need the `PartialReductionChain` when using the map. I've cleaned this up so that it now becomes `DenseMap<const Instruction *, unsigned> ScaledReductionMap;`.

The indices of SGPR register pairs need to be 2-aligned and SGPR quadruplets need to be 4-aligned. With this patch, we report an error when inline asm register constraints specify a misaligned register index, instead of silently dropping the specified index. Fixes llvm#123208 --------- Co-authored-by: Matt Arsenault <[email protected]>
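
A rough sketch of the alignment rule, assuming constraints are spelled like `{s[2:3]}`. The parser and `sgpr_constraint_error` are hypothetical and only model the tuple widths named above (single registers, pairs and quadruplets).

```python
import re
from typing import Optional

# Matches a hypothetical SGPR range constraint such as "{s[2:3]}".
_SGPR_RANGE = re.compile(r"^\{s\[(\d+):(\d+)\]\}$")

# Required alignment of the first register index, keyed by tuple width.
_ALIGNMENT = {1: 1, 2: 2, 4: 4}


def sgpr_constraint_error(constraint: str) -> Optional[str]:
    """Return an error message for a misaligned range, or None if it is accepted."""
    m = _SGPR_RANGE.match(constraint)
    if m is None:
        return f"unrecognised constraint {constraint!r}"
    lo, hi = int(m.group(1)), int(m.group(2))
    width = hi - lo + 1
    align = _ALIGNMENT.get(width)
    if align is None:
        return f"width {width} is not modelled by this sketch"
    if lo % align != 0:
        # Report an error instead of silently dropping the requested index.
        return f"s[{lo}:{hi}] must start at a multiple of {align}"
    return None


print(sgpr_constraint_error("{s[2:3]}"))  # None
print(sgpr_constraint_error("{s[3:4]}"))  # s[3:4] must start at a multiple of 2
```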

…es (llvm#121079) This is a follow-up for llvm#119110 and a fix for llvm#118450. RemoveDeadValues used to delete Values and analyze the IR at the same time; because of that, `isMemoryEffectFree` received invalid IR containing a half-deleted linalg.generic operation. This PR separates analysis and cleanup to prevent such situations. Thank you! --------- Co-authored-by: Renat Idrisov <[email protected]> Co-authored-by: Andrzej Warzyński <[email protected]>
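
A minimal Python sketch of the collect-then-erase pattern this describes. The `Op` class and `is_dead` check are hypothetical stand-ins, not MLIR's API: every op is analysed before any op is erased, so the analysis never observes a half-deleted operation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Op:
    name: str
    has_live_uses: bool
    erased: bool = False


def is_dead(op: Op) -> bool:
    # Stand-in for a query like isMemoryEffectFree: it must only see intact ops.
    assert not op.erased, f"analysis saw half-deleted op {op.name}"
    return not op.has_live_uses


def remove_dead_values(ops: List[Op]) -> List[Op]:
    # Phase 1: analysis only, nothing is mutated.
    dead = [op for op in ops if is_dead(op)]
    # Phase 2: cleanup, after every analysis query has completed.
    for op in dead:
        op.erased = True
    return [op for op in ops if not op.erased]


ops = [Op("a", True), Op("b", False), Op("c", True)]
print([op.name for op in remove_dead_values(ops)])  # ['a', 'c']
```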

…us. (llvm#122807) For the same reasons that fixed-width is preferred over scalable for Neoverse V2, this patch enables the UseFixedOverScalableIfEqualCost feature when using -mcpu=cortex-x2, x3, x4 and x925, which are similar to Neoverse V2.

…_mul(x, C0)) (llvm#123468) This PR introduces the following transformations: - If C0 is not 0: umax(nuw_shl(x, C0), x + 1) -> x == 0 ? 1 : nuw_shl(x, C0) - If C0 is not 0 or 1: umax(nuw_mul(x, C0), x + 1) -> x == 0 ? 1 : nuw_mul(x, C0) Fixes llvm#122388. Alive2 proof: https://alive2.llvm.org/ce/z/rkp_8U
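
The Alive2 link is the actual proof. As a quick sanity check, the two rewrites can also be tested exhaustively at a small bit width. This Python sketch uses 8-bit unsigned values, treats an overflowing `nuw` operation as poison (nothing to check), and uses an ordinary wrapping add for `x + 1`.

```python
BITS = 8
MASK = (1 << BITS) - 1


def nuw_shl(x: int, c0: int) -> int:
    return x << c0


def nuw_mul(x: int, c0: int) -> int:
    return x * c0


def rewrite_holds(op, c0_values) -> bool:
    """Check umax(op(x, C0), x + 1) == (x == 0 ? 1 : op(x, C0)) for all 8-bit x."""
    for c0 in c0_values:
        for x in range(MASK + 1):
            v = op(x, c0)
            if v > MASK:
                continue  # nuw violated: the source is poison, any result is fine
            src = max(v, (x + 1) & MASK)  # umax on unsigned values
            tgt = 1 if x == 0 else v
            if src != tgt:
                return False
    return True


print(rewrite_holds(nuw_shl, range(1, BITS)))      # True  (C0 != 0)
print(rewrite_holds(nuw_mul, range(2, MASK + 1)))  # True  (C0 not 0 or 1)
print(rewrite_holds(nuw_mul, [1]))                 # False (fails at x = 1)
```

The last line shows why C0 = 1 is excluded for the multiply: `umax(x, x + 1)` is `x + 1`, not `x`.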

…m#123617) In accordance with llvm#123569. In order to keep the patch at a reasonable size, this PR only covers the llvm subproject, unittests excluded.

…auses (llvm#121356) No functional change. (Also, tried to filter out all ALLOCATOR modifiers, but that makes some other tests fail).

…tom flags (llvm#123577) The test was failing in the case where a `multilib.yaml` file was present in the installation. This is because the presence of a multilib YAML file causes the validity of the multilib custom flags to be diagnosed. This patch fixes the test by creating a new YAML file with multilib custom flags to be used by the test.

All targets build `__clc_mad` -- even SPIR-V targets -- since it compiles to the optimal `llvm.fmuladd` intrinsic. There is no change to the bytecode generated for non-SPIR-V targets. The `mix` builtin, which is implemented as a wrapper around `mad`, is left as an OpenCL-layer wrapper of `__clc_mad`. I don't know if it's worth having a specific CLC version of `mix`. The changes to the other CLC files/functions are moving uses of `mad` to `__clc_mad`, and reformatting. There is an additional instance of `trunc` becoming `__clc_trunc`, which was missed before.
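
For reference, OpenCL defines `mix(x, y, a)` as `x + (y - x) * a`, so a wrapper can forward to a multiply-add. A Python sketch, where `clc_mad` is a hypothetical stand-in for `__clc_mad` and the exact argument order used in libclc is an assumption:

```python
def clc_mad(a: float, b: float, c: float) -> float:
    # Stand-in for __clc_mad: a * b + c. On the device this may lower to
    # llvm.fmuladd, which may fuse; Python rounds the product and the sum separately.
    return a * b + c


def mix(x: float, y: float, a: float) -> float:
    # OpenCL: mix(x, y, a) = x + (y - x) * a, written as mad(y - x, a, x).
    return clc_mad(y - x, a, x)


print(mix(0.0, 10.0, 0.25))  # 2.5
```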

…m#123629) These instructions return a 64-bit result and a 1-bit carry, unlike smul_lohi and umul_lohi which return a pair of 32-bit results. This does not appear to make any difference in practice because the DAG types are not used for anything before these nodes are converted to MachineInstrs.
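
To illustrate the type difference, here is a Python model of an unsigned 32 x 32 + 64-bit multiply-add that yields a 64-bit result plus a 1-bit carry, next to a `umul_lohi`-style split into two 32-bit halves. The function names are illustrative only and do not claim to match the instructions touched by the patch.

```python
from typing import Tuple

U32 = (1 << 32) - 1
U64 = (1 << 64) - 1


def mad_u64_u32(a: int, b: int, c: int) -> Tuple[int, int]:
    """a * b + c for 32-bit a, b and 64-bit c -> (64-bit result, 1-bit carry)."""
    full = a * b + c  # at most 2**65 - 2**33, so the carry fits in one bit
    return full & U64, full >> 64


def umul_lohi(a: int, b: int) -> Tuple[int, int]:
    """32 x 32-bit product split into (low 32 bits, high 32 bits)."""
    product = a * b
    return product & U32, product >> 32


print(mad_u64_u32(U32, U32, U64)[1])  # 1: the sum overflows 64 bits
print(umul_lohi(U32, U32))            # (1, 4294967294)
```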

…123619) But keep evaluating. This is what the current interpreter does as well.

Failed to notice them when landing that patch - apologies!

@hahnjo

In gcc-15, explicit includes of `<cstdint>` are required when fixed-size integers are used. In this file, this include only happened as a side effect of including SmallVector.h. Although LLVM compiles fine, the root-project would benefit from explicitly including it here, so we can backport the patch. Maybe interesting for @hahnjo and @vgvassilev

Currently we're using quite different internal names for the `std::invoke` family of type traits. This adds a layer around the current implementation to make it easier to understand when it is used and makes it easier to define multiple implementations of it.

Correct CSE in SelectionDAG can make DAG combining more effective and reduces the size of the DAG and thus should improve compile time.

ilogb libcall was not being constant folded correctly. This patch adds an ilogb case to isMathLibCallNoop with the correct error condition. Fixes llvm#101873
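
A Python sketch of the folding rule, not LLVM's C++: C allows `ilogb` to raise a domain or range error when the argument is zero, infinite or NaN, so only finite nonzero inputs are safe to fold. `ilogb_is_noop` and `fold_ilogb` are hypothetical names.

```python
import math


def ilogb_is_noop(x: float) -> bool:
    # ilogb may raise a domain or range error for 0, infinities and NaN,
    # so only finite, nonzero arguments are free of side effects.
    return math.isfinite(x) and x != 0.0


def fold_ilogb(x: float) -> int:
    if not ilogb_is_noop(x):
        raise ValueError("call may set errno; leave it unfolded")
    _, exp = math.frexp(x)  # x == m * 2**exp with 0.5 <= |m| < 1
    return exp - 1          # ilogb normalizes to 1 <= |m| < 2


print(fold_ilogb(8.0), fold_ilogb(0.75))  # 3 -1
```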

Before this change InstrSet in SPIRVEmitIntrinsics was uninitialized before running runOnFunction. This change adds a new function getPreferredInstructionSet in SPIRVSubtarget.

…m#123477) Use `mlir_target_link_libraries()` to link dependencies of libraries that are not included in libMLIR, to ensure that they link to the dylib when they are used in Flang. Otherwise, they implicitly pull in all their static dependencies, effectively causing Flang binaries to simultaneously link to the dylib and to static libraries, which is never a good idea. I have only covered the libraries that are used by Flang. If you wish, I can extend this approach to all non-libMLIR libraries in MLIR, making MLIR itself also link to the dylib consistently.

…on (llvm#120771) Replace the `std::string` parameters of `static void exitWithError(Twine Message, std::string Whence = "", std::string Hint = "")` with `StringRef`, to avoid constructing strings on every call and passing them by value. Fixes: llvm#100065

This is to fix llvm#123410.

Reland the patch after fixing the lit test.

Add some generated tests with every shuffle permutation for relevant vector element types and sizes. Not sure if this is going overboard with the number of tests. I pruned out the largest cases (16 and 32-bit cases are impractically large), and there's redundancy when testing the pointer cases (at least for SelectionDAG). This uses inline assembly to produce sample values because of how the ABI is lowered when using a function argument. Since we break all arguments into 32-bit pieces, a shuffle never ends up forming. We need separate handling to reconstruct shuffles in contexts involving physical registers in ABI contexts. I wrote a small tool to generate these, so I can easily change the exact test body. Not sure if it's worth posting anywhere. This is in preparation for making better use of v_pk_mov_b32, v_mov_b64 and s_mov_b64 in shuffles.

Primarily around uses of getSubReg/getSuperReg.

Implementation details: The UNTIED clause is recognized by setting flag = 0 in the default case, or by performing a logical OR into the flag if other clauses are specified; this flag is passed as an argument to the `__kmpc_omp_task_alloc` runtime call. Resubmitting the PR with a fix for the failure, as it was reverted here: 927a70d and previously merged here: llvm#115283

These passed prechecks but failed after cc5eba1

Summary: We used to avoid a lot of this stuff because we didn't properly handle variadics in device code. That's been solved for now, so we can just make an internal printf handler that forwards to the external `vprintf` function. This is either provided by NVIDIA's SDK or by the GPU libc implementation. The main reason for doing this is that it prevents the stupid AMDGPU printf pass from mangling our beautiful printfs!

@ronlieb

…122786) Summary: Right now we just default to device scope for each type, and mix an ad-hoc scope with the one used by the compiler's builtins. Unifying them lets each version take the scope optionally. For @ronlieb, this will remove the need for `add_system` in the fork as well as the extra `cas` with system scope; just pass `system`.

It appears that omp_lib is not correctly (or maybe not at all?) found from the build directory. This made a few buildbots break after [PR#121356](llvm#121356) landed. This is a workaround to unblock the buildbots. https://lab.llvm.org/staging/#/builders/130/builds/12654 https://lab.llvm.org/buildbot/#/builders/140/builds/15102 https://lab.llvm.org/staging/#/builders/105/builds/13855

In the current state of the code, the transform computes entries for the dependency matrix until `MaxMemInstrCount`, which is 100. After the 99th entry it terminates, so the compile time spent up to that point is wasted. It would be nice if we could compute the total number of entries upfront and exit early if the number of entries is > 100. However, computing the number of entries is not always possible, as it depends on two factors: 1. the number of load-store pairs in a loop; 2. the number of common loop levels for each pair. This patch constrains the whole computation on the number of load and store instructions in the loop. In another approach, I experimented with computing 1 and constraining the number of pairs, but that did not lead to any additional benefit in terms of compile time. However, when other issues are fixed, I can revisit this approach.
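
The shape of the change is an early exit on a cheap bound before any matrix entries are built. A hypothetical Python sketch: `MAX_MEM_INSTR_COUNT` mirrors the `MaxMemInstrCount` limit of 100 mentioned above, and pairing distinct instructions with `itertools.combinations` is an assumption, not the pass's exact scheme.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

MAX_MEM_INSTR_COUNT = 100  # mirrors MaxMemInstrCount


def pairs_to_analyse(mem_instrs: Sequence[str]) -> Optional[List[Tuple[str, str]]]:
    # Bail out before doing any work if the loop has too many loads and stores,
    # instead of discovering the limit only after ~99 matrix entries.
    if len(mem_instrs) > MAX_MEM_INSTR_COUNT:
        return None
    # The number of matrix entries then depends on these pairs and their
    # common loop levels.
    return list(combinations(mem_instrs, 2))


print(pairs_to_analyse(["load a", "store b", "load c"]))
```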

This is preparation for extending ReachingDefAnalysis to stack slots. We should use `Register`, not `MCRegister` for something that can be a physical register or a stack slot.

…CallOperatorInstantiationRAII (llvm#123687) Now that the RAII object has dedicated logic for handling nested lambdas, where the inner lambda could reference any captures/variables/parameters from the outer lambda, we can shift the responsibility for managing lambdas away from SetupConstraintScope(). I think this also makes the structure clearer. Fixes llvm#123441

…form (llvm#123575) Enable ELFNixPlatform support for loongarch64. These are a few simple changes, but they allow us to use the orc runtime in the ELF/LoongArch64 backend. This change adds test cases targeting the LoongArch64 Linux platform to the ORC runtime integration test suite. Since jitlink for loongarch64 is ready for general use, and ELF-based platforms support defining multiple static initializer table sections with differing priorities, some relevant test cases in compiler-rt for ELFNixPlatform support can be enabled.

…3465) Turn free-standing `MemRefType`-related helper functions in `BuiltinTypes.h` into member functions.

…FFLE` (llvm#123555) This PR fixes operand order of `ILVOD.df` when lowering `VECTOR_SHUFFLE`, the result was `<y[1], x[1]>` while it should be `<x[1], y[1]>`. * This PR is split from llvm#123040.
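
For reference, the result the lowering must preserve can be modelled in Python. This shows generic `VECTOR_SHUFFLE` semantics (mask indices past the first operand select from the second), not MSA's register operand encoding; `interleave_odd` is a hypothetical helper.

```python
from typing import List, Sequence


def vector_shuffle(x: Sequence[str], y: Sequence[str], mask: Sequence[int]) -> List[str]:
    # Index i < len(x) selects x[i]; larger indices select y[i - len(x)].
    both = list(x) + list(y)
    return [both[i] for i in mask]


def interleave_odd(x: Sequence[str], y: Sequence[str]) -> List[str]:
    # Odd lanes of both operands, x first: <x[1], y[1], x[3], y[3], ...>.
    n = len(x)
    mask: List[int] = []
    for i in range(1, n, 2):
        mask += [i, n + i]
    return vector_shuffle(x, y, mask)


print(interleave_odd(["x0", "x1"], ["y0", "y1"]))  # ['x1', 'y1']
```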

…lvm#123606) Reverts llvm#123330

…20912) On Windows, imported symbols must be searched with '__imp_' prefix. Support imported global variables and imported functions.

PR llvm#122344 adds intrinsics for Bulk Async Copy (non-tensor variants) using TMA. This patch adds the corresponding NVVM Dialect Ops. lit tests are added to verify the lowering to all variants of the intrinsics. Signed-off-by: Durgadoss R <[email protected]>

… (NFC) (llvm#123621) In accordance with llvm#123569

…lvm#123551) Fixes llvm#123549

…123344) Commit 1eed469 added logic to reassociate a (add (add x y) -c) operand to a CSEL instruction with a comparison involving x and c (or a similar constant) in order to obtain a common (SUBS x c) instruction. This commit extends this logic to non-constants. In this way, we also reassociate a (sub (add x y) z) operand of a CSEL instruction to (add (sub x z) y) if the CSEL compares x and z, for example. Alive proof: https://alive2.llvm.org/ce/z/SEVpR
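
The Alive2 link is the proof. As a quick illustration, wrapping integer arithmetic makes the rewrite exact, so `x - z` can be computed once and shared with the comparison of x and z. An exhaustive Python check at 6 bits (the width is arbitrary):

```python
from itertools import product

BITS = 6
MASK = (1 << BITS) - 1


def sub_of_add(x: int, y: int, z: int) -> int:
    return (((x + y) & MASK) - z) & MASK  # (sub (add x y) z)


def add_of_sub(x: int, y: int, z: int) -> int:
    diff = (x - z) & MASK     # (sub x z): the value a SUBS x, z would also produce
    return (diff + y) & MASK  # (add (sub x z) y)


values = range(MASK + 1)
print(all(sub_of_add(x, y, z) == add_of_sub(x, y, z)
          for x, y, z in product(values, repeat=3)))  # True
```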

…vm#120363) This started out as trying to combine bf16 fpround to BFCVT2 instructions, but ended up removing the aarch64.neon.nfcvt intrinsics in favour of generating fpround instructions directly. This simplifies the patterns and can lead to other optimizations. The BFCVT2 instruction is adjusted to make sure the types are valid, and a bfcvt2 is now generated in more places. The old intrinsics are auto-upgraded to fptrunc instructions too.
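
For reference, `fptrunc` from f32 to bf16 under round-to-nearest-even can be modelled with bit operations. A Python sketch; quieting NaNs by setting the top mantissa bit is an assumption about NaN handling, not a statement about what BFCVT2 does.

```python
import struct


def f32_bits_to_bf16(bits: int) -> int:
    """Round f32 bits to bf16 bits with round-to-nearest, ties-to-even."""
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0:
        return (bits >> 16) | 0x0040  # NaN: keep sign and top payload bits, force quiet
    lsb = (bits >> 16) & 1  # lowest kept mantissa bit, so ties round to even
    return ((bits + 0x7FFF + lsb) >> 16) & 0xFFFF


def fptrunc_f32_to_bf16(x: float) -> int:
    # struct's "f" format first rounds the Python float to the nearest f32.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return f32_bits_to_bf16(bits)


print(hex(fptrunc_f32_to_bf16(1.0)))      # 0x3f80
print(hex(f32_bits_to_bf16(0x3F818000)))  # 0x3f82 (tie rounds to even)
```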

This will be sent by Arm's Guarded Control Stack extension when an invalid return is executed. The signal does have an address we could show, but it's the PC at which the fault occurred. The debugger has plenty of ways to show you that already, so I've left it out.
```
(lldb) c
Process 460 resuming
Process 460 stopped
* thread #1, name = 'test', stop reason = signal SIGSEGV: control protection fault
    frame #0: 0x0000000000400784 test`main at main.c:57:1
   54     afunc();
   55     printf("return from main\n");
   56     return 0;
-> 57   }
(lldb) dis
<...>
->  0x400784 <+100>: ret
```
The new test case generates the signal by corrupting the link register and then attempting to return. This works whether we manually enable GCS or the C library does it for us. (In the former case, you could just return from main and it would fault.)

…rray is indexed by const evaluatable expressions (llvm#119340)"" (llvm#123713) This reverts commit 7dd34ba. Fixed the assertion violation reported by 7dd34ba Co-authored-by: MalavikaSamak <[email protected]>

ddcc and others added 30 commits January 20, 2025 08:57

[AArch64][NFC] Move getPartialReductionCost into cpp file (llvm#123370)

a733c1f

[clang][bytecode] Don't memcpy() FixedPoint values (llvm#123599)

b5c9cba

[llvm-objdump] Print out xcoff load section of xcoff object file with…

b92cc78

… option private-headers (llvm#121226) [llvm-objdump] Print out xcoff load section of xcoff object file with option private-headers

[OpenMP] Fix mispelled attribute and warning

723a3e7

[IR] Replace of PointerType::get(Type) with opaque version (NFC) (llv…

416f1c4

…m#123617) In accordance with llvm#123569. In order to keep the patch at a reasonable size, this PR only covers the llvm subproject, unittests excluded.

[Flang][OpenMP][NFC] Add tests for align and allocator in allocate cl…

9da7c3b

…auses (llvm#121356) No functional change. (Also, tried to filter out all ALLOCATOR modifiers, but that makes some other tests fail).

[clang][bytecode] Diagnose IntegralToPointer casts to non-void (llvm#…

e8674af

…123619) But keep evaluating. This is what the current interpreter does as well.

[IR] Remove unused variables from llvm#123617

b95ed30

[Clang] Document some of the implementation-defined keywords (llvm#84591)

c248fc1

[SDAG] Fix CSE for ADDRSPACECAST nodes (llvm#122912)

3606876

[ConstantFolding] Add ilogb in isMathLibCallNoop (llvm#122582)

19bd2d6

SIISelLowering.cpp - remove unused variable missed in llvm#123617

8ff195c

X86ISelLowering.cpp - remove unused variable missed in llvm#123617

7084110

[SPIR-V] Fix SPIRVEmitIntrinsics undefined behavior (llvm#123625)

5810f15

fzou1 and others added 30 commits January 21, 2025 10:11

[X86][AMX] Fix handling of AMX-FP8 internal intrinsics (llvm#123540)

abbfed9

Reland [OffloadBundler] Compress bundles over 4GB (llvm#122307)

e87b843

[ARM] Use MCRegister instead of unsigned. NFC

9d9c561

AMDGPU: Fix asm constrains in new shuffle tests

585858a

[ReachingDefAnalysis][NFC] Replace MCRegister with Register (llvm#123626)

5cde6d2

[mlir][IR][NFC] Move free-standing functions to MemRefType (llvm#12…

6aaa8f2

…3465) Turn free-standing `MemRefType`-related helper functions in `BuiltinTypes.h` into member functions.

[MIPS][MSA] Invert operand order of ILVOD when lowering `VECTOR_SHU…

385f776

…FFLE` (llvm#123555) This PR fixes operand order of `ILVOD.df` when lowering `VECTOR_SHUFFLE`, the result was `<y[1], x[1]>` while it should be `<x[1], y[1]>`. * This PR is split from llvm#123040.

Reland "[Flang][Driver] Add a flag to control zero initialization" (l…

ce32625

…lvm#123606) Reverts llvm#123330

[CodeGen] Use MCRegister for ignoreCSRForAllocationOrder. (llvm#123685)

7bb363b

[Mips] Handle declspec(dllimport) on mipsel-windows-* triples (llvm#1…

26b87aa

…20912) On Windows, imported symbols must be searched with '__imp_' prefix. Support imported global variables and imported functions.

[MC] Avoid repeated hash lookups (NFC) (llvm#123698)

73beb15

[TableGen] Avoid repeated map lookups (NFC) (llvm#123699)

1714fac

[Rewrite] Avoid repeated hash lookups (NFC) (llvm#123696)

671088b

[SelectionDAG] Avoid repeated hash lookups (NFC) (llvm#123697)

a588e20

[IR][unittests] Replace of PointerType::get(Type) with opaque version…

97d691b

… (NFC) (llvm#123621) In accordance with llvm#123569

[clang][Sema] Respect qualification of methods in heuristic results (l…

4740e09

…lvm#123551) Fixes llvm#123549

[AutoBump] Merge with 2a8c12b (Jan 21)

b798fdb
