forked from llvm/llvm-project
[AutoBump] Merge with b4e81fd1 (Jan 24) (20) #558
Open
jorickert wants to merge 196 commits into bump_to_8388040f from bump_to_b4e81fd1
…RISC-V failure (llvm#123959) I'm not sure why the test is larger for RISC-V than other targets, but we saw this before with llvm#111360. The file is just over the current 60KB limit: ``` 62772 /home/asb/llvm-project/build/stage2/tools/clang/test/Modules/Output/empty.modulemap.tmp/base.pcm ```
…123648) Summary: This PR fixes bug report llvm#122493. The root problem is the same as before: a lambda function and a DeclRefExpr reference a variable that does not belong to the same module as the enclosing function body, so iteration over the function body does not visit the VarDecl. Before this change, RelatedDeclsMap was created only for the canonical decl, but it has to be created for the definition of the function, which does not always match the canonical decl. Test Plan: check-clang
Intrinsics are available for the 'cpSize' variants also. So, this patch migrates the Op to lower to the intrinsics for all cases. * Update the existing tests to check the lowering to intrinsics. * Add newer cp_async_zfill tests to verify the lowering for the 'cpSize' variants. * Tidy-up CHECK lines in cp_async() function in nvvmir.mlir (NFC) PTX spec link: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async Signed-off-by: Durgadoss R <[email protected]>
…123967) We use variable locations such as DBG_VALUE $xmm0 as shorthand to refer to "the low lane of $xmm0", and this is reflected in how DWARF is interpreted too. However InstrRefBasedLDV tries to be smart and interprets such a DBG_VALUE as a 128-bit reference. We then issue a DW_OP_deref_size of 128 bits to the stack, which isn't permitted by DWARF (it's larger than a pointer). Solve this for now by not using DW_OP_deref_size if it would be illegal. Instead we'll use DW_OP_deref, and the consumer will load the variable type from the stack, which should be correct. There's still a risk of imprecision when LLVM decides to use smaller or larger value types than the source-variable type, which manifests as too-little or too-much memory being read from the stack. However we can't solve that without putting more type information in debug-info. fixes llvm#64093
… object parameters (llvm#124096) LLDB deduces the CV-qualifiers and storage class of a C++ method from the object parameter. Currently it assumes that parameter is implicit (and is a pointer type with the name "this"). This isn't true anymore in C++23 with explicit object parameters. To support those we can simply check the `DW_AT_object_pointer` of the subprogram DIE (works for both declarations and definitions) when searching for the object parameter. We can also remove the check for `eEncodingIsPointerUID`, because in C++ an artificial parameter called `this` is only ever the implicit object parameter (at least for all the major compilers).
This patch adds NVVM intrinsics and NVPTX codegen for: - cp.async.bulk.prefetch.L2.* variants - These intrinsics optionally support cache_hints as indicated by the boolean flag argument. - Lit tests are added for all combinations of these intrinsics in cp-async-bulk.ll. - The generated PTX is verified with a 12.3 ptxas executable. - Added docs for these intrinsics in NVPTXUsage.rst file. PTX Spec reference: https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async-bulk-prefetch Co-authored-by: abmajumder <[email protected]>
…explicit object parameters" (llvm#124100) Reverts llvm#124096 Broke linux CI: ``` Note: This is test shard 7 of 42. [==========] Running 1 test from 1 test suite. [----------] Global test environment set-up. [----------] 1 test from DWARFASTParserClangTests [ RUN ] DWARFASTParserClangTests.TestParseSubroutine_ExplicitObjectParameter Expected<T> must be checked before access or destruction. Expected<T> value was in success state. (Note: Expected<T> values in success mode must still be checked prior to being destroyed). Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it): 0 SymbolFileDWARFTests 0x0000560271ee5ba7 1 SymbolFileDWARFTests 0x0000560271ee3a2c 2 SymbolFileDWARFTests 0x0000560271ee63ea 3 libc.so.6 0x00007f3e54e5b050 4 libc.so.6 0x00007f3e54ea9e2c 5 libc.so.6 0x00007f3e54e5afb2 gsignal + 18 6 libc.so.6 0x00007f3e54e45472 abort + 211 7 SymbolFileDWARFTests 0x0000560271e79d51 8 SymbolFileDWARFTests 0x0000560271e724f7 9 SymbolFileDWARFTests 0x0000560271f39e2c 10 SymbolFileDWARFTests 0x0000560271f3b368 11 SymbolFileDWARFTests 0x0000560271f3c053 12 SymbolFileDWARFTests 0x0000560271f4cf67 13 SymbolFileDWARFTests 0x0000560271f4c18a 14 SymbolFileDWARFTests 0x0000560271f2561c 15 libc.so.6 0x00007f3e54e4624a 16 libc.so.6 0x00007f3e54e46305 __libc_start_main + 133 17 SymbolFileDWARFTests 0x0000560271e65161 ```
Nothing in VPlan.h directly uses VPBlockUtils.h. Move it out to the more appropriate VPlanUtils.h to reduce the size of the widely included VPlan.h.
…vm#124101) Reverts llvm#123393 This is causing `TestVectorOfVectorsFromStdModule.py` to fail on the macOS clang-15 matrix bot.
Prevents avoidable memory leaks. Looks like exchange added in aa1333a didn't take "continue" into account. ``` ==llc==2150782==ERROR: LeakSanitizer: detected memory leaks Direct leak of 10 byte(s) in 1 object(s) allocated from: #0 0x5f1b0f9ac14a in strdup llvm-project/compiler-rt/lib/asan/asan_interceptors.cpp:593:3 #1 0x5f1b1768428d in FileToRemoveList llvm-project/llvm/lib/Support/Unix/Signals.inc:105:55 ```
…explicit object parameters" (llvm#124100)" This reverts commit a802093. Relands the original commit, but fixes the unit test to consume the `llvm::Expected` error object.
Using a "random" name for an "anonymous" pipe seems to be the state of the art on Windows (according to Stack Overflow, new Windows versions may have something better, but it involves calling kernel APIs directly and generally a lot of dark magic). The problem with the current method was that it does not produce unique names if one has two copies of the pipe code in the same process, which is what happened with llvm#120457 (because liblldb only exposes the public API, and we've started using the pipe code in lldb-dap as well). This patch works around the problem by adding the address of the counter variable to the pipe name. Replicating the multiple-copies setup in a test would be very difficult, which is why I'm not adding a test for this scenario.
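The workaround described above can be sketched as follows. This is a minimal illustration, not the LLDB code: the function name `MakeUniquePipeName`, the counter `g_pipe_serial`, and the name layout are all hypothetical. The point it shows is that each copy of the code linked into a process has its own counter object at its own address, so including that address keeps names from colliding even when both copies count from zero.

```cpp
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical per-copy counter. Each copy of this code linked into the
// process gets its own instance of this variable at a distinct address.
static std::atomic<uint32_t> g_pipe_serial{0};

// Hypothetical helper: builds a pipe name from the process id, the address
// of the counter, and the counter value. The layout is an assumption.
std::string MakeUniquePipeName(unsigned long pid) {
  unsigned serial = static_cast<unsigned>(g_pipe_serial.fetch_add(1));
  char buf[160];
  std::snprintf(buf, sizeof(buf), "\\\\.\\pipe\\example-pipe.%lu.%p.%u", pid,
                static_cast<void *>(&g_pipe_serial), serial);
  return std::string(buf);
}
```

Two calls from the same copy differ in the serial number; calls from different copies differ in the address component.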
.. by changing the signal stop reason format 🤦 The reason this did not work is because the code in `StopInfo::GetCrashingDereference` was looking for the string "address=" to extract the address of the crash. macOS stop reason strings have the form ``` EXC_BAD_ACCESS (code=1, address=0xdead) ``` while on Linux they look like: ``` signal SIGSEGV: address not mapped to object (fault address: 0xdead) ``` Extracting the address from a string sounds like a bad idea, but I suppose there's some value in using a consistent format across platforms, so this patch changes the signal format to use the equals sign as well. All of the diagnose tests pass except one, which appears to fail due to something similar to llvm#115453 (disassembler reports unrelocated call targets). I've left the tests disabled on Windows, as the stop reason reporting code works very differently there, and I suspect it won't work out of the box. If I'm wrong -- the XFAIL will let us know.
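To make the format dependence concrete, here is a rough sketch of a lookup keyed on "address=". It is not the body of `StopInfo::GetCrashingDereference`; the function name `ExtractFaultAddress` is hypothetical, and the exact spelling of the new Linux string is an assumption based on the description above.

```cpp
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical stand-in for the lookup described above: find "address="
// in a stop reason string and parse the number that follows it.
std::optional<uint64_t> ExtractFaultAddress(const std::string &stop_reason) {
  const std::string key = "address=";
  std::string::size_type pos = stop_reason.find(key);
  if (pos == std::string::npos)
    return std::nullopt; // The old Linux form ("address: ") ends up here.
  const char *start = stop_reason.c_str() + pos + key.size();
  char *end = nullptr;
  // Base 0 lets strtoull accept the "0x" prefix used in the examples.
  unsigned long long value = std::strtoull(start, &end, 0);
  if (end == start)
    return std::nullopt; // Nothing numeric followed the key.
  return static_cast<uint64_t>(value);
}
```

With this lookup the macOS string yields `0xdead`, the old Linux string yields nothing, and a Linux string that uses the equals sign parses the same way as the macOS one.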
…pecializations (llvm#118167) Some templates in the standard library are illegal to specialize for users (even if the specialization contains user-defined types). The [[clang::no_specializations]] attribute allows marking such base templates so that the compiler will diagnose if users try adding a specialization.
There were two implementations of this - one that implemented nextafter in software, and another that called a clang builtin. No in-tree targets called the builtin, so all targets build the software version. The builtin version has been removed, and the software version has been renamed to be the "default". This commit also optimizes nextafter, to avoid scalarization as much as possible. Note however that the (CLC) relational builtins still scalarize; those will be optimized in a separate commit. Since nextafter is used by some convert_type builtins, the diff to IR codegen is not limited to the builtin itself.
The code that checks a predicate against a swapped predicate in isImpliedCondBalancedTypes is not covered by any existing test, within any Analysis or Transform. Fix this by adding a test to SCEV.
…2782) Support reductions in SCFToGPU: `scf.parallel` and `scf.reduce` op combination is now converted to a `gpu.all_reduce` op.
Intel docs have been updated to be similar to AMD and now describe BSF/BSR as not changing the destination register if the input value was zero, which allows us to support CTTZ/CTLZ zero-input cases by setting the destination to support a NumBits result (BSR is a bit messy as it has to be XOR'd to create a CTLZ result). VIA/Zhaoxin x86_64 CPUs have also been confirmed to match this behaviour. This patch adjusts the X86ISD::BSF/BSR nodes to take a "pass through" argument for zero-input cases, by default this is set to UNDEF to match existing behaviour, but it can be set to a suitable value if supported. There are still some limits to this - it's only supported for x86_64 capable processors (and I've only enabled it for x86_64 codegen), and Intel CPUs sometimes zero the upper 32 bits of a pass through register when used for BSR32/BSF32 with a zero source value (i.e. the whole 64 bits may not get passed through). Fixes llvm#122004
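The arithmetic behind the pass-through value can be illustrated with a small software model. This is only a sketch of the idea, not the X86 lowering: the `Model*` helpers are hypothetical, and the constants 63 and 32 are derived here from the arithmetic rather than copied from the patch. BSR returns the index of the highest set bit, so CTLZ is `31 - index`, which equals `index ^ 31` for indices in 0..31; choosing a pass-through of 63 makes the zero-input case come out as `63 ^ 31 == 32`.

```cpp
#include <cstdint>

// Software model of a 32-bit BSR whose destination keeps `passthru`
// when the source is zero (illustrative only).
uint32_t ModelBSR32(uint32_t src, uint32_t passthru) {
  if (src == 0)
    return passthru;
  uint32_t idx = 31;
  while ((src & (1u << idx)) == 0)
    --idx;
  return idx; // Index of the highest set bit.
}

// Software model of a 32-bit BSF with the same pass-through behaviour.
uint32_t ModelBSF32(uint32_t src, uint32_t passthru) {
  if (src == 0)
    return passthru;
  uint32_t idx = 0;
  while ((src & (1u << idx)) == 0)
    ++idx;
  return idx; // Index of the lowest set bit.
}

// CTLZ: the XOR turns a bit index into a leading-zero count, and the
// pass-through 63 is chosen so that a zero input yields 32.
uint32_t CountLeadingZeros32(uint32_t x) { return ModelBSR32(x, 63) ^ 31; }

// CTTZ: BSF already returns the count, so the pass-through is simply 32.
uint32_t CountTrailingZeros32(uint32_t x) { return ModelBSF32(x, 32); }
```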
Split target under LLVMIR/Transforms to avoid deps loop.
This code was using a pre-move-semantics trick of using std::swap to avoid expensive vector copies.
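For readers unfamiliar with the idiom, the contrast looks roughly like this. The `Holder` type and its member names are hypothetical and are not taken from the changed code; the sketch only shows the general pattern of taking ownership of a vector's storage by swapping versus by move assignment.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical container used only to contrast the two idioms.
struct Holder {
  std::vector<std::string> Items;

  // Pre-C++11 idiom: steal the caller's buffer by swapping it with an
  // emptied member, avoiding an element-by-element copy.
  void TakeBySwap(std::vector<std::string> &In) {
    Items.clear();
    std::swap(Items, In);
  }

  // With move semantics the same transfer is stated directly.
  void TakeByMove(std::vector<std::string> &&In) { Items = std::move(In); }
};
```

Both versions avoid copying the elements; the move form states the ownership transfer in the signature instead of relying on a side effect on the argument.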
…123942) This is meant as a short-term workaround for an invalid conversion in this pass that occurs because existing SDWA selections are not correctly taken into account during the conversion. See the draft PR llvm#123221 for an attempt to fix the actual issue. --------- Co-authored-by: Frederik Harwath <[email protected]>
This header has been removed in C++20 and causes a large amount of deprecation spam when building against libstdc++ 15 in C++17 mode. As far as I understand, we just need to include *some* STL header to get access to the version macros, and as this header also includes <optional> nowadays we can just drop the <ciso646> include entirely.
…lvm#124089) Fixes llvm#123800 Extends LDS lowering by allowing it to discover transitive indirect/escaping references to LDS GVs. For example, given the following input: ```llvm @lds_item_to_indirectly_load = internal addrspace(3) global ptr undef, align 8 %store_type = type { i32, ptr } @place_to_store_indirect_caller = internal addrspace(3) global %store_type undef, align 8 define amdgpu_kernel void @offloading_kernel() { store ptr @indirectly_load_lds, ptr addrspace(3) getelementptr inbounds nuw (i8, ptr addrspace(3) @place_to_store_indirect_caller, i32 0), align 8 call void @call_unknown() ret void } define void @call_unknown() { %1 = alloca ptr, align 8 %2 = call i32 %1() ret void } define void @indirectly_load_lds() { call void @directly_load_lds() ret void } define void @directly_load_lds() { %2 = load ptr, ptr addrspace(3) @lds_item_to_indirectly_load, align 8 ret void } ``` With the above input, prior to this patch, LDS lowering failed to lower the reference to `@lds_item_to_indirectly_load` because: 1. it is indirectly called by a function whose address is taken in the kernel. 2. we did not check if the kernel indirectly makes any calls to unknown functions (we only checked the direct calls). Co-authored-by: Jon Chesterfield <[email protected]>
…23684) For VALU shuffles, this saves an instruction in some cases.
…vm#124116) Summary: We pass the `-nvptx-lower-global-ctor-dtor` option to support the `libc` like use-case which needs global constructors sometimes. This only affects the backend. If the NVPTX target is not enabled this option will be unknown which prevents you from compiling generic IR for this.
…etCastInstrCost without argument data We don't use the IntrinsicCostAttributes arguments, which allows us to use it in type-only analysis in a future patch.
Note that PointerUnion::dyn_cast has been soft deprecated in PointerUnion.h: // FIXME: Replace the uses of is(), get() and dyn_cast() with // isa<T>, cast<T> and the llvm::dyn_cast<T> Literal migration would result in dyn_cast_if_present (see the definition of PointerUnion::dyn_cast), but this patch uses dyn_cast because we expect Source to be nonnull.
Note that PointerUnion::dyn_cast has been soft deprecated in PointerUnion.h: // FIXME: Replace the uses of is(), get() and dyn_cast() with // isa<T>, cast<T> and the llvm::dyn_cast<T> Literal migration would result in dyn_cast_if_present (see the definition of PointerUnion::dyn_cast), but this patch uses dyn_cast because we expect UPP.first to be nonnull.
) TySan supports some preprocessor checks and ignorelists, but they are currently untested. This PR adds some tests to make sure they all work. @fhahn @AaronBallman, this is based off the discussion in the documentation PR [llvm#123595]
llvm#124154) This makes it clearer what you, the author, must do, and what reviewers can expect you to do, before an approved PR can be merged. Splitting out the email bit into a section also means we can link directly to it in discussions. This relies on one of those parties actually reading this, but I plan to tackle the case where they don't with some new automation.
…os (llvm#123755) The FORM_TRANSPOSED_REG_TUPLE pseudo nodes use either the ZPR2Mul2 or ZPR4Mul4 register classes for output. This patch changes the class so that these can be extended to other multi-vector intrinsics which instead create a ZPR2/ZPR4 register sequence.
VBMI2 CPUs actually have vector funnel shift instruction support
…#120338) This patch aligns the debug location of the widen-phi with the debug location of the original phi. Split from: llvm#120054
Most of this is mis-compiling with +fullfp16 and should be disabled for GISel.
Assign register banks to virtual registers. Does not use generic RegBankSelect. After register bank selection all register operand of G_ instructions have LLT and register banks exclusively. If they had register class, reassign appropriate register bank. Assign register banks using machine uniformity analysis: Sgpr - uniform values and some lane masks Vgpr - divergent, non S1, values Vcc - divergent S1 values(lane masks) AMDGPURegBankSelect does not consider available instructions and, in some cases, G_ instructions with some register bank assignment can't be inst-selected. This is solved in RegBankLegalize. Exceptions when uniformity analysis does not work: S32/S64 lane masks: - need to end up with sgpr register class after instruction selection - In most cases Uniformity analysis declares them as uniform (forced by tablegen) resulting in sgpr S32/S64 reg bank - When Uniformity analysis declares them as divergent (some phis), use intrinsic lane mask analyzer to still assign sgpr register bank temporal divergence copy: - COPY to vgpr with implicit use of $exec inside of the cycle - this copy is declared as uniform by uniformity analysis - make sure that assigned bank is vgpr Note: uniformity analysis does not consider that registers with vgpr def are divergent (you can have uniform value in vgpr). - TODO: implicit use of $exec could be implemented as indicator that instruction is divergent
This is subject to agreement by the Flang community (https://discourse.llvm.org/t/rfc-deprecate-ofast-in-flang/80243).
…m#123900) ... for the dynamic blocks created for operator new calls. This way we get the type of memory allocated right. As a side-effect, the diagnostics now point to the std::allocator calls, which is an improvement.
) As part of the "RemoveDIs" project, BasicBlock::iterator now carries a debug-info bit that's needed when getFirstNonPHI and similar feed into instruction insertion positions. Call-sites where that's necessary were updated a year ago; but to ensure some type safety, we'd like to have all calls to moveBefore use iterators. This patch adds a (guaranteed dereferenceable) iterator-taking moveBefore, and changes a bunch of call-sites where it's obviously safe to change to use it by just calling getIterator() on an instruction pointer. A follow-up patch will contain less-obviously-safe changes. We'll eventually deprecate and remove the instruction-pointer insertBefore, but not before adding concise documentation of what considerations are needed (very few).
When generating `arm_neon.h`, NeonEmitter outputs code that violates strict aliasing rules (C23 6.5 Expressions #7, C++23 7.2.1 Value category [basic.lval] #11), for example: bfloat16_t __reint = __p0; uint32_t __reint1 = (uint32_t)(*(uint16_t *) &__reint) << 16; __ret = *(float32_t *) &__reint1; This patch fixes the offending code by replacing it with a call to `__builtin_bit_cast`.
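As an illustration of the aliasing-safe form, the widening conversion in the example above can be written without pointer casts. This sketch uses `std::memcpy` as a portable stand-in for `__builtin_bit_cast`, which is what the patch itself uses; the function name `BFloat16BitsToFloat` is hypothetical, and the code assumes `float` is a 32-bit IEEE-754 type.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: widen raw bfloat16 bits to a float by placing them
// in the upper half of a 32-bit pattern, then copy the bytes instead of
// reading the integer through a float pointer.
float BFloat16BitsToFloat(uint16_t bits) {
  uint32_t widened = static_cast<uint32_t>(bits) << 16;
  float result;
  static_assert(sizeof(result) == sizeof(widened),
                "this sketch assumes a 32-bit float");
  std::memcpy(&result, &widened, sizeof(result));
  return result;
}
```

Copying the object representation is well defined, whereas dereferencing a `uint32_t` object through a `float *` is not.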
This patch adds SM and PTX versions for SM 101, 120 and their arch-accelerated variants. All these are supported in cuda-12.8. sm120/120a requires ptx8.7 and the rest require ptx8.6. Signed-off-by: Durgadoss R <[email protected]>
Lower G_ instructions that can't be inst-selected with register bank assignment from AMDGPURegBankSelect based on uniformity analysis. - Lower instruction to perform it on assigned register bank - Put uniform value in vgpr because SALU instruction is not available - Execute divergent instruction in SALU - "waterfall loop" Given LLTs on all operands after legalizer, some register bank assignments require lowering while others do not. Note: cases where all register bank assignments would require lowering are lowered in legalizer. AMDGPURegBankLegalize goals: - Define Rules: when and how to perform lowering - Goal of defining Rules is to provide high level table-like brief overview of how to lower generic instructions based on available target features and uniformity info (uniform vs divergent). - Fast search of Rules, depends on how complicated Rule.Predicate is - For some opcodes there would be too many Rules that are essentially all the same just for different combinations of types and banks. Write custom function that handles all cases. - Rules are made from enum IDs that correspond to each operand. Names of IDs are meant to give brief description what lowering does for each operand or the whole instruction. - AMDGPURegBankLegalizeHelper implements lowering algorithms Since this is the first patch that actually enables -new-reg-bank-select here is the summary of regression tests that were added earlier: - if instruction is uniform always select SALU instruction if available - eliminate back to back vgpr to sgpr to vgpr copies of uniform values - fast rules: small differences for standard and vector instruction - enabling Rule based on target feature - salu_float - how to specify lowering algorithm - vgpr S64 AND to S32 - on G_TRUNC in reg, it is up to user to deal with truncated bits G_TRUNC in reg is treated as no-op.
- dealing with truncated high bits - ABS S16 to S32 - sgpr S1 phi lowering - new opcodes for vcc-to-scc and scc-to-vcc copies - lowering for vgprS1-to-vcc copy (formally this is vgpr-to-vcc G_TRUNC) - S1 zext and sext lowering to select - uniform and divergent S1 AND(OR and XOR) lowering - inst-selected into SALU instruction - divergent phi with uniform inputs - divergent instruction with temporal divergent use, source instruction is defined as uniform(AMDGPURegBankSelect) - missing temporal divergence lowering - uniform phi, because of undef incoming, is assigned to vgpr. Will be fixed in AMDGPURegBankSelect via another fix in machine uniformity analysis.
…vm#117939) Canonicalize gathers/scatters with contiguous (i.e. [0, 1, 2, ...]) offsets into vector masked load/store ops.
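The precondition for this canonicalization can be pictured with a small check. This is a sketch of the condition only, not the MLIR pattern: the function name `HasContiguousOffsets` is hypothetical, and it works on a plain vector of constant offsets rather than on IR values.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical check: true when the offsets are exactly [0, 1, 2, ...],
// i.e. when lane I reads or writes element I of the base buffer.
bool HasContiguousOffsets(const std::vector<int64_t> &Offsets) {
  if (Offsets.empty())
    return false;
  for (std::size_t I = 0; I < Offsets.size(); ++I)
    if (Offsets[I] != static_cast<int64_t>(I))
      return false;
  return true;
}
```

When the check holds, a gather or scatter touches the same elements as a masked load or store of the same width starting at the base.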
…#123958) `TimerGroup` doesn't need to be a field of `ClangTidyProfiling`. We can construct it locally during destruction.
…23454) Skip header files before registering AST matchers. This avoids matching lots of AST nodes when linting header files.
Add IDs for bit width that cover multiple LLTs: B32 B64 etc. "Predicate" wrapper class for bool predicate functions used to write pretty rules. Predicates can be combined using &&, || and !. Lowering for splitting and widening loads. Write rules for loads to not change existing mir tests from old regbankselect.
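A combinable predicate wrapper of the kind described above could look roughly like this. It is a sketch, not the AMDGPU implementation: the `Inst` struct, its fields, and the way the wrapper stores its callable are all assumptions made for illustration.

```cpp
#include <functional>
#include <utility>

// Hypothetical stand-in for the instruction a rule is tested against.
struct Inst {
  unsigned Bits;
  bool Uniform;
};

// Wraps a bool-returning function so rules can be written as expressions.
class Predicate {
  std::function<bool(const Inst &)> Fn;

public:
  explicit Predicate(std::function<bool(const Inst &)> F) : Fn(std::move(F)) {}

  bool operator()(const Inst &I) const { return Fn(I); }

  // Each operator builds a new predicate that evaluates its operands lazily,
  // so the built-in short-circuiting still applies at evaluation time.
  Predicate operator&&(const Predicate &RHS) const {
    Predicate LHS = *this;
    return Predicate(
        [LHS, RHS](const Inst &I) { return LHS(I) && RHS(I); });
  }
  Predicate operator||(const Predicate &RHS) const {
    Predicate LHS = *this;
    return Predicate(
        [LHS, RHS](const Inst &I) { return LHS(I) || RHS(I); });
  }
  Predicate operator!() const {
    Predicate Self = *this;
    return Predicate([Self](const Inst &I) { return !Self(I); });
  }
};
```

A rule condition can then be spelled as `IsUniform && !Is32`, where both names are `Predicate` objects built from small lambdas.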
…l coroutine clones (llvm#118628) Summary: CoroCloner, by calling into CloneFunctionInto, does a lot of repeated work priming DIFinder and building a list of common module-level debug info metadata. For programs compiled with full debug info this can get very expensive. This diff builds the data once and shares it between all clones. Anecdata for a sample cpp source file compiled with full debug info:

| | Baseline | IdentityMD set | Prebuilt CommonDI (cur.) |
|-----------------|----------|----------------|--------------------------|
| CoroSplitPass | 306ms | 221ms | 68ms |
| CoroCloner | 101ms | 72ms | 0.5ms |
| CollectCommonDI | - | - | 63ms |
| Speed up | 1x | 1.4x | 4.5x |

Note that CollectCommonDebugInfo happens once *per coroutine* rather than per clone. Test Plan: ninja check-llvm-unit ninja check-llvm Compiled a sample internal source file, checked time trace output for scope timings.
…2866) Change existing code for G_PHI to match what LLVM-IR version is doing via PHINode::hasConstantOrUndefValue. This is not safe for regular PHI since it may appear with an undef operand and getVRegDef can fail. Most notably this improves number of values that can be allocated to sgpr in AMDGPURegBankSelect. Common case here are phis that appear in structurize-cfg lowering for cycles with multiple exits: Undef incoming value is coming from block that reached cycle exit condition, if other incoming is uniform keep the phi uniform despite the fact it is joining values from pair of blocks that are entered via divergent condition branch.
This is the behavior expected by DWARF. It also requires some fixups to algorithms which were storing the addresses of some objects (Blocks and Variables) relative to the beginning of the function. There are plenty of things that still don't work in this setup, but this change is sufficient for the expression evaluator to correctly recognize the entry point of a function in this case.
…llvm#123745) Add the following workflows: - `fullbuild` on aarch64 ubuntu - `overlay` on windows 2025 - `overlay` on aarch64 ubuntu `ccache` variant is used on `aarch64` due to hendrikmuhs/ccache-action#279
…ot (llvm#121463) In function handleMFLOSlot, we may get a variable LastInstInFunction with a value of true from function getNextMachineInstr and IInSlot may be null which would trigger an assert. So we need to skip this case. Fix llvm#118223.
No description provided.