forked from llvm/llvm-project
[AutoBump] Merge with 08195f31 (Jan 23) (18) #556
Open. jorickert wants to merge 80 commits into bump_to_7e622b61 from bump_to_08195f31
Create separate resource initialization function for each resource and add them to CodeGenModule's `CXXGlobalInits` list. Fixes llvm#120636 and addresses this [comment](https://github.com/llvm/llvm-project/pull/119755/files#r1894093603).
True16 format for v_cmpx_class_f16. Update VOPCX_CLASS t16 and fake16 pseudo.
A bulk commit of true16 support for v_cmp_xx_i/u16 instructions including: v_cmpx_lt_i16 v_cmpx_eq_i16 v_cmpx_le_i16 v_cmpx_gt_i16 v_cmpx_ne_i16 v_cmpx_ge_i16 v_cmpx_lt_u16 v_cmpx_eq_u16 v_cmpx_le_u16 v_cmpx_gt_u16 v_cmpx_ne_u16 v_cmpx_ge_u16
This PR relands [llvm#122992](llvm#122992). Some machines were failing to run the `reflect-error.ll` test due to the RUN lines
```llvm
; RUN: not %if spirv-tools %{ llc -O0 -mtriple=spirv64-unknown-unknown %s -o /dev/null 2>&1 -filetype=obj %}
; RUN: not %if spirv-tools %{ llc -O0 -mtriple=spirv32-unknown-unknown %s -o /dev/null 2>&1 -filetype=obj %}
```
which failed when `spirv-tools` was not present on the machine due to running the command `not` without any arguments. These RUN lines have been removed since they don't actually test anything new compared to the other two RUN lines due to the expected error during instruction selection.
```llvm
; RUN: not llc -verify-machineinstrs -O0 -mtriple=spirv64-unknown-unknown %s -o /dev/null 2>&1 | FileCheck %s
; RUN: not llc -verify-machineinstrs -O0 -mtriple=spirv32-unknown-unknown %s -o /dev/null 2>&1 | FileCheck %s
```
A SYCL kernel entry point function is a non-member function or a static member function declared with the `sycl_kernel_entry_point` attribute. Such functions define a pattern for an offload kernel entry point function to be generated to enable execution of a SYCL kernel on a device. A SYCL library implementation orchestrates the invocation of these functions with corresponding SYCL kernel arguments in response to calls to SYCL kernel invocation functions specified by the SYCL 2020 specification. The offload kernel entry point function (sometimes referred to as the SYCL kernel caller function) is generated from the SYCL kernel entry point function by a transformation of the function parameters followed by a transformation of the function body to replace references to the original parameters with references to the transformed ones. Exactly how parameters are transformed will be explained in a future change that implements non-trivial transformations. For now, it suffices to state that a given parameter of the SYCL kernel entry point function may be transformed to multiple parameters of the offload kernel entry point as needed to satisfy offload kernel argument passing requirements. Parameters that are decomposed in this way are reconstituted as local variables in the body of the generated offload kernel entry point function. 
For example, given the following SYCL kernel entry point function definition:
```
template<typename KernelNameType, typename KernelType>
[[clang::sycl_kernel_entry_point(KernelNameType)]]
void sycl_kernel_entry_point(KernelType kernel) {
  kernel();
}
```
and the following call:
```
struct Kernel {
  int dm1;
  int dm2;
  void operator()() const;
};
Kernel k;
sycl_kernel_entry_point<class kernel_name>(k);
```
the corresponding offload kernel entry point function that is generated might look as follows (assuming `Kernel` is a type that requires decomposition):
```
void offload_kernel_entry_point_for_kernel_name(int dm1, int dm2) {
  Kernel kernel{dm1, dm2};
  kernel();
}
```
Other details of the generated offload kernel entry point function, such as its name and calling convention, are implementation details that need not be reflected in the AST and may differ across target devices. For that reason, only the transformation described above is represented in the AST; other details will be filled in during code generation.

These transformations are represented using new AST nodes introduced with this change. `OutlinedFunctionDecl` holds a sequence of `ImplicitParamDecl` nodes and a sequence of statement nodes that correspond to the transformed parameters and function body. `SYCLKernelCallStmt` wraps the original function body and associates it with an `OutlinedFunctionDecl` instance.
For the example above, the AST generated for the `sycl_kernel_entry_point<kernel_name>` specialization would look as follows:
```
FunctionDecl 'sycl_kernel_entry_point<kernel_name>(Kernel)'
  TemplateArgument type 'kernel_name'
  TemplateArgument type 'Kernel'
  ParmVarDecl kernel 'Kernel'
  SYCLKernelCallStmt
    CompoundStmt
      <original statements>
    OutlinedFunctionDecl
      ImplicitParamDecl 'dm1' 'int'
      ImplicitParamDecl 'dm2' 'int'
      CompoundStmt
        VarDecl 'kernel' 'Kernel'
          <initialization of 'kernel' with 'dm1' and 'dm2'>
        <transformed statements with redirected references of 'kernel'>
```
Any ODR-use of the SYCL kernel entry point function will (with future changes) suffice for the offload kernel entry point to be emitted. An actual call to the SYCL kernel entry point function will result in a call to the function. However, evaluation of a `SYCLKernelCallStmt` statement is a no-op, so such calls will have no effect other than to trigger emission of the offload kernel entry point.

Additionally, as a related change inspired by code review feedback, these changes disallow use of the `sycl_kernel_entry_point` attribute with functions defined with a _function-try-block_. The SYCL 2020 specification prohibits the use of C++ exceptions in device functions. Even if exceptions were not prohibited, it is unclear what the semantics would be for an exception that escapes the SYCL kernel entry point function; the boundary between host and device code could be an implicit noexcept boundary that results in program termination if violated, or the exception could perhaps be propagated to host code via the SYCL library. Pending support for C++ exceptions in device code and clear semantics for handling them at the host-device boundary, this change makes use of the `sycl_kernel_entry_point` attribute with a function defined with a _function-try-block_ an error.
…vance. NFC (llvm#123876) Use this to improve performance of SubtargetEmitter::findWriteResources and SubtargetEmitter::findReadAdvance. Now we can do a map lookup instead of a linear search through all WriteRes/ReadAdvance records. This reduces the build time of RISCVGenSubtargetInfo.inc on my machine from 43 seconds to 10 seconds.
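The kind of change described above, replacing a per-query linear scan with a lookup in a prebuilt map, can be sketched roughly as follows. This is only an illustration under assumptions: `Record`, `findLinear`, `buildIndex`, and `findIndexed` are hypothetical names, not the actual `SubtargetEmitter` code or TableGen record types.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a WriteRes/ReadAdvance record (not LLVM's type).
struct Record {
  std::string WriteType; // key the record is looked up by
  int Latency;
};

// Linear search: every query walks the whole record list.
const Record *findLinear(const std::vector<Record> &Records,
                         const std::string &Key) {
  for (const Record &R : Records)
    if (R.WriteType == Key)
      return &R;
  return nullptr;
}

// Build the index once. emplace() does not overwrite an existing key, so the
// first record for each key wins, matching what the linear search returns.
std::unordered_map<std::string, const Record *>
buildIndex(const std::vector<Record> &Records) {
  std::unordered_map<std::string, const Record *> Index;
  for (const Record &R : Records)
    Index.emplace(R.WriteType, &R);
  return Index;
}

// Map lookup: average constant time per query.
const Record *
findIndexed(const std::unordered_map<std::string, const Record *> &Index,
            const std::string &Key) {
  auto It = Index.find(Key);
  return It == Index.end() ? nullptr : It->second;
}
```

The index stores pointers into the record vector, so this sketch assumes the vector is not modified after `buildIndex` runs.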
…d mtriple used when passing options into the translate API call (llvm#123975) Rename internal command line flags for optimization level and mtriple used when passing options into the translate API call.
…r` in `TypeLocTypeMatcher` (llvm#123450) There are no templates in `TypeLocTypeMatcher`, so we do not need to use `DynTypedMatcher`. Avoiding it can improve performance.
MSVC ignores the `/defArm64Native` argument on non-ARM64X targets. It is also ignored if the `/def` option is not specified.
Plumbs through creating file ranges to C and Python.
…3718) This changes the implementation of `__copy_cvref_t` to only template the implementation class on the `_From` parameter, avoiding instantiations for every combination of `_From` and `_To`.
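A minimal sketch of the idea, assuming a nested alias template is used to apply the qualifiers: the implementation class is instantiated once per `From` type, and `To` only appears in the member alias. The names `CopyCvrefImpl`, `Apply`, and `copy_cvref_t` are illustrative and are not libc++'s internal spellings.

```cpp
#include <type_traits>

// Implementation class templated only on From; To is supplied to the nested
// alias template, so no class is instantiated per (From, To) pair.
template <class From>
struct CopyCvrefImpl {
  template <class To>
  using Apply = To;
};

template <class From>
struct CopyCvrefImpl<const From> {
  template <class To>
  using Apply = const To;
};

template <class From>
struct CopyCvrefImpl<volatile From> {
  template <class To>
  using Apply = volatile To;
};

template <class From>
struct CopyCvrefImpl<const volatile From> {
  template <class To>
  using Apply = const volatile To;
};

// References: copy the cv-qualifiers of the referred-to type, then re-apply
// the reference kind.
template <class From>
struct CopyCvrefImpl<From &> {
  template <class To>
  using Apply = typename CopyCvrefImpl<From>::template Apply<To> &;
};

template <class From>
struct CopyCvrefImpl<From &&> {
  template <class To>
  using Apply = typename CopyCvrefImpl<From>::template Apply<To> &&;
};

template <class From, class To>
using copy_cvref_t = typename CopyCvrefImpl<From>::template Apply<To>;

static_assert(std::is_same<copy_cvref_t<const int &, float>, const float &>::value,
              "const and lvalue reference are copied");
```

With this shape, `copy_cvref_t<const int &, float>` and `copy_cvref_t<const int &, char>` share the same `CopyCvrefImpl<const int &>` instantiation.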
In retrospect, this probably should have been rolled into llvm#123973. It seemed more involved when I first decided to split. :)
Only BSD and Linux intercept these syscalls. Fixes llvm#123601
This patch implements the diamond pattern where we are vectorizing toward the top of the diamond from both edges, but the second edge may use elements from a different vector or just scalar values. This requires some additional packing code (see lit test).
This adds instructions for adopting `-fbounds-safety` using the preview implementation available in the fork of llvm-project.
When testing my SBProgress DAP PR (llvm#123826), I noticed Progress update messages aren't sent over DAP. This patch adds the lldb progress event's message to the body when sent over DAP. [Before and after screenshots] Tested with my [progress tester command](https://gist.github.com/Jlalond/48d85e75a91f7a137e3142e6a13d0947), testing 10 events (1-10), 5 seconds apart.
This is a small QoL improvement so that we don't have to go through helpers when building `NamedAttribute`s.
VecUtils::getLowest(Vals) returns the lowest instruction in the BB among Vals. If the instructions are not in the same BB, or if none of them is an instruction, it returns nullptr.
…vm#124003) With the changes in 48d0eb5, the CodeGenOptions used to emit .pcm files with -fmodule-format=obj (-gmodules) were the ones from the original invocation, rather than the ones specifically crafted for outputting the pcm. This was causing the pcm to be written with only the debug info and without the __clangast section in some cases (e.g. -O2). This unfortunately was not covered by existing tests, because compiling and loading a module within a single compilation loads the AST content from the in-memory module cache rather than reading it from the pcm file that was written. This broke bootstrapping a build of clang with modules enabled on Darwin. rdar://143418834
…epo (llvm#123797) Not really any functional change, just a clean up that could make it easier to share snippets with other repos.
…24033) Before this patch packing a bundle of constants would crash because `getInsertPointAfterInstrs()` expected instructions. This patch fixes this.
Avoiding leaks in such cases is very hard. There are similar suppressions in other Index tests.
This patch fixes: clang/lib/Sema/SemaSYCL.cpp:428:25: error: unused variable 'SKI' [-Werror,-Wunused-variable]
…lvm#123881) This extension adds eight 48-bit load/store instructions. The current spec can be found at: https://github.com/quic/riscv-unified-db/releases/latest This patch adds assembler-only support.
Co-authored-by: Harsh Chandel
https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744 proposes to partition static data sections. This patch introduces a codegen pass. This patch produces jump table hotness in the in-memory states (machine jump table info and entries). Target-lowering and asm-printer consume the states and produce `.hot` section suffix. The follow up PR llvm#122215 implements such changes.
Co-authored-by: Ellis Hoag
…#118656) This patch is an extension to llvm#115128. After profiling LLVM test-suite, I see many loop nests with a depth greater than `MaxLoopNestDepth`, which is 10. Exiting early for them saves compile time, as it avoids computing DependenceInfo and CacheCost. Please see the 'bound-max-depth' branch on compile-time-tracker.
Fixes llvm#113191

Issue: [flang][OpenMP] Runtime segfault when an allocatable variable is used with copyin

Root cause: The value of the threadprivate variable is not being copied from the primary thread to the other threads within a parallel region. As a result, it tries to access a null pointer inside a parallel region, which causes a segfault.

Fix: When allocatables are used with the copyin clause, we need to ensure that, on entry to any parallel region, each thread's copy of a variable acquires the allocation status of the primary thread before the value of the primary thread's threadprivate variable is copied to the threadprivate variable of each other member of the team.
A `try_table`'s catch clause's destination can have a return type, as in the case of catch with a concrete tag, catch_ref, and catch_all_ref. For example:
```wasm
block exnref
  try_table (catch_all_ref 0)
    ...
  end_try_table
end_block
... use exnref ...
```
This code is not valid because the block's body type is not exnref. So we add an unreachable after the 'end_try_table' to make the code valid here:
```wasm
block exnref
  try_table (catch_all_ref 0)
    ...
  end_try_table
  unreachable                    ;; Newly added
end_block
```
Because 'unreachable' is a terminator we also need to split the BB.

---

We need to handle the same thing for unwind mismatch handling. In the code below, we create a "trampoline BB" that will be the destination for the nested `try_table`~`end_try_table` added to fix an unwind mismatch:
```wasm
try_table (catch ... )
  block exnref
    ...
    try_table (catch_all_ref N)
      some code
    end_try_table
    ...
  end_block                      ;; Trampoline BB
  throw_ref
end_try_table
```
While the `block` added for the trampoline BB has the return type `exnref`, its body, which contains the nested `try_table` and other code, wouldn't have the `exnref` return type. Most times it didn't become a problem because the block's body ended with something like `br` or `return`, but that may not always be the case, especially when there is a loop. So we add an `unreachable` to make the code valid here too:
```wasm
try_table (catch ... )
  block exnref
    ...
    try_table (catch_all_ref N)
      some code
    end_try_table
    ...
    unreachable                  ;; Newly added
  end_block                      ;; Trampoline BB
  throw_ref
end_try_table
```
In this case we just append the `unreachable` at the end of the layout predecessor BB. (This was tricky to do in the first (non-mismatch) case because there `end_try_table` and `end_block` were added in the beginning of an EH pad in `placeTryTableMarker` and moving `end_try_table` and the new `unreachable` to the previous BB caused other problems.)
---

This adds many `unreachable`s to the output, but this change adds `unreachable` to the test checks in only a few places to see if this is working. The FileCheck lines in `exception.ll` and `cfg-stackify-eh.ll` are already heavily redacted to only leave important control-flow instructions, so I don't think it's worth adding `unreachable`s everywhere.
Resubmit; the previous PR had compilation issues.
…24041) `X86FrameLowering::emitSPUpdate()` assumes that 64-bit targets use a 64-bit stack pointer, but that's not true on x32. When checking the stack pointer size, we need to look at `Uses64BitFramePtr` rather than `Is64Bit`. This avoids generating invalid instructions like `add esp, rcx`. For impossibly-large stack frames (4 GiB or larger with a 32-bit stack pointer), we were also generating invalid instructions like `mov eax, 5000000000`. The inline stack probe code already had a check for that situation; I've moved the check into `emitSPUpdate()`, so any attempt to allocate a 4 GiB stack frame with a 32-bit stack pointer will now trap rather than adjusting ESP by the wrong amount. This also fixes the "can't have 32-bit 16GB stack frame" assertion, which used to be triggerable by user code but is now correct. To help catch situations like this in the future, I've added `-verify-machineinstrs` to the stack clash tests that generate large stack frames. This fixes the expensive-checks buildbot failure caused by llvm#113219.
…lvm#123916) ecb5ea6 tried to fix cases when LLD links what seems to be import library header objects from MSVC. However, the fix seems incorrect; the review at https://reviews.llvm.org/D133627 concluded that if this (treating this kind of symbol as a common symbol) is what link.exe does, it's fine. However, this is most probably not what link.exe does. The symbol mentioned in the commit message of ecb5ea6 would be a common symbol with a size of around 3 GB; this is not what might have been intended. That commit tried to avoid running into the error ".idata$4 should not refer to special section 0"; that issue is fixed for a similar style of section symbols in 4a4a8a1.

Therefore, revert ecb5ea6 and extend the fix from 4a4a8a1 to also work for the section symbols in MSVC generated import libraries. The main detail about them is that for symbols of type IMAGE_SYM_CLASS_SECTION, the Value field is not an offset, but an optional set of flags, corresponding to the Characteristics of the section header (although it may be empty).

This is a reland of a previous version of this commit, earlier merged in 9457418 / llvm#122811. The previous version failed tests when run with address sanitizer. The issue was that the synthesized coff_symbol_generic object actually will be used to access a full coff_symbol16 or coff_symbol32 struct, see DefinedCOFF::getCOFFSymbol. Therefore, we need to make a copy of the full size of either of them.
Now that we have a dedicated abstraction for string tables, switch the option parser library's string table over to it rather than using a raw `const char*`. Also try to use the `StringTable::Offset` type rather than a raw `unsigned` where we can to avoid accidental increments or other issues. This is based on review feedback for the initial switch of options to a string table. Happy to tweak or adjust if desired here.
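To illustrate why a dedicated offset type helps, here is a minimal sketch of a string table indexed by a wrapper type instead of a raw `unsigned`. It is a simplified, hypothetical class written for this example and does not reproduce LLVM's actual `StringTable` interface; the point is only that the wrapper has no arithmetic operators, so an accidental `++` or `+ 1` fails to compile.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical, simplified string table; not LLVM's real StringTable class.
class StringTable {
  const char *Data; // NUL-separated strings, starting with an empty string

public:
  // Strongly typed offset. It deliberately offers no ++, +, or implicit
  // conversion from integers, so it cannot be bumped or mixed up by accident.
  class Offset {
    unsigned Value = 0;

  public:
    constexpr Offset() = default;
    constexpr explicit Offset(unsigned V) : Value(V) {}
    constexpr unsigned value() const { return Value; }
  };

  constexpr explicit StringTable(const char *D) : Data(D) {}

  // Returns the NUL-terminated string that starts at the given offset.
  const char *operator[](Offset O) const { return Data + O.value(); }
};
```

A caller would write `Table[StringTable::Offset(1)]`; passing a plain integer, or incrementing an `Offset`, is rejected by the compiler in this sketch.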
This is part of https://discourse.llvm.org/t/rfc-introduce-opasm-type-attr-interface-for-pretty-print-in-asmprinter/83792. OpAsmOpInterface controls the SSA Name/Block Name and Default Dialect Prefix. This PR adds the usage of them by existing examples in MLIR.
…lvm#123934) Once we get to SelectionDAG the IR should not be changing anymore, so we can use BatchAAResults rather than AAResults to cache AA queries. This should be an NFC change for targets that enable AA during codegen (such as AArch64), but it should also give a nice compile-time improvement in some cases. See: llvm#123787 (comment) Note: This follows Nikita's suggestion on llvm#123787.
- Added support for AArch64-specific build attributes.
- Print AArch64 build attributes to assembly.
- Emit AArch64 build attributes to ELF.

Specification: ARM-software/abi-aa#230
…lvm#121423) Facts of eq/ne were added to the unsigned system only, causing some missed optimizations. This patch adds eq/ne facts to both the signed and unsigned constraint systems. Fixes llvm#117961.
Most of the `basic_streambuf` functions are really simple, so most of the implementation is boilerplate when they are defined out of line.
No description provided.