Skip to content

[AutoBump] Merge with 08195f31 (Jan 23) (18) #556

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 80 commits into
base: bump_to_7e622b61
Choose a base branch
from

Conversation

jorickert
Copy link

No description provided.

hekota and others added 30 commits January 22, 2025 12:39
Create separate resource initialization function for each resource and
add them to CodeGenModule's `CXXGlobalInits` list.
Fixes llvm#120636 and addresses this [comment
](https://github.com/llvm/llvm-project/pull/119755/files#r1894093603).
True16 format for v_cmpx_class_f16. Update VOPCX_CLASS t16 and fake16
pseudo.
A bulk commit of true16 support for v_cmp_xx_i/u16 instructions
including:

v_cmpx_lt_i16
v_cmpx_eq_i16
v_cmpx_le_i16
v_cmpx_gt_i16
v_cmpx_ne_i16
v_cmpx_ge_i16
v_cmpx_lt_u16
v_cmpx_eq_u16
v_cmpx_le_u16
v_cmpx_gt_u16
v_cmpx_ne_u16
v_cmpx_ge_u16
This PR relands
[llvm#122992](llvm#122992).

Some machines were failing to run the `reflect-error.ll` test due to the
RUN lines
```llvm
; RUN: not %if spirv-tools %{ llc -O0 -mtriple=spirv64-unknown-unknown %s -o /dev/null 2>&1 -filetype=obj %}
; RUN: not %if spirv-tools %{ llc -O0 -mtriple=spirv32-unknown-unknown %s -o /dev/null 2>&1 -filetype=obj %}
```
which failed when `spirv-tools` was not present on the machine due to
running the command `not` without any arguments.

These RUN lines have been removed since they don't actually test
anything new compared to the other two RUN lines due to the expected
error during instruction selection.
```llvm
; RUN: not llc -verify-machineinstrs -O0 -mtriple=spirv64-unknown-unknown %s -o /dev/null 2>&1 | FileCheck %s
; RUN: not llc -verify-machineinstrs -O0 -mtriple=spirv32-unknown-unknown %s -o /dev/null 2>&1 | FileCheck %s
```
A SYCL kernel entry point function is a non-member function or a static
member function declared with the `sycl_kernel_entry_point` attribute.
Such functions define a pattern for an offload kernel entry point
function to be generated to enable execution of a SYCL kernel on a
device. A SYCL library implementation orchestrates the invocation of
these functions with corresponding SYCL kernel arguments in response to
calls to SYCL kernel invocation functions specified by the SYCL 2020
specification.

The offload kernel entry point function (sometimes referred to as the
SYCL kernel caller function) is generated from the SYCL kernel entry
point function by a transformation of the function parameters followed
by a transformation of the function body to replace references to the
original parameters with references to the transformed ones. Exactly how
parameters are transformed will be explained in a future change that
implements non-trivial transformations. For now, it suffices to state
that a given parameter of the SYCL kernel entry point function may be
transformed to multiple parameters of the offload kernel entry point as
needed to satisfy offload kernel argument passing requirements.
Parameters that are decomposed in this way are reconstituted as local
variables in the body of the generated offload kernel entry point
function.

For example, given the following SYCL kernel entry point function
definition:
```
template<typename KernelNameType, typename KernelType>
[[clang::sycl_kernel_entry_point(KernelNameType)]]
void sycl_kernel_entry_point(KernelType kernel) {
  kernel();
}
```

and the following call:
```
struct Kernel {
  int dm1;
  int dm2;
  void operator()() const;
};
Kernel k;
sycl_kernel_entry_point<class kernel_name>(k);
```

the corresponding offload kernel entry point function that is generated
might look as follows (assuming `Kernel` is a type that requires
decomposition):
```
void offload_kernel_entry_point_for_kernel_name(int dm1, int dm2) {
  Kernel kernel{dm1, dm2};
  kernel();
}
```

Other details of the generated offload kernel entry point function, such
as its name and calling convention, are implementation details that need
not be reflected in the AST and may differ across target devices. For
that reason, only the transformation described above is represented in
the AST; other details will be filled in during code generation.

These transformations are represented using new AST nodes introduced
with this change. `OutlinedFunctionDecl` holds a sequence of
`ImplicitParamDecl` nodes and a sequence of statement nodes that
correspond to the transformed parameters and function body.
`SYCLKernelCallStmt` wraps the original function body and associates it
with an `OutlinedFunctionDecl` instance. For the example above, the AST
generated for the `sycl_kernel_entry_point<kernel_name>` specialization
would look as follows:
```
FunctionDecl 'sycl_kernel_entry_point<kernel_name>(Kernel)'
  TemplateArgument type 'kernel_name'
  TemplateArgument type 'Kernel'
  ParmVarDecl kernel 'Kernel'
  SYCLKernelCallStmt
    CompoundStmt
      <original statements>
    OutlinedFunctionDecl
      ImplicitParamDecl 'dm1' 'int'
      ImplicitParamDecl 'dm2' 'int'
      CompoundStmt
        VarDecl 'kernel' 'Kernel'
          <initialization of 'kernel' with 'dm1' and 'dm2'>
        <transformed statements with redirected references of 'kernel'>
```

Any ODR-use of the SYCL kernel entry point function will (with future
changes) suffice for the offload kernel entry point to be emitted. An
actual call to the SYCL kernel entry point function will result in a
call to the function. However, evaluation of a `SYCLKernelCallStmt`
statement is a no-op, so such calls will have no effect other than to
trigger emission of the offload kernel entry point.

Additionally, as a related change inspired by code review feedback,
these changes disallow use of the `sycl_kernel_entry_point` attribute
with functions defined with a _function-try-block_. The SYCL 2020
specification prohibits the use of C++ exceptions in device functions.
Even if exceptions were not prohibited, it is unclear what the semantics
would be for an exception that escapes the SYCL kernel entry point
function; the boundary between host and device code could be an implicit
noexcept boundary that results in program termination if violated, or
the exception could perhaps be propagated to host code via the SYCL
library. Pending support for C++ exceptions in device code and clear
semantics for handling them at the host-device boundary, this change
makes use of the `sycl_kernel_entry_point` attribute with a function
defined with a _function-try-block_ an error.
…vance. NFC (llvm#123876)

Use this to improve performance of SubtargetEmitter::findWriteResources
and SubtargetEmitter::findReadAdvance. Now we can do a map lookup
instead of a linear search through all WriteRes/ReadAdvance records.
    
This reduces the build time of RISCVGenSubtargetInfo.inc on my
machine from 43 seconds to 10 seconds.
…d mtriple used when passing options into the translate API call (llvm#123975)

Rename internal command line flags for optimization level and mtriple
used when passing options into the translate API call.
…r` in `TypeLocTypeMatcher` (llvm#123450)

There are no template in `TypeLocTypeMatcher`. So we do not need to use
`DynTypedMatcher` which can improve performance
MSVC ignores the `/defArm64Native` argument on non-ARM64X targets.
It is also ignored if the `/def` option is not specified.
Plumbs through creating file ranges to C and Python.
…3718)

This changes the implementation of `__copy_cvref_t` to only template the
implementation class on the `_From` parameter, avoiding instantiations
for every combination of `_From` and `_To`.
In retrospect, this probably should have been rolled into llvm#123973. It
seemed more involved when I first decided to split. :)
This patch implements the diamond pattern where we are vectorizing
toward the top of the diamond from both edges, but the second edge may
use elements from a different vector or just scalar values. This
requires some additional packing code (see lit test).
This adds an instruction to adopt `-fbounds-safety` using the preview
implementation available in the fork of llvm-project.
When testing my SBProgress DAP PR (llvm#123826), I noticed Progress update
messages aren't sent over DAP. This patch adds the lldb progress event's
message to the body when sent over DAP.

Before 

![image](https://github.com/user-attachments/assets/404adaa8-b784-4f23-895f-cd3625fdafad)


Now

![image](https://github.com/user-attachments/assets/eb1c3235-0936-4e36-96e5-0a0ee60dabb8)

Tested with my [progress tester
command](https://gist.github.com/Jlalond/48d85e75a91f7a137e3142e6a13d0947),
testing 10 events 5 seconds apart 1-10
This is a small QoL improvement so that we don't have to go through
helpers when building `NamedAttribute`s.
VecUtils::getLowest(Valse) returns the lowest instruction in the BB among Vals.
If the instructions are not in the same BB, or if none of them is an
instruction it returns nullptr.
…vm#124003)

With the changes in 48d0eb5, the CodeGenOptions used to emit .pcm
files with -fmodule-format=obj (-gmodules) were the ones from the
original invocation, rather than the ones specifically crafted for
outputting the pcm. This was causing the pcm to be written with only the
debug info and without the __clangast section in some cases (e.g. -O2).
This unforunately was not covered by existing tests, because compiling
and loading a module within a single compilation load the ast content
from the in-memory module cache rather than reading it from the pcm file
that was written. This broke bootstrapping a build of clang with modules
enabled on Darwin.

rdar://143418834
…epo (llvm#123797)

Not really any functional change, just a clean up that could make it
easier to share snippets with other repos.
…24033)

Before this patch packing a bundle of constants would crash because
`getInsertPointAfterInstrs()` expected instructions. This patch fixes
this.
Avoiding leaks in such cases is very hard.

There are similar suppression in other Index tests.
This patch fixes:

  clang/lib/Sema/SemaSYCL.cpp:428:25: error: unused variable 'SKI'
  [-Werror,-Wunused-variable]
SixWeining and others added 30 commits January 23, 2025 12:38
…lvm#123881)

This extension adds eight 48 bit load store instructions.

The current spec can be found at:
https://github.com/quic/riscv-unified-db/releases/latest

This patch adds assembler only support.

---------

Co-authored-by: Harsh Chandel <[email protected]>
https://discourse.llvm.org/t/rfc-profile-guided-static-data-partitioning/83744
proposes to partition static data sections.

This patch introduces a codegen pass. This patch produces jump table
hotness in the in-memory states (machine jump table info and entries).
Target-lowering and asm-printer consume the states and produce `.hot`
section suffix. The follow up PR
llvm#122215 implements such
changes.

---------

Co-authored-by: Ellis Hoag <[email protected]>
…#118656)

This patch is an extension to llvm#115128.

After profiling LLVM test-suite, I see a lot of loop nest of depth more
than `MaxLoopNestDepth` which is 10. Early exit for them would save
compile-time as it would avoid computing DependenceInfo and CacheCost.

Please see 'bound-max-depth' branch on compile-time-tracker.
Fixes llvm#113191

Issue: [flang][OpenMP] Runtime segfault when an allocatable variable is
used with copyin

Rootcause: The value of the threadprivate variable is not being copied
from the primary thread to the other threads within a parallel region.
As a result it tries to access a null pointer inside a parallel region
which causes segfault.

Fix: When allocatables used with copyin clause need to ensure that, on
entry to any parallel region each thread’s copy of a variable will
acquire the allocation status of the primary thread, before copying the
value of a threadprivate variable of the primary thread to the
threadprivate variable of each other member of the team.
When `try_table`'s catch clause's destination has a return type, as in
the case of catch with a concrete tag, catch_ref, and catch_all_ref. For
example:
```wasm
block exnref
  try_table (catch_all_ref 0)
    ...
  end_try_table
end_block
... use exnref ...
```

This code is not valid because the block's body type is not exnref. So
we add an unreachable after the 'end_try_table' to make the code valid
here:
```wasm
block exnref
  try_table (catch_all_ref 0)
    ...
  end_try_table
  unreachable                    ;; Newly added
end_block
```
Because 'unreachable' is a terminator we also need to split the BB.

---

We need to handle the same thing for unwind mismatch handling. In the
code below, we create a "trampoline BB" that will be the destination for
the nested `try_table`~`end_try_table` added to fix a unwind mismatch:
```wasm
try_table (catch ... )
  block exnref
    ...
    try_table (catch_all_ref N)
      some code
    end_try_table
    ...
  end_block                      ;; Trampoline BB
  throw_ref
end_try_table
```
While the `block` added for the trampoline BB has the return type
`exnref`, its body, which contains the nested `try_table` and other
code, wouldn't have the `exnref` return type. Most times it didn't
become a problem because the block's body ended with something like `br`
or `return`, but that may not always be the case, especially when there
is a loop. So we add an `unreachable` to make the code valid here too:
```wasm
try_table (catch ... )
  block exnref
    ...
    try_table (catch_all_ref N)
      some code
    end_try_table
    ...
    unreachable                  ;; Newly added
  end_block                      ;; Trampoline BB
  throw_ref
end_try_table
```
In this case we just append the `unreachable` at the end of the layout
predecessor BB. (This was tricky to do in the first (non-mismatch) case
because there `end_try_table` and `end_block` were added in the
beginning of an EH pad in `placeTryTableMarker` and moving
`end_try_table` and the new `unreachable` to the previous BB caused
other problems.)

---

This adds many `unreaachable`s to the output, but this adds
`unreachable` to only a few places to see if this is working. The
FileCheck lines in `exception.ll` and `cfg-stackify-eh.ll` are already
heavily redacted to only leave important control-flow instructions, so I
don't think it's worth adding `unreachable`s everywhere.
Resubmit, previously PR has compilation issues.
…24041)

`X86FrameLowering::emitSPUpdate()` assumes that 64-bit targets use a
64-bit stack pointer, but that's not true on x32.
When checking the stack pointer size, we need to look at
`Uses64BitFramePtr` rather than `Is64Bit`. This avoids generating
invalid instructions like `add esp, rcx`.

For impossibly-large stack frames (4 GiB or larger with a 32-bit stack
pointer), we were also generating invalid instructions like `mov eax,
5000000000`. The inline stack probe code already had a check for that
situation; I've moved the check into `emitSPUpdate()`, so any attempt to
allocate a 4 GiB stack frame with a 32-bit stack pointer will now trap
rather than adjusting ESP by the wrong amount. This also fixes the
"can't have 32-bit 16GB stack frame" assertion, which used to be
triggerable by user code but is now correct.

To help catch situations like this in the future, I've added
`-verify-machineinstrs` to the stack clash tests that generate large
stack frames.

This fixes the expensive-checks buildbot failure caused by llvm#113219.
…lvm#123916)

ecb5ea6 tried to fix cases when LLD
links what seems to be import library header objects from MSVC. However,
the fix seems incorrect; the review at https://reviews.llvm.org/D133627
concluded that if this (treating this kind of symbol as a common symbol)
is what link.exe does, it's fine.

However, this is most probably not what link.exe does. The symbol
mentioned in the commit message of
ecb5ea6 would be a common symbol with a
size of around 3 GB; this is not what might have been intended.

That commit tried to avoid running into the error ".idata$4 should not
refer to special section 0"; that issue is fixed for a similar style of
section symbols in 4a4a8a1.

Therefore, revert ecb5ea6 and extend
the fix from 4a4a8a1 to also work for
the section symbols in MSVC generated import libraries.

The main detail about them, is that for symbols of type
IMAGE_SYM_CLASS_SECTION, the Value field is not an offset, but it is an
optional set of flags, corresponding to the Characteristics of the
section header (although it may be empty).

This is a reland of a previous version of this commit, earlier merged in
9457418 / llvm#122811. The previous version
failed tests when run with address sanitizer. The issue was that the
synthesized coff_symbol_generic object actually will be used to access a
full coff_symbol16 or coff_symbol32 struct, see
DefinedCOFF::getCOFFSymbol. Therefore, we need to make a copy of the
full size of either of them.
Now that we have a dedicated abstraction for string tables, switch the
option parser library's string table over to it rather than using a raw
`const char*`. Also try to use the `StringTable::Offset` type rather
than a raw `unsigned` where we can to avoid accidental increments or
other issues.

This is based on review feedback for the initial switch of options to a
string table. Happy to tweak or adjust if desired here.
This is part of
https://discourse.llvm.org/t/rfc-introduce-opasm-type-attr-interface-for-pretty-print-in-asmprinter/83792.

OpAsmOpInterface controls the SSA Name/Block Name and Default Dialect
Prefix. This PR adds the usage of them by existing examples in MLIR.
…lvm#123934)

Once we get to SelectionDAG the IR should not be changing anymore, so we
can use BatchAAResults rather than AAResults to cache AA queries.

This should be a NFC change for targets that enable AA during codegen
(such as AArch64), but also give a nice compile-time improvement in some
cases. See:
llvm#123787 (comment)

Note: This follows Nikita's suggestion on llvm#123787.
- Added support for AArch64-specific build attributes.
- Print AArch64 build attributes to assembly.
- Emit AArch64 build attributes to ELF.

Specification: ARM-software/abi-aa#230
…lvm#121423)

Facts of eq/ne were added to unsigned system only, causing some missing
optimizations. This patch adds eq/ne facts to both signed & unsigned
constraint system.

Fixes llvm#117961.
Most of the `basic_streambuf` functions are really simple, which makes
most of the implementation when they are out of line boilerplate.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.