forked from llvm/llvm-project
-
Notifications
You must be signed in to change notification settings - Fork 0
[mlir][mlir-query] Add inital draft for mlir-query tool #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
devajithvs
wants to merge
12
commits into
main
Choose a base branch
from
devajith.mlir-query
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
afbefe0
to
b3b897b
Compare
6b94c1e
to
ea94792
Compare
devajithvs
pushed a commit
that referenced
this pull request
Jul 12, 2023
The original MFS work D85368 shows good performance improvement with Instrumented FDO. However, AutoFDO or Flow-Sensitive AutoFDO (FSAFDO) does not show performance gain. This is mainly caused by a less accurate profile compared to the iFDO profile. For the past few months, we have been working to improve FSAFDO quality, like in D145171. Taking advantage of this improvement, MFS now shows performance improvements over FSAFDO profiles. That being said, 2 minor changes need to be made, 1) An FS-AutoFDO profile generation pass needs to be added right before MFS pass and an FSAFDO profile load pass is needed when FS-AutoFDO is enabled and the MFS flag is present. 2) MFS only applies to hot functions, because we believe (and experiment also shows) FS-AutoFDO is more accurate about functions that have plenty of samples than those with no or very few samples. With this improvement, we see a 1.2% performance improvement in clang benchmark, 0.9% QPS improvement in our internal search benchmark, and 3%-5% improvement in internal storage benchmark. This is #1 of the two patches that enables the improvement. Reviewed By: wenlei, snehasish, xur Differential Revision: https://reviews.llvm.org/D152399
devajithvs
pushed a commit
that referenced
this pull request
Jul 12, 2023
…tput The crash happens in clang::driver::tools::SplitDebugName when Output is InputInfo::Nothing. It doesn't happen with standalone clang driver because output is created in Driver::BuildJobsForActionNoCache. Example backtrace: ``` * thread #1, name = 'clangd', stop reason = hit program assert * frame #0: 0x00007ffff5c4eacf libc.so.6`raise + 271 frame #1: 0x00007ffff5c21ea5 libc.so.6`abort + 295 frame #2: 0x00007ffff5c21d79 libc.so.6`__assert_fail_base.cold.0 + 15 frame llvm#3: 0x00007ffff5c47426 libc.so.6`__assert_fail + 70 frame llvm#4: 0x000055555dc0923c clangd`clang::driver::InputInfo::getFilename(this=0x00007fffffff9398) const at InputInfo.h:84:5 frame llvm#5: 0x000055555dcd0d8d clangd`clang::driver::tools::SplitDebugName(JA=0x000055555f6c6a50, Args=0x000055555f6d0b80, Input=0x00007fffffff9678, Output=0x00007fffffff9398) at CommonArgs.cpp:1275:40 frame llvm#6: 0x000055555dc955a5 clangd`clang::driver::tools::Clang::ConstructJob(this=0x000055555f6c69d0, C=0x000055555f6c64a0, JA=0x000055555f6c6a50, Output=0x00007fffffff9398, Inputs=0x00007fffffff9668, Args=0x000055555f6d0b80, LinkingOutput=0x0000000000000000) const at Clang.cpp:5690:33 frame llvm#7: 0x000055555dbf6b54 clangd`clang::driver::Driver::BuildJobsForActionNoCache(this=0x00007fffffffb5e0, C=0x000055555f6c64a0, A=0x000055555f6c6a50, TC=0x000055555f6c4be0, BoundArch=(Data = 0x0000000000000000, Length = 0), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=1, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:5618:10 frame llvm#8: 0x000055555dbf4ef0 clangd`clang::driver::Driver::BuildJobsForAction(this=0x00007fffffffb5e0, C=0x000055555f6c64a0, A=0x000055555f6c6a50, TC=0x000055555f6c4be0, BoundArch=(Data = 0x0000000000000000, Length = 0), AtTopLevel=true, MultipleArchs=false, LinkingOutput=0x0000000000000000, CachedResults=size=1, TargetDeviceOffloadKind=OFK_None) const at Driver.cpp:5306:26 frame llvm#9: 0x000055555dbeb590 clangd`clang::driver::Driver::BuildJobs(this=0x00007fffffffb5e0, C=0x000055555f6c64a0) const at Driver.cpp:4844:5 frame llvm#10: 0x000055555dbe6b0f clangd`clang::driver::Driver::BuildCompilation(this=0x00007fffffffb5e0, ArgList=ArrayRef<const char *> @ 0x00007fffffffb268) at Driver.cpp:1496:3 frame llvm#11: 0x000055555b0cc0d9 clangd`clang::createInvocation(ArgList=ArrayRef<const char *> @ 0x00007fffffffbb38, Opts=CreateInvocationOptions @ 0x00007fffffffbb90) at CreateInvocationFromCommandLine.cpp:53:52 frame llvm#12: 0x000055555b378e7b clangd`clang::clangd::buildCompilerInvocation(Inputs=0x00007fffffffca58, D=0x00007fffffffc158, CC1Args=size=0) at Compiler.cpp:116:44 frame llvm#13: 0x000055555895a6c8 clangd`clang::clangd::(anonymous namespace)::Checker::buildInvocation(this=0x00007fffffffc760, TFS=0x00007fffffffe570, Contents= Has Value=false ) at Check.cpp:212:9 frame llvm#14: 0x0000555558959cec clangd`clang::clangd::check(File=(Data = "build/test.cpp", Length = 64), TFS=0x00007fffffffe570, Opts=0x00007fffffffe600) at Check.cpp:486:34 frame llvm#15: 0x000055555892164a clangd`main(argc=4, argv=0x00007fffffffecd8) at ClangdMain.cpp:993:12 frame llvm#16: 0x00007ffff5c3ad85 libc.so.6`__libc_start_main + 229 frame llvm#17: 0x00005555585bbe9e clangd`_start + 46 ``` Test Plan: ninja ClangDriverTests && tools/clang/unittests/Driver/ClangDriverTests Differential Revision: https://reviews.llvm.org/D154602
b9b8bc2
to
6c2ec4f
Compare
6d7039c
to
8763f67
Compare
devajithvs
pushed a commit
that referenced
this pull request
Oct 2, 2023
Summary: Thread sanitizer reports the following data race: ``` Write of size 8 at 0x000103303e70 by thread T1 (mutexes: write M0): #0 RNBRemote::CommDataReceived(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) RNBRemote.cpp:1075 (debugserver:arm64+0x100038db8) (BuildId: f130b34f693c4f3eba96139104af2b7132000000200000000100000000000e00) #1 RNBRemote::ThreadFunctionReadRemoteData(void*) RNBRemote.cpp:1180 (debugserver:arm64+0x1000391dc) (BuildId: f130b34f693c4f3eba96139104af2b7132000000200000000100000000000e00) Previous read of size 8 at 0x000103303e70 by main thread: #0 RNBRemote::GetPacketPayload(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) RNBRemote.cpp:797 (debugserver:arm64+0x100037c5c) (BuildId: f130b34f693c4f3eba96139104af2b7132000000200000000100000000000e00) #1 RNBRemote::GetPacket(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&, RNBRemote::Packet&, bool) RNBRemote.cpp:907 (debugserver:arm64+0x1000378cc) (BuildId: f130b34f693c4f3eba96139104af2b7132000000200000000100000000000e00) ``` RNBRemote already has a mutex, extend its usage to protect the read of m_rx_packets. Reviewers: jdevlieghere, bulbazord, jingham Subscribers:
devajithvs
pushed a commit
that referenced
this pull request
Oct 2, 2023
…fine.parallel verifier This patch updates AffineParallelOp::verify() to check each result type matches its corresponding reduction op (i.e, the result type must be a `FloatType` if the reduction attribute is `addf`) affine.parallel will crash on --lower-affine if the corresponding result type cannot match the reduction attribute. ``` %128 = affine.parallel (%arg2, %arg3) = (0, 0) to (8, 7) reduce ("maxf") -> (memref<8x7xf32>) { %alloc_33 = memref.alloc() : memref<8x7xf32> affine.yield %alloc_33 : memref<8x7xf32> } ``` This will crash and report a type conversion issue when we run `mlir-opt --lower-affine` ``` Assertion failed: (isa<To>(Val) && "cast<Ty>() argument of incompatible type!"), function cast, file Casting.h, line 572. PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. Stack dump: 0. Program arguments: mlir-opt --lower-affine temp.mlir #0 0x0000000102a18f18 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/workspacebin/mlir-opt+0x1002f8f18) #1 0x0000000102a171b4 llvm::sys::RunSignalHandlers() (/workspacebin/mlir-opt+0x1002f71b4) #2 0x0000000102a195c4 SignalHandler(int) (/workspacebin/mlir-opt+0x1002f95c4) llvm#3 0x00000001be7894c4 (/usr/lib/system/libsystem_platform.dylib+0x1803414c4) llvm#4 0x00000001be771ee0 (/usr/lib/system/libsystem_pthread.dylib+0x180329ee0) llvm#5 0x00000001be6ac340 (/usr/lib/system/libsystem_c.dylib+0x180264340) llvm#6 0x00000001be6ab754 (/usr/lib/system/libsystem_c.dylib+0x180263754) llvm#7 0x0000000106864790 mlir::arith::getIdentityValueAttr(mlir::arith::AtomicRMWKind, mlir::Type, mlir::OpBuilder&, mlir::Location) (.cold.4) (/workspacebin/mlir-opt+0x104144790) llvm#8 0x0000000102ba66ac mlir::arith::getIdentityValueAttr(mlir::arith::AtomicRMWKind, mlir::Type, mlir::OpBuilder&, mlir::Location) (/workspacebin/mlir-opt+0x1004866ac) llvm#9 0x0000000102ba6910 mlir::arith::getIdentityValue(mlir::arith::AtomicRMWKind, mlir::Type, mlir::OpBuilder&, mlir::Location) (/workspacebin/mlir-opt+0x100486910) ... ``` Fixes llvm#64068 Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D157985
This commit adds the initial version of the mlir-query tool, which leverages the pre-existing matchers defined in mlir/include/mlir/IR/Matchers.h The tool provides the following set of basic queries: QUERY MATCHER hasOpAttrName(string) -> m_Attr hasOpName(string) -> m_Op isConstantOp() -> m_Constant isNegInfFloat() -> m_NegInfFloat isNegZeroFloat() -> m_NegZeroFloat isNonZero() -> m_NonZero isOne() -> m_One isOneFloat() -> m_OneFloat isPosInfFloat() -> m_PosInfFloat isPosZeroFloat() -> m_PosZeroFloat isZero() -> m_Zero isZeroFloat() -> m_AnyZeroFloat Differential Revision: https://reviews.llvm.org/D155127
f3d0a30
to
be9ee6a
Compare
devajithvs
pushed a commit
that referenced
this pull request
Nov 8, 2024
…ates explicitly specialized for an implicitly instantiated class template specialization (llvm#113464) Consider the following: ``` template<typename T> struct A { template<typename U> struct B { static constexpr int x = 0; // #1 }; template<typename U> struct B<U*> { static constexpr int x = 1; // #2 }; }; template<> template<typename U> struct A<long>::B { static constexpr int x = 2; // llvm#3 }; static_assert(A<short>::B<int>::y == 0); // uses #1 static_assert(A<short>::B<int*>::y == 1); // uses #2 static_assert(A<long>::B<int>::y == 2); // uses llvm#3 static_assert(A<long>::B<int*>::y == 2); // uses llvm#3 ``` According to [temp.spec.partial.member] p2: > If the primary member template is explicitly specialized for a given (implicit) specialization of the enclosing class template, the partial specializations of the member template are ignored for this specialization of the enclosing class template. If a partial specialization of the member template is explicitly specialized for a given (implicit) specialization of the enclosing class template, the primary member template and its other partial specializations are still considered for this specialization of the enclosing class template. The example above fails to compile because we currently don't implement [temp.spec.partial.member] p2. This patch implements the wording, fixing llvm#51051.
devajithvs
pushed a commit
that referenced
this pull request
Nov 8, 2024
We've found that basic profiling could help improving/optimizing when developing clang-tidy checks. This PR adds an extra command ``` set enable-profile (true|false) Set whether to enable matcher profiling. ``` which enables profiling queries on each file. Sample output: ``` $ cat test.cql set enable-profile true m binaryOperator(isExpansionInMainFile()) $ cat test.c int test(int i, int j) { return i + j; } $ clang-query --track-memory -f test.cql test.c -- Match #1: {{.*}}/test.c:2:10: note: "root" binds here 2 | return i + j; | ^~~~~ 1 match. ===-------------------------------------------------------------------------=== clang-query matcher profiling ===-------------------------------------------------------------------------=== Total Execution Time: 0.0000 seconds (0.0000 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- ---Mem--- --- Name --- 0.0000 (100.0%) 0.0000 (100.0%) 0.0000 (100.0%) 0.0000 (100.0%) 224 {{.*}}/test.c 0.0000 (100.0%) 0.0000 (100.0%) 0.0000 (100.0%) 0.0000 (100.0%) 224 Total ```
devajithvs
pushed a commit
that referenced
this pull request
Nov 8, 2024
llvm#115376) …15019)" This reverts commit 9f79615. This is breaking compiler-rt/lib/sanitizer_common/... Author knows about the breakage.
devajithvs
pushed a commit
that referenced
this pull request
Nov 17, 2024
… depobj construct (llvm#114221) A codegen crash is occurring when a depend object was initialized with omp_all_memory in the depobj directive. llvm#114214 The root cause of issue looks to be the improper handling of the dependency list when omp_all_memory was specified. The change introduces the use of OMPTaskDataTy to manage dependencies. The buildDependences function is called to construct the dependency list, and the list is iterated over to emit and store the dependencies. Reduced Test Case : ``` #include <omp.h> int main() { omp_depend_t obj; #pragma omp depobj(obj) depend(inout: omp_all_memory) } ``` ``` #1 0x0000000003de6623 SignalHandler(int) Signals.cpp:0:0 #2 0x00007f8e4a6b990f (/lib64/libpthread.so.0+0x1690f) llvm#3 0x00007f8e4a117d2a raise (/lib64/libc.so.6+0x4ad2a) llvm#4 0x00007f8e4a1193e4 abort (/lib64/libc.so.6+0x4c3e4) llvm#5 0x00007f8e4a10fc69 __assert_fail_base (/lib64/libc.so.6+0x42c69) llvm#6 0x00007f8e4a10fcf1 __assert_fail (/lib64/libc.so.6+0x42cf1) llvm#7 0x0000000004114367 clang::CodeGen::CodeGenFunction::EmitOMPDepobjDirective(clang::OMPDepobjDirective const&) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x4114367) llvm#8 0x00000000040f8fac clang::CodeGen::CodeGenFunction::EmitStmt(clang::Stmt const*, llvm::ArrayRef<clang::Attr const*>) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x40f8fac) llvm#9 0x00000000040ff4fb clang::CodeGen::CodeGenFunction::EmitCompoundStmtWithoutScope(clang::CompoundStmt const&, bool, clang::CodeGen::AggValueSlot) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x40ff4fb) llvm#10 0x00000000041847b2 clang::CodeGen::CodeGenFunction::EmitFunctionBody(clang::Stmt const*) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x41847b2) llvm#11 0x0000000004199e4a clang::CodeGen::CodeGenFunction::GenerateCode(clang::GlobalDecl, llvm::Function*, clang::CodeGen::CGFunctionInfo const&) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x4199e4a) llvm#12 0x00000000041f7b9d clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition(clang::GlobalDecl, llvm::GlobalValue*) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x41f7b9d) llvm#13 0x00000000041f16a3 clang::CodeGen::CodeGenModule::EmitGlobalDefinition(clang::GlobalDecl, llvm::GlobalValue*) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x41f16a3) llvm#14 0x00000000041fd954 clang::CodeGen::CodeGenModule::EmitDeferred() (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x41fd954) llvm#15 0x0000000004200277 clang::CodeGen::CodeGenModule::Release() (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x4200277) llvm#16 0x00000000046b6a49 (anonymous namespace)::CodeGeneratorImpl::HandleTranslationUnit(clang::ASTContext&) ModuleBuilder.cpp:0:0 llvm#17 0x00000000046b4cb6 clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x46b4cb6) llvm#18 0x0000000006204d5c clang::ParseAST(clang::Sema&, bool, bool) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x6204d5c) llvm#19 0x000000000496b278 clang::FrontendAction::Execute() (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x496b278) llvm#20 0x00000000048dd074 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x48dd074) llvm#21 0x0000000004a38092 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0x4a38092) llvm#22 0x0000000000fd4e9c cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0xfd4e9c) llvm#23 0x0000000000fcca73 ExecuteCC1Tool(llvm::SmallVectorImpl<char const*>&, llvm::ToolContext const&) driver.cpp:0:0 llvm#24 0x0000000000fd140c clang_main(int, char**, llvm::ToolContext const&) (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0xfd140c) llvm#25 0x0000000000ee2ef3 main (/opt/cray/pe/cce/18.0.1/cce-clang/x86_64/bin/clang-18+0xee2ef3) llvm#26 0x00007f8e4a10224c __libc_start_main (/lib64/libc.so.6+0x3524c) llvm#27 0x0000000000fcaae9 _start /home/abuild/rpmbuild/BUILD/glibc-2.31/csu/../sysdeps/x86_64/start.S:120:0 clang: error: unable to execute command: Aborted ``` --------- Co-authored-by: Chandra Ghale <[email protected]>
devajithvs
pushed a commit
that referenced
this pull request
Nov 17, 2024
…tely from Linux (llvm#115722) This test fails on https://lab.llvm.org/staging/#/builders/197/builds/76/steps/18/logs/FAIL__lldb-shell__inline_sites_live_cpp because of a little difference in the lldb output. ``` # .---command stderr------------ # | C:\buildbot\as-builder-10\lldb-x-aarch64\llvm-project\lldb\test\Shell\SymbolFile\NativePDB\inline_sites_live.cpp:25:11: error: CHECK: expected string not found in input # | // CHECK: * thread #1, stop reason = breakpoint 1 # | ^ # | <stdin>:1:1: note: scanning from here # | (lldb) platform select remote-linux # | ^ # | <stdin>:28:27: note: possible intended match here # | * thread #1, name = 'inline_sites_li', stop reason = breakpoint 1.3 # | ^ # | ```
devajithvs
pushed a commit
that referenced
this pull request
Nov 17, 2024
Add patterns to fold MOV (scalar, predicated) to MOV (imm, pred, merging) or MOV (imm, pred, zeroing) as appropriate. This affects the `@llvm.aarch64.sve.dup` intrinsics, which currently generate MOV (scalar, predicated) instructions even when the immediate forms are possible. For example: ``` svuint8_t mov_z_b(svbool_t p) { return svdup_u8_z(p, 1); } ``` Currently generates: ``` mov_z_b(__SVBool_t): mov z0.b, #0 mov w8, #1 mov z0.b, p0/m, w8 ret ``` Instead of: ``` mov_z_b(__SVBool_t): mov z0.b, p0/z, #1 ret ```
devajithvs
pushed a commit
that referenced
this pull request
Feb 6, 2025
Change https://reviews.llvm.org/D140059 exposed the following crash in Z3Solver, where bit widths were not checked consistently with that change. This change makes the check consistent, and fixes the crash. ``` clang: <root>/llvm/include/llvm/ADT/APSInt.h:99: int64_t llvm::APSInt::getExtValue() const: Assertion `isRepresentableByInt64() && "Too many bits for int64_t"' failed. ... Stack dump: 0. Program arguments: clang -cc1 -internal-isystem <root>/lib/clang/16/include -nostdsysteminc -analyze -analyzer-checker=core,unix.Malloc,debug.ExprInspection -analyzer-config crosscheck-with-z3=true -verify reproducer.c #0 0x00000000045b3476 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) <root>/llvm/lib/Support/Unix/Signals.inc:567:22 #1 0x00000000045b3862 PrintStackTraceSignalHandler(void*) <root>/llvm/lib/Support/Unix/Signals.inc:641:1 #2 0x00000000045b14a5 llvm::sys::RunSignalHandlers() <root>/llvm/lib/Support/Signals.cpp:104:20 llvm#3 0x00000000045b2eb4 SignalHandler(int) <root>/llvm/lib/Support/Unix/Signals.inc:412:1 ... llvm#9 0x0000000004be2eb3 llvm::APSInt::getExtValue() const <root>/llvm/include/llvm/ADT/APSInt.h:99:5 <root>/llvm/lib/Support/Z3Solver.cpp:740:53 clang::ASTContext&, clang::ento::SymExpr const*, llvm::APSInt const&, llvm::APSInt const&, bool) <root>/clang/include/clang/StaticAnalyzer/Core/PathSensitive/SMTConv.h:552:61 ``` Reviewed By: steakhal Differential Revision: https://reviews.llvm.org/D142627 (cherry picked from commit f027dd5)
devajithvs
pushed a commit
that referenced
this pull request
Mar 17, 2025
…ible (llvm#123752) This patch adds a new option `-aarch64-enable-zpr-predicate-spills` (which is disabled by default), this option replaces predicate spills with vector spills in streaming[-compatible] functions. For example: ``` str p8, [sp, llvm#7, mul vl] // 2-byte Folded Spill // ... ldr p8, [sp, llvm#7, mul vl] // 2-byte Folded Reload ``` Becomes: ``` mov z0.b, p8/z, #1 str z0, [sp] // 16-byte Folded Spill // ... ldr z0, [sp] // 16-byte Folded Reload ptrue p4.b cmpne p8.b, p4/z, z0.b, #0 ``` This is done to avoid streaming memory hazards between FPR/vector and predicate spills, which currently occupy the same stack area even when the `-aarch64-stack-hazard-size` flag is set. This is implemented with two new pseudos SPILL_PPR_TO_ZPR_SLOT_PSEUDO and FILL_PPR_FROM_ZPR_SLOT_PSEUDO. The expansion of these pseudos handles scavenging the required registers (z0 in the above example) and, in the worst case spilling a register to an emergency stack slot in the expansion. The condition flags are also preserved around the `cmpne` in case they are live at the expansion point.
devajithvs
pushed a commit
that referenced
this pull request
Jun 10, 2025
`clang-repl --cuda` was previously crashing with a segmentation fault, instead of reporting a clean error ``` (base) anutosh491@Anutoshs-MacBook-Air bin % ./clang-repl --cuda #0 0x0000000111da4fbc llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/opt/local/libexec/llvm-20/lib/libLLVM.dylib+0x150fbc) #1 0x0000000111da31dc llvm::sys::RunSignalHandlers() (/opt/local/libexec/llvm-20/lib/libLLVM.dylib+0x14f1dc) #2 0x0000000111da5628 SignalHandler(int) (/opt/local/libexec/llvm-20/lib/libLLVM.dylib+0x151628) llvm#3 0x000000019b242de4 (/usr/lib/system/libsystem_platform.dylib+0x180482de4) llvm#4 0x0000000107f638d0 clang::IncrementalCUDADeviceParser::IncrementalCUDADeviceParser(std::__1::unique_ptr<clang::CompilerInstance, std::__1::default_delete<clang::CompilerInstance>>, clang::CompilerInstance&, llvm::IntrusiveRefCntPtr<llvm::vfs::InMemoryFileSystem>, llvm::Error&, std::__1::list<clang::PartialTranslationUnit, std::__1::allocator<clang::PartialTranslationUnit>> const&) (/opt/local/libexec/llvm-20/lib/libclang-cpp.dylib+0x216b8d0) llvm#5 0x0000000107f638d0 clang::IncrementalCUDADeviceParser::IncrementalCUDADeviceParser(std::__1::unique_ptr<clang::CompilerInstance, std::__1::default_delete<clang::CompilerInstance>>, clang::CompilerInstance&, llvm::IntrusiveRefCntPtr<llvm::vfs::InMemoryFileSystem>, llvm::Error&, std::__1::list<clang::PartialTranslationUnit, std::__1::allocator<clang::PartialTranslationUnit>> const&) (/opt/local/libexec/llvm-20/lib/libclang-cpp.dylib+0x216b8d0) llvm#6 0x0000000107f6bac8 clang::Interpreter::createWithCUDA(std::__1::unique_ptr<clang::CompilerInstance, std::__1::default_delete<clang::CompilerInstance>>, std::__1::unique_ptr<clang::CompilerInstance, std::__1::default_delete<clang::CompilerInstance>>) (/opt/local/libexec/llvm-20/lib/libclang-cpp.dylib+0x2173ac8) llvm#7 0x000000010206f8a8 main (/opt/local/libexec/llvm-20/bin/clang-repl+0x1000038a8) llvm#8 0x000000019ae8c274 Segmentation fault: 11 ``` The underlying issue was that the `DeviceCompilerInstance` (used for device-side CUDA compilation) was never initialized with a `Sema`, which is required before constructing the `IncrementalCUDADeviceParser`. https://github.com/llvm/llvm-project/blob/89687e6f383b742a3c6542dc673a84d9f82d02de/clang/lib/Interpreter/DeviceOffload.cpp#L32 https://github.com/llvm/llvm-project/blob/89687e6f383b742a3c6542dc673a84d9f82d02de/clang/lib/Interpreter/IncrementalParser.cpp#L31 Unlike the host-side `CompilerInstance` which runs `ExecuteAction` inside the Interpreter constructor (thereby setting up Sema), the device-side CI was passed into the parser uninitialized, leading to an assertion or crash when accessing its internals. To fix this, I refactored the `Interpreter::create` method to include an optional `DeviceCI` parameter. If provided, we know we need to take care of this instance too. Only then do we construct the `IncrementalCUDADeviceParser`. (cherry picked from commit 21fb19f)
devajithvs
pushed a commit
that referenced
this pull request
Jun 10, 2025
llvm#138091) Check this error for more context (https://github.com/compiler-research/CppInterOp/actions/runs/14749797085/job/41407625681?pr=491#step:10:531) This fails with ``` * thread #1, name = 'CppInterOpTests', stop reason = signal SIGSEGV: address not mapped to object (fault address: 0x55500356d6d3) * frame #0: 0x00007fffee41cfe3 libclangCppInterOp.so.21.0gitclang::PragmaNamespace::~PragmaNamespace() + 99 frame #1: 0x00007fffee435666 libclangCppInterOp.so.21.0gitclang::Preprocessor::~Preprocessor() + 3830 frame #2: 0x00007fffee20917a libclangCppInterOp.so.21.0gitstd::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 58 frame llvm#3: 0x00007fffee224796 libclangCppInterOp.so.21.0gitclang::CompilerInstance::~CompilerInstance() + 838 frame llvm#4: 0x00007fffee22494d libclangCppInterOp.so.21.0gitclang::CompilerInstance::~CompilerInstance() + 13 frame llvm#5: 0x00007fffed95ec62 libclangCppInterOp.so.21.0gitclang::IncrementalCUDADeviceParser::~IncrementalCUDADeviceParser() + 98 frame llvm#6: 0x00007fffed9551b6 libclangCppInterOp.so.21.0gitclang::Interpreter::~Interpreter() + 102 frame llvm#7: 0x00007fffed95598d libclangCppInterOp.so.21.0gitclang::Interpreter::~Interpreter() + 13 frame llvm#8: 0x00007fffed9181e7 libclangCppInterOp.so.21.0gitcompat::createClangInterpreter(std::vector<char const*, std::allocator<char const*>>&) + 2919 ``` Problem : 1) The destructor currently handles no clearance for the DeviceParser and the DeviceAct. We currently only have this https://github.com/llvm/llvm-project/blob/976493822443c52a71ed3c67aaca9a555b20c55d/clang/lib/Interpreter/Interpreter.cpp#L416-L419 2) The ownership for DeviceCI currently is present in IncrementalCudaDeviceParser. But this should be similar to how the combination for hostCI, hostAction and hostParser are managed by the Interpreter. As on master the DeviceAct and DeviceParser are managed by the Interpreter but not DeviceCI. This is problematic because : IncrementalParser holds a Sema& which points into the DeviceCI. On master, DeviceCI is destroyed before the base class ~IncrementalParser() runs, causing Parser::reset() to access a dangling Sema (and as Sema holds a reference to Preprocessor which owns PragmaNamespace) we see this ``` * frame #0: 0x00007fffee41cfe3 libclangCppInterOp.so.21.0gitclang::PragmaNamespace::~PragmaNamespace() + 99 frame #1: 0x00007fffee435666 libclangCppInterOp.so.21.0gitclang::Preprocessor::~Preprocessor() + 3830 ``` (cherry picked from commit 529b6fc)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This tool is for interactive exploration of the mlir using existing matchers. It currently allows the user to enter a matcher at an interactive prompt and view the matches.
More features will be added in this MR. This MR will act as the history for the development of mlir-query