Skip to content

Add clang-linker-wrapper changes to call clang-sycl-linker for SYCL offloads #6

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

asudarsa
Copy link
Owner

@asudarsa asudarsa commented Apr 4, 2025

Device code linking happens inside clang-linker-wrapper. In the current implementation, clang-linker-wrapper does the following:

  1. Extracts device code. Input_1, Input_2,.....
  2. Group device code according to target devices
    Inputs[triple_1] = ....
    Inputs[triple_2] = ....
  3. For each group, i.e. Inputs[triple_i],
    a. Gather all the offload kinds found inside those inputs in ActiveOffloadKinds
    b. Link all images inside Inputs[triple_i] by calling clang --target=triple_i ....
    c. Create a copy of that linked image for each offload kind and add it to Output[Kind] list.

In SYCL compilation flow, there is a deviation in Step 3b. We call device code splitting inside the 'clang --target=triple_i ....' call and the output is now a 'packaged' file containing multiple device images. This deviation requires us to capture the OffloadKind during the linking stage and pass it along to the linking function (clang), so that clang can be called with a unique option '--sycl-link' that will help us to call 'clang-sycl-linker' under the hood (clang-sycl-linker will do SYCL specific linking).

Our current objective is to implement an end-to-end SYCL offloading flow and get it working. We will eventually merge our approach with the community flow.

Thanks

asudarsa pushed a commit that referenced this pull request Apr 10, 2025
…ctor-bits=128." (llvm#134997)

Reverts llvm#134068

Caused a stage 2 build failure:
https://lab.llvm.org/buildbot/#/builders/41/builds/6016

```
FAILED: lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o 
/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/lib/Support -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/lib/Support -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/include -mcpu=neoverse-512tvb -mllvm -scalable-vectorization=preferred -mllvm -treat-scalable-fixed-error-as-warning=false -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Werror=global-constructors -O3 -DNDEBUG -std=c++17 -UNDEBUG  -fno-exceptions -funwind-tables -fno-rtti -MD -MT lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o -MF lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o.d -o lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o -c /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/lib/Support/Caching.cpp
Opcode has unknown scale!
UNREACHABLE executed at ../llvm/llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:4530!
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
0.	Program arguments: /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GLIBCXX_ASSERTIONS -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/lib/Support -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/lib/Support -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage2/include -I/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/include -mcpu=neoverse-512tvb -mllvm -scalable-vectorization=preferred -mllvm -treat-scalable-fixed-error-as-warning=false -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wno-comment -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Werror=global-constructors -O3 -DNDEBUG -std=c++17 -UNDEBUG -fno-exceptions -funwind-tables -fno-rtti -MD -MT lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o -MF lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o.d -o lib/Support/CMakeFiles/LLVMSupport.dir/Caching.cpp.o -c /home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/lib/Support/Caching.cpp
1.	<eof> parser at end of file
2.	Code generation
3.	Running pass 'Function Pass Manager' on module '/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/llvm/llvm/lib/Support/Caching.cpp'.
4.	Running pass 'AArch64 load / store optimization pass' on function '@"_ZNSt17_Function_handlerIFN4llvm8ExpectedISt8functionIFNS1_ISt10unique_ptrINS0_16CachedFileStreamESt14default_deleteIS4_EEEEjRKNS0_5TwineEEEEEjNS0_9StringRefESB_EZNS0_10localCacheESB_SB_SB_S2_IFvjSB_S3_INS0_12MemoryBufferES5_ISH_EEEEE3$_0E9_M_invokeERKSt9_Any_dataOjOSF_SB_"'
 #0 0x0000b6eae9b67bf0 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang+++0x81c7bf0)
 #1 0x0000b6eae9b65aec llvm::sys::RunSignalHandlers() (/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang+++0x81c5aec)
 #2 0x0000b6eae9acd5f4 CrashRecoverySignalHandler(int) CrashRecoveryContext.cpp:0:0
 #3 0x0000f16c1aff28f8 (linux-vdso.so.1+0x8f8)
 #4 0x0000f16c1aacf1f0 __pthread_kill_implementation ./nptl/pthread_kill.c:44:76
 #5 0x0000f16c1aa8a67c gsignal ./signal/../sysdeps/posix/raise.c:27:6
 #6 0x0000f16c1aa77130 abort ./stdlib/abort.c:81:7
 llvm#7 0x0000b6eae9ad6628 (/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang+++0x8136628)
 llvm#8 0x0000b6eae72e95a8 (/home/tcwg-buildbot/worker/clang-aarch64-sve-vla-2stage/stage1.install/bin/clang+++0x59495a8)
 llvm#9 0x0000b6eae74ca9a8 (anonymous namespace)::AArch64LoadStoreOpt::findMatchingInsn(llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>, (anonymous namespace)::LdStPairFlags&, unsigned int, bool) AArch64LoadStoreOptimizer.cpp:0:0
llvm#10 0x0000b6eae74c85a8 (anonymous namespace)::AArch64LoadStoreOpt::tryToPairLdStInst(llvm::MachineInstrBundleIterator<llvm::MachineInstr, false>&) AArch64LoadStoreOptimizer.cpp:0:0
llvm#11 0x0000b6eae74c624c (anonymous namespace)::AArch64LoadStoreOpt::optimizeBlock(llvm::MachineBasicBlock&, bool) AArch64LoadStoreOptimizer.cpp:0:0
llvm#12 0x0000b6eae74c429c (anonymous namespace)::AArch64LoadStoreOpt::runOnMachineFunction(llvm::MachineFunction&) AArch64LoadStoreOptimizer.cpp:0:0
```
Signed-off-by: Arvind Sudarsanam <[email protected]>
@@ -349,13 +351,55 @@ Error runSYCLLink(ArrayRef<std::string> Files, const ArgList &Args) {
if (!LinkedFile)
reportError(LinkedFile.takeError());

// TODO: SYCL post link functionality involves device code splitting and will
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code splitting PR is under review here: llvm#131347

Thanks

Signed-off-by: Arvind Sudarsanam <[email protected]>
@@ -567,7 +575,8 @@ Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args) {
} // namespace generic

Expected<StringRef> linkDevice(ArrayRef<StringRef> InputFiles,
const ArgList &Args) {
const ArgList &Args,
bool HasSYCLOffloadKind = false) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any way we can avoid this? omp/hip/cuda seem to work without special handing here, and it would be idea if SYCL did as well

Copy link
Owner Author

@asudarsa asudarsa Apr 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it exists today, we need to call clang tool with --sycl-link option for sycl offloads (in order to call clang-sycl-linker that includes device code splitting and other SYCL specific stuff). omp/cuda/hip do not have this requirement. They do not require any special handling (or require any special options to be passed to clang) during the linking stage.

clang-linker-wrapper is designed in such a way that the offload kind does not matter during the device code linking stage. Only the targets matter. Unfortunately, for SYCL offload, it is not clear how we can get rid of 'special handling' during the device code linking stage.

We can eventually try to align SYCL device code linking flow with community flow. But we need to have a SYCL device code linking in the first place for that.

Please let me know if I am missing something here.

Thanks
P.S: I have tried to keep the deviation 'as clean as possible' here. Any tips to make this better will be great.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed offline, makes sense for now

Copy link
Owner Author

@asudarsa asudarsa Apr 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sarnex, Added a detailed PR description as we discussed offline.

Thanks

@maksimsab
Copy link
Collaborator

Where a call to clang-sycl-linker happen?

@asudarsa
Copy link
Owner Author

Where a call to clang-sycl-linker happen?

When we call 'clang' with the --sycl-link option and target SPIR-V, SPIR-V toolchain will call clang-sycl-linker.

https://github.com/llvm/llvm-project/blob/092b6e73e651469527662443b592f98f442ece72/clang/lib/Driver/ToolChains/SPIRV.cpp#L133

Thanks

Signed-off-by: Arvind Sudarsanam <[email protected]>
@asudarsa asudarsa requested review from sarnex and mdtoguchi April 14, 2025 19:18
Copy link

@sarnex sarnex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits plus one comment about string manipulation, thanks!

@@ -554,6 +555,17 @@ Expected<StringRef> clang(ArrayRef<StringRef> InputFiles, const ArgList &Args) {
if (Args.hasArg(OPT_embed_bitcode))
CmdArgs.push_back("-Wl,--lto-emit-llvm");

// For linking device codes with SYCL offload kind, special handling is
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// For linking device codes with SYCL offload kind, special handling is
// For linking device code with the SYCL offload kind, special handling is

if (!SPVFile)
return SPVFile.takeError();
for (size_t I = 0, E = SplitModules.size(); I != E; ++I) {
auto Stem = OutputFile.split('.').first;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think we actually want rsplit here, because if the file name is like name.with.dots.spv we want to modify the dots. part

int FD = -1;
if (std::error_code EC = sys::fs::openFileForWrite(OutputFile, FD))
return errorCodeToError(EC);
llvm::raw_fd_ostream FS(FD, true);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comment on what true is here would help readability

return Error::success();
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove whitespace changes

Signed-off-by: Arvind Sudarsanam <[email protected]>
}
}

if (!HasNonSYCLOffloadKind)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went back and forth on if this is necessary or if we can just check the ActiveOffloadKinds for !OFK_SYCL

asudarsa pushed a commit that referenced this pull request Apr 16, 2025
…reporting (llvm#131756)

### Description  
This PR resolves a deadlock between AddressSanitizer (ASan) and
LeakSanitizer (LSan)
that occurs when both sanitizers attempt to acquire locks in conflicting
orders across
threads. The fix ensures safe lock acquisition ordering by preloading
module information
before error reporting.  

---  

### Issue Details  
**Reproducer**
```cpp  
// Thread 1: ASan error path 
int arr[1] = {0};
std::thread t([&]() {  
  arr[1] = 1; // Triggers ASan OOB error  
});  

// Thread 2: LSan check path  
__lsan_do_leak_check();   
```  

**Lock Order Conflict**:  

- Thread 1 (ASan error reporting):  
  1. Acquires ASan thread registry lock (B)  
  1. Attempts to acquire libdl lock (A) via `dl_iterate_phdr`

- Thread 2 (LSan leak check):  
  1. Acquires libdl lock (A) via `dl_iterate_phdr`
  1. Attempts to acquire ASan thread registry lock (B)  
  

This creates a circular wait condition (A -> B -> A) meeting all four
Coffman deadlock criteria.

---  

### Fix Strategy  
The root cause lies in ASan's error reporting path needing
`dl_iterate_phdr` (requiring lock A)
while already holding its thread registry lock (B). The solution:  

1. **Preload Modules Early**: Force module list initialization _before_
acquiring ASan's thread lock
2. **Avoid Nested Locking**: Ensure symbolization (via dl_iterate_phdr)
completes before error reporting locks

Key code change:  
```cpp  
// Before acquiring ASan's thread registry lock:  
Symbolizer::GetOrInit()->GetRefreshedListOfModules();  
```  

This guarantees module information is cached before lock acquisition,
eliminating
the need for `dl_iterate_phdr` calls during error reporting.  

---  

### Testing  
Added **asan_lsan_deadlock.cpp** test case:  
- Reproduces deadlock reliably without fix **under idle system
conditions**
   - Uses watchdog thread to detect hangs  
   - Verifies ASan error reports correctly without deadlock  

**Note**: Due to the inherent non-determinism of thread scheduling and
lock acquisition timing,
this test may not reliably reproduce the deadlock on busy systems (e.g.,
during parallel
   `ninja check-asan` runs).



---  

### Impact  
- Fixes rare but severe deadlocks in mixed ASan+LSan environments  
- Maintains thread safety guarantees for both sanitizers  
- No user-visible behavior changes except deadlock elimination  

---

### Relevant Buggy Code
- Code in ASan's asan_report.cpp
```cpp
  explicit ScopedInErrorReport(bool fatal = false)
      : halt_on_error_(fatal || flags()->halt_on_error) {
    // Acquire lock B
    asanThreadRegistry().Lock();
  }
  ~ScopedInErrorReport() {
    ...
    // Try to acquire lock A under holding lock B via the following path
    // #4  0x000071a353d83e93 in __GI___dl_iterate_phdr (
    //     callback=0x5d1a07a39580 <__sanitizer::dl_iterate_phdr_cb(dl_phdr_info*, unsigned long, void*)>, 
    //     data=0x6da3510fd3f0) at ./elf/dl-iteratephdr.c:39
    // #5  0x00005d1a07a39574 in __sanitizer::ListOfModules::init (this=0x71a353ebc080)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_linux_libcdep.cpp:784
    // #6  0x00005d1a07a429e3 in __sanitizer::Symbolizer::RefreshModules (this=0x71a353ebc058)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp:188
    // llvm#7  __sanitizer::Symbolizer::FindModuleForAddress (this=this@entry=0x71a353ebc058, 
    //     address=address@entry=102366378805727)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp:214
    // llvm#8  0x00005d1a07a4291b in __sanitizer::Symbolizer::SymbolizePC (this=0x71a353ebc058, addr=102366378805727)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_symbolizer_libcdep.cpp:88
    // llvm#9  0x00005d1a07a40df7 in __sanitizer::(anonymous namespace)::StackTraceTextPrinter::ProcessAddressFrames (
    //     this=this@entry=0x6da3510fd520, pc=102366378805727)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cpp:37
    // llvm#10 0x00005d1a07a40d27 in __sanitizer::StackTrace::PrintTo (this=this@entry=0x6da3510fd5e8, 
    //     output=output@entry=0x6da3510fd588)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cpp:110
    // llvm#11 0x00005d1a07a410a1 in __sanitizer::StackTrace::Print (this=0x6da3510fd5e8)
    //     at llvm-project/compiler-rt/lib/sanitizer_common/sanitizer_stacktrace_libcdep.cpp:133
    // llvm#12 0x00005d1a0798758d in __asan::ErrorGeneric::Print (
    //     this=0x5d1a07aa4e08 <__asan::ScopedInErrorReport::current_error_+8>)
    //     at llvm-project/compiler-rt/lib/asan/asan_errors.cpp:617    
    current_error_.Print();
    ... 
  }
```

- Code in LSan's lsan_common_linux.cpp
```cpp
void LockStuffAndStopTheWorld(StopTheWorldCallback callback,
                              CheckForLeaksParam *argument) {
  // Acquire lock A
  dl_iterate_phdr(LockStuffAndStopTheWorldCallback, &param);
}

static int LockStuffAndStopTheWorldCallback(struct dl_phdr_info *info,
                                            size_t size, void *data) {
  // Try to acquire lock B under holding lock A via the following path
  // #3  0x000055555562b34a in __sanitizer::ThreadRegistry::Lock (this=<optimized out>)
  //     at llvm-project/compiler-rt/lib/asan/../sanitizer_common/sanitizer_thread_registry.h:99
  // #4  __lsan::LockThreads () at llvm-project/compiler-rt/lib/asan/asan_thread.cpp:484
  // #5  0x0000555555652629 in __lsan::ScopedStopTheWorldLock::ScopedStopTheWorldLock (this=<optimized out>)
  //     at llvm-project/compiler-rt/lib/lsan/lsan_common.h:164
  // #6  __lsan::LockStuffAndStopTheWorldCallback (info=<optimized out>, size=<optimized out>, data=0x0, 
  //     data@entry=0x7fffffffd158) at llvm-project/compiler-rt/lib/lsan/lsan_common_linux.cpp:120
  ScopedStopTheWorldLock lock;
  DoStopTheWorldParam *param = reinterpret_cast<DoStopTheWorldParam *>(data);
  StopTheWorld(param->callback, param->argument);
  return 1;
}
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants