-
Notifications
You must be signed in to change notification settings - Fork 13.6k
Introduce -defer-thinlto-prelink-coro-split that skips Coro passes in ThinLTO pre-link pipeline #107153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…TO pre-link pipeline
@llvm/pr-subscribers-coroutines @llvm/pr-subscribers-clang Author: Wei Wang (apolloww) Changes#100205 skips CoroSplit and CoroCleanup in ThinLTO pre-link pipeline unconditionally. It defers coroutine optimizations to ThinLTO post-link so that across-module CoroElide is possible by leaving coroutine functions in pre-split state after pre-link. However, this makes the bitcode module unconsumable directly by codegen passes because of the coroutine intrinsics. With -defer-thinlto-prelink-coro-split, the deferral action is made explicit and conditional. Full diff: https://github.com/llvm/llvm-project/pull/107153.diff 5 Files Affected:
diff --git a/clang/test/CodeGenCoroutines/coro-elide-thinlto.cpp b/clang/test/CodeGenCoroutines/coro-elide-thinlto.cpp
index 54063cf0704aad..4e328c53b17fcc 100644
--- a/clang/test/CodeGenCoroutines/coro-elide-thinlto.cpp
+++ b/clang/test/CodeGenCoroutines/coro-elide-thinlto.cpp
@@ -1,10 +1,10 @@
// REQUIRES: x86_64-linux
-// This tests that the coroutine elide optimization could happen succesfully with ThinLTO.
+// This tests that the coroutine elide optimization could happen succesfully with ThinLTO when coro-split is deferred to ThinLTO post-link.
// This test is adapted from coro-elide.cpp and splits functions into two files.
//
// RUN: split-file %s %t
-// RUN: %clang --target=x86_64-linux -std=c++20 -O2 -flto=thin -I %S -c %t/coro-elide-callee.cpp -o %t/coro-elide-callee.bc
-// RUN: %clang --target=x86_64-linux -std=c++20 -O2 -flto=thin -I %S -c %t/coro-elide-caller.cpp -o %t/coro-elide-caller.bc
+// RUN: %clang --target=x86_64-linux -std=c++20 -O2 -flto=thin -mllvm -defer-thinlto-prelink-coro-split -I %S -c %t/coro-elide-callee.cpp -o %t/coro-elide-callee.bc
+// RUN: %clang --target=x86_64-linux -std=c++20 -O2 -flto=thin -mllvm -defer-thinlto-prelink-coro-split -I %S -c %t/coro-elide-caller.cpp -o %t/coro-elide-caller.bc
// RUN: llvm-lto --thinlto %t/coro-elide-callee.bc %t/coro-elide-caller.bc -o %t/summary
// RUN: %clang_cc1 -O2 -x ir %t/coro-elide-caller.bc -fthinlto-index=%t/summary.thinlto.bc -emit-llvm -o - | FileCheck %s
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 9c3d49cabbd38c..58b33a3f1a0d03 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -305,6 +305,12 @@ static cl::opt<bool> UseLoopVersioningLICM(
"enable-loop-versioning-licm", cl::init(false), cl::Hidden,
cl::desc("Enable the experimental Loop Versioning LICM pass"));
+static cl::opt<bool> DeferThinLTOPreLinkCoroSplit(
+ "defer-thinlto-prelink-coro-split", cl::init(false), cl::Hidden,
+ cl::desc("Do not run CoroSplit pass in ThinLTO pre-link pipeline. "
+ "This allows coroutine optimizations to be run in the ThinLTO "
+ "post-link pipeline."));
+
extern cl::opt<std::string> UseCtxProfile;
namespace llvm {
@@ -984,7 +990,8 @@ PassBuilder::buildInlinerPipeline(OptimizationLevel Level,
MainCGPipeline.addPass(createCGSCCToFunctionPassAdaptor(
RequireAnalysisPass<ShouldNotRunFunctionPassesAnalysis, Function>()));
- if (Phase != ThinOrFullLTOPhase::ThinLTOPreLink)
+ if (Phase != ThinOrFullLTOPhase::ThinLTOPreLink ||
+ !DeferThinLTOPreLinkCoroSplit)
MainCGPipeline.addPass(CoroSplitPass(Level != OptimizationLevel::O0));
// Make sure we don't affect potential future NoRerun CGSCC adaptors.
@@ -1027,7 +1034,8 @@ PassBuilder::buildModuleInlinerPipeline(OptimizationLevel Level,
buildFunctionSimplificationPipeline(Level, Phase),
PTO.EagerlyInvalidateAnalyses));
- if (Phase != ThinOrFullLTOPhase::ThinLTOPreLink)
+ if (Phase != ThinOrFullLTOPhase::ThinLTOPreLink ||
+ !DeferThinLTOPreLinkCoroSplit)
MPM.addPass(createModuleToPostOrderCGSCCPassAdaptor(
CoroSplitPass(Level != OptimizationLevel::O0)));
@@ -1236,7 +1244,8 @@ PassBuilder::buildModuleSimplificationPipeline(OptimizationLevel Level,
// and argument promotion.
MPM.addPass(DeadArgumentEliminationPass());
- if (Phase != ThinOrFullLTOPhase::ThinLTOPreLink)
+ if (Phase != ThinOrFullLTOPhase::ThinLTOPreLink ||
+ !DeferThinLTOPreLinkCoroSplit)
MPM.addPass(CoroCleanupPass());
// Optimize globals now that functions are fully simplified.
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
index ab04f80abc5722..b950a99ce0dd32 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-defaults.ll
@@ -10,52 +10,55 @@
; Prelink pipelines:
; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager \
; RUN: -passes='thinlto-pre-link<O1>' -S %s 2>&1 \
-; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O1,CHECK-O-NODIS
+; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O1,CHECK-O-NODIS,CHECK-O-CORO
; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager \
; RUN: -unified-lto -passes='lto-pre-link<O1>' -S %s 2>&1 \
-; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O1,CHECK-O-NODIS
+; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O1,CHECK-O-NODIS,CHECK-O-CORO
; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager \
; RUN: -passes='thinlto-pre-link<O2>' -S %s 2>&1 \
-; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O2,CHECK-O23SZ,CHECK-O-NODIS
+; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O2,CHECK-O23SZ,CHECK-O-NODIS,CHECK-O-CORO
; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager \
; RUN: -unified-lto -passes='lto-pre-link<O2>' -S %s 2>&1 \
-; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O2,CHECK-O23SZ,CHECK-O-NODIS
+; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O2,CHECK-O23SZ,CHECK-O-NODIS,CHECK-O-CORO
; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager \
; RUN: -passes='thinlto-pre-link<O3>' -S -passes-ep-pipeline-start='no-op-module' %s 2>&1 \
-; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O3,CHECK-O23SZ,CHECK-O-NODIS,CHECK-EP-PIPELINE-START
+; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O3,CHECK-O23SZ,CHECK-O-NODIS,CHECK-EP-PIPELINE-START,CHECK-O-CORO
; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager \
; RUN: -unified-lto -passes='lto-pre-link<O3>' -S -passes-ep-pipeline-start='no-op-module' %s 2>&1 \
-; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O3,CHECK-O23SZ,CHECK-O-NODIS,CHECK-EP-PIPELINE-START
+; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O3,CHECK-O23SZ,CHECK-O-NODIS,CHECK-EP-PIPELINE-START,CHECK-O-CORO
; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager \
; RUN: -passes='thinlto-pre-link<O3>' -S -passes-ep-optimizer-early='no-op-module' %s 2>&1 \
-; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O3,CHECK-O23SZ,CHECK-O-NODIS,CHECK-EP-OPT-EARLY
+; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O3,CHECK-O23SZ,CHECK-O-NODIS,CHECK-EP-OPT-EARLY,CHECK-O-CORO
; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager \
; RUN: -unified-lto -passes='lto-pre-link<O3>' -S -passes-ep-optimizer-early='no-op-module' %s 2>&1 \
-; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O3,CHECK-O23SZ,CHECK-O-NODIS,CHECK-EP-OPT-EARLY
+; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O3,CHECK-O23SZ,CHECK-O-NODIS,CHECK-EP-OPT-EARLY,CHECK-O-CORO
; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager \
; RUN: -passes='thinlto-pre-link<O3>' -S -passes-ep-optimizer-last='no-op-module' %s 2>&1 \
-; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O3,CHECK-O23SZ,CHECK-O-NODIS,CHECK-EP-OPT-LAST
+; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O3,CHECK-O23SZ,CHECK-O-NODIS,CHECK-EP-OPT-LAST,CHECK-O-CORO
; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager \
; RUN: -unified-lto -passes='lto-pre-link<O3>' -S -passes-ep-optimizer-last='no-op-module' %s 2>&1 \
-; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O3,CHECK-O23SZ,CHECK-O-NODIS,CHECK-EP-OPT-LAST
+; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O3,CHECK-O23SZ,CHECK-O-NODIS,CHECK-EP-OPT-LAST,CHECK-O-CORO
; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager \
; RUN: -passes='thinlto-pre-link<Os>' -S %s 2>&1 \
-; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O23SZ,CHECK-O-NODIS
+; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O23SZ,CHECK-O-NODIS,CHECK-O-CORO
; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager \
; RUN: -unified-lto -passes='lto-pre-link<Os>' -S %s 2>&1 \
-; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O23SZ,CHECK-O-NODIS
+; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O23SZ,CHECK-O-NODIS,CHECK-O-CORO
; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager \
; RUN: -passes='thinlto-pre-link<Oz>' -S %s 2>&1 \
-; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O23SZ,CHECK-O-NODIS
+; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O23SZ,CHECK-O-NODIS,CHECK-O-CORO
; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager \
; RUN: -unified-lto -passes='lto-pre-link<Oz>' -S %s 2>&1 \
-; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O23SZ,CHECK-O-NODIS
+; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O23SZ,CHECK-O-NODIS,CHECK-O-CORO
; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager -debug-info-for-profiling \
; RUN: -passes='thinlto-pre-link<O2>' -S %s 2>&1 \
-; RUN: | FileCheck %s --check-prefixes=CHECK-DIS,CHECK-O,CHECK-O2,CHECK-O23SZ
+; RUN: | FileCheck %s --check-prefixes=CHECK-DIS,CHECK-O,CHECK-O2,CHECK-O23SZ,CHECK-O-CORO
; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager -debug-info-for-profiling \
; RUN: -unified-lto -passes='lto-pre-link<O2>' -S %s 2>&1 \
-; RUN: | FileCheck %s --check-prefixes=CHECK-DIS,CHECK-O,CHECK-O2,CHECK-O23SZ
+; RUN: | FileCheck %s --check-prefixes=CHECK-DIS,CHECK-O,CHECK-O2,CHECK-O23SZ,CHECK-O-CORO
+; RUN: opt -disable-verify -verify-analysis-invalidation=0 -eagerly-invalidate-analyses=0 -debug-pass-manager \
+; RUN: -passes='thinlto-pre-link<O1>' -S -defer-thinlto-prelink-coro-split %s 2>&1 \
+; RUN: | FileCheck %s --check-prefixes=CHECK-O,CHECK-O1,CHECK-O-NODIS
;
; Suppress FileCheck --allow-unused-prefixes=false diagnostics.
@@ -184,10 +187,12 @@
; CHECK-O-NEXT: Running pass: PostOrderFunctionAttrsPass
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
; CHECK-O-NEXT: Running analysis: ShouldNotRunFunctionPassesAnalysis
+; CHECK-O-CORO: Running pass: CoroSplitPass
; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
; CHECK-O-NEXT: Invalidating analysis: ShouldNotRunFunctionPassesAnalysis
; CHECK-O-NEXT: Invalidating analysis: InlineAdvisorAnalysis
; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
+; CHECK-O-CORO-NEXT: Running pass: CoroCleanupPass
; CHECK-O-NEXT: Running pass: GlobalOptPass
; CHECK-O-NEXT: Running pass: GlobalDCEPass
; CHECK-EXT: Running pass: {{.*}}::Bye
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
index cb49cbd22d60c0..e74f88c1a3bf99 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-pgo-defaults.ll
@@ -183,10 +183,12 @@
; CHECK-O-NEXT: Running pass: PostOrderFunctionAttrsPass
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
; CHECK-O-NEXT: Running analysis: ShouldNotRunFunctionPassesAnalysis
+; CHECK-O-NEXT: Running pass: CoroSplitPass
; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
; CHECK-O-NEXT: Invalidating analysis: ShouldNotRunFunctionPassesAnalysis
; CHECK-O-NEXT: Invalidating analysis: InlineAdvisorAnalysis
; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
+; CHECK-O-NEXT: Running pass: CoroCleanupPass
; CHECK-O-NEXT: Running pass: GlobalOptPass
; CHECK-O-NEXT: Running analysis: TargetLibraryAnalysis on bar
; CHECK-O-NEXT: Running pass: GlobalDCEPass
diff --git a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
index 96e83493504426..0bb26330d000a0 100644
--- a/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
+++ b/llvm/test/Other/new-pm-thinlto-prelink-samplepgo-defaults.ll
@@ -148,10 +148,12 @@
; CHECK-O-NEXT: Running pass: PostOrderFunctionAttrsPass
; CHECK-O-NEXT: Running pass: RequireAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
; CHECK-O-NEXT: Running analysis: ShouldNotRunFunctionPassesAnalysis
+; CHECK-O-NEXT: Running pass: CoroSplitPass
; CHECK-O-NEXT: Running pass: InvalidateAnalysisPass<{{.*}}ShouldNotRunFunctionPassesAnalysis
; CHECK-O-NEXT: Invalidating analysis: ShouldNotRunFunctionPassesAnalysis
; CHECK-O-NEXT: Invalidating analysis: InlineAdvisorAnalysis
; CHECK-O-NEXT: Running pass: DeadArgumentEliminationPass
+; CHECK-O-NEXT: Running pass: CoroCleanupPass
; CHECK-O-NEXT: Running pass: GlobalOptPass
; CHECK-O-NEXT: Running pass: GlobalDCEPass
; CHECK-O-NEXT: Running pass: AnnotationRemarksPass on foo
|
What's the scenario in which we do ThinLTO prelink and then go to codegen (i.e. skip thinlto post-link)? |
In #104525, one use case is that pre-link output was used directly by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know much about the Coro handling, but this not only makes it conditional but changes the default - what should the default behavior be?
Default behavior: Coro passes were always run in pre-link pipeline. Since all the coro intrinsics are lowered, the output bitcode object file can be fed into codegen directly or optimized by custom pipeline that does not have coro passes. With #100205, they are postponed to post-link pipeline in order to take advantage of cross-module inlining for better CoroElide. However this creates a discrepancy from the default behavior: user now must run both pre-link and post-link pipelines (or custom pipeline with coro passes) before running codegen. So this change tries to move back to the default behavior unless user explicitly wants coro pass to happen in post-link and their job to make sure the build process is set up properly. |
My 2cts: I think it would be best for clang-users, if HALO / coroutine optimizations work out-of-the-box in as many cases as possible. Unfortunately, it seems we have an "either-or" decision here. Either, the "thin-lto + HALO" work out-of-the-box or the llc-code-generation works out-of-the-box. Afaict, thin-lto is the more common use case compared to calling llc directly. At least there seems to be documentation for thin-lto but not for the llc use case? As a normal C++ programmer and clang user, I doubt that I would ever find the All of this is to say: I think the default should be flipped. By default, coro-split should be deferred to post-link (such that the more common thin-lto setup can apply HALO out-of-the-box). More advanced users, who call llc directly, can still use a flag to defer coro-split. In addition, some documentation would be great (e.g. mentioning the newly introduced flag in https://clang.llvm.org/docs/ThinLTO.html and https://llvm.org/docs/Coroutines.html |
Although... Is this even an either-or decision? Could we add coro-split to the llc unconditionally, also for non-thin-lto pipelines? Afaict, coro-split is a no-op pass if coroutines were already split previously. Hence, codegen would simply work, even if a thin-lto-prelink pipeline is combined with a non-thin-lto post-link step. Doing so, I think we wouldn't need to introduce a flag at all? |
#104525 is still crashing, what we can do about that? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#100205 skips CoroSplit and CoroCleanup in ThinLTO pre-link pipeline unconditionally. It defers coroutine optimizations to ThinLTO post-link so that across-module CoroElide is possible by leaving coroutine functions in pre-split state after pre-link. However, this makes the bitcode module unconsumable directly by codegen passes because of the coroutine intrinsics.
With -defer-thinlto-prelink-coro-split, the deferral action is made explicit and conditional.