-
Notifications
You must be signed in to change notification settings - Fork 3
Remove rank 0 affine.parallel when canonicalizing #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -2614,6 +2614,33 @@ static LogicalResult verify(AffineVectorStoreOp op) { | |||
return success(); | |||
} | |||
|
|||
namespace { | |||
/// This pattern removes affine.parallel ops with no induction variables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, the problem with this is how do we deal with kernel outlining? Wouldn't we want to not remove affine.parallel
with no induction variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbruestle what do you think? One option I suppose is to not canonicalize if we want that use case -- although deliberately avoiding canonicalization seems a bit sketchy and also perhaps we'll need to canonicalize for other reasons? Another possibility that occurs to me is some other wrapper to represent kernel outlining. I don't fully understand the requirements and goals of kernel outlining though.
We can always just use the code in plaidml/plaidml#1054, which works just fine. This is here because having a no-induction-variables affine.parallel
feels non-canonical and like something that should be simplified away (and doing so would address the same problems as that PR). But if we have need for a rank 0 affine.parallel
then maybe we shouldn't be removing it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's true that while semantically, unwinding an empty affine.parallel
is equivalent, I wonder if we should be considering the structure of affine.parallel
(like how many nests there are) as some other bit of information for other passes to use. For instance I can imagine that some pass might want to think of the outermost affine.parallel
as the loop over threads, and then another affine.parallel
to loop over something else. Even if the iteration domain is 1, which can either be represented as affine.parallel () = () to ()
or affine.parallel (i) = (0) to (0)
, they both imply some structure perhaps as just a way to group a collection of ops together.
Then again, maybe it's not a good idea to use affine.parallel
like that (as a means of grouping ops). I've often thought maybe we should have a kernel
op to represent this grouping.
Basically, in stripe, we had assumed that the outermost block was always the 'kernel' level. How do we represent such a thing in affine dialect, or do we even need such a thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for one thing, a kernel that operates on one element is slow inside a affine.parallel or outside of it, and so having kernel outlining rebuild an empty 1 element kernel when it doesn't find an affine.parallel seems very reasonable. On the other hand, that's not the only place where 'structuring' the program in a platform specific way makes sense, for example on the GPU we have 1) 'grid/group' affine.parallel 2) thread affine.parallel.
What about this: we don't allow canonicalization if the affine.parallel has any attributes on it? Or maybe we look for a specific 'unit' attribute like 'is_structural' or whatever?
It seems to me like 1) The general case of canonicalization of empty parallels is good 2) 'Structuring' affine.parallel like constructs will exist at some point, and making them each a new op is annoying causes all kinds of issues, so a compromise is perhaps the best approach. Notably, the attribute trick was what I did for the Stripe dialect, which did have an empty PF canonicalization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, it's also reasonable for GPU lowering to look for a 'grid' AP, and a 'thread' AP, and if either is missing, add an empty loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first option (limiting canonicalization when an attribute it present) in some sense pushes our design upstream more, and the second option (re-introducing affine.parallels when a specific one isn't present) puts a bit more work on the lowering, but is more 'neutral'. I don't have a strong opinion on either side, but if we are going to go with the attribute to prevent canonicalization, we will need to describe why it's there when we upstream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a feeling that upstream won't accept 'special' attributes as a way to control canonicalization. My guess is that the complaint will be that attributes should not 'leak' or otherwise have semantics outside of the dialect they were defined in. I don't know of any other situation where attributes/markers are used to adjust canonicalization.
Personally I find it a pragmatic solution, but then I wonder what the attribute should be named, and if it should be defined on the affine.parallel
as an official attribute rather than an ad-hoc one (probably official).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense to me. Since attributes in general might have structural information I will for now at least draft a version that only canonicalizes when there are no unexpected attributes. Bah, didn't see the latest posts
45aa123
to
a763cc1
Compare
When `Target::GetEntryPointAddress()` calls `exe_module->GetObjectFile()->GetEntryPointAddress()`, and the returned `entry_addr` is valid, it can immediately be returned. However, just before that, an `llvm::Error` value has been setup, but in this case it is not consumed before returning, like is done further below in the function. In https://bugs.freebsd.org/248745 we got a bug report for this, where a very simple test case aborts and dumps core: ``` * thread plaidml#1, name = 'testcase', stop reason = breakpoint 1.1 frame #0: 0x00000000002018d4 testcase`main(argc=1, argv=0x00007fffffffea18) at testcase.c:3:5 1 int main(int argc, char *argv[]) 2 { -> 3 return 0; 4 } (lldb) p argc Program aborted due to an unhandled Error: Error value was Success. (Note: Success values must still be checked prior to being destroyed). Thread 1 received signal SIGABRT, Aborted. thr_kill () at thr_kill.S:3 3 thr_kill.S: No such file or directory. (gdb) bt #0 thr_kill () at thr_kill.S:3 plaidml#1 0x00000008049a0004 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52 plaidml#2 0x0000000804916229 in abort () at /usr/src/lib/libc/stdlib/abort.c:67 plaidml#3 0x000000000451b5f5 in fatalUncheckedError () at /usr/src/contrib/llvm-project/llvm/lib/Support/Error.cpp:112 plaidml#4 0x00000000019cf008 in GetEntryPointAddress () at /usr/src/contrib/llvm-project/llvm/include/llvm/Support/Error.h:267 plaidml#5 0x0000000001bccbd8 in ConstructorSetup () at /usr/src/contrib/llvm-project/lldb/source/Target/ThreadPlanCallFunction.cpp:67 plaidml#6 0x0000000001bcd2c0 in ThreadPlanCallFunction () at /usr/src/contrib/llvm-project/lldb/source/Target/ThreadPlanCallFunction.cpp:114 plaidml#7 0x00000000020076d4 in InferiorCallMmap () at /usr/src/contrib/llvm-project/lldb/source/Plugins/Process/Utility/InferiorCallPOSIX.cpp:97 plaidml#8 0x0000000001f4be33 in DoAllocateMemory () at /usr/src/contrib/llvm-project/lldb/source/Plugins/Process/FreeBSD/ProcessFreeBSD.cpp:604 plaidml#9 0x0000000001fe51b9 in AllocatePage () at /usr/src/contrib/llvm-project/lldb/source/Target/Memory.cpp:347 plaidml#10 0x0000000001fe5385 in AllocateMemory () at /usr/src/contrib/llvm-project/lldb/source/Target/Memory.cpp:383 plaidml#11 0x0000000001974da2 in AllocateMemory () at /usr/src/contrib/llvm-project/lldb/source/Target/Process.cpp:2301 plaidml#12 CanJIT () at /usr/src/contrib/llvm-project/lldb/source/Target/Process.cpp:2331 plaidml#13 0x0000000001a1bf3d in Evaluate () at /usr/src/contrib/llvm-project/lldb/source/Expression/UserExpression.cpp:190 plaidml#14 0x00000000019ce7a2 in EvaluateExpression () at /usr/src/contrib/llvm-project/lldb/source/Target/Target.cpp:2372 plaidml#15 0x0000000001ad784c in EvaluateExpression () at /usr/src/contrib/llvm-project/lldb/source/Commands/CommandObjectExpression.cpp:414 plaidml#16 0x0000000001ad86ae in DoExecute () at /usr/src/contrib/llvm-project/lldb/source/Commands/CommandObjectExpression.cpp:646 plaidml#17 0x0000000001a5e3ed in Execute () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandObject.cpp:1003 plaidml#18 0x0000000001a6c4a3 in HandleCommand () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:1762 plaidml#19 0x0000000001a6f98c in IOHandlerInputComplete () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:2760 plaidml#20 0x0000000001a90b08 in Run () at /usr/src/contrib/llvm-project/lldb/source/Core/IOHandler.cpp:548 plaidml#21 0x00000000019a6c6a in ExecuteIOHandlers () at /usr/src/contrib/llvm-project/lldb/source/Core/Debugger.cpp:903 plaidml#22 0x0000000001a70337 in RunCommandInterpreter () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:2946 plaidml#23 0x0000000001d9d812 in RunCommandInterpreter () at /usr/src/contrib/llvm-project/lldb/source/API/SBDebugger.cpp:1169 plaidml#24 0x0000000001918be8 in MainLoop () at /usr/src/contrib/llvm-project/lldb/tools/driver/Driver.cpp:675 plaidml#25 0x000000000191a114 in main () at /usr/src/contrib/llvm-project/lldb/tools/driver/Driver.cpp:890``` Fix the incorrect error catch by only instantiating an `Error` object if it is necessary. Reviewed By: JDevlieghere Differential Revision: https://reviews.llvm.org/D86355
This reverts commit bfd2c21. This appears to be causing stage2 msan failures on buildbots: FAIL: LLVM :: Transforms/SimplifyCFG/X86/bug-25299.ll (65872 of 71835) ******************** TEST 'LLVM :: Transforms/SimplifyCFG/X86/bug-25299.ll' FAILED ******************** Script: -- : 'RUN: at line 1'; /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/opt < /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SimplifyCFG/X86/bug-25299.ll -simplifycfg -S | /b/sanitizer-x86_64-linux-fast/build/llvm_build_msan/bin/FileCheck /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/test/Transforms/SimplifyCFG/X86/bug-25299.ll -- Exit Code: 2 Command Output (stderr): -- ==87374==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x9de47b6 in getBasicBlockIndex /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/include/llvm/IR/Instructions.h:2749:5 #1 0x9de47b6 in simplifyCommonResume /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Utils/SimplifyCFG.cpp:4112:23 #2 0x9de47b6 in simplifyResume /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Utils/SimplifyCFG.cpp:4039:12 #3 0x9de47b6 in (anonymous namespace)::SimplifyCFGOpt::simplifyOnce(llvm::BasicBlock*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Utils/SimplifyCFG.cpp:6330:16 #4 0x9dcca13 in run /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Utils/SimplifyCFG.cpp:6358:16 #5 0x9dcca13 in llvm::simplifyCFG(llvm::BasicBlock*, llvm::TargetTransformInfo const&, llvm::SimplifyCFGOptions const&, llvm::SmallPtrSetImpl<llvm::BasicBlock*>*) /b/sanitizer-x86_64-linux-fast/build/llvm-project/llvm/lib/Transforms/Utils/SimplifyCFG.cpp:6369:8 #6 0x974643d in iterativelySimplifyCFG(
Static destructor can race with calls to notify and trigger tsan warning. ``` WARNING: ThreadSanitizer: data race (pid=5787) Write of size 1 at 0x55bec9df8de8 by thread T23: #0 pthread_mutex_destroy [third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1344](third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp?l=1344&cl=669089572):3 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x1b12affb) (BuildId: ff25ace8b17d9863348bb1759c47246c) #1 __libcpp_recursive_mutex_destroy [third_party/crosstool/v18/stable/src/libcxx/include/__thread/support/pthread.h:91](third_party/crosstool/v18/stable/src/libcxx/include/__thread/support/pthread.h?l=91&cl=669089572):10 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x4523d4e9) (BuildId: ff25ace8b17d9863348bb1759c47246c) #2 std::__tsan::recursive_mutex::~recursive_mutex() [third_party/crosstool/v18/stable/src/libcxx/src/mutex.cpp:52](third_party/crosstool/v18/stable/src/libcxx/src/mutex.cpp?l=52&cl=669089572):11 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x4523d4e9) #3 ~SmartMutex [third_party/llvm/llvm-project/llvm/include/llvm/Support/Mutex.h:28](third_party/llvm/llvm-project/llvm/include/llvm/Support/Mutex.h?l=28&cl=669089572):11 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bcaedfe) (BuildId: ff25ace8b17d9863348bb1759c47246c) #4 (anonymous namespace)::PerfJITEventListener::~PerfJITEventListener() [third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp:65](third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp?l=65&cl=669089572):3 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bcaedfe) #5 cxa_at_exit_callback_installed_at(void*) [third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:437](third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp?l=437&cl=669089572):3 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x1b172cb9) (BuildId: ff25ace8b17d9863348bb1759c47246c) #6 llvm::JITEventListener::createPerfJITEventListener() [third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp:496](third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp?l=496&cl=669089572):3 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bcad8f5) (BuildId: ff25ace8b17d9863348bb1759c47246c) ``` ``` Previous atomic read of size 1 at 0x55bec9df8de8 by thread T192 (mutexes: write M0, write M1): #0 pthread_mutex_unlock [third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp:1387](third_party/llvm/llvm-project/compiler-rt/lib/tsan/rtl/tsan_interceptors_posix.cpp?l=1387&cl=669089572):3 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x1b12b6bb) (BuildId: ff25ace8b17d9863348bb1759c47246c) #1 __libcpp_recursive_mutex_unlock [third_party/crosstool/v18/stable/src/libcxx/include/__thread/support/pthread.h:87](third_party/crosstool/v18/stable/src/libcxx/include/__thread/support/pthread.h?l=87&cl=669089572):10 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x4523d589) (BuildId: ff25ace8b17d9863348bb1759c47246c) #2 std::__tsan::recursive_mutex::unlock() [third_party/crosstool/v18/stable/src/libcxx/src/mutex.cpp:64](third_party/crosstool/v18/stable/src/libcxx/src/mutex.cpp?l=64&cl=669089572):11 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x4523d589) #3 unlock [third_party/llvm/llvm-project/llvm/include/llvm/Support/Mutex.h:47](third_party/llvm/llvm-project/llvm/include/llvm/Support/Mutex.h?l=47&cl=669089572):16 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bcaf968) (BuildId: ff25ace8b17d9863348bb1759c47246c) #4 ~lock_guard [third_party/crosstool/v18/stable/src/libcxx/include/__mutex/lock_guard.h:39](third_party/crosstool/v18/stable/src/libcxx/include/__mutex/lock_guard.h?l=39&cl=669089572):101 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bcaf968) #5 (anonymous namespace)::PerfJITEventListener::notifyObjectLoaded(unsigned long, llvm::object::ObjectFile const&, llvm::RuntimeDyld::LoadedObjectInfo const&) [third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp:290](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/PerfJITEvents/PerfJITEventListener.cpp?l=290&cl=669089572):1 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bcaf968) #6 llvm::orc::RTDyldObjectLinkingLayer::onObjEmit(llvm::orc::MaterializationResponsibility&, llvm::object::OwningBinary<llvm::object::ObjectFile>, std::__tsan::unique_ptr<llvm::RuntimeDyld::MemoryManager, std::__tsan::default_delete<llvm::RuntimeDyld::MemoryManager>>, std::__tsan::unique_ptr<llvm::RuntimeDyld::LoadedObjectInfo, std::__tsan::default_delete<llvm::RuntimeDyld::LoadedObjectInfo>>, std::__tsan::unique_ptr<llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>>, llvm::DenseMapInfo<llvm::orc::JITDylib*, void>, llvm::detail::DenseMapPair<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>>>>, std::__tsan::default_delete<llvm::DenseMap<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>>, llvm::DenseMapInfo<llvm::orc::JITDylib*, void>, llvm::detail::DenseMapPair<llvm::orc::JITDylib*, llvm::DenseSet<llvm::orc::SymbolStringPtr, llvm::DenseMapInfo<llvm::orc::SymbolStringPtr, void>>>>>>, llvm::Error) [third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp:386](https://cs.corp.google.com/piper///depot/google3/third_party/llvm/llvm-project/llvm/lib/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.cpp?l=386&cl=669089572):10 (be1eb158bb70fc9cf7be2db70407e512890e5c6e20720cd88c69d7d9c26ea531_0200d5f71908+0x2bc404a8) (BuildId: ff25ace8b17d9863348bb1759c47246c) ```
AffineParallelOps of rank 0 (i.e., those with no induction variables) are superfluous. This adds a canonicalizer for AffineParallelOps that removes them if they are rank 0.
Submitting here for verification that this correctly enables the changes we want on the PlaidML side before submitting upstream. We can also land this and point PlaidML here if we decide we want to move faster than the LLVM/MLIR review process. We probably shouldn't land until we've rebased this on top of #5 and landed plaidml/plaidml#1062 on the PlaidML side, as there are some minor conflicts that will need to be resolved.