Flaub sync #23

flaub · 2020-07-31T02:22:09Z

No description provided.

Optimize the selection of some specific immediates by materializing them with sub/mvn instructions instead of loading them from the constant pool. Patch by Ben Shi, [email protected]. Differential Revision: https://reviews.llvm.org/D83745
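The Python sketch below illustrates the idea; it is not the patch's actual selection code. It classifies a 32-bit constant for A32 mode, where a data-processing immediate must be an 8-bit value rotated right by an even amount. It tries `mov`, then `mvn` with the bitwise complement, then a two-instruction `mov`/`sub` pair, and otherwise falls back to a literal-pool load. The function names, the register `r0`, and the order of attempts are assumptions made for the example.

```python
from typing import List

MASK32 = 0xFFFFFFFF


def ror32(value: int, amount: int) -> int:
    """Rotate a 32-bit value right by `amount` bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def is_arm_modified_imm(value: int) -> bool:
    """True if `value` equals an 8-bit constant rotated right by an even amount."""
    value &= MASK32
    # Undo each allowed rotation (0, 2, ..., 30) and check that at most 8 bits remain.
    return any(ror32(value, (32 - rot) % 32) <= 0xFF for rot in range(0, 32, 2))


# Every distinct value a single A32 data-processing immediate can encode.
ENCODABLE = sorted({ror32(imm8, rot) for imm8 in range(256) for rot in range(0, 32, 2)})


def materialize(value: int) -> List[str]:
    """Pick an instruction sequence (hypothetical policy) that puts `value` into r0."""
    value &= MASK32
    if is_arm_modified_imm(value):
        return [f"mov r0, #{value:#x}"]
    inverted = ~value & MASK32
    if is_arm_modified_imm(inverted):
        return [f"mvn r0, #{inverted:#x}"]  # mvn writes the bitwise NOT of its operand
    for a in ENCODABLE:  # look for value == a - b with both a and b encodable
        b = (a - value) & MASK32
        if is_arm_modified_imm(b):
            return [f"mov r0, #{a:#x}", f"sub r0, r0, #{b:#x}"]
    return [f"ldr r0, ={value:#x}"]  # fall back to a constant-pool load
```

For example, `0xFFFFFF00` is not encodable itself, but its complement `0xFF` is, so a single `mvn` replaces the constant-pool load.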

I have added tests to CodeGen/AArch64/sve-intrinsics-int-arith.ll for simple integer add operations on tuple types. Since these tests introduced new warnings due to incorrect use of getVectorNumElements(), I have also fixed those warnings in the same patch. The fixes are: 1. In narrowExtractedVectorBinOp, I have changed the code to bail out early for scalable vector types, since we have not yet hit a case that proves the optimisations are profitable for scalable vectors. 2. In DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS, I have replaced calls to getVectorNumElements with getVectorMinNumElements in cases that work with scalable vectors. For the other cases, I have added asserts that the vector is not scalable, because we should not be using shuffle vectors and build vectors in such cases. Differential revision: https://reviews.llvm.org/D84016
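The distinction between getVectorNumElements and getVectorMinNumElements can be pictured with a toy Python model. This is a sketch only: `VectorType` and `concat_result_type` are hypothetical stand-ins, not LLVM's `EVT` API, and they only show why code that handles scalable vectors has to work with the minimum element count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorType:
    """Toy model of a vector type: <N x T> when fixed, <vscale x N x T> when scalable."""
    min_num_elements: int
    scalable: bool = False

    def get_vector_min_num_elements(self) -> int:
        # Safe for both kinds: the known lower bound on the number of lanes.
        return self.min_num_elements

    def get_vector_num_elements(self) -> int:
        # Only meaningful for fixed-length vectors; a scalable vector has N * vscale lanes.
        if self.scalable:
            raise ValueError("exact element count requested for a scalable vector")
        return self.min_num_elements


def concat_result_type(operand: VectorType, num_operands: int) -> VectorType:
    """Result type of concatenating `num_operands` values of type `operand`."""
    # Using the minimum count keeps this correct for scalable vectors too.
    return VectorType(operand.get_vector_min_num_elements() * num_operands, operand.scalable)
```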

Currently, getCastInstrCost has limited information about the cast it's rating, often just the opcode and types. Sometimes there is a context instruction as well, but it isn't trustworthy: for instance, when the vectorizer is rating a plan, it calls getCastInstrCost with the old instructions when, in fact, it's trying to evaluate the cost of the instruction post-vectorization. Thus, the current system can get the cost of certain casts wrong, as the correct cost can vary greatly depending on the context in which the cast is used. For example, if the vectorizer queries getCastInstrCost to evaluate the cost of a sext(load) with tail predication enabled, getCastInstrCost will think it's free most of the time, but it's not always free. On ARM MVE, a VLD2 group cannot be extended like a normal VLDR can. Similar situations can come up with how masked loads can be extended when being split. To fix that, this patch adds a new parameter to getCastInstrCost to give it a hint about the context of the cast. It adds a CastContextHint enum which contains the type of the load/store being created by the vectorizer, one for each of the types it can produce. Original patch by Pierre van Houtryve. Differential Revision: https://reviews.llvm.org/D79162
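A rough Python model of the idea follows. The enumerators loosely mirror the contexts listed for `CastContextHint` in LLVM's TargetTransformInfo, but `cast_cost` and its numbers are invented for illustration; real targets return their own costs.

```python
from enum import Enum, auto


class CastContextHint(Enum):
    """What kind of memory access (if any) the cast is attached to."""
    NONE = auto()            # not used with a load/store
    NORMAL = auto()          # plain load/store
    MASKED = auto()          # masked load/store, e.g. under tail predication
    GATHER_SCATTER = auto()  # gather/scatter
    INTERLEAVE = auto()      # interleaved group, e.g. an MVE VLD2
    REVERSED = auto()        # reversed load/store


def cast_cost(opcode: str, hint: CastContextHint) -> int:
    """Hypothetical costs: only a plain load/store lets the cast fold away."""
    foldable = opcode in ("sext", "zext", "trunc")
    if foldable and hint is CastContextHint.NORMAL:
        return 0  # becomes an extending load or truncating store
    return 1      # needs its own instruction(s)
```

Under this toy table, the `sext(load)` under tail predication from the text is no longer rated free once the caller passes `MASKED` instead of relying on the old scalar instruction.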

… masked stores This patch uses the feature added in D79162 to fix the cost of a sext/zext of a masked load, or a trunc for a masked store. Previously, those were considered cheap or even free, but that is not the case, as we cannot split the load in the same way we would for normal loads. This updates the costs to better reflect reality, and adds a test for it in test/Analysis/CostModel/ARM/cast.ll. It also adds a vectorizer test that showcases the improvement: in some cases, the vectorizer will now choose a smaller VF when tail-predication is enabled, which results in better codegen. (If it were to use a higher VF in those cases, the code we see above would be generated, and the vmovs would block tail-predication later in the process, resulting in very poor codegen overall.) Original patch by Pierre van Houtryve. Differential Revision: https://reviews.llvm.org/D79163

`std.dim` currently only accepts ranked memrefs and `std.rank` is limited to tensors. Differential Revision: https://reviews.llvm.org/D84790

…InstrCost variant This will simplify target overrides, and matches what we do for most integer intrinsic costs.

Differential Revision: https://reviews.llvm.org/D84749

…s used A list of target features is disabled when there is no hardware floating-point support. This is the case when one of the following options is passed to clang: -mfloat-abi=soft or -mfpu=none. However, this option list is missing the "+nofp" extension, which can be specified in -march flags such as "-march=armv8-a+nofp". This patch also disables unsupported target features when nofp is passed to -march. Differential Revision: https://reviews.llvm.org/D82948
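The decision described above can be sketched as a small predicate. `fp_disabled` is a hypothetical helper, not a clang driver function, and it does not model which individual target features get turned off.

```python
from typing import Optional


def fp_disabled(float_abi: Optional[str] = None,
                fpu: Optional[str] = None,
                march: Optional[str] = None) -> bool:
    """Should the driver disable hardware floating-point features?"""
    if float_abi == "soft" or fpu == "none":
        return True  # the two cases that were already handled
    if march is not None:
        # '-march=armv8-a+nofp' -> base 'armv8-a', extensions ['nofp']
        _base, *extensions = march.split("+")
        return "nofp" in extensions  # the case this patch adds
    return False
```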

Fix testcase introduced in d1a3396.

There's a slight difference in functionality with the new CHECK lines: before, we allowed either -0.0 or 0.0 for maxnum/minnum. That matches the definition, but we should always get a deterministic result from constant folding within the compiler, so now we assert that we got the single expected result in all cases.

Differential Revision: https://reviews.llvm.org/D84832

In general Decl::getASTContext() is relatively expensive and here the changes are non-invasive. NFC.

The lowering does not support all types for its source operations. This change makes the patterns fail in a well-defined manner. Differential Revision: https://reviews.llvm.org/D84443

This patch teaches SCEVExpander to directly preserve LCSSA. Currently, SCEV does not look through PHI nodes in loops, as doing so might break LCSSA form. Once SCEVExpander can preserve LCSSA form, it should be safe for SCEV to look through PHIs. To preserve LCSSA form, this patch uses formLCSSAForInstructions on operands of newly created instructions if the definition is inside a different loop than the new instruction. The final value we return from expandCodeFor may also need LCSSA phis, depending on the insert point. As no user for it exists there yet, we create a temporary instruction at the insert point, which can be passed to formLCSSAForInstructions. This temporary instruction is removed after LCSSA construction. Reviewed By: mkazantsev Differential Revision: https://reviews.llvm.org/D71538
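The core test, whether a use of a value needs an LCSSA phi, can be sketched in Python under simplifying assumptions: loops are given as sets of block names that include their nested loops' blocks, and the innermost loop of a block is the smallest set containing it (true for properly nested natural loops). `Loop`, `innermost_loop` and `needs_lcssa_phi` are hypothetical names, not SCEVExpander or LCSSA utility APIs.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(eq=False)
class Loop:
    name: str
    blocks: Set[str]  # all blocks of the loop, including blocks of nested loops


def innermost_loop(block: str, loops: List[Loop]) -> Optional[Loop]:
    """Smallest loop containing `block` (valid for properly nested natural loops)."""
    containing = [loop for loop in loops if block in loop.blocks]
    return min(containing, key=lambda loop: len(loop.blocks), default=None)


def needs_lcssa_phi(def_block: str, use_block: str, loops: List[Loop]) -> bool:
    """True if a use in `use_block` of a value defined in `def_block` must go through an LCSSA phi."""
    def_loop = innermost_loop(def_block, loops)
    # In LCSSA form, a value defined in a loop may only be used outside that
    # loop via a phi in an exit block.
    return def_loop is not None and use_block not in def_loop.blocks
```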

When an intrinsic function is declared in a type declaration statement, we need to set the INTRINSIC attribute and (per 8.2(3)) ignore the specified type. To simplify the check, add an IsIntrinsic utility to BaseVisitor. Also, intrinsics and external procedures were getting assigned a size and offset, and they shouldn't be. Differential Revision: https://reviews.llvm.org/D84702

This patch introduces 2 new address spaces in OpenCL: global_device and global_host, which are subsets of the global address space, so the address space scheme will look like:

```
generic->global->host
               ->device
       ->private
       ->local
constant
```

Justification: USM allocations may be associated with both host and device memory. We want to give users a way to tell the compiler the allocation type of a USM pointer for optimization purposes. (Link to the Unified Shared Memory extension: https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/USM/cl_intel_unified_shared_memory.asciidoc) Before this patch, a USM pointer could only be in the opencl_global address space, hence a device backend can't tell whether a particular pointer points to host or device memory. On FPGAs at least, we can generate more efficient hardware code if the user tells us where the pointer can point; being able to distinguish between these types of pointers at compile time allows us to instantiate simpler load-store units to perform memory transactions. Patch by Dmitry Sidorov. Reviewed By: Anastasia Differential Revision: https://reviews.llvm.org/D82174
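The containment drawn in the diagram can be encoded as a parent map. This Python sketch only models the subset relation; the `global_host`/`global_device` keys follow the names in the text, and whether a particular implicit conversion is legal is decided by the OpenCL language rules, not by this hypothetical helper.

```python
# Parent of each address space in the hierarchy; None marks a root.
PARENT = {
    "generic": None,
    "global": "generic",
    "private": "generic",
    "local": "generic",
    "global_host": "global",
    "global_device": "global",
    "constant": None,
}


def is_subset(inner: str, outer: str) -> bool:
    """True if address space `inner` is (transitively) contained in `outer`."""
    current = inner
    while current is not None:
        if current == outer:
            return True
        current = PARENT[current]
    return False
```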

The previous fix for this, https://reviews.llvm.org/D76761, passed test cases but failed in the real world, as std::string has a non-trivial destructor and so creates a CXXBindTemporaryExpr. This handles that shortfall and updates the test case std::basic_string implementation to use a non-trivial destructor to reflect real-world behaviour. Reviewed By: gribozavr2 Differential Revision: https://reviews.llvm.org/D84831

This commit is part of a greater project which aims to add full end-to-end support for convolutions inside mlir. The reason behind having conv ops for each rank rather than having one generic ConvOp is to enable better optimizations for every N-D case which reflects memory layout of input/kernel buffers better and simplifies code as well. We expect plain linalg.conv to be progressively retired. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D83879

… types Differential Revision: https://reviews.llvm.org/D84444

Differential Revision: https://reviews.llvm.org/D84069

If both operands are undef, return undef. If one operand is undef, clamp to limit constant.
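A Python sketch of those folding rules, assuming operands are passed as unsigned bit patterns and `None` stands in for undef. The helper name is hypothetical; it is not the ConstantFolding implementation, only the same arithmetic. The "one undef" rule works because undef may be chosen to be the limit value, which then wins the comparison.

```python
from typing import Optional


def fold_int_minmax(op: str, a: Optional[int], b: Optional[int], bits: int) -> Optional[int]:
    """Fold smax/smin/umax/umin on `bits`-wide constants; None means undef.

    Operands and results are unsigned bit patterns in [0, 2**bits).
    """
    mask = (1 << bits) - 1
    limit = {
        "smax": mask >> 1,           # largest signed value, e.g. 0x7f for i8
        "smin": 1 << (bits - 1),     # smallest signed value, e.g. 0x80 for i8
        "umax": mask,                # largest unsigned value, e.g. 0xff for i8
        "umin": 0,                   # smallest unsigned value
    }[op]
    if a is None and b is None:
        return None                  # both undef -> undef
    if a is None or b is None:
        return limit                 # pick undef == limit; the limit always wins

    def as_signed(v: int) -> int:
        return v - (1 << bits) if v & (1 << (bits - 1)) else v

    key = as_signed if op[0] == "s" else (lambda v: v)
    pick = max if op.endswith("max") else min
    return pick(a & mask, b & mask, key=key)
```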

This patch replaces 'AddrSize'/'SegSize' with 'AddressSize'/'SegmentSelectorSize'. NFC.

…ities Not a bug that is ever likely to materialise, but still worth fixing Reviewed By: DmitryPolukhin Differential Revision: https://reviews.llvm.org/D84850

…verlappingMultipleDef In MachineCopyPropagation::BackwardPropagatableCopy(), a check is added for multiple destination registers. The copy propagation is avoided if the copied destination register is the same register as another destination on the same instruction. A new test is added. This used to fail on ARM like this: error: unpredictable instruction, RdHi and RdLo must be different umull r9, r9, lr, r0 Reviewed By: lkail Differential Revision: https://reviews.llvm.org/D82638
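The added check can be sketched in Python. `Instr`, `can_backward_propagate` and `rename_def` are hypothetical names, and only the multiple-def condition from this fix is modelled; the real pass applies further legality checks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instr:
    opcode: str
    defs: List[str]   # registers written, in operand order
    uses: List[str]   # registers read


def can_backward_propagate(def_instr: Instr, copy_src: str, copy_dst: str) -> bool:
    """Can `copy_dst = COPY copy_src` be folded by renaming copy_src's def in def_instr?"""
    if copy_src not in def_instr.defs:
        return False
    # The new check: if copy_dst is already written by another def operand of the
    # same instruction, renaming would give it two identical destinations.
    other_defs = [reg for reg in def_instr.defs if reg != copy_src]
    return copy_dst not in other_defs


def rename_def(def_instr: Instr, copy_src: str, copy_dst: str) -> Instr:
    """Rewrite def_instr so it writes copy_dst where it used to write copy_src."""
    new_defs = [copy_dst if reg == copy_src else reg for reg in def_instr.defs]
    return Instr(def_instr.opcode, new_defs, list(def_instr.uses))
```

With `umull r8, r9, lr, r0` followed by a copy of r8 into r9, the check refuses the rewrite that would otherwise produce `umull r9, r9, lr, r0`.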

We previously used a non-aggregate RValue to represent the passed value, which violated the assumptions of call arg lowering in some cases, in particular on 32-bit Windows, where we'd end up producing an FCA store with TBAA metadata, that the IR verifier would reject.

Reviewed By: gribozavr2 Differential Revision: https://reviews.llvm.org/D84926

This allows declaring buffers and allocs of vectors so that we can support vector load/store. Differential Revision: https://reviews.llvm.org/D84982

Avoid recursively calling copyPhysReg for AGPR handling. This was dropping the necessary super register implicit defs to avoid liveness verifier errors.

Differential Revision: https://reviews.llvm.org/D84984

This allows clients to detect invalid transformations applied by JITLink passes (e.g. inserting or removing symbols in unexpected ways) and terminate linking with an error. This change is used to simplify the error propagation logic in ObjectLinkingLayer.

The -harness option enables new testing use-cases for llvm-jitlink. It takes a list of objects to treat as a test harness for any regular objects passed to llvm-jitlink. If any files are passed using the -harness option, then the following transformations are applied to all other files: (1) Symbol definitions that are referenced by the harness files are promoted to default scope. (This enables access to statics from the test harness.) (2) Symbol definitions that clash with definitions in the harness files are deleted. (This enables interposition by the test harness.) (3) All other definitions in regular files are demoted to local scope. (This causes untested code to be dead stripped, reducing memory cost and eliminating spurious unresolved symbol errors from untested code.) These transformations allow the harness files to reference and interpose symbols in the regular object files, which can be used to support execution tests (including fuzz tests) of functions in relocatable objects produced by a build.
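The three rules above can be sketched as a Python function over a symbol table. The scope strings and `apply_harness_rules` are hypothetical, and giving rule (2) precedence over rule (1) when a symbol is both referenced and defined by the harness is an assumption of this sketch.

```python
from typing import Dict, Set


def apply_harness_rules(regular: Dict[str, str],
                        harness_refs: Set[str],
                        harness_defs: Set[str]) -> Dict[str, str]:
    """Return the new scope of each definition in a regular object, dropping deleted ones.

    `regular` maps symbol name -> current scope ('default', 'hidden' or 'local').
    """
    result = {}
    for name in regular:
        if name in harness_defs:
            continue                     # (2) clashes with a harness definition: delete
        if name in harness_refs:
            result[name] = "default"     # (1) referenced by the harness: promote
        else:
            result[name] = "local"       # (3) everything else: demote, so it can be dead stripped
    return result
```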

…cessary fptosi/fptoui have similar, but not identical, semantics. In particular, the behavior on overflow is different. Fixes https://bugs.llvm.org/show_bug.cgi?id=46844 for 64-bit. (The corresponding patch for 32-bit is more involved because the equivalent intrinsics don't exist, as far as I can tell.) Differential Revision: https://reviews.llvm.org/D84703

This tool will be used to generate C wrappers for the C++ LLVM libc implementations. This change does not hook this tool up to anything yet. However, it can be useful for cases where one does not want to run the objcopy step (to insert the C symbol in the object file) but can make use of LTO to eliminate the cost of the additional wrapper call. This can be relevant for certain downstream platforms. If this tool can benefit other libc platforms in general, then it can be integrated into the build system with options to use or not use the wrappers. An example of such a platform is CUDA. Reviewed By: abrachet Differential Revision: https://reviews.llvm.org/D84848

clang-tidy's llvm-header-guard rule references the LLVM style, but the header guard style is not documented there. Differential Revision: https://reviews.llvm.org/D84989

…INSIC_LRINT. Differential Revision: https://reviews.llvm.org/D84552

Just the obvious implementation that rewrites the result type. Also fix warning from EXTRACT_SUBVECTOR legalization that triggers on the test. Differential Revision: https://reviews.llvm.org/D84706

…nslated processes When we detect a process that the native debugserver cannot handle, hand off the connection fd to the translated debugserver.

InstrProfilingBuffer.c.o is generic code that must support compilation into freestanding projects. This gets rid of its dependence on the _getpagesize symbol from libc, shifting it to InstrProfilingFile.c.o. This fixes a build failure seen in a firmware project. rdar://66249701

…nsic This includes basic support for computeKnownBits on abs. I've left FIXMEs for more complicated things we could do. Differential Revision: https://reviews.llvm.org/D84963
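One sound rule of this kind is sketched below in Python, using a (known-zero, known-one) mask pair. It is not the code from D84963: `known_bits_abs` is a hypothetical helper that assumes wrapping abs (abs of INT_MIN is INT_MIN) and only handles the non-negative input case and the low bits.

```python
from typing import Tuple

KnownBits = Tuple[int, int]  # (known_zero, known_one) bit masks


def known_bits_abs(known: KnownBits, bits: int) -> KnownBits:
    """Known bits of a wrapping abs(x) on `bits`-wide integers, given known bits of x."""
    known_zero, known_one = known
    sign = 1 << (bits - 1)
    if known_zero & sign:
        # x is known non-negative, so abs(x) == x and every known bit carries over.
        return known_zero, known_one
    # abs(x) is either x or -x. Negation keeps the trailing zero bits and the
    # lowest set bit, so a run of known-zero low bits survives, and so does a
    # known-one bit directly above that run.
    tz = 0
    while tz < bits and (known_zero >> tz) & 1:
        tz += 1
    result_zero = (1 << tz) - 1
    result_one = (1 << tz) if tz < bits and (known_one >> tz) & 1 else 0
    return result_zero, result_one
```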

This patch adds time trace functionality to give a better understanding of the analysis times. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D84980

…res and tuning features After the recent change to the tuning settings for pentium4 to improve our default 32-bit behavior, I've decided to see about implementing -mtune support. This way we could have a default architecture CPU of "pentium4" or "x86-64" and a default tuning cpu of "generic". And we could change our "pentium4" tuning settings back to what they were before. As a step to supporting this, this patch separates all of the features lists for the CPUs into 2 lists. I'm using the Proc class and a new ProcModel class to concat the 2 lists before passing to the target independent ProcessorModel. Future work to truly support mtune would change ProcessorModel to take 2 lists separately. I've diffed the X86GenSubtargetInfo.inc file before and after this patch to ensure that the final feature list for the CPUs isn't changed. Differential Revision: https://reviews.llvm.org/D84879

…VI) mitigations Fix for the issue raised in rust-lang/rust#74632. The current heuristic for inserting LFENCEs uses a quadratic-time algorithm. This can apparently cause substantial compilation slowdowns for building Rust projects, where functions > 5000 LoC are apparently common. The updated heuristic in this patch implements a linear-time algorithm. On a set of benchmarks, the slowdown factor for the generated code was comparable (2.55x geo mean for the quadratic-time heuristic, vs. 2.58x for the linear-time heuristic). Both heuristics offer the same security properties, namely, mitigating LVI. This patch also includes some formatting fixes. Differential Revision: https://reviews.llvm.org/D84471

Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D84903

Refactored the function `target` to prepare for fixing the issue of ahead-of-time device memory deallocation. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D84816

Differential Revision: https://reviews.llvm.org/D84616

When `Target::GetEntryPointAddress()` calls `exe_module->GetObjectFile()->GetEntryPointAddress()` and the returned `entry_addr` is valid, it can be returned immediately. However, just before that, an `llvm::Error` value has been set up, and in this case it is not consumed before returning, as is done further below in the function. In https://bugs.freebsd.org/248745 we got a bug report for this, where a very simple test case aborts and dumps core:

```
* thread #1, name = 'testcase', stop reason = breakpoint 1.1
    frame #0: 0x00000000002018d4 testcase`main(argc=1, argv=0x00007fffffffea18) at testcase.c:3:5
   1    int main(int argc, char *argv[])
   2    {
-> 3            return 0;
   4    }
(lldb) p argc
Program aborted due to an unhandled Error:
Error value was Success. (Note: Success values must still be checked prior to being destroyed).

Thread 1 received signal SIGABRT, Aborted.
thr_kill () at thr_kill.S:3
3       thr_kill.S: No such file or directory.
(gdb) bt
#0  thr_kill () at thr_kill.S:3
#1  0x00000008049a0004 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
#2  0x0000000804916229 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
#3  0x000000000451b5f5 in fatalUncheckedError () at /usr/src/contrib/llvm-project/llvm/lib/Support/Error.cpp:112
#4  0x00000000019cf008 in GetEntryPointAddress () at /usr/src/contrib/llvm-project/llvm/include/llvm/Support/Error.h:267
#5  0x0000000001bccbd8 in ConstructorSetup () at /usr/src/contrib/llvm-project/lldb/source/Target/ThreadPlanCallFunction.cpp:67
#6  0x0000000001bcd2c0 in ThreadPlanCallFunction () at /usr/src/contrib/llvm-project/lldb/source/Target/ThreadPlanCallFunction.cpp:114
#7  0x00000000020076d4 in InferiorCallMmap () at /usr/src/contrib/llvm-project/lldb/source/Plugins/Process/Utility/InferiorCallPOSIX.cpp:97
#8  0x0000000001f4be33 in DoAllocateMemory () at /usr/src/contrib/llvm-project/lldb/source/Plugins/Process/FreeBSD/ProcessFreeBSD.cpp:604
#9  0x0000000001fe51b9 in AllocatePage () at /usr/src/contrib/llvm-project/lldb/source/Target/Memory.cpp:347
#10 0x0000000001fe5385 in AllocateMemory () at /usr/src/contrib/llvm-project/lldb/source/Target/Memory.cpp:383
#11 0x0000000001974da2 in AllocateMemory () at /usr/src/contrib/llvm-project/lldb/source/Target/Process.cpp:2301
#12 CanJIT () at /usr/src/contrib/llvm-project/lldb/source/Target/Process.cpp:2331
#13 0x0000000001a1bf3d in Evaluate () at /usr/src/contrib/llvm-project/lldb/source/Expression/UserExpression.cpp:190
#14 0x00000000019ce7a2 in EvaluateExpression () at /usr/src/contrib/llvm-project/lldb/source/Target/Target.cpp:2372
#15 0x0000000001ad784c in EvaluateExpression () at /usr/src/contrib/llvm-project/lldb/source/Commands/CommandObjectExpression.cpp:414
#16 0x0000000001ad86ae in DoExecute () at /usr/src/contrib/llvm-project/lldb/source/Commands/CommandObjectExpression.cpp:646
#17 0x0000000001a5e3ed in Execute () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandObject.cpp:1003
#18 0x0000000001a6c4a3 in HandleCommand () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:1762
#19 0x0000000001a6f98c in IOHandlerInputComplete () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:2760
#20 0x0000000001a90b08 in Run () at /usr/src/contrib/llvm-project/lldb/source/Core/IOHandler.cpp:548
#21 0x00000000019a6c6a in ExecuteIOHandlers () at /usr/src/contrib/llvm-project/lldb/source/Core/Debugger.cpp:903
#22 0x0000000001a70337 in RunCommandInterpreter () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:2946
#23 0x0000000001d9d812 in RunCommandInterpreter () at /usr/src/contrib/llvm-project/lldb/source/API/SBDebugger.cpp:1169
#24 0x0000000001918be8 in MainLoop () at /usr/src/contrib/llvm-project/lldb/tools/driver/Driver.cpp:675
#25 0x000000000191a114 in main () at /usr/src/contrib/llvm-project/lldb/tools/driver/Driver.cpp:890
```

Fix the incorrect error catch by only instantiating an `Error` object if it is necessary. Reviewed By: JDevlieghere Differential Revision: https://reviews.llvm.org/D86355

arsenm and others added 30 commits July 29, 2020 08:27

AMDGPU/GlobalISel: Refactor special argument management

200bb51

[ARM] Optimize immediate selection

85342c2

Optimize the selection of some specific immediates by materializing them with sub/mvn instructions instead of loading them from the constant pool. Patch by Ben Shi, [email protected]. Differential Revision: https://reviews.llvm.org/D83745

[mlir][Standard] Allow unranked memrefs as operands to dim and rank

823ffef

`std.dim` currently only accepts ranked memrefs and `std.rank` is limited to tensors. Differential Revision: https://reviews.llvm.org/D84790

[TTI] Move abs/smax/smin/umax/umin cost expansion to ICA getIntrinsic…

7518210

…InstrCost variant This will simplify target overrides, and matches what we do for most integer intrinsic costs.

[ELF][test] Add test coverage of __real_ to wrap-plt.s

8725a49

Differential Revision: https://reviews.llvm.org/D84749

[CostModel][X86] Add SSE costs for ABS intrinsics

0a0f282

[Driver][ARM] Fix testcase that should only run on ARM

71bf6dd

Fix testcase introduced in d1a3396.

Forward extent tensors through shape.broadcast.

ad793ed

Differential Revision: https://reviews.llvm.org/D84832

[clang][NFC] Pass the ASTContext to CXXRecordDecl::setCaptures

1ae63b4

In general Decl::getASTContext() is relatively expensive and here the changes are non-invasive. NFC.

[clang][NFC] clang-format fix after eb10b06

517fe05

[MLIR][Shape] Limit shape to standard lowerings to their supported types

6673c6c

The lowering does not support all types for its source operations. This change makes the patterns fail in a well-defined manner. Differential Revision: https://reviews.llvm.org/D84443

[InstSimplify] add tests for expandCommutativeBinOp; NFC

672df0f

[MLIR][Shape] Limit shape to SCF lowering patterns to their supported…

5fc34fa

… types Differential Revision: https://reviews.llvm.org/D84444

[CostModel][X86] Add SSE costs for SMAX/SMIN/UMAX/UMIN intrinsics

d1abca1

[NFC][PPC][AIX] Add test coverage for _Complex return values

d5776f2

Differential Revision: https://reviews.llvm.org/D84069

[ConstantFolding] add tests for integer min/max intrinsics; NFC

9f95895

[ConstantFolding] fold integer min/max intrinsics

9ee7d71

If both operands are undef, return undef. If one operand is undef, clamp to limit constant.

[DWARFYAML] Make the field names consistent with the DWARF spec. NFC.

bfa1403

This patch replaces 'AddrSize'/'SegSize' with 'AddressSize'/'SegmentSelectorSize'. NFC.

[clang-tidy] Fix module options being registered with different prior…

62beb7c

…ities Not a bug that is ever likely to materialise, but still worth fixing Reviewed By: DmitryPolukhin Differential Revision: https://reviews.llvm.org/D84850

int3 and others added 25 commits July 30, 2020 14:38

[lld-macho] Add comment for literal argument

c89e46e

[clang-tidy][NFC] Use StringMap for ClangTidyCheckFactories::FacoryMap

c23ae3f

Reviewed By: gribozavr2 Differential Revision: https://reviews.llvm.org/D84926

[mlir][spirv] Add support for converting memref of vector to SPIR-V

59156ba

This allows declaring buffers and allocs of vectors so that we can support vector load/store. Differential Revision: https://reviews.llvm.org/D84982

AMDGPU: Fix liveness errors when copying AGPR tuples

e56e902

Avoid recursively calling copyPhysReg for AGPR handling. This was dropping the necessary super register implicit defs to avoid liveness verifier errors.

[MLIR][NFC] Add SymbolUse::UseRange::empty()

a34a8d5

Differential Revision: https://reviews.llvm.org/D84984

[COFF] Port CallGraphSort to COFF from ELF

763671f

[gn build] Port 763671f

b811736

[doc] Describe the header guard style

abb8128

clang-tidy's llvm-header-guard rule references the LLVM style, but the header guard style is not documented there. Differential Revision: https://reviews.llvm.org/D84989

[AArch64][GlobalISel] Add legalization & selection support for G_INTR…

09f9f7d

…INSIC_LRINT. Differential Revision: https://reviews.llvm.org/D84552

[LegalizeTypes][SVE] Support widen/split legalization for SPLAT_VECTOR

7e88efa

Just the obvious implementation that rewrites the result type. Also fix warning from EXTRACT_SUBVECTOR legalization that triggers on the test. Differential Revision: https://reviews.llvm.org/D84706

[debugserver/Apple Silicon] Handoff connections when attaching to tra…

5760575

…nslated processes When we detect a process that the native debugserver cannot handle, hand off the connection fd to the translated debugserver.

[ValueTracking] Add basic computeKnownBits support for llvm.abs intri…

24f5235

…nsic This includes basic support for computeKnownBits on abs. I've left FIXMEs for more complicated things we could do. Differential Revision: https://reviews.llvm.org/D84963

[Attributor] Add time trace support.

49def10

This patch adds time trace functionality to give a better understanding of the analysis times. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D84980

[NFC][AMDGPU] Improve fused fmul+fadd tests.

aa77232

Reviewed By: rampitec Differential Revision: https://reviews.llvm.org/D84903

[OpenMP] Refactored the function target

8218eee

Refactored the function `target` to prepare for fixing the issue of ahead-of-time device memory deallocation. Reviewed By: ye-luo Differential Revision: https://reviews.llvm.org/D84816

[NFC] Move findAllocaForValue into ValueTracking.h

61cab35

Differential Revision: https://reviews.llvm.org/D84616

Merge branch 'master' into flaub-sync

724aef5

flaub requested a review from jbruestle July 31, 2020 02:22

flaub self-assigned this Jul 31, 2020

flaub merged commit ae1b53a into plaidml/plaidml-v1 Jul 31, 2020

flaub deleted the flaub-sync branch July 31, 2020 02:22

