Skip to content

Flaub sync #23

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1,958 commits into from
Jul 31, 2020
Merged

Flaub sync #23

merged 1,958 commits into from
Jul 31, 2020

Conversation

flaub
Copy link

@flaub flaub commented Jul 31, 2020

No description provided.

arsenm and others added 30 commits July 29, 2020 08:27
Optimize some specific immediates selection by materializing them with sub/mvn
instructions as opposed to loading them from the constant pool.

Patch by Ben Shi, [email protected].

Differential Revision: https://reviews.llvm.org/D83745
I have added tests to:

  CodeGen/AArch64/sve-intrinsics-int-arith.ll

for doing simple integer add operations on tuple types. Since these
tests introduced new warnings due to incorrect use of
getVectorNumElements() I have also fixed up these warnings in the
same patch. These fixes are:

1. In narrowExtractedVectorBinOp I have changed the code to bail out
early for scalable vector types, since we've not yet hit a case that
proves the optimisations are profitable for scalable vectors.
2. In DAGTypeLegalizer::WidenVecRes_CONCAT_VECTORS I have replaced
calls to getVectorNumElements with getVectorMinNumElements in cases
that work with scalable vectors. For the other cases I have added
asserts that the vector is not scalable because we should not be
using shuffle vectors and build vectors in such cases.

Differential revision: https://reviews.llvm.org/D84016
Currently, getCastInstrCost has limited information about the cast it's
rating, often just the opcode and types.  Sometimes there is a context
instruction as well, but it isn't trustworthy: for instance, when the
vectorizer is rating a plan, it calls getCastInstrCost with the old
instructions when, in fact, it's trying to evaluate the cost of the
instruction post-vectorization.  Thus, the current system can get the
cost of certain casts incorrect as the correct cost can vary greatly
based on the context in which it's used.

For example, if the vectorizer queries getCastInstrCost to evaluate the
cost of a sext(load) with tail predication enabled, getCastInstrCost
will think it's free most of the time, but it's not always free. On ARM
MVE, a VLD2 group cannot be extended like a normal VLDR can. Similar
situations can come up with how masked loads can be extended when being
split.

To fix that, this path adds a new parameter to getCastInstrCost to give
it a hint about the context of the cast. It adds a CastContextHint enum
which contains the type of the load/store being created by the
vectorizer - one for each of the types it can produce.

Original patch by Pierre van Houtryve

Differential Revision: https://reviews.llvm.org/D79162
… masked stores

This patch uses the feature added in D79162 to fix the cost of a
sext/zext of a masked load, or a trunc for a masked store.
Previously, those were considered cheap or even free, but it's
not the case as we cannot split the load in the same way we would for
normal loads.

This updates the costs to better reflect reality, and adds a test for it
in test/Analysis/CostModel/ARM/cast.ll.

It also adds a vectorizer test that showcases the improvement: in some
cases, the vectorizer will now choose a smaller VF when
tail-predication is enabled, which results in better codegen. (Because
if it were to use a higher VF in those cases, the code we see above
would be generated, and the vmovs would block tail-predication later in
the process, resulting in very poor codegen overall)

Original Patch by Pierre van Houtryve

Differential Revision: https://reviews.llvm.org/D79163
`std.dim` currently only accepts ranked memrefs and `std.rank` is limited to
tensors.

Differential Revision: https://reviews.llvm.org/D84790
…InstrCost variant

This will simplify target overrides, and matches what we do for most integer intrinsic costs.
…s used

A list of target features is disabled when there is no hardware
floating-point support. This is the case when one of the following
options is passed to clang:

 - -mfloat-abi=soft
 - -mfpu=none

This option list is missing, however, the extension "+nofp" that can be
specified in -march flags, such as "-march=armv8-a+nofp".

This patch also disables unsupported target features when nofp is passed
to -march.

Differential Revision: https://reviews.llvm.org/D82948
There's a slight difference in functionality with the new CHECK lines:
before, we allowed either -0.0 or 0.0 for maxnum/minnum. That matches
the definition, but we should always get a deterministic result from
constant folding within the compiler, so now we assert that we got
the single expected result in all cases.
In general Decl::getASTContext() is relatively expensive and here the changes
are non-invasive. NFC.
The lowering does not support all types for its source operations. This change
makes the patterns fail in a well-defined manner.

Differential Revision: https://reviews.llvm.org/D84443
This patch teaches SCEVExpander to directly preserve LCSSA.

As it is currently, SCEV does not look through PHI nodes in loops,
as it might break LCSSA form. Once SCEVExpander can preserve
LCSSA form, it should be safe for SCEV to look through PHIs.

To preserve LCSSA form, this patch uses formLCSSAForInstructions
on operands of newly created instructions, if the definition is inside
a different loop than the new instruction.

The final value we return from expandCodeFor may also need LCSSA
phis, depending on the insert point. As no user for it exists there yet,
create a temporary instruction at the insert point, which can be passed
to formLCSSAForInstructions. This temporary instruction is removed
after LCSSA construction.

Reviewed By: mkazantsev

Differential Revision: https://reviews.llvm.org/D71538
When an instrinsic function is declared in a type declaration statement
we need to set the INTRINSIC attribute and (per 8.2(3)) ignore the
specified type.

To simplify the check, add IsIntrinsic utility to BaseVisitor.

Also, intrinsics and external procedures were getting assigned a size
and offset and they shouldn't be.

Differential Revision: https://reviews.llvm.org/D84702
This patch introduces 2 new address spaces in OpenCL: global_device and global_host
which are a subset of a global address space, so the address space scheme will be
looking like:

```
generic->global->host
                          ->device
             ->private
             ->local
constant
```

Justification: USM allocations may be associated with both host and device memory. We
want to give users a way to tell the compiler the allocation type of a USM pointer for
optimization purposes. (Link to the Unified Shared Memory extension:
https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/USM/cl_intel_unified_shared_memory.asciidoc)

Before this patch USM pointer could be only in opencl_global
address space, hence a device backend can't tell if a particular pointer
points to host or device memory. On FPGAs at least we can generate more
efficient hardware code if the user tells us where the pointer can point -
being able to distinguish between these types of pointers at compile time
allows us to instantiate simpler load-store units to perform memory
transactions.

Patch by Dmitry Sidorov.

Reviewed By: Anastasia

Differential Revision: https://reviews.llvm.org/D82174
The previous fix for this, https://reviews.llvm.org/D76761, Passed test cases but failed in the real world as std::string has a non trivial destructor so creates a CXXBindTemporaryExpr.

This handles that shortfall and updates the test case std::basic_string implementation to use a non trivial destructor to reflect real world behaviour.

Reviewed By: gribozavr2

Differential Revision: https://reviews.llvm.org/D84831
This commit is part of a greater project which aims to add
full end-to-end support for convolutions inside mlir. The
reason behind having conv ops for each rank rather than
having one generic ConvOp is to enable better optimizations
for every N-D case which reflects memory layout of input/kernel
buffers better and simplifies code as well. We expect plain linalg.conv
to be progressively retired.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D83879
If both operands are undef, return undef.
If one operand is undef, clamp to limit constant.
This patch replaces 'AddrSize'/'SegSize' with
'AddressSize'/'SegmentSelectorSize'. NFC.
…ities

Not a bug that is ever likely to materialise, but still worth fixing

Reviewed By: DmitryPolukhin

Differential Revision: https://reviews.llvm.org/D84850
…verlappingMultipleDef

In MachineCopyPropagation::BackwardPropagatableCopy(),
a check is added for multiple destination registers.

The copy propagation is avoided if the copied destination register
is the same register as another destination on the same instruction.

A new test is added.  This used to fail on ARM like this:
error: unpredictable instruction, RdHi and RdLo must be different
        umull   r9, r9, lr, r0

Reviewed By: lkail

Differential Revision: https://reviews.llvm.org/D82638
int3 and others added 25 commits July 30, 2020 14:38
We previously used a non-aggregate RValue to represent the passed value,
which violated the assumptions of call arg lowering in some cases, in
particular on 32-bit Windows, where we'd end up producing an FCA store
with TBAA metadata, that the IR verifier would reject.
This allow declaring buffers and alloc of vectors so that we can support vector
load/store.

Differential Revision: https://reviews.llvm.org/D84982
Avoid recursively calling copyPhysReg for AGPR handling. This was
dropping the necessary super register implicit defs to avoid liveness
verifier errors.
This allows clients to detect invalid transformations applied by JITLink passes
(e.g. inserting or removing symbols in unexpected ways) and terminate linking
with an error.

This change is used to simplify the error propagation logic in
ObjectLinkingLayer.
The -harness option enables new testing use-cases for llvm-jitlink. It takes a
list of objects to treat as a test harness for any regular objects passed to
llvm-jitlink.

If any files are passed using the -harness option then the following
transformations are applied to all other files:

  (1) Symbols definitions that are referenced by the harness files are promoted
      to default scope. (This enables access to statics from test harness).

  (2) Symbols definitions that clash with definitions in the harness files are
      deleted. (This enables interposition by test harness).

  (3) All other definitions in regular files are demoted to local scope.
      (This causes untested code to be dead stripped, reducing memory cost and
      eliminating spurious unresolved symbol errors from untested code).

These transformations allow the harness files to reference and interpose
symbols in the regular object files, which can be used to support execution
tests (including fuzz tests) of functions in relocatable objects produced by a
build.
…cessary

fptosi/fptoui have similar, but not identical, semantics.  In
particular, the behavior on overflow is different.

Fixes https://bugs.llvm.org/show_bug.cgi?id=46844 for 64-bit.  (The
corresponding patch for 32-bit is more involved because the equivalent
intrinsics don't exist, as far as I can tell.)

Differential Revision: https://reviews.llvm.org/D84703
This tool will be used to generate C wrappers for the C++ LLVM libc
implementations. This change does not hook this tool up to anything yet.
However, it can be useful for cases where one does not want to run the
objcopy step (to insert the C symbol in the object file) but can make use
of LTO to eliminate the cost of the additional wrapper call. This can be
relevant for certain downstream platforms. If this tool can benefit other
libc platforms in general, then it can be integrated into the build system
with options to use or not use the wrappers. An example of such a
platform is CUDA.

Reviewed By: abrachet

Differential Revision: https://reviews.llvm.org/D84848
clang-tidy's llvm-header-guard rule references the LLVM style - where it's
missing.

Differential Revision: https://reviews.llvm.org/D84989
Just the obvious implementation that rewrites the result type. Also fix
warning from EXTRACT_SUBVECTOR legalization that triggers on the test.

Differential Revision: https://reviews.llvm.org/D84706
…nslated processes

When we detect a process that the native debugserver cannot handle,
handoff the connection fd to the translated debugserver.
InstrProfilingBuffer.c.o is generic code that must support compilation
into freestanding projects. This gets rid of its dependence on the
_getpagesize symbol from libc, shifting it to InstrProfilingFile.c.o.

This fixes a build failure seen in a firmware project.

rdar://66249701
…nsic

This includes basic support for computeKnownBits on abs. I've left FIXMEs for more complicated things we could do.

Differential Revision: https://reviews.llvm.org/D84963
This patch addes time trace functionality to have a better understanding
of the analysis times.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D84980
…res and tuning features

After the recent change to the tuning settings for pentium4 to improve our default 32-bit behavior, I've decided to see about implementing -mtune support. This way we could have a default architecture CPU of "pentium4" or "x86-64" and a default tuning cpu of "generic". And we could change our "pentium4" tuning settings back to what they were before.

As a step to supporting this, this patch separates all of the features lists for the CPUs into 2 lists. I'm using the Proc class and a new ProcModel class to concat the 2 lists before passing to the target independent ProcessorModel. Future work to truly support mtune would change ProcessorModel to take 2 lists separately. I've diffed the X86GenSubtargetInfo.inc file before and after this patch to ensure that the final feature list for the CPUs isn't changed.

Differential Revision: https://reviews.llvm.org/D84879
…VI) mitigations

Fix for the issue raised in rust-lang/rust#74632.

The current heuristic for inserting LFENCEs uses a quadratic-time algorithm. This can apparently cause substantial compilation slowdowns for building Rust projects, where functions > 5000 LoC are apparently common.

The updated heuristic in this patch implements a linear-time algorithm. On a set of benchmarks, the slowdown factor for the generated code was comparable (2.55x geo mean for the quadratic-time heuristic, vs. 2.58x for the linear-time heuristic). Both heuristics offer the same security properties, namely, mitigating LVI.

This patch also includes some formatting fixes.

Differential Revision: https://reviews.llvm.org/D84471
Refactored the function `target` to make preparation for fixing the
issue of ahead-of-time device memory deallocation.

Reviewed By: ye-luo

Differential Revision: https://reviews.llvm.org/D84816
@flaub flaub requested a review from jbruestle July 31, 2020 02:22
@flaub flaub self-assigned this Jul 31, 2020
@flaub flaub merged commit ae1b53a into plaidml/plaidml-v1 Jul 31, 2020
@flaub flaub deleted the flaub-sync branch July 31, 2020 02:22
haoyouab pushed a commit to Flex-plaidml-team/llvm-project that referenced this pull request Sep 4, 2020
When `Target::GetEntryPointAddress()` calls `exe_module->GetObjectFile()->GetEntryPointAddress()`, and the returned
`entry_addr` is valid, it can immediately be returned.

However, just before that, an `llvm::Error` value has been setup, but in this case it is not consumed before returning, like is done further below in the function.

In https://bugs.freebsd.org/248745 we got a bug report for this, where a very simple test case aborts and dumps core:

```
* thread plaidml#1, name = 'testcase', stop reason = breakpoint 1.1
    frame #0: 0x00000000002018d4 testcase`main(argc=1, argv=0x00007fffffffea18) at testcase.c:3:5
   1	int main(int argc, char *argv[])
   2	{
-> 3	    return 0;
   4	}
(lldb) p argc
Program aborted due to an unhandled Error:
Error value was Success. (Note: Success values must still be checked prior to being destroyed).

Thread 1 received signal SIGABRT, Aborted.
thr_kill () at thr_kill.S:3
3	thr_kill.S: No such file or directory.
(gdb) bt
#0  thr_kill () at thr_kill.S:3
plaidml#1  0x00000008049a0004 in __raise (s=6) at /usr/src/lib/libc/gen/raise.c:52
plaidml#2  0x0000000804916229 in abort () at /usr/src/lib/libc/stdlib/abort.c:67
plaidml#3  0x000000000451b5f5 in fatalUncheckedError () at /usr/src/contrib/llvm-project/llvm/lib/Support/Error.cpp:112
plaidml#4  0x00000000019cf008 in GetEntryPointAddress () at /usr/src/contrib/llvm-project/llvm/include/llvm/Support/Error.h:267
plaidml#5  0x0000000001bccbd8 in ConstructorSetup () at /usr/src/contrib/llvm-project/lldb/source/Target/ThreadPlanCallFunction.cpp:67
plaidml#6  0x0000000001bcd2c0 in ThreadPlanCallFunction () at /usr/src/contrib/llvm-project/lldb/source/Target/ThreadPlanCallFunction.cpp:114
plaidml#7  0x00000000020076d4 in InferiorCallMmap () at /usr/src/contrib/llvm-project/lldb/source/Plugins/Process/Utility/InferiorCallPOSIX.cpp:97
plaidml#8  0x0000000001f4be33 in DoAllocateMemory () at /usr/src/contrib/llvm-project/lldb/source/Plugins/Process/FreeBSD/ProcessFreeBSD.cpp:604
plaidml#9  0x0000000001fe51b9 in AllocatePage () at /usr/src/contrib/llvm-project/lldb/source/Target/Memory.cpp:347
plaidml#10 0x0000000001fe5385 in AllocateMemory () at /usr/src/contrib/llvm-project/lldb/source/Target/Memory.cpp:383
plaidml#11 0x0000000001974da2 in AllocateMemory () at /usr/src/contrib/llvm-project/lldb/source/Target/Process.cpp:2301
plaidml#12 CanJIT () at /usr/src/contrib/llvm-project/lldb/source/Target/Process.cpp:2331
plaidml#13 0x0000000001a1bf3d in Evaluate () at /usr/src/contrib/llvm-project/lldb/source/Expression/UserExpression.cpp:190
plaidml#14 0x00000000019ce7a2 in EvaluateExpression () at /usr/src/contrib/llvm-project/lldb/source/Target/Target.cpp:2372
plaidml#15 0x0000000001ad784c in EvaluateExpression () at /usr/src/contrib/llvm-project/lldb/source/Commands/CommandObjectExpression.cpp:414
plaidml#16 0x0000000001ad86ae in DoExecute () at /usr/src/contrib/llvm-project/lldb/source/Commands/CommandObjectExpression.cpp:646
plaidml#17 0x0000000001a5e3ed in Execute () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandObject.cpp:1003
plaidml#18 0x0000000001a6c4a3 in HandleCommand () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:1762
plaidml#19 0x0000000001a6f98c in IOHandlerInputComplete () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:2760
plaidml#20 0x0000000001a90b08 in Run () at /usr/src/contrib/llvm-project/lldb/source/Core/IOHandler.cpp:548
plaidml#21 0x00000000019a6c6a in ExecuteIOHandlers () at /usr/src/contrib/llvm-project/lldb/source/Core/Debugger.cpp:903
plaidml#22 0x0000000001a70337 in RunCommandInterpreter () at /usr/src/contrib/llvm-project/lldb/source/Interpreter/CommandInterpreter.cpp:2946
plaidml#23 0x0000000001d9d812 in RunCommandInterpreter () at /usr/src/contrib/llvm-project/lldb/source/API/SBDebugger.cpp:1169
plaidml#24 0x0000000001918be8 in MainLoop () at /usr/src/contrib/llvm-project/lldb/tools/driver/Driver.cpp:675
plaidml#25 0x000000000191a114 in main () at /usr/src/contrib/llvm-project/lldb/tools/driver/Driver.cpp:890```

Fix the incorrect error catch by only instantiating an `Error` object if it is necessary.

Reviewed By: JDevlieghere

Differential Revision: https://reviews.llvm.org/D86355
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.