
Torch 1.6.0 update #166

Merged: narendasan merged 13 commits into master from torch_1.6.0_update on Aug 26, 2020

narendasan (Collaborator)

Description

Updates the compiler for PyTorch 1.6.0 and resolves segfaults during cuDNN cleanup. Breaking change: drops support for Python 3.5. Known issue: because of a bug in PyTorch, some int[] values cannot be parsed by the IR parsing tools. The issue has been raised with the PyTorch team; the shuffle test case fails because of it.

Fixes #1

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation and have regenerated the documentation (make html in docsrc)
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes

narendasan and others added 7 commits July 24, 2020 11:47
BREAKING CHANGE: Support for Python 3.5 is being dropped with this update.

narendasan added this to the v0.1.0 milestone Aug 6, 2020
github-actions bot added the component: api [Python], component: build system, component: conversion, component: core, component: evaluators, component: lowering, and component: tests labels Aug 6, 2020
xsacha (Contributor) commented Aug 6, 2020

Just tried to build it (Windows) and got:

execution.lo.lib(TRTEngine.obj) : error LNK2019: unresolved external symbol createInferRuntime_INTERNAL referenced in function "public: __cdecl trtorch::core::execution::TRTEngine::TRTEngine(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >)" (??0TRTEngine@execution@core@trtorch@@QEAA@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@0@Z)
conversionctx.lib(ConversionCtx.obj) : error LNK2019: unresolved external symbol createInferBuilder_INTERNAL referenced in function "public: __cdecl trtorch::core::conversion::ConversionCtx::ConversionCtx(struct trtorch::core::conversion::BuilderSettings)" (??0ConversionCtx@conversion@core@trtorch@@QEAA@UBuilderSettings@123@@Z)
bazel-out\x64_windows-opt\bin\cpp\api\lib\libtrtorch.so : fatal error LNK1120: 3 unresolved externals

Edit: Sorry, this turned out to be unrelated to this PR.
It was caused by this commit: 858d8c3#diff-e0ac18efc84fa06bf6e9b694d57f68adL75

Here's this PR built for Windows:
trtorch-PR.zip
trtorch-PR-debug.zip

Unfortunately, the bug that was happening before this PR still occurs.
It happens while calling torch::jit::parseSchema on:
trt::execute_engine(Tensor[] inputs, __torch__.torch.classes.tensorrt.Engine engine) -> Tensor[]

The last three lines in the debug console are:
DEBUG: [TRTorch - Debug Build] - Registering evaluator for prim::unchecked_cast
DEBUG: [TRTorch - Debug Build] - Registering evaluator for prim::Uninitialized
DEBUG: [TRTorch - Debug Build] - Registering evaluator for prim::RaiseException

narendasan (Collaborator, Author) commented Aug 6, 2020

We can add the change back, but we removed it because it broke Linux builds. Maybe we can just use a default condition with an empty list.

platform friendly way

xsacha (Contributor) commented Aug 6, 2020

Oh yeah, I added it back so it would compile, but the other issue below remains. It's the issue reported in #153, so it is probably Windows-only.

Unknown custom class type tensorrt.Engine. Please ensure it is registered.:
trt::execute_engine(Tensor[] inputs, __torch__.torch.classes.tensorrt.Engine engine) -> Tensor[]
                                                                  ~~~~~~ <--- HERE

If I ignore this exception, the graph seems to compile correctly, but when I execute it I get:

ERROR: [TRTorch Conversion Context] - %input.61 : Tensor = aten::prelu(%input.59, %self.input_layer.2.weight) # /home/sacha/.local/lib/python3.8/site-packages/torch/nn/functional.py:1263:0: slope tensor must be unidirectional broadcastable to input tensor
DEBUG: [TRTorch - Debug Build] - momentum disregarded
DEBUG: [TRTorch - Debug Build] - training disregarded
DEBUG: [TRTorch - Debug Build] - cudnn disregarded
DEBUG: [TRTorch - Debug Build] - Input shape is less than 4D got: [], inserting shuffle layer to reshape to 4D tensor shape: [1, 1, 1, 1]
DEBUG: [TRTorch - Debug Build] - Weights: [64]
    Number of input maps: 64
    Number of output maps: 64
    Element shape: [1]
DEBUG: [TRTorch - Debug Build] - Weights: [64]
    Number of input maps: 64
    Number of output maps: 64
    Element shape: [1]
ERROR: [TRTorch Conversion Context] - %input.61 : Tensor = aten::prelu(%input.59, %self.input_layer.2.weight) # /home/sacha/.local/lib/python3.8/site-packages/torch/nn/functional.py:1263:0: slope tensor must be unidirectional broadcastable to input tensor
ERROR: [TRTorch Conversion Context] - Parameter check failed at: Network.cpp::nvinfer1::Network::addScaleNd::737, condition: nbSpatialDims == 2 || nbSpatialDims == 3

That appears to be an aten::prelu issue, so I tried a model that doesn't use prelu, and it got much further.
It seems to run the model, but then ends with this error:

0 INTERNAL ASSERT FAILED at "..\\..\\torch\\csrc\\jit\\ir\\alias_analysis.cpp":465, please report a bug to PyTorch. We don't have an op for trt::execute_engine but it isn't a special case.  Argument types: Tensor[], __torch__.torch.classes.tensorrt.Engine,
Exception raised from analyzeImpl at ..\..\torch\csrc\jit\ir\alias_analysis.cpp:465 (most recent call first):

narendasan (Collaborator, Author)

Hmm, it seems that error is actually caused by an issue earlier in the compilation process. I see

DEBUG: [TRTorch - Debug Build] - Input shape is less than 4D got: [], inserting shuffle layer to reshape to 4D tensor shape: [1, 1, 1, 1]

which is odd for the input to prelu. That would explain why the slope cannot be broadcast here.
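To make the diagnosis concrete, here is a minimal sketch in Python of why a 64-element slope fails against the padded shape. It assumes the ONNX-style definition of unidirectional broadcasting (shape B broadcasts to shape A if B's rank does not exceed A's and, after right-aligning, each dimension of B is 1 or equal to A's), and it assumes the shuffle step simply prepends 1s, as the logged [] to [1, 1, 1, 1] suggests. The helpers `pad_to_4d` and `is_unidirectional_broadcastable` are hypothetical and are not part of TRTorch or TensorRT, and the [1, 64, 56, 56] input shape is an illustrative assumption.

```python
from typing import List, Sequence


def pad_to_4d(shape: Sequence[int]) -> List[int]:
    """Prepend 1s until the shape has rank 4.

    Hypothetical model of the 'inserting shuffle layer to reshape to 4D'
    step seen in the debug log; not TRTorch's actual implementation.
    """
    if len(shape) > 4:
        raise ValueError(f"shape {list(shape)} already has rank > 4")
    return [1] * (4 - len(shape)) + list(shape)


def is_unidirectional_broadcastable(b: Sequence[int], a: Sequence[int]) -> bool:
    """Return True if shape b can be broadcast to shape a without changing a.

    Assumes the ONNX-style rule: b's rank must not exceed a's, and after
    right-aligning the shapes, each dim of b must be 1 or equal to a's dim.
    """
    if len(b) > len(a):
        return False
    for b_dim, a_dim in zip(reversed(b), reversed(a)):
        if b_dim != 1 and b_dim != a_dim:
            return False
    return True


if __name__ == "__main__":
    # The prelu input shape was reported as [] and padded to [1, 1, 1, 1].
    padded = pad_to_4d([])
    print(padded)  # [1, 1, 1, 1]

    # A 64-channel slope cannot broadcast onto that placeholder shape.
    print(is_unidirectional_broadcastable([64], padded))  # False

    # Against a hypothetical real NCHW input, a slope laid out as
    # [1, 64, 1, 1] lines up with the channel axis and broadcasts.
    print(is_unidirectional_broadcastable([1, 64, 1, 1], [1, 64, 56, 56]))  # True
```

Under these assumptions, the failure comes from the empty input shape rather than from the slope itself: once the input has a real channel dimension, a channel-aligned slope passes the same check.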

xsacha (Contributor) commented Aug 10, 2020

The issues I'm facing do not seem to be related to this PR in any case.

narendasan and others added 4 commits August 24, 2020 09:20
PyTorch container

BREAKING CHANGE: Version is being bumped to 0.1.0a0 to target PyTorch 1.6.0.

github-actions bot added the component: api [C++] label Aug 25, 2020
narendasan merged commit 809f9b3 into master Aug 26, 2020
narendasan deleted the torch_1.6.0_update branch August 26, 2020 01:07
Labels
  • component: api [C++] (Issues re: C++ API)
  • component: api [Python] (Issues re: Python API)
  • component: build system (Issues re: Build system)
  • component: conversion (Issues re: Conversion stage)
  • component: core (Issues re: The core compiler)
  • component: evaluators (Issues re: Specific op evaluators)
  • component: lowering (Issues re: The lowering / preprocessing passes)
  • component: tests (Issues re: Tests)
Development

Successfully merging this pull request may close these issues.

Module tests fail because of segfault in cuDNN destructor