Description
🐛 Describe the bug
Similar to: #150628
Cherry-picks to validate on the final RC:
- modded-nanogpt flaky NCCL hang starting 3/30 nightly #152623 @kwen2501
- [c10d] Fix extra CUDA context created by barrier #152834 @kwen2501
- [c10d] Turn off default non-blocking API mode to work around hang in NCCL 2.26 #154085 @kwen2501
- [FlexAttention] Remove Old Constraint on lastdim strides #151959 @drisspg
- [FlexAttention] Remove old constraint that was causing assert failure #151521 @drisspg
- [FlexAttention] explicilty create grad_q w/ strides #152641 @drisspg
- [dynamo][super variable] Fix bug to use correct source #151154 @anijain2305
- [cudagraphs] Fix issue in collecting static_input_idxs #152287 @anijain2305
- [dynamo][super variable] Fix bug to use correct source #152774 @anijain2305
- [cudagraphs][HF][torch 2.7] Excessive cudagraph re-recording for HF LLM models #152275 @anijain2305
- Add device guard for xpu conv on multi device #153067 @chuanqi129
- [CD] Fix the libgomp twice load issue #150084 @chuanqi129
- Add device guard for xpu conv on multi device #153345 @chuanqi129
- [CD] Fix the libgomp twice load issue (#150084) #153518 @chuanqi129
- XPU inference output abnormal with device 'XPU:1' #153022 @chuanqi129
- Pip-installed pytorch limits threads to 1 when setting GOMP_CPU_AFFINITY (likely due to bundled GOMP) #149422 @chuanqi129
- Illegal Instruction Caused by `grid_sample` Under Windows #152385 @xuhancn
- [ATen][CUDA] Optimize 128 bit vectorization #148320 @atalman - New Cherry-Pick done for this issue
- Remove cuda dependencies from non cuda buids #152333 @atalman
- [binary builds] Linux aarch64 CUDA builds. Make sure tag is set correctly #154136 @atalman
- [release only] Bump triton version to 3.3.1 #153554 @atalman
- [BUG] `einops` is unsupported and break dynamo graph with torch 2.7 #153476 @ZainRizvi
- [Dynamo] Exception raised inside torch.autocast causes crash AttributeError: 'NoneType' object has no attribute 'is_python_constant' #152012 @ZainRizvi
- [MKLDNN] Check that strides are positive #151848 @malfet
- Fix tensorpipe compilation with clang-17 #151344 @malfet
- [vec128] Fix fmsub NEON defintion #152075 @malfet
- [Cherry-pick] Fix copysign + scalar correctness issue #153098 @malfet
- Inductor doesn't support tensor.view(dtype).copy_(...) #151156 @atalman
- Mark auto_functionalized HOPs as cacheable (#151194) #153304 @zou3519
- [ONNX] Update decomposition logic to loop over onnx registry #153168 @titaiwangms
- Only print dde partial fx graph for export #153218 @StrongerXi
- [dynamo] replace `unimplemented` with `unimplemented_v2` in `variables/functions.py` #153533 @StrongerXi
- [ROCm] Update CUDAPluggableAllocator.h (#1984) #153974 @jithunnair-amd
Additional validation checks - @atalman:
- Validate Linux aarch64 CUDA builds with triton (note: all CUDA aarch64 builds were validated by Nvidia)
- Validate Python 3.13 and 3.13t wheels - https://github.com/pytorch/test-infra/actions/runs/15303152347/job/43059048408
- Amazon Linux 2023 test + torch.compile + no numpy installed: https://github.com/pytorch/test-infra/actions/runs/15303152347/job/43059048408
- Validate Metadata section of wheels - make sure python versions are set
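The metadata check above can be automated with the stdlib: the `METADATA` file inside a wheel is in RFC 822 style and parses with `email.parser`. A hedged sketch — the sample content below is illustrative, not copied from an actual torch wheel:

```python
from email.parser import Parser

# Illustrative METADATA content, shaped like a wheel's *.dist-info/METADATA
# (placeholder values, not taken from a real torch 2.7.1 wheel).
sample = """\
Metadata-Version: 2.1
Name: torch
Version: 2.7.1
Requires-Python: >=3.9
Classifier: Programming Language :: Python :: 3.13
"""

meta = Parser().parsestr(sample)

# The check from the list above: supported python versions must be declared.
assert meta["Requires-Python"], "Requires-Python missing from wheel metadata"
print(meta["Name"], meta["Version"], "->", meta["Requires-Python"])
```

For a real wheel, the same parse can be applied to the `*.dist-info/METADATA` member read out of the `.whl` with `zipfile`.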
- PyTorch 2.7.1 exposes statically linked `libstdc++` CXX11 ABI symbols: see "PyTorch 2.5.0 exposes statically linked `libstdc++` CXX11 ABI symbols" #133437. Tested on macOS by running the command below and verifying no matches:
  `(release2.7) ~/test/release2.7/.venv/lib/python3.12/site-packages/torch/lib nm -gU libtorch_cpu.dylib | grep "recursive_directory_iterator"`
- CUDA
  - PyPI binaries with slimmed dependencies are usable in standard AWS containers (Amazon Linux 2023; regression in 1.13) - https://github.com/pytorch/test-infra/actions/runs/15352480877/job/43203837080 - TODO: remove next iteration, automated
  - Check cuda 1.12.1 update issue: `torch.linalg.eigh` fails on GPU #94772 with small wheels. Passes on GPU but fails on CPU; new issue: torch.linalg.eigh fails on CPU #145801
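The `torch.linalg.eigh` check can be reproduced with a minimal script. A sketch, not the exact repro from #94772/#145801 — the CPU path is shown; moving the tensor to `"cuda"` exercises the GPU path:

```python
import torch

# Minimal sketch of the torch.linalg.eigh check (CPU shown; move the
# tensor to "cuda" to exercise the GPU path from the issues above).
a = torch.randn(8, 8, dtype=torch.float64)
a = a + a.T  # eigh expects a symmetric (Hermitian) matrix
w, v = torch.linalg.eigh(a)

# The decomposition should reconstruct the input: A == V diag(w) V^T
recon = v @ torch.diag(w) @ v.T
assert torch.allclose(recon, a, atol=1e-10)
```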
- torch.compile
  - Basic test works (for example, see test mentioned in "Search for `libdevice` relative to shared library" triton-lang/triton#1176) in PyTorch docker container
  - torch.compile raises an error if used on Windows. Test (part of torchvision): https://github.com/pytorch/test-infra/actions/runs/14182325015/job/39731076931#step:9:447
  - torch.compile works on 3.13. Test: https://github.com/pytorch/test-infra/actions/runs/14315674885/job/40121143490#step:15:3483
  - torch.compile raises an error on 3.13t. Validated: RuntimeError: torch.compile is not supported on Python built with GIL disabled
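A minimal torch.compile smoke test along these lines — a sketch, not the referenced test; the `backend="eager"` choice is an assumption to keep the sketch free of a compiler toolchain. On a 3.13t build, the `torch.compile` call path is where the RuntimeError noted above surfaces:

```python
import torch

# Tiny torch.compile smoke test: compile a function and check it agrees
# with eager mode. backend="eager" skips inductor codegen so no C++
# toolchain is needed for this sketch.
@torch.compile(backend="eager")
def fn(x):
    return torch.sin(x) + torch.cos(x)

x = torch.randn(16)
assert torch.allclose(fn(x), torch.sin(x) + torch.cos(x))
```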
- MPS
  - ResNet is usable out of the box (https://github.com/pytorch/test-infra/actions/runs/14315674885/job/40121143490#step:15:3469)
  - Is torchvision usable? True. German shepherd (cpu): 37.6%, German shepherd (mps): 34.1%
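The MPS check can be sketched as a device-fallback smoke test. Hedged: the tracker's actual check runs a torchvision ResNet on an image; here a tiny linear layer stands in so the sketch runs on any machine:

```python
import torch

# Pick mps when available (Apple Silicon), otherwise fall back to cpu,
# and run a small forward pass as a smoke test.
device = "mps" if torch.backends.mps.is_available() else "cpu"
model = torch.nn.Linear(4, 2).to(device)
x = torch.randn(3, 4, device=device)
out = model(x)
assert out.shape == (3, 2)
print("forward pass OK on", device)
```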
- Validate docker release builds
Versions
2.7.1