Compatibility between nightly build and ffmpeg #3411
The upgrade to FFmpeg 5 was reverted in #3377 due to inconsistent availability. I am still figuring out the best way to support FFmpeg. The segfault could be a regression introduced in the main branch. We don't have good CI coverage for the GPU decoder, so I might have missed something. I will try to look into it. |
The CPU decoder also failed with the segfault, but it seems like the CPU decoder was tested in the CI pipeline without any error? |
To investigate pytorch#3411
Yeah, and I tested it on my MacBook Pro, and it works fine. I have two hypotheses on this: 1) some issue with the FFmpeg you installed, and 2) the dlopen mechanism I introduced last week. To rule out 2, I made #3418. I will land it before tomorrow so that this feature is turned off in tomorrow's nightly build, and I would like to ask you to try again and see if the CPU decoder works and the GPU decoder throws an error instead of a segfault. |
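(As a quick illustration of that check, a minimal script along these lines should surface a catchable error from the GPU decode path once the feature is disabled, rather than a hard crash. The file name and decoder below are placeholders, not part of the original report.)

```python
from torchaudio.io import StreamReader

# Placeholder clip; any H.264 file works for this check.
reader = StreamReader(src="test_h264_sdr.mp4")
try:
    # Ask for NVDEC decoding; if GPU decoding is disabled or unavailable,
    # this is expected to raise an error instead of segfaulting.
    reader.add_basic_video_stream(
        1, decoder="h264_cuvid", hw_accel="cuda", format=None
    )
    (frame,) = next(reader.stream())
    print(frame.device, frame.shape, frame.dtype)
except RuntimeError as err:
    print("GPU decoder raised an error:", err)
```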
So I tested with the new nightly build and I still got the segfault for both the CPU and CUDA decoders. See below for the test environment.

Collecting environment information...
PyTorch version: 2.1.0.dev20230608
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-1017-aws-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.6.124
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A10G
Nvidia driver version: 510.73.08
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7R32
Stepping: 0
CPU MHz: 3002.855
BogoMIPS: 5599.58
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 128 KiB
L1i cache: 128 KiB
L2 cache: 2 MiB
L3 cache: 16 MiB
NUMA node0 CPU(s): 0-7
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save rdpid
Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] torch==2.1.0.dev20230608
[pip3] torchaudio==2.1.0.dev20230608
[pip3] torchvision==0.16.0.dev20230608
[pip3] triton==2.1.0
[conda] blas 1.0 mkl
[conda] mkl 2023.1.0 h6d00ec8_46342
[conda] mkl-service 2.4.0 py310h5eee18b_1
[conda] mkl_fft 1.3.6 py310h1128e8f_1
[conda] mkl_random 1.2.2 py310h1128e8f_1
[conda] numpy 1.24.3 py310h5f9d8c6_1
[conda] numpy-base 1.24.3 py310hb5e798b_1
[conda] pytorch 2.1.0.dev20230608 py3.10_cuda11.8_cudnn8.7.0_0 pytorch-nightly
[conda] pytorch-cuda 11.8 h7e8668a_5 pytorch-nightly
[conda] pytorch-mutex 1.0 cuda pytorch-nightly
[conda] torchaudio 2.1.0.dev20230608 py310_cu118 pytorch-nightly
[conda] torchtriton 2.1.0+9820899b38 py310 pytorch-nightly
[conda] torchvision 0.16.0.dev20230608 py310_cu118 pytorch-nightly

I also printed the version of ffmpeg:

# packages in environment at /home/ubuntu/.conda/envs/torchqa_nightly:
#
# Name Version Build Channel
ffmpeg 4.4.2 gpl_h8dda1f0_112 conda-forge

I also tested the latest stable release of torchaudio (2.0.2-py310_cu118) with the same version of ffmpeg, and the StreamReader works well. |
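(As an aside, a quick way to double-check which FFmpeg libraries torchaudio actually loads at runtime is the `ffmpeg_utils` helpers, the same ones used in the test script later in this thread:)

```python
import torchaudio
from torchaudio.utils import ffmpeg_utils

print(torchaudio.__version__)
# libav* versions torchaudio loaded, e.g. libavcodec (58, 134, 100) for FFmpeg 4.4
print(ffmpeg_utils.get_versions())
# configure flags of the FFmpeg build that was picked up
print(ffmpeg_utils.get_build_config())
```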
Thanks for trying. That is strange. I tried your repro script (BTW thanks for the complete repro script) on Windows, and it worked fine. How did you install ffmpeg? I see the particular build is listed as cf-staging, but I don't know how to install it. regular |
I installed ffmpeg by running conda install -y ffmpeg=4.4.2 -c conda-forge. How do you normally install ffmpeg? |
I use the same command but it never picks up those packages from cf-staging. |
To pick up exactly the same ffmpeg, you could try pinning the exact build, as sketched below.

BTW, I ran the following three commands to create my conda environment for the test.

```bash
conda create -n test python=3.10
conda install -y ffmpeg=4.4.2 -c conda-forge
conda install -y pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch-nightly -c nvidia
```
|
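A minimal sketch of that pin, assuming the gpl_h8dda1f0_112 build reported in the conda list output above is still available on conda-forge:

```bash
# Pin the exact version and build string reported above
conda install -y "ffmpeg=4.4.2=gpl_h8dda1f0_112" -c conda-forge
```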
@mthrok Hi, were you able to reproduce the error in the end? If not, do you need me to give you a dockerfile to reproduce the environment? |
Hi - Sorry, I have not had time to look into it yet. Yes, a Docker-based repro would be nice. Thank you |
@mthrok Hi, I created a docker file as follows.

```dockerfile
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04
# install downloader
RUN apt update &&\
apt -y upgrade &&\
apt -y install wget
# install conda
ENV CONDA_DIR /opt/conda
RUN wget -P /media/ https://repo.anaconda.com/archive/Anaconda3-2023.07-1-Linux-x86_64.sh &&\
/bin/bash /media/Anaconda3-2023.07-1-Linux-x86_64.sh -b -p ${CONDA_DIR}
ENV PATH=${CONDA_DIR}/bin:$PATH
# install torchaudio environment
RUN conda create -y -n torchenv python=3.10
RUN echo "source activate torchenv" > ~/.bashrc &&\
echo "export PATH=/opt/conda/envs/torchenv/bin:$PATH"
SHELL [ "conda", "run", "-n", "torchenv", "/bin/bash", "-c"]
RUN conda install -y ffmpeg=4.4.2 -c conda-forge &&\
conda install -y pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
# change working directory
WORKDIR /app
# copy test scripts and generate test data
COPY test_01.py test_02.py ./
RUN ffmpeg -f lavfi -i mandelbrot -t 3 -c:v libx265 -pix_fmt yuv420p10le -vtag hvc1 -y test_hevc_hdr.mp4 &&\
ffmpeg -f lavfi -i mandelbrot -t 3 -c:v libx265 -pix_fmt yuv420p -vtag hvc1 -y test_hevc_sdr.mp4 &&\
ffmpeg -f lavfi -i mandelbrot -t 3 -c:v libx264 -pix_fmt yuv420p -vtag avc1 -y test_h264_sdr.mp4
```

I also put the following two test scripts in the same folder as the docker file.

test_01.py:

```python
import torch
import torchaudio
from torchaudio.utils import ffmpeg_utils
print(torch.__version__)
print(torchaudio.__version__)
print(ffmpeg_utils.get_versions())
print(ffmpeg_utils.get_build_config())
print([k for k in ffmpeg_utils.get_video_decoders().keys() if 'cuvid' in k])
```

test_02.py:

```python
from torchaudio.io import StreamReader
from pathlib import Path
def test_func(src: str, decoder: str, device: str = 'cpu'):
    if device == 'cuda':
        decode_config = {
            'buffer_chunk_size': 50,
            'decoder': f'{decoder}_cuvid',
            'hw_accel': 'cuda',
            "format": None,
        }
    else:
        decode_config = {
            'buffer_chunk_size': 50,
            'decoder': decoder,
            "decoder_option": {"threads": "0"},
            "format": "yuv420p",
        }
    video = StreamReader(src=src)
    video.add_basic_video_stream(1, **decode_config)
    stream = video.stream()
    frame, = next(stream)
    print(frame.device, frame.shape, frame.dtype)
    return frame

if __name__ == "__main__":
    root_dir = Path(__file__).parent
    test_videos = [
        'test_hevc_hdr.mp4',
        'test_hevc_sdr.mp4',
        'test_h264_sdr.mp4'
    ]
    decoders = [
        'hevc',
        'hevc',
        'h264'
    ]
    devices = [
        'cpu',
        'cuda'
    ]
    for test_video, decoder in zip(test_videos, decoders):
        for device in devices:
            src_path = root_dir / test_video
            test_func(str(src_path), decoder, device)
```

Then I built the docker image from this folder and, after the build finished, started a container from it. Inside the container, running test_01.py printed

```
2.0.1
2.0.2
{'libavutil': (56, 70, 100), 'libavcodec': (58, 134, 100), 'libavformat': (58, 76, 100), 'libavfilter': (7, 110, 100), 'libavdevice': (58, 13, 100)}
--prefix=/opt/conda/envs/torchenv --cc=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/x86_64-conda-linux-gnu-cc --cxx=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/x86_64-conda-linux-gnu-c++ --nm=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/x86_64-conda-linux-gnu-nm --ar=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/x86_64-conda-linux-gnu-ar --disable-doc --disable-openssl --enable-avresample --enable-demuxer=dash --enable-hardcoded-tables --enable-libfreetype --enable-libfontconfig --enable-libopenh264 --enable-gnutls --enable-libmp3lame --enable-libvpx --enable-pthreads --enable-vaapi --enable-gpl --enable-libx264 --enable-libx265 --enable-libaom --enable-libsvtav1 --enable-libxml2 --enable-pic --enable-shared --disable-static --enable-version3 --enable-zlib --pkg-config=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/pkg-config
['av1_cuvid', 'h264_cuvid', 'hevc_cuvid', 'mjpeg_cuvid', 'mpeg1_cuvid', 'mpeg2_cuvid', 'mpeg4_cuvid', 'vc1_cuvid', 'vp8_cuvid', 'vp9_cuvid']
```

which looks okay. Then I ran test_02.py and got

```
cpu torch.Size([1, 3, 480, 640]) torch.uint8
Segmentation fault (core dumped)
```

This dockerfile installed the stable release. |
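For completeness, a typical way to build and run such an image might look like this (the image tag is an arbitrary placeholder, and --gpus all assumes the NVIDIA container toolkit is set up on the host; these are not the exact commands used above):

```bash
# Build the image from the folder containing the Dockerfile and the two test scripts
docker build -t torchaudio-ffmpeg-repro .

# Start a container with GPU access
docker run --rm -it --gpus all torchaudio-ffmpeg-repro

# Inside the container
python test_01.py
python test_02.py
```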
@w238liu - thanks for the reproduction. I will take a look (unfortunately I don't have easy access to a GPU + Docker environment). Meanwhile, I updated the mechanism for FFmpeg integration, and now torchaudio works with FFmpeg 4, 5 and 6. |
Feel free to re-open if the issue persists. |
🐛 Describe the bug
I am trying to use the nightly build to try out this feature #3332. However, I could not figure out which ffmpeg version is compatible with the nightly build.
According to issue #3269, I first installed ffmpeg with

conda install ffmpeg=5.1.2 -c conda-forge

and then installed torchaudio with

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch-nightly -c nvidia

Then I ran the following script and got the following error message.
I then installed ffmpeg 4.4.2 in a new conda env by running

conda install -y ffmpeg=4.4.2 -c conda-forge

This time, the test script above passed. However, when I tried to decode real videos, the program stopped with a segmentation fault. Specifically, I created three test video files and ran the following script in the same folder.

The program stopped with the following message.

This error didn't happen with the latest stable release. I am not sure if it's just because the nightly build is not built with full functionality or there are some new code changes that I am not aware of.
Versions
For FFmpeg 5.1.2 env
Collecting environment information...
PyTorch version: 2.1.0.dev20230606
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-1017-aws-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.6.124
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A10G
Nvidia driver version: 510.73.08
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7R32
Stepping: 0
CPU MHz: 2799.946
BogoMIPS: 5599.89
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 128 KiB
L1i cache: 128 KiB
L2 cache: 2 MiB
L3 cache: 16 MiB
NUMA node0 CPU(s): 0-7
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save rdpid
Versions of relevant libraries:
[pip3] flake8==6.0.0
[pip3] mypy==1.3.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.3
[pip3] pytorch-lightning==2.0.2
[pip3] torch==2.1.0.dev20230606
[pip3] torchaudio==2.1.0.dev20230606
[pip3] torchmetrics==0.11.4
[pip3] torchqa==0.2.1
[pip3] torchvision==0.16.0.dev20230606
[pip3] triton==2.1.0
[conda] blas 1.0 mkl
[conda] mkl 2023.1.0 h6d00ec8_46342
[conda] mkl-service 2.4.0 py310h5eee18b_1
[conda] mkl_fft 1.3.6 py310h1128e8f_1
[conda] mkl_random 1.2.2 py310h1128e8f_1
[conda] numpy 1.24.3 py310h5f9d8c6_1
[conda] numpy-base 1.24.3 py310hb5e798b_1
[conda] pytorch 2.1.0.dev20230606 py3.10_cuda11.8_cudnn8.7.0_0 pytorch-nightly
[conda] pytorch-cuda 11.8 h7e8668a_5 pytorch-nightly
[conda] pytorch-lightning 2.0.2 pypi_0 pypi
[conda] pytorch-mutex 1.0 cuda pytorch-nightly
[conda] torchaudio 2.1.0.dev20230606 py310_cu118 pytorch-nightly
[conda] torchmetrics 0.11.4 pypi_0 pypi
[conda] torchqa 0.2.1 pypi_0 pypi
[conda] torchtriton 2.1.0+9820899b38 py310 pytorch-nightly
[conda] torchvision 0.16.0.dev20230606 py310_cu118 pytorch-nightly
For FFmpeg 4.4.2 env
Collecting environment information...
PyTorch version: 2.1.0.dev20230606
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-1017-aws-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.6.124
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A10G
Nvidia driver version: 510.73.08
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7R32
Stepping: 0
CPU MHz: 2799.946
BogoMIPS: 5599.89
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 128 KiB
L1i cache: 128 KiB
L2 cache: 2 MiB
L3 cache: 16 MiB
NUMA node0 CPU(s): 0-7
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save rdpid
Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] torch==2.1.0.dev20230606
[pip3] torchaudio==2.1.0.dev20230606
[pip3] torchvision==0.16.0.dev20230606
[pip3] triton==2.1.0
[conda] blas 1.0 mkl
[conda] mkl 2023.1.0 h6d00ec8_46342
[conda] mkl-service 2.4.0 py310h5eee18b_1
[conda] mkl_fft 1.3.6 py310h1128e8f_1
[conda] mkl_random 1.2.2 py310h1128e8f_1
[conda] numpy 1.24.3 py310h5f9d8c6_1
[conda] numpy-base 1.24.3 py310hb5e798b_1
[conda] pytorch 2.1.0.dev20230606 py3.10_cuda11.8_cudnn8.7.0_0 pytorch-nightly
[conda] pytorch-cuda 11.8 h7e8668a_5 pytorch-nightly
[conda] pytorch-mutex 1.0 cuda pytorch-nightly
[conda] torchaudio 2.1.0.dev20230606 py310_cu118 pytorch-nightly
[conda] torchtriton 2.1.0+9820899b38 py310 pytorch-nightly
[conda] torchvision 0.16.0.dev20230606 py310_cu118 pytorch-nightly