Compatibility between nightly build and ffmpeg #3411
The upgrade to FFmpeg 5 was reverted in #3377 due to inconsistent availability. I am still figuring out the best way to support FFmpeg. The segfault could be a regression introduced in the main branch. We don't have good CI coverage for the GPU decoder, so I might have missed something. I will try to look into it. |
The CPU decoder also failed with the segfault, but it seems like the CPU decoder was tested in the CI pipeline without any error? |
To investigate pytorch#3411
Yeah, and I tested it on my MacBook Pro, and it works fine. I have two hypotheses on this: 1) some issue with the FFmpeg you installed, and 2) the dlopen mechanism I introduced last week. To rule out 2, I made #3418. I will land it before tomorrow so that this feature is turned off in tomorrow's nightly build, and I would like to ask you to try again and see if the CPU decoder works and the GPU decoder throws an error instead of a segfault. |
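(As a quick illustration of that check, a minimal script along these lines should surface a catchable error from the GPU decode path once the feature is disabled, rather than a hard crash. The file name and decoder below are placeholders, not part of the original report.)

```python
from torchaudio.io import StreamReader

# Placeholder clip; any H.264 file works for this check.
reader = StreamReader(src="test_h264_sdr.mp4")
try:
    # Ask for NVDEC decoding; if GPU decoding is disabled or unavailable,
    # this is expected to raise an error instead of segfaulting.
    reader.add_basic_video_stream(
        1, decoder="h264_cuvid", hw_accel="cuda", format=None
    )
    (frame,) = next(reader.stream())
    print(frame.device, frame.shape, frame.dtype)
except RuntimeError as err:
    print("GPU decoder raised an error:", err)
```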
So I tested with the new nightly build and I still got the segfault for both the CPU and CUDA decoders. See below for the test environment.

Collecting environment information...
PyTorch version: 2.1.0.dev20230608
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-1017-aws-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.6.124
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A10G
Nvidia driver version: 510.73.08
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7R32
Stepping: 0
CPU MHz: 3002.855
BogoMIPS: 5599.58
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 128 KiB
L1i cache: 128 KiB
L2 cache: 2 MiB
L3 cache: 16 MiB
NUMA node0 CPU(s): 0-7
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save rdpid
Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] torch==2.1.0.dev20230608
[pip3] torchaudio==2.1.0.dev20230608
[pip3] torchvision==0.16.0.dev20230608
[pip3] triton==2.1.0
[conda] blas 1.0 mkl
[conda] mkl 2023.1.0 h6d00ec8_46342
[conda] mkl-service 2.4.0 py310h5eee18b_1
[conda] mkl_fft 1.3.6 py310h1128e8f_1
[conda] mkl_random 1.2.2 py310h1128e8f_1
[conda] numpy 1.24.3 py310h5f9d8c6_1
[conda] numpy-base 1.24.3 py310hb5e798b_1
[conda] pytorch 2.1.0.dev20230608 py3.10_cuda11.8_cudnn8.7.0_0 pytorch-nightly
[conda] pytorch-cuda 11.8 h7e8668a_5 pytorch-nightly
[conda] pytorch-mutex 1.0 cuda pytorch-nightly
[conda] torchaudio 2.1.0.dev20230608 py310_cu118 pytorch-nightly
[conda] torchtriton 2.1.0+9820899b38 py310 pytorch-nightly
[conda] torchvision 0.16.0.dev20230608 py310_cu118 pytorch-nightly

I also printed the version of ffmpeg:

# packages in environment at /home/ubuntu/.conda/envs/torchqa_nightly:
#
# Name Version Build Channel
ffmpeg 4.4.2 gpl_h8dda1f0_112 conda-forge

I also tested the latest stable release of torchaudio (2.0.2-py310_cu118) with the same version of ffmpeg, and the StreamReader works well. |
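(As an aside, a quick way to double-check which FFmpeg libraries torchaudio actually loads at runtime is the `ffmpeg_utils` helpers, the same ones used in the test script later in this thread:)

```python
import torchaudio
from torchaudio.utils import ffmpeg_utils

print(torchaudio.__version__)
# libav* versions torchaudio loaded, e.g. libavcodec (58, 134, 100) for FFmpeg 4.4
print(ffmpeg_utils.get_versions())
# configure flags of the FFmpeg build that was picked up
print(ffmpeg_utils.get_build_config())
```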
Thanks for trying. That is strange. I tried your repro script (BTW thanks for the complete repro script) on Windows, and it worked fine. How did you install ffmpeg? I see the particular build is listed as cf-staging, but I don't know how to install it. regular |
I installed ffmpeg by running conda install -y ffmpeg=4.4.2 -c conda-forge. How do you normally install ffmpeg? |
I use the same command but it never picks up those packages from cf-staging. |
To pick up exactly the same ffmpeg, you could try pinning the exact build, as sketched below.

BTW, I ran the following three commands to create my conda environment for the test.

```bash
conda create -n test python=3.10
conda install -y ffmpeg=4.4.2 -c conda-forge
conda install -y pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch-nightly -c nvidia
```
|
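A minimal sketch of that pin, assuming the gpl_h8dda1f0_112 build reported in the conda list output above is still available on conda-forge:

```bash
# Pin the exact version and build string reported above
conda install -y "ffmpeg=4.4.2=gpl_h8dda1f0_112" -c conda-forge
```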
@mthrok Hi, were you able to reproduce the error in the end? If not, do you need me to give you a dockerfile to reproduce the environment? |
Hi - Sorry, I have not had time to look into it yet. Yes, a Docker-based repro would be nice. Thank you |
@mthrok Hi, I created a docker file as follows.

```dockerfile
FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04
# install downloader
RUN apt update &&\
apt -y upgrade &&\
apt -y install wget
# install conda
ENV CONDA_DIR /opt/conda
RUN wget -P /media/ https://repo.anaconda.com/archive/Anaconda3-2023.07-1-Linux-x86_64.sh &&\
/bin/bash /media/Anaconda3-2023.07-1-Linux-x86_64.sh -b -p ${CONDA_DIR}
ENV PATH=${CONDA_DIR}/bin:$PATH
# install torchaudio environment
RUN conda create -y -n torchenv python=3.10
RUN echo "source activate torchenv" > ~/.bashrc &&\
echo "export PATH=/opt/conda/envs/torchenv/bin:$PATH"
SHELL [ "conda", "run", "-n", "torchenv", "/bin/bash", "-c"]
RUN conda install -y ffmpeg=4.4.2 -c conda-forge &&\
conda install -y pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
# change working directory
WORKDIR /app
# copy test scripts and generate test data
COPY test_01.py test_02.py ./
RUN ffmpeg -f lavfi -i mandelbrot -t 3 -c:v libx265 -pix_fmt yuv420p10le -vtag hvc1 -y test_hevc_hdr.mp4 &&\
ffmpeg -f lavfi -i mandelbrot -t 3 -c:v libx265 -pix_fmt yuv420p -vtag hvc1 -y test_hevc_sdr.mp4 &&\
ffmpeg -f lavfi -i mandelbrot -t 3 -c:v libx264 -pix_fmt yuv420p -vtag avc1 -y test_h264_sdr.mp4
```

I also put the following two test scripts in the same folder as the docker file.

test_01.py:

```python
import torch
import torchaudio
from torchaudio.utils import ffmpeg_utils
print(torch.__version__)
print(torchaudio.__version__)
print(ffmpeg_utils.get_versions())
print(ffmpeg_utils.get_build_config())
print([k for k in ffmpeg_utils.get_video_decoders().keys() if 'cuvid' in k])
```

test_02.py:

```python
from torchaudio.io import StreamReader
from pathlib import Path
def test_func(src: str, decoder: str, device: str = 'cpu'):
    if device == 'cuda':
        decode_config = {
            'buffer_chunk_size': 50,
            'decoder': f'{decoder}_cuvid',
            'hw_accel': 'cuda',
            "format": None,
        }
    else:
        decode_config = {
            'buffer_chunk_size': 50,
            'decoder': decoder,
            "decoder_option": {"threads": "0"},
            "format": "yuv420p",
        }
    video = StreamReader(src=src)
    video.add_basic_video_stream(1, **decode_config)
    stream = video.stream()
    frame, = next(stream)
    print(frame.device, frame.shape, frame.dtype)
    return frame

if __name__ == "__main__":
    root_dir = Path(__file__).parent
    test_videos = [
        'test_hevc_hdr.mp4',
        'test_hevc_sdr.mp4',
        'test_h264_sdr.mp4'
    ]
    decoders = [
        'hevc',
        'hevc',
        'h264'
    ]
    devices = [
        'cpu',
        'cuda'
    ]
    for test_video, decoder in zip(test_videos, decoders):
        for device in devices:
            src_path = root_dir / test_video
            test_func(str(src_path), decoder, device)
```

Then I built the docker image from this folder and, after the build finished, started a container from it. Inside the container, running test_01.py printed

```
2.0.1
2.0.2
{'libavutil': (56, 70, 100), 'libavcodec': (58, 134, 100), 'libavformat': (58, 76, 100), 'libavfilter': (7, 110, 100), 'libavdevice': (58, 13, 100)}
--prefix=/opt/conda/envs/torchenv --cc=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/x86_64-conda-linux-gnu-cc --cxx=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/x86_64-conda-linux-gnu-c++ --nm=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/x86_64-conda-linux-gnu-nm --ar=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/x86_64-conda-linux-gnu-ar --disable-doc --disable-openssl --enable-avresample --enable-demuxer=dash --enable-hardcoded-tables --enable-libfreetype --enable-libfontconfig --enable-libopenh264 --enable-gnutls --enable-libmp3lame --enable-libvpx --enable-pthreads --enable-vaapi --enable-gpl --enable-libx264 --enable-libx265 --enable-libaom --enable-libsvtav1 --enable-libxml2 --enable-pic --enable-shared --disable-static --enable-version3 --enable-zlib --pkg-config=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/pkg-config
['av1_cuvid', 'h264_cuvid', 'hevc_cuvid', 'mjpeg_cuvid', 'mpeg1_cuvid', 'mpeg2_cuvid', 'mpeg4_cuvid', 'vc1_cuvid', 'vp8_cuvid', 'vp9_cuvid']
```

which looks okay. Then I ran test_02.py and got

```
cpu torch.Size([1, 3, 480, 640]) torch.uint8
Segmentation fault (core dumped)
```

This dockerfile installed the stable release. |
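For completeness, a typical way to build and run such an image might look like this (the image tag is an arbitrary placeholder, and --gpus all assumes the NVIDIA container toolkit is set up on the host; these are not the exact commands used above):

```bash
# Build the image from the folder containing the Dockerfile and the two test scripts
docker build -t torchaudio-ffmpeg-repro .

# Start a container with GPU access
docker run --rm -it --gpus all torchaudio-ffmpeg-repro

# Inside the container
python test_01.py
python test_02.py
```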
@w238liu - thanks for the reproduction. I will take a look (unfortunately I don't have easy access to a GPU + Docker environment). Meanwhile, I updated the mechanism for FFmpeg integration, and now torchaudio works with FFmpeg 4, 5 and 6. |
Feel free to re-open if the issue persists. |
🐛 Describe the bug
I am trying to use the nightly build to try out this feature #3332. However, I could not figure out which ffmpeg version is compatible with the nightly build.
According to issue #3269, I first installed ffmpeg with

conda install ffmpeg=5.1.2 -c conda-forge

and then installed torchaudio with

conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch-nightly -c nvidia

Then I ran the following script and got the following error message.
I then installed ffmpeg 4.4.2 in a new conda env by running

conda install -y ffmpeg=4.4.2 -c conda-forge

This time, the test script above passed. However, when I tried to decode real videos, the program stopped with a segmentation fault. Specifically, I created three test video files and ran the following script in the same folder.

The program stopped with the following message.

This error didn't happen with the latest stable release. I am not sure if it's just because the nightly build is not built with full functionality or there are some new code changes that I am not aware of.
Versions
For FFmpeg 5.1.2 env
Collecting environment information...
PyTorch version: 2.1.0.dev20230606
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-1017-aws-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.6.124
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A10G
Nvidia driver version: 510.73.08
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7R32
Stepping: 0
CPU MHz: 2799.946
BogoMIPS: 5599.89
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 128 KiB
L1i cache: 128 KiB
L2 cache: 2 MiB
L3 cache: 16 MiB
NUMA node0 CPU(s): 0-7
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save rdpid
Versions of relevant libraries:
[pip3] flake8==6.0.0
[pip3] mypy==1.3.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.3
[pip3] pytorch-lightning==2.0.2
[pip3] torch==2.1.0.dev20230606
[pip3] torchaudio==2.1.0.dev20230606
[pip3] torchmetrics==0.11.4
[pip3] torchqa==0.2.1
[pip3] torchvision==0.16.0.dev20230606
[pip3] triton==2.1.0
[conda] blas 1.0 mkl
[conda] mkl 2023.1.0 h6d00ec8_46342
[conda] mkl-service 2.4.0 py310h5eee18b_1
[conda] mkl_fft 1.3.6 py310h1128e8f_1
[conda] mkl_random 1.2.2 py310h1128e8f_1
[conda] numpy 1.24.3 py310h5f9d8c6_1
[conda] numpy-base 1.24.3 py310hb5e798b_1
[conda] pytorch 2.1.0.dev20230606 py3.10_cuda11.8_cudnn8.7.0_0 pytorch-nightly
[conda] pytorch-cuda 11.8 h7e8668a_5 pytorch-nightly
[conda] pytorch-lightning 2.0.2 pypi_0 pypi
[conda] pytorch-mutex 1.0 cuda pytorch-nightly
[conda] torchaudio 2.1.0.dev20230606 py310_cu118 pytorch-nightly
[conda] torchmetrics 0.11.4 pypi_0 pypi
[conda] torchqa 0.2.1 pypi_0 pypi
[conda] torchtriton 2.1.0+9820899b38 py310 pytorch-nightly
[conda] torchvision 0.16.0.dev20230606 py310_cu118 pytorch-nightly
For FFmpeg 4.4.2 env
Collecting environment information...
PyTorch version: 2.1.0.dev20230606
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-1017-aws-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.6.124
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A10G
Nvidia driver version: 510.73.08
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7R32
Stepping: 0
CPU MHz: 2799.946
BogoMIPS: 5599.89
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 128 KiB
L1i cache: 128 KiB
L2 cache: 2 MiB
L3 cache: 16 MiB
NUMA node0 CPU(s): 0-7
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save rdpid
Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] torch==2.1.0.dev20230606
[pip3] torchaudio==2.1.0.dev20230606
[pip3] torchvision==0.16.0.dev20230606
[pip3] triton==2.1.0
[conda] blas 1.0 mkl
[conda] mkl 2023.1.0 h6d00ec8_46342
[conda] mkl-service 2.4.0 py310h5eee18b_1
[conda] mkl_fft 1.3.6 py310h1128e8f_1
[conda] mkl_random 1.2.2 py310h1128e8f_1
[conda] numpy 1.24.3 py310h5f9d8c6_1
[conda] numpy-base 1.24.3 py310hb5e798b_1
[conda] pytorch 2.1.0.dev20230606 py3.10_cuda11.8_cudnn8.7.0_0 pytorch-nightly
[conda] pytorch-cuda 11.8 h7e8668a_5 pytorch-nightly
[conda] pytorch-mutex 1.0 cuda pytorch-nightly
[conda] torchaudio 2.1.0.dev20230606 py310_cu118 pytorch-nightly
[conda] torchtriton 2.1.0+9820899b38 py310 pytorch-nightly
[conda] torchvision 0.16.0.dev20230606 py310_cu118 pytorch-nightly