Compatibility between nightly build and ffmpeg #3411

Closed
w238liu opened this issue Jun 7, 2023 · 14 comments

w238liu commented Jun 7, 2023

🐛 Describe the bug

I am trying the nightly build to test the feature from #3332. However, I could not figure out which FFmpeg version is compatible with the nightly build.

Following issue #3269, I first installed ffmpeg with conda install ffmpeg=5.1.2 -c conda-forge, and then installed torchaudio with conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch-nightly -c nvidia. Then I ran the following script:

import torch
import torchaudio
from torchaudio.utils import ffmpeg_utils


print(torch.__version__)
print(torchaudio.__version__)
print(ffmpeg_utils.get_versions())
print(ffmpeg_utils.get_build_config())
print([k for k in ffmpeg_utils.get_video_decoders().keys() if 'cuvid' in k])

and got the following error message

2.1.0.dev20230606
2.1.0.dev20230606
Traceback (most recent call last):
  File "/home/ubuntu/.conda/envs/torchqa_nightly/lib/python3.10/site-packages/torchaudio/_extension/utils.py", line 134, in wrapped
    _init_ffmpeg()
  File "/home/ubuntu/.conda/envs/torchqa_nightly/lib/python3.10/site-packages/torchaudio/_extension/utils.py", line 91, in _init_ffmpeg
    torchaudio.lib._torchaudio_ffmpeg.init()
RuntimeError: Error in dlopen: /lib/x86_64-linux-gnu/libgobject-2.0.so.0: undefined symbol: ffi_type_uint32, version LIBFFI_BASE_7.0
Exception raised from DynamicLibrary at /opt/conda/conda-bld/pytorch_1686036062101/work/aten/src/ATen/DynamicLibrary.cpp:38 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fe50e5c3477 in /home/ubuntu/.conda/envs/torchqa_nightly/lib/python3.10/site-packages/torch/lib/libc10.so)
frame #1: <unknown function> + 0xd9699c (0x7fe557f1899c in /home/ubuntu/.conda/envs/torchqa_nightly/lib/python3.10/site-packages/torch/lib/libtorch_cpu.so)
frame #2: torchaudio::io::detail::ffmpeg_stub() + 0x94 (0x7fe4f3cf0054 in /home/ubuntu/.conda/envs/torchqa_nightly/lib/python3.10/site-packages/torchaudio/lib/libtorchaudio_ffmpeg.so)
frame #3: <unknown function> + 0xef49 (0x7fe4f3c93f49 in /home/ubuntu/.conda/envs/torchqa_nightly/lib/python3.10/site-packages/torchaudio/lib/_torchaudio_ffmpeg.so)
frame #4: <unknown function> + 0x2beb7 (0x7fe4f3cb0eb7 in /home/ubuntu/.conda/envs/torchqa_nightly/lib/python3.10/site-packages/torchaudio/lib/_torchaudio_ffmpeg.so)
frame #5: python() [0x4fc887]
<omitting python frames>
frame #12: python() [0x592592]
frame #14: python() [0x5c32c7]
frame #15: python() [0x5be400]
frame #16: python() [0x4598ca]
frame #21: __libc_start_main + 0xf3 (0x7fe5b46f1083 in /lib/x86_64-linux-gnu/libc.so.6)
frame #22: python() [0x5854ee]


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ubuntu/git/ssimplus-library/research/TorchQA/tmp/test_torchaudio_sr/test_torchaudio.py", line 8, in <module>
    print(ffmpeg_utils.get_versions())
  File "/home/ubuntu/.conda/envs/torchqa_nightly/lib/python3.10/site-packages/torchaudio/_extension/utils.py", line 136, in wrapped
    raise RuntimeError(
RuntimeError: get_versions requires FFmpeg extension which is not available. Please refer to the stacktrace above for how to resolve this.
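
The first traceback above fails inside dlopen, before any torchaudio decoding code runs. For anyone hitting a similar error, a minimal sketch of probing the loader directly with the standard-library ctypes module follows. The helper name try_dlopen and the list of library short names are assumptions made for illustration; the probe only shows whether each library can be loaded in the current process and does not explain why a load fails.

```python
import ctypes
import ctypes.util


def try_dlopen(name: str) -> tuple[bool, str]:
    """Try to load a shared library and return (success, path or error text)."""
    # find_library resolves short names such as "avcodec"; if it finds nothing,
    # fall back to the raw name so that full paths can be passed as well.
    path = ctypes.util.find_library(name) or name
    try:
        ctypes.CDLL(path)
    except OSError as exc:
        # The loader's message (e.g. an undefined symbol) ends up in exc.
        return False, f"{path}: {exc}"
    return True, path


if __name__ == "__main__":
    # Assumed short names of the usual FFmpeg component libraries.
    for lib in ("avutil", "avcodec", "avformat", "avfilter", "avdevice"):
        ok, detail = try_dlopen(lib)
        print(f"{lib:10s} {'ok' if ok else 'FAILED'}  {detail}")
```

Running this inside the same conda environment as torchaudio may surface the loader message without the surrounding Python stack, although the library search order can differ from what torchaudio itself uses.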

I then installed ffmpeg 4.4.2 in a new conda env by running conda install -y ffmpeg=4.4.2 -c conda-forge. This time, the test script above passed. However, when I tried to decode real videos, the program stopped with a segmentation fault. Specifically, I created three test video files:

ffmpeg -f lavfi -i mandelbrot -t 3 -c:v libx265 -pix_fmt yuv420p10le -vtag hvc1 -y test_hevc_hdr.mp4
ffmpeg -f lavfi -i mandelbrot -t 3 -c:v libx265 -pix_fmt yuv420p -vtag hvc1 -y test_hevc_sdr.mp4
ffmpeg -f lavfi -i mandelbrot -t 3 -c:v libx264 -pix_fmt yuv420p -vtag avc1 -y test_h264_sdr.mp4

and ran the following script in the same folder

from torchaudio.io import StreamReader
from pathlib import Path


def test_func(src: str, decoder: str, device: str = 'cpu'):
    if device == 'cuda':
        decode_config = {
            'buffer_chunk_size': 50,
            'decoder': f'{decoder}_cuvid',
            'hw_accel': 'cuda',
            "format": None,
        }
    else:
        decode_config = {
            'buffer_chunk_size': 50,
            'decoder': decoder,
            "decoder_option": {"threads": "0"},
            "format": "yuv420p",
        }

    video = StreamReader(src=src)

    video.add_basic_video_stream(1, **decode_config)

    stream = video.stream()
    frame, = next(stream)

    print(frame.device, frame.shape, frame.dtype)
    return frame


if __name__ == "__main__":
    root_dir = Path('.')
    test_videos = [
        'test_hevc_hdr.mp4',
        'test_hevc_sdr.mp4',
        'test_h264_sdr.mp4'
    ]
    decoders = [
        'hevc',
        'hevc',
        'h264'
    ]
    devices = [
        'cpu',
        'cuda'
    ]

    for test_video, decoder in zip(test_videos, decoders):
        for device in devices:
            src_path = root_dir / test_video
            test_func(str(src_path), decoder, device)

The program stopped with the following message

[W conversion.cpp:210] Warning: The output format YUV420P is selected. This will be implicitly converted to YUV444P, in which all the color components Y, U, V have the same dimension. (function operator())
Segmentation fault (core dumped)

This error didn't happen with the latest stable release. I am not sure whether the nightly build is simply not built with full functionality, or whether there are new code changes that I am not aware of.
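
Because a segmentation fault kills the whole interpreter, the loop in the script above stops at the first failing combination. One way to see which combinations crash is to run each one in a child process. The following is a minimal sketch using only the standard library; the helper name run_isolated is hypothetical, and the interpretation of a negative return code as a signal number applies to POSIX systems.

```python
import signal
import subprocess
import sys


def run_isolated(code: str, timeout: float = 60.0) -> tuple[int, str]:
    """Run a Python snippet in a child interpreter; return (returncode, description)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    rc = proc.returncode
    if rc < 0:
        # On POSIX a negative return code means the child was killed by that signal.
        return rc, f"killed by {signal.Signals(-rc).name}"
    return rc, ("ok" if rc == 0 else f"exited with status {rc}")


if __name__ == "__main__":
    # Demonstration only: the child sends SIGSEGV to itself to simulate a crash.
    crash = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
    print(run_isolated(crash))
```

For the repro script, the snippet passed to run_isolated would call test_func for a single video, decoder and device, so that a crash in one combination does not hide the results of the others.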

Versions

For FFmpeg 5.1.2 env

Collecting environment information...
PyTorch version: 2.1.0.dev20230606
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-1017-aws-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.6.124
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A10G
Nvidia driver version: 510.73.08
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7R32
Stepping: 0
CPU MHz: 2799.946
BogoMIPS: 5599.89
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 128 KiB
L1i cache: 128 KiB
L2 cache: 2 MiB
L3 cache: 16 MiB
NUMA node0 CPU(s): 0-7
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save rdpid

Versions of relevant libraries:
[pip3] flake8==6.0.0
[pip3] mypy==1.3.0
[pip3] mypy-extensions==1.0.0
[pip3] numpy==1.24.3
[pip3] pytorch-lightning==2.0.2
[pip3] torch==2.1.0.dev20230606
[pip3] torchaudio==2.1.0.dev20230606
[pip3] torchmetrics==0.11.4
[pip3] torchqa==0.2.1
[pip3] torchvision==0.16.0.dev20230606
[pip3] triton==2.1.0
[conda] blas 1.0 mkl
[conda] mkl 2023.1.0 h6d00ec8_46342
[conda] mkl-service 2.4.0 py310h5eee18b_1
[conda] mkl_fft 1.3.6 py310h1128e8f_1
[conda] mkl_random 1.2.2 py310h1128e8f_1
[conda] numpy 1.24.3 py310h5f9d8c6_1
[conda] numpy-base 1.24.3 py310hb5e798b_1
[conda] pytorch 2.1.0.dev20230606 py3.10_cuda11.8_cudnn8.7.0_0 pytorch-nightly
[conda] pytorch-cuda 11.8 h7e8668a_5 pytorch-nightly
[conda] pytorch-lightning 2.0.2 pypi_0 pypi
[conda] pytorch-mutex 1.0 cuda pytorch-nightly
[conda] torchaudio 2.1.0.dev20230606 py310_cu118 pytorch-nightly
[conda] torchmetrics 0.11.4 pypi_0 pypi
[conda] torchqa 0.2.1 pypi_0 pypi
[conda] torchtriton 2.1.0+9820899b38 py310 pytorch-nightly
[conda] torchvision 0.16.0.dev20230606 py310_cu118 pytorch-nightly

For FFmpeg 4.4.2 env

Collecting environment information...
PyTorch version: 2.1.0.dev20230606
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-1017-aws-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.6.124
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A10G
Nvidia driver version: 510.73.08
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7R32
Stepping: 0
CPU MHz: 2799.946
BogoMIPS: 5599.89
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 128 KiB
L1i cache: 128 KiB
L2 cache: 2 MiB
L3 cache: 16 MiB
NUMA node0 CPU(s): 0-7
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save rdpid

Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] torch==2.1.0.dev20230606
[pip3] torchaudio==2.1.0.dev20230606
[pip3] torchvision==0.16.0.dev20230606
[pip3] triton==2.1.0
[conda] blas 1.0 mkl
[conda] mkl 2023.1.0 h6d00ec8_46342
[conda] mkl-service 2.4.0 py310h5eee18b_1
[conda] mkl_fft 1.3.6 py310h1128e8f_1
[conda] mkl_random 1.2.2 py310h1128e8f_1
[conda] numpy 1.24.3 py310h5f9d8c6_1
[conda] numpy-base 1.24.3 py310hb5e798b_1
[conda] pytorch 2.1.0.dev20230606 py3.10_cuda11.8_cudnn8.7.0_0 pytorch-nightly
[conda] pytorch-cuda 11.8 h7e8668a_5 pytorch-nightly
[conda] pytorch-mutex 1.0 cuda pytorch-nightly
[conda] torchaudio 2.1.0.dev20230606 py310_cu118 pytorch-nightly
[conda] torchtriton 2.1.0+9820899b38 py310 pytorch-nightly
[conda] torchvision 0.16.0.dev20230606 py310_cu118 pytorch-nightly

Collaborator

mthrok commented Jun 7, 2023

The upgrade to FFmpeg 5 was reverted in #3377, due to inconsistent availability. I am still figuring out the best way to support FFmpeg.

The segfault could be a regression introduced in the main branch. We don't have good CI for the GPU decoder, so I might have missed something. I will try to look into it.

Author

w238liu commented Jun 7, 2023

CPU decoder also failed with the segfault, but seems like CPU decoder was tested in the CI pipeline without any error?

mthrok added a commit to mthrok/audio that referenced this issue Jun 7, 2023
Collaborator

mthrok commented Jun 7, 2023

> CPU decoder also failed with the segfault, but seems like CPU decoder was tested in the CI pipeline without any error?

Yeah, and I tested it on my MacBook Pro, and it works fine. I have two hypotheses on this: (1) some issue with the FFmpeg you installed, and (2) the dlopen I introduced last week.

To rule out 2, I made #3418. I will land it before tomorrow so that this feature is turned off in tomorrow's nightly build, and I would like to ask you to try again and see if the CPU decoder works and the GPU decoder throws an error instead of a segfault.

facebook-github-bot pushed a commit that referenced this issue Jun 7, 2023
Summary:
To investigate #3411

Pull Request resolved: #3418

Differential Revision: D46535891

Pulled By: mthrok

fbshipit-source-id: b90bba399eb54f9f0ae073bd590cd8a46054ed7e
Author

w238liu commented Jun 8, 2023

> > CPU decoder also failed with the segfault, but seems like CPU decoder was tested in the CI pipeline without any error?

> Yeah, and I tested it on my MacBook Pro, and it works fine. I have two hypotheses on this: (1) some issue with the FFmpeg you installed, and (2) the dlopen I introduced last week.

> To rule out 2, I made #3418. I will land it before tomorrow so that this feature is turned off in tomorrow's nightly build, and I would like to ask you to try again and see if the CPU decoder works and the GPU decoder throws an error instead of a segfault.

I tested with the new nightly build and still got the segfault for both the CPU and CUDA decoders. See below for the test environment.

Collecting environment information...
PyTorch version: 2.1.0.dev20230608
Is debug build: False
CUDA used to build PyTorch: 11.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.6 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.10.11 (main, Apr 20 2023, 19:02:41) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-1017-aws-x86_64-with-glibc2.31
Is CUDA available: True
CUDA runtime version: 11.6.124
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA A10G
Nvidia driver version: 510.73.08
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   48 bits physical, 48 bits virtual
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              2
Core(s) per socket:              4
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD EPYC 7R32
Stepping:                        0
CPU MHz:                         3002.855
BogoMIPS:                        5599.58
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       128 KiB
L1i cache:                       128 KiB
L2 cache:                        2 MiB
L3 cache:                        16 MiB
NUMA node0 CPU(s):               0-7
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Not affected
Vulnerability Retbleed:          Mitigation; untrained return thunk; SMT enabled with STIBP protection
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP always-on, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext ssbd ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr rdpru wbnoinvd arat npt nrip_save rdpid

Versions of relevant libraries:
[pip3] numpy==1.24.3
[pip3] torch==2.1.0.dev20230608
[pip3] torchaudio==2.1.0.dev20230608
[pip3] torchvision==0.16.0.dev20230608
[pip3] triton==2.1.0
[conda] blas                      1.0                         mkl  
[conda] mkl                       2023.1.0         h6d00ec8_46342  
[conda] mkl-service               2.4.0           py310h5eee18b_1  
[conda] mkl_fft                   1.3.6           py310h1128e8f_1  
[conda] mkl_random                1.2.2           py310h1128e8f_1  
[conda] numpy                     1.24.3          py310h5f9d8c6_1  
[conda] numpy-base                1.24.3          py310hb5e798b_1  
[conda] pytorch                   2.1.0.dev20230608 py3.10_cuda11.8_cudnn8.7.0_0    pytorch-nightly
[conda] pytorch-cuda              11.8                 h7e8668a_5    pytorch-nightly
[conda] pytorch-mutex             1.0                        cuda    pytorch-nightly
[conda] torchaudio                2.1.0.dev20230608     py310_cu118    pytorch-nightly
[conda] torchtriton               2.1.0+9820899b38           py310    pytorch-nightly
[conda] torchvision               0.16.0.dev20230608     py310_cu118    pytorch-nightly

I also printed the version of ffmpeg with conda list ffmpeg, and the output is

# packages in environment at /home/ubuntu/.conda/envs/torchqa_nightly:
#
# Name                    Version                   Build  Channel
ffmpeg                    4.4.2           gpl_h8dda1f0_112    conda-forge

I also tested the latest stable release of torchaudio (2.0.2-py310_cu118) with the same version of ffmpeg, and the StreamReader works well.

Collaborator

mthrok commented Jun 8, 2023

> > > CPU decoder also failed with the segfault, but seems like CPU decoder was tested in the CI pipeline without any error?

> > Yeah, and I tested it on my MacBook Pro, and it works fine. I have two hypotheses on this: (1) some issue with the FFmpeg you installed, and (2) the dlopen I introduced last week.
> > To rule out 2, I made #3418. I will land it before tomorrow so that this feature is turned off in tomorrow's nightly build, and I would like to ask you to try again and see if the CPU decoder works and the GPU decoder throws an error instead of a segfault.

> I tested with the new nightly build and still got the segfault for both the CPU and CUDA decoders. See below for the test environment.

> I also tested the latest stable release of torchaudio (2.0.2-py310_cu118) with the same version of ffmpeg, and the StreamReader works well.

Thanks for trying. That is strange. I tried your repro script on Windows (BTW, thanks for the complete repro script) and it worked fine.

How did you install ffmpeg? I see that particular build is listed under cf-staging, but I don't know how to install it. A regular conda install -c conda-forge ffmpeg does not pick it up.

Author

w238liu commented Jun 9, 2023

> How did you install ffmpeg? I see that particular build is listed under cf-staging, but I don't know how to install it. A regular conda install -c conda-forge ffmpeg does not pick it up.

I installed ffmpeg by conda install -y ffmpeg=4.4.2 -c conda-forge.

How do you normally install ffmpeg?

Collaborator

mthrok commented Jun 9, 2023

> > How did you install ffmpeg? I see that particular build is listed under cf-staging, but I don't know how to install it. A regular conda install -c conda-forge ffmpeg does not pick it up.

> I installed ffmpeg by conda install -y ffmpeg=4.4.2 -c conda-forge.

> How do you normally install ffmpeg?

I use the same command but it never picks up those packages from cf-staging.

Author

w238liu commented Jun 9, 2023

> > > How did you install ffmpeg? I see that particular build is listed under cf-staging, but I don't know how to install it. A regular conda install -c conda-forge ffmpeg does not pick it up.

> > I installed ffmpeg by conda install -y ffmpeg=4.4.2 -c conda-forge.
> > How do you normally install ffmpeg?

> I use the same command but it never picks up those packages from cf-staging.

To pick up exactly the same ffmpeg, you could try conda install ffmpeg=4.4.2=gpl_h8dda1f0_112 -c conda-forge
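
The pinned spec in the command above has the form name=version=build, and the three fields can be read straight from the conda list output posted earlier in the thread. A minimal sketch of deriving the spec from that output follows; the function name conda_pin_from_list_output is hypothetical, and the parsing assumes the whitespace-separated column layout shown in that listing.

```python
def conda_pin_from_list_output(text: str, package: str) -> str:
    """Build a `name=version=build` spec from `conda list <package>` output."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the commented header
        fields = line.split()
        if len(fields) >= 3 and fields[0] == package:
            name, version, build = fields[:3]
            return f"{name}={version}={build}"
    raise ValueError(f"{package!r} not found in conda list output")


# Listing copied from the conda list ffmpeg output earlier in this thread.
listing = """\
# packages in environment at /home/ubuntu/.conda/envs/torchqa_nightly:
#
# Name                    Version                   Build  Channel
ffmpeg                    4.4.2           gpl_h8dda1f0_112    conda-forge
"""
print(conda_pin_from_list_output(listing, "ffmpeg"))  # ffmpeg=4.4.2=gpl_h8dda1f0_112
```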

BTW, I ran the following three commands to create my conda environment for the test.

conda create -n test python=3.10
conda install -y ffmpeg=4.4.2 -c conda-forge
conda install -y pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch-nightly -c nvidia

Author

w238liu commented Jun 16, 2023

@mthrok Hi, were you able to reproduce the error in the end? If not, do you need me to give you a Dockerfile to reproduce the environment?

Collaborator

mthrok commented Jun 16, 2023

> @mthrok Hi, were you able to reproduce the error in the end? If not, do you need me to give you a Dockerfile to reproduce the environment?

Hi, sorry, I have not had time to look into it yet. Yes, a Docker-based repro would be nice. Thank you.

Author

w238liu commented Jul 20, 2023

@mthrok Hi, I created a docker file named test.dockerfile as below to reproduce the segmentation fault I encountered.

FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04

# install downloader
RUN apt update &&\
    apt -y upgrade &&\
    apt -y install wget

# install conda
ENV CONDA_DIR /opt/conda
RUN wget -P /media/ https://repo.anaconda.com/archive/Anaconda3-2023.07-1-Linux-x86_64.sh &&\
    /bin/bash /media/Anaconda3-2023.07-1-Linux-x86_64.sh -b -p ${CONDA_DIR}
ENV PATH=${CONDA_DIR}/bin:$PATH

# install torchaudio environment
RUN conda create -y -n torchenv python=3.10

RUN echo "source activate torchenv" > ~/.bashrc &&\
    echo "export PATH=/opt/conda/envs/torchenv/bin:$PATH" >> ~/.bashrc

SHELL [ "conda", "run", "-n", "torchenv", "/bin/bash", "-c"]
RUN conda install -y ffmpeg=4.4.2 -c conda-forge &&\
    conda install -y pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia

# change working directory
WORKDIR /app

# copy test scripts and generate test data
COPY test_01.py test_02.py ./
RUN ffmpeg -f lavfi -i mandelbrot -t 3 -c:v libx265 -pix_fmt yuv420p10le -vtag hvc1 -y test_hevc_hdr.mp4 &&\
    ffmpeg -f lavfi -i mandelbrot -t 3 -c:v libx265 -pix_fmt yuv420p -vtag hvc1 -y test_hevc_sdr.mp4 &&\
    ffmpeg -f lavfi -i mandelbrot -t 3 -c:v libx264 -pix_fmt yuv420p -vtag avc1 -y test_h264_sdr.mp4

I also put the following two test scripts in the same folder as the Dockerfile.

test_01.py

import torch
import torchaudio
from torchaudio.utils import ffmpeg_utils


print(torch.__version__)
print(torchaudio.__version__)
print(ffmpeg_utils.get_versions())
print(ffmpeg_utils.get_build_config())
print([k for k in ffmpeg_utils.get_video_decoders().keys() if 'cuvid' in k])

test_02.py

from torchaudio.io import StreamReader
from pathlib import Path


def test_func(src: str, decoder: str, device: str = 'cpu'):
    if device == 'cuda':
        decode_config = {
            'buffer_chunk_size': 50,
            'decoder': f'{decoder}_cuvid',
            'hw_accel': 'cuda',
            "format": None,
        }
    else:
        decode_config = {
            'buffer_chunk_size': 50,
            'decoder': decoder,
            "decoder_option": {"threads": "0"},
            "format": "yuv420p",
        }

    video = StreamReader(src=src)

    video.add_basic_video_stream(1, **decode_config)

    stream = video.stream()
    frame, = next(stream)

    print(frame.device, frame.shape, frame.dtype)
    return frame


if __name__ == "__main__":
    root_dir = Path(__file__).parent
    test_videos = [
        'test_hevc_hdr.mp4',
        'test_hevc_sdr.mp4',
        'test_h264_sdr.mp4'
    ]
    decoders = [
        'hevc',
        'hevc',
        'h264'
    ]
    devices = [
        'cpu',
        'cuda'
    ]

    for test_video, decoder in zip(test_videos, decoders):
        for device in devices:
            src_path = root_dir / test_video
            test_func(str(src_path), decoder, device)

Then I built the image in the same folder by running
docker build -t 11.8.0-cudnn8-ubuntu20.04:test -f test.dockerfile .

After the image was built, I started a container with
docker run -it --gpus all 11.8.0-cudnn8-ubuntu20.04:test /bin/bash

In the /app folder, I directly ran python test_01.py, and got the following

2.0.1
2.0.2
{'libavutil': (56, 70, 100), 'libavcodec': (58, 134, 100), 'libavformat': (58, 76, 100), 'libavfilter': (7, 110, 100), 'libavdevice': (58, 13, 100)}
--prefix=/opt/conda/envs/torchenv --cc=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/x86_64-conda-linux-gnu-cc --cxx=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/x86_64-conda-linux-gnu-c++ --nm=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/x86_64-conda-linux-gnu-nm --ar=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/x86_64-conda-linux-gnu-ar --disable-doc --disable-openssl --enable-avresample --enable-demuxer=dash --enable-hardcoded-tables --enable-libfreetype --enable-libfontconfig --enable-libopenh264 --enable-gnutls --enable-libmp3lame --enable-libvpx --enable-pthreads --enable-vaapi --enable-gpl --enable-libx264 --enable-libx265 --enable-libaom --enable-libsvtav1 --enable-libxml2 --enable-pic --enable-shared --disable-static --enable-version3 --enable-zlib --pkg-config=/home/conda/feedstock_root/build_artifacts/ffmpeg_1671040255947/_build_env/bin/pkg-config
['av1_cuvid', 'h264_cuvid', 'hevc_cuvid', 'mjpeg_cuvid', 'mpeg1_cuvid', 'mpeg2_cuvid', 'mpeg4_cuvid', 'vc1_cuvid', 'vp8_cuvid', 'vp9_cuvid']

which looks okay.
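
The version dictionary printed above reports the libav* library versions, not the FFmpeg release number. A minimal sketch of mapping the libavcodec major version back to the FFmpeg major release follows. The lookup table and the function name ffmpeg_major are written for this illustration and cover only FFmpeg 4, 5 and 6, the versions discussed in this thread.

```python
# libavcodec major version -> FFmpeg major release (FFmpeg 4.x, 5.x, 6.x only).
LIBAVCODEC_TO_FFMPEG = {58: 4, 59: 5, 60: 6}


def ffmpeg_major(versions: dict) -> int:
    """Return the FFmpeg major release implied by a get_versions()-style dict."""
    major = versions["libavcodec"][0]
    try:
        return LIBAVCODEC_TO_FFMPEG[major]
    except KeyError:
        raise ValueError(f"unrecognized libavcodec major version: {major}") from None


# Dictionary copied from the test_01.py output above.
reported = {
    "libavutil": (56, 70, 100),
    "libavcodec": (58, 134, 100),
    "libavformat": (58, 76, 100),
    "libavfilter": (7, 110, 100),
    "libavdevice": (58, 13, 100),
}
print(ffmpeg_major(reported))  # 4
```

A result of 4 is consistent with the ffmpeg=4.4.2 package installed in the Dockerfile, which suggests torchaudio loaded the conda-provided libraries and not some other copy.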

Then I ran python test_02.py, and it errored out with

cpu torch.Size([1, 3, 480, 640]) torch.uint8
Segmentation fault (core dumped)

This Dockerfile installed the stable release, torchaudio 2.0.2, and it also errored out, so this is probably not specific to the nightly build as I suspected before. The same conda environment works perfectly on my EC2 machine. Perhaps I am missing some necessary libraries in the Docker image? Do you see any problem in the Dockerfile?
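
As a side note on the CPU output above: the printed shape [1, 3, 480, 640] has three full-resolution planes even though yuv420p was requested, which matches the warning earlier in the thread about YUV420P being implicitly converted to YUV444P. In 4:2:0 the U and V planes have half the width and height of Y, so they have to be upsampled to share one tensor with Y. The sketch below illustrates the idea with nearest-neighbour repetition in plain Python. The choice of nearest-neighbour is an assumption for illustration, and the actual conversion may use a different filter.

```python
def upsample_chroma_2x(plane: list[list[int]]) -> list[list[int]]:
    """Nearest-neighbour 2x upsampling of one chroma plane (a list of rows)."""
    out = []
    for row in plane:
        wide = [sample for sample in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))  # repeat each row
    return out


# A 2x2 chroma plane, as stored in 4:2:0 alongside a 4x4 luma plane.
u_420 = [[10, 20],
         [30, 40]]
u_444 = upsample_chroma_2x(u_420)
for row in u_444:
    print(row)
```

After upsampling, the chroma plane is 4x4 like the luma plane, so Y, U and V can be stacked along a channel dimension.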

Collaborator

mthrok commented Jul 25, 2023

@w238liu - thanks for the reproduction. I will take a look. (Unfortunately, I don't have easy access to a GPU + Docker environment.)

Meanwhile, I updated the mechanism for FFmpeg integration, and now torchaudio works with FFmpeg 4, 5 and 6.
Can you try the new nightly and other FFmpeg versions, such as FFmpeg 6?

Collaborator

mthrok commented Aug 16, 2023

@w238liu I updated the build process in #3561 and FFmpeg 4.4 should now work (and we dropped support for 4.3, 4.2 and 4.1).

mthrok closed this as completed Aug 16, 2023
Collaborator

mthrok commented Aug 16, 2023

Feel free to re-open if the issue persists.
