Note: This article is reproduced from serverfault.com.

Pass ffmpeg OpenCL filter output to NVENC without hwdownload?

Posted on 2020-12-05 13:08:22

I'm trying to do tonemapping (and resizing) of a UHD HDR video stream with ffmpeg. The following command:

ffmpeg -vsync 0 -hwaccel cuda -init_hw_device opencl=ocl -filter_hw_device ocl 
    -threads 1 -extra_hw_frames 3 -c:v hevc_cuvid -resize 1920x1080 -i "INPUT.hevc" 
    -vf "hwupload,
         tonemap_opencl=tonemap=mobius:param=0.01:desat=0:r=tv:p=bt709:t=bt709:m=bt709:format=nv12,
         hwdownload,format=nv12,hwupload_cuda" 
    -c:v hevc_nvenc -b:v 8M "OUTPUT.hevc"

seems to work (around 200 FPS on an RTX 3080). However, I notice that it still uses one CPU core and that GPU usage is reported as only 60-70%. When I only resize, without any filters, I get around 400 FPS with 100% GPU usage.

I suspect that the trailing hwdownload,format=nv12,hwupload_cuda steps are the problem, because they add a detour through main memory. I tried using just hwupload_cuda without the hwdownload (as suggested in the filter example near the end of this answer: https://stackoverflow.com/a/55747785/929037), but then I got the following error:

Impossible to convert between the formats supported by the filter 'Parsed_tonemap_opencl_1' and the filter 'auto_scaler_0'
Error reinitializing filters!
Failed to inject frame into filter network: Function not implemented
Error while processing the decoded data for stream #0:0

Trying to use hwmap instead resulted in:

Assertion dst->format == AV_PIX_FMT_OPENCL failed at C:/code/ffmpeg/src/libavutil/hwcontext_opencl.c:2814

Is it possible to avoid this additional hwdownload?

Questioner: w1th0utnam3
nyanmisaka 2021-01-05 03:52:35

No, at least not for now.

Zero-copy texture sharing (which is what the hwmap filter provides) between CUDA and OpenCL devices will not be available in ffmpeg until Nvidia releases an interop method for them.

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__INTEROP.html

Intel and AMD have some OpenCL extensions for D3D11/VAAPI-to-OpenCL interop, such as cl_intel_va_api_media_sharing and cl_intel_d3d11_nv12_media_sharing from Intel and cl_amd_planar_yuv from AMD.
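For comparison, here is a rough sketch of what such a mapped (no hwdownload) pipeline can look like on the VAAPI side. It is not taken from the question and will not help on Nvidia hardware. It assumes Linux, an Intel GPU at /dev/dri/renderD128, an ffmpeg build with VAAPI and OpenCL enabled, and a driver that exposes cl_intel_va_api_media_sharing. The file names and the hevc_vaapi encoder choice are placeholders. The script only prints the command unless RUN=1 is set.

```shell
# Sketch: VAAPI decode -> OpenCL tone mapping -> VAAPI encode, using hwmap
# instead of hwdownload/hwupload. Device path and file names are assumptions.
INPUT="${INPUT:-INPUT.hevc}"
OUTPUT="${OUTPUT:-OUTPUT.hevc}"

# First hwmap: map VAAPI surfaces into the OpenCL filter device.
# Last hwmap: map the OpenCL frames back to VAAPI (reverse mapping).
FILTERS="hwmap,tonemap_opencl=tonemap=mobius:param=0.01:desat=0:r=tv:p=bt709:t=bt709:m=bt709:format=nv12,hwmap=derive_device=vaapi:reverse=1"

# Build the command as positional parameters so quoting stays intact.
set -- ffmpeg \
    -init_hw_device vaapi=va:/dev/dri/renderD128 \
    -init_hw_device opencl=ocl@va \
    -hwaccel vaapi -hwaccel_device va -hwaccel_output_format vaapi \
    -i "$INPUT" \
    -filter_hw_device ocl \
    -vf "$FILTERS" \
    -c:v hevc_vaapi -b:v 8M "$OUTPUT"

# Dry run by default: print the command. Set RUN=1 to actually execute it.
if [ "${RUN:-0}" = "1" ]; then
    "$@"
else
    printf '%s ' "$@"
    printf '\n'
fi
```

The OpenCL device is derived from the VAAPI device (opencl=ocl@va) so that both refer to the same GPU, which is what makes the mapping possible at all.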

As for Nvidia, they do have cl_nv_d3d11_sharing for D3D11-to-OpenCL interop, but I don't think it will work well when it comes to CUDA.

Another solution is to port the tone mapping algorithm to a CUDA filter, but that will take some time. A large speed improvement can be expected once it is finished. You could then use it just like the scale_cuda or overlay_cuda filters.
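To illustrate how an all-CUDA filter chain is used, here is a minimal sketch with scale_cuda. It only resizes and does not tone map. It assumes an ffmpeg build with CUDA, NVENC and scale_cuda support, and the file names are placeholders. A CUDA tone mapping filter, once one exists, would presumably slot into the same -vf chain. The script only prints the command unless RUN=1 is set.

```shell
# Sketch: decode, resize and encode on the GPU with frames kept in CUDA memory.
# No tone mapping happens here; file names are assumptions.
INPUT="${INPUT:-INPUT.hevc}"
OUTPUT="${OUTPUT:-OUTPUT.hevc}"

# Only CUDA filters are in the chain, so no hwupload/hwdownload is needed.
FILTERS="scale_cuda=1920:1080"

# Build the command as positional parameters so quoting stays intact.
set -- ffmpeg \
    -hwaccel cuda -hwaccel_output_format cuda \
    -i "$INPUT" \
    -vf "$FILTERS" \
    -c:v hevc_nvenc -b:v 8M "$OUTPUT"

# Dry run by default: print the command. Set RUN=1 to actually execute it.
if [ "${RUN:-0}" = "1" ]; then
    "$@"
else
    printf '%s ' "$@"
    printf '\n'
fi
```

Because -hwaccel_output_format cuda keeps decoded frames on the GPU, the encoder receives CUDA frames directly and the detour through main memory disappears.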

I have seen that Intel already supports a tonemap_vaapi filter through a hardware function in their latest iGPUs. I'm not sure whether Nvidia NVENC has a similar function in its ASIC.