
A First Try at FFmpeg's Vulkan Decoder on macOS


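A Look Back at FFmpeg's Vulkan Support

The release notes on the FFmpeg website show that Vulkan support has been developed steadily over the years. The project even publicized its Vulkan improvements several times before the releases containing them shipped, which is unusual.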

June 15th, 2020, FFmpeg 4.3 “4:3”

  • Support AMD AMF encoder on Linux (via Vulkan)
  • Vulkan support
  • avgblur_vulkan, overlay_vulkan, scale_vulkan and chromaber_vulkan filters

January 17th, 2022, FFmpeg 5.0 “Lorentz”

the Vulkan code was much improved.

  • vflip_vulkan, hflip_vulkan and flip_vulkan filters

February 28th, 2023, FFmpeg 6.0 “Von Neumann”

A few submitted features, such as the Vulkan improvements and more FFT optimizations will be in the next minor release, 6.1, which we plan to release soon, in line with our new release schedule.

May 31st, 2023, Vulkan decoding

A few days ago, Vulkan-powered decoding hardware acceleration code was merged into the codebase. This is the first vendor-generic and platform-generic decode acceleration API, enabling the same code to be used on multiple platforms, with very minimal overhead. This is also the first multi-threaded hardware decoding API, and our code makes full use of this, saturating all available decode engines the hardware exposes.

Those wishing to test the code can read our documentation page. For those who would like to integrate FFmpeg’s Vulkan code to demux, parse, decode, and receive a VkImage to present or manipulate, documentation and examples are available in our source tree. Currently, using the latest available git checkout of our repository is required. The functionality will be included in stable branches with the release of version 6.1, due to be released soon.

As this is also the first practical implementation of the specifications, bugs may be present, particularly in drivers, and, although passing verification, the implementation itself. New codecs, and encoding support are also being worked on, by both the Khronos organization for standardizing, and us as implementing it, and giving feedback on improving.

November 10th, 2023, FFmpeg 6.1 “Heaviside”

  • Vulkan decode hwaccel, supporting H264, HEVC and AV1
  • color_vulkan filter
  • bwdif_vulkan filter
  • nlmeans_vulkan filter
  • xfade_vulkan filter

September 30th, 2024, FFmpeg 7.1 “Péter”

Support for Vulkan encoding, with H264 and HEVC was merged. This finally allows fully Vulkan-based decode-filter-encode pipelines, by having a sink for Vulkan frames, other than downloading or displaying them. The encoders have feature-parity with their VAAPI implementation counterparts. Khronos has announced that support for AV1 encoding is also coming soon to Vulkan, and FFmpeg is aiming to have day-one support.

August 22nd, 2025, FFmpeg 8.0 “Huffman”

  • Vulkan compute-based codecs: FFv1 (encode and decode), ProRes RAW (decode only)
  • Hardware accelerated decoding: Vulkan VP9, VAAPI - VVC, OpenHarmony H264/5
  • Hardware accelerated encoding: Vulkan AV1, OpenHarmony H264/5

A new class of decoders and encoders based on pure Vulkan compute implementation have been added. Vulkan is a cross-platform, open standard set of APIs that allows programs to use GPU hardware in various ways, from drawing on screen, to doing calculations, to decoding video via custom hardware accelerators. Rather than using a custom hardware accelerator present, these codecs are based on compute shaders, and work on any implementation of Vulkan 1.3.

Decoders use the same hwaccel API and commands, so users do not need to do anything special to enable them, as enabling Vulkan decoding is sufficient to use them.

Encoders, like our hardware accelerated encoders, require specifying a new encoder (ffv1_vulkan). Currently, the only codecs supported are: FFv1 (encoding and decoding) and ProRes RAW (decode only).

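This made me curious: what exactly is Vulkan, and how can it both encode and decode video? And why does FFmpeg take it seriously enough to keep maintaining and optimizing it?

What Is Vulkan

In short, Vulkan is a low-overhead, cross-platform 3D graphics and compute API developed by the Khronos Group (which also maintains OpenGL).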

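Why Vulkan?

Before Vulkan appeared, OpenGL had been around for more than twenty years. OpenGL is convenient, but it has a serious weakness: the driver does too much. It manages memory, handles errors, and synchronizes threads behind the developer's back, which costs a lot of performance.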

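Vulkan's design goals are:

1. Significantly lower CPU overhead: the driver no longer handles bookkeeping on the application's behalf, so the CPU can submit rendering commands very quickly.

2. Better multi-core support: OpenGL effectively works on one thread at a time, whereas Vulkan is designed so that multiple threads can record commands in parallel, making full use of modern multi-core CPUs.

3. Direct control of the GPU: developers manage memory allocation, command buffers, and synchronization themselves. This inflates the amount of code enormously (drawing a single triangle can take on the order of a thousand lines), but in return delivers maximum performance and predictable behavior.

4. A single cross-platform API: the same API runs on Windows, Linux, Android (where it is an officially supported graphics API), and even macOS/iOS via MoltenVK.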

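Although Vulkan is cross-platform, how deeply it is supported differs from system to system:

  • On Android it is a first-class citizen with the most complete support.
  • On Linux it is the primary modern graphics API.
  • On Windows it is a strong competitor to DirectX 12.
  • On macOS Vulkan is a second-class citizen, because Apple does not support it natively: a translation layer called MoltenVK has to translate Vulkan calls into Metal calls. Because of this translation, some advanced features, such as hardware video decode queues, are missing.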

What Is MoltenVK

MoltenVK is a layered implementation of Vulkan 1.4 graphics and compute functionality built on top of Apple's Metal.

MoltenVK is a layered implementation of Vulkan 1.4 graphics and compute functionality, that is built on Apple’s Metal graphics and compute framework on macOS, iOS, tvOS, and visionOS. MoltenVK allows you to use Vulkan graphics and compute functionality to develop modern, cross-platform, high-performance graphical games and applications, and to run them across many platforms, including macOS, iOS, tvOS, visionOS, Simulators, and Mac Catalyst, and all Apple architectures, including Apple Silicon.

Metal uses a different shading language, the Metal Shading Language (MSL), than Vulkan, which uses SPIR-V. MoltenVK automatically converts your SPIR-V shaders to their MSL equivalents.

To provide Vulkan capability to the macOS, iOS, tvOS, and visionOS platforms, MoltenVK uses Apple’s publicly available APIs, including Metal. MoltenVK does not use any private or undocumented API calls or features, so your app will be compatible with all standard distribution channels, including Apple’s App Store.

The MoltenVK runtime package contains two products:

  • MoltenVK is an implementation of an almost-complete subset of the Vulkan 1.4 graphics and compute API.

  • MoltenVKShaderConverter converts SPIR-V shader code to Metal Shading Language (MSL) shader code. The converter is embedded in the MoltenVK runtime to automatically convert SPIR-V shaders to their MSL equivalents. In addition, the SPIR-V converter is packaged into a stand-alone command-line MoltenVKShaderConverter macOS tool for converting shaders at development time from the command line.

Decoding with MoltenVK on macOS

Before getting to Vulkan decoding, here is how FFmpeg enables VideoToolbox hardware-accelerated decoding:

static int hw_decoder_init(AVCodecContext * ctx, const AVCodecHWConfig* config) {
    int err = 0;
    AVBufferRef *hw_device_ctx = NULL;
    if ((err = av_hwdevice_ctx_create(&hw_device_ctx, config->device_type, NULL, NULL, 0)) < 0) {
        ALOGE("create mac HW device failed for type: %d\n", config->device_type);
        return err;
    }
    // Callback through which the decoder is told which (hardware) pixel format to output
    ctx->get_format = get_hw_format;
    // The "refcounted_frames" option was removed in FFmpeg 5.0: with the
    // send/receive decode API, frames are always reference-counted.
    // Hand hw_device_ctx to the decoder context. It must be set before
    // avcodec_open2 and must not be modified afterwards; libavcodec now owns this reference.
    ctx->hw_device_ctx = hw_device_ctx;
    return err;
}

#ifdef __APPLE__
    if (avctx->codec_type == AVMEDIA_TYPE_VIDEO && !(st->disposition & AV_DISPOSITION_ATTACHED_PIC)) {
        ALOGI("videotoolbox hwaccel switch:%s\n",ffp->videotoolbox_hwaccel ? "on" : "off");
        if (ffp->videotoolbox_hwaccel) {
            enum AVHWDeviceType type = av_hwdevice_find_type_by_name("videotoolbox");
            const AVCodecHWConfig *config = NULL;
            for (int i = 0;; i++) {
                const AVCodecHWConfig *node = avcodec_get_hw_config(codec, i);
                if (!node) {
                    ALOGE("avdec %s does not support device type %s.\n",
                            codec->name, av_hwdevice_get_type_name(type));
                    break;
                }
                if (node->methods & AV_CODEC_HW_CONFIG_METHOD_HW_DEVICE_CTX && node->device_type == type) {
                    config = node;
                    break;
                }
            }
            
            if (config && hw_decoder_init(avctx, config) == 0) {
                ALOGI("try use videotoolbox accel\n");
            }
        }
    }
#endif
    if ((ret = avcodec_open2(avctx, codec, &opts)) < 0) {
        goto fail;
    }
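The snippet assigns ctx->get_format = get_hw_format but does not show that callback. In FFmpeg the callback receives the decoder's candidate pixel formats, terminated by AV_PIX_FMT_NONE, and returns the one to use; FFmpeg's doc/examples/hw_decode.c does this by scanning for the hardware format recorded from the chosen AVCodecHWConfig (config->pix_fmt). The sketch below models only that selection logic so that it compiles without FFmpeg: enum pix_fmt and pick_hw_format are hypothetical stand-ins for enum AVPixelFormat and the real callback, whose signature is enum AVPixelFormat (*)(AVCodecContext *, const enum AVPixelFormat *).

```c
#include <stdio.h>

/* Hypothetical stand-in for FFmpeg's enum AVPixelFormat. In real code these
 * would be AV_PIX_FMT_NONE, AV_PIX_FMT_NV12, AV_PIX_FMT_VIDEOTOOLBOX, ... */
enum pix_fmt {
    PIX_FMT_NONE = -1,
    PIX_FMT_YUV420P,
    PIX_FMT_NV12,
    PIX_FMT_VIDEOTOOLBOX,
    PIX_FMT_VULKAN,
};

/* Same shape as a get_format callback: walk the decoder's candidate list
 * (terminated by PIX_FMT_NONE) and return the hardware format a device was
 * set up for. Returning PIX_FMT_NONE signals that no usable format exists. */
static enum pix_fmt pick_hw_format(const enum pix_fmt *candidates,
                                   enum pix_fmt wanted)
{
    for (const enum pix_fmt *p = candidates; *p != PIX_FMT_NONE; p++) {
        if (*p == wanted)
            return *p;
    }
    fprintf(stderr, "hardware pixel format %d not offered by the decoder\n",
            (int)wanted);
    return PIX_FMT_NONE;
}
```

In real code the wanted value would be config->pix_fmt (AV_PIX_FMT_VIDEOTOOLBOX here, AV_PIX_FMT_VULKAN in the Vulkan case), usually kept in a static variable or reached through ctx->opaque. Returning AV_PIX_FMT_NONE typically makes decoding fail; falling back to a software format instead is an alternative design choice.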

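Vulkan is another hardware-accelerated decoding path and is configured the same way: set hw_device_ctx before calling avcodec_open2. The difference is that here the Vulkan device is assembled manually, so that the FFmpeg decoder reuses the Vulkan instance and device that libplacebo has already created for rendering: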

// Initialize FFmpeg's Vulkan hardware device context, sharing libplacebo's Vulkan instance and device
static AVBufferRef * vulkan_init(pl_vulkan vulkan,pl_vk_inst vkinst)
{
    AVBufferRef *hw_device_ctx = NULL;
    int ret = 0;

    // Query the GPU's queue family properties, mainly to find the video decode queue
    /*
     * libplacebo initialises all queues, but we still need to discover which
     * one is the decode queue.
     */
    uint32_t num_qf = 0;
    VkQueueFamilyProperties2 *qf = NULL;
    VkQueueFamilyVideoPropertiesKHR *qf_vid = NULL;
    vkGetPhysicalDeviceQueueFamilyProperties2(vulkan->phys_device, &num_qf, NULL);
    if (!num_qf)
        goto error;

    qf = talloc_array(NULL, VkQueueFamilyProperties2, num_qf);
    qf_vid = talloc_array(NULL, VkQueueFamilyVideoPropertiesKHR, num_qf);
    for (int i = 0; i < num_qf; i++) {
        qf_vid[i] = (VkQueueFamilyVideoPropertiesKHR) {
            .sType = VK_STRUCTURE_TYPE_QUEUE_FAMILY_VIDEO_PROPERTIES_KHR,
        };
        qf[i] = (VkQueueFamilyProperties2) {
            .sType = VK_STRUCTURE_TYPE_QUEUE_FAMILY_PROPERTIES_2,
            .pNext = &qf_vid[i],
        };
    }

    vkGetPhysicalDeviceQueueFamilyProperties2(vulkan->phys_device, &num_qf, qf);

    // Allocate the FFmpeg hardware device context
    hw_device_ctx = av_hwdevice_ctx_alloc(AV_HWDEVICE_TYPE_VULKAN);
    if (!hw_device_ctx)
        goto error;

    AVHWDeviceContext *device_ctx = (void *)hw_device_ctx->data;
    AVVulkanDeviceContext *device_hwctx = device_ctx->hwctx;

    // Manually bind libplacebo's Vulkan objects to the FFmpeg context
    device_ctx->user_opaque = (void *)vulkan;
    device_hwctx->lock_queue = lock_queue;
    device_hwctx->unlock_queue = unlock_queue;
    device_hwctx->get_proc_addr = vkinst->get_proc_addr;
    device_hwctx->inst = vkinst->instance;
    device_hwctx->phys_dev = vulkan->phys_device;
    device_hwctx->act_dev = vulkan->device;
    device_hwctx->device_features = *vulkan->features;
    device_hwctx->enabled_inst_extensions = vkinst->extensions;
    device_hwctx->nb_enabled_inst_extensions = vkinst->num_extensions;
    device_hwctx->enabled_dev_extensions = vulkan->extensions;
    device_hwctx->nb_enabled_dev_extensions = vulkan->num_extensions;

    // Describe the queue assignment (graphics, transfer, compute, video decode)
#if LIBAVUTIL_VERSION_INT >= AV_VERSION_INT(59, 34, 100)
    device_hwctx->nb_qf = 0;
    device_hwctx->qf[device_hwctx->nb_qf++] = (AVVulkanDeviceQueueFamily) {
        .idx = vulkan->queue_graphics.index,
        .num = vulkan->queue_graphics.count,
        .flags = VK_QUEUE_GRAPHICS_BIT,
    };
    device_hwctx->qf[device_hwctx->nb_qf++] = (AVVulkanDeviceQueueFamily) {
        .idx = vulkan->queue_transfer.index,
        .num = vulkan->queue_transfer.count,
        .flags = VK_QUEUE_TRANSFER_BIT,
    };
    device_hwctx->qf[device_hwctx->nb_qf++] = (AVVulkanDeviceQueueFamily) {
        .idx = vulkan->queue_compute.index,
        .num = vulkan->queue_compute.count,
        .flags = VK_QUEUE_COMPUTE_BIT,
    };
    for (int i = 0; i < num_qf; i++) {
        if ((qf[i].queueFamilyProperties.queueFlags) & VK_QUEUE_VIDEO_DECODE_BIT_KHR) {
            device_hwctx->qf[device_hwctx->nb_qf++] = (AVVulkanDeviceQueueFamily) {
                .idx = i,
                .num = qf[i].queueFamilyProperties.queueCount,
                .flags = VK_QUEUE_VIDEO_DECODE_BIT_KHR,
                .video_caps = qf_vid[i].videoCodecOperations,
            };
        }
    }
#else
    int decode_index = -1;
    for (int i = 0; i < num_qf; i++) {
        if ((qf[i].queueFamilyProperties.queueFlags) & VK_QUEUE_VIDEO_DECODE_BIT_KHR)
            decode_index = i;
    }
    device_hwctx->queue_family_index = vulkan->queue_graphics.index;
    device_hwctx->nb_graphics_queues = vulkan->queue_graphics.count;
    device_hwctx->queue_family_tx_index = vulkan->queue_transfer.index;
    device_hwctx->nb_tx_queues = vulkan->queue_transfer.count;
    device_hwctx->queue_family_comp_index = vulkan->queue_compute.index;
    device_hwctx->nb_comp_queues = vulkan->queue_compute.count;
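    device_hwctx->queue_family_decode_index = decode_index;
    // Guard against decode_index == -1 (no decode queue family, e.g. on MoltenVK)
    device_hwctx->nb_decode_queues = decode_index >= 0 ?
        qf[decode_index].queueFamilyProperties.queueCount : 0;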
#endif

    ret = av_hwdevice_ctx_init(hw_device_ctx);
    if (ret < 0) {
        fprintf(stderr, "av_hwdevice_ctx_init failed\n");
        goto error;
    }

    talloc_free(qf);
    talloc_free(qf_vid);
    return hw_device_ctx;

 error:
    talloc_free(qf);
    talloc_free(qf_vid);
    av_buffer_unref(&hw_device_ctx);
    return NULL;
}
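The crucial step in vulkan_init is finding a queue family that advertises VK_QUEUE_VIDEO_DECODE_BIT_KHR. On MoltenVK no family does, so the scan has to report "not found" instead of producing an index of -1 that is later used to index the properties array. The following self-contained sketch isolates that scan, assuming a hypothetical, trimmed-down struct queue_family in place of VkQueueFamilyProperties2; the flag values are copied from VkQueueFlagBits. Unlike the original #else loop, which keeps the last matching family, it returns the first one.

```c
#include <stdint.h>

/* Bit values as defined in the Vulkan headers (VkQueueFlagBits). */
#define QUEUE_GRAPHICS_BIT      0x00000001u  /* VK_QUEUE_GRAPHICS_BIT */
#define QUEUE_COMPUTE_BIT       0x00000002u  /* VK_QUEUE_COMPUTE_BIT */
#define QUEUE_TRANSFER_BIT      0x00000004u  /* VK_QUEUE_TRANSFER_BIT */
#define QUEUE_VIDEO_DECODE_BIT  0x00000020u  /* VK_QUEUE_VIDEO_DECODE_BIT_KHR */

/* Hypothetical stand-in for VkQueueFamilyProperties (flags and count only). */
struct queue_family {
    uint32_t queue_flags;
    uint32_t queue_count;
};

/* Index of the first family that advertises video decode, or -1 if the
 * device exposes none. */
static int find_decode_family(const struct queue_family *families, uint32_t count)
{
    for (uint32_t i = 0; i < count; i++) {
        if (families[i].queue_flags & QUEUE_VIDEO_DECODE_BIT)
            return (int)i;
    }
    return -1;
}

/* Number of decode queues to report; 0 when there is no decode family,
 * so the array is never indexed with -1. */
static uint32_t decode_queue_count(const struct queue_family *families, uint32_t count)
{
    int idx = find_decode_family(families, count);
    return idx >= 0 ? families[idx].queue_count : 0;
}
```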

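AVBufferRef *my_vulkan_hwctx = vulkan_init(p->win->vulkan, p->win->vkinst);
if (my_vulkan_hwctx) {
    // Do not call av_hwdevice_ctx_create here; take a new reference to the context
    // initialized above. The decoder owns this reference, while the application keeps
    // my_vulkan_hwctx and must release it with av_buffer_unref when it is done.
    p->codec->hw_device_ctx = av_buffer_ref(my_vulkan_hwctx);
}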
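The reason for av_buffer_ref here is ownership: according to the FFmpeg documentation, the reference stored in AVCodecContext.hw_device_ctx is afterwards owned and freed by libavcodec, while the application still needs the device for libplacebo rendering. Taking a second reference gives each side its own. The sketch below is a hypothetical, single-threaded model of that reference counting; struct shared_device, device_create, device_ref and device_unref are illustrative names, not FFmpeg API, and FFmpeg's real AVBuffer uses an atomic counter and a custom free callback.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a shared resource such as an AVHWDeviceContext. */
struct shared_device {
    int refcount; /* number of live references; plain int, not thread-safe */
};

/* Counts payloads created but not yet freed (for illustration only). */
static int devices_alive = 0;

/* Create the device with a single reference owned by the caller. */
static struct shared_device *device_create(void)
{
    struct shared_device *d = malloc(sizeof(*d));
    if (!d)
        return NULL;
    d->refcount = 1;
    devices_alive++;
    return d;
}

/* Analogous to av_buffer_ref: register one more owner of the same payload. */
static struct shared_device *device_ref(struct shared_device *d)
{
    d->refcount++;
    return d;
}

/* Analogous to av_buffer_unref: drop one reference, free on the last one,
 * and clear the caller's pointer so it cannot be used again. */
static void device_unref(struct shared_device **pd)
{
    struct shared_device *d = *pd;
    *pd = NULL;
    if (d && --d->refcount == 0) {
        free(d);
        devices_alive--;
    }
}
```

In the real program, avcodec_free_context drops the decoder's reference and av_buffer_unref(&my_vulkan_hwctx) drops the application's; the AVHWDeviceContext is freed only once every reference, including those held internally by hardware frames, has been released.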

With the rest of the logic unchanged, the program fails when decoding HEVC:

Codec: hevc (HEVC (High Efficiency Video Coding))
Successfully bound shared Vulkan context to decoder.
Using hardware frame format: vulkan
[hevc @ 0x940406a00] Device does not support the VK_KHR_video_decode_queue extension!
[hevc @ 0x940406a00] Failed setup for format vulkan: hwaccel initialisation returned error.

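Looking at the FFmpeg 7.x source confirms that this message is printed when the check for a required Vulkan extension fails, that is, when the extension is not among the device's enabled extensions: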

static int vulkan_decode_bootstrap(AVCodecContext *avctx, AVBufferRef *frames_ref)
{
    int err;
    FFVulkanDecodeContext *dec = avctx->internal->hwaccel_priv_data;
    AVHWFramesContext *frames = (AVHWFramesContext *)frames_ref->data;
    AVHWDeviceContext *device = (AVHWDeviceContext *)frames->device_ref->data;
    AVVulkanDeviceContext *hwctx = device->hwctx;
    FFVulkanDecodeShared *ctx;

    if (dec->shared_ctx)
        return 0;

    dec->shared_ctx = ff_refstruct_alloc_ext(sizeof(*ctx), 0, NULL,
                                             free_common);
    if (!dec->shared_ctx)
        return AVERROR(ENOMEM);

    ctx = dec->shared_ctx;

    ctx->s.extensions = ff_vk_extensions_to_mask(hwctx->enabled_dev_extensions,
                                                 hwctx->nb_enabled_dev_extensions);

    if (!(ctx->s.extensions & FF_VK_EXT_VIDEO_DECODE_QUEUE)) {
        av_log(avctx, AV_LOG_ERROR, "Device does not support the %s extension!\n",
               VK_KHR_VIDEO_DECODE_QUEUE_EXTENSION_NAME);
        ff_refstruct_unref(&dec->shared_ctx);
        return AVERROR(ENOSYS);
    }

    err = ff_vk_load_functions(device, &ctx->s.vkfn, ctx->s.extensions, 1, 1);
    if (err < 0) {
        ff_refstruct_unref(&dec->shared_ctx);
        return err;
    }

    return 0;
}

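My Vulkan implementation is provided by MoltenVK, the Vulkan translation layer on macOS, and it does not offer the full set of Vulkan capabilities: it implements only graphics and compute functionality on top of Metal, whereas Vulkan HEVC decoding needs the VK_KHR_video_decode_queue extension (together with VK_KHR_video_queue and the codec-specific VK_KHR_video_decode_h265).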

MoltenVK is an implementation of an almost-complete subset of the Vulkan 1.4 graphics and compute API.

So enabling FFmpeg's Vulkan decoding through MoltenVK on macOS does not work. Whether it will ever be possible is uncertain; that is up to the Khronos Group, which hosts MoltenVK.
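To make the failure mode concrete, here is a self-contained sketch of the kind of check that fails on MoltenVK, assuming the device's enabled extensions are available as a list of name strings (as in AVVulkanDeviceContext.enabled_dev_extensions). FFmpeg itself converts the names into a bitmask with ff_vk_extensions_to_mask and tests FF_VK_EXT_VIDEO_DECODE_QUEUE, as in the source above; has_extension and can_vulkan_decode_hevc are hypothetical helpers, and the three required extensions follow the Vulkan Video specification for H.265 decode rather than FFmpeg's exact code path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Device extensions needed for Vulkan Video H.265 decoding. */
static const char *const hevc_decode_exts[] = {
    "VK_KHR_video_queue",
    "VK_KHR_video_decode_queue",
    "VK_KHR_video_decode_h265",
};

/* True if `name` appears in the list of enabled extension names. */
static bool has_extension(const char *const *enabled, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(enabled[i], name) == 0)
            return true;
    }
    return false;
}

/* Hypothetical helper: true only if every required extension is enabled. */
static bool can_vulkan_decode_hevc(const char *const *enabled, size_t n)
{
    size_t required = sizeof(hevc_decode_exts) / sizeof(hevc_decode_exts[0]);
    for (size_t i = 0; i < required; i++) {
        if (!has_extension(enabled, n, hevc_decode_exts[i]))
            return false;
    }
    return true;
}
```

On MoltenVK the enabled list may contain entries such as VK_KHR_portability_subset, but none of the VK_KHR_video_* extensions, so a check like this returns false, just as FFmpeg's does.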

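Metal vs. Vulkan

Since deprecating OpenGL in 2018, Apple has concentrated on its own Metal API. Metal, however, can neither encode nor decode video, so Metal and Vulkan are not really the same kind of thing; the comparison below is somewhat forced.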

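Metal is a pure graphics and compute API. It can work with video, but only in the video-processing stage (color space conversion, tone mapping, HDR rendering, post-processing filters).

Metal itself contains no video codec functionality. In the macOS/iOS ecosystem, video encoding and decoding are handled by a separate framework, VideoToolbox. Apple's design does not expect you to drive decode queues directly through a graphics API such as Metal. Instead it provides an efficient bridge, CVMetalTextureCache, which maps a CVPixelBuffer decoded by VideoToolbox to a Metal texture that Metal can then process. This design keeps the two components loosely coupled.

Vulkan's vision is "everything is GPU memory, everything is a queue": decoding, rendering, and compute all go into Vulkan command buffers and are scheduled by the same driver. This design suits the general-purpose driver model on Linux and Windows, where each vendor implements everything in one unified driver. Apple, however, already has its own highly efficient, power-optimized proprietary video engine, exposed through VideoToolbox.

Apple therefore has little incentive to expose, in Metal (the layer MoltenVK is built on), the kind of low-level decode interface that implementing the complex VK_KHR_video_decode_* extensions would require.

So Metal and Vulkan cannot be compared on "feature parity"; the difference stems from their original design goals. Vulkan aims to offer a consistent low-level control interface across different hardware, while Apple aims to provide a vertically integrated path with maximum efficiency.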

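These are two entirely different design philosophies:

  • Uniform: the decoder is exposed as part of the GPU API (Vulkan)
  • Layered: the decoder is a separate module that works on its own (VideoToolbox alongside Metal)

This article is licensed by the author under CC BY 4.0.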