
Making a Vulkan Encoder (Scary)

Nathalie Raffray
Undefined Behavior Specialist

At the end of 2023, Khronos released the Vulkan Video encode extensions, which allow GPU-accelerated encoding of H.264 and H.265 video. This was bleeding-edge stuff: these extensions provided a GPU-agnostic solution for video compression that did not exist before. At 3dverse, our renderer depended on NVIDIA solely because of video encoding, so this new solution would finally make us GPU-agnostic!

However, to this day there is very little information online to help anyone implement something with these extensions. What follows is a semi-rageful but triumphant account of working with Vulkan Video, from the vantage point of an engine programmer with no previous experience with Vulkan or with video compression codecs.


A few months ago, I was given the task of integrating an open-source Vulkan video encoder into our renderer. I will refer to it as pyroenc, because, technically, that is its name. I had never touched Vulkan before (I vaguely knew to fear it) and didn’t know anything about video compression, namely the H.264 and H.265/HEVC codecs. I figured the task would be a bit of rote pipework: linking our renderer with pyroenc, some initial configuration, feeding it frames and retrieving the encoded output. I was so innocent.

Why did we need a Vulkan encoder?

Well, why do we need a video encoder in the first place? For the same reason any video streaming platform compresses its video: raw frames are far too large to send over the network quickly, so compressing them cuts the bandwidth needed and, with it, network latency. 3dverse streams real-time 3D scenes. Our renderer encodes each frame before sending it over the network to the respective Livelink client, and Livelink then takes care of decoding the frame for you client-side, using the WebCodecs API.
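To put a rough number on it, here is a back-of-the-envelope estimate for an uncompressed stream. The figures are illustrative assumptions, not 3dverse's actual settings: a 1920×1080 frame stored as RGBA8 (4 bytes per pixel), sent at 60 frames per second.

```latex
\[
1920 \times 1080 \times 4\,\text{B} \times 60\,\text{s}^{-1}
= 497{,}664{,}000\,\text{B/s}
\approx 3.98\,\text{Gbit/s}
\]
```

That is well beyond what most client connections can sustain, which is why every frame goes through an encoder before it leaves the server.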

Why do we need an encoder that uses Vulkan Video? For hardware encoding, there were already NVENC from NVIDIA, Quick Sync from Intel, the Advanced Media Framework (AMF) from AMD, etc. The distinguishing factor is that Vulkan Video offers a vendor-agnostic API for encoding on the GPU. In our case, we already had two encoders: one that encodes on the CPU using FFmpeg (way too slow), and one that encodes on the GPU using NVENC. A Vulkan encoder frees us from the yoke of NVIDIA (“Stick it to the big man”, as my graphics programmer coworker puts it), so that our renderer can also run on AMD GPU instances in the cloud.
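The post doesn't show how the renderer chooses between its encoders, so the following is only a hypothetical selection policy. It assumes the renderer knows the GPU's PCI vendor ID (as reported in `VkPhysicalDeviceProperties::vendorID`) and whether the device supports Vulkan Video encode; `EncoderBackend` and `chooseEncoder` are illustrative names, not 3dverse code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical backends mirroring the three encoders described above.
enum class EncoderBackend { Nvenc, VulkanVideo, CpuFfmpeg };

// PCI vendor ID of NVIDIA, as reported by VkPhysicalDeviceProperties::vendorID.
constexpr uint32_t kVendorNvidia = 0x10DE;

// Prefer the vendor-agnostic Vulkan Video path whenever the device supports
// video encode, keep NVENC as an NVIDIA-only fallback, and use the (slow)
// CPU encoder as a last resort.
EncoderBackend chooseEncoder(uint32_t vendorId, bool supportsVulkanVideoEncode) {
    if (supportsVulkanVideoEncode) return EncoderBackend::VulkanVideo;
    if (vendorId == kVendorNvidia) return EncoderBackend::Nvenc;
    return EncoderBackend::CpuFfmpeg;
}
```

For example, an AMD GPU (vendor ID 0x1002) with Vulkan Video encode support gets `EncoderBackend::VulkanVideo`, which is exactly the case NVENC could never cover.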

The First Impasse…but before that…

Before getting into the first, second, or nth impasse, there was a primordial one. Something crucial to note is that the Vulkan Video extensions were very new when I started working on the encoder. The encoding extensions for H.264 and H.265 were finalized in the Vulkan specification in December 2023, and it took some time before NVIDIA, AMD, etc. implemented them in stable drivers. I started working on the Vulkan encoder in December 2024. By then, all there was to go on was the Vulkan spec and vk_video_samples, NVIDIA's official sample program. If you hit a bug and wanted to look it up online, or just wanted any information beyond the austere Vulkan spec, nothing would show up. Zilch. Silence.

The First Impasse

I was able to integrate pyroenc pretty smoothly after grasping some Vulkan concepts (what’s a Vulkan queue? What’s a Vulkan image view? etc.). At this point pyroenc only encoded H.264 High Profile, and that worked. The next step was to make it work with the Baseline and Main profiles. I forked pyroenc and got to work. I figured the work would just be some basic configuration, e.g. specifying STD_VIDEO_H264_PROFILE_IDC_BASELINE instead of STD_VIDEO_H264_PROFILE_IDC_HIGH wherever the profile is specified. Nope! Doing only this led to an error while setting the video session parameters 💢.
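For reference, the profile itself is just a number: the H.264 `profile_idc` written into the SPS, which Vulkan also expects in `VkVideoEncodeH264ProfileInfoKHR::stdProfileIdc`. A minimal sketch, assuming a `Profile` enum shaped like the one in the pyroenc snippets later in this post (the `H264_High` member is an assumption). Changing this value alone does nothing to make the other parameters legal for the new profile, which is where the trouble starts.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the profile enum seen in the snippets below;
// H264_High is assumed for completeness.
enum class Profile { H264_Base, H264_Main, H264_High };

// profile_idc values from the H.264 spec. The Vulkan constants
// STD_VIDEO_H264_PROFILE_IDC_BASELINE / _MAIN / _HIGH use the same numbers.
constexpr uint8_t profileIdc(Profile p) {
    switch (p) {
        case Profile::H264_Base: return 66;
        case Profile::H264_Main: return 77;
        case Profile::H264_High: return 100;
    }
    return 0; // not reachable for valid enum values
}
```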

The dreaded video session parameters

If you are not familiar with them: in Vulkan Video, a video session parameters object holds the codec's parameter sets, which describe the video stream at different scopes. The first step to encoding H.264/H.265 video is setting up these parameters.

SPS (Sequence Parameter Set): High-level parameters that apply to an entire sequence of frames

PPS (Picture Parameter Set): Parameters for individual pictures/frames; each slice refers to a PPS, and many pictures can share the same one

VUI (Video Usability Information): Optional additional metadata about how the video should be displayed or processed. It is carried at the end of the SPS when vui_parameters_present_flag is set.

H.265 also introduces the VPS (Video Parameter Set), which sits one level above the SPS and applies across multiple sequences or layers, i.e. the video stream as a whole.
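Once encoded, each parameter set travels in its own NAL unit, so listing the NAL unit types is a quick sanity check on what an encoder actually emitted. A minimal sketch, assuming an Annex B byte stream in which NAL units are separated by `00 00 01` or `00 00 00 01` start codes; `nalUnitTypes` is a hypothetical helper, not part of pyroenc or Vulkan. A plain scan for start codes is safe because emulation prevention bytes guarantee that `00 00 01` never appears inside a NAL unit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the nal_unit_type of every NAL unit in an Annex B byte stream.
// H.264: 1-byte header, nal_unit_type is the low 5 bits (SPS = 7, PPS = 8).
// H.265: 2-byte header, nal_unit_type is bits 1..6 of the first byte
//        (VPS = 32, SPS = 33, PPS = 34).
std::vector<int> nalUnitTypes(const std::vector<uint8_t>& stream, bool isHevc) {
    std::vector<int> types;
    for (std::size_t i = 0; i + 3 < stream.size(); ++i) {
        if (stream[i] == 0 && stream[i + 1] == 0 && stream[i + 2] == 1) {
            const uint8_t header = stream[i + 3];
            types.push_back(isHevc ? (header >> 1) & 0x3F : header & 0x1F);
            i += 3; // together with ++i, skips the start code and header byte
        }
    }
    return types;
}
```

A 4-byte start code is handled naturally, since it ends in the same `00 00 01` pattern.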

Between the VPS, SPS, PPS and VUI there are many parameters (more than 100) that can be set. Not all of them need to be changed from their initial values, but some do, otherwise the encoding will fail. Some parameters are only present depending on the values of other parameters, and the accepted values of a given parameter can also depend on how others are set.
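To make "conditionally present" concrete, here is a sketch of a small slice of the H.264 SPS syntax (`seq_parameter_set_data()`), listing which conditional syntax elements get written for a given `profile_idc` and `pic_order_cnt_type`. It is deliberately simplified: only Baseline (66), Main (77) and High (100) are modeled, nested conditions are omitted, and `conditionalSpsFields` is a hypothetical illustration rather than an actual SPS writer.

```cpp
#include <cassert>
#include <initializer_list>
#include <string>
#include <vector>

// Which *conditional* SPS syntax elements are written for a given profile_idc
// and pic_order_cnt_type. Simplified: the real profile condition lists more
// High-family profiles, and nested conditions such as
// separate_colour_plane_flag or offset_for_ref_frame[] are left out.
std::vector<std::string> conditionalSpsFields(int profileIdc, int picOrderCntType) {
    std::vector<std::string> fields;
    auto add = [&fields](std::initializer_list<const char*> names) {
        for (const char* name : names) fields.emplace_back(name);
    };
    if (profileIdc == 100) {
        // Chroma format, bit depth and scaling-matrix signaling only exist
        // for High-family profiles; Baseline and Main never write them.
        add({"chroma_format_idc", "bit_depth_luma_minus8", "bit_depth_chroma_minus8",
             "qpprime_y_zero_transform_bypass_flag", "seq_scaling_matrix_present_flag"});
    }
    if (picOrderCntType == 0) {
        add({"log2_max_pic_order_cnt_lsb_minus4"});
    } else if (picOrderCntType == 1) {
        add({"delta_pic_order_always_zero_flag", "offset_for_non_ref_pic",
             "offset_for_top_to_bottom_field", "num_ref_frames_in_pic_order_cnt_cycle"});
    }
    // pic_order_cnt_type == 2 adds no picture-order-count fields.
    return fields;
}
```

So switching the profile from High to Baseline silently removes a whole group of fields from the bitstream, and the remaining fields have to be consistent with that.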

So there is a lot to know about these parameters, and the only place to really understand any of them is the 700-page spec PDFs, which are a joy to read. There is essentially zero documentation about them online. What I ended up doing most of the time was using Claude.ai to synthesize the information from these PDFs for me and answer my questions as they arose. I’m partial to Claude.ai over ChatGPT.


I guess I had never found the free time to dive into the 700-page PDF ITU-T Rec. H.264 (08/2010) Advanced video coding for generic audiovisual services, which I had dutifully kept on my bedside table in hopes of one day finding the time before bed.


The equally arresting 700-page sequel, ITU-T Rec. H.265 (08/2021) High efficiency video coding.

The mysterious VK_ERROR_OUT_OF_HOST_MEMORY

My first impasse was the call to vkGetEncodedVideoSessionParametersKHR, which returned VK_ERROR_OUT_OF_HOST_MEMORY. vkGetEncodedVideoSessionParametersKHR takes a video session parameters object containing your desired parameters and writes the encoded parameter sets into a buffer you provide. VK_ERROR_OUT_OF_HOST_MEMORY means that a host (CPU) memory allocation failed, yet I had plenty of RAM left while my renderer was running. My first thought was that the buffer I gave it was too small, but it wasn’t that. In fact, you can call vkGetEncodedVideoSessionParametersKHR with pData set to NULL so that it only returns the size it needs for said buffer, and even this call returned the mysterious VK_ERROR_OUT_OF_HOST_MEMORY. A coworker of mine asked me at the time if I could just “debug” it. As in “step in”. Ah…sweet innocence…unfortunately you can’t step into the GPU driver. There was nothing to be found online about this. I updated my drivers and the Vulkan SDK: nada, still VK_ERROR_OUT_OF_HOST_MEMORY.
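The size query described above is Vulkan's usual two-call idiom. So that this sketch compiles without the Vulkan headers, it models only the last two parameters of vkGetEncodedVideoSessionParametersKHR (`size_t* pDataSize, void* pData`) behind a hypothetical `GetParamsFn`, with a stand-in `Result` enum instead of `VkResult`. In real code, the device, get-info and feedback arguments would be captured by the callable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for VkResult, limited to the codes used here.
enum class Result { Success, ErrorOutOfHostMemory };

// Models (size_t* pDataSize, void* pData), the last two parameters of
// vkGetEncodedVideoSessionParametersKHR.
using GetParamsFn = std::function<Result(std::size_t*, void*)>;

// Two-call idiom: query the size with pData == nullptr, allocate, then fetch.
// An error from the size query is returned immediately: no buffer has been
// passed yet, so a too-small buffer cannot be the cause and the problem lies
// upstream (for instance in the parameters themselves).
Result fetchEncodedParams(const GetParamsFn& get, std::vector<uint8_t>& out) {
    std::size_t size = 0;
    Result r = get(&size, nullptr);
    if (r != Result::Success) return r;
    out.resize(size);
    r = get(&size, out.data());
    if (r == Result::Success) out.resize(size); // keep only the bytes reported as written
    return r;
}
```

Seen this way, an error on the very first call already rules out the buffer as the culprit.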

I decided to probe and gaslight Claude.ai until the whole thing worked for Baseline Profile. It took many hallucinations before I finally reached something true. The issue was actually in how the parameters were set: VK_ERROR_OUT_OF_HOST_MEMORY was probably the result of a failed allocation inside the driver, triggered by my invalid parameters. A Vulkan Validation Layer check would have been appreciated…

This made Baseline Profile work:

if (impl.info.profile == Profile::H264_Base) {
    // Baseline does not allow CABAC entropy coding (only CAVLC). Careful:
    // VkVideoEncodeH264CapabilitiesKHR::stdSyntaxFlags can still report the
    // VK_VIDEO_ENCODE_H264_STD_ENTROPY_CODING_MODE_FLAG_SET_BIT_KHR flag!
    pps.flags.entropy_coding_mode_flag = 0;
    // Weighted prediction (explicit or implicit) is not allowed in Baseline.
    pps.flags.weighted_pred_flag = 0;
    pps.weighted_bipred_idc = STD_VIDEO_H264_WEIGHTED_BIPRED_IDC_DEFAULT;
    // Custom scaling matrices are not allowed in Baseline.
    pps.flags.pic_scaling_matrix_present_flag = 0;
    // The 8x8 transform mode is not allowed in Baseline.
    pps.flags.transform_8x8_mode_flag = 0;
}

This made Main Profile work:

// The 8x8 transform is a High-profile tool: H.264 Annex A requires
// transform_8x8_mode_flag == 0 for both Baseline and Main, which likely
// explains the PPS encoding error NVIDIA reported for Main. Only enable it
// when the device supports it and the profile is neither Main nor Baseline
// (excluding Baseline here also keeps this check from re-enabling it).
if (
    impl.caps.h264.caps.stdSyntaxFlags & VK_VIDEO_ENCODE_H264_STD_TRANSFORM_8X8_MODE_FLAG_SET_BIT_KHR
    && impl.info.profile != Profile::H264_Main
    && impl.info.profile != Profile::H264_Base
) {
    pps.flags.transform_8x8_mode_flag = 1;
}
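Since no validation layer flags these mistakes, a small pre-flight check could fail loudly before the driver does. A sketch, assuming a hypothetical `PpsSubset` struct that mirrors only the `StdVideoH264PictureParameterSet` fields touched in the snippets above (the real struct has many more members). The rules follow the H.264 Annex A constraints for these fields, for the three profiles discussed here only: CABAC and weighted prediction are excluded from Baseline, while the 8x8 transform and custom scaling matrices require High.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of a few StdVideoH264PictureParameterSet fields.
struct PpsSubset {
    bool entropy_coding_mode_flag = false;        // 1 = CABAC, 0 = CAVLC
    bool weighted_pred_flag = false;
    int weighted_bipred_idc = 0;                  // 0 = default weighting
    bool transform_8x8_mode_flag = false;
    bool pic_scaling_matrix_present_flag = false;
};

// profileIdc: Baseline = 66, Main = 77, High = 100 (other profiles not modeled).
// Returns one message per PPS setting that the given profile does not allow.
std::vector<std::string> ppsProfileViolations(int profileIdc, const PpsSubset& pps) {
    std::vector<std::string> errors;
    const bool baseline = profileIdc == 66;
    const bool highTools = profileIdc == 100; // 8x8 transform, scaling matrices
    if (baseline && pps.entropy_coding_mode_flag)
        errors.push_back("CABAC is not allowed in Baseline");
    if (baseline && (pps.weighted_pred_flag || pps.weighted_bipred_idc != 0))
        errors.push_back("weighted prediction is not allowed in Baseline");
    if (!highTools && pps.transform_8x8_mode_flag)
        errors.push_back("8x8 transform requires High profile");
    if (!highTools && pps.pic_scaling_matrix_present_flag)
        errors.push_back("custom scaling matrices require High profile");
    return errors;
}
```

Running a check like this right before creating the video session parameters would, at least for these particular mistakes, turn an opaque VK_ERROR_OUT_OF_HOST_MEMORY into a readable message.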

Hallelujah. My next step was to encode H.265: queue the second impasse.