Dynamic audio filtergraphs in FFmpeg - avfilter_config_links() was removed

Overview

At http://www.veed.io, we host a video editor where users can create video editing projects. These projects have a timeline that, among other things, contains cuts (trims) of audio/video. Once ready, users can export these projects into a final video.

As part of this exporting process, we assemble output video frames (through rendering and compositing) from the video frames of the various sources. We do the same for audio, mixing streams as appropriate. These output frames (video and audio) are then encoded into the final video.

In this post, I write about our audio handling using FFmpeg's filtergraph, in particular how we change the sources and filterchains of the filtergraph on the fly. I also discuss a problem we've now encountered: FFmpeg has deprecated and, recently, removed the avfilter_config_links() function that we have been using for a while.

Audio handling

On our project timeline, audio/video elements can be placed pretty much anywhere. As an example, in a 10 second project, we can have a video at time range (0, 3], another video at time range (5, 8], and a third video overlapping both of those at time range (1, 6]. This is a depiction of what it would look like on our project timeline, showing 2 timeline tracks:

0   1   2   3   4   5   6   7   8   9   10 <-- seconds
|--video1---|       |--video2---|          <-- track1
    |--video3-----------|                  <-- track2

Our project exporting pipeline then renders output video frames, one by one, starting at 0s until the end of the project: initially decoding frames from video1, then from video1 and video3, then only from video3, then from video2 and video3, then only from video2. You get the idea.

Since these videos have audio too, we have to decode the audio frames similarly and mix them too.

Static filtergraph - the simple way

A simple way to mix all this audio is to use an amix filter with 3 inputs. Our earlier implementation would set up 3 source-filterchains, one for each of the videos, and 1 sink-filterchain:

video1-filterchain \
video2-filterchain  -> sink-filterchain
video3-filterchain /

In detail, each source-filterchain would start with an abuffer filter, and the sink-filterchain would start with an amix and end with an abuffersink, with many more filters in all of the filterchains for purposes such as trimming, fading, volume control, etc.:

abuffer -> <filters...> \
abuffer -> <filters...>  -> amix -> <filters...> -> abuffersink
abuffer -> <filters...> /

As exporting progresses, we feed audio frames into the sources and get filtered audio frames from the abuffersink. This worked really well for us until we started scaling the timeline.

As you can imagine, our timeline can be virtually unlimited, with hundreds or even thousands of audio/video elements across various timeline tracks. We really had to move to something more on-demand and on-the-fly.

Dynamic filtergraph - changing source filterchains

After some investigation, we settled on a novel approach. If you look closely at the timeline above, we really only need to mix 2 audio streams at any one time, not 3. So, we can analyse the timeline in advance and figure out that we need to mix a maximum of 2 inputs at any time.

We then introduced the notion of a null-filterchain, which is an anullsrc-based filterchain producing null (silent) samples.
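For illustration, here is a hypothetical sketch of that advance analysis: given the (start, end] ranges of the timeline's audio elements, a sweep over the sorted boundaries yields the maximum number playing at once, which is how many mixer inputs (real plus null) the filtergraph needs. The types and names here are invented for this post:

```c
#include <stdlib.h>

typedef struct { double start, end; } Clip;  /* (start, end] in seconds */

static int cmp_double(const void *a, const void *b) {
    double d = *(const double *)a - *(const double *)b;
    return (d > 0) - (d < 0);
}

/* Maximum number of clips active at the same instant. */
static int max_concurrent(const Clip *clips, int n)
{
    double *starts = malloc(n * sizeof(*starts));
    double *ends   = malloc(n * sizeof(*ends));
    for (int i = 0; i < n; i++) {
        starts[i] = clips[i].start;
        ends[i]   = clips[i].end;
    }
    qsort(starts, n, sizeof(double), cmp_double);
    qsort(ends,   n, sizeof(double), cmp_double);

    /* Sweep through start times; a clip whose end <= the next start
     * has already freed its slot (ranges are half-open on the left). */
    int active = 0, best = 0, j = 0;
    for (int i = 0; i < n; i++) {
        while (j < n && ends[j] <= starts[i]) {
            j++;
            active--;
        }
        active++;
        if (active > best)
            best = active;
    }
    free(starts);
    free(ends);
    return best;
}
```

For the example timeline above, (0, 3], (5, 8] and (1, 6] give a maximum of 2 concurrent clips, so a 2-input mixer suffices.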

Initially, we setup a filtergraph as follows:

video1-filterchain \
null-filterchain    -> sink-filterchain

As time progresses, we update the filtergraph to include the video3 filterchain:

video1-filterchain \
video3-filterchain  -> sink-filterchain

Then, we update the filtergraph to remove the video1 filterchain:

null-filterchain   \
video3-filterchain  -> sink-filterchain

Then, we update the filtergraph to include the video2 filterchain:

video2-filterchain \
video3-filterchain  -> sink-filterchain

Then, we update the filtergraph to remove the video3 filterchain:

video2-filterchain \
null-filterchain    -> sink-filterchain

After the filtergraph is set up for the first time, it is configured using avfilter_graph_config(). Any later change like the above requires a reconfigure of the filtergraph using the same call. We have to dispose of the replaced filters and links carefully, including the automatic filters added by FFmpeg.

But, crucially, because the new filters and links remain unconfigured, we have to call avfilter_config_links() on them, otherwise the filtergraph crashes. This function is available in FFmpeg 7.1.3, although deprecated.

Note: We've been running dynamic filtergraphs this way for a few years with many happy users, and hardly any user complaints.

Recently, with FFmpeg 8.0, avfilter_config_links() was finally removed, after being deprecated for a while in the earlier major versions.

I've been trying to figure out a way forward. I've also brought this up in multiple threads, for example here, here and previously here.

I'm also not sure if what we're doing here is something novel/unexpected/unsupported, so if you have any good ideas - let me know.

The hack

Either way, we need to forge ahead.

So, while I come up with a better way, I've hacked together a patch that allows us to move forward. I've very simply removed this check in FFmpeg-8.0.1. The function gets called as part of avfilter_graph_config() and is meant to start from the sink filters (those with no outputs) and configure the filters and links recursively by traversing the graph upstream. Since I've removed the check, it does the config for all filters multiple times, so we take a performance hit at the very least.

But we are confident this hack works for now. Our unit tests pass, and we've been running it in production for more than a week, with no user complaints so far.

I hope to have something better soon.