AD Support in HTML Video

One of the primary challenges of using a browser’s default video player is its lack of support for additional audio tracks. Whether those tracks are 1) referenced from a separate media file, 2) synthesized by the browser, or 3) embedded in the video file itself, browsers today do a poor job of exposing them.

This is a particular concern when trying to satisfy WCAG Success Criterion 1.2.5 Audio Description (Prerecorded). At Level AA, this is required by most organizations in most locales.

For the scope of this post, I am using audio description (AD) as defined by WCAG:

audio description

narration added to the soundtrack to describe important visual details that cannot be understood from the main soundtrack alone

Note: Audio description of video provides information about actions, characters, scene changes, on-screen text, and other visual content.

Note: In standard audio description, narration is added during existing pauses in dialogue. (See also extended audio description.)

Note: Where all of the video information is already provided in existing audio, no additional audio description is necessary.

Note: Also called “video description” and “descriptive narration.”

With technology support as it stands today, you generally have to make two versions of a video and either link or embed the AD video alongside the original video. That’s a bummer.

But let’s look at support anyway. The videos I am using come from my post Media Queries in HTML Video, so they should look familiar (and explain my shoe-horned AD).

Separate Audio / Video <source>

WHATWG HTML offers some guidance on accessibility with the <video> element:

In particular, this content is not intended to address accessibility concerns. To make video content accessible to the partially sighted, the blind, the hard-of-hearing, the deaf, and those with other physical or cognitive disabilities, a variety of features are available. […] Audio descriptions can be embedded in the video stream or in text form using a WebVTT file referenced using the track element and synthesized into speech by the user agent. […]

Notice it lists only two options. That’s because the <source> element is not meant to hold AD (video or audio), even though a video can have many sources.

The src attribute holds the URL of the file. The type attribute declares its MIME type (file format). The media attribute contains media queries and, as I noted in Media Queries in HTML Video, there is no media query for absent visuals.
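As a rough sketch of those three attributes in use (the file names and the breakpoint are placeholders, not the files from this post):

```html
<!-- Sketch only: file names and the 800px breakpoint are hypothetical. -->
<video controls preload="metadata">
    <!-- media: considered only when the viewport is at least 800px wide. -->
    <source src="example-wide.mp4" type="video/mp4" media="(min-width: 800px)">
    <!-- type: lets the browser skip a format it cannot play. -->
    <source src="example-narrow.webm" type="video/webm">
    <!-- src alone: the final fallback. -->
    <source src="example-narrow.mp4">
    <a href="example-narrow.mp4">Download the video</a>
</video>
```

The browser walks the list in order and takes the first <source> whose type it can play and whose media query matches. The user gets no say in that choice.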

So <source> is no use to us here. This is not a limitation of browsers.

Synthesized AD Using <track>

The <track> element is generally used to provide closed captions and subtitles for a video. The kind attribute lets you also define a text <track> to synthesize AD using the descriptions value:

Textual descriptions of the video component of the media resource, intended for audio synthesis when the visual component is obscured, unavailable, or not usable (e.g. because the user is interacting with the application without a screen while driving, or because the user is blind). Synthesized as audio.

WHATWG HTML even has rules for how browsers should handle when the text-powered synthesized AD track takes up more time than allotted, essentially encoding extended audio description:

One example of when a media element would be paused for in-band content is when the user agent is playing audio descriptions from an external WebVTT file, and the synthesized speech generated for a cue is longer than the time between the text track cue start time and the text track cue end time.
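To make that rule concrete, here is a minimal sketch of the same pause-until-spoken behavior done in page script instead of by the user agent. It assumes the browser supports the Web Speech API (speechSynthesis) and loads cues for a descriptions track. The id, file names, and variable names are placeholders. Treat it as an illustration of the logic, not a tested replacement for recorded AD.

```html
<!-- Sketch only: the id, file names, and variable names are placeholders. -->
<video id="ad-sketch" controls preload="metadata">
    <source src="example.mp4" type="video/mp4">
    <track label="Audio Description" kind="descriptions" srclang="en" src="example_AD.vtt">
</video>
<script>
    const video = document.getElementById("ad-sketch");
    // Find the first text track declared with kind="descriptions".
    const adTrack = Array.from(video.textTracks).find((track) => track.kind === "descriptions");
    let pausedForSpeech = false;

    if (adTrack && "speechSynthesis" in window) {
        // "hidden" loads the cues and fires cuechange without rendering them.
        adTrack.mode = "hidden";

        adTrack.addEventListener("cuechange", () => {
            const cue = adTrack.activeCues[0];
            if (cue) {
                // A cue just became active: queue its text for speech.
                const utterance = new SpeechSynthesisUtterance(cue.text);
                utterance.lang = adTrack.language;
                utterance.addEventListener("end", () => {
                    // Resume only if this script was what paused the video.
                    if (pausedForSpeech) {
                        pausedForSpeech = false;
                        video.play();
                    }
                });
                speechSynthesis.speak(utterance);
            } else if (speechSynthesis.speaking) {
                // The cue ended but the speech has not: hold the video.
                pausedForSpeech = true;
                video.pause();
            }
        });
    }
</script>
```

Speech generated by the page can collide with a screen reader’s own speech, and the sketch ignores edge cases such as overlapping cues or the user pausing mid-sentence.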

In practice, however, no browser synthesizes AD from <track>. You can test it yourself, of course. Granted, I may have messed up the format of the WebVTT file (I even checked the Media Accessibility User Requirements) but this WebVTT spec intro makes me think not:

The majority of the current version of this specification is dedicated to describing how to use WebVTT files for captioning or subtitling. There is minimal information about chapters and time-aligned metadata and nothing about video descriptions at this stage.

That’s from 2019, though, so let’s look at the latest draft:

The majority of the current version of this specification is dedicated to describing how to use WebVTT files for captioning or subtitling. There is minimal information about chapters and time-aligned metadata and nothing about video descriptions at this stage.

§ 1. Introduction in WebVTT: The Web Video Text Tracks Format, Draft Community Group Report, 10 March 2023

Welp. I don’t think we can really blame browsers here?

Has an audio description <track>. I have tried a few variations of video with audio, without audio, with closed captions, without, making it default, etc.
<video preload="metadata" controls poster="star-video_poster.jpg">
    <source src="star-video.mp4" type="video/mp4">
    <track label="English" kind="subtitles" srclang="en-us" src="star-video_base.vtt" default>
    <track label="Audio Description" kind="descriptions" srclang="en-us" src="star-video_AD.vtt">
    Sorry, your browser doesn’t support embedded videos, but don’t worry, you can <a href="star-video.mp4">download it</a>. The <a href="star-video_base.vtt">caption file</a> is also available in case your video player can import it.
</video>

Audio Track in the Video File

The third option is to record and embed a separate audio track in the video file itself.

The advantage here is portability. The media file can be shared outside of a web page, so on Windows and macOS (at least) your users can select the embedded AD.

The disadvantage is production. For this video, I had to record a separate audio track, handle the ducking, export it as an audio track, import it and the original video into MKVToolNix, get all the metadata right, export it as an MKV file, then convert that to MP4. My copy of Camtasia cannot do this, and if Adobe Premiere can then its documentation is so awful that I could not figure out how.

Oh right, there is another disadvantage: only Safari exposes the AD track to users. And with my production flow, the AD track name it shows is not the one I assigned (yay).

In Safari, while the video is playing, navigate to the last tab stop, the double-right arrow (three dots in Safari on iDevices) in the bottom right corner, labeled “more button”. Activate it, choose “Languages” from the menu, and go into the sub-menu to choose between the two audio tracks. That may be tricky for the uninitiated with my 10-second video.

Has an embedded AD track. The track name is “AD (English)”, its language is “en-US”, and I set the “visual impaired” flag.

Safari’s menu to choose between languages, which is how you choose the AD track, lists both the original audio and the AD track as “English”. I am not sure what I did wrong, but to prove a better name is possible I have embedded a video from Apple that has a dedicated AD track named “English Descriptions”.

Apple’s video with a better-named AD track.
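To check what a browser reports for the audio tracks inside a file, HTML defines an audioTracks list on media elements. The following is a sketch that assumes the browser exposes that property (where it does not, the property is simply undefined). The file name and ids are placeholders.

```html
<!-- Sketch only: the file name and ids are placeholders. -->
<video id="embedded-ad" controls preload="metadata">
    <source src="example_embedded-AD.mp4" type="video/mp4">
</video>
<ul id="audio-track-report"></ul>
<script>
    const player = document.getElementById("embedded-ad");
    const report = document.getElementById("audio-track-report");

    // Append one line of plain text to the report list.
    const addLine = (text) => {
        const item = document.createElement("li");
        item.textContent = text;
        report.append(item);
    };

    // Track details are only known once the file's metadata has loaded.
    player.addEventListener("loadedmetadata", () => {
        const tracks = player.audioTracks;
        if (!tracks) {
            addLine("This browser does not expose audioTracks.");
            return;
        }
        for (let i = 0; i < tracks.length; i += 1) {
            const track = tracks[i];
            addLine(`label: "${track.label}", kind: "${track.kind}", language: "${track.language}", enabled: ${track.enabled}`);
        }
    });
</script>
```

A track meant as AD would ideally report a kind of "descriptions" or "main-desc", the two values HTML sets aside for audio description.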

Do not use my video as a good example of AD. I talk over the lyrics, I don’t reduce the background music enough, the levels are crap, and so on. I have just enough background in audio engineering to be embarrassed and so little free time that I don’t much care about being embarrassed.

The video with only the AD track


If you want to present audio described video to all your users, make a separate video. Link it or, even better, embed it along with the original video.
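One way that could look in markup, as a sketch: the AD file name, the id, and the caption text are placeholders, so swap in your own.

```html
<!-- Sketch only: star-video_AD.mp4, the id, and the caption text are placeholders. -->
<figure>
    <video controls preload="metadata" poster="star-video_poster.jpg">
        <source src="star-video.mp4" type="video/mp4">
        <track label="English" kind="subtitles" srclang="en-us" src="star-video_base.vtt" default>
    </video>
    <figcaption>
        Original video. An <a href="#video-with-ad">audio described version</a> follows.
    </figcaption>
</figure>

<figure id="video-with-ad">
    <video controls preload="metadata" poster="star-video_poster.jpg">
        <!-- A second file with the description mixed into its only audio track. -->
        <source src="star-video_AD.mp4" type="video/mp4">
    </video>
    <figcaption>The same video with audio description.</figcaption>
</figure>
```

Because the description lives in the second file’s only audio track, it plays in every browser without any track menu.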

The star field used in the video is one of the first images revealed to the public from NASA’s James Webb Space Telescope (image credit: NASA, ESA, CSA, and STScI). The music is Andy You’re a Star by The Killers.

Yes, you’re still a star.

Bug Reports

Because of the nature of bug reporting (have a test case demonstrating the issue), I published this and then filed bugs, which I list here:
