AD Support in HTML Video
One of the primary challenges of using the browsers’ default video player is its lack of support for additional audio tracks. Whether those tracks come from 1) a separate media file, 2) text synthesized into speech by the browser, or 3) the video file itself, browsers today generally do a poor job of exposing them.
This is a particular concern when trying to satisfy WCAG Success Criterion 1.2.5 Audio Description (Prerecorded). As a Level AA criterion, it is required by most organizations in most locales.
For the scope of this post, I am using audio description (AD) as defined by WCAG:
- audio description
narration added to the soundtrack to describe important visual details that cannot be understood from the main soundtrack alone
Note: Audio description of video provides information about actions, characters, scene changes, on-screen text, and other visual content.
Note: In standard audio description, narration is added during existing pauses in dialogue. (See also extended audio description.)
Note: Where all of the video information is already provided in existing audio, no additional audio description is necessary.
Note: Also called “video description” and “descriptive narration.”
With today’s technology support, you generally have to make two versions of a video and either link or embed the AD video alongside the original video. That’s a bummer.
But let’s look at support anyway. The videos I am using come from my post Media Queries in HTML Video, so they should look familiar (and explain my shoe-horned AD).
Separate Audio / Video <source>
WHATWG HTML offers some guidance on accessibility with the <video> element:
In particular, this content is not intended to address accessibility concerns. To make video content accessible to the partially sighted, the blind, the hard-of-hearing, the deaf, and those with other physical or cognitive disabilities, a variety of features are available. […] Audio descriptions can be embedded in the video stream or in text form using a WebVTT file referenced using the track element and synthesized into speech by the user agent. […]
Notice it lists only two options. That’s because the <source> element is not meant to hold AD (video or audio), even though a video can have many sources.
The src attribute holds the URL of the file. The type attribute declares its MIME type (file format). The media attribute contains media queries and, as I noted in Media Queries in HTML Video, there is no media query for absent visuals.
So <source> is no use to us here. This is not a limitation of browsers.
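To make that concrete, here is a sketch of what happens if you drop an AD version into the source list anyway. The star-video_AD.mp4 file is hypothetical. As I read the resource selection rules, the browser works down the <source> list, plays the first one it can use (supported type, matching media query if present), and never offers the rest to the user.

```html
<video preload="metadata" controls poster="star-video_poster.jpg">
  <!-- The browser plays the first <source> it can use (supported type,
       matching media query if one is set) and ignores the rest. -->
  <source src="star-video.mp4" type="video/mp4">
  <!-- Hypothetical AD version: never selected while the MP4 above plays,
       and nothing in src, type, or media can flag it as audio described. -->
  <source src="star-video_AD.mp4" type="video/mp4">
</video>
```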
Synthesized AD Using <track>
The <track> element is generally used to provide closed captions and subtitles for a video. The kind attribute also lets you define a text <track> to synthesize AD using the descriptions value:
Textual descriptions of the video component of the media resource, intended for audio synthesis when the visual component is obscured, unavailable, or not usable (e.g. because the user is interacting with the application without a screen while driving, or because the user is blind). Synthesized as audio.
WHATWG HTML even has rules for how browsers should behave when the speech synthesized from a descriptions track runs longer than the time allotted to its cue, essentially encoding extended audio description:
One example of when a media element would be paused for in-band content is when the user agent is playing audio descriptions from an external WebVTT file, and the synthesized speech generated for a cue is longer than the time between the text track cue start time and the text track cue end time.
In practice, however, no browser synthesizes AD from <track>. You can test it yourself, of course. Granted, I may have messed up the format of the WebVTT file (I even checked the Media Accessibility User Requirements), but this WebVTT spec intro makes me think not:
The majority of the current version of this specification is dedicated to describing how to use WebVTT files for captioning or subtitling. There is minimal information about chapters and time-aligned metadata and nothing about video descriptions at this stage.
That’s from 2019, though, so let’s look at the latest draft:
The majority of the current version of this specification is dedicated to describing how to use WebVTT files for captioning or subtitling. There is minimal information about chapters and time-aligned metadata and nothing about video descriptions at this stage.
Welp. I don’t think we can really blame browsers here?
```html
<video preload="metadata" controls poster="star-video_poster.jpg">
  <source src="star-video.mp4" type="video/mp4">
  <track label="English" kind="subtitles" srclang="en-us" src="star-video_base.vtt" default>
  <track label="Audio Description" kind="descriptions" srclang="en-us" src="star-video_AD.vtt">
  Sorry, your browser doesn’t support embedded videos, but don’t worry, you can <a href="star-video.mp4">download it</a>. The <a href="star-video_base.vtt">caption file</a> is also available in case your video player can import it.
</video>
```
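For the curious, this is a rough sketch of what the spec asks browsers to do, approximated in script with the Web Speech API: speak each descriptions cue as it becomes active, and pause the video if the speech is still going when the cue ends (the extended AD behavior quoted above). It assumes the markup above is the first <video> on the page. It is not a substitute for native support: speechSynthesis uses the system voice rather than the user’s screen reader and its settings, this sketch does nothing to duck the soundtrack, and in a browser with native support turned on you would hear each description twice.

```html
<script type="module">
  // Rough sketch, not native support: speak kind="descriptions" cues with
  // the Web Speech API and hold the video while a description is still
  // being spoken (extended audio description).
  // Module scripts run after parsing and keep these names out of global scope.
  const video = document.querySelector('video');
  const adTrack = Array.from(video.textTracks)
    .find((track) => track.kind === 'descriptions');
  let pausedForDescription = false;

  if (adTrack && 'speechSynthesis' in window) {
    // "hidden" loads the cues and fires cuechange without rendering any text.
    adTrack.mode = 'hidden';

    adTrack.addEventListener('cuechange', () => {
      const cue = adTrack.activeCues[0];
      if (!cue) return; // a cue just ended and no new one started

      const utterance = new SpeechSynthesisUtterance(cue.text);
      utterance.lang = adTrack.language || 'en-US';

      // If the description outlasts its cue, pause until the speech ends.
      cue.addEventListener('exit', () => {
        if (speechSynthesis.speaking && !video.paused) {
          pausedForDescription = true;
          video.pause();
        }
      }, { once: true });

      // Resume only if this script was the one that paused the video.
      utterance.addEventListener('end', () => {
        if (pausedForDescription) {
          pausedForDescription = false;
          video.play();
        }
      });

      speechSynthesis.speak(utterance);
    });
  }
</script>
```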
Audio Track in the Video File
The third option is to record and embed a separate audio track in the video file itself.
The advantage here is portability. The media file can be shared outside of a web page, so on Windows and macOS (at least) your users can select the embedded AD.
The disadvantage is production. For this video, I had to record a separate audio track, handle the ducking, export it as an audio track, import it and the original video into MKVToolNix, get all the metadata right, export it as an MKV file, then convert that to MP4. My copy of Camtasia cannot do this, and if Adobe Premiere can, then its documentation is so awful that I could not figure out how.
Oh right, there is another disadvantage: only Safari exposes the AD track to users. And with my production flow, the AD track name is not the one I assigned (yay).
In Safari, while the video is playing, navigate to the last tab stop: the double-right-arrow button (three dots in Safari on iDevices) in the bottom right corner, labeled “more button”. Activate it, choose “Languages” from the menu, then use its sub-menu to choose between the two audio tracks. That may be tricky for the uninitiated with my 10-second video.
Safari’s menu to choose between languages, which is how you choose the AD track, lists both the original audio and the AD track as “English”. I am not sure what I did wrong, but to prove a better name is possible I have embedded a video from Apple that has a dedicated AD track named “English Descriptions”.
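If you would rather not send users hunting through that menu, the HTML audioTracks API is the script route to the same tracks. As far as I know, only Safari exposes video.audioTracks without a flag, so this sketch feature-detects and quietly does nothing elsewhere. It looks for a track whose kind is "descriptions" or "main-desc" (the AD values HTML defines for audio tracks). That assumes the file’s metadata is tagged well enough for the browser to report that kind which, given my track-naming trouble, may not be true of your file. The button and its ad-toggle id are hypothetical.

```html
<button type="button" id="ad-toggle" aria-pressed="false" hidden>
  Audio description
</button>
<script type="module">
  // Sketch: switch between the main and AD audio tracks inside one video file.
  // Assumes the first <video> on the page holds the in-band AD track.
  const video = document.querySelector('video');
  const toggle = document.getElementById('ad-toggle'); // hypothetical button
  const AD_KINDS = ['descriptions', 'main-desc'];

  // Index of the first track that is (wantAD = true) or is not an AD track.
  function findTrackIndex(tracks, wantAD) {
    for (let i = 0; i < tracks.length; i++) {
      if (AD_KINDS.includes(tracks[i].kind) === wantAD) return i;
    }
    return -1;
  }

  // Enable exactly one audio track and disable the others.
  function useTrack(tracks, index) {
    for (let i = 0; i < tracks.length; i++) {
      tracks[i].enabled = (i === index);
    }
  }

  function setUp() {
    const tracks = video.audioTracks;
    const adIndex = findTrackIndex(tracks, true);
    if (adIndex === -1) return; // nothing reported as AD; keep the button hidden
    toggle.hidden = false;
    toggle.addEventListener('click', () => {
      const wantAD = toggle.getAttribute('aria-pressed') !== 'true';
      const index = wantAD ? adIndex : findTrackIndex(tracks, false);
      if (index === -1) return;
      useTrack(tracks, index);
      toggle.setAttribute('aria-pressed', String(wantAD));
    });
  }

  if ('audioTracks' in video) {
    // Track metadata is only available once the media metadata has loaded.
    if (video.readyState >= HTMLMediaElement.HAVE_METADATA) setUp();
    else video.addEventListener('loadedmetadata', setUp, { once: true });
  }
</script>
```

A toggle button keeps the switch under the user’s control rather than silently changing the soundtrack on load.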
Do not use my video as a good example of AD. I talk over the lyrics, I don’t reduce the background music enough, the levels are crap, and so on. I have just enough background in audio engineering to be embarrassed and so little free time that I don’t much care about being embarrassed.
The video with only the AD track
Wrap-up
If you want to present audio described video to all your users, make a separate video. Link it or, even better, embed it along with the original video.
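A minimal sketch of that approach, reusing the file names from the <track> example above. The star-video_AD.mp4 file (the AD version with narration mixed into its soundtrack) is hypothetical, and the heading level is arbitrary; use whatever fits your page outline.

```html
<h3>Star video</h3>
<video preload="metadata" controls poster="star-video_poster.jpg">
  <source src="star-video.mp4" type="video/mp4">
  <track label="English" kind="subtitles" srclang="en-us" src="star-video_base.vtt" default>
</video>

<h3>Star video with audio description</h3>
<!-- Hypothetical AD version: a separate file, so every browser can play it. -->
<video preload="metadata" controls poster="star-video_poster.jpg">
  <source src="star-video_AD.mp4" type="video/mp4">
  <track label="English" kind="subtitles" srclang="en-us" src="star-video_base.vtt" default>
</video>
<p><a href="star-video_AD.mp4">Download the audio described version</a>.</p>
```

Because the AD lives in an ordinary file, this needs no browser support beyond plain <video>, which is the whole point.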
The star field used in the video is one of the first images revealed to the public from NASA’s James Webb Space Telescope (image credit: NASA, ESA, CSA, and STScI). The music is Andy You’re a Star by The Killers.
Yes, you’re still a star.
Bug Reports
Because bug reports need a test case demonstrating the issue, I published this post first and then filed bugs, which I list here:
- Chromium: 1447858 Feature Request: text-based Audio Descriptions in Chrome, 22 May 2023, and my comment on 20 December 2023
  - 361123861 Flag expired : text-based-audio-descriptions is expired in M130, 20 August 2024, suggests movement might be stalled.
  - 40664325 Support description tracks in native HTML 5 video elements w/o extra extensions, 18 December 2019, has had no activity (other than auto-close efforts) in 5 years. Both of these bugs are from Jeff Witt in the A11y Slack.
- MDN Browser Compat Data: 21698 html.elements.track – No UA supports kind="descriptions", 20 December 2023
- WebKit: 266724 – AX: Feature request: Support WebVTT-based synthesized audio description in video, 20 December 2023
- Firefox: 1871143 Feature request: Support WebVTT-based synthesized audio description in video, 20 December 2023
Update, 9 January 2025: Safari Synthesized AD Support
An Apple rep responded to my 2023 feature request to note that experimental support for synthesized AD (via <track>) had been added in 2022. You can enable and confirm it using the synthesized AD video above (no restart necessary):
- Apple icon → System Settings
- Choose Accessibility
- Choose Audio Descriptions
- Toggle “Play audio descriptions when available”
- Go to Safari → Settings…
- Choose Feature Flags
- Check “Audio descriptions for video – Extended”
- Check “Audio descriptions for video – Standard”
I’ve asked when this feature will move out of being strictly experimental and will update when I hear back.
James Scholes confirmed it for iOS:
@aardrian The same feature flags can also be toggled on iOS: Settings -> Apps -> Safari -> Advanced -> Feature Flags. It seems to completely ignore my system speech settings, doesn’t speak through VO if it’s running, and the audio ducking is rubbish.
The audio ducking on macOS was rubbish, too, but at least the feature is available. If you know to hunt for it.