Don’t Override Screen Reader Pronunciation

When many devs, testers, and authors first start listening to content through a screen reader, they are surprised to hear dates, pricing, names, abbreviations, acronyms, etc. announced differently than they expect. With the best of intentions (or branding panic) they may seek to force screen readers to announce content as they (are told to) expect.

But they are seeking to solve a problem that does not exist. Instead, these efforts can make interfaces and content harder to use and, in some cases, inaccessible.

Broadly, screen reader users are already familiar with how their screen reader(s) announce these kinds of content (acronyms and all). Mis-pronunciations are fine. These are not bugs.

Do not try to override these default announcements.

Reference

A non-exhaustive list of what plays into how a screen reader announces things:

Image: A CRT computer monitor behind a keyboard; coming out of the screen is a spherical eyeless and noseless face made up of individual keyboard keys, and it has a wide, flat mouth open as if it is speaking.

Reasons we may want to avoid overriding default pronunciation:

I offer some problematic examples of overriding default pronunciation in my post Uncanny A11y.

Wrap-up

If you have documented cases where there is a problem for users, you are almost certainly better off changing wording to avoid or clarify pronunciations (sometimes replacing extended characters). If re-writing will not satisfy your audience (or boss), then be certain to test your approach with those same users to see if it genuinely improves their experience.
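As a rough illustration of rewording instead of overriding, here is a sketch built on the “IV” example from the comment quoted later in this post. The sentences and the aria-label value are hypothetical, and none of this has been tested with users; it only shows the shape of each approach.

```html
<!-- Avoid: forcing a pronunciation. The visible text and the announced
     text now differ, the override may surface on a Braille display or
     in a translation, and aria-label is not meant for a generic <span>. -->
<p>Start the <span aria-label="eye vee">IV</span> drip.</p>

<!-- Prefer: reword so the meaning is clear however it is announced. -->
<p>Start the intravenous (IV) drip.</p>
```

The second version leaves the announcement to the user's screen reader and voice, which is the point: the wording carries the meaning, not the phonetics.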

What I am talking about is not the same as CSS Speech. CSS Speech is about designing the aural presentation analogously to the visual presentation. CSS Speech affects voices, pitch, rate, volume, stress, pauses, cues, and the like. It does not affect pronunciation. It is also not supported in browsers. Read Léonie Watson’s late 2022 post Why we need CSS Speech for more information (she is the spec editor too) and a follow-up about concerns. You can get an overview in her April 2023 appearance on CSS Cafe.

Update: 12 April 2023

A Masto conversation this morning reminded me that I failed to reference the W3C Web Accessibility Initiative Pronunciation Task Force and its draft Specification for Spoken Presentation in HTML.

More importantly, Brennan Young left a comment on my post Speech Viewer Logs of Lies (two comments, actually) that I am reproducing in part here:

[…]

First of all, screen reader announcements are typically handled by a system-level speech synthesiser. There are usually various ‘voices’ to choose between, to represent different dialects or other preferences.

The pronunciation heuristics that each ‘voice’ follows are not the same, even with different voices in the same ‘dialect’ from the same vendor.

Example: We have products for a medical context where the string “IV” is announced as “four” by some voices and “roman four” by other voices using the same screen reader. Both are wrong in our case. (We need the idiomatic abbreviation for “intravenous”, which is simply the announcement of the two letters).

So… testing on different screen readers will not reveal the full scale of the problem – and in any case, there is little that can be done by the content creator (or web developer) to fix it that won’t compromise the output on (say) a Braille device. So testing widely might reveal more pronunciation issues, but we still have no way to solve them.

[…] the best practice is not to dictate phonetics, it is to indicate clear semantics, but we lack a mechanism to do this. Sometimes the sequence :) is not a smiley, and sometimes I want “IV” read out as two letters. Sometimes O2 means oxygen, sometimes “O-squared” is intended. How do I specify these things?

The W3 pronunciation task force was set up to address some of these issues from a standards point of view, and I encourage all readers here to support or even contribute to their efforts, but the problem and its solution lie primarily with speech synth vendors, not the AT vendors or the content creators.

The primary speech synth vendors are Microsoft, Apple and Google, all of whom have made speech synthesis a part of their operating systems. I don’t get the impression that the teams involved in these technologies are even aware that their speech synth offerings have these significant accessibility failures, and they won’t find out if we complain only to Freedom Scientific, or NVDA, or blame the web developers.

All three of the main speech synth vendors offer quite sophisticated accessibility features at a system level, but none of them tackle the problem of poorly guessed screen reader pronunciation. The way that time values are announced is particularly uneven, and time values are a rare case where you can indicate which field refers to which value. In practice, hours, minutes and seconds are often announced wrongly – even if you take the trouble to specify a well-formed datetime attribute.

So how do we – as web content and accessibility professionals – apply pressure at the right (and most effective) place to the appropriate teams at Microsoft, Apple and Google to quit their ‘clever’ guessing, and give content creators a modicum of semantic control over how emojis, dingbats, abbreviations, acronyms, units, technical values and the like are intended to be treated?

Brennan Young

A further point about multilingual support.

The lang attribute is often ignored, especially if the string for pronunciation is found inside an aria attribute, or a live region. Try it! We find dismal results on many browsers. Safari seems to ignore lang altogether – the system speech synth language setting overrides everything in the markup.

Again, knowing that there are problems with screen reader pronunciation is of questionable value if we have no way to fix them. What is the business case for running tests for problems which have no solution?

Brennan Young

Please note that Brennan left those comments two years ago on a different post and with a slightly different technology landscape. Aspects of the comments may be different now and Brennan may offer alternate feedback as a result.
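For readers unfamiliar with the markup Brennan mentions, this is roughly what a well-formed datetime attribute and a lang declaration inside a live region look like. The content, the duration, and the language code are made up for illustration. Per his comments, neither is guaranteed to change what a given browser, screen reader, and voice combination announces.

```html
<!-- A machine-readable duration: 1 hour, 5 minutes, 30 seconds.
     The attribute is metadata; the visible text is typically what
     ends up being announced. -->
<p>Run time: <time datetime="PT1H5M30S">1:05:30</time></p>

<!-- A live region whose content is in a different language than the
     rest of the page. lang declares the language, but support in this
     context varies. -->
<div role="status" lang="fr">Votre message a été envoyé.</div>
```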

Update: 8 August 2023

Ben Meyers has written The Curious Case of “iff” and Overriding Screenreader Pronunciations, where he looks at a very specific use case. He generally recounts the process I outline here but also links to instructions for setting up custom pronunciation rules in assorted screen readers. Those links are for end users, not developers or testers.

Update: 18 August 2023

In the walled garden that is the Web A11y Slack, Léonie Watson offered this feedback when prompted:

[Y]ou asked how often I need to read a large number by individual number, and the answer is that it depends on what the number represents, what I need it for, and whether it’s important to know the exact number or not.

For example, I’ll read a 2FA code as a single number. If it turns out I misheard it, I might read it by individual number.

If it’s a large number that represents, say, the number of people in a population like 64,150,348, I’ll read the whole number because it’s the 64 million bit that’s most likely the important bit, not the 348 on the end.

Unless I was writing a report on that population, in which case I might explore it by number to make sure I got it right. In part this is because my memory doesn’t work well with numbers and I find it difficult to retain long numbers in my head – it’ll be different for other people of course.

If it’s a phone number I want to call I’ll read it by number, but if it’s a customer services number in the content of a page that I have no intention of calling, then it’s OK for it to be read as a whole number or blocks of numbers.

The TLDR is that there are too many variables, so the best bet is to stick to the expected/proper formats, grammar, and punctuation for the language you’re writing in. accessibility.blog.gov.uk/2017/02/08/advice-for-creating-content-that-works-well-with-screen-readers/

I feel like this is the kind of useful context that most developers do not have the benefit of accessing.
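Léonie’s advice can be sketched in markup. The population figure is the one from her example; the phone number and label text are placeholders, and the “avoid” pattern is an assumption about what an override might look like rather than something taken from a real site.

```html
<!-- Avoid: spelling digits out to force a reading. This takes away the
     user's choice to hear the whole number or step through it. -->
<p>Population: <span aria-label="six four one five zero three four eight">64150348</span></p>

<!-- Prefer: the number formatted the way the language expects. -->
<p>Population: 64,150,348</p>

<!-- Prefer: a phone number in a conventional grouping, linked so it
     can be dialed directly. -->
<p>Customer services: <a href="tel:+1-555-555-0100">555-555-0100</a></p>
```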

Update: 24 August 2023

I saw someone complain that screen readers should be able to change how numbers are announced based on unspecified context. That demonstrates a level of unfamiliarity with how screen readers work but also, and more importantly, is a common swipe when someone finds that supporting users takes a bit more effort than they thought.

Don’t blame screen readers. Provide enough context for your content so users can work with it and get a copywriter to help ensure that content is more robust.

12 Comments


While I largely agree with you, there are some exceptions. My favourite was the phrase “300 MARS”, which was a strapline on the Slack website a few years ago. At the time, JAWS read it as “Three hundred million Argentinian pesos”. Are you really suggesting you wouldn’t fix that?

In response to Steve Green.

I might not, no. “Fixing” could create new problems, as I outline above. Without knowing the audience, context, use case, translation needs, experiences outside the site, expectations set elsewhere, and so on, I would not make a blanket change without confirming it is an issue with users and then prototyping a workaround (and maybe documenting it in my Accessibility Statement). If, in that very specific case you cite sans context, it was just marketing text then as a reader I might not even care. I feel like I just re-stated my entire post here.

In response to Steve Green.

Why not “300 Mars?” IIRC you can even use “text-transform: uppercase” for this. Personally I prefer “font-variant: small-caps” in this sort of situation though.

IMO uppercase is overused as a form of emphasis.

It kind of sucks elements like abbr and time aren’t used by AT + synthesizers.

IMO the main problem here is not “presentation” but “semantics.” The problem isn’t only that screen readers pronounce things in a strange or funny way, but that they pronounce them in a misleading or incorrect way.

Molly Stewart-Gallus

In response to Molly Stewart-Gallus.

Molly, text-transform: uppercase does not solve for what you want, as I note in my post talking about all-caps text in a legal context.

It kind of sucks elements like abbr and time aren’t used by AT + synthesizers.

The <abbr> element is not exposed by browsers (because it is treated as plain text), so AT and synthesizers have no role in this (and title support is optional as a result).


Adrian,

Thank you for pushing this issue. I’ve had to explain reasons to people on why they shouldn’t create clever pronunciation hacks but your article provides me with much more knowledge I can share.

John Lukosky

Thank you for the nice article.

I’m one of those developers who is obsessed with span lang=”…” markup while making content. I will think about your points and maybe use that markup less often. Yes, those pauses are not good. On the other hand, my use cases are not as simple as the French word ‘croissant’ inside English text. Sadly, when I use English words in Ukrainian text (I’m Ukrainian) without span lang=”en”, the pronunciation of most English words becomes completely screwed up, at least with Google TTS on my phone.

But let me say something about abbreviations. I have found that sometimes the HTML tag ‘abbr’ without a ‘title’ attribute is useful (for real abbreviations). <abbr>IV</abbr> should be pronounced as ‘E Wee’, not ‘four’, right? Would you recommend it?

In response to Dmytro.

Dmytro, for the most part foreign words in English text do not need to be wrapped with a lang. “Croissant”, “cappuccino”, “putz”, and so on are part of American English parlance and should not get that treatment. When using a word not commonly known in the language of the surrounding content, then a lang declaration is probably appropriate (especially if using a different alphabet, like бутерброд). This is more about context than about how a screen reader announces it.

As for using <abbr> without title, if you have found it has helped a screen reader pronounce something better then ace, and also please send me a URL so I can see it in action.

Also, if my sandwich auto-translation is offensive in some way, let me know and I will swap it for something with hopefully fewer alternate meanings.
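In markup, the distinction drawn in this reply might look like the following sketch. The sentences are invented, and it assumes the sample word is Ukrainian (uk is the language subtag for Ukrainian), as the reply implies. Whether a given voice switches languages, or pauses while doing so, will vary.

```html
<!-- Loanwords that are part of everyday English: no lang needed. -->
<p>I ordered a croissant and a cappuccino.</p>

<!-- A word readers of the surrounding language are unlikely to know,
     written in a different alphabet: declare its language. -->
<p>The menu listed <span lang="uk">бутерброд</span> as the daily special.</p>
```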


A recent study of how screen readers read special characters (an update of the Deque one) also highlights differences in pronunciation across screen readers. Deque didn’t specify which voice they selected for their test, which may have contributed to the differences between the two studies. The researcher also tested support for lang. The character test table is available on GitHub; it could be used for further testing with different voices.

In response to Régine Lambrecht.

Hi Régine! I see the author of that article is a co-worker of yours at Eleven Ways. I appreciate you sharing their testing results and the GitHub page, but I especially appreciate it has a CC BY-SA 4.0 license so folks can update and use it, so please pass that along.

For other readers, the Deque article Why Don’t Screen Readers Always Read What’s on the Screen? Part 1: Punctuation and Typographic Symbols that Régine references is from January 2014 and definitely warranted some new testing.


I agree with not overriding things, but encouraging people not to fix simple things like abbreviations seems unnecessary. If “IV” is an issue, why not simply replace it with “intravenous” or “four”? I work for an EHR vendor and I have noticed that screen readers pronounce “EHR” as “ear”. Even though I was taught as a child to include periods after each abbreviated letter, that rule has clearly been dropped since then, as I see no sign of periods anywhere (NBC, USA, LOL, ASAP, etc.). I’m trying to get our writers to switch to including the full wording of “electronic health record” before the first use of “EHR”. This will help SEO as well as a11y (another abbreviation).

Dmytro, I wish you and your family all the best if you are still in the Ukraine.


Hey Adrian! Great post! I have what might be a corner case for you to pick apart with your vast experience.

What about the scenario where there is an email text input followed by some error text?
[ email input ] –> “test@gmial.com”
[ warning text ] –> “We might not be able to deliver mail to this address. Did you mean test@gmail.com instead?”

The Orca screen reader for Linux says… “input test at gmail dot com we might not be able to deliver email to this address did you mean test gmail com instead?” — leaving out the “at” and “dot” in the suggestion.

The two texts are pronounced wildly differently in these contexts in very rapid succession. Is this also a case where real users can “figure it out”?

Thanks for any insight you might have into this!

In response to Luke.

I feel you are getting hung up on the difference in how Orca is announcing the value of an input (where content accuracy is critical) and the accessible description (the text you pasted implies an aria-describedby association). I see no issue there and would not stress about it.
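The association implied there might be marked up like this sketch. The id values, label text, and structure are assumptions, not Luke’s actual code.

```html
<label for="email">Email</label>
<input type="email" id="email" value="test@gmial.com"
       aria-describedby="email-hint">

<!-- Referenced by aria-describedby, so it is exposed as the field's
     accessible description rather than as its value. -->
<p id="email-hint">
  We might not be able to deliver mail to this address.
  Did you mean test@gmail.com instead?
</p>
```

A user who needs the suggested address character by character can still move to the hint text and read it that way.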
