Speech Viewer Logs of Lies

When sighted users test with a screen reader, it is common to rely on the visual output: checking to see where focus goes, confirming that controls behave, watching the spoken output in a text log.

The problem is that what you see in those speech viewers is not always what you hear.

Consider how an emoji is represented in a log versus how it is announced. While visually it may be simple, it can be an audible assault for a screen reader user. 👏 is much easier to skim past when looking at a tweet, but harder when it is announced as graphic clickable emoji colon clapping hands sign.

The same is true for other extended characters, as in the following example…

As captured in VoiceOver for macOS using Safari. I wrote the captions instead of relying on the speech viewer.

When we rely on the display of the speech instead of listening to it, we as developers or testers or authors are unaware that ┏ is announced as box drawings heavy down and right in VoiceOver. Similarly, the related ━, which may look like an em-dash, is announced as box drawings heavy horizontal.

If I just relied on the VoiceOver speech viewer, those captions would have been much easier to type and also completely inaccurate.
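For a rough preview of what an unfamiliar character might be announced as, its formal Unicode name is a reasonable first guess, since the VoiceOver announcements quoted above match those names. The following Python sketch is an illustration only, not something any screen reader exposes. It uses the standard `unicodedata` module, and the function name `describe_non_ascii` is made up for this example. Treat the result as a hint about what to listen for: a screen reader may use its own dictionary, a localized name, or say nothing at all.

```python
from __future__ import annotations

import unicodedata


def describe_non_ascii(text: str) -> list[tuple[str, str]]:
    """Return (character, Unicode name) pairs for each distinct non-ASCII character.

    Hypothetical helper: the Unicode name is only a guess at the announcement.
    """
    seen: list[tuple[str, str]] = []
    for ch in text:
        # Skip plain ASCII and whitespace; they are not the surprising cases here.
        if ord(ch) < 128 or ch.isspace():
            continue
        pair = (ch, unicodedata.name(ch, "UNNAMED").lower())
        if pair not in seen:
            seen.append(pair)
    return seen


if __name__ == "__main__":
    # Characters taken from this post.
    for ch, name in describe_non_ascii("┏━ 👏 📌 👆"):
        print(f"{ch}  {name}")
```

For the sample string this lists names such as box drawings heavy down and right and clapping hands sign. It does not replace listening to the actual screen reader and browser pairing.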

Demonstration

Obviously these characters are exceptions. Using extended characters always comes with risk, but sometimes even the most mundane text can have an unexpected announcement. I made a quick demo with common content I see on the web:

See the Pen PoNYREO by Adrian Roselli (@aardrian) on CodePen.

Here is how that content is recorded in the JAWS Speech History window:

JAWS Speech History window showing the following text: “Demo | SR Announcements, heading level 1, First Name:, First Name: Edit, Type in text, Submit > Button, list of 1 items, 📌, visited Link WAI-ARIA 1.1, list end, NASA and IT got together on 8/4/2020 at 10:00am, 👆 what he said.”
Get to this screen by pressing JAWS + Space, and then H. You can then Ctrl + A to select it all, then Ctrl + C to copy it before accidentally closing the window. Your JAWS key will be either Caps Lock or number pad Insert depending on your keyboard set-up.

Here is how that same content sounds when spoken — I have added captions that reflect the full announcement, so please enable them and read along as you listen:

Using JAWS 2019 with Chrome 84 on Windows 10, using only to navigate.
Transcript

Demo vertical bar ess are Announcements heading level one

First Name colon

First Name colon edit type in text

Submit greater button

list of one items

pushpin

visited link why dash ARIA one point one

list end

blank

NASA and it got together on eight slash four slash twenty twenty at ten AM

blank

white up pointing backhand index what he said

I want to impress upon you, dear reader, that I am only using JAWS because it was quickest for me. Each screen reader will have slightly different heuristics for how these are announced. I simply do not have time to record them all. This post is about assumptions regardless of screen reader.

Takeaways

Do not use this as justification to try to override what a screen reader says. I noted this on Stack Overflow, back when I was wasting too much time there. It has come up on the WebAIM list many times. I have to explain it to clients on the regular.

If you read my post Stop Giving Control Hints to Screen Readers then you already know trying to tailor content for screen reader users can be problematic. Similarly, forcing your preferred pronunciation or phrasing is a bad idea and should never be done (unless backed up by user research and a darn good use case).

All that out of the way, I see two risks from being unaware of what a screen reader is actually saying:

  1. QA teams rely on the output matching a speech log;
  2. verbose interfaces (that sometimes hide important cues).

I have worked with QA teams who send something back to developers because it does not exactly audibly match a provided screen reader log. Often that log is from a different screen reader / browser pairing or a customized set-up not used by QA.

Similarly, if as a developer all I do is ensure the extended characters (emoji, math symbols, etc.) show up in the speech viewer but do not confirm how they are announced (if at all), I can create an unusable interface.

Screen readers offer settings that let you control how verbose they are when encountering punctuation, certain characters, abbreviations, acronyms, or even words. A skilled screen reader user may take advantage of those features, but you cannot count on users doing it and you cannot suggest that as a fix.

What to Do?

Pay attention to times, dates, currency, punctuation, special characters, emoji, math symbols, common (and uncommon) abbreviations & acronyms, and so on. Be aware of how they are announced across every screen reader and browser pairing your users are likely to use.
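One way to build a listening checklist from the categories above is to scan your content for them before you test. The Python sketch below is a minimal illustration under loud assumptions: the regular expressions and the `listening_checklist` name are hypothetical, they only cover a few of the categories, and they say nothing about how a match will be announced. The output is a list of things to go and listen to, not a prediction.

```python
import re

# Hypothetical patterns chosen for illustration; tune them to your own content.
PATTERNS = {
    "date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "time": re.compile(r"\b\d{1,2}:\d{2}(?:\s?(?:am|pm))?\b", re.IGNORECASE),
    "currency": re.compile(r"[$€£¥]\s?\d[\d,.]*"),
    "acronym": re.compile(r"\b[A-Z]{2,5}\b"),
    "non_ascii": re.compile(r"[^\x00-\x7F]"),
}


def listening_checklist(text):
    """Return (category, matched text) pairs worth checking by ear."""
    found = []
    for category, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((category, match.group()))
    return found


if __name__ == "__main__":
    sample = "NASA and IT got together on 8/4/2020 at 10:00am"
    for category, value in listening_checklist(sample):
        print(f"{category}: {value}")
```

Run against the sentence from the demo, it flags the date, the time, and both acronyms, which are the same places where the JAWS transcript diverged from the speech history text.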

Other steps:

If you are working with multilingual content (truly multilingual, not just the word panini in a description of your lunch) then be sure the lang attribute and the appropriate values are in place. This will help limit incorrect pronunciation from incorrectly marked-up content.
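A quick way to confirm the `lang` attribute is present is to parse the markup and report where it appears. This Python sketch uses the standard `html.parser` module. The `LangAudit` class and `audit_lang` function are hypothetical names for this example, and the check only confirms that values exist, not that they match the language of the text they wrap.

```python
from html.parser import HTMLParser


class LangAudit(HTMLParser):
    """Collect the lang value on <html> and on any nested elements."""

    def __init__(self):
        super().__init__()
        self.root_lang = None    # lang on the <html> element, if any
        self.nested_langs = []   # (tag, lang) for every other element with lang

    def handle_starttag(self, tag, attrs):
        lang = dict(attrs).get("lang")
        if tag == "html":
            self.root_lang = lang
        elif lang:
            self.nested_langs.append((tag, lang))


def audit_lang(markup):
    """Return (root lang or None, list of (tag, lang) for nested elements)."""
    parser = LangAudit()
    parser.feed(markup)
    parser.close()
    return parser.root_lang, parser.nested_langs


if __name__ == "__main__":
    page = '<html lang="en"><body><p lang="fr">Bonjour tout le monde</p></body></html>'
    root, nested = audit_lang(page)
    print("page language:", root)
    print("language changes:", nested)
```

A missing root value (`None`) or an empty list on a page you know is multilingual is the signal to go back to the markup, then listen to the result.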

Update: Later that afternoon…

The problem with hyperbolic headlines is they can lead to overly-broad statements on Twitter, such as when I first shared this and suggested that you might be doing it wrong if not listening to the output. While Steve gave it the appropriate wary emoji, the following response drives home the point:

One of the benefits of the de facto peer review in accessibility is when people point out a flaw in your over-simplified (character-constrained) conclusions. Making sure comments are enabled on your posts and replies allowed on your tweets is a good way to be certain you are not lost in an echo chamber of one.

Similarly, folks can identify other use cases that you failed to mention:

Considering I have seen this happen (QA signs off on a thing, even though thing is never announced due to an unnecessary live region interrupting it), it should have occurred to me to include it in the post.

Update: 24 August 2020

This is a great point about JAWS, and one that likely applies to the other screen readers: text viewers are not QA tools, they are there to support end users.

One Comment


There is a solution to these kinds of problems. Certainly all web pages should go through QA and improvements be made so far as possible from their input. But unless you have an expert accessibility consultant on the team (which is not usual), the next step is to send the page to an external accessibility audit consultant for them to test.

They will use screen readers on it and listen to what the reader actually says; they won’t look at logs, which as pointed out are not helpful. (You also need to invest in having them test on both desktop screen readers and screen readers on mobile devices, as they produce different results.) An experienced audit consultant will also find any other accessibility defects on the page, to help users with other disabilities besides being blind. They have the expertise to do the whole job for you – they do this kind of work full time, not just as a small bit-part among all the other kinds of testing to be done.

Guy Hickling, Accessibility Consultant
