Dialog Focus in Screen Readers

Creating an accessible dialog on the web is trickier than it should be. Lack of support for the <dialog> element, the need for fundraisers to get inert into WebKit, inconsistent support for the ARIA dialog role, and other annoyances make dialogs problematic. Scott O’Hara has spent a few years covering the mess.

Thankfully, we have a pattern (or variations on a pattern or two) that generally performs well across devices. For starters, you will need the inert polyfill, which essentially walks the DOM and makes everything unclickable and unfocusable. Then you will want to grab Scott O’Hara’s Accessible Modal Dialog pattern and wrangle it into your own project.
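As a rough sketch of what "making everything unfocusable" amounts to, the snippet below marks every sibling of the dialog as inert and hands back a function to undo it. It assumes the dialog is a direct child of the container being walked and that the browser (or the polyfill) honors the inert property; the InertableNode type and the setSiblingsInert name are hypothetical and not taken from the polyfill or from Scott O’Hara’s pattern.

```typescript
// Minimal structural type so the sketch runs without a browser.
// Assumption: in a page, elements expose a boolean `inert` property
// (natively or through the polyfill).
interface InertableNode {
  inert: boolean;
}

// Marks every sibling of the dialog inert and returns a function that
// restores only the siblings this call changed.
function setSiblingsInert(
  siblings: InertableNode[],
  dialog: InertableNode
): () => void {
  const changed: InertableNode[] = [];
  for (const node of siblings) {
    // Skip the dialog itself and anything that was already inert.
    if (node !== dialog && !node.inert) {
      node.inert = true;
      changed.push(node);
    }
  }
  return () => {
    for (const node of changed) {
      node.inert = false;
    }
  };
}
```

In a page, the siblings would typically be the children of the body element. Tracking which nodes were changed means content that was already inert before the dialog opened stays inert after it closes.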

I generally took this approach when I made my Periodic Table of the Elements last year, spackling together a modal (with very basic vanilla script). I grabbed that modal recently to test a question that has come up a few times, both on client work and some code review — where do you put focus?

Managing focus for a modal is conceptually straightforward. Whatever launched the modal receives focus again when the modal closes. Easy-peasy. The trickier question is where does focus go when you open a dialog? The dialog wrapper? The heading? The close button? The first interactive control?
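The "focus goes back to the launcher" half can be sketched in a few lines. This is a minimal illustration, not the script from the demo; the Focusable type and the openWithFocus name are hypothetical, and which element to pass as the target is exactly the open question above.

```typescript
// Minimal structural type so the sketch runs without a browser;
// any element with a focus() method satisfies it.
interface Focusable {
  focus(): void;
}

// Moves focus to the chosen target inside the dialog and returns a
// close handler that hands focus back to the launching control.
function openWithFocus(launcher: Focusable, target: Focusable): () => void {
  target.focus();
  return () => {
    launcher.focus();
  };
}
```

In a page, the launcher is usually the button that was activated, and the returned function would be called from whatever closes the dialog (close button, Esc key, overlay click).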

The answer depends on a lot of factors. Context, user skill level, experience, and more all come into play. You probably don’t want to put focus on a control if it has a destructive impact; putting focus on a cancel or close button would be safer.
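The one firm rule in that paragraph (do not land focus on a destructive control) can be written as a small guard. This is only a sketch of that single rule; the type names, field names, and the fallback order of cancel before close are assumptions made for illustration, and everything else about the choice is left to the preferred value.

```typescript
// Hypothetical names for the candidate focus targets discussed above.
type FocusTarget = "wrapper" | "heading" | "close" | "cancel" | "first-control";

interface DialogInfo {
  preferred: FocusTarget;             // where the team would like focus to go
  firstControlIsDestructive: boolean; // e.g. a "Delete" button comes first
  hasCancelButton: boolean;
}

// Returns the preferred target unless that would put focus on a
// destructive control, in which case it falls back to cancel or close.
function pickFocusTarget(info: DialogInfo): FocusTarget {
  if (info.preferred === "first-control" && info.firstControlIsDestructive) {
    return info.hasCancelButton ? "cancel" : "close";
  }
  return info.preferred;
}
```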

But the scenario with which so many seem unfamiliar is screen reader users.

This is because too few teams have the necessary testing suite and experience using screen readers to know how to test, or what to expect from the announcement. I can tell you how they generally announce today (with default settings), and you can use that to inform your larger decision on which approach works best for all your users.

Sample Dialog

See the Pen Assorted Dialog Focus Targets by Adrian Roselli (@aardrian) on CodePen.

The dialog’s accessible name is Frank. Things with focus get outlined in a dashed green line. There is a tabindex="-1" on the <h2> that is only there for the purposes of accepting focus for this demo. The content area has tabindex="0" because the content can scroll and this allows a keyboard-only user to scroll it.
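For readers less familiar with the two tabindex values, here is a small sketch of the difference. The demo sets these attributes in its markup; applying them from script, along with the AttrTarget type and the function name, is an assumption made only to keep the example self-contained.

```typescript
// Minimal structural type so the sketch runs without a browser;
// any DOM element satisfies it.
interface AttrTarget {
  setAttribute(name: string, value: string): void;
}

function prepareDemoFocusTargets(heading: AttrTarget, content: AttrTarget): void {
  // -1: the heading can receive focus from script, but it is skipped
  // when the user moves through the page with the Tab key.
  heading.setAttribute("tabindex", "-1");
  // 0: the scrollable content area joins the normal Tab order, so a
  // keyboard-only user can reach it and scroll it.
  content.setAttribute("tabindex", "0");
}
```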

My test suite on Windows 10 is JAWS 2020 with Chrome 85, Firefox 81, and Internet Explorer 11; NVDA 2020.2 with Chrome 85 and Firefox 81. On macOS 10.15.6 Catalina with VoiceOver I used Safari and Chrome 85. On Android 11 I paired TalkBack with Chrome 85 and Firefox 81. And on iOS 14 I used Safari, because that’s all Apple allows.

I tested by activating a button and recording what was announced. Spoiler alert — iOS 14 is still a spoiler.

Output

Wrap-up

I cannot tell you where focus must go when opening a dialog. I can only say that you should test with your users. Absent that, good UX practices should win out. At least with this information, for the three weeks it is current, you will also have an idea what your screen reader users might hear based on which approach you take.

9 Comments


This is an awesome resource, thanks Adrian!

Alex Tait

Agreed with Alex. I was just about to do my own testing on this for reference purposes. Now I don’t have to! Many thanks!

In response to James Catt.

Happily. Bear in mind it will be out of date with any new releases of browsers or screen readers, so it has maybe a six week accuracy window before more testing is needed. But at least you have a sample to use.


The focus management depends, to some extent, on the dialog. I typically consider three primary types:

  • A notification dialog (one message, one button to acknowledge, at most two buttons, to acknowledge and cancel/close). E.g. a confirmation message.
  • Two-choice dialog: a dialog with a message and two distinct actions. The most significant would be a session timeout dialog (extend session and log out) or a delete confirmation dialog.
  • A larger dialog (containing a form or whatnot)

For the first dialog, the ideal experience would be to use aria-describedby on the dialog container referencing the dialog message and set the focus on the acknowledgement button. User hears message, user gets message, user closes dialog, and lives happily ever after (or something).
This assumes aria-describedby works as intended (i.e. that the dialog title is read, the message is read, and the focused button is announced), which is a whole nother set of complications. Authors: don’t put the buttons inside the element referenced with aria-describedby. A.T. vendors and browsers: support for aria-describedby on dialogs isn’t quite what it should be, or wasn’t when I last tested.

The two-choice dialog, same principle, except it’s best to set focus to the least constructive control (most users want to extend their session if a timeout dialog pops up).

For a large dialog I would either focus the close button or the heading and not use aria-describedby. There is no use in having screen readers babble on and on uncontrollably; you quickly stop paying attention.

Birkir
In response to Birkir.

Birkir, thanks for your thoughts on what kinds of dialogs to use in which circumstances. I intentionally excluded that kind of detail because it is outside the scope of this post. This post is intended to help inform the kinds of decisions a developer may make when they are not familiar with the SR experience.

However, since you did I want to give caution on two of the statements for readers:

  1. Content exposed via aria-describedby does not convey any structure. A screen reader user will not hear headings, lists, buttons, etc. announced and may still opt to move into the dialog to verify structure. Testing with users has proven this.
  2. For the second dialog pattern, I think you meant least destructive control gets focus, as that is the general advice outside of accessibility circles (pre-dating the considerations we are discussing here).

Regardless, developers should always test with their audience and may find the best fit is some variation on what you suggest.

In response to Adrian Roselli.

First of all, thank you Adrian for a great writeup! However, I still wish to highlight two of the cases mentioned by Birkir as they can somewhat affect the screen reader experience.

The first is the case of small confirmation dialogs that contain a brief statement such as “Are you sure that you want to delete file.txt?” and buttons to either cancel or confirm. As you said, aria-describedby does not convey any structure, and it is likely that some users will still wish to read the dialog contents manually. However, I find that developers ignore or do not know about the benefits of aria-describedby even when it could prove very helpful. For example, in this case there is no need to convey any structure, and the additional message is just something to help you with the decision. Therefore I think that aria-describedby paired with focus on the cancel button would be enough for most screen reader users to make the desired decision without needing to explore the dialog contents further, as opposed to only hearing “Confirm delete dialog, selected, Cancel button”.

The second case is related to input fields in dialogs. I found that setting focus to an input field has a somewhat different effect than setting focus to a button, heading, or the dialog itself. At least with NVDA, setting focus to an input field seems to cause NVDA to make its announcements only once, even though in other cases it often reads the same information twice, especially in Firefox.

Sampo

Here’s me banging my drum. Perhaps approaching the dialog as an ableist idiom can encourage us to stop only converting its visually biased design patterns into a secondary audible experience? Instead, design an inclusive inline ‘hide, include, and show’ pattern that works and convert that to our visual users’ dialog paradigm; styled with CSS?

For example, we seem to have inline accordions and tab patterns’ focus mostly working inclusively? Visually style the resulting panel absolutely on the top layer and inject the non-semantic translucent background division tag you labelled Overlay as needed?

Your demonstration loads the dialog content outside the main element, which is perhaps the root cause for the focus send and retrieve complication? The overlay strategy is a visual one to prevent visual users accidentally interacting with the content behind it. That’s actually not a problem for screen reader users or keyboard navigators when we capture their moving away from the dialog content to a new focus? Or, do we focus-trap them and force the close action? One is a solution.

I am of course paying little attention to WAI-ARIA Authoring Practices 1.1, August 2019, Section 6.4 Deciding When to Make Selection Automatically Follow Focus because I don’t fully understand it. I’m a designer and not a developer so you may need to beat me over my head with a metaphorical conductor’s baton of grounding facts? I liked Sections 6.1 and 6.5.

Among other references, my thinking seems to follow WAI ARIA Practices Alert example, where content is dynamically loaded into the inline div container on clicking the button and Alert Dialog Example. Both load content adjacent to, and inline with the trigger button.

What I am certain of is that accessible is not inherently inclusive. We make people using screen readers work hard enough already. Our industry is managed by visual bias justifying a beautiful visible user experience for the majority market; passing off hi-fi wireframes with no thought to the semantics of (DOM or A11y Tree) content. Although engineering is being driven by legislation toward accessibility, perhaps leading inclusive content design at source can remove the habitual barriers our graphic-first legacy encourages?

Or maybe my drumming is out of time with the orchestra? Only, as a newbie I like the tune. Best Wishes, and thank you for a thought provoking post and your time testing where I cannot.

In response to Pat Godfrey.

Pat, I would be willing to see a prototype or design of your ideas. To comment on your notes in the hopes it can guide you…

“Instead, design an inclusive inline ‘hide, include, and show’ pattern that works and convert that to our visual users’ dialog paradigm; styled with CSS?”

We know a simple disclosure is not a fit, especially after seeing the problems screen reader users had with GitHub’s attempt at using <details> / <summary> as a dialog.

“For example, we seem to have inline accordions and tab patterns’ focus mostly working inclusively?”

Testing I have performed with dozens of users suggests that we do not. Some users prefer how focus jumps into a panel while others do not. The same is true for moving focus between tabs themselves. As a result, I often urge clients to avoid these patterns if they cannot test with their own users.

“Your demonstration loads the dialog content outside the main element, which is perhaps the root cause for the focus send and retrieve complication?”

It loads at the end of the DOM so it is less likely to be encountered by accident (through scripting errors or otherwise). Loading it at the end also makes it easier to disable the rest of the page via inert.

“The overlay strategy is a visual one to prevent visual users accidentally interacting with the content behind it.”

The overlay is there as a visual cue, yes, but also takes a click event so the dialog can be dismissed when a user clicks outside it. Otherwise it is not exposed to screen reader users.

“That’s actually not a problem for screen reader users or keyboard navigators when we capture their moving away from the dialog content to a new focus? Or, do we focus-trap them and force the close action? One is a solution.”

Screen reader users can navigate by more than Tab, so we need to make the underlying page as a whole inert to prevent them jumping to a heading or control or block of text in the wrong context (outside the dialog).

“Among other references, my thinking seems to follow WAI ARIA Practices Alert example, where content is dynamically loaded into the inline div container on clicking the button and Alert Dialog Example. Both load content adjacent to, and inline with the trigger button.”

The ARIA Authoring Practices example of inline alerts is one alternative to dialogs, and is common for displaying groups of error messages above a form.

The alert dialog example is not adjacent to the Discard button in the DOM; it is adjacent to the button’s parent. Regardless, since the page is made inactive the button would lose focus (and the user their place in the page) unless the developer sets focus on or within the dialog that appears. This is expected.

I want to caution that the ARIA Authoring Practices have issues related to little testing, poor mobile support, and lacking touch support, and seek to re-create the Windows 95 interface on the web (among other issues). I say this to caution you that while they are a handy reference for idealized patterns, few actually work with users.


With thanks, Adrian. That set me back some confidence, to be honest. It is frustrating that all the passion and verve in the World and seeking reference through all the complexities and poor information architecture of WAI guidelines that legislation is pushing, I just cannot guarantee to make one simple interaction work inclusively for everyone. At the least not this one and I did make a fair win with an inclusive experience of cartoon strips and infographics! (By criticising and adjusting the guidelines, by coincidence).

I took a breath and blogged my learning based on your kind and patient feedback. It’s not pretty. It does give me a launch pad from which to work on moving forward. It features a 2012 Google tutorial video that demonstrates your points perfectly when focus into the topic’s dialog fails to act as the presenter expected. It’s only a grave shame we have not evolved further in 8 years. My blog post thinking on Inclusive Dialog Design.
