Dialog Focus in Screen Readers

Creating an accessible dialog on the web is trickier than it should be. Lack of support for the <dialog> element, the need for fundraisers to get inert into WebKit, inconsistent support for the ARIA dialog role, and other annoyances make dialogs problematic. Scott O’Hara has spent a few years covering the mess.

Thankfully, we have a pattern (or variations on a pattern or two) that generally performs well across devices. For starters, you will need the inert polyfill, which essentially walks the DOM and makes everything unclickable and unfocusable. Then you will want to grab Scott O’Hara’s Accessible Modal Dialog pattern and wrangle it into your own project.
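As a rough illustration of the end result (not the polyfill's actual implementation), the sketch below marks everything at the top level of the page inert except the dialog. `InertTarget` and `setPageInert` are hypothetical names standing in for DOM elements and your own script, and the sketch assumes the dialog is a direct child of the `<body>`.

```typescript
// Minimal stand-in for a DOM element; only the `inert` flag matters here.
interface InertTarget {
  inert: boolean;
}

// Mark every top-level node except the dialog as inert (or restore it),
// leaving the dialog itself interactive.
function setPageInert(
  topLevel: Iterable<InertTarget>,
  dialog: InertTarget,
  isInert: boolean
): void {
  for (const node of topLevel) {
    if (node !== dialog) {
      node.inert = isInert;
    }
  }
}
```

In a browser you would likely call this with the children of the `<body>` when the dialog opens, and again with `false` when it closes.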

I generally took this approach when I made my Periodic Table of the Elements last year, spackling together a modal (with very basic vanilla script). I grabbed that modal recently to test a question that has come up a few times, both on client work and some code review — where do you put focus?

Managing focus for a modal is conceptually straightforward. Whatever launched the modal receives focus again when the modal closes. Easy-peasy. The trickier question is where does focus go when you open a dialog? The dialog wrapper? The heading? The close button? The first interactive control?
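The “return focus to whatever launched it” half can be sketched in a few lines. This is a minimal sketch, not the script used in the demo; `Focusable` and `DialogFocusManager` are hypothetical names, and the initial focus target is left to the caller because that is the open question.

```typescript
// Anything that can receive focus; DOM elements have a focus() method.
interface Focusable {
  focus(): void;
}

class DialogFocusManager {
  private opener: Focusable | null = null;

  // Remember what launched the dialog, then move focus into the dialog.
  open(opener: Focusable, initialTarget: Focusable): void {
    this.opener = opener;
    initialTarget.focus();
  }

  // Send focus back to whatever launched the dialog, if anything did.
  close(): void {
    this.opener?.focus();
    this.opener = null;
  }
}
```

In a page, the opener would typically be the button that was activated, captured before focus moves.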

The answer depends on a lot of factors. Context, user skill level, experience, and more all come into play. You probably don’t want to put focus on a control if it has a destructive impact; putting focus on a cancel or close button would be safer.
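One way to encode that last caution is a small guard around whatever target you prefer. This is only a sketch under stated assumptions: the `FocusChoice` names and the `safeInitialFocus` function are hypothetical, and falling back to the close button is one reasonable reading of “safer”, not a rule.

```typescript
// The candidate targets discussed above.
type FocusChoice = "dialog" | "heading" | "close-button" | "first-control";

// Keep the preferred target unless it would land on a destructive control,
// in which case fall back to the close button.
function safeInitialFocus(
  preferred: FocusChoice,
  firstControlIsDestructive: boolean
): FocusChoice {
  if (preferred === "first-control" && firstControlIsDestructive) {
    return "close-button";
  }
  return preferred;
}
```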

But the scenario with which so many seem unfamiliar is screen reader users.

This is because too few teams have the necessary testing suite and experience using screen readers to know how to test, or what to expect from the announcement. I can tell you how they generally announce today (with default settings), and you can use that to inform your larger decision on which approach works best for all your users.

Sample Dialog

See the Pen Assorted Dialog Focus Targets by Adrian Roselli (@aardrian) on CodePen.

The dialog’s accessible name is Frank. Things with focus get outlined in a dashed green line. There is a tabindex="-1" on the <h2> that is only there for the purposes of accepting focus for this demo. The content area has tabindex="0" because the content can scroll and this allows a keyboard-only user to scroll it.
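If you set those values from script rather than in the markup, it might look like the sketch below. The `TabStop` and `ScrollRegion` interfaces and the `prepareTabStops` function are hypothetical, and the overflow check is an added assumption; the demo simply has the attributes in place.

```typescript
// Stand-ins for the DOM properties used here.
interface TabStop {
  tabIndex: number;
}

interface ScrollRegion extends TabStop {
  scrollHeight: number;
  clientHeight: number;
}

function prepareTabStops(heading: TabStop, content: ScrollRegion): void {
  // -1: can take focus from script, but is skipped when pressing Tab.
  heading.tabIndex = -1;

  // 0: joins the Tab order, so a keyboard-only user can scroll the content.
  // Only applied when the content actually overflows its box.
  if (content.scrollHeight > content.clientHeight) {
    content.tabIndex = 0;
  }
}
```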

My test suite on Windows 10 is JAWS 2020 with Chrome 85, Firefox 81, and Internet Explorer 11; NVDA 2020.2 with Chrome 85 and Firefox 81. On macOS 10.15.6 Catalina with VoiceOver I used Safari and Chrome 85. On Android 11 I paired TalkBack with Chrome 85 and Firefox 81. And on iOS 14 I used Safari, because that’s all Apple allows.

I tested by activating a button and recording what was announced. Spoiler alert — iOS 14 is still a spoiler.

Output

Wrap-up

I cannot tell you where focus must go when opening a dialog (nor can HTML apparently). I can only say that you should test with your users. Absent that, good UX practices should win out. At least with this information, for the three weeks it is current, you will also have an idea what your screen reader users might hear based on which approach you take.

(Spec) Update: 26 January 2023

This morning WHATWG merged a PR that addresses the default focus for native dialogs, closing out the issue that prompted this post originally.

Scott O’Hara also posted Use the dialog element (reasonably), where he covers this and reminds us to probably stop dialog overuse.

“Now, let’s not get reckless with dialogs. This isn’t really advice that’s unique to the dialog element itself. People have a long history of taking elements/components that were intended for specific use cases, and then stressing them to their limit to fit their use case. Sometimes this makes sense, but other times (e.g., when your modal dialog consists of various headings, landmarks, and long-form content) it’s maybe best to sit back, take a deep breath, and think to yourself:”

Go read the rest of that thought process.

Also, it is probably safe to use <dialog> for the average audience:

“Instead of waiting for perfect, I personally think it’s time to move away from using custom dialogs, and to use the dialog element instead.”

Update: 19 July 2023

When I wrote this post nearly three years ago, using HTML <dialog> was not a practical option. As such, this post does not discuss it.

Manuel Matuzović took the time to test where focus lands in browsers on macOS in his post O dialog focus, where art thou?.

Me being me, I put it through its paces on Windows. I did not re-test on macOS.

Quick nuggets:

  1. If you test this, add :focus to the existing :focus-visible styles so testing with your mouse yields more visible results.
  2. When it says focus moves to the <body>, note it is only partially true: focus is lost and .activeElement reports it as on the <body>, but with the style change I suggest you should see that focus truly does not go there.
  3. Using Chrome and Edge on Windows I got different results for examples 6 and 10. Focus went to the first focusable element. I have not fired up the Mac to debug more because it is way over there.
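To check the second point while testing, a small helper can report what `.activeElement` says after the dialog opens. This is a sketch only; `DocumentLike`, `ElementLike`, and `describeFocus` are hypothetical names that mirror the `activeElement`, `body`, and `tagName` properties of the DOM.

```typescript
// Minimal shapes mirroring document.activeElement and document.body.
interface ElementLike {
  tagName: string;
}

interface DocumentLike {
  activeElement: ElementLike | null;
  body: ElementLike | null;
}

// When nothing holds focus, activeElement is reported as <body> (or null),
// so treat both as "focus lost" rather than "focus on the body".
function describeFocus(doc: DocumentLike): string {
  const active = doc.activeElement;
  if (active === null || active === doc.body) {
    return "focus lost (reported on <body>)";
  }
  return `focus on <${active.tagName.toLowerCase()}>`;
}
```

In a browser console you might pass `document` itself right after opening each demo dialog.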

Manuel’s site does not support comments, but I pinged him on Masto in case there was something obviously wrong in my testing.

Whether you are using the <dialog> element or roleing-up a <div>, this post and Manuel’s can give you some insight into the focus bits. Be careful with mixing both approaches.

20 July 2023: Manuel updated his post:

Changed body to first focusable element for Chrome in demo 6 and demo 10. I had experimental web platform features enabled, which changed the current default behaviour.

He also updated the demos with the CSS fixes I suggested.

16 Comments


This is an awesome resource, thanks Adrian!

Alex Tait

Agreed with Alex. I was just about to do my own testing on this for reference purposes. Now I don’t have to! Many thanks!

In response to James Catt.

Happily. Bear in mind it will be out of date with any new releases of browsers or screen readers, so it has maybe a six week accuracy window before more testing is needed. But at least you have a sample to use.


The focus management depends, to some extent, on the dialog. I typically consider 3 primary types:

  • A notification dialog (one message, one button to acknowledge, at most two buttons, to acknowledge and cancel/close). E.g. a confirmation message.
  • A two-choice dialog (a message and two distinct actions). The most significant would be a session timeout dialog (extend session and log out) or a delete confirmation dialog.
  • A larger dialog (containing a form or whatnot).

For the first dialog, the ideal experience would be to use aria-describedby on the dialog container referencing the dialog message and set the focus on the acknowledgement button: user hears message, user gets message, user closes dialog, and lives happily ever after (or something).
This assumes aria-describedby works as intended (i.e. that the dialog title is read, the message is read, and the focused button is announced), which is a whole nother set of complications (authors, don’t put the buttons inside the element referenced with aria-describedby; a.t. vendors/browsers, support for aria-describedby on dialogs isn’t quite what it should be, or wasn’t, when I last tested).

The two-choice dialog, same principle, except it’s best to set focus to the least constructive control (most users want to extend their session if a timeout dialog pops up).

For a large dialog I would either focus the close button or the heading and not use aria-describedby; there is no use in having screen readers babble on and on uncontrollably, you quickly stop paying attention.

Birkir
In response to Birkir.

Birkir, thanks for your thoughts on what kinds of dialogs to use in which circumstances. I intentionally excluded that kind of detail because it is outside the scope of this post. This post is intended to help inform the kinds of decisions a developer may make when they are not familiar with the SR experience.

However, since you did I want to give caution on two of the statements for readers:

  1. Content exposed via aria-describedby does not convey any structure. A screen reader user will not hear headings, lists, buttons, etc. announced and may still opt to move into the dialog to verify structure. Testing with users has proven this.
  2. For the second dialog pattern, I think you meant least destructive control gets focus, as that is the general advice outside of accessibility circles (pre-dating the considerations we are discussing here).

Regardless, developers should always test with their audience and may find the best fit is some variation on what you suggest.

In response to Adrian Roselli.

First of all, thank you Adrian for a great writeup! However, I still wish to highlight two of the cases mentioned by Birkir as they can somewhat affect the screen reader experience.

The first is the case of small confirmation dialogs that contain a brief statement such as “Are you sure that you want to delete the file.txt?” and buttons to either cancel or confirm. As you said the aria-describedby does not convey any structure and it is likely that some users still wish to read dialog contents manually. However, I find that developers ignore or do not know about benefits of the aria-describedby even when it could prove to be very helpful. For example in this case there is no need to convey any structure and the additional message is just something to help you with the decision. Therefore I think that aria-describedby paired with the focus to the cancel button would be enough for most of the screen reader users to make the desired decision without a need to further explore the dialog contents as opposed to only hearing “Confirm delete dialog, selected, Cancel button”.

The second case is related to input fields in dialogs. I found out that setting focus to an input field has a somewhat different effect than setting focus to a button, heading, or the dialog itself. At least with NVDA, setting focus to an input field seems to cause NVDA to do announcements only once, even though in other cases it often likes to read the same information twice, especially in Firefox.

Sampo

Here’s me banging my drum. Perhaps approaching the dialog as an ableist idiom can encourage us to stop only converting its visually biased design patterns into a secondary audible experience? Instead, design an inclusive inline ‘hide, include, and show’ pattern that works and convert that to our visual users’ dialog paradigm; styled with CSS?

For example, we seem to have inline accordions and tab patterns’ focus mostly working inclusively? Visually style the resulting panel absolutely on the top layer and inject the non-semantic translucent background division tag you labelled Overlay as needed?

Your demonstration loads the dialog content outside the main element, which is perhaps the root cause for the focus send and retrieve complication? The overlay strategy is a visual one to prevent visual users accidentally interacting with the content behind it. That’s actually not a problem for screen reader users or keyboard navigators when we capture their moving away from the dialog content to a new focus? Or, do we focus-trap them and force the close action? One is a solution.

I am of course paying little attention to WAI-ARIA Authoring Practices 1.1, August 2019, Section 6.4 Deciding When to Make Selection Automatically Follow Focus because I don’t fully understand it. I’m a designer and not a developer so you may need to beat me over my head with a metaphorical conductor’s baton of grounding facts? I liked Sections 6.1 and 6.5.

Among other references, my thinking seems to follow WAI ARIA Practices Alert example, where content is dynamically loaded into the inline div container on clicking the button and Alert Dialog Example. Both load content adjacent to, and inline with the trigger button.

What I am certain of is that accessible is not inherently inclusive. We make people using screen readers work hard enough already. Our industry is managed by visual bias justifying a beautiful visible user experience for the majority market; passing off hi-fi wireframes with no thought to the semantics of (DOM or A11y Tree) content. Although engineering is being driven by legislation toward accessibility, perhaps leading inclusive content design at source can remove the habitual barriers our graphic-first legacy encourages?

Or maybe my drumming is out of time with the orchestra? Only, as a newbie I like the tune. Best Wishes, and thank you for a thought provoking post and your time testing where I cannot.

In response to Pat Godfrey.

Pat, I would be willing to see a prototype or design of your ideas. To comment on your notes in the hopes it can guide you…

“Instead, design an inclusive inline ‘hide, include, and show’ pattern that works and convert that to our visual users’ dialog paradigm; styled with CSS?”

We know a simple disclosure is not a fit, especially after seeing the problems screen reader users had with GitHub’s attempt at using <details> / <summary> as a dialog.

“For example, we seem to have inline accordions and tab patterns’ focus mostly working inclusively?”

Testing I have performed with dozens of users suggests that we do not. Some users prefer how focus jumps into a panel while others do not. The same is true for moving focus between tabs themselves. As a result, I often urge clients to avoid these patterns if they cannot test with their own users.

“Your demonstration loads the dialog content outside the main element, which is perhaps the root cause for the focus send and retrieve complication?”

It loads at the end of the DOM so it is less likely to be encountered by accident (through scripting errors or otherwise). Loading it at the end also makes it easier to disable the rest of the page via inert.

“The overlay strategy is a visual one to prevent visual users accidentally interacting with the content behind it.”

The overlay is there as a visual cue, yes, but also takes a click event so the dialog can be dismissed when a user clicks outside it. Otherwise it is not exposed to screen reader users.
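A click-to-dismiss overlay can be sketched as below. This is not the demo's script; `ClickLike` and `makeOverlayClickHandler` are hypothetical names, and the target check only matters under the assumption that the dialog is nested inside the overlay element.

```typescript
// Minimal stand-in for a click event; DOM events expose `target`.
interface ClickLike {
  target: unknown;
}

// Build a handler that closes the dialog only when the overlay itself
// was clicked, not something inside the dialog.
function makeOverlayClickHandler(
  overlay: object,
  closeDialog: () => void
): (event: ClickLike) => void {
  return (event: ClickLike): void => {
    if (event.target === overlay) {
      closeDialog();
    }
  };
}
```

The returned function would be registered as the overlay's click listener.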

“That’s actually not a problem for screen reader users or keyboard navigators when we capture their moving away from the dialog content to a new focus? Or, do we focus-trap them and force the close action? One is a solution.”

Screen reader users can navigate by more than Tab, so we need to make the underlying page as a whole inert to prevent them jumping to a heading or control or block of text in the wrong context (outside the dialog).

“Among other references, my thinking seems to follow WAI ARIA Practices Alert example, where content is dynamically loaded into the inline div container on clicking the button and Alert Dialog Example. Both load content adjacent to, and inline with the trigger button.”

The ARIA Authoring Practices example of inline alerts is one alternative to dialogs, and is common for displaying groups of error messages above a form.

The alert dialog example is not adjacent to the Discard button in the DOM; it is adjacent to the button’s parent. Regardless, since the page is made inactive the button would lose focus (and the user their place in the page) unless the developer sets focus on or within the dialog that appears. This is expected.

I want to caution that the ARIA Authoring Practices have issues related to little testing, poor mobile support, and lacking touch support, and they seek to re-create the Windows 95 interface on the web (among other issues). I say this to caution you that, while they are a handy reference for idealized patterns, few actually work with users.


With thanks, Adrian. That set me back some confidence, to be honest. It is frustrating that all the passion and verve in the World and seeking reference through all the complexities and poor information architecture of WAI guidelines that legislation is pushing, I just cannot guarantee to make one simple interaction work inclusively for everyone. At the least not this one and I did make a fair win with an inclusive experience of cartoon strips and infographics! (By criticising and adjusting the guidelines, by coincidence).

I took a breath and blogged my learning based on your kind and patient feedback. It’s not pretty. It does give me a launch pad from which to work on moving forward. It features a 2012 Google tutorial video that demonstrates your points perfectly when focus into the topic’s dialog fails to act as the presenter expected. It’s only a grave shame we have not evolved further in 8 years. My blog post thinking on Inclusive Dialog Design.


Hi Adrian, I was excited when I learned that native <dialog> is quite supported now (if you can dismiss IE users it seems to be supported OK).
I am sure that you checked it out when its support landed in Safari as well.

Any thoughts? Still best to use ARIA powered dialogs or can we now “don’t use ARIA” because HTML has the element?

A11y student
In response to A11y student.

Scott updated his post Having an open dialog nine months ago:

“It’s now March of 2022, and Webkit 15.4 has shipped the <dialog> element, as well as Firefox 98. All major browsers now support the <dialog> element, and that’s really exciting.”

[…]

“For the time being, I would still advocate people use robust custom dialogs, such as a11y-dialog, or at the very least ensure their <dialog> elements can fallback to custom dialogs in the event people are not using the most up-to-date browsers. That is until usage stats for browsers that support <dialog> outweigh those that don’t. For instance, it’s awesome that Safari now supports the <dialog> element, but since Safari releases are so tightly coupled with OS updates, not everyone is going to get this update nearly as quickly as those using Firefox, Chrome and Edge.”

I know nothing of your user base nor their technology profile. I am also not tracking regressions (looking at Safari here).
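If you do go the fallback route Scott describes, the decision usually starts with feature detection. The sketch below is one hedged way to do that; `DialogCapableWindow` and `supportsNativeDialog` are hypothetical names, and the check only confirms that `showModal` exists, not that the implementation is free of bugs.

```typescript
// Minimal shape of the global object as far as <dialog> support goes.
interface DialogCapableWindow {
  HTMLDialogElement?: { prototype?: { showModal?: unknown } };
}

// True when the environment exposes HTMLDialogElement with showModal().
function supportsNativeDialog(win: DialogCapableWindow): boolean {
  return typeof win.HTMLDialogElement?.prototype?.showModal === "function";
}
```

A page could then choose between calling `showModal()` on a native `<dialog>` and initializing a custom dialog script.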


Hey Adrian,

Great work as always. I see the focus of discussion is often around where focus goes when the modal opens, but I’m not sure if there’s been much discussion around what the expected behaviour is when keeping focus within modal dialogs.

Classic modal dialogs always kept focus within the modal dialog. If you tab or shift-tab, focus will loop through the focusable objects within the modal dialog.

However, the <dialog> element will allow the user to move focus through the web browser controls, and then return to the modal dialog. I’m concerned that users won’t be expecting this, and it also adds several extra TAB key presses if a user just wanted to loop back to the start of the modal dialog.

While in theory it’s nice to always have access to the web browser controls, might it make modals (especially small ones), more complex than they need to be? Just an idea/consideration.

Cheers, Matt.

Matthew Putland
In response to Matthew Putland.

“I see the focus of discussion is often around where focus goes when the modal opens, but I’m not sure if there’s been much discussion around what the expected behaviour is when keeping focus within modal dialogs.”

Indeed, this post is scoped to only where to place focus on open. The second paragraph links to the inert polyfill, which is meant to restrict focus within a custom dialog, just like a native dialog would.

“However, the <dialog> element will allow the user to move focus through the web browser controls, and then return to the modal dialog. I’m concerned that users won’t be expecting this, and it also adds several extra TAB key presses if a user just wanted to loop back to the start of the modal dialog.”

Then that implementation of <dialog> is buggy. For that browser you would need to use the custom version (see the second paragraph of this post).

“While in theory it’s nice to always have access to the web browser controls, might it make modals (especially small ones), more complex than they need to be?”

Dialogs, native or custom, should always allow access to the browser’s chrome. Otherwise bad actors could trap people on pages.


Hi Adrian, thanks for the article and the updates. I have someone exploring loading states in a modal. For instance, the modal gathers some information which is submitted/saved, but there may be wait time, or even failure, associated with that primary action.
I realize one of your responses may be ‘why is it a modal?’ — a question I often ask. But setting that aside, I’m wondering if you’ve seen any good articles/demos about considerations around conveying state and results within a modal context.

In response to Mike Gower.

Mike, since the modal is essentially a document in its own right, I would suggest that whatever method you use to convey a busy/loading state, errors in the form, or even success should match what that site does for analogous content outside the modal. I cannot immediately think of a reason to create a different pattern just for a modal, barring some obvious tweaks like space restrictions. That may be a lack of imagination on my part.

I have seen no articles nor demos discussing this, but I may also have ignored any that popped into my timelines given what I just said.

In response to Adrian Roselli.

Thanks! I’ll pass on anything I find that may be useful to the discussion.
