Comparing Manual and Free Automated WCAG Reviews
Automated accessibility testing tools cannot test 100% of WCAG. This position is not controversial. Other than overlay vendors, no automated tool maker makes that claim. This is partly because WCAG is targeted at humans, not code, and so nuance and context apply.
Free automated accessibility testing tools may have even fewer features available to users, which makes sense. My experience is that the majority of people who are testing against WCAG are starting or supplementing with free automated checkers. There is nothing wrong with that. We should want the easy bits handled to free us up for the more nuanced work.
My concern is that too many managers, bosses, stakeholders, and even testers, may do no more than run a free automated tool against a site or page and consider that sufficient. This post shows why it may not be sufficient. Bear in mind that manual testers are not free, so this is not a one-to-one comparison between humans and tools (nor should it be).
Again, this is not a criticism of the tool makers. This is a function of WCAG itself and its correct and appropriate focus on people over code.
The Process
I picked a reasonably popular site and, using only its home page, performed a review against WCAG 2.1 Levels A & AA using the following:
- Manual using bookmarklets, assorted contrast checkers, dev tools, and assistive technology.
- axe DevTools v4.47.0 browser extension (using axe-core v4.6.2) for Chrome and Firefox.
- ARC Toolkit v5.4.2 browser extension for Chrome.
- WAVE Evaluation Tool v3.2.2.0 browser extension for Chrome and Firefox.
- Equal Access Accessibility Checker (EAAC) v3.1.42.9999 browser extension for Chrome and Firefox.
I performed the testing on Saturday, 14 January 2023 against a live version of the site. I also archived it to the Wayback Machine when I started testing.
For all tests, I used screen reader pairings of JAWS with Chrome, NVDA with Firefox, and VoiceOver with Safari on desktop. I did not test on mobile, but I did test across viewport sizes. I tested both in and out of private/incognito mode, making sure to clear my cache between reloads. The page had a reCAPTCHA component which I arbitrarily declared out of scope and did not test.
Highlights
A quick, if a bit insincere, comparison is to count how many Success Criteria (SCs) each tool found a violation against:
Tool | Total | A | AA | Notes |
---|---|---|---|---|
Manual | 18 | 11 | 7 | |
axe | 2 | 2 | 0 | |
ARC | 3 | 3 | 0 | |
WAVE | 0 | 0 | 0 | It had an issue against reCAPTCHA, but I declared it out of scope. |
EAAC | 3 | 3 | 0 | |
AA | 5 | 5 | 0 | Read the 22 January update |
Some Success Criteria can be failed multiple times. Some failures can be logged against multiple Success Criteria. These are the counts for the total number of unique failures. These also exclude the same instance of an issue replicated against analogous nodes (so a duplicated `id` might happen a dozen times, but I count it once):
Tool | Total | A | AA | Notes |
---|---|---|---|---|
Manual | 37 | 24 | 11 | Severity provided. This includes 4 contrast failures. |
axe | 2 | 2 | 0 | Severity provided. |
ARC | 3 | 3 | 0 | |
WAVE | 0 | 0 | 0 | |
EAAC | 5 | 5 | 0 | Rule aria_hidden_focus_misuse is reported under both 1.3.1 and 4.1.2. |
AA | 4 | 4 | 0 | Read the 22 January update |
Some tools also provide warnings or alerts for the reviewer to test manually. The value of these is very much dependent on the site, the reviewer, and the context. These sometimes include WCAG Success Criteria for reference. The number of these may or may not be useful to you or your team, so do not read too much into non-zero numbers.
Tool | Alerts | Notes |
---|---|---|
Manual | 0 | |
axe | 0 | |
ARC | 7 | They reference the tool’s own rules. |
WAVE | 6 | They reference WCAG SCs. |
EAAC | 20 | They reference WCAG SCs. |
AA | 10 | Read the 22 January update |
Some interesting differences in the substance of the reporting:
- Both ARC and I failed an SVG for having no `img` role, though I did so under 4.1.2 Name, Role, Value and ARC did so under 1.1.1 Non-text Content.
- Both axe and I failed the still-functional link with `aria-hidden` under 4.1.2 Name, Role, Value. ARC failed it under 2.1.1 Keyboard. EAAC failed it under 1.3.1 Info and Relationships.
- Most called out links given a `listitem` role as not belonging to a list. Axe and I failed it under 1.3.1 Info and Relationships, ARC and EAAC under 4.1.2 Name, Role, Value.
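To make those differences concrete, here is a minimal hypothetical sketch of the three patterns in question. The element names and attribute values are mine, not the audited page’s actual markup.

```html
<!-- Hypothetical sketch of the disputed patterns, not the audited page's markup -->

<!-- 1. An inline SVG conveying content but with no role="img",
        flagged under 1.1.1 by ARC and under 4.1.2 in my manual review -->
<svg aria-label="Release timeline" width="120" height="40">
  <rect x="0" y="15" width="120" height="10" fill="currentColor" />
</svg>

<!-- 2. A link removed from the accessibility tree yet still focusable and functional,
        logged under 4.1.2 (axe, me), 2.1.1 (ARC), or 1.3.1 (EAAC) -->
<a href="/updates/" aria-hidden="true">See all updates</a>

<!-- 3. Links given role="listitem" with no owning list role,
        logged under 1.3.1 (axe, me) or 4.1.2 (ARC, EAAC) -->
<div class="cards">
  <a href="/post-one/" role="listitem">Post one</a>
  <a href="/post-two/" role="listitem">Post two</a>
</div>
```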
In my manual review I found almost seven-and-a-half times (7½×) as many issues as the tool with the next highest count of found issues, across three times (3×) as many Success Criteria.
It is possible I was overzealous, as many accessibility reviewers can understandably be. I had no QA review (which often helps me cut 3 or 4 issues); nobody looked over these to make sure I was not being too aggressive. However, I try to adhere to my own take on Postel’s Law in my WCAG reviews — be liberal in what you flag, conservative in what you fail. I also no longer fail issues using SC 4.1.1. My results are in the tables so you can judge for yourself.
Raw Results
These sections contain the output from my manual review and from the tools: not only the WCAG failures, but also the alerts and warnings the tools issue for further manual review.
WCAG Failures
The following two tables are WCAG 2.1 Success Criteria cross-referenced with the five testing approaches. The first table only covers SCs at Level A and the second table at Level AA.
Each cell identifies if the sample passes, does not apply, or fails against the SC. If the sample fails, a bulleted list describes the issue and, where appropriate, a path is provided to allow you to navigate to the affected node or nodes using your browser’s developer tools (I use XPath).
Each issue description in the first column is followed by a severity of low, medium, high, or critical. These four severities break down as follows:
- Low
- Users can accomplish the task with no or very minor workarounds.
- Medium
- Users can accomplish the task with workarounds.
- High
- Users can accomplish the task with difficulty and/or significant workarounds.
- Critical
- It is impossible for some users to accomplish the task.
For brevity I abbreviate “accessible name” to accName and “accessible description” to accDesc.
It is possible that in my manual review I missed other issues. I may have also over- or under-estimated the impact of an issue. With no insight into the audience, no independent QA team, and no interaction with the development team I have no sense of what may be a true barrier or problem for the site’s users. There may be cases where extensive testing with users shows something I flagged as a problem is preferred by its users.
Where cells are blank, the automated checker returned no failures. Tools do not indicate a pass nor that an SC was not applicable, nor should they be expected to.
WCAG 2.1 SCs at Level A | Manual | Axe | ARC | WAVE | EAAC |
---|---|---|---|---|---|
1.1.1 Non-text Content | Fail | | Fail | | |
1.2.1 Audio-only and Video-only (Prerecorded) | Pass | | | | |
1.2.2 Captions (Prerecorded) | N/A | | | | |
1.2.3 Audio Description or Media Alternative (Prerecorded) | N/A | | | | |
1.3.1 Info and Relationships | Fail | Fail | | | Fail |
1.3.2 Meaningful Sequence | Pass | | | | |
1.3.3 Sensory Characteristics | N/A | | | | |
1.4.1 Use of Color | Fail | | | | |
1.4.2 Audio Control | N/A | | | | |
2.1.1 Keyboard | Pass | | Fail | | |
2.1.2 No Keyboard Trap | Pass | | | | |
2.1.4 Character Key Shortcuts | N/A | | | | |
2.2.1 Timing Adjustable | N/A | | | | |
2.2.2 Pause, Stop, Hide | Fail | | | | |
2.3.1 Three Flashes or Below Threshold | Pass | | | | |
2.4.1 Bypass Blocks | Pass | | | | Fail |
2.4.2 Page Titled | Fail | | | | |
2.4.3 Focus Order | Fail | | | | |
2.4.4 Link Purpose (In Context) | Pass | | | | |
2.5.1 Pointer Gestures | N/A | | | | |
2.5.2 Pointer Cancellation | Pass | | | | |
2.5.3 Label in Name | Fail | | | | |
2.5.4 Motion Actuation | N/A | | | | |
3.1.1 Language of Page | Pass | | | | |
3.2.1 On Focus | Pass | | | | |
3.2.2 On Input | Fail | | | | |
3.3.1 Error Identification | Fail | | | | |
3.3.2 Labels or Instructions | Fail | | | | |
4.1.1 Parsing | N/A | | | | |
4.1.2 Name, Role, Value | Fail | Fail | Fail | | Fail |
WCAG 2.1 SCs at Level AA | Manual | Axe | ARC | WAVE | EAAC |
---|---|---|---|---|---|
1.2.4 Captions (Live) | N/A | | | | |
1.2.5 Audio Description (Prerecorded) | N/A | | | | |
1.3.4 Orientation | Pass | | | | |
1.3.5 Identify Input Purpose | Fail | | | | |
1.4.3 Contrast (Minimum) | Fail | | | | |
1.4.4 Resize text | Fail | | | | |
1.4.5 Images of Text | Pass | | | | |
1.4.10 Reflow | Pass | | | | |
1.4.11 Non-text Contrast | Pass | | | | |
1.4.12 Text Spacing | Pass | | | | |
1.4.13 Content on Hover or Focus | Fail | | | | |
2.4.5 Multiple Ways | Pass | | | | |
2.4.6 Headings and Labels | Pass | | | | |
2.4.7 Focus Visible | Fail | | | | |
3.1.2 Language of Parts | Fail | | | | |
3.2.3 Consistent Navigation | Pass | | | | |
3.2.4 Consistent Identification | Pass | | | | |
3.3.3 Error Suggestion | Fail | | | | |
3.3.4 Error Prevention (Legal, Financial, Data) | N/A | | | | |
4.1.3 Status Messages | N/A | | | | |
Warnings
Three of the four automated tools provide warnings. These are cues to the reviewer to dive further into the sample and identify if the thing is really a WCAG failure or a problem for users.
A manual review should result in a binary pass/fail, so it is uncommon to find warnings in one. You may find “best practices” instead, which I address later.
These lists represent the warnings from each tool. Each tool handles this differently, which is neither bad nor good, but I have done my best to normalize them for consistency here. I did not include specific paths from the tools when provided because I ran out of steam.
- Manual alerts
- None; manual reviews should generally get you a binary pass/fail
- axe DevTools warnings
- None; the “needs review” category is not in this release
- ARC Toolkit alerts
- aria-hidden used, 8 instances, rule ARIAHiddenUsed.
- Unable to determine text contrast against image background, 8 instances, rule textWithBackgroundImage.
- Heading level skipped, 2 instances, rule headingLevelSkipped.
- Multiple naming techniques used, 4 instances, rule multipleLabellingTechniquesUsed.
- Autocomplete missing, 7 instances, rule autocompleteMissing.
- Multiple header landmarks, 2 instances, rule multipleHeaderLandmarks.
- Empty list, 5 instances, rule emptyList.
- WAVE Evaluation Tool alerts
- Long alternative text, 1 instance, citing 1.1.1 Non-text Content.
- Skipped heading level, 2 instances, citing 1.3.1 Info and Relationships, 2.4.1 Bypass Blocks, 2.4.6 Headings and Labels.
- Redundant link, 3 instances, citing Link Purpose (In Context).
- Noscript element, 1 instance.
- HTML5 video or audio, 1 instance, citing 1.2.1 Prerecorded Audio-only and Video-only, 1.2.2 Captions (Prerecorded), 1.2.3 Audio Description or Media Alternative (Prerecorded), 1.2.5 Audio Description (Prerecorded), 1.4.2 Audio Control.
- YouTube video, 1 instance, citing 1.2.1 Prerecorded Audio-only and Video-only, 1.2.2 Captions (Prerecorded), 1.2.3 Audio Description or Media Alternative (Prerecorded).
- Equal Access Accessibility Checker needs review messages
- Confirm Windows high contrast mode is supported when using CSS to include, position or alter non-decorative content, 1 instance, cites 1.1.1 Non-text Content.
- Verify that captions are available for any meaningful audio or provide a caption track for the `<video>` element, 1 instance, cites 1.2.1 Audio-only and Video-only (Prerecorded).
- Verify that captions are available for any meaningful audio or provide a caption track for the `<video>` element, 1 instance, cites 1.2.2 Captions (Prerecorded).
- Verify that captions are available for any meaningful audio or provide a caption track for the `<video>` element, 1 instance, cites 1.2.4 Captions (Live).
- Verify the ‘::before’ and ‘::after’ pseudo-elements do not insert non-decorative content, 2 instances, cites 1.3.1 Info and Relationships.
- If the following text is a quotation, mark it as a `<q>` or `<blockquote>` element: “Building a faster YouTube on web”, 1 instance, cites 1.3.1 Info and Relationships (see the sketch after this list).
- Verify that this ungrouped checkbox input is not related to other checkboxes, 2 instances, cites 1.3.1 Info and Relationships.
- If the following text is a quotation, mark it as a `<q>` or `<blockquote>` element: “aside flow bg-state-warn-bg col … -12-3v-2h2v2h-2zm0-4h2v-4h-2v4z”, 1 instance, cites 1.3.1 Info and Relationships.
- If the word(s) ‘background-clip’ is part of instructions for using page content, check it is still understandable without this location or shape information, 1 instance, cites 1.3.3 Sensory Characteristics.
- If the word(s) ‘top’ is part of instructions for using page content, check it is still understandable without this location or shape information, 1 instance, cites 1.3.3 Sensory Characteristics.
- Verify color is not used as the only visual means of conveying information, 1 instance, cites 1.4.1 Use of Color.
- Verify that text sized using viewport units can be resized up to 200%, 1 instance, cites 1.4.4 Resize text.
- Verify the `<div>` element with “listbox” role has keyboard access, 1 instance, cites 2.1.1 Keyboard.
- Verify media using `<audio>` and/or `<video>` elements have keyboard accessible controls, 1 instance, cites 2.1.1 Keyboard.
- Verify `<frame>` content is accessible, 1 instance, cites 2.4.1 Bypass Blocks.
- Verify that using the filename as the page `<title>` value is descriptive, 1 instance, cites 2.4.2 Page Titled.
- Component with “combobox” role does not have a tabbable element, 1 instance, cites 2.4.3 Focus Order.
- Confirm the element should be tabbable, and is visible on the screen when it has keyboard focus, 1 instance, cites 2.4.7 Focus Visible.
- Verify the `<form>` element has a submit button or an image button, 1 instance, cites 3.2.2 On Input.
- The input element does not have an associated visible label, 1 instance, cites 3.3.2 Labels or Instructions.
The axe DevTools release I used for this post no longer has a “needs review” category. However, I have a copy of 4.36.2 which uses axe-core 4.4.2 and which returned these items flagged as “needs review”:
- Elements must only use allowed ARIA attributes, 8 instances, rule aria-allowed-attr (for the `aria-label` on the `<div>`s).
- Text elements must have sufficient color contrast against the background, 8 instances, rule color-contrast.
- `<video>` elements must have a `<track>` for captions, 1 instance, rule video-caption (see the sketch below).
I kept that list separate because it does not use the same axe-core version and because users on the current release no longer have access to “needs review” entries.
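For context, the video-caption rule is essentially looking for markup along these lines; the file names are placeholders, and a human still has to confirm the captions exist and match the audio.

```html
<!-- Generic sketch only: a captions <track> is what the video-caption rule
     checks for; whether captions.vtt is accurate is a human call -->
<video controls>
  <source src="promo.mp4" type="video/mp4">
  <track kind="captions" src="captions.vtt" srclang="en" label="English">
</video>
```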
Bonus: My Best Practices
Each of the tools provides best practices as well. They can address known bugs, anti-patterns, or opinions from the developers, but generally should not represent WCAG failures (though they sometimes do, in my experience). For this page, multiple `<h1>`s, redundant roles, and unnecessary HTML attributes were referenced. I am not logging them because this post was already taking me too long to write.
I am including my own best practices for the page, but I did not include recommendations (I reserve those for paying clients):
- Visible non-interactive content that is not repeated elsewhere has `aria-hidden`, which can make for an odd experience for sighted screen reader users. `//span[@class="hero__eyebrow"][@aria-hidden="true"]`
- If an image has a blank alt (`<img alt="">`), then `aria-hidden` is redundant and unnecessary (see the sketch after this list). `//img[@alt=""][@aria-hidden="true"]`
- The primary and footer navigation announce as “Main navigation navigation” and “Footer navigation navigation” because the word “navigation” is included in the `aria-label`. `//nav[contains(@aria-label,"navigation")]`
- The placeholder for the search does not fail 3.3.2 because it is a common pattern; however, consider a persistent visible label regardless. `//input[@placeholder="Search"]`
- Many of the links are verbose. A couple of links are duplicated but have differing link text. Combined, these make for a complex page to navigate by link text alone. `//a[@href="/interop-2022-wrapup/"]`, `//a[@href="/web-platform-12-2022/"]`
- The centered all-caps “CHROME DEVELOPERS” text is a link among other centered text. It is not immediately apparent as a link, and in dark mode its background (#2c333f) against the page background (#303136) has a 1:1 contrast ratio. `//section[contains(@class,"homepage __developers")]//a[@data-type="primary"]`
- The country select uses mixed languages and character sets within options. Because you cannot break up text in an `<option>` to give parts a `lang`, consider not mixing these. `//select[@id="sub-country"]`
- 4.1.3 is mooted on the cookie consent since it appears first in the DOM and is drawn at page load. It may not need to be a live region (nor did it announce as one). `//div[@class="web-snackbar__label"][@role="status"]`
Takeaways
Automated accessibility checkers lack the context of a page and user. They can only run against the code in the current state of the page. This means you have to run and re-run them while checking assorted viewport sizes, orientations, states, and whatever else may be a factor.
They may also disagree on which Success Criterion best matches an issue they identify in the code. This post makes no judgment on which tool is right or wrong in that regard (if any of them can be said to be wrong).
This does not mean you should avoid automated tools. Like any tool, it means they have a place in your toolbox. When used correctly they can be extremely helpful. When used in the hands of a novice they can result in a sense of complacency. When not used at all they can be a missed opportunity.
Please remember that WCAG itself is also the bare minimum of accessibility. Conforming to WCAG does not guarantee something is accessible. It does not even guarantee something is usable. All WCAG does is provide you with a starting point. Lots of WCAG failures suggest the page has not even made it to the starting line.
Related posts:
- Speech Viewer Logs of Lies, August 2020
- XPath for In-Browser Testing, April 2021
- Beware False Negatives, September 2021
- What Does X% of Issues Mean?, July 2022
- The 411 on 4.1.1, December 2022 (repeated from above)
Update: 22 January 2023
Rachele DiTullio ran the same page through the Access Assistant extension for Chrome v8.10.0.11 (which I refer to above as AA) and recorded results to match the format I used in this post: Comparing Level Access automated tools to manual accessibility testing
It found 4 issues and logged them against 5 Success Criteria (so one was logged to two SCs). For the sample (tested a few days after my test and seemingly still the same code), Access Assistant found more issues than WAVE Evaluation Tool, axe DevTools, and ARC Toolkit, and fewer than Equal Access Accessibility Checker.
Update: 23 January 2023
I have been asked about Microsoft Accessibility Insights and Google Chrome Lighthouse and how they compare. I did not test either of them because they use axe-core, the same engine as axe DevTools. Accessibility Insights and Lighthouse both use axe-core 4.4.1, which is an older version than axe DevTools uses.
That being said, each returns more issues that need review than axe DevTools does (which returns none in its current release).
Sales Pitch: 27 April 2023
Nobody asked me to make the computer image in this post into a branded product, so I did it anyway.
Bring a red or yellow notebook to a meeting so folks know you are serious. Bring a matching mug and laptop sticker. Get a blue t-shirt with the logo on the chest to cosplay as a weird accessibility first responder (or lighter blue with black text or any color tee you want).
Thank you for humoring me on my first ever product pitch on this site.
Automated Tools Comparison: 15 April 2024
Equal Entry has published A Comparison of Automated Testing Tools for Digital Accessibility, which takes a slightly different approach than I do here:
- It tests six tools (including two from overlay vendors AudioEye and Level Access) with two or three overlapping the tools I tested (maybe three, since Equal Entry does not name the tools);
- It does not provide its test results;
- It created a 31-page site with 104 guaranteed WCAG violations (I used a single public page);
- It does not provide a URL for the test site;
- It has no manual testing results as a control;
- It performed the comparison as contract work for Evinced.
We know only about a quarter to a third of WCAG is automatable. As such, every tool is going to have an upper limit against a test of all SCs. The 6 automated checkers caught 3.8–10.6% of the issues. In contrast (or support), my own tests above showed a hit rate of 0–13.5% (from WAVE’s 0 up to EAAC’s 5 of my 37 unique issues), which is arguably similar.
Without seeing the test site it is hard to understand how to read these results. For example, I can create a 1.1.1 Non-text Content issue that gets caught by all tools or I can create one that gets missed by all tools. I can even create a 1.1.1 issue that I know will be caught in some and missed in others.
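As a hypothetical illustration of that point (these are my own contrived examples, not anything from the Equal Entry test site):

```html
<!-- Caught by essentially every automated checker: an image with no alt attribute -->
<img src="roadmap-chart.png">

<!-- Routinely missed or merely flagged for review: the alt exists but says nothing
     useful, which only a human can judge against the image's purpose -->
<img src="roadmap-chart.png" alt="image">
```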
One minor concern about the reporting, but not the tests, is this statement: “If a product links to a WCAG technique that is a published standard, then it is easier to ensure that the issues reported are consistent between products.”
A WCAG Technique is not a standard. Each is an example of one way to meet an SC, but Techniques are informative, not normative. Some may even recommend things that simply don’t work.
Regardless, it is still an interesting report and approach.
12 Comments
Absolutely fantastic work, Adrian! Wish I had you on my team!
Thanks, Adrian–a real eye-opener!
Thank you, Adrian!
This sure makes me wonder how many people never move past the practice of relying solely on automated tools. At least they have something – that’s better than nothing, I guess?
Excellent content.
We appreciate you!
Where does Microsoft’s Accessibility Insights for Web fit in to this list?
In response: Dunno. I did not test it. It uses axe-core 4.4.1, which is an earlier release than axe DevTools uses. Lighthouse also uses axe-core, and I responded on Mastodon to note that Lighthouse still presents the “needs review” items no longer in axe DevTools.
If you try it with Microsoft Accessibility Insights, please post your results here.
This is a great article and very educational. I would add that manual testing is vital and you need end users that use the technology you are testing with. It is very difficult to perform a manual test with technology you do not use every day. I think all of those things are vital to a really accurate test of any content. But a lot of companies overlook the importance of native testing by users of assistive technology.
In response: Very much agreed, Desiree. I am not explicit about that here given the nature of this post (though I go into it in my post Your Accessibility Claims Are Wrong, Unless…).
Manual testing with AT can only be as good as the tester’s skill with the AT — which is often poor.
This has been an interesting read – in terms of using the Axe Devtools – did you include all the manual tests as well? Or did you just run the automated part of DevTools?
I’ve only started using the devtools in the past 6 months or so, and find they tend to find a lot of issues with little consequence, but also occasionally find things I may not have picked up on totally manually.
In response: Ky, if by “manual tests” you mean the new “Intelligent Guided Test” then no, since that is a paid feature. If by “manual tests” you mean the “needs review” category, that has been removed in the latest release of axe DevTools, which is what I used in this post. For each tool I only logged the automated bits in the tables.
In response: Ok, very interesting! I can vouch for the manual guided tests – they are pretty thorough, although some of the instructions are a little ambiguously worded. It took me a few attempts to start getting consistent results – but it’s helped from both a legal standpoint and a practical one too. There are still things it tends to miss of course, but they’re also things that seem quite blatant, yet are ok in their own isolated context; once nested in others, it gets messy.
Can highly recommend trying the manual guided tests out, if only out of interest.
In response: Indeed, I have tried the Manual Guided Tests. It was outside the scope of this post and I have no interest in being critical of one tool’s (paid) features when other free tools don’t all have the same features. Frankly, doing so puts me into the realm of “free labor for competitive analysis”. As such, I will not vouch for Manual Guided Tests (nor do I use them).
Ran out of time, but I did notice that the severity of this 2.4.3 issue is low for (what I think are) the wrong reasons:
When using the search combobox, pressing Esc closes the entire search disclosure but focus is not set back to the trigger. The browser papers over this by letting the next Tab press move to the next control, however.
It’s true that browsers forgive this, but it is still problematic for users tracking keyboard focus. These include some screen magnifier users and screen reader users. Screen magnifier users tracking keyboard focus will be moved to the top of the page. The virtual cursor on screen readers will announce content at the top of the page when navigating to the next item (Down Arrow key).