Comparing Manual and Free Automated WCAG Reviews
Automated accessibility testing tools cannot test 100% of WCAG. This position is not controversial. Other than overlay vendors, no automated tool maker makes that claim. This is partly because WCAG is targeted at humans, not code, and so nuance and context apply.
Free automated accessibility testing tools may have even fewer features available to users, which makes sense. My experience is that the majority of people who are testing against WCAG are starting or supplementing with free automated checkers. There is nothing wrong with that. We should want the easy bits handled to free us up for the more nuanced work.
My concern is that too many managers, bosses, stakeholders, and even testers, may do no more than run a free automated tool against a site or page and consider that sufficient. This post shows why it may not be sufficient. Bear in mind that manual testers are not free, so this is not a one-to-one comparison between humans and tools (nor should it be).
Again, this is not a criticism of the tool makers. This is a function of WCAG itself and its correct and appropriate focus on people over code.
The Process
I picked a reasonably popular site and, using only its home page, performed a review against WCAG 2.1 Levels A & AA using the following:
- Manual using bookmarklets, assorted contrast checkers, dev tools, and assistive technology.
- axe DevTools v4.47.0 browser extension (using axe-core v4.6.2) for Chrome and Firefox.
- ARC Toolkit v5.4.2 browser extension for Chrome.
- WAVE Evaluation Tool v3.2.2.0 browser extension for Chrome and Firefox.
- Equal Access Accessibility Checker (EAAC) v3.1.42.9999 browser extension for Chrome and Firefox.
I performed the testing on Saturday, 14 January 2023 against a live version of the site. I also archived it to the Wayback Machine when I started testing.
For all tests, I used screen reader pairings of JAWS with Chrome, NVDA with Firefox, and VoiceOver with Safari on desktop. I did not test on mobile, but I did test across viewport sizes. I tested both in and out of private/incognito mode, making sure to clear my cache between reloads. The page had a reCAPTCHA component which I arbitrarily declared out of scope and did not test.
Highlights
A quick, if a bit insincere, comparison is to count how many Success Criteria (SCs) each tool found a violation against:
Tool | Total | A | AA | Notes |
---|---|---|---|---|
Manual | 18 | 11 | 7 | |
axe | 2 | 2 | 0 | |
ARC | 3 | 3 | 0 | |
WAVE | 0 | 0 | 0 | It had an issue against reCAPTCHA, but I declared it out of scope. |
EAAC | 3 | 3 | 0 | |
AA | 5 | 5 | 0 | Read the 22 January update |
Some Success Criteria can be failed multiple times. Some failures can be logged against multiple Success Criteria. These are the counts for the total number of unique failures. These also exclude the same instance of an issue replicated against analogous nodes (so a duplicated `id` might happen a dozen times, but I count it once):
Tool | Total | A | AA | Notes |
---|---|---|---|---|
Manual | 37 | 24 | 11 | Severity provided. This includes 4 contrast failures. |
axe | 2 | 2 | 0 | Severity provided. |
ARC | 3 | 3 | 0 | |
WAVE | 0 | 0 | 0 | |
EAAC | 5 | 5 | 0 | Rule aria_hidden_focus_misuse is reported under both 1.3.1 and 4.1.2. |
AA | 4 | 4 | 0 | Read the 22 January update |
Some tools also provide warnings or alerts for the reviewer to test manually. The value of these is very much dependent on the site, the reviewer, and the context. These sometimes include WCAG Success Criteria for reference. The number of these may or may not be useful to you or your team, so do not read too much into non-zero numbers.
Tool | Alerts | Notes |
---|---|---|
Manual | 0 | |
axe | 0 | |
ARC | 7 | They reference the tool’s own rules. |
WAVE | 6 | They reference WCAG SCs. |
EAAC | 20 | They reference WCAG SCs. |
AA | 10 | Read the 22 January update |
Some interesting differences in the substance of the reporting:
- Both ARC and I failed an SVG for having no `img` role, though I did so under 4.1.2 Name, Role, Value and ARC did so under 1.1.1 Non-text Content.
- Both axe and I failed the still-functional link with `aria-hidden` under 4.1.2 Name, Role, Value. ARC failed it under 2.1.1 Keyboard. EAAC failed it under 1.3.1 Info and Relationships.
- Most called out links given a `listitem` role as not belonging to a list. Axe and I failed it under 1.3.1 Info and Relationships, ARC and EAAC under 4.1.2 Name, Role, Value.
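To make those differences concrete, here is a minimal hypothetical sketch of the three patterns in question. The element names and attribute values are mine, not the audited page’s actual markup.

```html
<!-- Hypothetical sketch of the disputed patterns, not the audited page's markup -->

<!-- 1. An inline SVG conveying content but with no role="img",
        flagged under 1.1.1 by ARC and under 4.1.2 in my manual review -->
<svg aria-label="Release timeline" width="120" height="40">
  <rect x="0" y="15" width="120" height="10" fill="currentColor" />
</svg>

<!-- 2. A link removed from the accessibility tree yet still focusable and functional,
        logged under 4.1.2 (axe, me), 2.1.1 (ARC), or 1.3.1 (EAAC) -->
<a href="/updates/" aria-hidden="true">See all updates</a>

<!-- 3. Links given role="listitem" with no owning list role,
        logged under 1.3.1 (axe, me) or 4.1.2 (ARC, EAAC) -->
<div class="cards">
  <a href="/post-one/" role="listitem">Post one</a>
  <a href="/post-two/" role="listitem">Post two</a>
</div>
```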
In my manual review I found almost seven-and-a-half times (7½×) as many issues as the tool with the next highest count of found issues, across three times (3×) as many Success Criteria.
It is possible I was overzealous, as many accessibility reviewers can understandably be. I had no QA review (which often helps me cut 3 or 4 issues); nobody looked over these to make sure I was not being too aggressive. However, I try to adhere to my own take on Postel’s Law in my WCAG reviews — be liberal in what you flag, conservative in what you fail. I also no longer fail issues using SC 4.1.1. My results are in the tables so you can judge for yourself.
Raw Results
These sections contain the output from my manual review and from the tools: not only the WCAG failures, but also the alerts and warnings the tools issue for further manual review.
WCAG Failures
The following two tables are WCAG 2.1 Success Criteria cross-referenced with the five testing approaches. The first table only covers SCs at Level A and the second table at Level AA.
Each cell identifies if the sample passes, does not apply, or fails against the SC. If the sample fails, a bulleted list describes the issue and, where appropriate, a path is provided to allow you to navigate to the affected node or nodes using your browser’s developer tools (I use XPath).
Each issue description in the first column is followed by a severity of low, medium, high, or critical. These four severities break down as follows:
- Low
- Users can accomplish the task with no or very minor workarounds.
- Medium
- Users can accomplish the task with workarounds.
- High
- Users can accomplish the task with difficulty and/or significant workarounds.
- Critical
- It is impossible for some users to accomplish the task.
For brevity I abbreviate “accessible name” to accName and “accessible description” to accDesc.
It is possible that in my manual review I missed other issues. I may have also over- or under-estimated the impact of an issue. With no insight into the audience, no independent QA team, and no interaction with the development team I have no sense of what may be a true barrier or problem for the site’s users. There may be cases where extensive testing with users shows something I flagged as a problem is preferred by its users.
Where cells are blank, the automated checker returned no failures. Tools do not indicate a pass nor that an SC was not applicable, nor should they be expected to.
WCAG 2.1 SCs at Level A | Manual | Axe | ARC | WAVE | EAAC |
---|---|---|---|---|---|
1.1.1 Non-text Content | Fail | | Fail | | |
1.2.1 Audio-only and Video-only (Prerecorded) | Pass | | | | |
1.2.2 Captions (Prerecorded) | N/A | | | | |
1.2.3 Audio Description or Media Alternative (Prerecorded) | N/A | | | | |
1.3.1 Info and Relationships | Fail | Fail | | | Fail |
1.3.2 Meaningful Sequence | Pass | | | | |
1.3.3 Sensory Characteristics | N/A | | | | |
1.4.1 Use of Color | Fail | | | | |
1.4.2 Audio Control | N/A | | | | |
2.1.1 Keyboard | Pass | | Fail | | |
2.1.2 No Keyboard Trap | Pass | | | | |
2.1.4 Character Key Shortcuts | N/A | | | | |
2.2.1 Timing Adjustable | N/A | | | | |
2.2.2 Pause, Stop, Hide | Fail | | | | |
2.3.1 Three Flashes or Below Threshold | Pass | | | | |
2.4.1 Bypass Blocks | Pass | | | | Fail |
2.4.2 Page Titled | Fail | | | | |
2.4.3 Focus Order | Fail | | | | |
2.4.4 Link Purpose (In Context) | Pass | | | | |
2.5.1 Pointer Gestures | N/A | | | | |
2.5.2 Pointer Cancellation | Pass | | | | |
2.5.3 Label in Name | Fail | | | | |
2.5.4 Motion Actuation | N/A | | | | |
3.1.1 Language of Page | Pass | | | | |
3.2.1 On Focus | Pass | | | | |
3.2.2 On Input | Fail | | | | |
3.3.1 Error Identification | Fail | | | | |
3.3.2 Labels or Instructions | Fail | | | | |
4.1.1 Parsing | N/A | | | | |
4.1.2 Name, Role, Value | Fail | Fail | Fail | | Fail |
WCAG 2.1 SCs at Level AA | Manual | Axe | ARC | WAVE | EAAC |
---|---|---|---|---|---|
1.2.4 Captions (Live) | N/A | | | | |
1.2.5 Audio Description (Prerecorded) | N/A | | | | |
1.3.4 Orientation | Pass | | | | |
1.3.5 Identify Input Purpose | Fail | | | | |
1.4.3 Contrast (Minimum) | Fail | | | | |
1.4.4 Resize text | Fail | | | | |
1.4.5 Images of Text | Pass | | | | |
1.4.10 Reflow | Pass | | | | |
1.4.11 Non-text Contrast | Pass | | | | |
1.4.12 Text Spacing | Pass | | | | |
1.4.13 Content on Hover or Focus | Fail | | | | |
2.4.5 Multiple Ways | Pass | | | | |
2.4.6 Headings and Labels | Pass | | | | |
2.4.7 Focus Visible | Fail | | | | |
3.1.2 Language of Parts | Fail | | | | |
3.2.3 Consistent Navigation | Pass | | | | |
3.2.4 Consistent Identification | Pass | | | | |
3.3.3 Error Suggestion | Fail | | | | |
3.3.4 Error Prevention (Legal, Financial, Data) | N/A | | | | |
4.1.3 Status Messages | N/A | | | | |
Warnings
Three of the four automated tools provide warnings. These are cues to the reviewer to dive further into the sample and identify if the thing is really a WCAG failure or a problem for users.
A manual review should result in a binary pass/fail, so it is uncommon to find warnings in one. You may find “best practices” instead, which I address later.
These lists represent the warnings from each tool. Each tool handles this differently, which is neither bad nor good, but I have done my best to normalize them for consistency here. I did not include specific paths from the tools when provided because I ran out of steam.
- Manual alerts
- None; manual reviews should generally get you a binary pass/fail
- axe DevTools warnings
- None; the “needs review” category is not in this release
- ARC Toolkit alerts
- aria-hidden used, 8 instances, rule ARIAHiddenUsed.
- Unable to determine text contrast against image background, 8 instances, rule textWithBackgroundImage.
- Heading level skipped, 2 instances, rule headingLevelSkipped.
- Multiple naming techniques used, 4 instances, rule multipleLabellingTechniquesUsed.
- Autocomplete missing, 7 instances, rule autocompleteMissing.
- Multiple header landmarks, 2 instances, rule multipleHeaderLandmarks.
- Empty list, 5 instances, rule emptyList.
- WAVE Evaluation Tool alerts
- Long alternative text, 1 instance, citing 1.1.1 Non-text Content.
- Skipped heading level, 2 instances, citing 1.3.1 Info and Relationships, 2.4.1 Bypass Blocks, 2.4.6 Headings and Labels.
- Redundant link, 3 instances, citing Link Purpose (In Context).
- Noscript element, 1 instance.
- HTML5 video or audio, 1 instance, citing 1.2.1 Prerecorded Audio-only and Video-only, 1.2.2 Captions (Prerecorded), 1.2.3 Audio Description or Media Alternative (Prerecorded), 1.2.5 Audio Description (Prerecorded), 1.4.2 Audio Control.
- YouTube video, 1 instance, citing 1.2.1 Prerecorded Audio-only and Video-only, 1.2.2 Captions (Prerecorded), 1.2.3 Audio Description or Media Alternative (Prerecorded).
- Equal Access Accessibility Checker needs review messages
- Confirm Windows high contrast mode is supported when using CSS to include, position or alter non-decorative content, 1 instance, cites 1.1.1 Non-text Content.
- Verify that captions are available for any meaningful audio or provide a caption track for the `<video>` element, 1 instance, cites 1.2.1 Audio-only and Video-only (Prerecorded).
- Verify that captions are available for any meaningful audio or provide a caption track for the `<video>` element, 1 instance, cites 1.2.2 Captions (Prerecorded).
- Verify that captions are available for any meaningful audio or provide a caption track for the `<video>` element, 1 instance, cites 1.2.4 Captions (Live).
- Verify the ‘::before’ and ‘::after’ pseudo-elements do not insert non-decorative content, 2 instances, cites 1.3.1 Info and Relationships.
- If the following text is a quotation, mark it as a `<q>` or `<blockquote>` element: “Building a faster YouTube on web”, 1 instance, cites 1.3.1 Info and Relationships (see the sketch after this list).
- Verify that this ungrouped checkbox input is not related to other checkboxes, 2 instances, cites 1.3.1 Info and Relationships.
- If the following text is a quotation, mark it as a `<q>` or `<blockquote>` element: “aside flow bg-state-warn-bg col … -12-3v-2h2v2h-2zm0-4h2v-4h-2v4z”, 1 instance, cites 1.3.1 Info and Relationships.
- If the word(s) ‘background-clip’ is part of instructions for using page content, check it is still understandable without this location or shape information, 1 instance, cites 1.3.3 Sensory Characteristics.
- If the word(s) ‘top’ is part of instructions for using page content, check it is still understandable without this location or shape information, 1 instance, cites 1.3.3 Sensory Characteristics.
- Verify color is not used as the only visual means of conveying information, 1 instance, cites 1.4.1 Use of Color.
- Verify that text sized using viewport units can be resized up to 200%, 1 instance, cites 1.4.4 Resize text.
- Verify the `<div>` element with “listbox” role has keyboard access, 1 instance, cites 2.1.1 Keyboard.
- Verify media using `<audio>` and/or `<video>` elements have keyboard accessible controls, 1 instance, cites 2.1.1 Keyboard.
- Verify `<frame>` content is accessible, 1 instance, cites 2.4.1 Bypass Blocks.
- Verify that using the filename as the page `<title>` value is descriptive, 1 instance, cites 2.4.2 Page Titled.
- Component with “combobox” role does not have a tabbable element, 1 instance, cites 2.4.3 Focus Order.
- Confirm the element should be tabbable, and is visible on the screen when it has keyboard focus, 1 instance, cites 2.4.7 Focus Visible.
- Verify the `<form>` element has a submit button or an image button, 1 instance, cites 3.2.2 On Input.
- The input element does not have an associated visible label, 1 instance, cites 3.3.2 Labels or Instructions.
The axe DevTools release I used for this post no longer has a “needs review” category. However, I have a copy of 4.36.2 which uses axe-core 4.4.2 and which returned these items flagged as “needs review”:
- Elements must only use allowed ARIA attributes, 8 instances, rule aria-allowed-attr (for the `aria-label` on the `<div>`s).
- Text elements must have sufficient color contrast against the background, 8 instances, rule color-contrast.
- `<video>` elements must have a `<track>` for captions, 1 instance, rule video-caption (see the sketch below).
I kept that list separate because it does not use the same axe-core version and because users on the current release no longer have access to “needs review” entries.
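For context, the video-caption rule is essentially looking for markup along these lines; the file names are placeholders, and a human still has to confirm the captions exist and match the audio.

```html
<!-- Generic sketch only: a captions <track> is what the video-caption rule
     checks for; whether captions.vtt is accurate is a human call -->
<video controls>
  <source src="promo.mp4" type="video/mp4">
  <track kind="captions" src="captions.vtt" srclang="en" label="English">
</video>
```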
Bonus: My Best Practices
Each of the tools provides best practices as well. They can address known bugs, anti-patterns, or opinions from the developers, but generally should not represent WCAG failures (though they sometimes do, in my experience). For this page, multiple `<h1>`s, redundant roles, and unnecessary HTML attributes were referenced. I am not logging them because this post was already taking me too long to write.
I am including my own best practices for the page, but I did not include recommendations (I reserve those for paying clients):
- Visible non-interactive content that is not repeated elsewhere has `aria-hidden`, which can make for an odd experience for sighted screen reader users. `//span[@class="hero__eyebrow"][@aria-hidden="true"]`
- If an image has a blank alt (`<img alt="">`), then `aria-hidden` is redundant and unnecessary (see the sketch after this list). `//img[@alt=""][@aria-hidden="true"]`
- The primary and footer navigation announce as “Main navigation navigation” and “Footer navigation navigation” because the word “navigation” is included in the `aria-label`. `//nav[contains(@aria-label,"navigation")]`
- The placeholder for the search does not fail 3.3.2 because it is a common pattern; however, consider a persistent visible label regardless. `//input[@placeholder="Search"]`
- Many of the links are verbose. A couple of links are duplicated but have differing link text. Combined, these make for a complex page to navigate by link text alone. `//a[@href="/interop-2022-wrapup/"]`, `//a[@href="/web-platform-12-2022/"]`
- The centered all-caps “CHROME DEVELOPERS” text is a link among other centered text. It is not immediately apparent as a link, and in dark mode its background (#2c333f) against the page background (#303136) has a 1:1 contrast ratio. `//section[contains(@class,"homepage __developers")]//a[@data-type="primary"]`
- The country select uses mixed languages and character sets within options. Because you cannot break up text in an `<option>` to give parts a `lang`, consider not mixing these. `//select[@id="sub-country"]`
- 4.1.3 is mooted on the cookie consent since it appears first in the DOM and is drawn at page load. It may not need to be a live region (nor did it announce as one). `//div[@class="web-snackbar__label"][@role="status"]`
Takeaways
Automated accessibility checkers lack the context of a page and user. They can only run against the code in the current state of the page. This means you have to run and re-run them while checking assorted viewport sizes, orientations, states, and whatever else may be a factor.
They may also disagree on which Success Criterion best matches an issue they identify in the code. This post makes no judgment on which tool is right or wrong in that regard (if any of them can be said to be wrong).
This does not mean you should avoid automated tools. Like any tool, it means they have a place in your toolbox. When used correctly they can be extremely helpful. When used in the hands of a novice they can result in a sense of complacency. When not used at all they can be a missed opportunity.
Please remember that WCAG itself is also the bare minimum of accessibility. Conforming to WCAG does not guarantee something is accessible. It does not even guarantee something is usable. All WCAG does is provide you with a starting point. Lots of WCAG failures suggest the page has not even made it to the starting line.
Related posts:
- Speech Viewer Logs of Lies, August 2020
- XPath for In-Browser Testing, April 2021
- Beware False Negatives, September 2021
- What Does X% of Issues Mean?, July 2022
- The 411 on 4.1.1, December 2022 (repeated from above)
Update: 22 January 2023
Rachele DiTullio ran the same page through the Access Assistant extension for Chrome v8.10.0.11 (which I refer to above as AA) and recorded results to match the format I used in this post: Comparing Level Access automated tools to manual accessibility testing
It found 4 issues and logged them against 5 Success Criteria (so one was logged to two SCs). For the sample (tested a few days after my test and seemingly still the same code), Access Assistant found more issues than WAVE Evaluation Tool, axe DevTools, and ARC Toolkit, and fewer than Equal Access Accessibility Checker.
Update: 23 January 2023
I have been asked about Microsoft Accessibility Insights and Google Chrome Lighthouse and how they compare. I did not test either of them because they use axe-core, the same engine as axe DevTools. Accessibility Insights and Lighthouse both use axe-core 4.4.1, which is an older version than axe DevTools uses.
That being said, each returns more issues that need review than axe DevTools does (which returns none in its current release).
Sales Pitch: 27 April 2023
Nobody asked me to make the computer image in this post into a branded product, so I did it anyway.
Bring a red or yellow notebook to a meeting so folks know you are serious. Bring a matching mug and laptop sticker. Get a blue t-shirt with the logo on the chest to cosplay as a weird accessibility first responder (or lighter blue with black text or any color tee you want).
Thank you for humoring me on my first ever product pitch on this site.
Automated Tools Comparison: 15 April 2024
Equal Entry has published A Comparison of Automated Testing Tools for Digital Accessibility, which takes a slightly different approach than I do here:
- It tests six tools (including two from overlay vendors AudioEye and Level Access) with two or three overlapping the tools I tested (maybe three, since Equal Entry does not name the tools);
- It does not provide its test results;
- It created a 31-page site with 104 guaranteed WCAG violations (I used a single public page);
- It does not provide a URL for the test site;
- It has no manual testing results as a control;
- It performed the comparison as contract work for Evinced.
We know only about a quarter to a third of WCAG is automatable. As such, every tool is going to have an upper limit against a test of all SCs. The 6 automated checkers caught 3.8–10.6% of the issues. In contrast (or support), my own tests above showed a hit rate of 0–13.5% (from WAVE’s 0 up to EAAC’s 5 of my 37 unique issues), which is arguably similar.
Without seeing the test site it is hard to understand how to read these results. For example, I can create a 1.1.1 Non-text Content issue that gets caught by all tools or I can create one that gets missed by all tools. I can even create a 1.1.1 issue that I know will be caught in some and missed in others.
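As a hypothetical illustration of that point (these are my own contrived examples, not anything from the Equal Entry test site):

```html
<!-- Caught by essentially every automated checker: an image with no alt attribute -->
<img src="roadmap-chart.png">

<!-- Routinely missed or merely flagged for review: the alt exists but says nothing
     useful, which only a human can judge against the image's purpose -->
<img src="roadmap-chart.png" alt="image">
```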
One minor concern about the reporting, but not the tests, is this statement: “If a product links to a WCAG technique that is a published standard, then it is easier to ensure that the issues reported are consistent between products.”
A WCAG Technique is not a standard. Each is an example of one way to meet an SC, but Techniques are informative, not normative. Some may even recommend things that simply don’t work.
Regardless, it is still an interesting report and approach.
12 Comments
Absolutely fantastic work, Adrian! Wish I had you on my team!
Thanks, Adrian–a real eye-opener!
Thank you, Adrian!
This sure makes me wonder how many people never move past the practice of relying solely on automated tools. At least they have something – that’s better than nothing, I guess?
Excellent content.
We appreciate you!
Where does Microsoft’s Accessibility Insights for Web fit in to this list?
In response: Dunno. I did not test it. It uses axe-core 4.4.1, which is an earlier release than axe DevTools uses. Lighthouse also uses axe-core, and I responded on Mastodon to note that Lighthouse still presents the “needs review” items no longer in axe DevTools.
If you try it with Microsoft Accessibility Insights, please post your results here.
This is a great article and very educational. I would add that manual testing is vital and you need end users that use the technology you are testing with. It is very difficult to perform a manual test with technology you do not use every day. I think all of those things are vital to a really accurate test of any content. But a lot of companies overlook the importance of native testing by users of assistive technology.
In response: Very much agreed, Desiree. I am not explicit about that here given the nature of this post (though I go into it in my post Your Accessibility Claims Are Wrong, Unless…).
Manual testing with AT can only be as good as the tester’s skill with the AT — which is often poor.
This has been an interesting read – in terms of using the Axe Devtools – did you include all the manual tests as well? Or did you just run the automated part of DevTools?
I’ve only started using the devtools in the past 6 months or so, and find they tend to find a lot of issues with little consequence, but also occasionally find things I may not have picked up on totally manually.
In response: Ky, if by “manual tests” you mean the new “Intelligent Guided Test” then no, since that is a paid feature. If by “manual tests” you mean the “needs review” category, that has been removed in the latest release of axe DevTools, which is what I used in this post. For each tool I only logged the automated bits in the tables.
In response: Ok, very interesting! I can vouch for the manual guided tests – they are pretty thorough, although some of the instructions are a little ambiguously worded. It took me a few attempts to start getting consistent results – but it’s helped from both a legal standpoint and a practical one too. There are still things it tends to miss of course, but they’re also things that seem quite blatant, yet are ok in their own isolated context; once nested in others, it gets messy.
Can highly recommend trying the manual guided tests out, if only out of interest.
In response: Indeed, I have tried the Manual Guided Tests. It was outside the scope of this post and I have no interest in being critical of one tool’s (paid) features when other free tools don’t all have the same features. Frankly, doing so puts me into the realm of “free labor for competitive analysis”. As such, I will not vouch for Manual Guided Tests (nor do I use them).
Ran out of time, but I did notice that the severity of this 2.4.3 issue is low for (what I think are) the wrong reasons:
When using the search combobox, pressing Esc closes the entire search disclosure but focus is not set back to the trigger. The browser papers over this by letting the next Tab press move to the next control, however.
It’s true that browsers forgive this, but it is still problematic for users tracking keyboard focus. These include some screen magnifier users and screen reader users. Screen magnifier users tracking keyboard focus will be moved to the top of the page. The virtual cursor on screen readers will announce content at the top of the page when navigating to the next item (Down Arrow key).