There Is No Document Outline Algorithm

I figured I would state the entire argument in the title. After all, as of this writing, and for the last seven-plus years, the statement has been accurate as far as the browsers are concerned.

I am penning this as sort of a follow-up to my post from 2013, The Truth about “The Truth About Multiple H1 Tags”. Even after that post helped kick off an update to the W3C HTML5 specification, it is not reflected in current tutorials and informative pieces.

Bad Information Persists

The appeal of ignoring heading levels for developers and authors is pretty compelling when you do not know how or where that content will appear (or it appears in many different locations). In particular, the all-<h1> approach appeals to many for its simplicity.

It makes sense why a developer might see this advice, or hear about it through misinformed articles, and never look back. This advice is a free pass. Many content authors don’t even know where their content will appear, making an all-<h1> approach feel like the safe approach.

Unfortunately, despite all the activity in the standards world along with the lack of activity on the part of the browsers, many developers continue to be unaware that this imparts no benefits to users and even harms many of those users. I run into this repeatedly when I answer questions on Stack Overflow, when I talk to developers in real life, and even from generally trusted outlets:

This was part of the spec, and it was “revoked”, which is not a nice thing to do. And it was revoked not because they considered that it was a bad idea, but because of screen readers not implementing it correctly.

This is a common position, captured succinctly in this one example.

Disregarding the fact that it was never part of the final W3C spec, that the spec had a warning for three years, that nobody considered the algorithm a bad idea, that screen readers had nothing to do with it, and that browsers not implementing it is different from correctly implementing it, there is one statement that belies the issue at hand.

“Not a nice thing to do” is a value judgment. It presumes that the specification’s primary beneficiaries are developers when in reality it is about users. It also presumes that it is acceptable to give developers advice (that harms some users) that has never been supported.

Like it or not, browsers are not moving on this feature and citing the purely theoretical document outline does nothing to move it forward. We as developers need to resolve this while still making it easy for content authors.

Update: February 13, 2017

There is a new issue opened against the W3C specification to try to understand how the outline algorithm is supposed to work so a polyfill can be created. This is sometimes a first step to getting support built into browsers. Read more at the issue, Update outline algorithm #794.

One Alternative (in Two Parts)

The average web developer would rather not have to think about mapping the appropriate heading level for every potential re-use of content. Authors should not have to think about it at all.

Server Side

Way back in the early oughts (actually, 1999–2000) I wrote a CMS (Content Management System) based on delimited text files. It was a lark. I wanted to teach myself some programming skills and my brother needed a mini-CMS while he was overseas.

I quickly ran into the heading issue that HTML5 tried to solve — sometimes his content would be re-used elsewhere in the layout, and the headings would not make sense anymore. But I solved it. I solved it without any fancy frameworks or libraries or HTML5 retooling.

Every content container carried a variable (this was all server-side code). That variable was a number reflecting its nesting level on the page. That number was then used to replace the number in any <h#> levels in the content (the content was chunked enough that there was not more than one heading).
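A minimal sketch of that idea follows. It is not the original CMS code; it is written in JavaScript purely for illustration, and the function name `setHeadingLevel` is hypothetical. It assumes, as described above, that each chunk contains no more than one heading, so every `<h#>` tag in the chunk can simply be set to the container's nesting level.

```javascript
// Hypothetical helper, for illustration only.
// chunk: an HTML string with at most one heading in it.
// nestingLevel: the container's nesting level on the page (1 = top).
function setHeadingLevel(chunk, nestingLevel) {
  if (!Number.isInteger(nestingLevel)) {
    throw new TypeError("nestingLevel must be an integer");
  }
  // HTML only defines <h1> through <h6>, so clamp to that range.
  const level = Math.min(6, Math.max(1, nestingLevel));
  // Match the start of opening and closing heading tags (<h2, </h2).
  // The lookahead requires whitespace or ">" after the digit, and the
  // digit requirement keeps <header>, <hgroup>, and <hr> untouched.
  return chunk.replace(/<(\/?)h[1-6](?=[\s>])/gi, `<$1h${level}`);
}
```

Calling `setHeadingLevel("<h1>News</h1><p>Text</p>", 3)` returns the same chunk with its heading as an `<h3>`. A chunk with several headings at different levels would need an offset applied to each heading instead of a single replacement value.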

I carried that technique forward into projects on much beefier CMSes and never had to worry about training authors how to manage chunked content on their home pages (and similar chunked pages). The move to HTML5 never made me consider an all-<h1> solution, partly because I knew the outline was not supported.

Client Side

Since so much of the content on modern sites comes in via client-side scripting, the code would simply need to be updated to run in the browser — this assumes you don’t mind offloading simple processing to thousands of users across uncontrollable run-times. But then if you are relying on client-side scripts to render a page you have already made your decision.

The following embedded code shows an HTML document that uses an all-<h1> structure. With a (not production-ready) chunk of JavaScript and some custom data- attributes on the sectioning containers, I re-write the <h1>s to reflect a document outline appropriate for this content. I use some CSS generated content to include the heading level after the text of each heading so you can easily see which is which. If the script does not work, you will see black headings sans parentheses for all.

Conceivably you can let your authors continue an all-<h1> approach, while your templates just tweak the structure based on attributes you embed in the layout.
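To make the template-driven approach concrete, here is a rough sketch. It is not the script from the Pen, and it is no more production-ready. It assumes a hypothetical `data-heading-level` attribute that the layout puts on each sectioning container; the function names are likewise made up for this example.

```javascript
// Sketch only. Assumes the page template sets a hypothetical
// data-heading-level="2" (or 3, 4...) attribute on sectioning containers.

// Return the level declared by the nearest ancestor, or 1 if none does.
function levelFor(heading) {
  for (let node = heading.parentElement; node; node = node.parentElement) {
    const declared = Number.parseInt(node.dataset?.headingLevel ?? "", 10);
    if (declared >= 1 && declared <= 6) return declared;
  }
  return 1;
}

// Replace each <h1> under root with the <h#> its container calls for.
function relevelHeadings(root, doc = root.ownerDocument) {
  // querySelectorAll returns a static list, so swapping nodes is safe here.
  for (const oldHeading of Array.from(root.querySelectorAll("h1"))) {
    const level = levelFor(oldHeading);
    if (level === 1) continue; // already the right element
    const newHeading = doc.createElement(`h${level}`);
    // Carry over attributes (id, class, and so on) and all child nodes.
    for (const { name, value } of Array.from(oldHeading.attributes)) {
      newHeading.setAttribute(name, value);
    }
    newHeading.append(...oldHeading.childNodes);
    oldHeading.replaceWith(newHeading);
  }
}

// In a browser, once the content has been rendered:
//   relevelHeadings(document.body);
```

Replacing the element, instead of layering `role="heading"` and `aria-level` on top of an `<h1>`, keeps the markup and what assistive technology reports in agreement.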

See the Pen Dynamic Heading Level Demo by Adrian Roselli (@aardrian) on CodePen.

You can see (steal, fix) the script at CodePen directly, or you can view it as a full page and make sure your assistive technology (such as a screen reader in this case) can navigate the corrected heading structure as you expect.

Another Alternative

We can work to get browsers to support another new element, the <h> element (proposed in April, which was probably based on Gez Lemon’s 2004 suggestion). Browsers would still need to implement some sort of document outline algorithm, but in this case a new element means no need to rewrite existing <h#> logic.

That will require the developer community to come together as it did for the <picture> element. It can be done, it just requires some effort.

Update: January 18, 2017

While the issue opened last April has since been closed (since it was about a language change in the spec), a new issue was opened specifically for adding an <h> element.

Minutiae

The statement in the title of this post is not new. It has been discussed for at least three years in standards bodies. It has been ignored by browsers for longer (more than seven years), though the browser bugs linked at the end are only a couple years old. Anyone who claims this is a recent change has not confirmed that with the W3C specification in two years.

The following links are just evidence I have needed to provide repeatedly to demonstrate these points. I guess they are more for me to easily reference from future Stack Overflow answers.

To recap, the Document Outline Algorithm was never a recommendation in a final W3C spec. There was a warning explicitly against authors relying on it, though the outline language was retained for browsers to understand how to implement support (eventually).

Regardless of whether you like the idea of the document outline algorithm, it does not reflect reality since no user agent supports it.

Update: January 23, 2018

WHATWG has maintained the fiction of the document outline despite no implementations. An issue against the spec was opened in 2015 to rectify this, where it has languished.

Until today. New conversation has started, with an eye toward accessibility. So there is promise we will either see a more workable proposal for browsers (to ignore?) and/or acknowledgment that the current WHATWG definition needs to be scrapped.

Update: March 1, 2019

I was made aware that MDN has an entire section on using the document outline algorithm, so I edited the page to drop a warning into it in three spots.

Update: October 15, 2019

An effort is underway at WHATWG to try to resurrect <hgroup> (Alternative take on hgroup #5002), an element dropped from the W3C version of HTML in 2013 and never supported in any browser. If you pay attention to the description, however, it is the latest effort to try to get a Document Outline Algorithm into the WHATWG HTML5 specification.

I made a pitch, which got some positive emoji responses, but a dismissive response from the OP:

Alternative proposal:

  1. Declare that <hgroup> on its own does nothing;
  2. Mint <h>.

Then <hgroup> does not modify <h#>, thereby leaving existing structures intact and not modifying author intent for explicitly-chosen heading levels.

Given use of <hgroup> I have seen in the wild, this approach will not make any existing heading structures less accessible.

If the effort here is to justify <hgroup>, and by extension try again at a Document Outline Algorithm, then let’s mint a new required child element, <h>. The <h> element can then get its nesting level from the algorithm proposed here.

This has the advantage of keeping existing heading parsing logic in place and compartmentalizing the logic of this new effort at a Document Outline Algorithm without blowing up 30 years of existing content and rules. It may also make uptake in user agents a bit easier to swallow.

We can lean on a previous effort to mint <h> to kick this off.

For the sub-heading concept, we can argue that any non-<h> non-phrasing-content-element child of <hgroup> is a de facto sub-head, whether it is a <div> or a <p> (probably more thought required there).

I suspect there will never be support for <h>.

Update: January 7, 2020

In a new post at Smashing Magazine, Why You Should Choose HTML5 <article> Over <section>, Bruce Lawson reaffirms that there is no document outline algorithm and that no, you should not pepper your page with <h1>s.

Update: January 25, 2020

Ongoing efforts at WHATWG to create a functional outline algorithm that browsers want to, and can, implement continue despite two years with no progress. Well, there was progress insofar as Mozilla tried, and failed, to implement the latest effort. Remember, W3C never had a document outline algorithm in a final spec, though WHATWG did (that email implies it was in a spec that browsers implemented), even though it never reflected reality.

It’s been 7 years of no browser support. This latest effort trying to resurrect <hgroup> as the new keeper of the Document Outline Algorithm may not end that drought.

Update: February 10, 2020

Steve Faulkner has wrapped up the history and current situation (as detailed in my January 25 update above) in his aptly-titled post A decade of heading backwards.

Update: April 6, 2022

Interestingly, the first version of HTML discussed, and dropped, heading levels that adapted to sectioning:

Should we support headers for which the level is implicitly defined by nestable section elements?*2 We could also support autonumbering of headers. Unfortunately, on further investigation these ideas proved trickier than thought at first, and so have been dropped from this draft.

2. For example with <H> for headers and <SECTION> for nestable sections.

Update: April 9, 2022

Thanks to Ramón Corominas’ memory, I was able to confirm that IE9 / JAWS 13 announced <h1>s in simple nested <section>s at an appropriate depth. Chrome 99 / JAWS 13 did not; Firefox 91 ESR refused to work with JAWS 13 at all. I did not record more complex constructs, but it started to fall apart pretty quickly as I adjusted the nesting to match things I have seen in real life. For timeline context, JAWS 13 was released in late 2011, while the algorithm was still a draft.

I found it! 🙂 It was in May 2012, and the combination was: Firefox 10/IE 9 + JAWS 13, but it only worked when using <h1> for every heading. <hgroup> had no support at all

If any <h2>-<h6> were used within a section, the level was incorrectly increased, and any headings with a calculated level higher than 6 were no more interpreted as headings

I recorded a video that uses this HTML, pulled from the WHATWG HTML specification examples for headings and sections:

 <h1>Apples</h1>
 <p>Apples are fruit.</p>
 <section>
  <h1>Taste</h1>
  <p>They taste lovely.</p>
  <section>
   <h1>Sweet</h1>
   <p>Red apples are sweeter than green ones.</p>
  </section>
 </section>
 <section>
  <h1>Color</h1>
  <p>Apples come in various colors.</p>
 </section>

Video: JAWS 13 with Internet Explorer 9.

This is a case where IE9 was not exposing the nesting level information (IE9 does not expose heading semantics in the accessibility layer), but JAWS was using heuristics to try to support the draft specification.

Steve Faulkner gave some context:

The JAWS implementation was flawed and they couldn’t get it right, so they pulled it.

The JAWS implementation was sponsored by Rich S/IBM in discussion with me at the time

Despite one of the WHATWG HTML editors asserting this week that “The problem is about the mismatch with accessibility tech”, it looks like some accessibility tech tried to match the draft specification in 2011 and rolled it back.

This is all on the radar again since Léonie Watson is trying to get some help from WHATWG on publishing the January 2021 HTML Review Draft as a W3C Candidate Recommendation after Steve (and I along with others) raised an objection since it contains the fictional Document Outline Algorithm.

Update: April 18, 2022

Back in 2015, one of the WHATWG HTML contributors suggested a preference for removing the Document Outline Algorithm from the WHATWG HTML specification instead of adding a warning, but the editor at the time disagreed, stating a full re-write was necessary. Then 7 years of no movement from WHATWG.

Last week the now-current WHATWG HTML editor, after the failure of anyone to get the outline algorithm implemented in the last 7 years, pivoted back to the 2015 plan, though once again implying he would not do it.

So Steve Faulkner did. Steve filed WHATWG HTML PR #7829 removes outline algorithm. On Easter Sunday. If all goes well, maybe it won’t be another 7 years for this to be merged.

Steve also points out that User Agent default CSS style sheets do not visually honor the Document Outline Algorithm (something folks have incorrectly asserted for years):

See the Pen incomplete implementation of outline styles by steve faulkner (@stevef) on CodePen.

Update: July 1, 2022

The Document Outline Algorithm is now gone from the WHATWG HTML specification.

It took 6¾ years from when Steve Faulkner first opened the issue, with the intervening time seeing piles of evidence ignored, the backing of dozens of experts, spec editor gatekeeping, a pull request, and help shepherding it through the WHATWG process, but Steve pulled it off.

If you see any tools, editors, articles, “experts”, etc., pitching the Document Outline Algorithm, remind them they are wrong (and have been).

Update: July 7, 2022

Bruce has provided some context as well, which is far shorter than my stove-piped post, in Why the HTML Outlining Algorithm was removed from the spec – the truth will shock you!. This part hits home:

One of the reasons I liked having a W3C versioned specification for HTML is that it would reflect the reality of what browsers do on the date at the top of the spec. A living standard often includes things that aren’t yet implemented. And the worse thing about having zombie stuff in a spec is that lots of developers believe (in good faith) that it accurately reflects what’s implemented today.

With the version-less WHATWG spec, the update is only there if people remember to look. So it might be some time before folks believe it. Even after years of evidence.

17 Comments

I had occasion to look back for this already in the course of 24 hours or so and I didn’t find this in your links, so I’m including it in the comments just in case anyone is interested: https://discourse.wicg.io/t/html5-h-custom-element/438/39

In response to Brian Kardell.

Thanks. I totally spaced on that after you showed it to me. Appreciate you linking it.

In the interest of generating a solid outline for screen readers, I was wondering if it was okay practice to blend explicit els with role="heading" + aria-level="x", and then hide it for non-screenreaders; e.g.

    Main Site Navigation

The idea being to have presentational structures that don’t translate to screen readers without excluding their users from understanding. (also hopefully google doesn’t ding us)

Gregory
In response to Gregory.

Short answer: No.

Longer answer: Not all screen reader users are blind.

Yet longer answer: If you can already put the appropriate heading level into an aria-level attribute, then you can use the correct <h#>, so this seems like a lot of extra effort for a potentially confusing (and SEO-risky) approach.

Dude, you’re a bozo. Plain and simple. Even if there’s “no such thing as document outline”, just by using a document outline can help tell you if the site is complete shit. You can actually tell by just visiting the site you want to outline. 99% of the time, it completely matches the shittiness that is displayed. Almost as bad as your writing. Almost.

In response to Alf.

Alf, other than your points about me being a bozo and my poor writing, I am not sure I understand what you are saying. On the first two points I agree, though. So kudos on being right in your opening and close!

Hi Adrian,
I just recently found your blog and the information here is really helpful. Thank you so much.
In short, does it mean that there is always only one H1, which is the title, in a web page?
Chris

Chris Wong
In response to Chris Wong.

Chris, yes, you have distilled my position well. Have a single <h1> per page, and that <h1> should correspond to the value of the <title> (excluding the site name, marketing tagline, etc).

Perhaps this is not the place nor time to ask (i preemptively apologize if so) but is it allowed to use multiple <header> elements on a page?
And should <section> elements be used?

In response to J Redhead.

J Redhead, it is totally fine to use multiple <header>s on a page. Note that only the first instance of <header> under <body> will be considered a banner landmark. <section> elements are fine to use, but my rule of thumb is not to use them unless they would also get an appropriate <h#> heading.

Thanks a lot for the article, Adrian!

I am really confused when MDN says “the outline algorithm should not be used to convey document structure to users.” Does it mean that developers should not use html sectioning elements in their code to structure the html markup for better accessibility. Sorry, I am totally lost!

In response to SS.

I added that to convey that the proposed Document Outline Algorithm as described in this post (where use of sectioning elements resets heading levels) should not be used, and that we should use the same best practices we have used for 20+ years.

Wow this article has been an exciting journey to read. Thank you for your commitment to documenting this bizarre part of the web’s history that most people will not even notice.

Theo

If the document outline algorithm is now removed completely from the spec, is there any purpose to sectioning elements? Is there any context where they would be recommended, or used in preference to to tags? Aside from the fictional and now additionally removed document outline algorithm, do they have any semantic value?

Merce Lutzker
In response to Merce Lutzker.

If the document outline algorithm is now removed completely from the spec, is there any purpose to sectioning elements?

Yes. Sectioning elements have worked for years to help users (screen reader users only so far) jump to regions or landmarks (see below) of a page. The WHATWG HTML specification addresses some of this in 4.3 Sections.

Is there any context where they would be recommended, or used in preference to to tags?

I think you included some HTML elements in there, but did not escape the < and >, so I don’t know what they were. What I can say is that yes, the context for using them is marking up page landmarks — page header, footer, navigation, search, and main regions as well as rare cases of other named regions (landmarks). WebAIM has a very brief introduction to regions. TetraLogical goes into more detail on landmarks (I am using “named regions” interchangeably with “landmarks”). Léonie Watson demonstrates with a screen reader in this 2019 video.

Aside from the fictional and now additionally removed document outline algorithm, do they have any semantic value?

Again, without knowing which elements you mean, HTML regions have a ton of value. Named regions, <section> and <article> with accessible names, also have value but only if used sparingly.

In response to Adrian Roselli.

Thank you! I’m, as you may be able to tell, a very naive dev at the moment, so this understanding helps me a lot. Rereading, my question seems sort of silly, but essentially, the big idea is that, while the sectioning elements added in HTML5 do not contribute to a document outline as originally specified, they are useful as landmarks?

Merce Lutzker
In response to Merce Lutzker.

Yes.
