On-Demand WayBack URLs

While this may be old news to some of you (just over a week now), I only discovered this weekend that the Internet Archive Wayback Machine can archive a page for you, with its own URL, on demand.

Thanks to a tweet from Jennifer Sutton, I came across the post Instant WayBack URL, where Karen Coyle outlines the announcement from the Internet Archive. She highlights two key benefits:

putting permalinks in your documents rather than URLs that can break

linking to a particular version of a document when citing

These alone are great reasons. If you must provide citations using an academic standard, such as the Chicago Manual of Style citation format (the standard at Web Standards Sherpa), you should include the date a page was last changed or accessed.

With the Wayback URL, you can link to a version of the page as you saw it on the specific date. You may need to update the citation format to include the source URL and capture date, but now you can at least make the version you saw available.

Screen shot of the on-demand Wayback URL field.

Many documents go through multiple versions, whether it’s a blog post or technical document. Some organizations, such as the W3C, have a method for posting versions of each specification at unique URLs for easy reference. In the absence of that process, creating a snapshot of the page makes it a lot easier to cite what you saw on the day you saw it, as opposed to guessing some time in the future what a page might have said.

As someone who has written for other web sites, I would love to have had this feature in the past. I would have captured all of my evolt.org articles before we flipped over to the new old design (that was more than a decade ago now?). When .net Magazine moved its content into the Creative Bloq site, I would have used it then to archive what I had written, including the comments (which are now lost).

As I snipe at sites from PrintShame, it may have been more fair of me to link to versions of the sites on the day I captured them, since some of them (ok, only two that I know of) made an effort to update their sites after appearing there. Just as I have referenced good and bad examples in my talks, it is more useful to attendees to see the versions of the pages I referenced, making the resources from my talk useful for more than a few days.

As I make changes to my site, I can treat this as a poor man’s archive. Before I roll out a major (or even minor) update, I can capture a page so I can compare how it looked or what content it contained. While it’s no substitute for a proper back-up, it can prove handy. Of course, if you use the Wayback URL in this way then you should probably consider donating to the Internet Archive.

If you find yourself in a contentious debate on someone’s blog (or a news site, etc), the on-demand ability to archive might be a handy way to track if someone is editing the original content as the fight evolves, making it far easier to call BS that much sooner. It’s also a bit harder to dismiss this kind of capture as having been ‘Shopped.

Trying It

To demonstrate how absurdly simple it is to make a page in the Wayback archive, I posted this blog entry as a stub with only the following content:

This is a stub for the post that’s coming. I am trying to show off the on-demand hotness.

I then generated a custom Wayback URL and wrote this post. You can see the original stub of this post at: https://web.archive.org/web/20131104184829/http://blog.adrianroselli.com/2013/11/on-demand-wayback-urls.html
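The snapshot URL for the stub appears to follow a simple pattern: the web.archive.org/web/ prefix, a 14-digit capture timestamp, then the original address. As a minimal sketch in Python, assuming that pattern holds and that the timestamp is in UTC (neither is stated in this post), here is one way to pull the capture date and source URL back out, which may help when a citation needs both. The function name split_wayback_url is hypothetical.

```python
from datetime import datetime, timezone
from typing import Tuple

WAYBACK_PREFIX = "https://web.archive.org/web/"


def split_wayback_url(snapshot_url: str) -> Tuple[datetime, str]:
    """Split a snapshot URL into (capture time, original URL).

    Assumes a plain 14-digit YYYYMMDDhhmmss timestamp directly after
    the prefix, and assumes that timestamp is in UTC.
    """
    if not snapshot_url.startswith(WAYBACK_PREFIX):
        raise ValueError("not a Wayback snapshot URL")
    rest = snapshot_url[len(WAYBACK_PREFIX):]
    # Everything before the first slash is the timestamp; the remainder
    # is the original address, slashes and all.
    stamp, _, original = rest.partition("/")
    captured = datetime.strptime(stamp, "%Y%m%d%H%M%S")
    return captured.replace(tzinfo=timezone.utc), original


captured, original = split_wayback_url(
    "https://web.archive.org/web/20131104184829/"
    "http://blog.adrianroselli.com/2013/11/on-demand-wayback-urls.html"
)
print(captured.date().isoformat(), original)
```

A timestamp that is shorter than 14 digits or carries extra characters would raise a ValueError here rather than being guessed at.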

Update: November 5, 2013

A comment over at my Google+ share points out that this has value for Wikipedia, where web references are a necessary part of the documentation process, and where those links sometimes die over time.

Update: January 24, 2017

It is worth noting that you can also capture Tweets with the WayBack machine. If the form refuses because it claims the page already exists on the web, try adding a query string with the language code (such as ?lang=en).

Tip: Capture a tweet you think might get deleted by pasting its URL into the Save Page Now box at @internetarchive: archive.org/web

Adrian Roselli 🗯 (@aardrian) January 24, 2017

Follow-on tip: If Save Page Now form does not accept it (claims it exists on web), add ?lang=en to URL. Example: web.archive.org/web/20170124224627/https:/twitter.com/GoldenGateNPS/status/823624278230695936?lang=en

Adrian Roselli (@aardrian) January 24, 2017
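If you are preparing tweet URLs in bulk, the ?lang=en workaround can be applied with Python's standard urllib.parse module. This is only a sketch: the helper name with_query_param is hypothetical, and whether the Save Page Now form still needs the extra parameter is not something this post can confirm.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_param(url: str, key: str, value: str) -> str:
    """Return url with key=value in its query string.

    Any existing value for the same key is replaced; other parameters
    are kept in their original order.
    """
    parts = urlsplit(url)
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != key
    ]
    query.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


tweet = "https://twitter.com/GoldenGateNPS/status/823624278230695936"
print(with_query_param(tweet, "lang", "en"))
```

Parsing the URL rather than appending "?lang=en" as a string avoids producing a second question mark when the address already has a query string.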

Update: January 29, 2017

The Internet Archive has posted some tips on how to use the WayBack machine to save pages: If You See Something, Save Something – 6 Ways to Save Pages In the Wayback Machine:

the online form;
the Chrome extension;
a Bookmarklet from Wikipedia;
join the volunteer archive team;
sign up for your own Archive-It crawler;
nominate a site to appear in a government administration end-of-term archive.

Update: February 24, 2019

Hey, a quick way to save a URL plus a bookmarklet:

Typing web.archive.org/save/ in front of any URL saves that content in the Wayback Machine forever. Nasty tweet? Type web.archive.org/save/ in front of the URL, and archive it forever. Hat tip: @t.

zeldman (@zeldman) February 23, 2019

Great tip, thanks! I turned it into a bookmarklet to make it even easier: plasticmind.com/0s-and-1s/bookmarklet-archive…

Jesse Gardner (@plasticmind) February 24, 2019
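The tip in the first tweet boils down to string concatenation. A minimal Python sketch of the same idea follows; the function name save_page_url is hypothetical, and the code only builds the address described in the tweet. It does not contact the Internet Archive, so opening the resulting URL in a browser is still up to you.

```python
SAVE_PREFIX = "https://web.archive.org/save/"


def save_page_url(page_url: str) -> str:
    """Build the address that asks the Wayback Machine to save page_url.

    Follows the tweet's tip of putting web.archive.org/save/ in front
    of the page's own URL. No network request is made here.
    """
    page_url = page_url.strip()
    if not page_url:
        raise ValueError("page_url must not be empty")
    return SAVE_PREFIX + page_url


print(save_page_url(
    "http://blog.adrianroselli.com/2013/11/on-demand-wayback-urls.html"
))
```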

2 Comments


When I use all permalinks in my articles, how will this affect my linkbuilding efforts?
It is important to link to high quality, trusted authority websites in order to get higher rankings for your site in Google. What will be the result when I just link to an instant Wayback URL that I obtained from the original site?

Sensuelas; 22 February 2018 at 6:22 pm. Permalink

In response to Sensuelas.

I would only link a Wayback URL when the original URL has gone away, as I do for pages I point to in my posts that go 404 after a few years. That is just a better experience for users as they get the latest site until it is gone, then they get a snapshot.

I cannot say how it affects SEO. It does not seem to have negatively impacted me. Maybe all the outbound 200s from the Wayback are better than them all being outbound 404s as far as Google / Bing / DuckDuckGo / etc are concerned.

Adrian Roselli; 22 February 2018 at 6:28 pm. Permalink