
The Internet Archive launches Wayback Machine Chrome extension to combat link rot

Internet Archive - Wayback Machine

Image Credit: Paul Sawers / VentureBeat

The Internet Archive is making it easier for web users to access archived versions of dead web pages with a new official add-on for Google’s Chrome browser.

Above: Wayback Machine: Chrome Extension

Once you install the Wayback Machine extension, whenever you land on a once-valid web page that now delivers an error code — such as “page not found” or a “404” — the extension will query the Wayback Machine to check whether there is anything in the archives. If there is, you’ll be asked to click to view the most recently archived version.
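To make that lookup concrete, here is a minimal Python sketch of the same idea. It assumes the Internet Archive's public availability endpoint (https://archive.org/wayback/available), which returns the closest archived snapshot for a URL as JSON. The article does not say which endpoint the extension itself calls, so treat this as an illustration rather than the extension's actual code. The function names and the list of "dead page" status codes are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Public Wayback Machine availability endpoint (assumed here; the article
# does not specify which endpoint the extension uses).
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"

# Status codes treated as a "dead page" in this sketch (an assumption).
DEAD_STATUS_CODES = {404, 410}


def build_query_url(page_url: str) -> str:
    """Build the availability query for a given page URL."""
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode({"url": page_url})


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the archived snapshot URL from a response payload, if any."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(page_url: str, status_code: int,
                       timeout: float = 10.0) -> Optional[str]:
    """Query the archive only when the live page returned an error code."""
    if status_code not in DEAD_STATUS_CODES:
        return None  # Page is live; no lookup needed.
    with urllib.request.urlopen(build_query_url(page_url), timeout=timeout) as response:
        payload = json.load(response)
    return closest_snapshot(payload)
```

Calling find_archived_copy("example.com/old-page", 404) would perform a real network request and return either a snapshot URL or None, depending on what the archive holds for that address.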


For the uninitiated, the Internet Archive has been documenting the web’s evolution since 1996, crawling millions of websites and recording their changes and edits at intervals. So, for example, anyone wishing to return to the Twitter homepage of 2006, or the FBI homepage of 1996, can do so.

The broader Internet Archive is an incredibly useful tool for curious geeks interested in the history of the web. But it also serves a more important purpose as it prevents content publishers, from newspapers to government agencies, from whitewashing their online history.

Link rot

By way of example, an estimated 83 percent of PDF documents on .gov domains disappeared during President Obama’s first term in the White House. Such vanishing acts aren’t always sinister, however, as there may be any number of legitimate reasons for content disappearing — departments may merge, or projects may become obsolete.

But the Internet Archive and its partner organizations have made it their mission to document government website data. As George W. Bush’s time in office was coming to an end in 2008, the End of Term Web Archive was launched with a single purpose: to serve as a permanent record of government-related communications during presidential transitions. Last month, the Internet Archive revealed plans to preserve 100 terabytes of government website data.

Furthermore, a Harvard study from 2013 found that 49 percent of hyperlinks relating to Supreme Court decisions no longer work. And this is the problem that the Wayback Machine is looking to solve. So-called link rot is a growing concern, and online archives are vital to preserving a vast swathe of important data.

As for the new Wayback Machine extension, the Internet Archive says it is continuing to try to protect user privacy by not recording users’ IP addresses. It is also in discussions with Google about “adding a proxy server as an additional layer of protection,” the Wayback Machine’s director, Mark Graham, noted in a blog post.

Additionally, in response to concerns over what a Donald Trump presidency may mean for privacy and censorship on the internet, the Internet Archive recently announced it was building a replica database in Canada.
