5 Tools to help you search the archived internet

The archived internet deserves more recognition. Online security has been a hot-button topic in the tech community recently, with data scandals and privacy policy updates constantly driving the conversation. But keeping the internet a stable and reliable network isn’t all about data security; it’s also about data preservation.

Anything that’s low tech is dismissed as “from the stone age,” but stone is by far the most stable way to record information. Not only will the hard drives and networked routers of today never last a thousand years, but plenty of information online won’t even last the decade. As local newspapers or long-in-the-tooth startups go under, they all leave dead links scattered across the internet, constantly replaced with fresh links that will themselves eventually die.


Wow, sorry, didn’t mean to get too dark there. My point is, memories that you might want to keep are increasingly likely to exist only on the internet — rambling G-Chat conversations with your best friend, say, or your first WordPress blog. If you want to preserve, protect, or search through your online footprint, read on to learn which five online tools can best help you comb through the archived internet.

Archive.is

You can search the Archive.is site for previously archived webpages if you’re interested in tracking a specific Twitter account or tech company. There’s even a draggable bookmarklet that you can add to your bookmarks bar to archive future webpages with a single click.

Given the social fallout that can come from a single bad tweet, this site can be a useful way to grab verifiable, Photoshop-proof evidence of a tweet or post that will likely be deleted soon. The saved webpage that results won’t have any active elements or scripts (no popups or paywalls, in other words), but should look more or less the same, even down to the same clickable hyperlinks that the original page boasted.

Lumen

Data loss on the internet isn’t always due to the natural process of link rot, in which servers or domains become permanently unavailable. Another major cause is legal demands for content removal. While the content removed due to takedowns can’t itself be archived, the legal complaints themselves can be.

“Our goals are to educate the public, to facilitate research about the different kinds of complaints and requests for removal–both legitimate and questionable–that are being sent to Internet publishers and service providers, and to provide as much transparency as possible about the “ecology” of such notices, in terms of who is sending them and why, and to what effect,” the website explains.

Type any search term into the site and you’ll likely pull up thousands of results. Use the advanced search functions, and you’ll be able to narrow down the DMCA requests by topic, sender, recipient, tags, country, language, action taken, and date.

The search results page includes an easily scanned list of takedown requests, including details such as who submitted them, on whose behalf, and to whom (the recipient is almost always Google). You might want to use this database if you’re interested in why a seemingly innocent post in your search results or a favorite YouTube video has suddenly disappeared due to a content claim. You’ll get all the information you need to follow up on the takedown with the company that submitted the request in the first place.

Lumen has a feature that allows you to construct a DMCA counter notice, if you’re the one who has been hit with a takedown that you want to contest. You can also report your own takedown notification through the contact information available on the site.

PeekYou

Often, the data you’re looking for can’t be located with a normal internet search engine. The “deep web” is one reason: Plenty of archives, from the Social Security Administration’s baby name database to a trove of 19th-century British book reviews, live in online portals that can’t be crawled. Another issue is how spread out the data often is: Local papers, social media accounts, and blog platforms might all hold bits of information, while never revealing the big picture.

People-search engines are designed to comb through these isolated databases, and PeekYou is a great example of a service that combines disparate sources of content to find a wide swathe of information on individuals. You’ll need to know a name and location, a username, or a phone number.

The Wayback Machine

The Wayback Machine is useful as a portal into a different era: The MySpace homepage on June 10, 2004 is just a click away. And if you’ve operated a website during the past two decades, the site might have logged snapshots of data you thought was long lost.
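If you'd rather look up a snapshot from a script than click through the calendar, the Internet Archive offers a small availability endpoint that returns the capture closest to a given date. The sketch below only builds the query URL and reads the reply; the function names are made up for illustration, and the sample response shows the general shape of a reply with invented values rather than a real capture.

```python
import json
from urllib.parse import urlencode

# Internet Archive availability endpoint for the Wayback Machine.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url, timestamp=None):
    """Build the lookup URL for a page, optionally near a YYYYMMDD date."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response_text):
    """Return (timestamp, snapshot_url) from a reply, or None if nothing is archived."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["timestamp"], closest["url"]


# Illustrative reply only: the values are invented to show the structure.
SAMPLE_RESPONSE = """
{
  "url": "myspace.com",
  "archived_snapshots": {
    "closest": {
      "status": "200",
      "available": true,
      "url": "http://web.archive.org/web/20040610000000/http://www.myspace.com/",
      "timestamp": "20040610000000"
    }
  }
}
"""

if __name__ == "__main__":
    print(availability_query("myspace.com", "20040610"))
    print(closest_snapshot(SAMPLE_RESPONSE))
```

To run it against the live service, fetch the URL from `availability_query` (for example with `urllib.request.urlopen`) and pass the response body to `closest_snapshot`.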

You’ll likely find something worth preserving in the rest of the Internet Archive’s vaults, too, like over 900 classic 70s and 80s-era arcade games, high-res scans of the 1950s science fiction magazine Galaxy, or 1983 instructions on how to build a Yugoslavian computer.

The Wayback Machine Downloader

Just finding the website data on the Wayback Machine won’t help preserve it for future generations, however. What happens if the Internet Archive loses its funding? You’ll have to download the data today if you want to do your part in preserving it, and for that you’ll need this program.

Once installed, the downloader will retrieve the latest version of every file the Wayback Machine has for any website that you request with the base URL. You can further filter the data you download with more complex commands: The project’s GitHub page has additional information on how it all works. Finally, you can ensure that if World War III hits, the world will be able to remember your 2004 DeviantArt account.
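To get a feel for what "the latest version of every file" means in practice, here is a rough sketch of the idea using the Internet Archive's public CDX index. This is not the downloader's own code, and it may well work differently under the hood. The helper names, the `example.com` site, and the sample rows are all invented for illustration.

```python
from urllib.parse import urlencode

# Internet Archive CDX index: lists every capture for a URL or URL prefix.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query(base_url):
    """Build a query listing successful captures under a base URL."""
    params = {
        "url": base_url,
        "matchType": "prefix",        # everything under the base URL
        "output": "json",             # first row of the reply is a header
        "fl": "timestamp,original",   # only the fields we need
        "filter": "statuscode:200",   # skip redirects and errors
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def latest_captures(rows):
    """Map each original URL to the timestamp of its most recent capture."""
    if not rows:
        return {}
    header, *captures = rows
    ts_i, url_i = header.index("timestamp"), header.index("original")
    latest = {}
    for row in captures:
        ts, url = row[ts_i], row[url_i]
        # Timestamps are fixed-width digit strings, so string order is date order.
        if ts > latest.get(url, ""):
            latest[url] = ts
    return latest


def raw_snapshot_url(timestamp, original):
    """URL for the capture as originally served (the id_ suffix skips the Wayback toolbar)."""
    return f"https://web.archive.org/web/{timestamp}id_/{original}"


# Invented rows in the shape of a JSON CDX reply.
SAMPLE_ROWS = [
    ["timestamp", "original"],
    ["20040610120000", "http://example.com/"],
    ["20090101000000", "http://example.com/"],
    ["20051231235959", "http://example.com/about.html"],
]

if __name__ == "__main__":
    print(cdx_query("example.com"))
    for original, ts in latest_captures(SAMPLE_ROWS).items():
        print(raw_snapshot_url(ts, original))
```

Fetching each printed URL and writing the response to disk would give you a bare-bones local copy. The dedicated downloader adds the filtering, retries, and folder layout you'd want for a real backup.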
