Finding lost content
The Internet can be a frustrating resource. Webmasters are under constant pressure to update their web sites and keep them current. While this is a worthy imperative, it can lay waste to otherwise useful content simply because that content is arbitrarily deemed “too old”. By some estimates, the average lifespan of a web page is 45 to 75 days. It is very likely that information you find useful today will not be accessible in three months.
Over the last 14 years I have watched fantastic collections of information vanish overnight. If information was valuable enough to be published in the first place, it deserves some consideration at “update time”. Happily, I am not alone in this view, and efforts are being made to preserve an archive of the Internet.
Google itself offers a short-term solution for recently deceased pages. On the Google results page, the bottom line of each citation includes a link called “cache”. This is a snapshot of the page as it stood the last time Google crawled the site. Because Google crawls the Internet so frequently, these cached pages are usually no more than a few days old.
The Internet Archive is the most ambitious attempt to archive online content. It collects publicly available Internet documents using a web crawler. You cannot search the archive by keyword, but if you know the web site you are interested in, you can search on its URL. The results list the archived dates available, and clicking on a date shows you the content as it appeared at that time.
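For readers comfortable with a little scripting, the same URL lookup can be automated. The sketch below assumes the Internet Archive's public Wayback Machine availability endpoint (https://archive.org/wayback/available), which returns, as JSON, the archived snapshot closest to an optional date given as YYYYMMDD. The function names are our own illustrations, not part of any official tool, and the endpoint's behaviour may change over time.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed public endpoint of the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_query(url: str, timestamp: Optional[str] = None) -> str:
    """Build the lookup URL; timestamp (YYYYMMDD...) asks for the nearest snapshot to that date."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(response: dict) -> Optional[str]:
    """Return the archived copy's URL from a parsed response, or None if nothing is archived."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(url: str, timestamp: Optional[str] = None,
                       timeout: float = 10.0) -> Optional[str]:
    """Query the archive over the network and return the closest snapshot URL, if any."""
    with urllib.request.urlopen(build_query(url, timestamp), timeout=timeout) as resp:
        data = json.load(resp)
    return closest_snapshot(data)


if __name__ == "__main__":
    # Hypothetical example: look for a copy of a site as it stood near 1 January 2008.
    print(find_archived_copy("statcan.gc.ca", "20080101") or "No archived copy found")
```

If the archive holds nothing for the address (for instance because the site has asked to be excluded), the response contains an empty set of snapshots and the sketch returns None.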
Be forewarned, however: authors and publishers sometimes ask for their documents to be left out of the archive, either by tagging a file for robot exclusion or by contacting the Internet Archive directly. As a result, this resource does not always yield results.
The Canadian government has taken a more thorough approach to archiving its own content. Its strategy is based on two sites.
The first is the Electronic Collection, which consists of books and periodicals published online in Canada by both government and non-government publishers. All Canadian publishers are required to deposit copies of their online publications with Library and Archives Canada (LAC). Currently, the collection is 68% federal government material, 29% commercial and non-commercial sources, and 3% provincial government material.
You can search this resource by keyword to locate documents. It is useful not only as an archive but also as a source of current online publications.
The second resource is the Government of Canada Web Archive. This site archives federal government web sites as a whole. You can search by keyword, by department name, and by URL. The archive does not, however, capture any content that sits behind pages requiring user input, such as a database search screen, or content that requires payment.
Pages within the archive are clearly tagged with a bright green bar across the top, so you are always aware that you are looking at an older version of a web site. Note that the content in this archive cannot be found through Google.
When you subscribe to our Stats Link Canada Source Lists, we provide free archive services. If you come across a dead link, simply report it to us along with the Stats Link ID of the reference. We will first look for a new live link for you. If none exists, we will recover an archived copy of the poll, survey or report referenced and e-mail it to you.