/etc

Amid some recent hand-wringing[1] about the state of digital archiving in general and Usenet archiving in particular, I decided to investigate the Usenet archives currently available to us. What are they? Where are they? What format are they in? How can we access them? What do they have? What do they omit?[2]

My initial searches turned up the following archives:

According to Katharine Mieszkowski’s 2002 article, “The geeks who saved Usenet,” the oldest post in the Deja News/Google Groups archive was made on May 11, 1981 by Mark Horton, dropping us in medias res into a thread with the subject “newsgroup fa, net, etc.” on net.general. This gives us a good test message to search for in our archives.

In fact, searching our net.general.mbox file from The Internet Archive Usenet Historical Collection for the net.general messages that predate our test message (“DEC on Usenet” and “New Disk Drive”) reveals that we can recover one of them, “DEC on Usenet”. Note the difference between its Date header and its X-Google-ArrivalTime header, which is probably why this wasn’t counted as the “oldest” message in the archive.
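A search like this can be sketched with Python's standard mailbox module. This is only a rough sketch, not the exact method used above: the function name `messages_before` is made up for illustration, the cutoff of midnight UTC on May 11, 1981 is an assumption (the time of day of the test message is not given here), and it assumes the Date headers in the mbox file are in a form that `email.utils.parsedate_to_datetime` can parse. Messages with missing or unparseable dates are skipped, so they would need a separate look.

```python
import mailbox
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def messages_before(mbox_path, cutoff):
    """Return (date, subject, arrival) tuples for messages dated before cutoff.

    `arrival` is the raw X-Google-ArrivalTime header, or None if absent,
    so it can be compared by eye against the Date header.
    """
    found = []
    box = mailbox.mbox(mbox_path, create=False)  # do not create a missing file
    try:
        for msg in box:
            raw_date = msg.get("Date")
            if raw_date is None:
                continue
            try:
                sent = parsedate_to_datetime(raw_date)
            except (TypeError, ValueError):
                continue  # unparseable date: skipped in this sketch
            if sent.tzinfo is None:
                # Assumption: treat dates without a usable zone as UTC.
                sent = sent.replace(tzinfo=timezone.utc)
            if sent < cutoff:
                found.append(
                    (sent, msg.get("Subject", ""), msg.get("X-Google-ArrivalTime"))
                )
    finally:
        box.close()
    return sorted(found, key=lambda item: item[0])


if __name__ == "__main__":
    # Assumed cutoff: start of the day of the test message, in UTC.
    cutoff = datetime(1981, 5, 11, tzinfo=timezone.utc)
    for sent, subject, arrival in messages_before("net.general.mbox", cutoff):
        print(sent.isoformat(), subject, arrival)
```

Printing the X-Google-ArrivalTime value next to the parsed Date makes it easy to spot messages whose recorded arrival is much later than the date they claim.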

We can find the same message in the UTZOO archive at news001f1/a2/decvax.116. Interestingly, the UTZOO archive also contains the “New Disk Drive” message, at news001f1/a2/duke.757, which I cannot find in net.general.mbox.

This is just an initial investigation with a single test, and by no means comprehensive. A good next step for anyone interested in early Usenet posts would be to compare coverage between the UTZOO collection and the Usenet Historical Collection, to see whether merging them would fill any gaps. Another open question is how comprehensive the Usenet Historical Collection is for the period from 1991 onward, which the UTZOO collection does not cover.
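The coverage check suggested above boils down to a set comparison once each collection has been reduced to a list of message identifiers. Here is a minimal sketch of that step, assuming the identifiers from both collections can be put into the same form (which may take real work for the earliest articles). The names `normalize_id` and `coverage_gaps` are hypothetical, and extracting the identifiers from each archive is left out.

```python
def normalize_id(message_id):
    """Strip surrounding whitespace and angle brackets from an identifier.

    Assumption: this is enough to make identifiers from the two
    collections comparable. Case is left alone on purpose.
    """
    return message_id.strip().strip("<>")


def coverage_gaps(ids_a, ids_b):
    """Compare two collections of message identifiers.

    Returns a dict with sorted lists of identifiers found only in the
    first collection, only in the second, and in both.
    """
    a = {normalize_id(i) for i in ids_a}
    b = {normalize_id(i) for i in ids_b}
    return {
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        "shared": sorted(a & b),
    }
```

Anything in `only_a` or `only_b` is a candidate for filling a gap in the other collection, though a mismatch may also just mean the two archives recorded the same message under different identifiers.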

Footnotes

  1. Matthew Braga. Google, a Search Company, Has Made Its Internet Archive Impossible to Search. Vice Motherboard. Published 2015-02-13. Accessed 2015-02-23.

    Andy Baio. Never trust a corporation to do a library’s job. Medium. Published 2015-01-28. Accessed 2015-02-23.

    Ian Sample. Google boss warns of ‘forgotten century’ with email and photos at risk. The Guardian. Published 2015-02-13. Accessed 2015-02-23.

    Gareth Millward. I tried to use the Internet to do historical research. It was nearly impossible. The Washington Post. Published 2015-02-17. Accessed 2015-02-23.

  2. Preserving all of Usenet, including all binary postings, would be a pretty daunting task. I’m not really aware of anyone who’s actually trying to do that, though even archiving just metadata about binary postings might provide an interesting historical record.

  3. See this blog post on bang path addressing for a note on the email addresses in these early Usenet archives.