Thursday 20 November 2008

Comments on an obsolescence scale, please

Many people in this business know the famous Jeff Rothenberg quote: “It is only slightly facetious to say that digital information lasts forever--or five years, whichever comes first” (Rothenberg, 1995). Having just re-read the article, it’s clear that at that point he was particularly concerned about the fragility of storage media, something that doesn’t concern me for this post. What does concern me is the subject of much of the rest of his article: obsolescence of the data. There’s another very nice quote from the article, accepted gospel now but put so nicely: “A file is not a document in its own right--it merely describes a document that comes into existence when the file is interpreted by the program that produced it. Without this program (or equivalent software), the document is a cryptic hostage of its own encoding” (ibid).

Now the first Rothenberg quote is often assumed to apply in the second case, but there’s clear evidence emerging that data are readable well beyond 5 years. Most files on my hard disk that have been brought forward from previous computers (with very different operating systems) remain readable even after 15 years. Most, but not all; in my case, the exceptions are my early PowerPoint files, created with PowerPoint 4 on a Mac in the 1990s, unreadable with today’s Mac PowerPoint version. However, these files are not completely inaccessible to me, as a colleague with a Windows machine has a different version of PowerPoint that can read these files and save them in a format that I can handle. It’s a comparatively simple, if tedious, migration.

I have spent some time asking in various fora for evidence of genuinely obsolete (that is, completely inaccessible) data (Rusbridge, 2006). There are some apocryphal stories and anecdotes, but little hard evidence so far. But perhaps complete inaccessibility isn’t the point? Perhaps the issue is more about risk to content, on the one hand, or extent of information loss on the other? Maybe this isn’t a binary issue (inaccessible or not), but a graded obsolescence scale? Is there such a thing? I couldn’t find one (there’s a hint of one in an NTIS report abstract (Slebodnick, Anderson, Fagan, & Lisez, 1998), but the text isn’t online, and it appears to be about obsolescence of people)!

So here are a couple of attempts, for comment.

Here’s what might be called a Category approach. Users would be asked to select only one of the following categories (it worries me that they are neither complete nor non-overlapping!):
A Completely usable, not deemed at risk
  • Open definition with no significant proprietary elements
  • Multiple implementations, at least one Open Source
  • High quality treatment in secondary implementations
  • Widespread use
B Currently usable, at risk
  • Closed definition, or
  • Open definition, but with significant proprietary elements
  • More than one implementation
  • Imperfect treatment in secondary implementations
C Currently usable, at significant risk
  • Proprietary/closed definition
  • Single proprietary implementation
  • Not widespread use
D Currently inaccessible, migration path known
  • Proprietary tools don't run on current environment
  • Potentially imperfect migration
E Completely inaccessible
  • No known method of extraction or interpretation
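
To see how the categories above behave in practice, here is a minimal sketch in Python. It is only one possible reading of the bullets: the field names, the FormatProfile record and the categorise function are all hypothetical, the "treatment in secondary implementations" criterion is left out for brevity, and the order in which the tests are applied is my assumption rather than part of the scale. Returning None for profiles that match no category makes the "not complete" worry visible.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    A = "Completely usable, not deemed at risk"
    B = "Currently usable, at risk"
    C = "Currently usable, at significant risk"
    D = "Currently inaccessible, migration path known"
    E = "Completely inaccessible"


@dataclass
class FormatProfile:
    # Hypothetical fields paraphrasing the bullets of the Category approach.
    usable_now: bool
    migration_path_known: bool = False
    open_definition: bool = False
    significant_proprietary_elements: bool = False
    implementations: int = 0
    open_source_implementation: bool = False
    widespread_use: bool = False


def categorise(p: FormatProfile) -> Optional[Category]:
    """Return the matching category, or None if the profile fits none."""
    if not p.usable_now:
        # D and E are separated only by whether a migration path is known.
        return Category.D if p.migration_path_known else Category.E

    fully_open = p.open_definition and not p.significant_proprietary_elements
    if (fully_open and p.implementations > 1
            and p.open_source_implementation and p.widespread_use):
        return Category.A
    if not fully_open and p.implementations > 1:
        return Category.B
    if (not p.open_definition and p.implementations == 1
            and not p.widespread_use):
        return Category.C
    return None  # a gap: the categories do not cover this combination


# The old PowerPoint 4 files: unreadable here, but a migration path exists.
old_slides = FormatProfile(usable_now=False, migration_path_known=True)
print(categorise(old_slides))

# An open, multiply implemented format that is not widely used fits nowhere.
niche_open = FormatProfile(usable_now=True, open_definition=True,
                           implementations=3, open_source_implementation=True,
                           widespread_use=False)
print(categorise(niche_open))
```

Under these assumptions the first profile lands in category D, while the second falls through every test, which is one concrete case of the categories being incomplete.
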
Here’s a more subjective, but perhaps more complete scale. Users are asked to select a number from 1 to 5 representing where they assess their data lies, on a scale whose ends are defined as
1 Completely usable, not deemed at risk, and
5 Completely inaccessible
And here’s a more Likert-like scale. Users are asked to nominate their agreement with a statement such as “My data are completely inaccessible”, using the Likert scale:
1 Strongly disagree
2 Disagree
3 Neither agree nor disagree
4 Agree
5 Strongly agree
Hmmm. How could you neither agree nor disagree with that statement? “Excuse me, I haven’t a clue”?

Comments on this blog, please; but if you’re commenting elsewhere, can you use the tag “Obsolescence scale”, and maybe drop me a line? Thanks!

Rothenberg, J. (1995). Ensuring the longevity of digital documents. Scientific American, 272(1), 42. (Accessed through Ebscohost)
Rusbridge, C. (2006). Excuse Me... Some Digital Preservation Fallacies? Ariadne, 46. http://www.ariadne.ac.uk/issue46/rusbridge/
Slebodnick, E. B., Anderson, C. D., Fagan, P., & Lisez, L. (1998). Development of Alternative Continuing Educational Systems for Preventing the Technological Obsolescence of Air Force Scientists and Engineers. Volume I. Basic Study.

2 comments:

  1. Hi Chris,

    This morning I posted an article on "Obsolescence scale" on our weblog http://eisenduurzaamdigitaaldepot.blogspot.com, but since it is in Dutch, I'll give you a short translation.
    In 1997 I bought STOA, a collection of poems that was only published on a 3.5" disk in WordPerfect 5.1 and Word 6.0. With the Rothenberg-dogma in mind, I expected to have a nice object to illustrate his point. But alas!
    The disk was perfectly OK (in my organisation all desktops are still equipped to read those disks) and both files can be opened effortlessly in Word 2002. So again, no good example of total loss.
    But I'll keep an eye open.

  2. It's not clear from your post whether you're referring to formats 'on hand' for existing data or to making format choices for creating data from here on. In either case, it may be very difficult for content creators to even know the answers to some of the proposed questions.

    In discussing format choices, I often advocate the use of standards-based open formats by suggesting some key characteristics. Formats should:


    * be based on freely available standards.
    * be developed by a community of interest rather than by a single entity.
    * have multiple implementations in software.
    * have no intellectual property restrictions (e.g. patents) attached.

    Also, I suggest that obsolescence is something that can never be judged until some years (decades?) after it has taken place.

    Thought-provoking stuff anyway.

    --
    Michael Carden


Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence. You must be registered to comment, but I'm turning off moderation as an experiment.