Monday 9 March 2009

Repository preservation revisited

Are institutional repositories set up and resourced to preserve their contents over the long term? Potentially contradictory evidence has emerged from my various questions related to this topic.

You may remember that on the Digital Curation Blog and the JISC-Repositories JISCmail list on 23 February 2009, I referred to some feedback from two Ideas (here and here) on the JISC Ideascale site last year, and asked 3 further questions relating to repository managers’ views of the intentions of their repositories. Given a low rate of response to the original posting (which asked for votes on the original Ideascale site), I followed this up on the JISC-Repositories list (but through oversight, not on the blog), offering the same 3 questions in a Doodle poll. The results of the several different votes appear contradictory, although I hope we can glean something useful from them.

I should emphasise that this is definitely not methodologically sound research; in fact, there are methodological holes here large enough to drive a Mack truck through! Nevertheless, the results may still be informative. To recap, here are the various questions I asked, with a brief description of their audience, plus the outcomes:
a) Audience: JISC-selected “expert” group of developers, repository managers and assorted luminaries. The second Idea was put to the same audience a little later.
  • Idea: “The repository should be a full OAIS [CCSDS 2002] preservation system.” Result: 3 votes in favour, 16 votes against, net -13 votes.
  • Idea: “Repository should aspire to make contents accessible and usable over the medium term.” Result: 13 votes in favour, 1 vote against, net +12 votes.
b) Audience: JISC-Repositories list and Digital Curation Blog readership. Three Ideas on Ideascale, with the results shown (note, respondents did not need to identify themselves):
  • My repository does not aim for accessibility and/or usability of its contents beyond the short term (say 3 years). Result: 2 votes in favour, none against.
  • My repository aims for accessibility and/or usability of its contents for the medium term (say 4 to 10 years). Result: 5 votes in favour, none against.
  • My repository aims for accessibility and/or usability of its contents for the long term (say greater than 10 years). Result: 8 votes in favour, 1 vote against, net +7 votes.
A further comment was left on the Digital Curation Blog, to the effect that since most repository managers were mainly seeing deposit of PDFs, they felt (perhaps naively) sufficiently confident to assume these would be useable for 10 years.

c) Audience: JISC-Repositories list. Three exclusive options on a Doodle poll, exact wording as in (b), no option to vote against any option, with the results shown below (note, Doodle asks respondents to provide a name and most did, with affiliation, although there is no validation of the name supplied):
  • My repository does not aim for accessibility and/or usability of its contents beyond the short term (say 3 years). Result: 1 vote in favour.
  • My repository aims for accessibility and/or usability of its contents for the medium term (say 4 to 10 years). Result: 0 votes in favour.
  • My repository aims for accessibility and/or usability of its contents for the long term (say greater than 10 years). Result: 22 votes in favour.
I guess the first thing is to notice the differences between the 3 sets of results. The first would imply that long term is definitely off the agenda, and medium term is reasonable. The second is a roughly 50-50 split between long term (8 votes in favour) and the short/medium term combination (7 votes in favour). The third is overwhelmingly in favour of long term (as defined).
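
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the tallies above. The counts are those reported in (a) to (c); the dictionary layout and the function names `net_votes` and `long_term_share` are illustrative assumptions, not anything provided by Ideascale or Doodle.

```python
# Vote tallies as reported above: (votes in favour, votes against) per option.
# The Doodle poll (c) offered no way to vote against, so "against" is 0 there.
polls = {
    "a": {"full OAIS": (3, 16), "medium term": (13, 1)},
    "b": {"short term": (2, 0), "medium term": (5, 0), "long term": (8, 1)},
    "c": {"short term": (1, 0), "medium term": (0, 0), "long term": (22, 0)},
}


def net_votes(option):
    """Net score for one option: votes in favour minus votes against."""
    in_favour, against = option
    return in_favour - against


def long_term_share(options):
    """Fraction of all in-favour votes that went to the 'long term' option."""
    total_in_favour = sum(in_favour for in_favour, _ in options.values())
    return options["long term"][0] / total_in_favour


if __name__ == "__main__":
    for name, options in polls.items():
        for label, counts in options.items():
            print(f"({name}) {label}: net {net_votes(counts):+d}")
        # Poll (a) had no option worded as "long term", so skip the share there.
        if "long term" in options:
            print(f"({name}) long-term share: {long_term_share(options):.0%}")
```

On these figures the long-term option takes 8 of 15 in-favour votes in (b) and 22 of 23 in (c), which is the contrast described above. With samples this small the numbers are descriptive only.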

By now you can also see at least some of the methodological problems, including differing audiences, differing anonymity, and differing wording (firstly in relation to the use of the term “OAIS”, and secondly in relation to the timescales attached to short, medium and long term). So, you can draw your own conclusions, including that none can be drawn from the available data!

Note, I would not draw any conclusions from the absolute vote counts on their own, but perhaps we can from the relative values within each group. However, ever hasty if not foolhardy, here are my own tentative interpretations:
  • First, even “experts” are alarmed at the potential implications of the term “OAIS”.
  • Second, repository managers don’t believe that keeping resources accessible and/or usable for 10 years (in the context of the types of material they currently manage in repositories) will give them major problems.
  • Third, repository managers don’t identify “accessibility and/or usability of its contents for the long term” as implying the mechanisms of an OAIS (this is perhaps rather a stretch given my second conclusion).
So, where to next? I’m thinking of asking some further questions, again of the JISC-Repositories list and the audience of the Digital Curation Blog. However, this time I’m asking for feedback on the questions before setting up the Doodle poll. My draft texts are:
  • My repository is resourced and is intended to keep its contents accessible and usable for the long term, through potential technology and community changes, implying at least some of the requirements of an OAIS.
  • My repository is resourced and is intended to keep its contents accessible and usable unless there are significant changes in technology or community, i.e. it does not aim to be an OAIS.
  • Some other choice, please explain in free text…
Are those reasonable questions? Or perhaps, please help me improve them!

This post is made both to the Digital Curation Blog and to the JISC-repositories list...

OAIS: CCSDS. (2002). Reference Model for an Open Archival Information System (OAIS). Retrieved from http://public.ccsds.org/publications/archive/650x0b1.pdf.

4 comments:

  1. I teach preservation management in a couple of library schools, and if I'm interpreting this correctly, full OAIS compliance is a non-starter -- which leaves TRAC off the table altogether. Is that too strong a statement?

  2. OAIS is interesting and practitioners' understanding of it is even more interesting. OAIS, first and foremost, is a model; a platonic ideal of what a repository should look like. (Others may call me a heretic but I question its platonic standing.)

    When I hear or read conversations concerning OAIS "compliance" there seems to be little understanding of what is meant. Some seem to think that every part must be embodied by software. It does not. Much of the model has more to do with policy statements than code. (Did I just scare more developers?)

    Is "compliance" defined by meeting the defined mandatory responsibilities? Section 3.1 "Mandatory Responsibilities" lists only 6 responsibilities and each of them is so general that few should fear their compliance. Even when each of these points is discussed in greater detail it takes the form of "should" or as one might call it: "things to think about."

    The rest of the report then attempts to explain the model (not requirements) with functional entities. "This aids OAIS designers of future systems and provides a more precise set of terms and concepts for discussion of current systems." (page 4-1)

    Even these functional entities should not scare very many people once they understand them. We need to understand that part of this model's implementation is policy and only a portion of this is appropriate for programming or computer system design.

    In the long term the model doesn't really matter. What matters is our intelligent implementation of it. The OAIS model was designed to give developers a common vocabulary (which it has for the high level entities). We often seem to forget, however, the functional entity vocabulary it gave us as well. But, if OAIS "compliance" is scaring us away from using it, what good is it really?

  3. @VickyMcC, as @Seth says, OAIS compliance is pretty easy to achieve. The only publicly available formal compliance statement is for LOCKSS (2004 Formal statement of Conformance to ISO 14721:2003 from http://www.lockss.org/lockss/OAIS), and LOCKSS preserves documents in HTML "as is". The proposed Repository Audit and certification standard being worked on in CCSDS would be MUCH tougher!

    IMHO, OAIS is an excellent high level conceptual model and an overly-complex architecture for preservation systems.

  4. I'm aware of what OAIS is intended to do, but I appreciate the concise explanation.

    Saying that, I think library/archives faculty perhaps have a different view of OAIS (and derivative projects like TRAC and PREMIS) from what prevails in the IR community. I may be guilty of generalizing there, but I'd like to have a better understanding (beyond anecdotal evidence) of where preservation concepts and IRs currently intersect -- or don't.

    Is this subject still blog fodder, or has it risen to the level of publication or proceedings? Anything you can point me to?

    BTW, the DAITSS project in Florida declares itself a functional implementation of OAIS. http://www.fcla.edu/digitalArchive/pdfs/DAITSS.pdf


Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence. You must be registered to comment, but I'm turning off moderation as an experiment.