Monday, 23 February 2009

Repositories and preservation

I have a question about how repository managers view their role in relation to long term preservation.

I’m a member of the Blue Ribbon Task Force on Sustainable Digital Preservation and Access (hereafter BRTF). At our monthly teleconference last week, we were talking about preservation scenarios, and I suggested the Institutional Repository system, adding that my investigations had shown that repository managers did not (generally) feel they had long term preservation in their brief. There was some consternation at this, and a question as to whether this was based on UK repositories, as there was an expressed feeling that US repositories generally would have preservation as an aim.

My comment was based on a number of ad hoc observations and discussions over the years. More recently, though, in an analysis of commentary on my Research Repository System ideas, I reported on discussions that had taken place on Ideascale last year, during preparatory work for a revision of the JISC Repositories Roadmap.

In this Ideascale discussion, I put forward an Idea relating to Long Term preservation: “The repository should be a full OAIS preservation system”, with the text:
“We should at least have this on the table. I think repositories are good for preservation, but the question here is whether they should go much further than they currently do in attempting to invest now to combat the effects of later technology and designated community knowledge base change...”
This Idea turned out to be the most unpopular Idea in the entire discussion, having now gathered only 3 votes for and 16 votes against (net -13).

Rather shocked at this, I formulated another Idea, “Repository should aspire to make contents accessible and usable over the medium term”, with the text:
“A repository should be for content which is required and expected to be useful over a significant period. It may host more transient content, but by and large the point of a repository is persistence. While suggesting a repository should be a "full OAIS" has not proved acceptable to this group so far, investment in a repository and this need for persistence suggest that repository managers should aim to make their content both accessible and usable over the medium (rather than short) term. For the purposes of this exercise, let's suggest factors of around 3: short term 3 years, medium term around 10 years, long term around 30 years plus. Ten years is a reasonable period to aspire to; it justifies investment, but is unlikely to cover too many major content migrations.

“To achieve this, I think repository management should assess their repository and its policies. Using OAIS at a high level as a yardstick would be appropriate. Full compliance would not be required, but thought to each major concept and element would be good practice.”
This Idea was much more successful, with 13 votes for and only one vote against, for a net positive 12 votes. (For comparison, the most popular Idea, “Define repository as part of the user’s (author/researcher/learner) workflow” received 31 votes for and 3 against, net 28.)

Now it may be that the way the first Idea was phrased was the cause of its unpopularity. It appears that the 4 letters OAIS turn a lot of people off!

So, here are 3 possible statements:

1) My repository does not aim for accessibility and/or usability of its contents beyond the short term (say 3 years)
( )

2) My repository aims for accessibility and/or usability of its contents for the medium term (say 4 to 10 years)
( )

3) My repository aims for accessibility and/or usability of its contents for the long term (say greater than 10 years).
( )

Could repository managers tell me which they feel is the appropriate answer for them? Just click on the appropriate URI and vote it up (you may have to register, I’m not sure).

(ermmm, I hope JISC doesn’t mind my using the site like that… I think it’s within the original spirit!)

(This was also a post to the JISC-Repositories list)


  1. I can only speak for myself. Of course we (repository managers) think preservation is important (but to an extent it is one of those 'would you like more money?' questions, with only one answer).

    However, relative to other issues it takes a back seat, probably because (a) we hope things will be ok (b) we don't want to turn research away for not meeting preservation criteria.

    Maybe we are to some extent complacent.
    Without testing/checking, I think I could probably open a 20-year-old text file, and probably a Word doc of the same age. And I'm guessing I could probably open a PDF document from the early 90s. Now of course this may all be true for the bulk of items we deal with, and I acknowledge there will be exceptions and files where this does not apply.
    Most of the files we deal with are PDF, so this is the file type we can't help but consider when people talk about preservation. Even if PDF falls out of fashion in a few years' time, there will still be PDF readers, and ample time to 'convert' documents from this fairly open standard. And being in a managed repository means they are far more likely to be 'converted' to an appropriate file format than random files on the web.

    Now, I'm sure this shows my own vast ignorance, and no doubt you are cursing at the screen in frustration! But I'm guessing there are people with not too different views to this and if we are wrong then that is the issue in itself.

    Most repositories have very little staff time allocated to them, and they require advocacy, management, data/copyright admin, and technical work. (I'm doing all of them at the moment, and the repository only makes up a small part of my job!).

    So what can be done to help raise preservation's profile?
    - Give us simple rules to remember, not long lists of requirements and forms (not that I'm suggesting you have done the latter!), e.g. "don't use file format X, it's terrible for long term preservation!"
    - Practical advice is always good. Which free PDF creator is the best for creating archival PDF/A documents?
    - A 'preservation plugin' for popular IR software, which works behind the scenes creating archival-format copies of deposited items, and perhaps adds essential fields for preservation purposes. It just takes a little time to install, and if it is endorsed by DCC we know it does 'good stuff', i.e. we don't need to become preservation experts, we just install a technical package created by the experts and we get on with what we were doing.
    Where possible, technical solutions are of course best: imagine trying to dictate to academics which version of the PDF file format to use. A system which automatically converted PDFs to the right PDF format on deposit is surely far easier.

    If this sounds negative (and ignorant) then apologies. It's not meant to. Like I said, I do believe in the importance of preservation. I want people to be able to access this research for years to come, and have certainly tried to make decisions with this in mind.

    Chris Keene

  2. Chris, this is a great comment, and I agree with a lot of it. The pragmatism of it really works for me. I was writing to someone else earlier this evening in different words but with similar intent (I wish she'd made her comments more publicly, as you did).

    I certainly used to believe the "don't use format X" rule, where X == MS Word, for example. But see some of Murray Rust's comments in an earlier email about why MS Word is better in some ways than, e.g., PDF; for example, because it has better options for encoding information semantically. I don't know if he's right, and I'd rather it was possible in PDF, but like everything else, it's complicated!


Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence. You must be registered to comment, but I'm turning off moderation as an experiment.