Friday, 31 August 2007

Archiving Service

The other day I had a discussion with a group of IT facilities management guys here at Edinburgh, who have been asked to write the requirements for a new version of their Archive Service, one capable of handling modern requirements for huge amounts of data. It was an interesting conversation, and I hope to remain involved in what they are planning (which is a long way from getting funded, I guess). They had some good ideas, perhaps derived in part from software management systems: checking data in and out, managing versions, and so on.

We spoke a bit about OAIS, and about the issues of making data understandable for the long term. They were somewhat reluctant to go there, not surprisingly; the general IT attitude to "archiving" in the past has tended to be something like: "You give me your blob of data, and I'll give you back an identical blob of data at some future time". IT people are pretty comfortable with that approach (although I also mentioned some emerging issues, such as those raised by David Rosenthal in his blog post about keeping very large data for very long times).

We also discussed the need to make the data accessible from outside the University, to allow it to be linked from publications. This too was a bit outside their previous remit.

It struck me afterwards how much the conversation was affected by the IT services point of view (I'm not trying to denigrate these individuals at all; as far as I can see they took it all on board). I have talked with a lot of people about "digital preservation" or "repositories"; in most of those conversations, managing the bits themselves is assumed to be pretty much a solved problem, and the conversation turns instead to metadata, formats, representation information, emulation versus migration, and so on. The participants tend to be linked to the library, or to some subject area, but are rarely from IT.

I wondered if it was possible to imagine the IT guys providing a secure substrate on which a repository service sits, with independence between the two; that would allow everyone to stay in their comfort zone. I don't think OAIS would quite work in those terms, and I think there would be dependencies both ways, although I'll have to think more about this. The only counterexample I have heard of is a rumour of Fedora providing high-level services based on SRB as a secure storage substrate, although I don't have a reference.
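
To make the layering idea a little more concrete, here is a minimal sketch in Python of what such a split might look like. Everything in it is my own assumption for illustration: the names BitStore, Repository, Record and FixityError are hypothetical, the in-memory dictionary merely stands in for real storage, and none of it reflects how OAIS, Fedora or SRB actually work. The lower layer only promises "identical blob back" (checked with a SHA-256 digest); the upper layer keeps the descriptive information and knows the blob only by its identifier.

```python
import hashlib
from dataclasses import dataclass


class FixityError(Exception):
    """Raised when stored bytes no longer match their recorded digest."""


class BitStore:
    """Hypothetical storage substrate: blobs in, identical blobs out."""

    def __init__(self):
        # digest -> bytes; an in-memory stand-in for real archival storage
        self._blobs = {}

    def put(self, data: bytes) -> str:
        # The SHA-256 digest doubles as the blob's identifier.
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        data = self._blobs[digest]
        # Re-check the digest on the way out, so silent corruption is detected.
        if hashlib.sha256(data).hexdigest() != digest:
            raise FixityError(digest)
        return data


@dataclass
class Record:
    """What the repository layer knows about a deposit (assumed fields)."""
    blob_id: str
    file_format: str
    description: str


class Repository:
    """Hypothetical repository layer sitting on top of a BitStore."""

    def __init__(self, store: BitStore):
        self._store = store
        self._records = {}

    def deposit(self, name: str, data: bytes, file_format: str, description: str) -> Record:
        # The bits go to the substrate; only the identifier is kept here.
        blob_id = self._store.put(data)
        record = Record(blob_id, file_format, description)
        self._records[name] = record
        return record

    def retrieve(self, name: str):
        record = self._records[name]
        return record, self._store.get(record.blob_id)
```

Even in this toy version the dependency is not purely one way: the repository relies on the substrate's identifiers and fixity guarantee, while the substrate has no idea which blobs belong together or what they mean.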

What was also interesting was how few examples we could think of in Universities of large-scale, long-term archiving systems for records and data, leaving aside the publication-oriented repository movement.

Can any readers give us pointers?

1 comment:

  1. Don Sawyer sent an update to the MOIMS-DAI list on September 25. See http://mailman.ccsds.org/pipermail/moims-dai/2007-September/000422.html

    (I thank Cal Lee for pointing this out to me.)


Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence. You must be registered to comment, but I'm turning off moderation as an experiment.