Saturday, 2 August 2008

Morning at the Repository Fringe

I only managed the first couple of hours of the Edinburgh Repository Fringe event [two days ago... but Google went bonkers and decided this was a spam blog meanwhile!] but it was already great fun, and I can see that those who stay the course will have a great time, and learn heaps. Note to self: more conferences should be like this!

Caveat Auditor (she can’t really be Caveat Lector in the flesh) opened proceedings with plenty of controversy: the IR is dead, we did it! But Vive le IR, maybe it has signs of life, maybe it can provide real services to users (depositors) after all, if we all listen hard, think hard, and work together.

The second hour was divided into three 20-minute mini-sessions, each with 8 separate soapbox “performances” in little alcoves in the wonderful Playfair Library in Old College, Edinburgh University, where the event was being held. I went to three. First was Les Carr from Southampton, wondering whether, and if so how, we might get the best out of the 3 or more different software platforms available (DSporaPrints?), rather than being locked in to one; how we might end the “platform arms race”. No clear answer, but components and services seemed like useful ideas.

Richard Wright from the BBC (who have a particular problem with volume: soon they will be generating a Petabyte every few days... this is serious volume!) was putting together some of the recent work on error rates in large disk farms, to warn us we must abandon that comfortable reliance on the integrity of the underlying storage layer (be afraid), together with the effects of those errors on different file types, particularly newer formats with high degrees of lossless compression. He showed and played us some examples of undamaged and damaged files in several different formats. I forget the exact figures, but a half percent error rate in a BMP file shows up as a smattering of black pixels, whereas in a GIF file it introduces serious artefacts and visible damage. The same error rate on a WAV file produces a barely audible rustle effect, while on an MP3 file the sound is seriously distorted. The same error rate on a DOC or PDF file, and you get “File damaged, cannot open”. Be very afraid! Should we take many, many copies of compressed files, or fewer copies of uncompressed files?
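For anyone who wants to try the effect at home, here is a rough sketch in Python. To be clear, this is my own toy illustration and not what Richard showed: it uses a zlib stream as a stand-in for "a compressed format", made-up text as a stand-in for "a file", and the helper names (corrupt, fraction_intact, try_decompress) are invented for the example. The half percent rate is the only thing borrowed from the talk. It damages the same proportion of bytes in an uncompressed copy and in a compressed copy, then checks how much of the original can still be recovered from each.

```python
import random
import zlib


def corrupt(data: bytes, rate: float, seed: int = 0) -> bytes:
    """Return a copy of data with roughly `rate` of its bytes altered."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    damaged = bytearray(data)
    n_hits = max(1, int(len(data) * rate))  # always damage at least one byte
    for pos in rng.sample(range(len(data)), n_hits):
        # XOR with a non-zero value guarantees the byte really changes
        damaged[pos] ^= rng.randint(1, 255)
    return bytes(damaged)


def fraction_intact(original: bytes, candidate: bytes) -> float:
    """Share of byte positions that still match the original."""
    same = sum(a == b for a, b in zip(original, candidate))
    return same / len(original)


def try_decompress(blob: bytes):
    """Return the decompressed bytes, or None if zlib rejects the stream."""
    try:
        return zlib.decompress(blob)
    except zlib.error:
        return None


# Made-up "file" contents: repetitive text that compresses reasonably well
original = b"".join(
    f"sample {i}: value {i * i % 997}\n".encode() for i in range(20000)
)
compressed = zlib.compress(original)

ERROR_RATE = 0.005  # the "half percent" mentioned above

damaged_raw = corrupt(original, ERROR_RATE)
damaged_compressed = corrupt(compressed, ERROR_RATE)

print("uncompressed copy, share of bytes intact:",
      fraction_intact(original, damaged_raw))

recovered = try_decompress(damaged_compressed)
if recovered is None:
    print("compressed copy: stream rejected, nothing recovered")
else:
    print("compressed copy, share of bytes intact:",
          fraction_intact(original, recovered))
```

In the uncompressed copy the damage stays where it lands, so almost all of the bytes are still good. In the compressed copy a damaged byte affects how everything after it is decoded, and zlib's own checksum will normally reject the result outright, which is the toy equivalent of “File damaged, cannot open”. Real formats such as GIF or MP3 behave differently in detail, so take this as a feel for the problem rather than a measurement.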

Dorothea Salo had a soapbox on data and e-science. Maybe the right paradigm here, she asserted, is the archivist rather than the librarian. As a stalking horse, the paradigm was the retiring Prof passing the university archivist’s office, saying “here are the keys to my office, anything there you want you can have”. The archivist may throw lots away, but it’s according to a well-established protocol. The horse falls down (my metaphor may be mixed here) because in the case of data it’s clearly much more important to capture the stuff while the Prof is still around (far too much tacit knowledge and undocumented context). Plus the whole thing about curation needing the domain scientist, not the generalist (so that’s about partnership, right?). Plus the whole rights and ownership and ethics and “I believe in Open Access, for everyone else” thing is redoubled in spades for data. Now, I can’t be saying (I’m definitely not saying, honest) “Don’t waste your time with data”. I am saying, again I think, listen hard, think hard, and work together.

Well I had a great time and so did everyone else that I could see. Clearly you can’t hold a Repository Fringe anywhere else, so I hope I will see you in Edinburgh next year!

BTW... this is the 100th post on the Digital Curation Blog!

1 comment:

  1. Regarding error rates, a recent article in DLIB, "A Format for Digital Preservation of Images: A Study on JPEG 2000 File Robustness", has some concrete examples with JPEG, JPEG2000, and TIFF. I found these helpful. See
    http://www.dlib.org/dlib/july08/buonora/07buonora.html


Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence. You must be registered to comment, but I'm turning off moderation as an experiment.