Oya Rieger has been speaking about their large-scale book digitisation processes. They first entered an agreement with Microsoft, and later with Google; they were naturally very disappointed when Microsoft pulled out, although this did give them unrestricted access to the books digitised under that programme. On the downside, they suddenly found they needed 40 TB of storage to manage these resources, and it took a year or so before they could achieve this. Oya related their work to the OAIS preservation reference model, and it was interesting to see not only that infamous diagram, but also a mapping of actual tools to the elements of the process model. It’s worth looking at her paper to see this; I noted that they were using ADORe for the archival storage layer, but there were several other tools that I did not manage to note down.
The next speaker, Kam Woods from Indiana, has been talking about the problems of providing access to, and preserving, large CD collections published by the Government Printing Office in the US. With time, the old method of issuing CDs to individual readers and having them mount and view these on library workstations has become increasingly impractical, so they needed to make these available online and preserve them. Notable here was the extent of errors in an apparently well-standardised medium. Because of the wide variety of environments, their solution was a virtualisation approach; they have now made over 4000 CDs available online (see www.cs.indiana.edu/svp/).
Brian Kelly spoke very rapidly (40 slides in 15 minutes, but the slides are on Slideshare and a video of the talk will be posted as well) on the JISC PoWR project (Preservation of Web Resources, http://jiscpowr.jiscinvolve.org/). This was a very entertaining talk, and well worth looking up. Brian is not a preservationist, but is a full-blown technogeek who discusses the role of the latest Web 2.0 technologies on his blog, in his role as UK Web Focus. His initial finding at Bath, slightly tongue-in-cheek, was that neither the archivist nor the web-master was interested at first in the preservation of their web presence, although by the end of the project both were much more convinced! The project achieved a strong level of interaction through its several workshops, and will shortly release a guide resource.
Adrian Brown of TNA spoke for his colleague Amanda Spencer, who was unable to attend, about trying to improve web resource continuity through web archiving. He quoted, I think, an initial finding that 60% of web links in Hansard (the record of parliamentary debate) were broken!
Steve Abrams closed the session with an interesting talk on JHOVE2, currently in development. Most notable was a move away from a 1-to-1 match between digital object (file) and format, recognising that sometimes one object has multiple components with different formats. There is an interesting change of terminology here: source units and reportable units, often with a tree relationship between them. Their wiki is at http://confluence.ucop.edu/display/JHOVE2Info/Home .
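To make the tree idea a little more concrete, here is a minimal Python sketch of how one object might contain components with different formats. This is purely my own illustration, not JHOVE2's actual API or data model: the `Unit` class, its `fmt` field, and the example file names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Unit:
    """Hypothetical node in a tree of units (not a JHOVE2 class)."""
    name: str
    fmt: Optional[str] = None              # format identified for this unit, if any
    children: List["Unit"] = field(default_factory=list)

    def walk(self) -> Iterator["Unit"]:
        """Yield this unit, then all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()

    def formats(self) -> List[str]:
        """Return the sorted set of formats found anywhere in the tree."""
        return sorted({u.fmt for u in self.walk() if u.fmt is not None})


# Illustrative example: one container file holding two components.
package = Unit("book.zip", "ZIP", [
    Unit("page001.tif", "TIFF"),
    Unit("metadata.xml", "XML"),
])

print(package.formats())
```

The point of the sketch is only that a single file no longer maps to a single format: asking the top-level unit for its formats gives one answer per component as well as for the container itself.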