Friday, 14 November 2008

DCC White Rose day

Fresh from a fascinating day in Sheffield, organised by the DCC and the White Rose e-Science folk. Objectives for the day included building closer relationships between White Rose and DCC, helping us learn more about their approach to e-Science, identifying data issues, and influencing the DCC agenda!

After introductory remarks, the day started with Martin Lewis, Sheffield Librarian, talking about the UK Research Data Service (UKRDS) feasibility study. Martin was keen to emphasise the last two words; it’s important to manage expectations here, and we’re a long way from an operational service. There’ll be a conference to discuss the detailed proposals in February 2009, after which they hope to get funding for several Pathfinder projects. Martin and I argued ferociously, as we always do, but I should be clear that this has the potential to be an extremely valuable development. The really difficult conundrum is getting the balance of sustainability, scalability, persistence and domain science involvement in the curation. Oh, plus an appealing pitch to the potential funders. Not at all an easy balance!

Then we heard from Dr Darren Treanor from Leeds about their virtual slide library for histopathology; he claims it as one of the largest in the world, with over 40K slides digitised, even if only a thousand or so are fully described and publicly available. At 10 Gb or more a slide, these are not small; not just storage but bandwidth becomes a very real issue when accessing them. We had a brief discussion about technologies that optimise bandwidth for viewing very large or high-resolution images; think Google Earth as one example, or the amazing Seadragon, created by Blaise Aguera y Arcas and now owned by Microsoft (the technology appears to be incorporated into Photosynth, but I can’t explore it, as it doesn’t work in my OS!).

Sarah Jones from the DCC and HATII at Glasgow spoke about the Data Audit Framework, now being piloted at 4 sites. This is a tool aimed at helping institutions to discover their research data resources, and begin to understand the extent to which they are at risk. More about that one later, I think!

Then we heard about 3 more very different projects, although Virtual Vellum probably had more in common with the virtual slide library than the presenter would have previously expected. This project shows digitised and cleaned-up medieval manuscripts, with their TEI-based transcription in parallel. We heard an entertaining rant against SRB (wrong tool for the job, I think), but more worryingly that these projects were strongly constrained by IPR problems (yes, the digitised versions of half-millennium-old manuscripts do acquire new copyright status, and can also be ringed about with contractual restrictions). So much for open scholarship!

We also heard about the CARMEN project, aiming to support sharing and collaboration in neuroscience. Graham Pryor will be releasing a report from his involvement with CARMEN before too long, so I won’t dwell on it, other than to say it is a really good project that has thought a lot of issues through very carefully. It also has one decent and one cringeworthy acronym: MINI, for the Minimum Information for a Neuroscience Investigation, and MIAOWS, for Minimum Information About a Web Service… but maybe Frank Gibson was pulling our legs, since I didn’t find it on a Google search, nor is it in the MIBBI (Minimum Information for Biological and Biomedical Investigations) Portal!

Finally we heard about the University of York Digital Library developments; this is not so much a project as the development phases of a service. They've chosen Fedora plus Muradora as their technology platform, but it was clear again that this was a thoughtful project, recognising that there's much more to it than "build it and they will come".

This was a really interesting day; I hope we gave good value; I certainly learned a lot. Thanks everyone, especially Dr Joanna Schmidt, who organised it with Martin Donnelly.

