Wednesday 10 June 2009

How can Social Bookmarking tools support community resource building?

In the DCC we are trying to work out ways that we can present tools to the community that help you to help us, to help you. The most primitive example of this would be the use of email lists: having identified some issue, we ask a question on a list, and use feedback from list members to develop our response to the issue. But we want to go further; we're just not sure how.

In this post I want to explore two use cases where social bookmarking tools might be helpful, and to seek advice on how to take these ideas forward. The two cases are:
  • getting input from the community on curation tools and resources worth investigating
  • extending a proposed bibliography on data curation.
In the first case, we’re looking for suggestions for quality curation resources. These could be tools of various kinds, guides, policies, templates, even standards (although we have separate ideas on the latter). We currently have a form on our web site for suggestions, but it’s long-winded and clunky, and we don’t get much input. What if we could use something like Delicious? It’s very easy to bookmark a resource with Delicious, as I’m sure you know: a couple of clicks and a few keystrokes, and you’ve bookmarked and tagged something. But how can we arrange for Delicious bookmarking to feed into a set of resources for us to review? I wondered whether asking people to use a tag such as <DCC-suggest> might work.
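As a sketch of how the aggregation side might work: Delicious publishes public feeds per tag, so a small script could periodically pull everything carrying the shared tag and present it for triage. This is a minimal sketch, assuming the public v2 JSON tag feed at feeds.delicious.com and its abbreviated field names ("u" for URL, "d" for description, "t" for tags); both the endpoint and the field names would need checking against the live service.

```python
# Sketch: harvest Delicious bookmarks carrying a shared suggestion tag.
# Assumes the public v2 JSON tag feed and its abbreviated field names
# ("u" = URL, "d" = description/title, "t" = tags) -- verify before use.
import json
import urllib2

TAG = "DCC-suggest"
FEED = "http://feeds.delicious.com/v2/json/tag/%s?count=100" % TAG

def fetch_suggestions():
    """Return a (url, title, tags) tuple for each bookmark with the tag."""
    items = json.load(urllib2.urlopen(FEED))
    return [(item["u"], item["d"], item["t"]) for item in items]

if __name__ == "__main__":
    for url, title, tags in fetch_suggestions():
        print "%s\n  %s\n  tagged: %s" % (title, url, ", ".join(tags))
```

Run on a schedule, that keeps the contributor's side of the bargain down to the couple of clicks Delicious already provides, with the review burden falling on us, where it belongs.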

In the second case, I have been building an extensive bibliography of books, articles and reports relevant to research data curation, management and preservation. We can load such a bibliography onto our web site in a variety of formats, including simple web pages for reading, and downloadable BibTeX, RIS or other formats. But that leaves the bibliography as a static resource, and the responsibility for maintenance and enhancement lies entirely with us. And if someone identifies a good candidate, there’s no easy way to feed it into the bibliography.
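On the publication side, at least, the step from a BibTeX file to a simple readable web page is easy to script. Here is a minimal sketch with a deliberately naive, regex-based extractor; it only copes with simple one-line fields, so a real deployment would want a proper BibTeX parser (the file name is illustrative).

```python
# Sketch: render a BibTeX bibliography as a simple HTML reading list.
# The regex-based extraction below only handles simple one-line fields;
# a proper BibTeX parser would be needed for real-world entries.
import re

ENTRY = re.compile(r'@\w+\{[^,]+,(.*?)\n\}', re.DOTALL)
FIELD = re.compile(r'(\w+)\s*=\s*[{"](.*?)[}"]\s*,?\s*(?:\n|$)')

def bibtex_to_html(bibtex_text):
    """Turn BibTeX entries into an HTML list of 'author (year). title.'"""
    items = []
    for entry in ENTRY.finditer(bibtex_text):
        fields = dict((k.lower(), v) for k, v in FIELD.findall(entry.group(1)))
        items.append("<li>%s (%s). %s.</li>" % (
            fields.get("author", "Unknown"),
            fields.get("year", "n.d."),
            fields.get("title", "Untitled")))
    return "<ul>\n%s\n</ul>" % "\n".join(items)

if __name__ == "__main__":
    print bibtex_to_html(open("curation-bibliography.bib").read())
```

The hard part, as the rest of this post suggests, is not the rendering but keeping the underlying file alive.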

Now there are a few social bookmarking sites that are specifically oriented towards managing references, including Connotea, but I can’t work out how to use them in this way. I had a go at using Connotea a year or so ago, but largely gave up: it wasn’t very good at extracting metadata for the kinds of resources I was bookmarking (so I had to do all the work anyway), and while I could do a one-off download from Connotea into the reference management tool I was then beginning to use on my desktop (a commercial product I won’t name), I couldn’t work out how to do incremental downloads. I had another poke around today, and while there clearly is some way of sharing, it didn’t feel like the simple act that social bookmarking requires. And I couldn’t see much value in other people’s tags.
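For what it's worth, the incremental download I was after is, in principle, a small scripting job against whatever feed a service exposes: remember what you have already fetched, and take only the new items each time. A sketch of that pattern follows; the feed URL is a placeholder, not Connotea's actual endpoint, and the service's real export interface would need checking.

```python
# Sketch: incremental harvesting from a bookmarking service's RSS feed.
# A local file records already-seen links, so each run fetches only new
# items. The feed URL is a placeholder, not a real Connotea endpoint.
import os
import urllib2
import xml.etree.ElementTree as ET

FEED_URL = "http://example.org/rss/user/somebody"  # placeholder
SEEN_FILE = "seen-links.txt"

def load_seen():
    """Return the set of links fetched on previous runs."""
    if not os.path.exists(SEEN_FILE):
        return set()
    return set(line.strip() for line in open(SEEN_FILE))

def fetch_new():
    """Return (title, link) for feed items not seen before."""
    seen = load_seen()
    tree = ET.parse(urllib2.urlopen(FEED_URL))
    new_items = []
    for item in tree.findall(".//item"):
        link = item.findtext("link")
        if link and link not in seen:
            new_items.append((item.findtext("title"), link))
            seen.add(link)
    open(SEEN_FILE, "w").write("\n".join(sorted(seen)))
    return new_items

if __name__ == "__main__":
    for title, link in fetch_new():
        print "NEW: %s <%s>" % (title, link)
```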

Today, after reading an interesting article (Hull, Pettifer, & Kell, 2008), I experimented with Mendeley, which looked promising. I’m not sure it works much better for me, for various reasons (although the metadata extraction works a bit better), and it was hard to be convinced it would be useful for this use case, given its relatively low usage. I also remembered that I played with CiteULike a while ago; again, I couldn’t quite work out how to use it as I want to, either personally or for this use case.

I’m hoping that there is some way, with one or other of these tools, to load up the bibliography, perhaps tagged with something like <data-curation>. That might allow others to find and access these resources, download the bookmarks, and so on. People could presumably also upload further bookmarks with the same tag, adding to the resources available to others. I’m not sure what can be done in this situation to quality-validate these resources, so that the whole bibliography remains of appropriate quality. Any ideas?
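One lightweight pattern for the quality problem would be a moderation queue: harvest everything carrying the shared tag, discard anything already in the published bibliography, and put the remainder in front of a curator to accept or reject, rather than merging automatically. A sketch, reusing the kind of (url, title, tags) tuples harvested in the Delicious example above (the queue file name is illustrative):

```python
# Sketch of a moderation queue: bookmarks carrying the shared tag are
# candidates, not automatic additions. URLs not already in the published
# bibliography are appended to a file for a curator to accept or reject.
def queue_candidates(tagged_items, known_urls, queue_file="review-queue.txt"):
    """tagged_items: (url, title, tags) tuples harvested from a tag feed.
    known_urls: set of URLs already in the published bibliography."""
    new = [(url, title) for (url, title, tags) in tagged_items
           if url not in known_urls]
    out = open(queue_file, "a")
    for url, title in new:
        out.write("PENDING\t%s\t%s\n" % (url, title))
    out.close()
    return len(new)
```

None of this solves the social question of who does the reviewing, but it does keep the barrier to contribution at a single shared tag.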


Hull, D., Pettifer, S. R., & Kell, D. B. (2008). Defrosting the Digital Library: Bibliographic Tools for the Next Generation Web. PLoS Computational Biology, 4(10), e1000204. http://dx.doi.org/10.1371/journal.pcbi.1000204

Need to read a CT scan from an old Magneto-optical disk

Philipp Westebbe from Berlin wrote to me:
“I am working at Rathgen Research Laboratory (National Museums in Berlin) in Germany. I have some valuable old data (computer tomography images) on MOD [Magneto-optical disc – CR] format. I am aware this is not the usual topic of the list, but we cannot think of a better place to start with. I am working on a project using an old MOD (Laser Drive Media LM1200-002) for recording computer tomography images. I'm looking for an old PC and the belonging software with which I can read this old MOD from 1992.”
In a later email, he wrote:
“At the moment I am coordinating the project about the Egyptian Queen Nefertiti bust [see Wikipedia- CR]. 1992 the Charite hospital in Berlin made a CT from the bust to analyse the internal structure. The CT images were saved at that time on a MOD LM 1200-002. The data are unfortunately no longer readable, because the Charite hospital doesn't have the specific computer any more. The only things I have in my research lab are the drive and the MOD with Nefertiti's scan on it. I got over long paths to your email. I am looking for someone who has such a computer with which I can read my old MOD from Nefertiti, and for a possibility to copy this old format into a new digital format. Nevertheless, I hope to get a contact, which helps me to read the old MOD from Nefertiti. It would be very nice if the person who reads this mail could help me handling my problem, or could send the email to someone who is still working with such an old CT system. It is quite difficult to get an old MOD from 1992 back into a common digital format.”
In a third email, he made clear that he has one disk, on which the CT scans are stored, and that the data are regarded as very valuable, as the bust is too fragile to scan again. He didn’t say, but perhaps the importance (and hence the value) of these data is increased by a current controversy over the authenticity of the bust.

Documents from NARA and other places show that the LM1200 disk is 12” Write-Once, Read-Many (WORM) dual-sided media, capable of holding 1 GByte each side. A catalogue from Siemens identifies it as Plasmon (Philips) WORM Media. These media are reputed to have a long life expectancy.

As far as I can tell, this drive will have been part of an embedded computer in the CT machine, although we don’t at this stage know the brand of the CT machine (or the type of embedded computer; many such machines of that vintage would have been “mini-computers” rather than PCs). The UK-based Plasmon PLC bought Philips LMS, the company that produced Philips 12” optical disks and drives, in 1999. It looks like Plasmon failed at the end of last year, and was in its turn bought by Alliance Storage Technologies in the US.

Alliance's technical support points out that reading the data off the disk may not be enough, because of the proprietary formats involved at the time. “Your best bet is to contact a data migration company that has experience with the application that wrote the data or the company that wrote the software. If you have trouble reading the media because of a problem with your drive we may be able to help but there are few parts left for the drive… Sorry to provide such a bleak outlook. Medical companies tend to maintain legacy equipment the longest so there is some hope.”

Nevertheless, before there is any chance of interpreting the data, Philipp needs to get them off the disk. Does anyone know of a contact who could read and interpret such a disk?

Thursday 4 June 2009

SNIA "Terminology Bridge" report

Quite a nice report has just been published by the Storage Networking Industry Association's Data Management Forum, called "Building a Terminology Bridge: Guidelines for Digital Information Retention and Preservation Practices in the Datacenter". I don't think I'd agree with everything I saw on a quick skim, but overall it looks like a good set of terminology definitions.

The report identifies "two huge and urgent gaps that need to be solved. First, it is clear that digital information is at risk of being lost as current practices cannot preserve it reliably for the long-term, especially in the datacenter. Second, the explosion of the amount of information and data being kept long-term make the cost and complexity of keeping digital information and periodically migrating it prohibitive." (I'm not sure that I agree with their apocalyptic cost analysis, but it certainly deserves some serious thought!)

However, while still addressing these large problems, they found that what "began as a paper focused at developing a terminology set to improve communication around the long-term preservation of digital information in the datacenter based on ILM[*]-practices, has now evolved more broadly into explaining terminology and supporting practices aimed at stimulating all information owning and managing departments in the enterprise to communicate with each other about these terms as they begin the process of implementing any governance or service management practices or projects related to retention and preservation."

It's worth a read!

(* ILM = Information Lifecycle Management, generally not related to the Curation Lifecycle, but oriented towards management of data on appropriate storage media, e.g. moving less-used data onto offline tapes.)

Tuesday 2 June 2009

New JISC Research Data Management Programme

I have not been posting for over a month now, due to pressure of work (bid writing, mainly), and it's curiously hard to get back into it. There are plenty of things to write about; too many, really; too hard to pick the right one. But now something has arrived that absolutely demands to be blogged: the JISC Research Data Programme, Data Management Infrastructure Call for Projects (JISC 07/09) has been released (closing date 6 August 2009, Community Briefing 6 July 2009)!

We are very excited about this. The Programme aims for 6-8 projects in English or Welsh (not Scottish or Irish, grrrr) institutions that will identify requirements to manage data created by researchers, and then deploy a pilot data management infrastructure to address those requirements. Coupled with some projects already funded under the JISC 12/08 call (CLARION at Cambridge, Bril at King's, EIDCSR at Oxford, Lifespan RADAR at Royal Holloway, and the Materials Data Centre at Southampton), and some work not eligible for this call, such as Edinburgh DataShare, these projects will really begin to build experience in managing research data at institutional level (or in groups of institutions) in the UK.

The DCC has a key role in the Programme. Apart from mentions of the Curation Lifecycle and the Data Audit Framework, paragraph 30 of the Call says:
"30. Role of the Digital Curation Centre (DCC): Bidders are invited to consult with the DCC in preparing their bids. The DCC will provide general support for this strand of activities and for the programme more broadly. This will be done by contributions to programme events as well as the current channels of information, and through its principal role as a broker for expertise and advice in the management and curation of data. Projects are encouraged to engage directly with the DCC and its programme of information exchange - for example, by contributing to the Research Data Management Forum (RDMF)."
We are preparing to deliver on this role both in the short term (during the bid-writing phase), and later during the Programme execution. I have been very pleased to work with Simon Hodson, the JISC Programme Manager for this Call. I'm sure Simon does not yet realise how much influence he will be exerting over the future of research data curation in institutions in the UK!