Sunday 27 January 2008

Repositories for the people?

I have been doing some thinking over the past couple of months about the role of repositories in digital curation, and it appears that others have as well. Dorothea Salo (Digital Repository Services Librarian at George Mason University) wrote a series of fascinating posts on her Caveat Lector blog, using fictional personae to illustrate the dilemmas faced by many of the players in science and in academic and non-academic management at the fictional University of Achaea. The players are:
Among many interesting aspects that she highlights, the following quote is perhaps the most scary (partly because it is so often true):
"Dr. Troia’s basketology data, which are unique and could not be recreated if they were to disappear, live on the computer in her office. This computer is not to her knowledge backed up. Dr. Troia doesn’t want to put data on the department’s shared network drive, because she isn’t sanguine about its security, and her data are vital to her professional advantage, not to be pawed over by just anyone. Some of her older data are in a file format her current software can’t open; Dr. Troia shrugs about that—it’s just how software works, and she has a workaround (though a tedious and annoying one) for any file she absolutely must get into."
There is other evidence for this problem. For example, the Survey Report from the SToRe project notes:
"2.3.6 On a matter related to access and protection, the apparently common practice of storing unique and original research data on the hard drives of PCs and laptops, which seemed to be especially prevalent in chemistry and the biosciences, is a cause for concern."
Anecdotally, many of those hard drives were (and are) inadequately backed up. (Mea culpa here, having lost a month's worth of data on a lost laptop. It's perhaps worth mentioning in passing that convenient, reliable backup for a small department or group of independent-minded researchers, each using their own technology (Windows, Mac or Linux, desktop or laptop, etc.), can be surprisingly hard to organise. But I guess that's a topic for another day.)

In a related post, Dorothea writes:
"It’s really hard to put yourself in the mindset of “the user” when you’re trying to put a piece of software together. Working from standards or other specifications, that’s easy; the worst you’ll run into is linguistic ambiguity. But “the user”? Who the heck is “the user”? And that’s the point. There’s no such animal as “the user.” There’s various sorts of people who will be using your software. Persona development is an attempt to ditch “the user” in favor of a portrait that actually resonates, a concrete mental image that one can build a product or service design around."
Dorothea went down the path of creating personae because, although we think we know about researchers, "We have not been designing repositories for a typical, representative faculty member." She goes on to list a number of claimed repository advantages that just don't cut any ice with Dr Troia. But:
"What Dr. Troia does need, and would immediately admit that she needs, is secure, networked, backed-up, maybe even version-controlled storage with access controls. Not (I cannot say this strongly enough) archival storage, because she needs to use it for in-progress work. But storage that had archival bolted onto the side with nice pointers about what to archive and when and how, that she might well use. Goes double if the system makes it easier for her to send her work to journals, or comply with data-retention or funder open-access requirements.

Do our repositories make any of that easier? Do they hell. They make it harder, and they don’t sweeten the pot by solving Dr. Troia’s other problems. No wonder all the Dr. Troias out there don’t use them."
This ties in with a comment from Peter Murray-Rust in a much more recent post:
"I shall argue that the conventional model where information is “put” into “repositories” is the wrong design - certainly for data. Repositories have to be part of the scientific process - the key person is the scientist."
The repository has to be in the science workflow for it to be used properly!

Saturday 5 January 2008

More posts on the 3rd Digital Curation Conference

Somewhat belatedly, I found a series of posts blogged during the 3rd International Digital Curation Conference in Washington last month on the K-State Libraries Conference Reports blog (written by more than one individual). The posts included the following (with some highly selective extracts of my own choosing):

Digital Curation Conference: a general comment
"I just spent four days hearing over and over again, however, how thousands of scientists in many countries make use of the very same technologies used by your average teenager--namely, blogs, wikis, Facebooky things, etc.--to do essential work and collaborate with their peers. The head of Web publishing at the Nature Publishing Group, certainly no lightweight or frivolous organization, proudly states that they're creating social networking tools built directly on the Facebook model, and sees in blogging part of the future of scientific communication. Scientist after scientist stood up and showed how wikis and blogs are used for essential communication and collaboration, how some labs have gone nearly entirely over to a social collaborative model."
Digital Curation Conference: Surveying Bloggers' Perspectives
Digital Curation Conference: Sustaining Engineering Informatics
"[Joshua Lubell] then outlined the three access scenarios (the three Rs): reference, reuse, and rationale. The first is simply the ability to view and visualize the engineering data. Reuse is what it sounds like (STEP [ISO 10303] supports this), namely, taking the data and modifying or reengineering it. Rationale is the ability to display information such as construction history, design intent, etc., which go beyond the design itself. He compared this to the chess column in a newspaper, where, in addition to a snapshot of the board at some point in the game, you get a description of the moves required to reach that state. Loosely put, it's the set of 'why' questions that can arise from a design. STEP does not include this rationale piece, which is the point behind his work."

Digital Curation Conference: Moving Archival Practices Upstream
Digital Curation Conference: Day Two Keynote
"Carol Goble, U of Manchester

"[n.b.- This talk was the highlight of DCC for me. Rather than highlighting, again, only the challenges without really offering solutions, she showed concrete examples and tools, in pretty good detail.]"
Digital Curation Conference: Towards Distributed Infrastructures
Digital Curation Conference: Day One Closing Plenary
"Could it be that we are well enough funded to be too comfortable with our traditional roles? Rick Luce asked this question near the end of his first day closing keynote. Otherwise, his talk was a fairly standard review of what's going on and what needs to be done to solve some of the pressing issues, but this question struck me as unique. He's right, I think. Our funding is sufficient to continue operating much as we have for a long time; sure, making incremental changes, but never really taking the great leap forward to stop doing most of what we do now and really take on some major new challenges."

Digital Curation Conference: Sustainable Access to the Records of Science
"... liked to say Fedora, too, as did many people at the CNI meeting. Clearly, it's the flavor of the week, and I sensed a lot of uncertainty among library leaders at CNI who did not yet have a Fedora installation at their library. "We need to move to Fedora now" appears to be the current mantra. That's a bit amusing. Sure, Fedora is great, and can do many wonderful things, but so can a lot of other platforms and solutions, who were either popular way back when (2005, gasp) or have yet to gain traction but are on the horizon. What I've seen and heard in the last three days convinces me that my gut feeling about Fedora is not incorrect: if you have a crew of developers, it might work well for you, but if you lack the commitment to hire and hold such a crew, Fedora is not for you."
Digital Curation Conference: National Perspectives
"In something of an aside, [Rhys Francis] pointed out that computer science began 60 years ago, communications (in the network sense) about 15-20 years ago, and he thinks that something significant and as yet unnamed and vague happened about three years ago. He said we're now living in the data deluge, and in future decades we'll look back and have a name for what is happening now. For me, that's both an exciting and somewhat unsettling notion, since those working in these areas during periods of inception and definition tend to look rather silly to their successors (don't we laugh at the notion of typing catalog cards, after all?), only much later to be recognized for their efforts and innovations.

"In his opinion, there are four facets for collaboration: data, compute, interoperation, access. One must do all four, not one or two or three, which he noted is a key message to an audience largely consisting of data managers of one sort or another.

"One of his closing questions was whether data is actually infrastructure."
David Rosenthal was also at the conference, and was also taken with Carole Goble's day 2 keynote, as he noted in his blog:
"She's a great speaker, with a vivid turn of phrase, and you have to like a talk about science on the web in which a major example is VivaLaDiva.com, a shoe shopping site."
David was interested in Carole's myExperiment implementation, and his blog is worth reading for other insights into that. But I also liked this:
"The emerging world of web services is the big challenge facing digital preservation. Her talk was a wonderful illustration both of why this is an important problem, in that much of reader's experience of the Web is already mediated by services, and why the barriers to doing so are almost insurmountable."
(I think "doing so" here means preserving the reader's experience in this environment.)

10 Downing St on AHDS...

How exciting to receive an email from 10 Downing St this morning! It tells me that "The Prime Minister's Office has responded to [the AHDS] petition and you can view it here"

This refers to the AHRC's decision to cease funding the AHDS, which this blog discussed earlier. I signed the petition, the details of which were:
"On 11 May 2007, Professor Phillip Esler, Chief Executive of the AHRC, wrote to University Vice-Chancellors informing them of the Council's decision to withdraw funding from the AHDS after eleven years. The AHDS has pioneered and encouraged awareness and use among Britain's university researchers in the arts and humanities of best practice in preserving digital data created by research projects funded by public money. It has also ensured that this data remains publically available for future researchers. It is by no means evident that a suitable replacement infrastructure will be established and the AHRC appears to have taken no adequate steps to ensure the continued preservation of this data. The AHDS has also played a prominent role in raising awareness of new technologies and innovative practices among UK researchers. We believe that the withdrawal of funding for this body is a retrogade step which will undermine attempts to create in Britain a knowledge economy based on latest technologies. We ask the Prime Minister to urge the AHRC to reconsider this decision."
The Prime Minister's response is:
"Thank you for your e-petition.

"Government policy is that decisions on such services are a matter for the Research Council concerned.

"The Government expects that such decisions would only be taken after a careful review of the service in question. We are aware that the Arts and Humanities Research Council (AHRC) only took the decision to cease funding the Arts and Humanities Data Service (AHDS) after detailed consideration. The Council concluded at its March 2007 meeting that having regard both to the performance and cost of AHDS, this was not something they should continue to fund. There are no grounds for the Government seeking to ask the Council to reconsider that decision."
Well, I guess those who wrote and those who signed the petition didn't expect much more than this. But I think I have yet to see any "suitable replacement infrastructure" and the AHRC still "appears to have taken no adequate steps to ensure the continued preservation of this data".