Sunday, 27 January 2008

Repositories for the people?

I have been doing some thinking over the past couple of months about the role of repositories in digital curation, and it appears that others have as well. Dorothea Salo (Digital Repository Services Librarian at George Mason University) wrote a series of fascinating posts on her Caveat Lector blog, using fictional personae to illustrate the dilemmas faced by many of the players in science and in academic and non-academic management at the fictional University of Achaea. The players are:
Among many interesting aspects that she highlights, the following quote is perhaps the most scary (partly because it is so often true):
"Dr. Troia’s basketology data, which are unique and could not be recreated if they were to disappear, live on the computer in her office. This computer is not to her knowledge backed up. Dr. Troia doesn’t want to put data on the department’s shared network drive, because she isn’t sanguine about its security, and her data are vital to her professional advantage, not to be pawed over by just anyone. Some of her older data are in a file format her current software can’t open; Dr. Troia shrugs about that—it’s just how software works, and she has a workaround (though a tedious and annoying one) for any file she absolutely must get into."
There is other evidence for this problem. For example, the Survey Report from the SToRe project notes:
"2.3.6 On a matter related to access and protection, the apparently common practice of storing unique and original research data on the hard drives of PCs and laptops, which seemed to be especially prevalent in chemistry and the biosciences, is a cause for concern."
Anecdotally, many of those hard drives were (and are) inadequately backed up. (Mea culpa here, having lost a month's worth of data on a lost laptop. It's perhaps worth mentioning in passing that convenient, reliable backup for a small department or group of independent-minded researchers, each using their own technology approach such as Windows, Mac, Linux, desktop or laptop etc., can be surprisingly hard to organise. But I guess that's a topic for another day.)

In a related post, Dorothea writes:
"It’s really hard to put yourself in the mindset of “the user” when you’re trying to put a piece of software together. Working from standards or other specifications, that’s easy; the worst you’ll run into is linguistic ambiguity. But “the user”? Who the heck is “the user”? And that’s the point. There’s no such animal as “the user.” There’s various sorts of people who will be using your software. Persona development is an attempt to ditch “the user” in favor of a portrait that actually resonates, a concrete mental image that one can build a product or service design around."
Dorothea went down the path of creating personae because, although we think we know about researchers, "We have not been designing repositories for a typical, representative faculty member." She goes on to list a number of claimed repository advantages that just don't cut any ice with Dr Troia. But:
"What Dr. Troia does need, and would immediately admit that she needs, is secure, networked, backed-up, maybe even version-controlled storage with access controls. Not (I cannot say this strongly enough) archival storage, because she needs to use it for in-progress work. But storage that had archival bolted onto the side with nice pointers about what to archive and when and how, that she might well use. Goes double if the system makes it easier for her to send her work to journals, or comply with data-retention or funder open-access requirements.

Do our repositories make any of that easier? Do they hell. They make it harder, and they don’t sweeten the pot by solving Dr. Troia’s other problems. No wonder all the Dr. Troias out there don’t use them."
This ties in with a comment from Peter Murray-Rust in a much more recent post:
"I shall argue that the conventional model where information is “put” into “repositories” is the wrong design - certainly for data. Repositories have to be part of the scientific process - the key person is the scientist."
The repository has to be in the science workflow for it to be used properly!

1 comment:

  1. Small correction: I'm no longer at George Mason. I work for the University of Wisconsin.


Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence. You must be registered to comment, but I'm turning off moderation as an experiment.