Wednesday 2 July 2008

Negative Click, Positive Value Research Repository Systems


I promised to be more specific about what I would like to see in repositories that offer more value for less work overall, by providing facilities that let them become part of the researcher's workflow. For convenience, I'm going to refer to this as "the Research Repository System (RRS)".

At the top of this post is a mind map illustrating the RRS. A more complete version of the mind map is available as a PDF.

The main elements that I think the RRS should support are (not in any particular order):
Here's a quick scenario to illustrate some of this. Sam works in a highly cross-disciplinary laboratory supported by a Research Repository System. Some data come from instruments in the lab, some from surveys that can be answered on paper or on the web, and some from reading current and older publications. All project files have been kept in the Persistent Storage system ever since the disaster last year, when both PIs lost their laptops from a car overseas and much precious data that had never been backed up were lost. The data are managed through the RRS Data Management element, and Sam has requested a checkpoint of the data in the system because the group is close to finalising an article and wants to make sure that the data supporting the article remain available and are not overwritten by later data.

Sam is the principal author and, together with a colleague from the partner group in Australia, has contributed a significant chunk of the article; colleagues from this partner group have the same access as members of Sam's group. Everyone on the joint author list has access to the article and contributes small sections or changes; the author management and version control system does a pretty good job of ensuring that changes don't conflict. The article is just about to be submitted to the publisher, once the RRS staff have negotiated the rights appropriately, and Sam is checking out a version to do final edits on the plane to a conference in Chile.

None of the data are public yet, but the group expects the publisher to request anonymous access to the data for the reviewers it assigns. Disclosure control will make selected checkpointed data public once the article is published. Some of the data are also primed to flow through to their designated Subject Repository at the same time.

One last synchronisation of her laptop with the Persistent Storage system, and Sam is off to get her taxi downstairs…

This blog post would be far too big if I included everything, so I have released separate blog posts for each Research Repository System element, all linking back to this post. The plan is then to come back here and link each element above to the corresponding detailed post.

OK, I'm sure there's more, although I'm not sure a Research Repository System of this kind can be built for general use. Want one? Nothing up my sleeve!

6 comments:

  1. May I borrow your diagram for a presentation at Repository Fringe 2008 (with credit of course)?

  2. Yes, of course... it was the pressure of that upcoming event that forced me to turn this from a mind map into text! Especially since I may not be there. Do feel free to ask folks at the Repository Fringe event to improve this in any way they can...

  3. Hello Chris,

    What mind mapping software did you use to make this mind map?

  4. Edwin, I currently use NovaMind on the Mac. I used to use Mind Manager, but it did not have a Mac version, so when I changed jobs and machines I picked NovaMind, which I now like a lot. It really helps me to organise my thoughts. I wrote about this change of software, and the associated problem of authentic versions, in a blog post last year: http://digitalcuration.blogspot.com/2007/07/authenticity-across-migrations.html

  5. Interesting series of posts, Chris.

    BTW, perhaps Sam can become an archetype; see

    http://www.nfais.org/2004_Abstracts_Program.htm

    Perhaps we should develop a cast of characters to illustrate various use cases.

    Peter Burnhill

  6. There may be data protection issues to factor into the system here. We are not told the specifics of the data to be stored in the system. If it includes personal data (the survey data, perhaps?), the design would need to take account of the data protection principles in the DPA 1998. These include obligations such as keeping data secure by taking appropriate technical and organisational measures against unauthorised or unlawful processing and against accidental loss, destruction or damage, and not transferring data to countries outside the European Economic Area (such as Australia in this example) unless there is adequate protection.

    Your point about the checkpoint makes me think of the 4th DP principle, which is about keeping data up to date. I don't actually know what you mean by a checkpoint, but I imagine it is a snapshot of the data at that time, so that even if further changes are made, the previous version of the data is still available (is that right?). Keeping personal data that is not up to date could be an infringement of this principle. However, the obligation is qualified by the words "where necessary". I've mentioned before that this could be of benefit where it is useful to keep the data as it was at a certain point for historical, evidential or archival reasons. So hopefully it would apply here.


Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence. You must be registered to comment, but I'm turning off moderation as an experiment.