Wednesday, 2 July 2008

Research Repository System persistent storage

This is the seventh and last of a series of posts aiming to expand on the idea of the negative click, positive value repository, which I'm now calling a Research Repository System. I've suggested it should contain these elements:
At a very basic level, the RRS should provide a Persistent Storage service. Completely agnostic as to objects, Persistent Storage would provide a personal, group-oriented (ie within the institution) or project-oriented (ie beyond the institution) storage service that is properly backed up. There’s no claim that Persistent Storage would last for ever, but it must last beyond the next power spike, virus infection or laptop loss! It has to be easy to use, as simple as mounting a virtual drive (but it has to work equally easily for researchers using all three common OS environments). Conversely (and this isn’t easy), there must be reliable ways of taking parts of it with you when away from base, so synchronisation with laptops or remote computers is essential. It should support anything: data, documents, ancillary objects, databases, whatever you need. It’s possible that “cloud computing” services, eg Amazon S3, the Carmen Cloud or other GRID services, might be appropriate.

[I'll include the last two shorter parts here.

The RRS might include a full-blown OAIS digital preservation archive. Not many institutions run these at this time, so although I think it should, I hesitate to suggest it must!

Some spinoffs you should get from your RRS would include persistent elements for your personal, department, group or project web pages (even the pages themselves). It should provide support for your CV, eg elements of your bibliography, project history, etc. It will provide you and your group with persistent end-points to link to. And your institution will benefit, first because it is supporting its researchers in curating their data and in making their publications verifiable, and second from the research disclosure aspects.]

I guess that just about wraps it up. So who's going to build one?


  1. A very interesting series of posts - you're not asking for much, are you? :-)

    Seriously, though - the system you've sketched out is quite complicated, not so much in the individual parts of it (though several of those parts are quite substantial) but in the design vision required to connect up all those features in a way that would be easy and natural to use.

    Can you identify a subset of parts that would make a useful version 1?

  2. Chris, I note the overall focus of your vision is on researcher workflow, hence the Research Repository System (RRS) rather than an institutional repository. Our focus in the Preserv 2 project is on an institutionally-directed, rather than researcher-directed, repository. So the question is: which of these approaches will prevail?

    The irony of the negative click philosophy is that it has led you to produce a visionary but significantly more complex system. This arises because one response to the lack of research papers (open access) content in IRs is to say that repositories need to integrate with researchers' workflow, to grab the evolving content at an earlier stage. Another argument presented for the lack of content in repositories is it takes too long to deposit something, which suggests that keeping deposit simple is the answer, somewhat contrary to the workflow case.

    This is a dilemma which we don't appear to have evidence to answer now, but we can see how the answer matters, because it determines the nature of the content that needs to be stored and at what stage the service needs to be effective.

    My guess is that because not many institutions have yet invested heavily and sustainably in IRs, in terms of funding and policy, perhaps because they have not been moved by researcher demand, it is hard to see why they might invest in something more complex and costly to serve researchers. Yet apart from the workflow aspect that underpins the RRS, some of the other elements - notably the 'spinoffs', another little-developed aspect of the blog but seen clearly in the accompanying diagram, as well as storage - are happening or beginning to happen because they have institutional benefits.

  3. :Bill, I agree this is complicated, maybe too much so. Having got this far, I do plan to try and think about various subsets.

    :Steve, I think the one thing that's absolutely critical is to get the repository into the workflow. Maybe my idea is too complex; maybe there are better ways of doing it. But I really do not believe that simply making deposit simpler is the answer. It needs to be simple at the right point in the researcher workflow, or they won't even think about it!


Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence. You must be registered to comment, but I'm turning off moderation as an experiment.