Wednesday, 4 June 2008

The negative cost repository, and other archive services

I've been at a meeting of research libraries here in Philadelphia these past two days; a topic that came up a bit was the sorts of services that libraries might offer individuals and research groups in managing their research collections. I was reminded of my post last year about internal Edinburgh proposals for an archive service. Since then it has struck me that quite a range of services could be offered by some combination of Library and IT services; I mentioned some of these, and there seemed to be some resonance. There could well be more, but my list included:
  • a managed current storage system with "guaranteed" backup, possibly related to the unit or department rather than individual
  • a "bit bucket" archive for selected data files, to be kept in some sense as a record (perhaps representing some critical project phase) for extended periods, probably with mainly internal access (but possibly including access by external partners, ie "semi-internal"). Might conflate to...
  • a data repository, which I would see as containing all or most data in support of publication. This would need to be static (the data supports the publication and should represent it), but might need to have some kind of managed external access. This might extend to...
  • a full-blown digital preservation system, ie with some commitment to OAIS-type capabilities, keeping the data usable. As well as that we have the now customary (if not very full)...
  • publications repository, or perhaps this might grow to be...
  • a managed publications system providing support for joint development of papers and support for publication submission, and including retention & exposure of drafts or final versions as appropriate.
I really like the last idea, which I have seen various references to. Perhaps we could persuade people to deposit if the cost of deposit was LESS than the cost of non-deposit. The negative-cost repository, I like that!


  1. Chris, Interesting ideas. These meetings are fertile ground for new thinking. I'll comment on a current project this time, rather than one from a decade ago (comment on your blog on Wikiproteins).

    At the PASIG conference last week the Preserv 2 project described a storage controller for EPrints. Something similar is being developed for Fedora, and no doubt DSpace will have one too if these prove effective. The idea is you can select storage for your repository on the basis of scale, resilience, location ('cloud'), service provider, cost, etc., clearly a much wider range of storage options than now. In addition, the controller provides the ability to store objects in different locations based on metadata or type, and enables direct interaction between the repository software and open storage platforms. So you can see how many of the services in your list can develop, centred on the institutional repository.

    This can be allied to the work being done in the project using OAI-ORE. What has been demonstrated is the ability to copy a whole repository, intact, metadata included, from one repository software to another. So it's no longer just about open access and open source. With open repositories and open storage we have great flexibility in managing repository content in ways you anticipate.

    Now we are working on ways to make the interactions smarter between these open services, and many others out there including preservation services and tools.

  2. Chris, this could work as a layered protocol stack for repos. Strangely, we'll be unbundling the stack from the top down. Presumably not the way these things normally happen!
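
To make the storage controller idea from the first comment a little more concrete, here is a rough Python sketch of choosing a storage location from an object's metadata or type. It is not the EPrints or Fedora implementation: the class names, the store names ("preservation-archive", "cloud-bulk", "local-disk") and the "retention" metadata key are all made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RepoObject:
    """A deposited item; fields are illustrative, not a real repository schema."""
    identifier: str
    mime_type: str
    metadata: dict = field(default_factory=dict)


@dataclass
class Rule:
    """Send objects that satisfy `matches` to the named store."""
    matches: Callable[[RepoObject], bool]
    store: str


class StorageController:
    """Hypothetical controller: the first matching rule wins, else the default store."""

    def __init__(self, rules, default_store):
        self.rules = list(rules)
        self.default_store = default_store

    def choose_store(self, obj):
        for rule in self.rules:
            if rule.matches(obj):
                return rule.store
        return self.default_store


# Assumed policy: long-term records go to a preservation archive,
# video goes to bulk cloud storage, everything else stays local.
controller = StorageController(
    rules=[
        Rule(lambda o: o.metadata.get("retention") == "long-term", "preservation-archive"),
        Rule(lambda o: o.mime_type.startswith("video/"), "cloud-bulk"),
    ],
    default_store="local-disk",
)

paper = RepoObject("eprint-1", "application/pdf")
dataset = RepoObject("data-7", "text/csv", {"retention": "long-term"})
film = RepoObject("video-3", "video/mp4")

for item in (paper, dataset, film):
    print(item.identifier, "->", controller.choose_store(item))
```

Ordering the rules means a policy such as "long-term beats everything" is just a matter of putting that rule first; a real controller would also have to handle moving objects when the policy changes.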


Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence. You must be registered to comment, but I'm turning off moderation as an experiment.