Thursday, 5 February 2009

A National Research Data Infrastructure?

Two weeks without a post? My apologies! How about this piece of speculation that has been brewing at the back of my mind for some time...

It is clear that a national research data infrastructure is needed, but there are problems with all of the approaches taken so far to address this. Subject data centres provide subject domain curation expertise, but there are scalability issues across the domain spectrum: it appears unlikely that research funders will extend their funding to a much larger set of these data centres (indeed the AHDS experience might suggest a concern to cut back). Institutional data repositories are being explored, but while disclosing institutional data outputs might provide sustainability incentives, and such data repositories might be managed at a storage level by developments from existing institutional library/archive and IT support services, it is difficult to see how domain expertise can be brought to bear from so many domains across so many disciplines. Meanwhile, several of the studies done by UKOLN/DCC with Southampton University suggest the value of laboratory or project repositories in assisting with curation in a more localised context.

To square this circle, perhaps we have to realise that the storage infrastructure and the curation expertise are orthogonal issues. It is reasonable to suggest that institutions, faculties, departments, laboratories or projects should manage data repositories, databases, etc. with varying degrees of persistence. But in terms of the curation of the data objects (ie aspects of appraisal, selection, retention, transformation, combination, description, annotation, quality etc), somehow expertise from each domain as a whole has to be brought to bear. It is tempting to draw a parallel with the "editorial board" function of a "virtual data journal". This would clearly only be scalable if it were managed across the sector, rather than individually for each data repository.

So we might suggest a federation of repositories on the one hand, and a collective organisation (or set of differing collective organisations) of curation expertise in different disciplines or domains on the other hand; the latter is referred to below as national curation mechanisms.

In such a system, we might see roles for some of the main stakeholders as follows:
  1. Research funders define their policies and mandates, and their compliance mechanisms; Research Information Network to participate and assist?
  2. Publishers likewise define their own policies and mandates with regard to data supporting publication; JISC and RIN could assist coordination here.
  3. Research Institutions define their own policies, establish local research data infrastructure, and encourage appropriate researchers to participate in national curation mechanisms (could participation in these become an element of researcher prestige as membership of editorial boards currently is?).
  4. Researchers are responsible for their own good practice in managing and curating their data (possibly in laboratory or project data repositories or databases), and where appropriate for participating in national curation mechanisms.
  5. Subject/discipline domain data centres are responsible for managing and curating data in their domain, as required by their funders. They also assist in defining good practice in their domains and more widely, undertake community proxy roles (NSB, 2005), and participate in national curation mechanisms.
  6. A number of bodies such as the Digital Curation Centre and the proposed UK Research Data Service could undertake some coordination roles in this scenario, and could also undertake good practice dissemination and skills development through knowledge exchange activities.
NSB (2005). Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century.

What do you think? Is that at all plausible?


  1. This needs a more detailed response than I can put here right now, but:

    a) I would pull the RIN and JISC roles out of your points 1) and 2), as I see that coordination/information-sharing role as something that exists independently of any of the stakeholders in this game.

    b) While I see RIN and JISC as important to the HEI community, the communities that need to buy into such a proposal are much larger: not just RCUK, but Wellcome etc. So I don't see RIN and JISC as having a mandate to coordinate. They're players, not referees (particularly RIN). On the other hand, they might fund referees, which maybe was your point.

    c) You allude to "data journals" but don't list them as integral in your enumeration. I think they should be integral to the national spectrum. Without publication metrics and editorial glory (as you surmise) the work won't get done. And if they are integral, one needs to think carefully about how it is done and about the relationships to the other components.

  2. I think the context in which I included JISC and RIN was in discussion with publishers, where they have had some influence, and in their ability to convene fora where things can be discussed. Plus JISC definitely has a role in building infrastructure across UK HE.

    My reference for this has been the Australian ANDS situation, where there definitely IS a mandate to coordinate (and it will be tough enough even so). What really upsets me is the totally fragmented nature of our landscape here, in comparison. There are so many players, with overlapping or conflicting interests, that it's difficult to see much getting done.

    So this is an attempt to start a conversation that might lead to a better, more coordinated approach. More ideas and comments welcome!

    On "data journals", I know this is a key idea from CLADDIER, but I was thinking more metaphorically than actually. Maybe "actually" is the way to go. Thanks, Bryan...


