Thursday, 12 July 2007

Open Access everyone? Or not?

I occasionally look at the OpenDOAR service, which lists information about repositories, and check out those which claim to include data (the term they use is datasets, although it is possible that “other” might also be applicable!). Last time I looked there were 7 listed in the UK which claimed to collect datasets; 4 of these were institutional repositories, the 5th was the Southampton eCrystals archive, the 6th was the National Digital Archive of Datasets (NDAD), which collects government datasets on behalf of The National Archives, and the 7th was Nature Precedings.

I used to think that was a creative piece of listing by NDAD, since OpenDOAR was created for institutional repositories, wasn’t it? But as time has gone on, I’ve begun to think NDAD are right; the old definitions of repositories were much too tight, and NDAD and many other data archives ought to be considered as repositories, and listed in these kinds of resources. On that basis perhaps AHDS should have been listed (although they might not bother now) and, I thought, the UK Data Archive, part of the Economic and Social Data Service, should also be listed. After all, they are repositories, and open access is a good thing, isn’t it?

At the UKDA’s 40th birthday celebrations yesterday, it was clear that Kevin Schurer (Director of UKDA) doesn’t quite share that view. The view he expressed was not a total rejection of Open Access in favour of a commercial approach, but he was certainly arguing for some barriers between some data and the users. In particular, while access to UKDA data is free for certain categories of users (certainly UK academics and probably some others), ALL users are required to register. Kevin made a strong case for this being an advantage; registration means that he can report to his funders who his users are (including how many of them might be independent or Government researchers as well as academic ones), and can also monitor which datasets they are using. He knows, for example, that the user base has grown from 25 or so after the first 5 years to 45,000 (I don't remember the exact figure, but of that order) after the first 40 years.

The same registration mechanism (and associated authentication and authorisation mechanisms) also allows them to apply greater access controls to more sensitive datasets, including the possibility that the user may have to sign and observe special licences. It’s worth remembering that much of the data they hold is about people, and some is extremely sensitive information, which has been provided (under “informed consent”) for certain specific purposes.

In a very different environment the day before, I met some scientists with very long-term experiments collecting data on insects. For perhaps different reasons, they were happy to make their data available to collaborators, but not openly available on the web. The risk of misinterpretation, which then requires extra effort to refute, was one reason. No doubt an extra paper as co-author from a collaboration was another motive (not unreasonable).

Are these approaches Open? Are they consistent with the OECD Principles and Guidelines for Access to Research Data that I wrote about earlier? Let’s remember that the Openness Principle was carefully worded:
“Openness means access on equal terms for the international research community at the lowest possible cost, preferably at no more than the marginal cost of dissemination. Open access to research data from public funding should be easy, timely, user-friendly and preferably Internet-based.”
So it looks as though the approach is reasonably consistent, provided access is on “equal terms for the international research community”, by that definition. I suspect there are other definitions (who said “the nice thing about standards is there are so many to choose from”?) by which these approaches would fail (e.g. the Berlin Declaration).

What the UKDA approach might do is make certain kinds of comparative work, including automated data mining, more difficult. There was a plea with respect to the former, from an Australian speaker, for more sophisticated international cross-archive access management (code: Shibboleth-enable). But I suspect with respect to the latter (preventing data mining) Kevin might argue “very right and proper too”!


  1. It might help to cross-check the ROAR registry too...

    And, of course, not just research articles, but their associated data too, should be self-archived in researchers' institutional repositories. (It's just that the articles are a more urgent and reachable priority).

  2. Thanks Stevan. I have since cross-checked ROAR, where the appropriate category appears to be "Database/A&I Index". Unfortunately it only includes 3 entries for the UK in that category, only one of which is not in OpenDOAR. The addition is the Research Findings Register, which is a Department of Health database containing research summaries. I remember contacting them once before, and they admitted they do not really contain any datasets.


