Tuesday, 15 July 2008

IPR and science data integration

Preparing for the JISC Innovation Forum, I have been reading John Wilbanks’ comment piece (Wilbanks, 2008) tracing the reasoning behind the Science Commons approach of abandoning Creative Commons-style licensing for integratable data, in favour of a dedication to the public domain plus codified community norms. To be honest, I was gob-smacked when this came about; it seemed that a desirable outcome (CC-like licensing for data) had been abandoned. However, Wilbanks makes a powerful argument and, speaking as a non-lawyer (albeit someone who has been arguing about the existence and/or nature of any implied licence for web pages since 1995 or so), I find it a very convincing one.

It is a critical point that this is the approach of choice where the data are to be offered for integration with other data on the open web. So this approach would be sensible for most of the kinds of data we think of as forming part of the semantic web, for instance.

The arguments clearly do not apply, and are specifically not intended to apply, to all classes of science data or databases. For instance, many science data are collected under grants from funders who require conditions to be placed on subsequent use. Wilbanks’ analysis might suggest that some of these mandates are misguided, and this may be so, but some have very strong ethical and legal bases, particularly (but not only) for research producing or using social science or medical data relating to individuals.

The UK Data Archive, for example, requires registration and agreement to an end user licence before access to any datasets. The standard licence requires me (my emphasis):

“2. To give access to the Data Collections, in whole or in part, or any material derived from the Data Collections, only to registered users who have received permission from the UK Data Archive to use the Data Collections, with the exception of Data Collections supplied for the stated purpose of teaching.”


“5. To preserve at all times the confidentiality of information pertaining to identifiable individuals and/or households that are recorded in the Data Collections where the information contained in the Data Collections was created less than 100 years previously, or where such information is not in the public domain. In addition, where so requested, to preserve the confidentiality of information about, or supplied by, organisations recorded in the Data Collections. In particular I undertake not to use or attempt to use the Data Collections to deliberately compromise or otherwise infringe the confidentiality of individuals, households or organisations. Users are asked to note that, where Data Collections contain personal data, they are required to abide by the current Data Protection Act in their use of such data.”

UKDA are right to take a strong protectionist approach to most of these data (although maybe the 1881 census data could get a more wide-ranging exclusion than that implied in the licence clause quoted above). Problems of cascading and non-interoperable licence conditions may arise, but probably only on a small scale, and they are likely to be resolvable through negotiation, or through emerging facilities specifically aimed at providing access to potentially disclosive data.

Wilbanks recognises these problems:

“There will be significant amounts of data that is not or cannot be made available under this protocol. In such cases, it is desirable that the owner provides metadata (as data) under this protocol so that the existence of the non-open access data is discoverable.”

The UKDA metadata catalogue is both open and interoperable, so it does effectively apply this rule (although perhaps not yet to the letter).

Likewise, BADC distributes some data that it has acquired through NERC (funder) mandates (effectively their data policy, apparently under review), but also some data that it has acquired from sources such as the Met Office Hadley Centre, for whom it might have high commercial value. It's not surprising that some of these funders impose restrictions, although it would be good if they would look long and hard at the value question before doing so.

At the Forum legal session on data, there was a debate on the motion: “Curating and sharing research data is best done where the researcher’s institution asserts IPR claims over the data”. Prior to the speakers, a straw poll suggested that 5 were in favour and 10 against, with 7 abstentions. After the debate, the motion was lost with 6 in favour, 14 against and 2 abstentions. What the debate most illuminated, perhaps, was the widespread distrust of the whole apparatus: institutions, publishers, researchers, even curators… In addition, we all probably shared a lack of clarity about the nature of data, the requirements of collaboration, the impacts of disciplinary norms, effective business models, and so on.

In practice, of course, curation is going to require a partnership of all the stakeholders identified above, and probably more!

WILBANKS, J. (2008) Public domain, copyright licenses and the freedom to integrate science. Journal of Science Communication, 7. http://jcom.sissa.it/


