Tuesday 9 March 2010

When shouldn’t data be open?

There is a lot of momentum these days behind making data accessible, available, and re-usable. Increasingly people want open data; Science Commons have been recommending CC0 to make the fully open status of data clear. More recently, the Panton Principles begin:

“Science is based on building on, reusing and openly criticising the published body of scientific knowledge.

For science to effectively function, and for society to reap the full benefits from scientific endeavours, it is crucial that science data be made open.”

We’ve been big fans of Open Access at the DCC since its early days. We use a Creative Commons licence for our content by default. This blog was one of the earliest to be specific about a Creative Commons licence not only for the core text that we write, but also for the comments that you might add here.

So we strongly support the Open Data approach… where possible. For of course in some areas of science and research, there are data that cannot be open. Usually this is because the data are sensitive. They could be personal data, protected under Data Protection laws. Sensitive personal data (such as medical record data) are subject to extra requirements under those laws. They could be financial microdata, commercially sensitive. Or perhaps data with strong commercial exploitation potential. They could be anthropological data, sensitive through cultural requirements. Research needs to go anywhere, whatever the issues; we can’t be constrained to only research where the data can be open.

So perhaps it’s as simple as that: some science should have open data, and some should have closed data?

Well, maybe not. Because the underlying point of the Panton Principles must still apply. Research should be verifiable, whether through repeatable experiments or through re-analysable data. Unverifiable research is, well, unreliable, perhaps indistinguishable from fraud. Some access is needed; perhaps we should think of even sensitive data as Less Open Data rather than closed data.

So how do you go about dealing with sensitive data? Keep it secure, transfer it securely, provide access under strict licences and controls in data enclaves, aggregate, de-identify, anonymise: there are plenty of tricks in the book. That’s the topic of the 4th Research Data Management Forum, starting tomorrow in Manchester. I hope to have more to write later about what we learn.
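As a rough illustration of two of those tricks, here is a minimal Python sketch of de-identifying and then aggregating a handful of records. Everything in it is an assumption made for the example: the records, the field names, the ten-year age bands and the minimum cell count of 2 are all invented, and replacing names with a keyed hash is pseudonymisation rather than true anonymisation. Real sensitive data would need a proper disclosure-risk assessment, not just this.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical records; names, fields and values are invented for illustration.
RECORDS = [
    {"name": "Alice Smith", "age": 34, "diagnosis": "asthma"},
    {"name": "Bob Jones", "age": 37, "diagnosis": "asthma"},
    {"name": "Carol White", "age": 52, "diagnosis": "diabetes"},
]


def pseudonymise(name: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a truncated keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, name.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def age_band(age: int, width: int = 10) -> str:
    """Coarsen an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def de_identify(records, secret_key: bytes):
    """Drop the name, keep a pseudonym, and coarsen the age."""
    return [
        {
            "pid": pseudonymise(r["name"], secret_key),
            "age_band": age_band(r["age"]),
            "diagnosis": r["diagnosis"],
        }
        for r in records
    ]


def aggregate(records, min_count: int = 2):
    """Count records per (age band, diagnosis), suppressing small cells."""
    counts = Counter((r["age_band"], r["diagnosis"]) for r in records)
    return {cell: n for cell, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    safe = de_identify(RECORDS, secret_key=b"keep-this-key-private")
    print(aggregate(safe))
```

The keyed hash means the same person always gets the same pseudonym, so records can still be linked, while only the key holder can regenerate the mapping from names. Suppressing small cells in the aggregate is one common way to reduce the risk of singling out an individual.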



Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence. You must be registered to comment, but I'm turning off moderation as an experiment.