Thursday, 6 November 2008

Data publishing and the fully-supported paper

Cameron Neylon’s Science in the Open blog is always good value. He’s been posting installments of a paper on aspects of open science, and there’s lots of good stuff there. Cameron’s focus is on open science rather than on data as such, but data form a large part of that vision. In part 3, he writes:
“Making data available faces similar challenges but here they are more profound. At least when publishing in an open access journal it can be counted as a paper. Because there is no culture of citing primary data, but rather of citing the papers they are reported in, there is no reward for making data available. If careers are measured in papers published then making data available does not contribute to career development. Data availability to date has generally been driven by strong community norms, usually backed up by journal submission requirements. Again this links data publication to paper publication without necessarily encouraging the release of data that is not explicitly linked to a peer reviewed paper.”
“In other fields where data is more heterogeneous and particular where competition to publish is fierce, the idea of data availability raises many fears. The primary one is of being ‘scooped’ or data theft where others publish a paper before the data collector has had the ability to fully analyse the data.”
In part 4, Cameron discusses the idea of the “fully supported paper”.
“This is a concept that is simple on the surface but very complex to implement in practice. In essence it is the idea that the claims made in a peer reviewed paper in the conventional literature should be fully supported by a publically accessible record of all the background data, methodology, and data analysis procedures that contribute to those claims.” He suggests that “the principle of availability of background data has been accepted by a broad range of publishers. It is therefore reasonable to consider the possibility of making the public posting of data as a requirement for submission.”
Cameron does remind us:
“However the data itself, except in specific cases, is not enough to be useful to other researchers. The detail of how that data was collected and how it was processed are critical for making a proper analysis of whether the claims made in a paper to be properly judged.”
This is the context, syntax and semantics idea in another guise. Of course, Cameron wants to go further, and deal with “the problem of unpublished or unsuccessful studies that may never find a home in a traditional peer reviewed paper.” That’s part of the benefit of the open science idea.

