Thursday, 10 September 2009

Nature special section on data sharing

This week's issue of Nature includes a very interesting special section on data sharing. There is a (not entirely accurate) editorial, a feature (on why data archives are so empty), and two opinion pieces.

The first opinion piece is on pre-publication deposit, linked to a workshop held in Toronto (and continuing the famous Bermuda and Fort Lauderdale workshops on data sharing in genomics) where attendees "[recommended] extending the practice to other biological data sets". The workshop produced the "Toronto Statement", which has recommendations for funding agencies, data producers, data analysts and users, and scientific journal editors. The statement encourages rapid pre-publication deposit where the datasets are:
  • "Large scale (requiring significant resources over time)
  • Broad utility
  • Creating reference data sets
  • Associated with community buy-in"
The second opinion piece is on post-publication deposit, linked to a meeting of mouse researchers in Rome "[proposing] ways to promote a culture of sharing". It's partly about other kinds of research materials (mice and cell-lines), but mostly about data. A couple of paragraphs are worth quoting (I hope Nature doesn't mind!):
"Where they don't yet exist, clear criteria should be developed for reviewers of grants to help them assess data and material-sharing plans submitted as part of a funding proposal. There are already examples of good practice in this regard from the NIH, the Howard Hughes Medical Institute, and several UK funding organizations such as the Wellcome Trust and the Medical Research Council. Data-sharing plans are required in proposals, efforts are made to facilitate sharing, such as putting investigators in touch with repositories and, for some organizations, compliance is an important consideration in funding renewal.

"Deposition of data and resources into public repositories is important for the validation of published results, as well as facilitating reuse. Although it is usual practice for major public databases to make data freely available to access and use, any restrictions on use should be strongly resisted and we endorse explicit encouragement of open sharing, for example under the newly available CC0 public domain waiver of Creative Commons."
The latter point is particularly interesting; in the UK, restrictions on sharing often do seem to exist, even if only a requirement to register for access. Sometimes the argument here is about protecting confidentiality; sometimes it is about ensuring the archives know their customers, for sustainability reasons. So this paragraph is also interesting:
"Many of the major public data repositories have no stable underlying funding and there are data types, particularly new ones, without appropriate public data repositories. We encourage further investment and recommend that public database coverage and stability be looked at in a coordinated way by funding organizations and the community with increased urgency. A good model is provided by the UK Biotechnology and Biological Sciences Research Council's Bioinformatics and Biological Resources Fund, which provides dedicated funding for development and sustainability of public resources and informatics tools."
This all appears to be Open Access, and is well worth reading.