Thursday, 14 May 2009

OAIS version for public examination

Thanks to David Giaretta for the following information on the state of the revision to OAIS (I have commented earlier on this process):
OAIS version for public examination

Many comments and ideas for clarifications and improvements for OAIS were received as part of its 5 year review process.

These suggestions were reviewed and the proposed dispositions sent to their originators for further comment. This draft version of OAIS contains these and many other improvements and is the candidate for submission to ISO for review. At this stage we are seeking primarily to identify errors rather than further ideas.

The PDF file is available at

Please send corrections to by 15 June 2009

(NB there are some cross-reference errors which will be corrected in the final version)

Shortly after this date the corrected OAIS update will be sent to ISO and in due course this will be released for international review at which point further comments may be submitted.

John Garrett (chair), David Giaretta (deputy-chair)

Tuesday, 5 May 2009

What are data?

Another nice blog post from Peter Murray-Rust, in his "thinking out loud before a presentation" series, from which I quote:
"Different people would have cutoffs at different points on this hierarchy [CR: Data → Information → Knowledge → Wisdom] but I think the following are fairly common attributes of data:
  • it is [sic] distinct from most prose (although some prose would be better recast as data)
  • it is generally a component of a larger information or knowledge structure
  • facts and data are closely related
  • many data are potentially reproducible or unique observations, are not opinions (though different people may produce different data)
  • data, as facts, are not copyrightable.
  • Collections of data and annotated data (data + metadata) may have considerably enhanced value over the individual items.
  • Data can be processed by machine
Here are some statements which provide data:
  • 36 26 38
  • Melting Point: 300 K
  • The reaction product was red
  • my blog page is
and here are some which are not data
  • her work is well respected
  • we thank Dr. XYZZY for the crystals
  • we find this reaction very difficult to perform"
I think I mostly agree with that, although it made me think quite hard, and of course at the extreme anything is data for someone (these words are mere data for Blogger or our RSS aggregator or Google, for example). This can get difficult when your mission is to support research data! Our advice from JISC is to recognise the potential ambiguity, be flexible in accepting others' views, but take our own view in what we create.

Monday, 4 May 2009

New Nature corresponding author policy on data

Thanks to Michael Jubb for pointing me to the editorial outlining the new Nature policy:
"Accordingly, we have modified the Nature journal policy on authorship, which is detailed on our website ( For papers submitted by collaborations, we now delineate the responsibilities of the senior members of each collaboration group on the paper. Before submitting the paper, at least one senior member from each collaborating group must take responsibility for their group's contribution. Three major responsibilities are covered: preservation of the original data on which the paper is based, verification that the figures and conclusions accurately reflect the data collected and that manipulations to images are in accordance with Nature journal guidelines (, and minimization of obstacles to sharing materials, data and algorithms through appropriate planning."

This sort of policy from respected journals is seriously good for data curation!

[Updated to enable the Nature tinyurls.]