Tuesday, 5 May 2009

What are data?

Another nice blog post from Peter Murray-Rust, in his "thinking out loud before a presentation" series, from which I quote:
"Different people would have cutoffs at different points on this hierarchy [CR: Data → Information → Knowledge → Wisdom] but I think the following are fairly common attributes of data:
  • it is [sic] distinct from most prose (although some prose would be better recast as data)
  • it is generally a component of a larger information or knowledge structure
  • facts and data are closely related
  • many data are potentially reproducible or unique observations, are not opinions (though different people may produce different data)
  • data, as facts, are not copyrightable.
  • Collections of data and annotated data (data + metadata) may have considerably enhanced value over the individual items.
  • Data can be processed by machine
Here are some statements which provide data:
  • 36 26 38
  • Melting Point: 300 K
  • The reaction product was red
  • my blog page is http://wwmm.ch.cam.ac.uk/blogs/murrayrust
and here are some which are not data
  • her work is well respected
  • we thank Dr. XYZZY for the crystals
  • we find this reaction very difficult to perform"
I think I mostly agree with that, although it made me think quite hard, and of course at the extreme anything is data for someone (these words are mere data for Blogger or our RSS aggregator or Google, for example). This can get difficult when your mission is to support research data! Our advice from JISC is to recognise the potential ambiguity, be flexible in accepting others' views, but take our own view in what we create.

1 comment:

  1. I agree; while the DIKW framework sounds simple at first, its application can be tendentious. Of course, it's easy to get caught up arguing edge cases, but even the basic assumptions of the model are open to question.

    I don't know if you've read the recent article in the Journal of Information Science on this topic, The Knowledge Pyramid: A critique of the DIKW hierarchy (Fricke, 2009), but it's a closely related and thought-provoking piece (if not one I quite agree with).

    You make a good point about how the categorization of data/info/etc. is inherently subjective, and I think you draw the right moral: the system should work for the application, not the other way around.


Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence. You must be registered to comment, but I'm turning off moderation as an experiment.