Wednesday, 9 July 2008

Semantic web on the Today programme

Turning on the radio this morning, I was surprised to hear someone discussing data on the Today programme on BBC Radio 4. It turned out to be Sir Tim Berners-Lee talking about the semantic web; he even managed to mention RDF and HTML without confusing the interviewers too much. The interesting eight-and-a-half-minute discussion is available via the BBC iPlayer.

I'm quite keen to understand better how the semantic web might relate to science data. When people talk about data in relation to the semantic web, they often seem to be thinking of the sort of relatively unitary facts or simple relationships that RDF triples are quite good at expressing, such as "the population of China is 1,330,044,605". It's certainly not clear to me how this generalises to express the changing population, let alone how it could express the data from a spectrometer, a crystal structure determination or remote sensing. If anyone can point me to a good resource discussing this, I would be grateful!


  1. This doesn't quite answer your question, but I have tended to keep, in the back of my mind, Clay Shirky's essay on the weakness inherent in the Semantic Web's 'worldview'.

    He expands on your point about the challenges to the semantic web vision of a world not bounded in simple, atomic relationships.

  2. You can represent pretty much anything in RDF, though quite quickly you need a little graph fragment, i.e. a few linked statements, rather than a single statement.

    For something like your China population example, obviously that number keeps changing, so what you are really talking about is some kind of observation or measurement of the population, taken at a particular time using a particular method.

    So you could have some RDF statements like (subject : property : value)

    measurement1 : population of china : 1,000,000,000
    measurement1 : date : 2008-07-10
    measurement1 : method : national census
    measurement1 : carried out by : Chinese government
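
    As a rough illustration only, the four statements above can be held as plain (subject, property, value) tuples in Python. This is a sketch rather than real RDF: the property names are copied from the statements as free text, and the helper `describe` is a hypothetical name introduced here, not part of any RDF library.

```python
# Each statement is a (subject, property, value) triple, as in the list above.
triples = [
    ("measurement1", "population of china", 1_000_000_000),
    ("measurement1", "date", "2008-07-10"),
    ("measurement1", "method", "national census"),
    ("measurement1", "carried out by", "Chinese government"),
]


def describe(subject, statements):
    """Collect every property/value pair attached to one subject."""
    return {prop: value for subj, prop, value in statements if subj == subject}


# All four statements share the subject "measurement1", so they collapse
# into one record describing that single observation.
measurement = describe("measurement1", triples)
print(measurement["method"])
```

    A real RDF store would use URIs rather than bare strings for the subject and the properties, but the shape of the data is the same.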

    In graph terms, this is a single item, "measurement1", with a number of 'spokes' coming out from it for the different properties. Sometimes you need a more complicated structure, using blank or linking nodes. There are always lots of different ways to do it, as with any kind of data modelling, and the way you choose depends on the purpose. Suppose you were looking at a table of population info, along the lines of:

    country      population       date
    China        1,300,000,000    2006
    Australia    20,000,000       2007
    China        1,000,000,000    1997

    In this case the population is a property of China, but the date is a property of the measurement of the population. You need some kind of intermediate node in your graph, which gives a set of RDF statements like this. (Shame you can't easily draw diagrams in blog comments.)

    China : population : measurement1
    measurement1 : value : 1,300,000,000
    measurement1 : date of observation : 2006
    Australia : population : measurement2
    measurement2 : value : 20,000,000
    measurement2 : date of observation : 2007

    You could also be a bit more specific about what these observations are:
    measurement2 : type_of : measurement
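
    The intermediate-node pattern can be sketched in a few lines of Python, taking the three rows of the table as input. The function names `rows_to_triples` and `population_in` are hypothetical helpers made up for this illustration, and plain strings stand in for the URIs a real RDF store would use.

```python
from itertools import count

# The three rows of the population table: (country, population, date).
rows = [
    ("China", 1_300_000_000, 2006),
    ("Australia", 20_000_000, 2007),
    ("China", 1_000_000_000, 1997),
]


def rows_to_triples(rows):
    """Give each table row its own measurement node and hang the value and date off it."""
    triples = []
    ids = count(1)
    for country, population, year in rows:
        node = f"measurement{next(ids)}"
        triples.append((country, "population", node))
        triples.append((node, "value", population))
        triples.append((node, "date of observation", year))
        triples.append((node, "type_of", "measurement"))
    return triples


def population_in(country, year, triples):
    """Follow country -> measurement node -> value for the requested year."""
    nodes = [o for s, p, o in triples if s == country and p == "population"]
    for node in nodes:
        props = {p: o for s, p, o in triples if s == node}
        if props.get("date of observation") == year:
            return props["value"]
    return None  # no measurement recorded for that country and year


triples = rows_to_triples(rows)
print(population_in("China", 1997, triples))
```

    Because every row gets its own node, China can carry any number of population measurements without the dates getting mixed up.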

    I set up a company six months or so ago to try to tackle some of the issues around collaborating more effectively on data. There's some info about it at if you are interested. We're aiming initially more at business applications than science, but many of the principles are the same, and because of my background I am always interested in the potential scientific applications. I certainly think that you could use RDF effectively for storing science data.

    Sorry, this comment ended up being rather long! But hopefully useful.


Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence. You must be registered to comment, but I'm turning off moderation as an experiment.