Thursday 3 December 2009

IDCC 09: William Michener Presents DataONE

Michener began his presentation on DataONE by asking “Why?” He explained that his team is focussed on the environmental challenges that are of increasing concern to us all.

He described their approach to building a knowledge pyramid and the role of citizen science in contributing more site-based data. Michener also highlighted the problem of data loss, which is due not just to fire and other physical factors created by unstable data storage systems such as tapes, but also to an inability to archive all the data. He predicted that next year they will lose as much data as gets collected in the course of the year.

Data that is collected also loses value over time. Scientists are most familiar with their data set at the time of publication, so without comprehensive metadata we lose many of the details that make the data reusable and relevant.

DataONE is designed to address some of these challenges. They aim to provide universal access to data about life on Earth. They are developing a cyberinfrastructure with distributed nodes: member nodes, where data are stored at existing sites; co-ordinating nodes, which do not physically store data but keep the metadata catalogues; and an investigator toolkit to help students and scientists access and use this data.

Michener then took us through how they plan to support the data lifecycle – providing examples of the systems and tools to support data contributors and data users.

DataONE is initially focussing on biological and environmental data, but they recognise that many of the issues require using other data sources – including social science. They already have a range of data sources and a diverse array of partners, which include libraries, academic institutions and research networks. They plan to leverage existing structures, but also to expand the network throughout the project. He also explained the link between DataONE and the Data Conservancy described by Carole Palmer in the previous presentation.

Michener showed how they have already identified a number of member nodes across the globe representing a number of research networks – but there is no common data management standard between them. “The problem with standards is that there are so many of them,” he noted, so there is no one-size-fits-all approach.

He demonstrated some of the investigator tools that they are developing to help search and filter the available data, which also let you bring up the metadata alongside the original data. Finally, he introduced the EVA working group, which looks at exploring, visualising and analysing data, and gave a practical example of how they have combined data and created visualisations mapping vegetation and bird migration.

Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence. You must be registered to comment, but I'm turning off moderation as an experiment.