(Andy Powell liveblogging this at http://tinyurl.com/dhfdbn)
Chaired by John Wood, who chairs the UKRDS Steering Committee, talking big numbers: 20 PByte per year from CERN, 200 PByte per year from some German project…
Jean Sykes, Project Director, on the Challenge of Data Lifecycle Management. The challenge includes the data deluge, but really it’s about research data as an untapped resource, due to a lack of coherent/consistent policies & standards, not to mention infrastructure. Libraries get requests to store data in repositories, with no added value. What is needed is management of data through the whole lifecycle, from creation through to re-use, with storage as just one part.
Cliff Lynch, Nature: “digital data are so easily shared and replicated and so recombinable, they present tremendous re-use opportunities…”
Four case studies: Bristol, Leeds, Leicester, Oxford (doing their own study at the same time).
Survey results: data volume up 360% in 3 years; 50% of data has a useful life of up to 10 years, 26% has indefinite retention value. Most data held locally (often precariously). 43% think their research could be improved by access to more data.
DCC Lifecycle model “provides a useful standard for data management”. DRAMBORA gets a mention too. JISC Information Environment & JANET provide the infrastructure. JISC & RIN studies provide context.
So a more coherent UK-wide approach is feasible & desirable.
US model: NSF Datanets $100M over 5 years, second round in progress. Australia: centralised top-down (but also distributed??).
UKRDS to address sustainability, not just storage. Many building blocks are in place but there are gaps to be filled, so embrace rather than replace existing facilities, to leverage research value.
Next we have John Milner on the Pathfinder approach. Aim is to build a fully functional service but with only a limited population of stakeholders. Comes up with that “boil the ocean” metaphor which, I’m sorry, I hate! Hope is we can build something that scales; something more than a conventional pilot. So start and continue, rather than start and re-start. Pathfinders will be the case studies plus Cardiff, Glasgow & Edinburgh (gets 3 HE funders on board).
Service requires clear expectations for both client and server on what it offers and what it doesn’t. Hmmm, unreadable stakeholder diagram, maybe in the notes. Segmented into early and later adopters. Another diagram: simplified view of research service and research data sharing process, plus UKRDS services and administration. Lots of this is advisory, policy, strategy etc. Not much, if anything, I can see about storage or curation. I’m not quite getting it yet; no long term without short term!!! Front office/back office division; maybe the back office does the hard stuff?
Diagram emerges showing overlap between UKRDS, DCC and RIN. Either alarming or comforting for us, depends how you read it! Also nominate UKDA to help.
DCC & UKDA can provide infrastructure & tools… RIN with ongoing requirements planning & QA…
Summing up as feasible… but it has not yet addressed where the CURATION is done!
Now Q&A… John Wood on EU 5th freedom: freedom of knowledge!
Malcolm Atkinson, e-Science envoy: glad to see an integrated approach, but a bit worried there may be an arbitrary boundary. Value of data is limited if you can’t compute on it. Need to think carefully about computational access at the same time as data management; hoped to see linkage with e-Science such as NGS; integrated approach necessary. John Wood agrees; EU programme called PREIS??? Always 1 world-leading computer in the EU, though it may move around.
Kevin Schurer, UKDA: still confused about the process; second slide on “demand-led”. Something missing: the supply; have we over-estimated the relationship between demand and supply? 5 veg a day example: people say yes but don’t do it. Similar for research data: researchers agree they should share but don’t do the necessary work (documentation, metadata, etc.), so getting useful data out of researchers is hard (this is bitter experience talking!). Do local institutions have enough skill to support this? JM answers: demand is for the service, reservations are right, but focus is on gaps. Primary requirement is for researchers to produce coherent data management plans, so that’s where we start. Execution of DM plans is another thing? JS: we don’t know until we try it. (CR: surely part of the answer is building it into the research process through automation?) Kevin’s response: fear is that the 4 institutions are not the most typical in the country, so can it scale up to the wider group? Leicester as the non-Russell Group uni.
Mark Thorley, NERC, and OECD WG on data management guidelines. Solution fundamentally flawed? NERC spends more in 1 year than UKRDS will over 5 years, but is still only scratching the surface. Naïve on what can be done. Need to talk now, in DETAIL, to people with real experience of the difficulties of getting researchers engaged in the process. Somewhat fudgy answer: have to start somewhere.
Bryan Lawrence, Director of Environmental archival: miss Mark’s point: worth doing but still hopelessly naïve, and frightened by the lack of realisation that the first attempt is bound to be wrong and may need to be thrown away and re-done. JM answer: point is that commitments to researchers have to be honoured. BL: make sure the scope of what you’re committing to reflects the scope of what you can do.