Thursday, 26 February 2009

UKRDS session 3

Ross Wilkinson, Director of the Australian National Data Service (ANDS), live via satellite (saving carbon, if not sleep), has been building on a lot of work, particularly in the UK. Role of data more etc; data now answer questions unrelated to why they were collected originally. Who cares and why varies a lot. Pleased that Australian Govt cares to the extent of $24M. Institutions driven by the Code for responsible conduct of research (CR: blogged about this before): have to take data seriously. Not clear if disciplines care at the national level (some have cared at an international level). Researchers SHOULD care, but though he has been a researcher he hasn’t cared enough. So why should they? First the Code, second the role of data citation, third the changing nature of research.

The Code has obligations on the funders, the institutions AND the researchers. Probably a bit of a sleeper (researchers largely unaware yet), but funders are going to ratchet this up.

Fame: data citation. Sharing detailed research data associated with increased citation rate. This is about stories: Human Genome, Hubble telescope.

ANDS has 4 coordinated inter-related service delivery programs: frameworks, utilities, seeding the commons, and capability development. Plus service development funded through NeAT in partnership with ARCS (foreign acronym soup is so much less clear than one’s own!)

Frameworks on influencing national policies, eg count data citation rates just like publication citation rates; proposals should have slot for data space. Common understanding of data management issues, eg what is best practice. Don’t see much best practice; what is decent practice?

Utilities: discovery service (you come to us, we come to you flavours), persistent identifiers for deposited data, and collections registry.

Seeding the commons: improve & standardise institutional repositories; need routine deposit in sustainable infrastructures, etc (missed some).

Capability development: assist researchers to align data management practice, to improve quantity & quality of data, to do the right thing.

Not too much theory: build stuff. National collections registry service ready, work on persistent identifier and discovery services. Exemplar discovery spaces (water? Crystallography, law, biosecurity?). Advice not storage or management for data: this is done locally but networked. Layer on top of data.

Shows exemplar page on Dugongs, aquatic mammal found off the Australian coast. Might build linkages from data and researchers to allow browsing of the space. Using ISO 2146 to drive this: collections, parties, activities, services etc all need to be described, and the relationships matter. Not discovery but browsing service? Harvest info from a wide variety of sources beyond the data.

Talks about See Also service: other info to pop up (cf Amazon; other data come to you). Maybe could share such a service with the UK; if you’re interested in this then you might also be interested in this other stuff happening in the UK…

What do researchers get? Local managed data repository (from code); persistent identified data, and data discovery. Mission is more researchers sharing/re-using more data more often. Lower costs and raise benefits.

Q&A (so Ross can go home to bed): Robin Rice from Edinburgh: the Repository word, many think about it in terms of publications, how does it differ? Problem word, with overloaded usage. Not necessarily a big issue; may use different underlying technology, but that’s not that important. Different tools and indexes maybe, but conceptually not that different.

Kevin Schurer again: Likes talk but fundamentally disagrees with one thing: quantity over quality? Our funders would prefer to ingest 10 datasets a year with huge use and relevance than 4,000 that don’t. It’s not really about quantity or quality but impact. Could be a data car-boot sale rather than data Sotheby’s. Ross likes an argument but does disagree. Eg Monash group with carefully crafted crystallography datasets, but want to get people through the door and then move up in quality, driven by discipline and community.

John Wood: what if an institution doesn’t have the capability to hold the data? Ross: many are looking for funding to do so, but they could if they wanted to enough. ANDS is not the source of such funding.

Peter Burnhill again: mentioned geospatial data, as in See Also; could you say more about spatial infrastructure work going on in and around ANDS? Ross: two areas: marine and earth sciences, with slightly different takes on the problem. ANDS doesn’t mind which way wins, want to make sure that it’s engaged with an international perspective, particularly data that’s not intrinsically geo-referenced but where geo-ref adds value, eg Atlas of Living Australia.

Mark Thorley from NERC again: any work to add value to data; good work on discovery and persistence, but they find researchers don’t want data in the form generated, but in a way more convenient for re-use. Who does this? Ross: mainly outside; have some funding for such developments but oriented to communities that want to use the data rather than providers. Two areas of struggle: data mining, how useful is that (versus model-based), secondly data visualisation. Brokering conversations but favouring community rather than technology.

Now closer to home, Carlos Morais Pires from EU perspective: capacity-building; domains of infrastructures, and then renewed forward strategy.

Capacities programme versus research programme: a new thing in FP7. What can e-Infrastructures (not ICTs) do for science? Parsimony: sharing resources; increase the user base; but… science is excellence. E-Science double face: e-infrastructures change science, also scientists change e-infrastructures. Shows a 5-layer model, e-Science on top of comms, on theory, on experiment, on ???

Domains: innovating science process: global virtual research communities; accessing knowledge (science data); sharing resources (grid); linking (Geant); designing future facilities (novel e-infrastructures). Aha the programme we heard of before: DEISA and petascale machine.

Information continuums most interesting for today’s conference: between past and present, between raw data and results, between science disciplines, between institutions, between research & education. Nice picture of an integrated view showing LHC, astronomy, biology and health/clinical data on same GRID (CR: be afraid?).

Looking ahead: continue dialogue with horizon of 2020+. Early calls in FP7 heavily over-subscribed by factors of 5 or 8, means very interesting proposals not funded. Some work still for next 2 years. New resolution hope will be adopted next week, and hope EU funding will reflect importance.

Stefan Winkler-Nees from DFG, German approaches and policies. All sitting in the same boat, facing the same problems. 2003, DFG signed OA Berlin Declaration. 2006, DFG position paper on funding priorities through 2015. Last year new national priority initiative on research data. Principle is that DFG tries to support scientists, bringing them together with information experts from libraries, archives and IT (didn’t say Computing Science). Criteria: substantial added value for supra-regional infrastructures, contributions to advancement of scientific infrastructure, science relevance, etc. Subtext: complex funding infrastructure; key is the alliance of German science organisations. June 22, 2008: national Priority Initiative “Digital Information”. Don’t forget legal frameworks.

Have working group on primary research data, drafting policy to promote the need for action and demonstrate usefulness.

Prime objective: transparency, traceability and follow-up use. Honour principles of safeguarding good scientific practice. Can publication of data become a self-evident part of the science recognition system?

What’s on the agenda: disciplines, working across sectors, encouraging publication of datasets, and exploring the international perspective.

Q&A again: Stephen Pinfield from Nottingham, interested in incentivising. Stick approach from funders, maybe other incentives, eg improved citations (CR, see, quote "At least 7 of the top 20 UK-based Authors of High-Impact Papers, 2003-07 ranked by citations ... published data sets"). Schurer: what about the JISC Data Oscars? Malcolm Read: maybe the data ostriches would be easier?

Burnhill again: reward system is key, data must get some sort of recognition; lots of bad data, just as there are bad papers in journals.


