John Coggins, VP for Life Sciences from Glasgow, on the researcher’s perspective. Wants to prompt us to think about “culture change” and resources.
Volume & complexity of digital data is rapidly increasing. In many fields data are rarely used outside lab/dept. Data management skills under-developed, research data often unstructured, poorly documented & inaccessible. Represents huge untapped resource. Never even fully analysed by researchers who generate it. Could be much added value from data mining & combining datasets. (CR from break conversation with James Currall: need lots of additional tacit context knowledge to understand properly.) Most data stored locally and often lost. Most researchers say they are willing to share data, usually through informal peer exchange networks (CR: important distinction!). Some of those who deposit data regularly do so because journals won’t publish unless they do (CR: Bermuda agreement in bio-sciences; Alma Swan says compliance lower than expected).
Funders want to protect & enhance their investment by making data widely available. Must be agreement among researchers on quality and format. Slow to achieve, cf story about Protein Data Bank. Story about surplus computers from particle physicists in the 1970s, playing with 7 protein structures; lots of subsequent arguments about the best way to store this stuff in those primitive computers; now 36 years on, 44,000 structures available, huge investment since then (eg early stuff re-examined in light of later advances). Essential to invest in data management, curation & storage as well as for easy access. Needs international collaboration & substantial funding for permanent delivery, eg EBI.
Researchers want confidence their data will be permanently stored & accessible; that costs will be met; to be able to access other people’s data, preferably on world-wide basis; and training.
Complications: commercially sensitive data, personal data, eg patient data, etc.
All Universities should have access to a UK-wide service. FCs and RCs likely to provide funding (goodwill but not yet much in £ notes). What about private sector? Charities? Other government departments like DEFRA (no-one from DEFRA in the room). These departments not celebrated for willingness to commit resources to research sector.
Way forward: research community would welcome UK-wide approach. Need more consistency, will take time. Researchers need to improve their skills for managing & using data. Significant building blocks in place. International dimension important. Culture and mindset are critical and must be gradually changed; needs investment in training & infrastructure, & spreading best practice. This is the way forward but won’t be easy.
David Lynn from Wellcome Trust (independent charity interested in biomedical research) for a funder’s perspective. (CR: is bio-science TOO good an example; poster child from which wrong inferences can be drawn?) Talking about size & rate of data, about integrating large datasets to gain insights into complex systems (CR: via Google Earth!!! Reminds me of NERC meeting last week again!) Now on immense potential to link research papers and data (CR: yeah!). Also growth in traffic of researchers accessing data from others (via web traffic on Sanger Institute site).
Meeting challenges: infrastructure: key data resources need coordinated and long-term sustainable funding. ELIXIR programme aiming to build sustainable infrastructure across Europe. Technical and cultural challenges: coordination & advocacy from key communities (funders, institutions, publishers etc). Data security challenge, important for some parts of bio-medical community. Recent high-profile incidents in UK. Recent reports from Academy of Medical Sciences, Council of Sci & Tech, Thomas/Walport review, US Inst of Medicine. Calls for safe havens & codes of practice. So management and governance of such data will be a key concern to retain public confidence. Mentions Bermuda Principles (1996) & Fort Lauderdale principles (2003); need to involve researchers in all such discussions.
Agrees with coordinated approach to preserving key research data & ensuring long-term value; must meet needs of researchers & funders. Devil is in the detail; must develop in way that truly adds value; must link effectively with existing activity & expertise; will need buy-in of ALL major funders & stakeholders; will need to accommodate differences in approach between funders & disciplines; & must appropriately resource initiative.
Would like clarification of detail on Pathfinder study, but would like to see it go forward: full specification, but with careful assessment of funding implications.
Q&A session. Malcolm Atkinson again. PDB took place initially at national lab at Brookhaven; perhaps the researchers had a bit of time. Those who put together the data for re-use don’t get an immediate reward. Those who get the immediate reward don’t pay attention to the long term but very focused on immediate need. PhD skills means PhD courses take longer. Where does the extra resource come from to invest in this change in researchers, and how to invest in incentive (promotion, RAE etc) to make sure this happens? John Coggins: it’s going to be hard. Was working at Brookhaven at the time of PDB, did manage to publish wee paper, but true, none of those involved are now household names. Focus on VFM from government doesn’t help as too short term. David Lynn: 3 ways to recognise work: institutions themselves think about their own promotion criteria, FCs think about REF and consider metrics for such activities other than direct research; research funders now requiring data management plans.
Michael Jubb from RIN: tension between disciplinary difference and one size fits all. Coggins: consistency means common attitude that it’s worth sharing data and making it accessible; fierce arguments on data structures etc; seems to be blaming arts & humanities for being poorly funded and behind the digital times (CR: largely not fair!). Lynn: engage with researchers, not in the abstract.
Bryan Lawrence again: no new money just old money re-purposed. Always tension between research, teaching & data, and only 1 pot for all of it, so where decision point lands will differ.
Peter Burnhill, EDINA: statement makers versus artisans? Observing what’s happening, can assume the internet is always on. Need to think about what it means to publish data, which includes archival issues. Not just take away, but software access. Variety out there, will be drivers, researchers are driven. What constitutes well-published data, including some notion of designated community to which you are publishing. Lynn: funders are increasing expectations on what researchers do in this respect, but can only drive this so far. Wellcome want researchers to tell us in DM plans what their approach will be to curating the data, making it available (including when); this DM plan is then peer-reviewed (larger grants, population studies etc) as part of the selection process. Coggins: well run labs with careful research systems means data better documented. Burnhill: provenance an issue, yes. Need to understand the drivers for someone wanting to re-analyse someone else’s data.
Kevin Schurer again: reflection on Qs so far. Making fruits of research available not a new conundrum, RS celebrating 350 years. The way this is done is a very segmented process: researcher, publisher, copy-editor. Problem with data (varies between disciplines): PI often the data generator, publisher & often distributor as well. UKRDS provides possibility to separate those things out again. Agreement from panel.