UKRDS session 4

At last the big bucks: Paul Hubbard from HEFCE. Policy case for a national service: if it needs to be done, can be done, and they have the resources, then they will do it. National policy & funding picture: have had a very good ten years (CR: sounds ominous) for a world-class national research base. Note HEFCE now a minority funder of research; comes about 3rd in the ranking in UK. Can’t say HEFCE will fund this but can say HEFCE will work with others to fund it. Low note: recession ☹. HEFCE role: hands out money for research but doesn’t say what it’s for, celebrate that. Research base is competitive, dynamic & responsive, and efficient.

How? Excellent strong & efficient research information resources. Print services, collaborative collection management, national research libraries, BL Document Supply centre, UK Research Reserve (launched last week with £10M) building on it. Online resources: licensing EJs, JISC Infrastructure services and DCC as key player (hey, he said that!!!).

Making the case: HEFCE hopes to see a good, strong clear, specific case. Case made in general but not yet in detail. Drawing on experience with UK Research Reserve, need to show it meets a proven need in the research base, that data is being and will be produced that can be (and is) re-used, and that researchers will collaborate in ensuring data stored and demand access to it later. Need a consortium of funders, notably RCs, Health Services and major charities, for sustainability and efficiency. Need standards for what is kept, for how long and in what form, cataloguing and metadata, security, and scale. Key point, need to improve efficiency in research process, so need sound business case. Evidence for levels of current and foreseeable demand, evidence for savings from storing data that would otherwise have to be re-created later. Also the capital and recurrent costs of doing all this.

Need strong arrangements for management and future sustainability: ownership of storage, of the system, of the data. Who determines standards & procedures. Arrangements need to be designed with a good chance of lasting many years, and the implementation needs to be thoroughly planned. So good in principle, needs more detail, so, over to you (us!).

Malcolm Read, head of JISC. JISC already funds a lot of work in this area. Three areas:
  • technical advice and standards: DCC (he says he would not want to move the DCC “at this stage” into the fledgling UKRDS, thank goodness);
  • practitioner advice: UKRDS;
  • repositories, split two ways: subject repositories from RCs (but these won’t be universal in UK), and institutional repositories from HEIs, maybe with a SHED shared service option. (CR: what is this? Another shared data centre proposal? Maybe it’s about sharing the storage?)
UKRDS to give advice on
- What to save
- How to save it
- When to discard it, and
- Subject specialist.
Need some sort of horizontal vision across the infrastructure (CR, my orthogonal curation layer???) So more work to do:
- ownership/IPR (doesn’t care who, but does care that it’s unambiguous)
- how much to save
- capital (startup) costs - funding service
- recurrent costs – long term sustainability
- coherence across repositories (wants more assurance and comfort on this)
- scope for creating a data management profession (career opportunities, hierarchy, training & skills etc; cf learning technologists & digital librarians).
Governance options: depends on ownership: HEIs or FCs; responsibilities etc. Key is advisory rather than regulatory. Recommends: HEI owned, possibly in receipt of FC/JISC grant; initially a formal partnership of HEIs under an MOU; eventually a separate legal entity by subscription.

Q&A: CR on UK equivalent of Australian Code for the Responsible Conduct of Research; UK equivalent much weaker, revision has been in consultation but perhaps not much changed. Malcolm Read also impressed by the Australian code, and also by the German statement last year. Astrid Wissenburg from ESRC says it’s more than just the Code. She welcomes the collaborative approach, but wonders at the HEI-focused approach. Paul H expects links to RC-funded centres to be key. Sheila Cannell from Edinburgh on the suggested profession: does it emerge from researchers or from existing professions such as librarians, archivists, IT etc.? Malcolm says: yes!

Simon Courage (?) Sunderland, from aspect of researcher recognition: will citations of data be counted in next REF. Paul Hubbard says short answer is no, as no clear way of doing it. Not willing to act as incentiviser of everything; should be done because it’s right for the science and required by the institution, and only enforced if nothing else works. Schurer: of 70 staff, only 2 from library backgrounds, so doesn't agree with Malcolm. Read, yes lots of people think that, not all of them are right (???). Mark Thorley from NERC: really need to make use of the knowledge and expertise in existing data work. (CR: library key because of organisational centrality/locus?).

Closing speech by David Ingram from UCL: summing up and moving forward. Science being transformed. Bio-informatics now core discipline of biology (this very room, 2005). Research & practice increasingly information intensive. Multiple legacy information systems in use; problems in supporting and linking health care, research and industry. Government creating pervasive new ICT infrastructure & core services. Other national/international initiatives creating relevant infrastructures & services. Talks about the MRC Data Sharing Service. Some nice Escher pics: order and chaos: use & re-use of data needs to be careful and context aware. Got to be very sensitive to the nature & provenance of the data.

Information explosion in health care: shows growth in SNOMED terminology to around 200K terms, and this is getting away from us. Too many ways of saying the same thing and too hard to work out that it’s so. Comment on importance of stopping people doing (unuseful) things.

Data explosion. Transferred at 100 MB/sec (??), a Petabyte takes 116 days, so need a van for your backup.
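The 116-day figure can be checked with a quick calculation. This is only a sketch under assumptions of mine, not anything shown in the talk: it takes a Petabyte as 10^15 bytes, reads "100 MB/sec" as 100 million bytes per second sustained with no overhead, and uses a made-up helper name, `transfer_days`.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(size_bytes: float, rate_bytes_per_sec: float) -> float:
    """Days needed to move size_bytes at a constant sustained rate."""
    seconds = size_bytes / rate_bytes_per_sec
    return seconds / SECONDS_PER_DAY


# Assumptions: decimal units (1 PB = 10**15 bytes, 1 MB = 10**6 bytes).
PETABYTE = 10**15
RATE = 100 * 10**6  # 100 MB/sec

days = transfer_days(PETABYTE, RATE)
print(f"{days:.1f} days")  # about 116 days when rounded
```

On those assumptions the quoted figure holds. If the rate had been 100 megabits rather than megabytes per second, the same sum would come out eight times longer.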

The circle of knowledge: are there logical relationships, an epistemology about linking things across domains (CR: I may have missed the point there).

Dimensions of challenge
  • Diversity
  • Discipline and standards
  • Scale
  • Evolution over time
  • Willingness and capacity to engage (potential losers as well as winners)
  • Education and capacity at all levels
  • Implementation: be completely realistic
  • Shareable vs shared; to do with motivation and practice improvement
  • Commonality vs diversity; requirements
  • Confidential/restricted vs public; about access management and control
  • Competition vs cooperation; modus operandi
  • Global vs local; implementation, capacity, organisation and resources
  • Standardised for general applicability vs optimised for particular purposes; what approach will work
Direction of travel: problem is urgent; research requirements should drive data standards and shared services; research communities must value and own the outcomes; practical experience in implementation (CR: read, existing data centres etc?) should guide policy, strategy & investment; and rigorously control scope, scale and timing.

Zimmerli: IT as a horizontal technology that allows us to do previously impossible things.

Finishes with Escher picture Ascending and descending, to suggest a need for a constructive balance between top-down and bottom-up. Needs an experimental approach; will need to evolve. Times in 1834: unlikely the medical profession will ever use stethoscopes, as their use requires substantial training and involves great difficulties. UKRDS in this sort of area?

Martin Lewis closing: keep this on the agenda, can’t get everything right, so we need to start.


