Monday 2 March 2009

Report on Data Preservation in High Energy Physics

There's a really interesting (if somewhat telegraphic) report by Richard Mount of SLAC on the workshop on data preservation in high energy physics, published in the January 2009 issue of Ariadne. The workshop was held at DESY (Deutsches Elektronen-Synchrotron), Hamburg, Germany, on 26-28 January 2009.
"The workshop heard from HEP experiments long past (‘it’s hopeless to try now’), recent or almost past (‘we really must do something’) and included representatives from experiments just starting (‘interesting issue, but we’re really very busy right now’). We were told how luck and industry had succeeded in obtaining new results from 20-year-old data from the JADE experiment, and how the astronomy community apparently shames HEP by taking a formalised approach to preserving data in an intelligible format. Technical issues including preserving the bits and preserving the ability to run ancient software on long-dead operating systems were also addressed. The final input to the workshop was a somewhat asymmetric picture of the funding agency interests from the two sides of the Atlantic."
There's a great deal to digest in this report. I'd agree with its author on one section:
"Experience from Re-analysis of PETRA (and LEP) Data, Siegfried Bethke (Max-Planck-Institut für Physik)

For [Richard], this was the most fascinating talk of the workshop. It described ‘the only example of reviving and still using 25-30 year old data & software in HEP.’ JADE was an e+e- experiment at DESY’s PETRA collider. The PETRA (and SLAC’s PEP) data are unlikely to be superseded, and improved theoretical understanding of QCD (Quantum ChromoDynamics) now allows valuable new physics results to be obtained if it is possible to analyse the old data. Only JADE has succeeded in this, and that by a combination of industry and luck. A sample luck and industry anecdote:

‘The file containing the recorded luminosities of each run and fill, was stored on a private account and therefore lost when [the] DESY archive was cleaned up. Jan Olsson, when cleaning up his office in ~1997, found an old ASCII-printout of the luminosity file. Unfortunately, it was printed on green recycling paper - not suitable for scanning and OCR-ing. A secretary at Aachen re-typed it within 4 weeks. A checksum routine found (and recovered) only 4 typos.’

The key conclusion of the talk was: ‘archiving & re-use of data & software must be planned while [an] experiment is still in running mode!’ The fact that the talk documented how to succeed when no such planning had been done only served to strengthen the conclusion."
I had heard of this story from a separate source (Ken Peach, then at CCLRC), so it's good to see it confirmed. I think the article that eventuated is:
Bethke, S. (2000). Determination of the QCD coupling α_s. J. Phys. G: Nucl. Part. Phys., 26.
One particularly sad remark came from Amber Boehnlein (US Department of Energy (DOE)):
"Amber was clear about the DoE/HEP policy on data preservation: ‘there isn’t one.’"
The DCC got a mention from David Corney of STFC, who runs the Atlas Petabyte Data Store. However, I can confirm that we don't have 80 staff, or anywhere near that number (just under 13 FTE, if you're interested!). The reporter may have mixed us up with David's group, which I suspect is much larger.

In the closing sessions, we have Homer Neal,
"who set out a plan for work leading up to the next workshop. In his words:
  • ‘establish the clear justification for Data Preservation & Long Term Analysis
  • establish the means (and the feasibility of these means) by which this will be achieved
  • give guidance to the past, present and future experiments
  • a draft document by the next meeting @SLAC.’"
Well worth a read!

