Digital Curation Blog: March 2009

Tuesday, 31 March 2009

More on the ICTHES journals

I've had 3 responses by email to yesterday's post on the ICTHES journals (some responding to an associated email from me on the same issue). I'll summarise the two where quote permission was not explicit, and quote the third at length.

Adam Farquhar of the BL told me he had discussed it with their serials processing team under the voluntary scheme for legal deposit of digital material, and they will download the material into the BL's digital archive, where it will become accessible in the reading rooms (in due course, I guess). Wider access to such open access material should be available later under their digital library programme.

Tony Kidd, of the University of Glasgow and UKSG, suggested that an OpenLOCKSS type approach might be feasible. This is consistent with the email from Vicky Reich of LOCKSS; she told me I could post her response. So here it is:

"UK-LOCKSS can, and should, preserve the four ICTHES journals.
First step: Contact the publisher and ask them to leave the content online long enough for it to be ingested.
Second step: Ask the publisher to put online a LOCKSS permission statement.
Third step: Someone on the LOCKSS team does a small amount of technical work to get content ingested.
With these minimal actions, the content would be available to those institutions who are preserving it in their LOCKSS box.

If librarians want to rehost this O/A content for others, there are two additional requirements:
a) the content has to be licensed to allow re-publication by someone other than the original copyright holder. This is best done via a Creative Commons license.
b) institutions who hold the content have to be willing to bear the cost of hosting the journals on behalf of the world.
Librarians, even those who advocate open access have not taken coordinated steps to ensure the OA literature remains viable over the long term. Librarians are motivated to ensure perpetual access to very expensive subscription literature, but ensuring the safety of the OA literature is not a priority because... it's available, and it's free. [...]

When the majority of librarians who think open access is a "good idea" step up and preserve this content (and I don't mean shoving individual articles into institutional repositories), then we will be well on our way to building needed infrastructure"

See also the comment from Gavin Baker to yesterday's post, which I think backs up Vicky's last point:

"I've thought for a while that archiving OA journals should be a goal of the library and OA community, maybe via a consortium which would harvest new issues of journals listed in the DOAJ. (We can treat as separate, for these purposes, the question of short-term archiving in case a journal goes under from the question of long-term preservation.) Is there a reason why this approach isn't undertaken? Do people assume that any OA journal worth archiving is already being archived by somebody somewhere?"

Let's be quite clear: contrary to my simplistic assumptions, the Internet Archive is NOT undertaking this task!
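Vicky's second step, the LOCKSS permission statement, is a short piece of text a publisher puts on a web page so that the LOCKSS crawler knows it may collect the content. Here is a minimal sketch of checking a page for it, assuming the commonly used wording shown in `PERMISSION_TEXT`; the LOCKSS documentation is the authority on the exact text it accepts, and the sample page is hypothetical.

```python
import re

# Assumed wording of the standard LOCKSS permission statement; check the
# LOCKSS documentation for the exact text(s) it currently accepts.
PERMISSION_TEXT = (
    "LOCKSS system has permission to collect, preserve, "
    "and serve this Archival Unit"
)


def normalise(html: str) -> str:
    """Strip tags and collapse whitespace so markup cannot hide the sentence."""
    text = re.sub(r"<[^>]+>", "", html)
    return re.sub(r"\s+", " ", text).strip().lower()


def has_lockss_permission(html: str) -> bool:
    """Return True if the page text contains the permission statement."""
    return PERMISSION_TEXT.lower() in normalise(html)


# Hypothetical journal page a publisher might put online.
page = """<html><body>
<h1>Volume 1 (2005)</h1>
<p>LOCKSS system has permission to collect,
preserve, and serve this Archival Unit.</p>
</body></html>"""
print(has_lockss_permission(page))
```

Matching is case-insensitive and tolerant of line breaks, because a statement wrapped across lines in the HTML is still the same statement to a human reader.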

Monday, 30 March 2009

Charity closing, possible loss of 4 OA titles

I note from Gavin Baker's [not Peter Suber's; my mistake - CR] blog entry that the charity ICTHES is closing, and as a result its 4 OA journals may disappear. I have checked the Internet Archive and, in case we are tempted to be complacent about that as a system of preservation, found that only 1 issue out of the 18 issues from the 4 titles had actually been gathered there.

The journals are:

I see from Suncat that these titles are variously held by BL, Cambridge, Oxford and NLS, so I guess they are regarded as serious titles.
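Checking Internet Archive coverage issue by issue, as I did by hand, can be partly automated. This is a minimal sketch, assuming the Wayback Machine's public availability endpoint (`https://archive.org/wayback/available?url=...`) and its JSON reply containing an `archived_snapshots.closest` entry; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Public Wayback Machine availability endpoint (assumed stable; see the
# Internet Archive's API documentation).
AVAILABILITY_API = "https://archive.org/wayback/available?url="


def availability_url(page_url: str) -> str:
    """Build the availability query for one page URL."""
    return AVAILABILITY_API + urllib.parse.quote(page_url, safe="")


def fetch_reply(page_url: str) -> dict:
    """Ask the live service about one URL (needs network access)."""
    with urllib.request.urlopen(availability_url(page_url), timeout=30) as resp:
        return json.load(resp)


def is_archived(reply: dict) -> bool:
    """True if the reply reports an available closest snapshot."""
    closest = reply.get("archived_snapshots", {}).get("closest")
    return bool(closest and closest.get("available"))


def coverage(replies: dict) -> tuple:
    """Return (archived, total) over a mapping of issue URL -> reply."""
    archived = sum(1 for reply in replies.values() if is_archived(reply))
    return archived, len(replies)
```

In use, you would build `replies = {url: fetch_reply(url) for url in issue_urls}` from a list of issue URLs (placeholders such as `http://example.org/journal/vol1.pdf` in this sketch) and call `coverage(replies)`. My manual check came to 1 archived issue out of 18.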

Since UKSG is now in progress, I wondered if I could challenge UKSG on what it (or we, the community) can and/or should and/or will do about this! Would there be any opportunity in the programme to discuss this? (BTW unfortunately I am not able to come to Torquay, so I'm niggling, and indeed watching the #uksg tweets, from a distance.)

Options for action that I can see include:

a) some kind of sponsored crawl by Internet Archive

b) an emergency sponsored crawl by UKWAC or one of its participants (which may of course already have happened),

c) an urgent approach by a group of those participating in LOCKSS for the charity to join the programme (which may be stymied by lack of development effort and time); I think this would only make the content available to participants,

d) ditto for CLOCKSS, which at least might have the resources to make the content publicly available on a continuing basis,

e) sponsored ingest into something like Portico; again, only available to participants as I understand it

f) tacitly suggest libraries grab copies of the 18 or so PDFs, or

g) get a group of libraries to offer to host a historical archive of the titles for the charity...

h) appraise the titles as not worth preserving, and consign to the bitbin of history

i) ummm, errr, dither...

PS this blog entry is based on an email sent to the conference organisers and others, unfortunately after the conference has started. I have already had one response, from the BL, suggesting they would discuss with their journals people...

Thursday, 26 March 2009

Are research data facts and does it matter?

This should probably be titled “are research datasets comprised of facts and does it matter?”. It certainly does appear to matter whether datasets are comprised of facts, as in some legal jurisdictions facts are not copyrightable. If this is so, then without other protection such as the EU Database Right, or perhaps contract law, there is no basis for licences (you can’t control someone else’s use unless you have a right to exercise that control). This is part of the argument that led Science Commons to abandon attempts to find variants of the Creative Commons licences for datasets and databases, in favour of its proposals for putting datasets into the public domain.

This is an appealing solution in some research contexts, but worrying for other kinds of research. This is not for reasons of profit, but of ethics. Many medical, social science, anthropological, financial and other datasets contain data that are private, perhaps personally, culturally or corporately; these data were usually gathered with some kind of informed consent on use, and have to be protected. They cannot be placed into the public domain. If they are to be made available at all for re-use, there must be terms and conditions attached, ie some kind of licence.

I’m not attempting to argue the legal angle here. But I am interested in the “factness” of the data, that might inform the legal angle.

One might assume the height of Mount Everest is a fact. But check out the Wikipedia article on the subject to see a range of results. One might assume that the physical properties of chemical substances are facts, but check out Chemspider’s approach of assembling different measurements with their provenance (see http://www.chemspider.com/blog/there-are-no-facts-in-science-only-measurement-embedded-within-assumptions.html which links to Jean-Claude Bradley’s earlier UsefulChem article). Or think of a geospatial database: some elements must be pretty much “skill and judgment” rather than facts, such as the point where a river debouches into the sea. Finally, one might assume that the names of the winners of horse races are facts, and so they are, but only after a race committee has adjudicated on the photo-finish, or whether interference took place.
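Chemspider's approach suggests recording a property not as a single "fact" but as a set of measurements, each carrying its provenance. Here is a minimal Python sketch of that idea; the record fields, the compound and the values are all hypothetical, for illustration only.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Measurement:
    value: float  # the reported number
    unit: str
    method: str   # how it was measured
    source: str   # who reported it: the provenance


def summarise(measurements):
    """Describe the spread of reports instead of picking one 'true' value."""
    units = {m.unit for m in measurements}
    if len(units) != 1:
        raise ValueError(f"mixed units {units}; convert before comparing")
    values = [m.value for m in measurements]
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "median": median(values),
        "sources": sorted({m.source for m in measurements}),
    }


# Hypothetical melting-point reports for one (unnamed) compound.
reports = [
    Measurement(121.5, "degC", "capillary tube", "Lab A"),
    Measurement(122.0, "degC", "DSC", "Lab B"),
    Measurement(119.8, "degC", "hot-stage microscope", "Lab C"),
]
print(summarise(reports))
```

The summary keeps the sources alongside the range, so anyone disputing the consensus value can go back to who measured what, and how.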

In practice, what goes into datasets is rarely what is directly measured; it is almost always highly derived through various computations, adjustments and combinations. Environmental sciences can be quite explicit on this, see for example the British Atmospheric Data Centre’s description of the UARS (Upper Atmosphere Research Satellite) data levels. Here level 0 is the raw output data streaming from telemetry and instrumentation, effectively at the level of voltage changes; it is devoid of context. Level 1 data has been converted to the physical properties being measured, but will still be in formats tied to the instrument. Level 2 is post-calibration, and would refer to entities such as calculated geophysical profiles. Level 3 would be gridded and interpolated, and at this level there might be no clear correspondence with any observations (but there should be a clear computational lineage or provenance path linking these steps).
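The "computational lineage" between levels can be made concrete by having every processing step record what it did. This is a toy sketch: the conversion factor, calibration offset and cell size are invented placeholders, not UARS values, and each step stands in for much more complex real processing.

```python
from dataclasses import dataclass, field

# Hypothetical constants: placeholders, not real UARS processing values.
VOLTS_TO_KELVIN = 100.0
CALIBRATION_OFFSET = -1.5
CELL = 2  # samples averaged into each "grid" cell


@dataclass
class Product:
    level: int
    values: list
    lineage: list = field(default_factory=list)  # processing history


def step(product, description, fn):
    """Apply fn to one level's values, giving the next level plus a lineage entry."""
    new_level = product.level + 1
    entry = f"L{product.level}->L{new_level}: {description}"
    return Product(new_level, fn(product.values), product.lineage + [entry])


def to_physical(volts):
    return [v * VOLTS_TO_KELVIN for v in volts]


def calibrate(temps):
    return [t + CALIBRATION_OFFSET for t in temps]


def to_grid(temps):
    """Average neighbouring samples (a crude stand-in for gridding/interpolation)."""
    return [sum(temps[i:i + CELL]) / len(temps[i:i + CELL])
            for i in range(0, len(temps), CELL)]


level0 = Product(0, [2.10, 2.14, 2.20, 2.18])  # hypothetical raw voltages
level1 = step(level0, "convert volts to kelvin", to_physical)
level2 = step(level1, "apply calibration offset", calibrate)
level3 = step(level2, f"average into {CELL}-sample cells", to_grid)

print(level3.values)
for entry in level3.lineage:
    print(entry)
```

None of the level 3 values equals any individual level 2 value, which is the point made above, but the lineage list still records how each level was derived from the one before.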

So we seem to be in a situation where datasets contain highly derived data, at some creative distance from direct observations, and what we think of as facts are (or ought to be) contestable consensus based on potentially conflicting evidence.

In fact (hah!) after a while it becomes hard to think of any good example of real science/research data that are facts. The question is, does this matter enough to make any difference?

Tuesday, 24 March 2009

Ada Lovelace Day

I've been travelling and waiting pretty much all day, but it's still 24 March, so I have an hour or so to make a post for Ada Lovelace Day: a day to celebrate women excelling in technology. I didn't sign the pledge, partly as I knew I'd be out of touch, but I wanted to do a post anyway, if possible.

I haven't asked my heroines for permission to mention them, so I'll just use first names (possibly fictional). I hope they know who they are, if they read this.

Marita taught me APL (she was writing an APL interpreter), or at least enough of it to know I didn't want to know it. I've had an aversion to highly compressed code ever since! Marita also taught me a huge amount about project and people management. Mostly though, she taught me there were excellent ways to balance and integrate social and work life; ways I wished later I could emulate!

Christine was a friend; I never worked with her closely. But in between being a sociable, funny party animal mother, she took part in a management buyout, ran her company with a colleague for a decade or more, and sold her HR software across the world (still running when I moved to a new University a hemisphere away at the turn of the millennium). All apparently effortlessly!

Lynne turned me down for a job once but not twice (but then again). She showed you didn't have to be a techie specialist to understand the implications of technology for universities and libraries, and what could be achieved with the right application of forcefulness to the levers of power. She might not always agree with you, or you with her, but you always got a fair listen. And hey, did things get done!

I never worked with Carole either, nor do I know her well. But you absolutely have to admire someone who can run a large, highly successful Computer Science group, making practical and highly successful contributions to data curation, and give the most successful keynote at a conference, linking curation to the VivaLaDiva shoe shopping site!

I know several Nancys of influence and technical bent in libraries. The Nancy I want to celebrate here is a negotiator, a maker of mind meetings, one of those people who gets things to happen, things involving many other people, smoothly, productively. In, round and about technology, but not fixated on it; not technology for a purpose, but a purpose supported by technology. And a good, warm-hearted friend.

Some ten years after I started in computing, I remember being quite shocked at reading about the gender gap in computing. It had never struck me before, and indeed I don't think the male/female ratio then was any different from that in any other walk of professional life (in fact it was probably more balanced). This was before the PC, games and the geek factor. Even since then, I don't know what the stats say, but I've worked with so many excellent women in technology that the saddest thing about my list is the many great candidates I have to leave out if it's to be published today.

Secondary school exam results in Britain over the last few years (girls far outstripping boys, especially in maths) suggest we should have many more Ada Lovelace candidates in the future. I certainly hope so!

Monday, 23 March 2009

Friday, 20 March 2009

International Repositories Infrastructure workshop

Amsterdam in Spring, who could turn down the offer? Perhaps it would be irresistible a little later in Spring than early March (brrr), but when the sun did come out, and the workshop was done, it was lovely. I was in Amsterdam for a curious international workshop on repository infrastructure, funded by JISC and SURF, with the DRIVER project. It turns out I had no idea what repository infrastructure meant before I went, and I guess I know only a little more now.

I was asked to take part in this workshop late last year, and was supposed to stimulate discussion on a use case on preservation. There were 4 use cases:

Preservation
Access
Deposit Workflows and Environments
Online reputation and reporting

Maybe some of us didn’t do enough work on this, as somewhere along the line this morphed into 4 proposed action plans, each with a breakout group:

Identifier infrastructure
Citation Services
Repository handshake
Organisation Structures

The first 3 of those probably fit with the last 2 use cases, but I don’t know what happened to the others, or where the last one came from.

If you use Twitter, you can read (in reverse order) some of the backchat on this through the hashtags #repinf09 (for the whole workshop), #rihs09 (for the handshake group) and #reporg09 (for the organization group). If you do have a look, bear in mind that what you read is devoid of context, and represents only a small part of what was going on, probably from a vocal, cantankerous, ornery, rumbustious and maybe just plain rude subset! (As a new Twit, I haven’t yet worked out how much ruder tweeting during a meeting is, compared with passing notes in class, say.) Oh, and not all the Twits were actually there.

So what happened? The introductory presentation was given by Norbert Lossau of DRIVER. Lots of interesting stuff, but I was a bit taken aback by claims that the defining vision of repositories was the Berlin Open Access Declaration, and that data were out of scope. Personally I think Open Access is most often a Good Thing, but sometimes it’s inappropriate: repositories of sensitive data, for instance. And I’m not at all sure that you can cleanly distinguish document from data, or that it makes sense to do so, especially when supplementary materials and extended documents come into the frame.

I was in the organisation group, so I don’t know in detail what happened in the other groups, but we did come together right at the end for a plenary. I was particularly impressed by what Les Carr said about the Citation Group, including the idea of creating a test corpus, and also a competition for the best text mining algorithms to find citations and references (and even their surrounding context: “we illustrate the errors in Rusbridge (2006)” is clearly different from “following the excellent suggestions given in Carr (2009)”). I don’t think much was said about the citation microformat (perhaps it really is dead?), nor about whether text mining might be aided by embedding RDF etc in documents. Nevertheless, a sensible plan of work was laid out.

On identifiers, it also sounded as if some sensible progress was made. In particular, it seemed as if identifiers for authors (as disambiguation tools for citations etc) went from being ruled out to being a strong part of the work plan. They had the ultimately cool mindmap.

From the repository handshake tweets, it looked like they had a bumpy ride getting their 8 use cases agreed, but did get through their “stormin’, normin’ and formin’” stages into at least the first part of “performin’”. I did like “beg” as a repository verb!

So to the repository organisation group. Here too, there were plenty of storms, and maybe the odd teacup. It was quite hard to work out what sort of organization we were interested in; what it was intended to do. There was a strong feeling of unspoken sub-text; whatever it was we were talking about was proposed to launch on October 7. One of the best quotes for me came from Sandy Payette; she said the DSpace/FEDORA organizations could provide some of the things that were being talked about, but they would have to “feel the hunger”. It was difficult to feel the hunger for whatever this thing was. At one point, I’m afraid I compared it unfavourably with jelly; whenever I thought I had hold of it, something bulged out somewhere else (yucky image, sorry).

At the start of day 2 our organisers had a new approach, which worked better. They broke the breakout into yet smaller breakouts, and gave each of us a role to play and 6 questions to answer. My table was “funders”. That’s OK, I’ve been one, I could handle that (I even funded repository work back in 1998 or thereabouts, think Harnad’s CogPrints and Krichel’s WoPEc/RePEc, which made it particularly annoying to be patronized at one point as someone who obviously didn’t know much about repositories). I think the two main “expectations” for our group were “clarity of aims” and “benefits to justify investment”. Anyway, the result of all this was that, by the skin of our teeth, we did have something to say at the final plenary. Not that I yet know quite what this organization would be for!

Was it entirely coincidental in this context that the DRIVER project was ending in a few months? Well, we had heard that DRIVER did repository infrastructure, that DRIVER did published papers, not data, that the DRIVER Guidelines enabled interoperability, that the DRIVER Guidelines were in demand internationally and had been translated into (2? 3?) other languages, and that the continuation of the DRIVER brand was seen as important. Some Twit asked “If the DRIVER Guidelines are the answer, what is the question?” Join the dots…

The workshop ended with some closing remarks from Cliff Lynch, as perspicacious as ever, but slightly more hoarse than usual (not from shouting at us; too much inter-planetary travel, I suspect). And after the workshop there was a “funders’ meeting”. Real funders, the kind with money. What went on there, I have no idea!

Tuesday, 10 March 2009

Obsolete drives; sideways thinking?

I’ve been trying to write this post for ages, but the draft never seemed right, so this is starting all over again, blank sheet.

Many people have a bunch of stuff on media for which it’s hard to find a working drive, whether disk or tape, or punched card or paper tape, or… A feature of the response to my “12 files for Christmas” post was that those who responded have their interesting stuff locked away on such media. There may be other challenges to reading it, but the first is getting the content off the media.

We tend to go all gloom and doom about this. I’ve got stuff on Iomega Jaz cassettes from an earlier Mac, so without a Jaz drive I can’t read it. Might as well chuck it in a bin? Folks have stuff on early Mac 3.5” drives, or Amiga drives, neither of which can be easily read on current systems. And 5.25” or 8” drives are even scarcer in working condition.

Is the only answer to find a working drive on a working computer of the day, the "technology preservation" approach? I remember an ancient engineer scolding me years ago for failure of the imagination, on the subject of disappearing 7-track magnetic tape drives. “Young man” he gruffed, “if you really care about this stuff, lay the tape on your kitchen table, cover it with paper, scatter iron filings across the top, give it a tap, and read the bits off with a magnifying glass!” He didn’t mention that I had to know the relationship between domains and bits, the parity and other features of character encodings, the block structure of the tapes etc; in those days we knew that stuff!
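The engineer's "stuff we knew" is partly computational. On 7-track tape each frame across the tape carries six data bits plus one parity bit. This is a sketch of decoding frames once they have been read off, by iron filings or otherwise. Whether a given tape used odd or even parity depended on how it was recorded, so it is a parameter here, and the bit layout within a frame (parity in bit 6) is an assumption.

```python
def decode_frame(frame: int, odd_parity: bool = True):
    """Split one 7-bit frame into (6-bit data value, parity_ok).

    Assumed layout: bit 6 is the parity bit, bits 0-5 carry the data.
    """
    if not 0 <= frame < 0b10000000:
        raise ValueError("a 7-track frame holds exactly 7 bits")
    data = frame & 0b111111
    ones = bin(frame).count("1")  # counts data bits and the parity bit
    parity_ok = ones % 2 == (1 if odd_parity else 0)
    return data, parity_ok


def decode_block(frames, odd_parity=True):
    """Decode a block of frames, reporting where parity checks fail."""
    values, bad = [], []
    for position, frame in enumerate(frames):
        value, ok = decode_frame(frame, odd_parity)
        values.append(value)
        if not ok:
            bad.append(position)
    return values, bad


# Three frames as they might be read off the tape; the last has a dropped bit.
print(decode_block([0b0000001, 0b1000011, 0b0000011]))  # → ([1, 3, 3], [2])
```

The 6-bit values would still need mapping through whatever character code the original machine used before they mean anything. That is another piece of knowledge the sketch deliberately leaves out.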

Now I’ve done the maths, and this is NOT feasible with a 3.5”, 2 MByte floppy disk (the iron filings are too big)! It might just be feasible with some of the 8” floppy formats. Different technological approaches (not iron filings, but some other means of making magnetic domains visible, if such means exist) might be feasible for higher densities. In any such case, you would end up with an image of the bits on the disk, in concentric tracks. From here, you have a computational task, or a series of such tasks: identify the tracks, separate into sectors, decode into bytes or characters, decode into directories and file structures, process into files, and now you have something to operate on! Yes, it’s tough to do all that, but you would be able to combine lots of contributions together to do it.
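The first computational steps in that chain can be sketched. Many 8” disks were recorded in single-density FM, where every bit cell begins with a clock pulse and a data 1 adds a second pulse mid-cell. The sketch assumes the image of one track has already been turned into a flat list of clock/data pulses. It cuts bytes into fixed-size sectors using a made-up sector size rather than locating real address marks, which deliberately omit some clock pulses. Real single-density 8” formats commonly used 128-byte sectors.

```python
def fm_encode(data: bytes):
    """Produce FM cells (clock, data pairs), MSB first. Used here only for testing."""
    cells = []
    for byte in data:
        for shift in range(7, -1, -1):
            cells += [1, (byte >> shift) & 1]
    return cells


def fm_decode(cells):
    """Recover data bits from FM cells and list cells with a missing clock pulse."""
    if len(cells) % 2:
        raise ValueError("expected clock/data pairs")
    bits, missing_clock = [], []
    for i in range(0, len(cells), 2):
        clock, data = cells[i], cells[i + 1]
        if clock != 1:
            missing_clock.append(i // 2)  # possibly an address mark, or damage
        bits.append(data)
    return bits, missing_clock


def bits_to_bytes(bits):
    """Pack bits (most significant first) into bytes, ignoring a partial tail."""
    out = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


SECTOR_SIZE = 4  # hypothetical, to keep the example small


def split_sectors(data: bytes, size: int = SECTOR_SIZE):
    """Cut a track's bytes into fixed-size sectors (ignoring gaps and headers)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


bits, missing = fm_decode(fm_encode(b"HELLO 8IN"))
print(split_sectors(bits_to_bytes(bits)), missing)
```

Each function is one link in the chain of tracks, bits, bytes, sectors and files, which is why contributions from different people could be combined. Decoding directories and file structures would be further, format-specific steps on top of these.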

Now, I’m NOT saying that’s the best approach. I AM saying, 8” drives were advanced technology when introduced more than 20 years ago. The requirements for a production 8” drive included high read/write performance (for the time). The requirements now have changed. Performance isn't the issue; scraping every last reliable bit off that drive is!

Today, more than 20 years later, storage engineers in their clean-room high tech environments can build amazingly high performance production drives with previously inconceivable capacities and speeds. But what could you do today with a Masters-level Electronic Engineering lab, some bright students, a few hundred dollars, these ancient media formats, and a much-reduced performance requirement? I don’t care if it takes 10 minutes to read my disk, as long as I can do it!

Is this important? These disks have been stashed away for 20 years, who cares what’s on them? Well, in many cases no-one does. But just think who was using those early drives, and what for. They certainly include authors, poets, scientists, scholars, politicians, philosophers… and many of those people, if not in the first flush of youth then, are moving towards retirement now. Some of these will be candidates for approaches from libraries and archives interested in their “papers”. Previously this meant boxes of paper, photos, diaries etc. Now it includes old media, dropped in the box years ago. Who knows what treasures they may contain? (See the Digital Lives project for examples.)

So, I think there is or will be an emerging interest in these obsolete media and their contents. And at the same time, I think (hope) it would represent an interesting challenge to set students. Perhaps not quite in the same class as building a car to drive across the country on solar power, or robots to play football, but interesting in its own different way.

One of those combined Computing Science and Electronic Engineering schools would be perfect. Would a prize help? Maybe this could factor in something like the Digital Preservation Awards one year? A new kind of Digital Preservation Challenge?

Monday, 9 March 2009

Repository preservation revisited

Are institutional repositories set up and resourced to preserve their contents over the long term? Potentially contradictory evidence has emerged from my various questions related to this topic.

You may remember that on the Digital Curation Blog and the JISC-Repositories JISCmail list on 23 February 2009, I referred to some feedback from two Ideas on the JISC Ideascale site last year, and asked 3 further questions relating to repository managers’ views of the intentions of their repositories. Given a low rate of response to the original posting (which asked for votes on the original Ideascale site), I followed this up on the JISC-Repositories list (but through oversight, not on the blog), offering the same 3 questions in a Doodle poll. The results of the several different votes appear contradictory, although I hope we can glean something useful from them.

I should emphasise that this is definitely not methodologically sound research; in fact, there are methodological holes here large enough to drive a Mack truck through! Nevertheless, we may be able to glean something useful. To recap, here are the various questions I asked, with a brief description of their audience, plus the outcomes:

a) Audience: JISC-selected “expert” group of developers, repository managers and assorted luminaries. The second Idea went to the same audience, a little later.
Idea: “The repository should be a full OAIS [CCSDS 2002] preservation system.” Result: 3 votes in favour, 16 votes against, net -13 votes.
Idea: “Repository should aspire to make contents accessible and usable over the medium term.” Result: 13 votes in favour, 1 vote against, net +12 votes.
b) Audience: JISC-Repositories list and Digital Curation Blog readership. Three Ideas on Ideascale, with the results shown (note, respondents did not need to identify themselves):
My repository does not aim for accessibility and/or usability of its contents beyond the short term (say 3 years). Result: 2 votes in favour, none against.
My repository aims for accessibility and/or usability of its contents for the medium term (say 4 to 10 years). Result: 5 votes in favour, none against.
My repository aims for accessibility and/or usability of its contents for the long term (say greater than 10 years). Result: 8 votes in favour, 1 vote against, net +7 votes.
A further comment was left on the Digital Curation Blog, to the effect that since most repository managers were mainly seeing deposit of PDFs, they felt (perhaps naively) sufficiently confident to assume these would be useable for 10 years.

c) Audience: JISC-Repositories list. Three exclusive options on a Doodle poll, exact wording as in (b), no option to vote against any option, with the results shown below (note, Doodle asks respondents to provide a name and most did, with affiliation, although there is no validation of the name supplied):
My repository does not aim for accessibility and/or usability of its contents beyond the short term (say 3 years). Result: 1 vote in favour.
My repository aims for accessibility and/or usability of its contents for the medium term (say 4 to 10 years). Result: 0 votes in favour.
My repository aims for accessibility and/or usability of its contents for the long term (say greater than 10 years). Result: 22 votes in favour.

I guess the first thing is to notice the differences between the 3 sets of results. The first would imply that long term is definitely off the agenda, and medium term is reasonable. The second is a 50-50 split between long term and the short/medium term combination. The third is overwhelmingly in favour of long term (as defined).
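The tallies above are simple enough to check. This short sketch recomputes the net votes and the long-term option's share of all net votes in polls (b) and (c). The numbers are those reported above, and the dictionary layout is just one convenient choice.

```python
from fractions import Fraction

# Tallies reported above, as (in favour, against). The Doodle poll (c)
# had no way to vote against, so "against" is 0 there.
poll_a = {"full OAIS": (3, 16), "medium-term access": (13, 1)}
poll_b = {"short": (2, 0), "medium": (5, 0), "long": (8, 1)}
poll_c = {"short": (1, 0), "medium": (0, 0), "long": (22, 0)}


def net(tally):
    favour, against = tally
    return favour - against


def long_term_share(poll):
    """Fraction of all net votes that went to the long-term option."""
    nets = {option: net(tally) for option, tally in poll.items()}
    return Fraction(nets["long"], sum(nets.values()))


print({option: net(t) for option, t in poll_a.items()})
for name, poll in [("b", poll_b), ("c", poll_c)]:
    print(name, f"{float(long_term_share(poll)):.0%}")
```

This gives net -13 and +12 for the two Ideas in (a), and a long-term share of 50% for (b) against about 96% for (c). The same question, asked of overlapping audiences, produces very different answers.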

By now you can also see at least some of the methodological problems, including differing audiences, differing anonymity, and differing wording (firstly in relation to the use of the term “OAIS”, and secondly in relation to the timescales attached to short, medium and long term). So, you can draw your own conclusions, including that none can be drawn from the available data!

Note, I would not draw any conclusions from the actual numerical votes on their own, but perhaps we can from the relative values within each group. However, ever hasty if not foolhardy, here are my own tentative interpretations:

First, even “experts” are alarmed at the potential implications of the term “OAIS”.
Second, repository managers don’t believe that keeping resources accessible and/or usable for 10 years (in the context of the types of material they currently manage in repositories) will give them major problems.
Third, repository managers don’t identify “accessibility and/or usability of its contents for the long term” as implying the mechanisms of an OAIS (this is perhaps rather a stretch given my second conclusion).

So, where to next? I’m thinking of asking some further questions, again of the JISC-Repositories list and the audience of the Digital Curation Blog. However, this time I’m asking for feedback on the questions, before setting up the Doodle poll. My draft texts are

My repository is resourced and is intended to keep its contents accessible and usable for the long term, through potential technology and community changes, implying at least some of the requirements of an OAIS.
My repository is resourced and is intended to keep its contents accessible and usable unless there are significant changes in technology or community, ie it does not aim to be an OAIS.
Some other choice, please explain in free text…

Are those reasonable questions? Or perhaps, please help me improve them!

This post is made both to the Digital Curation Blog and to the JISC-repositories list...

OAIS: CCSDS. (2002). Reference Model for an Open Archival Information System (OAIS). Retrieved from http://public.ccsds.org/publications/archive/650x0b1.pdf.

Monday, 2 March 2009

Report on Data Preservation in High Energy Physics

There's a really interesting (if somewhat telegraphic) report by Richard Mount of SLAC on the workshop on data preservation in high energy physics, published in the January 2009 issue of Ariadne. The workshop was held at DESY (Deutsche Elektronen-Synchrotron), Hamburg, Germany, on 26-28 January 2009.

"The workshop heard from HEP experiments long past (‘it’s hopeless to try now’), recent or almost past (‘we really must do something’) and included representatives from experiments just starting (‘interesting issue, but we’re really very busy right now’). We were told how luck and industry had succeeded in obtaining new results from 20-year-old data from the JADE experiment, and how the astronomy community apparently shames HEP by taking a formalised approach to preserving data in an intelligible format. Technical issues including preserving the bits and preserving the ability to run ancient software on long-dead operating systems were also addressed. The final input to the workshop was a somewhat asymmetric picture of the funding agency interests from the two sides of the Atlantic."

There's a great deal to digest in this report. I'd agree with its author on one section:

"Experience from Re-analysis of PETRA (and LEP) Data, Siegfried Bethke (Max-Planck-Institut für Physik)

For [Richard], this was the most fascinating talk of the workshop. It described ‘the only example of reviving and still using 25-30 year old data & software in HEP.’ JADE was an e+e- experiment at DESY’s PETRA collider. The PETRA (and SLAC’s PEP) data are unlikely to be superseded, and improved theoretical understanding of QCD (Quantum ChromoDynamics) now allows valuable new physics results to be obtained if it is possible to analyse the old data. Only JADE has succeeded in this, and that by a combination of industry and luck. A sample luck and industry anecdote:

‘The file containing the recorded luminosities of each run and fill, was stored on a private account and therefore lost when [the] DESY archive was cleaned up. Jan Olsson, when cleaning up his office in ~1997, found an old ASCII-printout of the luminosity file. Unfortunately, it was printed on green recycling paper - not suitable for scanning and OCR-ing. A secretary at Aachen re-typed it within 4 weeks. A checksum routine found (and recovered) only 4 typos.’

The key conclusion of the talk was: ‘archiving & re-use of data & software must be planned while [an] experiment is still in running mode!’ The fact that the talk documented how to succeed when no such planning had been done only served to strengthen the conclusion."

I had heard of this story from a separate source (Ken Peach, then at CCLRC), so it's good to see it confirmed. I think the article that eventuated is

Bethke, S. (2000). Determination of the QCD coupling α_s. J. Phys. G: Nucl. Part. Phys., 26.
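The report doesn't say how JADE's checksum routine worked. One hypothetical way a checksum can both find and repair a single typo is to keep row and column totals. A mistyped entry then shows up as exactly one bad row and one bad column, and the row's discrepancy gives the correction. The sketch below illustrates that principle only. The numbers are invented, and integers are used to avoid floating-point rounding.

```python
def find_and_fix(table, row_sums, col_sums):
    """Locate and repair one mistyped entry using row and column totals.

    Returns (row, col, corrected_value), or None if every total agrees.
    Modifies `table` in place. Assumes at most one entry is wrong.
    """
    bad_rows = [r for r, row in enumerate(table) if sum(row) != row_sums[r]]
    bad_cols = [c for c, total in enumerate(col_sums)
                if sum(row[c] for row in table) != total]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) != 1 or len(bad_cols) != 1:
        raise ValueError("cannot localise: more than one error or a bad total")
    r, c = bad_rows[0], bad_cols[0]
    table[r][c] += row_sums[r] - sum(table[r])
    return r, c, table[r][c]


# Invented integer "luminosity" rows, with totals kept alongside.
retyped = [[10, 20, 30], [40, 56, 60]]  # 56 should have been 50
print(find_and_fix(retyped, row_sums=[60, 150], col_sums=[50, 70, 90]))
```

With only per-row checksums you could find a bad row but not the bad entry within it. The second set of totals is what turns detection into recovery.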

One particularly sad remark came from Amber Boehnlein (US Department of Energy, DOE):

"Amber was clear about the DoE/HEP policy on data preservation: ‘there isn’t one.’"

The DCC got a mention from David Corney of STFC, who runs the Atlas Petabyte Data Store, however I can confirm that we don't have 80 staff, or anywhere near that number (just under 13 FTE, if you're interested!). The reporter may have mixed us up with David's group, which I suspect is much larger.

In the closing sessions, we have Homer Neal,

"who set out a plan for work leading up to the next workshop. In his words:
‘establish the clear justification for Data Preservation & Long Term Analysis
establish the means (and the feasibility of these means) by which this will be achieved
give guidance to the past, present and future experiments
a draft document by the next meeting @SLAC.’"

Well worth a read!
