Digital Curation Blog: Significant Properties


Monday, 29 September 2008

iPres 2008 Preservation planning session

Starting with Dirk von Suchodoletz from Freiburg, talking about emulation. I’ve always had a problem with emulation; perhaps I’ve too long a memory of those early days of MS-DOS, when emulators were quite good at running well-behaved programs, but were rubbish at many common programs, which broke the rule-book and went straight into the interrupt vectors to get performance. OK, if you’re not that old, maybe emulation does work better these days, and is even getting trendy under the new name of virtualisation. My other problem is also a virtue of emulation: you will be presented with the object’s “original” interface, or look and feel. This sounds good, but in practice the world has moved on and most people don’t like old interfaces, even if historians may want them. I guess emulation can work well for objects which are “viewed”, in some sense; it’s not clear to me that one can easily interwork an emulated object with a current object. Dirk does point out that emulation does require running the original software, which itself may create licensing problems.

Gareth Knight talking about significant properties. We’ve discussed these before; they are conspicuously absent (is that an oxymoron???) from OAIS. An earlier speaker suggested the properties were less about the object than about the aims of the organisation and its community. But Gareth is pursuing the properties of the object, as part of the JISC-funded InSPECT project (http://www.significantproperties.org.uk/). They have a model of significant properties, and are developing a data dictionary for SPs which, after consultation, they expect to turn into an XML schema. Their model has an object with components with properties, and also agents that link to all of these. Each SP has an identifier, a title, a description and a function. It sounds like a lot of work, and it is not yet clear how much can be shared; perhaps most objects in one repository can share a few catalogued SPs. It seems unlikely that most repositories could share them, as repositories would have different views on what is significant.
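To make the shape of that model a little more concrete, here is a minimal Python sketch of it as described in the talk: an object with components, properties attached to both, agents linked at every level, and each property carrying an identifier, a title, a description and a function. The class and field names are illustrative assumptions, not the InSPECT data dictionary or its eventual XML schema. Sharing a catalogued SP across objects simply means referencing the same property record from more than one place.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """A person or organisation linked to an object, component or property."""
    name: str
    role: str


@dataclass
class SignificantProperty:
    """One catalogued property: the four fields named in the talk, plus agent links."""
    identifier: str
    title: str
    description: str
    function: str  # what the property is for, e.g. "structure" or "rendering"
    agents: list[Agent] = field(default_factory=list)


@dataclass
class Component:
    name: str
    properties: list[SignificantProperty] = field(default_factory=list)
    agents: list[Agent] = field(default_factory=list)


@dataclass
class DigitalObject:
    name: str
    components: list[Component] = field(default_factory=list)
    properties: list[SignificantProperty] = field(default_factory=list)
    agents: list[Agent] = field(default_factory=list)

    def all_properties(self) -> list[SignificantProperty]:
        """Object-level properties first, then each component's, in order."""
        collected = list(self.properties)
        for component in self.components:
            collected.extend(component.properties)
        return collected


# Hypothetical usage: one catalogued property shared by two objects
line_breaks = SignificantProperty(
    "sp-001", "Line breaks", "Position of line breaks in the text", "structure"
)
letter = DigitalObject("letter", components=[Component("body", [line_breaks])])
memo = DigitalObject("memo", properties=[line_breaks])
print([p.identifier for p in letter.all_properties()])  # ['sp-001']
print(letter.all_properties()[0] is memo.all_properties()[0])  # True
```

Whether a repository would really keep one shared record per property, rather than a copy per object, is exactly the open question about how much can be shared.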

Alex Ball is talking about the problems in curating engineering and CAD data. In what appears to be a lose-lose situation for all of us, engineering is an area with extremely long time requirements for preserving the data, but with increasing problems in doing so, given the multiple strangleholds of IPR: on the data themselves, and on the encodings and formats tied up in specific, tightly controlled versions of high-cost CAD software. This is coupled with “engineering as a service” approaches, which might encourage organisations to keep a tight hold on this IPR. One approach here is to look for lightweight formats (he didn’t say desiccated but I will) to which data can be reduced; they have a solution called LiMMA for this. Another approach is linking preservation planning with Product Lifecycle Management. In this area they are developing a Registry/Repository of Representation Information for Engineering (RRoRIfE). He made the interesting point that for marketing purposes the significant properties would include approximate geometry and no tolerances, but for manufacturing you would want exact geometry and detailed tolerances.

Finally for this session, Mark Guttenbrunner from TUV on evaluating approaches to preserving console video games. These systems date from the 1970s, and new generations and models are introduced frequently in this very competitive area. The subject might sound trivial, but Lynne Brindley had earlier pointed out that one generation’s ephemera can be another generation’s important resource. In fact, there is already huge public interest in historical computer games. They used the PLANETS Preservation Planning approach to evaluate three strategies: simple video preservation and two emulation approaches. It was clear that IPR could become a real issue, as some games manufacturers are particularly aggressive in protecting their IPR against reverse engineering.

In the Q & A session, I asked the panellists whether they thought the current revision process for OAIS should include the concept of significant properties, currently absent. A couple of panellists felt that it should, and one thought that the concept of representation information should be cleaned up first! Session Chair Kevin Ashley asked whether anyone present was involved in the revision of this critical standard, and no-one would admit to it; he pointed out how worrying this was.

Tuesday, 8 April 2008

Seriously Seeking Significance

Yesterday's event at the British Library on Significant Properties in digital objects was a real gem. The programme was packed to the hilt, with eleven different speakers plus 45 minutes for audience discussion at the end of the day. It was a fantastic opportunity to hear more about the SigProp research that JISC has funded, as well as how significant properties are being explored in a number of other contexts, such as the PLANETS research into preservation characterisation.

Andrew Wilson started the day with a great and very targeted keynote that explored the background to significant properties research and what significant properties mean to him. I was heartened when he almost immediately kicked off by talking about authenticity and significant properties, because authenticity is a big deal for me insofar as preservation is concerned! But this isn't a post about what we mean by authenticity in a digital environment; I shall save that for another day/time/place. Hot on his heels came Stephen Grace from CeRch, with an overview of the InSPECT project and future work, closely followed by David Duce presenting the conclusions of the Vector Images significant properties study. Interestingly, this study measured significance slightly differently from the InSPECT project: InSPECT used a scale of 0 to 10, whereas the vector images study used 0 to 9. This may be a minor deviation at the moment, but it might well assume more relevance when dealing with automated processes for migrating and assessing collections of mixed object types. Mike Stapleton then presented the results of the Moving Images study and, after a short break, on came Brian Matthews with the results of the Software study and Richard Davis with the results of the eLearning Objects study.
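As a small illustration of why the differing scales matter once processes are automated: a raw score of 9 is the top of the vector images scale but not of InSPECT's. The Python sketch below rescales both onto 0.0 to 1.0 before applying a common threshold. It assumes both scales are linear with 0 as least significant, which is an assumption rather than something either study is stated to guarantee; the study keys and the 0.5 threshold are hypothetical.

```python
# Maximum value of each study's significance scale, as reported at the workshop.
# The dictionary keys are hypothetical labels, not identifiers used by either study.
SCALE_MAX = {"inspect": 10, "vector_images": 9}


def normalise(score: int, study: str) -> float:
    """Rescale a score from 0..SCALE_MAX[study] onto 0.0..1.0 (assumes a linear scale)."""
    top = SCALE_MAX[study]
    if not 0 <= score <= top:
        raise ValueError(f"score {score} is outside 0..{top} for {study}")
    return score / top


def is_significant(score: int, study: str, threshold: float = 0.5) -> bool:
    """Compare scores from different studies against one common (hypothetical) threshold."""
    return normalise(score, study) >= threshold


# The same raw score means different things on the two scales
print(normalise(9, "vector_images"))  # 1.0
print(normalise(9, "inspect"))        # 0.9
print(is_significant(4, "vector_images"))  # False (4/9 is about 0.44)
```

A naive pipeline that compared raw scores would treat a vector-image property scored 9 as slightly less significant than an InSPECT property scored 10, even though both sit at the top of their scales.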

It quickly became clear that there was some variance in people's understanding of significant properties, particularly when one speaker stated that for different preservation approaches they would need different significant properties to achieve the desired level of performance. This is different to how I perceive significant properties. For me - and for several other speakers - significant properties define the essence of the object and are those elements that must be preserved in order to retain the ability to reproduce an authentic version of the object. Selecting different significant properties according to the preservation approach surely implies a different underlying understanding and use of the term significant properties.

What does this all go to show? That it's a case of different strokes for different folks? Well, to some extent, yes. It was widely accepted several years ago that different sectors had different requirements insofar as preservation was concerned - I remember attending an ERPANET workshop in Amsterdam in 2004, for example, that clearly illustrated just this point. And yesterday's audience and speakers represented an array of sectors with different requirements for preservation. So the whole concept of significant properties, and the use of the term across different sectors, is something I think we would benefit from returning to and discussing further.

The afternoon's sessions were, I think, intended to put a different perspective on the day. We heard from the PLANETS project, Barclays Bank, the DCC SCARP project, the SIGPROPS project from Chapel Hill, and a presentation on the relationship between Representation Information and significant properties. Cal Lee's presentation (SIGPROPS) on preserving attachments from email messages was fascinating, and I suspect I'm not alone in wishing we could have heard more from him, but there simply wasn't time. We would probably have had much more discussion if the programme had been spread over two days; the content certainly justified it.

As always, the presentations will be available from the DPC website in due course. Keep an eye out, though, for the final report of the InSPECT project. It isn't finished yet, owing to the move from the AHDS to CeRch, but I expect it will be a fascinating read when it is done.

Saturday, 5 April 2008

When is revisability a significant property?

I’ve been thinking, off and on, about significant properties, and reading the papers for the significant properties workshop on Monday. Excellent papers they are, too, and they suggest a fascinating day (the papers are available online). I’m not entirely through them yet, so maybe this post is premature, but I didn’t see much mention of a property that came to mind when I was first thinking about the workshop: revisability (which I guess in some sense fits into the “behaviour” group of properties that I mentioned in the previous post).

In the analogue world, few resources that we might want to keep over the long term are revisable. Some can easily be annotated; you can write in the margins of books, and on the backs of photos, although not so easily on films or videos. But the annotations can be readily distinguished from the real thing.

In the digital world, however, most resources are at least plastic and very frequently revisable. By plastic I mean things like email messages or web pages that look very different depending on the tools the reader chooses. By revisable I mean things like word processing documents or spreadsheets. When someone sends me one of the latter as an email attachment, I will almost always open it with a word processor (a machine for writing and revising), rather than a document reader. The same thing happens when I download one from the web. For web pages, the space bar is a shortcut for “page down”, and quite often I find myself attempting to use the same shortcut on a downloaded word processing document. In doing so, I have revised it (even if only in trivial ways). Typically, if I want to save the downloaded document in some logical place on my laptop, I’ll use “Save As” from within the word processor, potentially saving my revisions as invisible changes.

Annotation is rarer in the digital world, although many word processors now have excellent comment and edit-tracking facilities (by the way, there’s a nice blog post on da blog which points to an OR08 presentation on annotations, and an announcement on the DCC web site about an annotation product that looks interesting, and much of social networking is about annotation, and…).

My feeling is that there’s a default assumption in the analogue world that a document is not revisable, and an opposite assumption in the digital world.

One of the ways we deal with this when we worry about it, at least for documents designed to be read by humans, is to use PDF. We tend to think of PDF as a non-revisable format, although for those who pay for the tools it is perhaps more revisable than we think. PDF/A, I think, was designed to take out those elements that promote revisability in the documents.

If you are given a digital document and are asked to preserve it, the default assumption nearly always seems to kick in. People talk about preserving spreadsheets, worry about whether they can capture the formulae, or about preserving word processing documents, and worry about whether the field codes will be damaged. In some cases, this is entirely reasonable; in others it doesn’t matter a hoot.

When I read the InSPECT Framework for the definition of significant properties, I was delighted to find the FRBR model referenced, but disappointed that it was subsequently ignored. To my mind, this model is critical when thinking about preservation, and about significant properties in particular. In the FRBR object model, there are four levels of abstraction:

Work (the most abstract view of the intellectual creation)
Expression (a realisation of the work, perhaps a book or a film)
Manifestation (eg a particular edition of the book)
Item (eg a particular copy of the book; possibly less important in the digital world, given the triviality and transience of making copies).

The model does come from the world of cataloguing analogue objects, so it’s maybe not ideal, but it captures some important ideas. I’ll assert (without proof) that for many libraries the work is the thing; they are happy to have the latest edition, maybe the cheaper paperback (special collections are different, of course). Again I’ll assert (without proof) that for archives (with their orientation towards unique objects, such as particular documents with chain of custody etc), the four conceptual levels are more bound together.

Why is this digression important? Because many of the significant properties of digital objects are bound to the manifestation level. Preserving them is only important if the work demands it, or the nature of the repository demands it. Comparatively few digital objects have major significant properties at the work level. Some kinds of digital art would have, and maybe software does (I haven’t read the software significant properties report yet). If you focus on the object, you can get hung up on properties that you might not care about if you focus on the work.
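One way to picture this argument: tag each candidate property with the FRBR level it is bound to, and keep only those at or above the level a repository actually cares about. In this minimal Python sketch the property names and their level assignments are hypothetical examples chosen for illustration, not taken from any of the studies. A library for which "the work is the thing" might keep only work- and expression-level properties, dropping manifestation-level ones such as revisability, while an archive that binds all four levels together would keep everything.

```python
from enum import IntEnum


class FRBRLevel(IntEnum):
    """FRBR's four levels, ordered from most abstract (1) to most concrete (4)."""
    WORK = 1
    EXPRESSION = 2
    MANIFESTATION = 3
    ITEM = 4


# Hypothetical assignments of properties to the level they are bound to
PROPERTY_LEVELS = {
    "subject and argument": FRBRLevel.WORK,
    "wording of the text": FRBRLevel.EXPRESSION,
    "page layout": FRBRLevel.MANIFESTATION,
    "revisability": FRBRLevel.MANIFESTATION,
    "file timestamp": FRBRLevel.ITEM,
}


def properties_to_keep(focus: FRBRLevel) -> list[str]:
    """Return properties bound at the focus level or at any more abstract level."""
    return sorted(name for name, level in PROPERTY_LEVELS.items() if level <= focus)


# A library that cares about the work and its text, versus an archive
print(properties_to_keep(FRBRLevel.EXPRESSION))
# ['subject and argument', 'wording of the text']
print("revisability" in properties_to_keep(FRBRLevel.ITEM))  # True
```

The hard part, of course, is not the filtering but deciding honestly which level each property belongs to for a given kind of work.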

Last time, I said the questions raised in my mind included:

"* what properties?
* of which objects?
* for whom?
* for what purposes?
* when?
* and maybe where?"

Here I’m suggesting that for many kinds of works, revisability is not an important significant property for most users and many repositories. That would mean that for those works, transforming them into non-revisable forms on ingest is perfectly valid, indeed might make much more sense than keeping them in revisable form. This isn’t at all what I would have thought a few years ago!

Wednesday, 26 March 2008

Significant Properties workshop

For my sins, I have agreed to chair the BL/DPC/JISC workshop "What to preserve? Significant Properties of Digital Objects" in just over a week. Mostly my duties will be to introduce speakers, and to keep them to the (ferocious) timetable, with perhaps a little light stimulation of discussion towards the end. However, I thought I should spend some time thinking about the idea of Significant Properties in digital preservation before the meeting. I hope (but don't guarantee, due to other commitments) that a couple of further posts on the topic will appear shortly.

This idea is something that has been on the fringes of my consciousness for a long time. The earliest reference I have was in 2001, when the CEDARS project (part of the eLib programme that I managed up to 2000) suggested in their final report that during the pre-ingest phase, an archive would have to assess the significant properties of the objects to be ingested:

"The archive will need to make decisions about what level of preservation is appropriate for each digital object (or each class of objects). This involves assessing which properties of a particular digital object are regarded as significant. These decisions influence the levels and methods of access that will be possible for the object, and the level of preservation metadata required for long-term retention."

The project later held a workshop where participants attempted to agree on the significant properties of a sample set of digital objects, and the work continued in, and overlapped with, the JISC/NSF-funded CAMiLEON project, a joint project between Leeds and Michigan.

OK, so that's the ancient history (I would be interested to know of anything even more ancient; I could not find any reference in OAIS, for example, so it may be that the CEDARS team invented the concept). Thinking about it now, I have a whole bunch of questions in my mind, including:

what properties?
of which objects?
for whom?
for what purposes?
when?
and maybe where?

I'm sure there will be a bunch of definitions of significant properties coming up during the meeting. I liked one from a separate context, from Barbara Sierman of the Netherlands KB (I hope she doesn't mind me quoting it):

"an important property of a certain digital object, as experienced by the user. Significant properties can be classified by five aspects of a digital object: structure, content, context, appearance, and behaviour. Examples: text (content), chapters (structure), metadata (context), colour (appearance), zoom-functionality (behaviour)."
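The five-aspect classification in that definition maps naturally onto a simple tagging scheme. This Python sketch encodes the quoted examples and groups them by aspect; the names and structure are an assumed illustration, not a published KB schema.

```python
from enum import Enum


class Aspect(Enum):
    """The five aspects of a digital object named in the quoted definition."""
    STRUCTURE = "structure"
    CONTENT = "content"
    CONTEXT = "context"
    APPEARANCE = "appearance"
    BEHAVIOUR = "behaviour"


# The quoted examples, each tagged with its aspect
EXAMPLES = {
    "text": Aspect.CONTENT,
    "chapters": Aspect.STRUCTURE,
    "metadata": Aspect.CONTEXT,
    "colour": Aspect.APPEARANCE,
    "zoom-functionality": Aspect.BEHAVIOUR,
}


def group_by_aspect(properties: dict[str, Aspect]) -> dict[Aspect, list[str]]:
    """Collect property names under each aspect; aspects with no entries stay empty."""
    grouped: dict[Aspect, list[str]] = {aspect: [] for aspect in Aspect}
    for name, aspect in properties.items():
        grouped[aspect].append(name)
    return grouped


print(group_by_aspect(EXAMPLES)[Aspect.BEHAVIOUR])  # ['zoom-functionality']
```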

The forthcoming workshop will feature reports from a number of projects and studies that JISC has funded in this area. It should be very interesting!
