Friday 6 July 2007

Authenticity across migrations

I discovered a few days ago that I have 4 digital objects that are (I believe, but am not certain) in some strong senses “the same” (in their information content), but which are also completely different (in their bits). These objects are the result of a chain of “exports”, “imports” and “save as…” operations, prompted partly by a change of technology (from a Windows PC running Mind Manager to a Macintosh running NovaMind), and partly by a need to make the content of the object more accessible to colleagues who do not use either software package. In case you're interested, these files are now accessible at the URLs below; I have shown the original date for each file:

15/10/2004: http://www.dcc.ac.uk/docs/blog/GWG%20vision%20and%20action.mmp
09/08/2005: http://www.dcc.ac.uk/docs/blog/GWG%20vision%20&%20action.xml
06/10/2005: http://www.dcc.ac.uk/docs/blog/GWG%20vision%20and%20action.pdf
10/01/2006: http://www.dcc.ac.uk/docs/blog/GWG%20vision%20&%20action.nmind

I’m very interested in the question of authenticity across migrations. Whilst migrations for preservation purposes are perhaps likely to be needed much less often than we once thought, they are nevertheless inevitable given long enough time. How can we plausibly assert that these objects represent “the same thing”?

In this case, each of these objects represents a simple document; that is, each can be transformed into a 2-D representation. Three of the objects have other capabilities: with the appropriate software they can be edited, by adding or removing elements, and by moving elements in relation to one another. This is true to some extent of the fourth object (a PDF, which for most people can be viewed to see a representation of the object), but in a different and much more restricted way. However, this editability, while vital for some kinds of re-use, is not essential for conveying the information essence of the objects.

These documents happen to be mind maps. The PDF should help some of those unfamiliar with this type of document. Mind mapping is a particular approach to organising ideas that originated with Tony Buzan (Buzan 1974). I find the approach particularly useful and productive; however, I suspect most of my colleagues groan inwardly when yet another mind map is produced for them to consider. They are a little lightweight as communications devices (and so may have rather limited value as records), yet I persist in thinking they may have value for others. An interesting feature here, however, is that there is no real standardisation in this market. Mind Manager (from MindJet) may perhaps have the market share to force some degree of standardisation (hence the ability of NovaMind, and of at least one of the Open Source versions that I am aware of, i.e. FreeMind, to import from Mind Manager XML files). This fractured market led to my having these 4 versions.

I realise that this blog post runs a real risk of being dismissed because of a naïve and undefined approach to “sameness”, and that this idea of sameness is likely to be linked to the “significant properties” of the object; further, that these significant properties will differ for different users. I’ll try to re-visit these ideas later (for now, note a report by Hedstrom and Lee, and a recent ITT from JISC).

One approach to asserting authenticity in this case might be to print out or view each document, and use a visual comparison to assert sameness. It’s worth noting though that I cannot use this technique easily at the moment across the full set, because I no longer have access to the application software that will render the first of the objects (although I believe the test could be done; the object is not obsolete). Furthermore, although the comparison is feasible for such a small number of objects, it is not scalable to the large numbers of objects that one would expect to find in a typical digital repository environment.

One other technique could perhaps be to record provenance data on the processes involved in the migration. In this case the object was created with an old version of Mind Manager, and then exported as a Mind Manager XML file once the conversion to Mac had occurred and it appeared that no Mac version of Mind Manager was likely (although two years later one exists). This XML version was created explicitly as a migration stage, because it could be read and imported by the Mac application of choice, NovaMind. At some point, a PDF version was created to allow the object to be emailed to colleagues who did not have either application. Shorn of dates and evidence, that is the provenance chain for these 4 objects… except I have a niggling concern that I might have changed the first NovaMind version after importing it. A true provenance chain would record ALL the changes to the object; in this case, I have found a way of checking, but I’ll leave it as an exercise for the reader to work out whether it was changed or not! The problem is that this provenance chain only tells us what happened at a gross level (what large scale operations were applied), not whether the migrations were successful, nor what artefacts were introduced or features lost.
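To make the limitation concrete, here is a minimal sketch of how such a provenance chain might be recorded. Everything about the record layout is my own assumption for illustration: the class name `ProvenanceEvent`, its fields, and in particular the `outcome_checked` flag are hypothetical, not part of any tool mentioned here. The dates are the file dates listed at the top of this post, read as day/month/year; they may not be the dates on which the operations actually took place. The tool is left empty where the post does not name one.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    """One gross-level step in the migration chain (hypothetical layout)."""
    file_date: date                # date listed for the resulting file
    operation: str                 # free-text description of the operation
    result_format: str             # file extension of the resulting object
    tool: Optional[str] = None     # None where the tool is not stated
    outcome_checked: bool = False  # hypothetical: was the result verified?


# The chain as described in the post, shorn of evidence.
CHAIN: List[ProvenanceEvent] = [
    ProvenanceEvent(date(2004, 10, 15), "create", "mmp", "Mind Manager"),
    ProvenanceEvent(date(2005, 8, 9), "export", "xml", "Mind Manager"),
    ProvenanceEvent(date(2005, 10, 6), "create PDF version", "pdf"),
    ProvenanceEvent(date(2006, 1, 10), "import", "nmind", "NovaMind"),
]


def unchecked(chain: List[ProvenanceEvent]) -> List[str]:
    """Formats whose migration outcome nobody has verified."""
    return [event.result_format for event in chain if not event.outcome_checked]


for event in CHAIN:
    tool = event.tool or "tool not recorded"
    print(f"{event.file_date.isoformat()}  {event.operation:<20} "
          f"-> .{event.result_format} ({tool})")

print("Outcome never checked for:", unchecked(CHAIN))
```

A record like this says which operations were applied, but every `outcome_checked` flag stays false until some separate test of the result has been carried out, which is exactly the gap described above.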

Perhaps there is a way, maybe linked to particular families or genres of document types, of computing properties from documents that we would strongly wish to see as stable across migrations. In this case, the text labels on the arms of the mind maps might be one such desirable invariant; quite a strong one, but not perfect (for example, the text labels may show as invariant but might have become detached from their branches, destroying the meaning in their relationships, while more adventurous mind mappers use much more in the way of imagery and other techniques to convey part of their message).
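As a rough sketch of what computing such an invariant could look like, the Python below extracts the text labels from two XML mind maps and compares them, first as a bag of labels and then as parent-to-child label pairs (which catches the "detached from their branches" problem). The XML layouts are invented for illustration: the element and attribute names (`topic`/`text`, `node`/`label`) are assumptions, not the real Mind Manager or NovaMind schemas, and the function name `label_invariants` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def label_invariants(xml_text, node_tag="topic", label_attr="text"):
    """Return (label counts, parent->child label pair counts) for a mind map.

    node_tag and label_attr are assumed names; a real format would need
    its own mapping.
    """
    labels = Counter()
    edges = Counter()

    def walk(elem, parent_label):
        label = parent_label
        if elem.tag == node_tag:
            # Collapse whitespace so layout differences do not matter.
            label = " ".join(elem.get(label_attr, "").split())
            labels[label] += 1
            if parent_label is not None:
                edges[(parent_label, label)] += 1
        for child in elem:
            walk(child, label)

    walk(ET.fromstring(xml_text), None)
    return labels, edges


# Invented sample documents: same content, different markup and sibling order.
ORIGINAL = ('<map><topic text="Vision"><topic text="Goals"/>'
            '<topic text="Actions"><topic text="Plan"/></topic></topic></map>')
MIGRATED = ('<mindmap><node label="Vision"><node label="Actions">'
            '<node label="Plan"/></node><node label="Goals"/></node></mindmap>')
# Same labels, but "Plan" has moved to the wrong branch.
DETACHED = ('<mindmap><node label="Vision"><node label="Actions"/>'
            '<node label="Goals"><node label="Plan"/></node></node></mindmap>')

orig_labels, orig_edges = label_invariants(ORIGINAL)
mig_labels, mig_edges = label_invariants(MIGRATED, "node", "label")
det_labels, det_edges = label_invariants(DETACHED, "node", "label")

print("migrated: labels match:", orig_labels == mig_labels,
      "| structure matches:", orig_edges == mig_edges)
print("detached: labels match:", orig_labels == det_labels,
      "| structure matches:", orig_edges == det_edges)
```

Comparing the label pairs as well as the labels is a deliberate choice: the bag of labels alone would pass the detached example. Neither check says anything about imagery, colour or layout, so this remains a partial invariant at best.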

However, if we are to plausibly assert authenticity across migrations, we will have to identify some such invariants. They may be extremely important parts of the preservation metadata, just as checksums and signatures are for checking the authenticity of objects that we believe have not been changed. Any suggestions on candidate invariants for some object classes?
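To illustrate the analogy with checksums, here is a small sketch contrasting a conventional checksum over the bits with a digest computed over a candidate invariant (here, a normalised, sorted list of text labels). The function names and the sample data are my own assumptions for illustration; the choice of SHA-256 and of JSON as the canonical form is arbitrary, and nothing here is a standard preservation metadata element.

```python
import hashlib
import json
import unicodedata


def bit_checksum(data: bytes) -> str:
    """Conventional fixity check: any change to the bits changes the result."""
    return hashlib.sha256(data).hexdigest()


def invariant_digest(labels) -> str:
    """Digest of a candidate invariant rather than of the bits.

    Labels are whitespace-collapsed, Unicode-normalised and sorted, so the
    digest ignores file format, ordering and incidental spacing.
    """
    normalised = sorted(
        unicodedata.normalize("NFC", " ".join(label.split()))
        for label in labels
    )
    canonical = json.dumps(normalised, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


# Invented stand-ins for two migrated versions of one mind map.
version_a_bytes = b'<map><topic text="Vision"/><topic text="Action"/></map>'
version_b_bytes = b'<mindmap><node label="Action"/><node label="Vision"/></mindmap>'
version_a_labels = ["Vision", "Action"]
version_b_labels = ["Action ", " Vision"]

print("bit checksums equal:    ",
      bit_checksum(version_a_bytes) == bit_checksum(version_b_bytes))
print("invariant digests equal:",
      invariant_digest(version_a_labels) == invariant_digest(version_b_labels))
```

A digest like this could be stored alongside the ordinary checksum and recomputed after each migration, though it is only as trustworthy as the extraction step that produces the labels from each format.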

* BUZAN, T. (1974) Use Your Head, BBC.

Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence. You must be registered to comment, but I'm turning off moderation as an experiment.