Showing posts with label Authenticity. Show all posts

Thursday, 3 April 2008

A Question of Authenticity

I’m just on my way back from Wigan, having given a presentation this afternoon on the role of the records manager in digital preservation to a group of, you’ve guessed it, records managers. I was really encouraged to see that the group had decided to dedicate their entire meeting today to tackling the thorny issue of digital preservation. It doesn’t happen very often – usually there’s just the odd session on digital preservation at this type of event – but it’s really great that they recognised from the start that digital preservation can’t be covered in 45 minutes! That’s not to say it can be covered in a day of course, but at least it gave them the opportunity to hear from a number of different speakers, all of whom approached the subject from a different perspective.

One of the things I decided to focus on was the issue of authenticity. It’s a real interest of mine and I’ve been wondering for a while if perhaps we might not be paying as much attention to it as we ought to be, particularly insofar as office records such as text documents, spreadsheets and emails are concerned. The immediate value of records of this type lies, to a great extent, in their evidential value, which in turn rests on their authenticity. If the authenticity of the records is compromised then so is their evidential value, and then we run into all sorts of issues, legal accountability among them.

One of the reasons I think it’s such a relevant issue for records managers is the fact that even small migrations through different applications and application versions have the potential to affect the authenticity of a record. Such migrations are known to alter, for example, automatically generated content such as date and author fields in a text document. In a spreadsheet that derives cell content from embedded formulae, a migration of this sort can also change the end calculation, particularly if the spreadsheet has been badly built or has errors in it.

The problem is, of course, that migrations on this scale can take place fairly frequently. An organisation may decide to ‘upgrade’ the software they use, or a user may decide to access and develop a shared file with a different application to the one that’s commonly used, with unexpected results. If there’s no attempt to control these circumstances (and I’m not necessarily saying they can be controlled but that we should at least make an attempt to control them) then the risks to a record’s authenticity are increased. And all this before it even reaches an archive.

We talk a lot in our community about Trusted Digital Repositories and preserving digital objects once in the archive. But I’d really love to see some more discussion of how we can ensure records are maintained in an authentic way before they are actually ingested. This is where the records managers – and the records creators – come in. Because otherwise, we run the risk of preserving records whose authenticity could be in question despite their storage/preservation in a TDR. And, because they are stored/preserved in a TDR, their authenticity prior to the point of ingest may never even be questioned and non-authentic records therefore find their way into use.

Friday, 6 July 2007

Authenticity across migrations

I discovered a few days ago that I have 4 digital objects that are (I believe, but am not certain) in some strong senses “the same” (in their information content), but which are also completely different (in their bits). These objects are the result of a chain of “exports” and “imports”, and “save as…” operations, prompted partly by a change of technology (from a Windows PC running Mind Manager to a Macintosh running NovaMind), and partly from a need to make the content of the object more accessible to colleagues who do not use either software package. In case you're interested, these files are now accessible at the URLs below; I have shown the original date for each file:

15/10/2004: http://www.dcc.ac.uk/docs/blog/GWG%20vision%20and%20action.mmp
09/08/2005: http://www.dcc.ac.uk/docs/blog/GWG%20vision%20&%20action.xml
06/10/2005: http://www.dcc.ac.uk/docs/blog/GWG%20vision%20and%20action.pdf
10/01/2006: http://www.dcc.ac.uk/docs/blog/GWG%20vision%20&%20action.nmind

I’m very interested in the question of authenticity across migrations. Whilst migrations for preservation purposes are perhaps likely to be needed much less often than we once thought, they are nevertheless inevitable given long enough time. How can we plausibly assert that these objects represent “the same thing”?

In this case, each of these objects represents a simple document; that is, each can be transformed into a 2-D representation. Three of the objects have other capabilities: with the appropriate software they can be edited, by adding or removing elements, and by moving elements in relation to one another. This is true to some extent of the fourth object (a PDF, which most people can open to see a representation of the object), but in a different and much more restricted way. However, this editability, while vital for some kinds of re-use, is not essential for conveying the information essence of the objects.

These documents happen to be mind maps. The PDF should help some of those unfamiliar with this type of document. Mind mapping is a particular approach to organising ideas that originated with Tony Buzan (Buzan 1974). I find the approach particularly useful and productive; however I suspect most of my colleagues groan inwardly when yet another mind map is produced for them to consider. They are a little lightweight as communications devices (and so may have rather limited value as records), yet I persist in thinking they may have value for others. An interesting feature here, however, is that there is no real standardisation in this market. Mind Manager (from MindJet) may perhaps have the market share to force some degree of standardisation (hence the ability of NovaMind, and of at least one of the Open Source versions that I am aware of, i.e. FreeMind, to import from Mind Manager XML files). This fractured market led to my having these 4 versions.

I realise that this blog post runs a real risk of being dismissed because of a naïve and undefined approach to “sameness”. This idea of sameness is likely to be linked to the “significant properties” of the object, and these significant properties will differ for different users. I’ll try to re-visit these ideas later (for now, note a report by Hedstrom and Lee, and a recent ITT from JISC).

One approach to asserting authenticity in this case might be to print out or view each document, and use a visual comparison to assert sameness. It’s worth noting though that I cannot use this technique easily at the moment across the full set, because I no longer have access to the application software that will render the first of the objects (although I believe the test could be done; the object is not obsolete). Furthermore, although the comparison is feasible for such a small number of objects, it is not scalable to the large numbers of objects that one would expect to find in a typical digital repository environment.

One other technique could perhaps be to record provenance data on the processes involved in the migration. In this case the object was created with an old version of Mind Manager, and then exported as a Mind Manager XML file once the conversion to Mac had occurred and it appeared that no Mac version of Mind Manager was likely (although two years later one exists). This XML version was created explicitly as a migration stage, because it could be read and imported by the Mac application of choice, NovaMind. At some point, a PDF version was created to allow the object to be emailed to colleagues who did not have either application. Shorn of dates and evidence, that is the provenance chain for these 4 objects… except I have a niggling concern that I might have changed the first NovaMind version after importing it. A true provenance chain would record ALL the changes to the object; in this case, I have found a way of checking, but I’ll leave it as an exercise for the reader to work out whether it was changed or not! The problem is that this provenance chain only tells us what happened at a gross level (what large scale operations were applied), not whether the migrations were successful, nor what artefacts were introduced or features lost.
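
As a rough sketch of what recording such a chain might look like, the Python below models each step as a small record and then lists the steps whose outcome was never checked. The class name, the field names and the `unverified` helper are all hypothetical and chosen for illustration; dates are left out, as they are in the description above, and the software used to produce the PDF is left empty because it is not stated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MigrationEvent:
    """One gross-level step in a provenance chain (hypothetical structure)."""
    operation: str
    software: Optional[str]        # None where the software was not recorded
    output_format: str
    outcome_checked: bool = False  # was the result ever compared with its source?
    notes: str = ""


# The chain as described in the text, without dates or evidence.
chain = [
    MigrationEvent("create", "Mind Manager", "mmp"),
    MigrationEvent("export", "Mind Manager", "Mind Manager XML"),
    MigrationEvent("import", "NovaMind", "nmind",
                   notes="possibly edited after import; not recorded"),
    MigrationEvent("export", None, "pdf", notes="software not recorded"),
]


def unverified(events):
    """Return the migration steps whose success was never checked."""
    return [e for e in events
            if e.operation != "create" and not e.outcome_checked]


for event in unverified(chain):
    print(f"{event.operation} -> {event.output_format}: outcome unknown")
```

Every migration step in this chain is reported as unchecked, which is the weakness described above: the record says what was done, not whether it worked.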

Perhaps there is a way, maybe linked to particular families or genres of document types, of computing properties from documents that we would strongly wish to see as stable across migrations. In this case, the text labels on the arms of the mind maps might be one such desirable invariant; quite a strong one, but not perfect (for example, the text labels may show as invariant but might have become detached from their branches, destroying the meaning in their relationships, while more adventurous mind mappers use much more in the way of imagery and other techniques to convey part of their message).
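
To make the idea a little more concrete, here is a minimal Python sketch that computes two candidate invariants from a mind map: the multiset of text labels, and the stronger multiset of (parent label, child label) pairs, which would catch labels that have come adrift from their branches. The two XML layouts and the sample labels are invented for the example; they are not the real Mind Manager or NovaMind formats, and imagery is ignored entirely.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def normalise(label):
    """Collapse whitespace so trivial spacing changes do not count as differences."""
    return " ".join((label or "").split())


def label_edges(xml_text, node_tag, get_label):
    """Return a multiset of (parent label, child label) pairs for one mind map."""
    edges = Counter()

    def visit(element, parent_label):
        if element.tag == node_tag:
            label = normalise(get_label(element))
            edges[(parent_label, label)] += 1
            parent_label = label
        for child in element:
            visit(child, parent_label)

    visit(ET.fromstring(xml_text), None)
    return edges


def labels_only(edges):
    """Weaker invariant: the multiset of labels, ignoring where they hang."""
    return Counter(child for _parent, child in edges.elements())


# Invented sample data in two made-up layouts.
BEFORE = ('<map><topic text="Vision"><topic text="Action">'
          '<topic text="Training"/></topic><topic text="Funding"/></topic></map>')
AFTER = ('<mindmap><node><title>Vision</title><node><title>Action</title>'
         '<node><title> Training </title></node></node>'
         '<node><title>Funding</title></node></node></mindmap>')
DETACHED = ('<mindmap><node><title>Vision</title><node><title>Action</title></node>'
            '<node><title>Funding</title>'
            '<node><title>Training</title></node></node></node></mindmap>')

before = label_edges(BEFORE, "topic", lambda e: e.get("text"))
after = label_edges(AFTER, "node", lambda e: e.findtext("title"))
detached = label_edges(DETACHED, "node", lambda e: e.findtext("title"))

print(before == after)                               # structure preserved
print(labels_only(before) == labels_only(detached))  # same labels...
print(before == detached)                            # ...but a branch has moved
```

In the third sample the labels alone compare as equal even though "Training" now hangs from a different branch; only the parent-child pairs expose the change.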

However, if we are to plausibly assert authenticity across migrations, we will have to identify some such invariants. They may be extremely important parts of the preservation metadata, just as checksums and signatures are for checking the authenticity of objects that we believe have not been changed. Any suggestions on candidate invariants for some object classes?
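
The contrast with checksums can be sketched in a few lines of Python. A checksum over the bytes changes with any re-serialisation, whereas a digest computed over a canonical form of a chosen invariant (here, the sorted, whitespace-normalised labels) can survive it. The function names and sample data are hypothetical, and sorting deliberately discards ordering, so this is only one possible choice of invariant.

```python
import hashlib


def fixity(data: bytes) -> str:
    """Ordinary checksum: detects any change to the bits."""
    return hashlib.sha256(data).hexdigest()


def invariant_digest(labels) -> str:
    """Digest over a canonical form of the labels, independent of serialisation."""
    canonical = "\n".join(sorted(" ".join(label.split()) for label in labels))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Invented sample: the same three labels, stored differently after a migration.
original = b"Vision\nAction\nTraining\n"
migrated = b"Action\r\nTraining\r\nVision\r\n"

same_bits = fixity(original) == fixity(migrated)
same_labels = (invariant_digest(original.decode().splitlines())
               == invariant_digest(migrated.decode().splitlines()))

print(same_bits, same_labels)
```

A digest of this kind could be stored alongside the ordinary checksum in the preservation metadata and recomputed after each migration.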

* BUZAN, T. (1974) Use Your Head, BBC.