Tuesday 1 July 2008

Responses to RAW versus TIFF: image-related

This post summarises image-related responses to the “RAW versus TIFF” post made originally by Dave Thompson of the Wellcome Library. The key elements of Dave’s post were whether we should be archiving images using RAW (which contained more camera and exposure-related data) or TIFF (which was more standardized, likely to be accessible for longer, and had more available utilities). A subsidiary question on whether we should archive both is greatly affected by cost; responses related to this element are summarized in a separate post. As I mentioned before, these responses came on the semi-closed DCC-Associates email list. [I forgot to mention that Maureen Pennock re-posted Dave's question to this blog with some remarks of her own, and we also received a useful response which you can view there.]

One comment [on the DCC-Associates list] made by many was that JPEG2000 should be considered. Alan Morris of Morris and Ward asked if it was an ISO standard, to which Shuichi Iwata (former President of CODATA and Professor at the University of Tokyo) responded:
“Yes, it is. Organized by Joint Photographic Experts Group of ISO. It is used widely for compression of images at different levels.

http://www.jpeg.org/”
James Reid of EDINA, University of Edinburgh, mentioned that “the DPC report on JPEG 2000 has relevance”. He also expressed concern:
“RAW images tend to have proprietary and sensor specific aspects that make them less than ideal for a generic preservation format - they are of course lossless but there would be an implicit overhead in deriving sensor specific options to make preservation possible (and these may not be publicly available due to proprietary interests)...”
Larry Murray of the Public Record Office of Northern Ireland mentioned how essential RAW is to him as a keen amateur photographer, allowing him to manipulate images. He added:
“James is quite correct in his comment but personally I think the decision to save the RAW images is one of available resources. The RAW images give the data owner the opportunity to control so many of the characteristics of the original image without data loss or integrity and the ability to bulk process said changes means serious consideration must be given to their value.

For other reference on Open RAW formats see http://www.openraw.org/”
Chris Puttick, CIO of Oxford Archaeology, asked:
“RAW is the original - would you store a scan of a document as the primary method of preserving the document? I think RAW + [a common image format] is the best option. It does of course require greater storage space, but storage is cheap and getting cheaper every day.

Software such as DigiKam has libraries that read pretty much all the RAW formats including rarer ones such as that produced by the Foveon sensor (Sigma, Polaroid). Any additional documentation needed for RAW should be easily prised out of the sensor manufacturers if the right pressure is brought to bear...”
Stuart Gordon of Compact Services (Solutions) Ltd agreed:
“In order to ensure that the images are viewable for posterity we must have the primary archive images in a universal lossless format, currently TIFF however as standards evolve this will inevitably change.

The camera RAW formats are proprietary and are therefore not suitable as the primary archival format; however, if space is available then do archive the RAW as well.

As with all questions it depends on what your goals are and in our case it is ensuring that the archival images are stored in a format that will ensure they are accessible for posterity.”
Paul Wheatley, Digital Preservation Manager at the British Library, commented:
“As well as the extremes of the camera/manufacturer specific formats and then at the other end of the spectrum, formats like JPEG2000 and TIFF, we also have a number of emerging RAW file “standards”, of which Open RAW is one. I’ve not seen any independent technology watch work to assess all these options in a reasoned way and make recommendations. Perhaps this is a candidate for a DPC Tech Watch piece? If so, I would like to see authorship from a panel of experts to ensure a balanced view. Both an independent viewpoint and careful analysis of the objectives behind the use of different file formats were a little lacking in the last DPC report. As Larry is hinting at here, I’m sure there are different answers depending on exactly what you want to achieve from the storage and preservation. These aims need to be carefully captured as well as the recommendations to which they relate.”
[UPDATE: Rob Berentsen, of Deloitte Consulting B.V.] brings us back to preservation concerns:
“The discussion whether to use Raw, Tiff or both is one that depends on what the (future) user demands will be and the current preservation demands are. As you can see, Larry prefers Raw so he can use the image in an optimal manner as photographer/manipulator. James however might have a different view on the importance of certain aspects of images and might have different future demands in mind.

Looking at the OAIS model, the most raw image should leave all roads open to create a TIFF image (or others). This TIFF might be more accessible, and even the non-lossless format JPG might be good enough for some people to have [the] image easily accessible over the internet.

To cope with this, the OAIS's archival information package (AIP) offers room for more than just one 'manifestation'. You can see one (the Raw version) as the preservation format, and others are possible future preservation formats or accessible formats. The accessible formats are commonly the ones end-users can get (DIP-package).”
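To make Rob's point a little more concrete, here is a minimal Python sketch of a package that holds several manifestations of the same image. It is only an illustration: the class names, the role labels ("preservation" and "access") and the file names are my own assumptions, and none of them come from the OAIS reference model or from any particular repository system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Manifestation:
    """One rendition of the same image held inside a package."""
    file_name: str
    file_format: str  # e.g. "RAW", "TIFF", "JPEG"
    role: str         # "preservation" or "access" (labels assumed for this sketch)
    lossless: bool


@dataclass
class ArchivalInformationPackage:
    """Hypothetical AIP holding several manifestations of one image."""
    identifier: str
    manifestations: List[Manifestation] = field(default_factory=list)

    def add(self, manifestation: Manifestation) -> None:
        self.manifestations.append(manifestation)

    def preservation_copies(self) -> List[Manifestation]:
        """Manifestations kept as the preservation format."""
        return [m for m in self.manifestations if m.role == "preservation"]

    def dissemination_copies(self) -> List[Manifestation]:
        """Manifestations that would be handed to end users in a DIP."""
        return [m for m in self.manifestations if m.role == "access"]


# Example: RAW as the preservation format, TIFF and JPEG as access formats.
aip = ArchivalInformationPackage("image-0001")
aip.add(Manifestation("image-0001.raw", "RAW", "preservation", lossless=True))
aip.add(Manifestation("image-0001.tif", "TIFF", "access", lossless=True))
aip.add(Manifestation("image-0001.jpg", "JPEG", "access", lossless=False))

print([m.file_format for m in aip.preservation_copies()])
print([m.file_format for m in aip.dissemination_copies()])
```

In this sketch the TIFF could just as easily be given the "preservation" role later on, which is the flexibility Rob describes: the package keeps every road open, and the choice of which manifestation to deliver is made at dissemination time.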
Marc Fresko of Serco suggested there are two questions here:
“One is the question addressed by respondents so far, namely what to do in principle. The answer is clear: best practice is to keep both original and preservation formats.

The second question is "what to do in a specific instance". It seems to me that this has not yet been addressed fully. This second question arises only if there is a problem (cost) associated with the best practice approach. If this is the case, then you have to look at the specifics of the situation, not just the general principles. Most importantly:
  • what will you lose by going from RAW to TIFF (probably nothing that matters for many archival applications, but you will lose something which would matter in other applications)?
  • over how long do you need to preserve access to the images, compared to the likely lifetime of the software you'll need to access the specific RAW format?
You can easily see that different answers to these two questions will lead you to prefer different formats.”
Marc’s comment brings us back to costs, and there is a further series of emails on this topic which I will include in another blog post.

Colin Neilson of the DCC SCARP Project took the discussion towards science data:
“If you generalise and treat your camera as a "Digital Image Acquisition System" :-) then do you keep the original data from the instrument (camera) or would you prefer only to keep the derived data in some form of standardised output?

I suspect in a scientific context you would want to keep both. The problem is that the steps taken in deriving the data are not standardised or documented and may not be repeatable at a future time e.g. when the particular make of instrument (camera) becomes obsolete. The software & algorithms for processing captured image data to standard format are embedded in the camera and function to adjust the data produced from the proprietary image processor (e.g. Canon's DIGIC chip). Some of the "improvements" produced in image quality may not be welcome in a scientific context (e.g. may have introduced artefacts). Some makes and models of camera offer the option to save both the "raw" image and the derived image at the cost of memory and processing time.

You can see some of the issues developed when looking at open microscopy environments where the issue is image capture with effective data management system for multidimensional images including spatial, temporal, spectral and lifetime dimensions. Slide 13 of this presentation by Kevin Eliceiri sets out the current situation. This is quite an exciting area!

Of course this may all be a bit academic (not to say "over the top") if your main concern is to take the best steps to protect your holiday pictures or personal photography efforts. I have found this article useful in an amateur photography context.”
Finally (for this post), Riccardo Ferrante, who is IT Archivist & Electronic Records Program Director for the Smithsonian Institution Archives, agreed with Colin:
“We have scientists here that consider the raw data absolutely critical and whose integrity and authenticity must be preserved (archival package). However, the usefulness of the data comes only when the raw data is filtered. The filtered data (derivation) is also considered an archival package by these scientists since it is reused repeatedly in the future, cited in publications, etc. and therefore needs its authenticity, integrity, and provenance to be preserved. To use OAIS-Reference Model language - the DIP generated from the original AIP (raw data) is itself a derived AIP.

In contrast, we have artists who deal solely in photographs and do not consider the raw image to be an object of record, but more like an ingredient for the object that is finalized after post-processing occurs, i.e. after it is published.

To me, it seems the common point is what the creator considers the object of record and the purpose of the raw data in light of that - this should be a major factor in determining the object of record from a recordkeeping viewpoint.”
I thought this was an extremely useful conversation!

1 comment:

  1. Thank you very much for this post - useful for me to come up to speed on this issue.


Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence. You must be registered to comment, but I'm turning off moderation as an experiment.