Just to recap, the OAIS formula is that a Data Object interpreted using its Representation Information yields an Information Object. Examples often cite specifications or standards, eg suggesting that the repinfo (I’ll use this contraction for “representation information”) for a PDF Data Object might be (or include) the PDF specification.
Sometimes there is controversy about repinfo versus format information (often described by the repinfo enthusiasts as “merely structural repinfo”). So it’s nice to read a sensible comparison:
"For the purposes of this paper, the definition of a format proposed by the Global Digital Format Registry will be used:
“A byte-wise serialization of an abstract information model”.
The GDFR format model extends this definition more rigorously, using the following conceptual entities:
• Information Model (IM) – a class of exchangeable knowledge.
• Semantic Model (SM) – a set of semantic information structures capable of realizing the meaning of the IM.
• Syntactic Model (CM) – a set of syntactic data units capable of expressing the SM.
• Serialized Byte Stream (SB) – a sequence of bytes capable of manifesting the CM.
This equates very closely with the OAIS model, as follows:
• Information Model (IM) = OAIS Information Object
• Semantic Model (SM) = OAIS Semantic representation information
• Syntactic Model (CM) = OAIS Syntactic representation information
• Serialized Byte Stream (SB) = OAIS Data Object"
This does seem to place repinfo and format information (by this richer definition) in the same class.
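The quoted mapping is really a small layered data model. As a minimal sketch (the `GdfrLayer` enum, the `RELATIONS` list and the `describe_chain` helper are my own illustrative names, not anything defined by GDFR or OAIS), it can be written down like this, which makes it plain that the two middle layers are exactly the ones OAIS calls representation information:

```python
from enum import Enum


class GdfrLayer(Enum):
    """GDFR format-model layers, each paired with its OAIS equivalent (per the quoted mapping)."""
    INFORMATION_MODEL = ("IM", "Information Object")
    SEMANTIC_MODEL = ("SM", "Semantic representation information")
    SYNTACTIC_MODEL = ("CM", "Syntactic representation information")
    SERIALIZED_BYTE_STREAM = ("SB", "Data Object")

    def __init__(self, abbrev: str, oais_term: str) -> None:
        # Enum unpacks each tuple value into these arguments.
        self.abbrev = abbrev
        self.oais_term = oais_term

    @property
    def is_representation_information(self) -> bool:
        return self.oais_term.endswith("representation information")


# Read bottom-up: each layer stands in the relation named in the GDFR
# definitions to the layer above it.
RELATIONS = [
    (GdfrLayer.SERIALIZED_BYTE_STREAM, "manifests", GdfrLayer.SYNTACTIC_MODEL),
    (GdfrLayer.SYNTACTIC_MODEL, "expresses", GdfrLayer.SEMANTIC_MODEL),
    (GdfrLayer.SEMANTIC_MODEL, "realizes the meaning of", GdfrLayer.INFORMATION_MODEL),
]


def describe_chain() -> list[str]:
    """Render the chain from bytes up to the information model, in both vocabularies."""
    return [
        f"{lower.abbrev} ({lower.oais_term}) {verb} {upper.abbrev} ({upper.oais_term})"
        for lower, verb, upper in RELATIONS
    ]


if __name__ == "__main__":
    for line in describe_chain():
        print(line)
```

Seen this way, "format information" in the GDFR sense covers both the syntactic and semantic layers, not just the "merely structural" part.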
Time for a short diversion here. I was quite taken by the report on significant properties of software, presented at the workshop by Brian Matthews (not that it was perfect, just that it was a damn good effort at what seemed to me to be an impossible task!). He talked about specifications, source code and binaries as forms of software. Roughly, the cost of instantiating the software goes down as you move across those three (in a current environment, at least).
- In preservation terms, if you only have a binary, you are pretty much limited to preserving the original technology or emulating it, but the result should perform “exactly” as the original.
- If you have the source code, you will be able to (or have to) migrate, configure and re-build it. The result should perform pretty much like the original, with “small deviations”. (In practice, these deviations could be major, depending on what’s happened to libraries and other dependencies meanwhile.)
- If you only have the spec, you have to re-write from scratch. This is clearly much slower and more expensive, and Brian suggests it will “perform only gross functionality”. I think in many cases it might be better than that, but in some cases much worse (eg some of the controversy about the Microsoft-based OOXML standard, with its dependencies on Microsoft internals).
“…Access Software provides a means to interpret a Data Object. The software therefore acts as a substitute for part of the representation information network – a PDF viewer embodies knowledge of the PDF specification, and may be used to directly access a data object in PDF format.”
This seems to make sense; again, it’s in the OAIS spec, but hard to find. So Brown proposes that:
“…representation information be explicitly defined as encompassing either information which describes how to interpret a data object (such as a format specification), or a component of a technical environment which supports interpretation of that object (such as a software tool or hardware platform).”
Of course the software tool or hardware platform will itself have a shorter life than the descriptive information, so both may be required.
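Brown's proposed definition amounts to saying that repinfo comes in two kinds. A rough sketch, purely illustrative (the class names and the `has_descriptive_fallback` check are hypothetical, not taken from the report or from any registry's actual schema), models it as a union type, with a check reflecting the point that a tool on its own is not enough:

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class DescriptiveRepInfo:
    """Information describing how to interpret a data object (e.g. a format specification)."""
    title: str


@dataclass(frozen=True)
class EnvironmentRepInfo:
    """A technical-environment component that supports interpretation (e.g. a viewer)."""
    component: str


# Brown's proposal: repinfo is either of the two kinds above.
RepInfo = Union[DescriptiveRepInfo, EnvironmentRepInfo]


@dataclass
class RepInfoNetwork:
    """Hypothetical registry entry collecting all repinfo recorded for one format."""
    format_name: str
    items: list[RepInfo] = field(default_factory=list)

    def has_descriptive_fallback(self) -> bool:
        # Hypothetical policy check: tools and platforms usually have shorter lives
        # than descriptive information, so flag entries that rely on tools alone.
        return any(isinstance(item, DescriptiveRepInfo) for item in self.items)


if __name__ == "__main__":
    pdf = RepInfoNetwork("PDF", [EnvironmentRepInfo("PDF viewer")])
    print(pdf.has_descriptive_fallback())  # False: only a viewer is recorded
    pdf.items.append(DescriptiveRepInfo("PDF specification"))
    print(pdf.has_descriptive_fallback())  # True
```

The union type keeps both kinds as first-class repinfo, which is the substance of Brown's proposal; the fallback check is just one way a registry might act on the "both may be required" point.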
The bulk of the report, of course, is about representation information registries (including format registries by this definition), and is also well worth a read.