Tuesday 17 July 2007

The National Archives and Microsoft join forces...

On 4 July, The National Archives and Microsoft announced a Memorandum of Understanding "ensuring preservation of the nation's digital records from the past, present and into the future". Partly this relates to the standardisation of the Office Open XML format for Microsoft's products (I note that O'Reilly appear strongly in favour of this activity, seeing no conflict with the standardisation of the Open Document Format). I had heard through the rumour mill that MS had provided (or was about to provide) the specifications of obsolete file formats. This would fit with TNA's PRONOM file format registry and their Seamless Flow programme. However, the key paragraph instead seems to be:
"Today's announcement sees Microsoft provide The National Archives with access to previous versions of Microsoft's Windows operating systems and Office applications powered by Microsoft Virtual PC 2007. Virtual PC 2007 enables people to run multiple operating systems at the same time on the same computer. This allows The National Archives to configure any combination of Windows and Office from one PC, thereby allowing access to practically any document based on legacy Microsoft file formats. It is estimated that The National Archives will have to manage many terabytes of data in these formats."
This sounds as if TNA must keep licensed versions of all older MS products, but can then run them under this emulation mode. This is a LOT better than nothing, but not as good as open access to the old file formats, and the ability to build additional tools that such access would bring. But maybe there's more to it than is apparent in the press release?

Should data people care? Apart from the documentation and other information that many will have stored in obsolete proprietary formats, it turns out that many store their data that way as well. The excellent JISC-funded StORe project ran a survey across several disciplines and received over 350 responses. Although one respondent said “God preserve us from idiots who archive data in proprietary commercial formats (Excel spreadsheets and MS-word documents)!”, 220 said they kept source data in spreadsheets and the same number kept source data in word-processed documents. These were the largest categories apart from images (228)!

Let's hope they are keeping them somewhere else as well...

1 comment:

  1. Totally off-topic, but may I be a bit forward and make the suggestion that you have a tag-list or tag-cloud in the sidebar. T'would be most helpful. Cheers.


Please note that this blog has a Creative Commons Attribution licence, and that by posting a comment you agree to your comment being published under this licence. You must be registered to comment, but I'm turning off moderation as an experiment.