Digital Curation Blog: Data Recovery

Wednesday, 10 June 2009

Need to read a CT scan from an old Magneto-optical disk

Philipp Westebbe from Berlin wrote to me:

“I am working at Rathgen Research Laboratory (National Museums in Berlin) in Germany. I have some valuable old data (computer tomography images) on MOD [Magneto-optical disc – CR] format. I am aware this is not the usual topic of the list, but we cannot think of a better place to start with. I am working on a project using an old MOD (Laser Drive Media LM1200-002) for recording computer tomography images. I'm looking for an old PC and the belonging software with which I can read this old MOD from 1992.”

In a later email, he wrote:

“At the moment I am coordinating the project about the Egyptian Queen Nefertiti bust [see Wikipedia- CR]. 1992 the Charite hospital in Berlin made a CT from the bust to analyse the internal structure. The CT images were saved at that time on a MOD LM 1200-002. The data are unfortunately no longer readable, because the Charite hospital doesn't have the specific computer any more. The only things I have in my research lab are the drive and the MOD with Nefertiti's scan on it. I got over long paths to your email. I am looking for someone who has such a computer with which I can read my old MOD from Nefertiti, and for a possibility to copy this old format into a new digital format. Nevertheless, I hope to get a contact, which helps me to read the old MOD from Nefertiti. It would be very nice if the person who reads this mail could help me handling my problem, or could send the email to someone who is still working with such an old CT system. It is quite difficult to get an old MOD from 1992 back into a common digital format.”

In a third email, he made clear that he has one disk, on which the CT scans are stored, and that the data are regarded as very valuable, as the bust is too fragile to scan again. He didn’t say, but perhaps the importance (and hence the value) of these data is increased by a current controversy over the authenticity of the bust.

Documents from NARA and other places show that the LM1200 disk is 12” Write-Once, Read-Many (WORM) dual-sided media, capable of holding 1 GByte each side. A catalogue from Siemens identifies it as Plasmon (Philips) WORM Media. These media are reputed to have a long life expectancy.

As far as I can tell, this drive will have been part of an embedded computer in the CT machine, although we don’t at this stage know the brand of the CT machine (or the type of embedded computer; many such machines of that vintage would have been “mini-computers” rather than PCs). The UK-based Plasmon PLC bought Philips LMS, the company that produced Philips 12” optical disks and drives, in 1999. It looks like Plasmon failed at the end of last year, and was in its turn bought by Alliance Storage Technologies in the US.

Their technical support points out that reading the data off the disk may not be enough, because of the proprietary formats involved at the time. “Your best bet is to contact a data migration company that has experience with the application that wrote the data or the company that wrote the software. If you have trouble reading the media because of a problem with your drive we may be able to help but there are few parts left for the drive… Sorry to provide such a bleak outlook. Medical companies tend to maintain legacy equipment the longest so there is some hope.”

Nevertheless, before there is a chance of interpreting the data, Philipp needs to get it off the disk. Does anyone know of a contact who could read and interpret such a disk?

Tuesday, 21 April 2009

Lazyweb: Mac backup software associated with Iomega Jaz?

In my data recovery sideline, I thought I might tackle some of my own ancient media. I have a number of Mac backups on ancient Jaz 1 GB disks, written in the period 1997-1999. Andrew Treloar of ANDS has recovered the contents of those files onto CD-ROM, so I'm not looking to read the media any more. There are 2 backups on each of 2 Jaz disks, and the CD has 4 files from each Jaz disk. The earliest two have the following names:

FIGIT 2.971113A #001
FIGIT 2.971113A.FULL

I'm guessing these were written on 13 November, 1997. Definitely from a Mac. Does anyone know what backup software wrote these? I would like to recover the contents if possible!
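
The date guess can be checked mechanically. This is only a sketch: it assumes the six digits in the file name are a YYMMDD stamp, which is the guess above and not something the backup software is known to guarantee, and the function name `backup_date` is made up for illustration.

```python
import re
from datetime import datetime


def backup_date(filename):
    """Return the date encoded as YYMMDD in a backup file name, or None.

    Assumption: the first run of six digits is a two-digit year, month and day.
    """
    match = re.search(r"\d{6}", filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(), "%y%m%d").date()
    except ValueError:
        # Six digits that do not form a valid calendar date.
        return None


for name in ["FIGIT 2.971113A #001", "FIGIT 2.971113A.FULL"]:
    print(name, "->", backup_date(name))
```

If the stamp really is YYMMDD, both names decode to the same day in November 1997, which fits the period in which the Jaz disks were written.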

The .FULL is about 250 KB, while the #001 appears to be around 300 MB and is presumably the actual backup!

Any ideas on what the backup software might be? I thought it might be some standard software on the Iomega install. The install disks I still have don't read properly, but I managed to get sight of a directory for a French DOS install disk, and couldn't see any file names that looked like backup software!

Sunday, 19 April 2009

Amiga disk data recovery: progress and limitations

You may remember that I have been attempting to recover files and content from various sources from 10 or more years ago. One of these was an Amiga disk. On the label is the note: "Dissertation 17/4/96, CV.asc, CV, 29 September 1996".

I’ve described earlier some attempts to get the Catweasel controller to read the disks. After eventually figuring out how to configure the disk-reading program ImageTool3 for the Catweasel, I tried the Amiga disk. It worked fine, with as far as I can see zero errors. From a cursory scan of Google, I reckon this raw disk format is known as ADF, so I renamed it XXXAmiga.adf (.adf was one of the candidate extension names under the selected "Plain" category for the ImageTool3 program).
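
A quick sanity check on whether an image really looks like an Amiga ADF might go as follows. It is a minimal sketch resting on two assumptions not stated above: that the disk is a standard double-density Amiga floppy (80 cylinders, 2 heads, 11 sectors of 512 bytes), and that the boot block begins with the usual "DOS" signature. The name `describe_adf` is hypothetical, and the example runs on a synthetic image, not the real one.

```python
# Standard double-density Amiga floppy geometry (an assumption about this disk).
CYLINDERS, HEADS, SECTORS_PER_TRACK, SECTOR_BYTES = 80, 2, 11, 512
ADF_DD_SIZE = CYLINDERS * HEADS * SECTORS_PER_TRACK * SECTOR_BYTES  # 901120 bytes


def describe_adf(image):
    """Report basic sanity checks on the bytes of a candidate ADF image."""
    has_signature = image[:3] == b"DOS"
    filesystem = None
    if has_signature and len(image) > 3:
        # Bit 0 of the fourth byte distinguishes the two classic file systems.
        filesystem = "FFS" if image[3] & 1 else "OFS"
    return {
        "size_ok": len(image) == ADF_DD_SIZE,
        "dos_signature": has_signature,
        "filesystem": filesystem,
    }


# Synthetic stand-in for a real image: a boot block signature plus zero padding.
fake_image = b"DOS\x00" + bytes(ADF_DD_SIZE - 4)
print(describe_adf(fake_image))
```

For the real thing, the bytes would come from something like `open("XXXAmiga.adf", "rb").read()`. Passing these checks only suggests the image is worth handing to an emulator or extraction tool; it says nothing about whether the files inside are intact.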

Now, of course, we have to work out first how to extract files from the disk image, and then how to convert the particular file formats into a modern day format.

Just simply reading the raw disk image with Notepad on Windows or Textedit (on my Mac) shows that there is real text there, that made sense to my colleague (see below)!

A comment from “Euan” on my earlier post suggested that we try the WinUAE Amiga emulator, and my colleague did that. He reported:

“Success. I've not only got the WinUAE Amiga emulator working, but managed to find a copy of the application that I wrote my CV and dissertation in (Final Writer 5) and have been able to read the files off the disk image you sent and display them (screenshots attached).

Not having any luck reading the individual files directly [CR: from his Windows system], though -- other than the odd word related to fonts and colours -- but then they are in native FW5 format.”

Image from the dissertation seen in Final Writer

I asked if he was able to do any "save as" operations in his emulated Final Writer program, to move files from the disk image into the Windows file store. He reported:

“I've tried re-saving my files in another format, if that's what you mean, but the program doesn't do anything -- I can select Save > Save As... from the menu but nothing happens. However, I can see all my individual dissertation files from my PC as the file system is mapped onto a directory.”

Raw image from the same part of the dissertation grabbed from Textedit on the Mac

It was remarkable how much could be read directly in the disk image!
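
The same "open it in a text editor and look" trick can be done more systematically by pulling runs of printable characters out of the raw image, much as the Unix `strings` utility does. This is a sketch only: the function name and the minimum run length of 6 are arbitrary choices, and the sample bytes are invented for illustration, not taken from the real disk.

```python
import re


def printable_runs(data, min_length=6):
    """Return runs of printable ASCII at least min_length bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Invented sample: readable text separated by binary junk.
sample = b"\x00\x01Dissertation 17/4/96\x00\xffCV\x00Final Writer\x02"
for run in printable_runs(sample):
    print(run)
```

Short fragments such as "CV" fall below the threshold and are dropped, which is the price of filtering out random byte sequences that happen to look like text. This recovers words, not formatting, so it is a way of confirming what is on a disk, not a migration.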

Now my colleague was able to read his CV.asc file with Notepad on Windows, but so far we have not been able to convert the dissertation to a modern format, nor to connect the Final Writer program inside the emulator to a printer. Frustratingly close, but still not quite where we would like to be. I did find a demo copy of Final Writer for Windows 95 on the WayBack machine (earliest lift of the site, also 1996), but unfortunately this wouldn't open the existing image unless we upgrade to the full-featured version... but the company appears to have gone bust in 1996-7 or thereabouts!

So what have we learned from this?

It is possible to read a 13 year old floppy disk from an obsolete machine with an apparently incompatible disk format, kept under conditions of less than benign neglect, using cheap hardware on a recent Windows PC.
It is possible to access the files from the obsolete operating system using an emulator that appears to have been written by spare time volunteers.
It is possible to run the original application that created some of these files, under the emulator, and to read and process them (but not, so far, to save in another format).
Using the emulator is valuable, but constraining (in being unfamiliar technology, with few manuals etc) and limiting (in not, so far, being able to do much more with the files). We would now like to migrate them to a modern environment; for my colleague, this means Windows or Linux.

Fascinating!

Wednesday, 15 April 2009

Heroic data recovery story: 40-year old lunar orbital data

Thanks to Andrew Treloar for the heads up on a wonderful data recovery story. Andrew’s link was to a Slashdot article; there’s not a lot of information there. After a bit of searching around, I found a longer version of the story from last November at CollectSpace, and an interesting discussion on WattsUpWithThat (dated 1 April, which is why I emphasised the November date!).

Roughly, the story is that NASA has been sitting on a collection of some 1500 2” analogue tapes that captured high resolution images taken by the Lunar Orbiter missions of 1966-67, which preceded the Moon landings. The aim of this program was to identify sites for the landings. Although NASA only used low resolution versions of the images (including the famous “Earth Rise” image, see Wikipedia article), the high resolution images were recorded onto the tapes by special Ampex FR-900 instrumentation recorders. Many years later, the archivist concerned refused to throw the tapes out, and eventually even found 3 non-functioning drives, which she stored in her garage until news of the story leaked out to some interested people, including at least one retired Ampex engineer. Since then, with a little funding from NASA, and some space in an old McDonalds at NASA Ames, they have been attempting to refurbish the drives, and have managed to get at least two images off the tapes. If they can find more manuals and parts, they hope eventually to read all the tapes.

There’s lots that is interesting here. I particularly liked the response to this question from a reader: “What’s being done about the tape itself? You can’t just pull it out of the tins and thread it up.” Dennis Wingo, the main engineer involved replied:

“You know, amazingly enough that is exactly what we do. I am continually amazed that this works as I used to run a TV studio in LA where we could not do that even 20 years ago, but we have had an amazingly small number of tapes that even produce head clogs. We have run some tapes as many as 20-30 times in doing testing. Absolutely amazing to us.

I heard a story, that seems plausable but not sure about it really. It turns out that in 1975 all of the vendors changed the formula for the adhesive that holds the iron particles to the tape back. The new formulation had problems with moisture and over time degraded significantly.

However, the tape before 1975 that did not have this problem was made with a different mixture that included WHALE OIL, I kid you not. I don’t know if this is true […].”

One commenter had a major set of suggestions on how to deal with the stream if all they had was the analogue signal. Dennis responded, in part:

“We are using PCIe digitizers on a Mac Pro workstation to capture our data so we are able to get high data rate captures up to 180 megasamples per second.

We can do a lot with that but we would have to do a LOT of digital post processing including doing the demodulation in the digital domain, which is NOT going to be easy. Much easier to do in hardware, especially since we have spent the money doing so.”

Hey, that's a hardware engineer talking! I guess it means he’s optimistic about getting the drives working and keeping them alive long enough to digitise the entire set, rather than trying any more radical software solutions.

Data recovery from > 40 years ago, anyone? It is heroic, but let’s not be frightened!

Wednesday, 1 April 2009

An update on my data recovery efforts

You may remember that after our Christmas party late last year, I wrote a blog post offering to have a go at recovering some old files, if anyone was interested. A half dozen or so people got in touch, one with 30 or so old Mac disks, someone with a LaTeX version of a thesis on an old Mac disk, a colleague (who started all this, really) with a dissertation on an old Amiga disk, and someone with a CD from an Acorn RISC PC, plus a few others.

I frightened some off by giving them a little disclaimer to agree to, but others persisted and sent me their media. So, problem number 1, how to read old Amiga and Apple floppy disks? Both of these have a different structure from PC-compatible floppies, for example the old Mac disks store more information on the outer tracks than the inner ones, thus packing more data onto the disk.

The answer seemed to be a special controller for a Windows computer, that links to a standard floppy drive. The controller is called the Catweasel Mk4, from a German company, Individual Computers. We ordered one and it arrived quite quickly, well before I had managed to borrow a Windows box to experiment with. The card didn't physically fit in the first system, and then there was a long wait while we found a spare monitor (progress was VERY slow when I had to disconnect and re-use my Mac monitor). Then bid-writing intervened.

Eventually we got back to it, but had lots of problems configuring the controller properly; the company was quite good at providing me with advice. A couple of days ago I finally got the config file right.

I wasn't using any of the contributed disks for testing, but instead some old DOS disks that I had from my days in Dundee (1992-4). Early on we did manage to read these, with rather a lot of errors, but lately we have got zero good sectors off these disks. I'm still not sure why; I'm inclined to blame my attempt to read one of the disks using the Windows commands on the same drive (there's a pass-through mode); I never managed to get any DOS disks to read a single track after that!

Well, today I stuck in an old Mac disk I had... and lo and behold, it was reading with a fair proportion of good sectors. So, let's try the Amiga disk: 100% good (well, maybe one bad sector). And the Mac disk with the LaTeX file on it: pretty good, 1481 good sectors out of 1600.
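
The figure of 1600 sectors is what you would expect if the LaTeX disk is an 800K double-sided Mac floppy, which is an assumption on my part. Those disks use five speed zones of 16 tracks each, with 12 sectors per track in the outermost zone falling to 8 in the innermost. A small worked calculation, with variable names chosen purely for illustration:

```python
# Assumed layout of an 800K double-sided Mac floppy.
ZONE_SECTORS = [12, 11, 10, 9, 8]  # sectors per track, outermost zone first
TRACKS_PER_ZONE = 16
SIDES = 2
SECTOR_BYTES = 512

total_sectors = sum(ZONE_SECTORS) * TRACKS_PER_ZONE * SIDES
capacity_kib = total_sectors * SECTOR_BYTES // 1024

good_sectors = 1481  # figure reported for the LaTeX disk
bad_sectors = total_sectors - good_sectors
recovered_percent = 100 * good_sectors / total_sectors

print(f"{total_sectors} sectors, {capacity_kib} KiB in total")
print(f"{recovered_percent:.1f}% read cleanly, {bad_sectors * SECTOR_BYTES} bytes in doubt")
```

On those assumptions roughly 93% of the disk read cleanly, leaving 119 sectors (about 60 KB) in doubt. Whether that matters depends entirely on which files those sectors belong to.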

The problems aren't over yet. All you get from the ImageTool3 program that works through this controller is a disk image. So now I'm looking for a Windows (or Mac) utility to mount an Amiga file system, so we can copy the files out of it. And ditto for the Mac file system (written circa 1990 I think; I'm assuming it's HFS, but don't know).

At that point, depending on the amount of corruption, the Mac job should be pretty much done; my contributor still understands LaTeX and can probably sort out his old macros. For the Amiga files, there will be at least one further stage: identifying the file formats (CV.asc is presumably straight ASCII text, but the dissertation may be in a desktop publishing file format), and then finding a utility to read them.

It's been slow but interesting, and I've been quite despondent at times (checking out the data recovery companies, that sort of thing). But now I'm quite excited!

These two contributors have got their disk images back, and may have further ideas and clues. But can you help with any advice?

Thursday, 8 January 2009

Christmas offers

As the haze of Christmas (or Holiday Season, if you must) goodwill wanes, and New Year Resolutions already begin to fade, I'd better review what I've been offered so far in my Christmas files quest. I'm not overwhelmed by the numbers, I should say, so you still might tempt me with something else. They include

a thesis and CV on an Amiga floppy disk
some old archaeology project DBs in (probably) dBase format but split over a number of floppies, c.1992
thesis data analysed with KaleidaGraph, on 3.5" double sided, double density floppy disks formatted for the Mac
optionally also some Word for Mac documents (the text and figures of the above thesis)
these disks may also contain the raw data (taken on a 286 using a program this respondent wrote himself) and transferred to the Mac. If it is on the disk, it will be in a format he defined himself so he thought it might fall into the "less interesting" category I suggested. On the other hand, giving it a go might cause us to learn something.
some educational resources produced "some years ago" on an Acorn... they still have the Acorn!
another thesis written in LaTeX using Textures, on very ancient Mac disks.

I'm also hoping to get something from a colleague, following up a related conversation a month or so before Christmas, but he's gone on extended leave so I won't know for a while.

Are these "interesting"? Yes, I think so, although I suspect the most challenging part is getting them off the media. Both Amiga drives and old Mac drives use currently non-standard formats that mean you can't easily read the disks on modern systems (I don't know about the Acorn disks yet). At first I thought we might just aim to find someone with working old hardware (and that still might be an option, but see this tale [link added later, got distracted, sorry!]). But it also turns out there's a controller called the CatWeasel that you can add to a PC and connect an ordinary current drive to, that is supposed to figure out Amiga, Atari, Mac and other formats. It's cheap enough to give a go.

Oh, and thanks to Cliff Lynch for a pointer to the Digital Lives project and its upcoming conference, see http://www.bl.uk/digital-lives/conference.html; Digital Lives deals with this kind of stuff, and Jeremy Leighton John is just fascinating to listen to. He spoke at one of the DCC workshops, see his slides.

Friday, 19 December 2008

The 12 files of Christmas

(Well, half a dozen, maybe.) OK, it’s Christmas, or the Holiday Season if you really must, and I may have a present for you!

My readers will know that I have an interest in obsolete and obsolescent data and files. I’ve tried in the past to get people to tell me about their stuff that had become inaccessible, with only limited success. So, after our work Christmas lunch, with the warm glow of turkey and Christmas pudding still upon me, and a feeling of general goodwill, I thought I’d offer you an opportunity to give me a challenge.

I will do my best to recover the first half dozen interesting files that I’m told about… of course, what I really mean is that I’ll try and get the community to help recover the data. That’s you!

OK, I define interesting, and it won’t necessarily be clear in advance. The first one of a kind might be interesting, the second one would not. Data from some application common of its time may be more interesting than something hand-coded by you, for you. Data might be more interesting (to me) than text. Something quite simple locked onto strange obsolete media might be interesting, but then again it might be so intractable it stops being interesting. We may even pay someone to extract files from your media, if it’s sufficiently interesting (and if we can find someone equipped to do it).

The only reference for this sort of activity that I know of is (Ross & Gow, 1999, see below), commissioned by the Digital Archiving Working Group.

What about the small print? Well, this is a bit of fun with a learning outcome, but I can’t accept liability for what happens. You have to send me your data, of course, and you are going to have to accept the risk that it might all go wrong. If it’s your only copy, and you don’t (or can’t) take a copy, it might get lost or destroyed in the process. You’ll need to accept that risk; if you don't like it, don't send it. I might not be able to recover anything at all, for many reasons. I’ll send you back any data I can recover, but can’t guarantee to send back any media.

The point of this is to tell the stories of recovering data, so don’t send me anything if you don’t want the story told. I don’t mind keeping your identity private (in fact good practice says that’s the default, although I will ask you if you mind being identified). You can ask for your data to be kept private, but if possible I’d like the right to publish extracts of the data, to illustrate the story.

Don’t send me any data yet! (It’s like those adverts: send no money now!) Send me an email with a description of your data. I’m not including my email address here, but you can find it out easily enough. Include “12 files of Christmas” in your subject line. Tell me about your data: what kind of files, what sort of content, what application, what computer/operating system, what media, roughly what size. Tell me why it’s important to you (eg it’s your thesis or the data that support it, your first address book with long-lost relative’s address on it, etc). Tell me if (no, tell me that) you can grant me the rights to do what’s necessary (making copies or authorising others to make copies, making derivative works, etc).

Is that all? Oh, we need a closing date… that had better be twelfth night, so 11:59 pm GMT on 5 January, 2009 (according to Wikipedia)!

Any final caveats? Yes; I have not run this past any legal people yet, and they may tell me the risks of being sued are too great, or come up with a 12-page contract you wouldn’t want to sign. If that happens, I’ll come back here and ‘fess up.

Ross, S., & Gow, A. (1999). Digital Archaeology: Rescuing Neglected and Damaged Data Resources (Study report). London: University of Glasgow.

Thursday, 20 November 2008

Comments on an obsolescence scale, please

Many people in this business know the famous Jeff Rothenberg quote: “It is only slightly facetious to say that digital information lasts forever--or five years, whichever comes first” (Rothenberg, 1995). Having just re-read the article, it’s clear that at that point he was particularly concerned about the fragility of storage media, something that doesn’t concern me for this post. What does concern me is the subject of much of the rest of his article: obsolescence of the data. There’s another very nice quote from the article, accepted gospel now but put so nicely: “A file is not a document in its own right--it merely describes a document that comes into existence when the file is interpreted by the program that produced it. Without this program (or equivalent software), the document is a cryptic hostage of its own encoding” (ibid).

Now the first Rothenberg quote is often assumed to apply in the second case, but there’s clear evidence emerging that data are readable well beyond 5 years. Most files on my hard disk that have been brought forward from previous computers (with very different operating systems) remain readable even after 15 years. Most, but not all; in my case, the exceptions are my early PowerPoint files, created with PowerPoint 4 on a Mac in the 1990s, unreadable with today’s Mac PowerPoint version. However, these files are not completely inaccessible to me, as a colleague with a Windows machine has a different version of PowerPoint that can read these files and save them in a format that I can handle. It’s a comparatively simple, if tedious, migration.

I have spent some time and asked in various fora for evidence of genuinely obsolete, that is completely inaccessible, data (Rusbridge, 2006). There are some apocryphal stories and anecdotes, but little hard evidence so far. But perhaps complete inaccessibility isn’t the point? Perhaps the issue is more about risk to content, on the one hand, or extent of information loss on the other? Maybe this isn’t a binary issue (inaccessible or not), but a graded obsolescence scale? Is there such a thing? I couldn’t find one (there’s a hint of one in an NTIS report abstract (Slebodnick, Anderson, Fagan, & Lisez, 1998), but the text isn’t online, and it appears to be about obsolescence of people)!

So here are a couple of attempts, for comment.

Here’s what might be called a Category approach. Users would be asked to select one only from the following categories (it worries me that they are neither complete nor non-overlapping!):

A Completely usable, not deemed at risk
    Open definition with no significant proprietary elements
    Multiple implementations, at least one Open Source
    High quality treatment in secondary implementations
    Widespread use
B Currently usable, at risk
    Closed definition, or
    Open definition, but with significant proprietary elements
    More than one implementation
    Imperfect treatment in secondary implementations
C Currently usable, at significant risk
    Proprietary/closed definition
    Single proprietary implementation
    Not widespread use
D Currently inaccessible, migration path known
    Proprietary tools don't run on current environment
    Potentially imperfect migration
E Completely inaccessible
    No known method of extraction or interpretation
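
To make the gaps and overlaps easier to see, the categories can be written down as a small data structure with a deliberately crude decision rule. Everything beyond the five category labels is an assumption for illustration: the function name, the choice of attributes, and above all the order in which the tests are applied.

```python
from enum import Enum


class Obsolescence(Enum):
    A = "Completely usable, not deemed at risk"
    B = "Currently usable, at risk"
    C = "Currently usable, at significant risk"
    D = "Currently inaccessible, migration path known"
    E = "Completely inaccessible"


def categorise(usable_now, migration_known=False, open_definition=False,
               implementations=0, widespread=False):
    """Pick one category from a few yes/no facts (hypothetical rule order)."""
    if not usable_now:
        return Obsolescence.D if migration_known else Obsolescence.E
    if open_definition and implementations > 1 and widespread:
        return Obsolescence.A
    if implementations > 1:
        return Obsolescence.B
    return Obsolescence.C


# An open format with several implementations in wide use.
print(categorise(True, open_definition=True, implementations=3, widespread=True))
# A format whose only tool no longer runs, but with a known migration route.
print(categorise(False, migration_known=True))
```

Writing it this way exposes the worry directly: an openly defined format with a single implementation lands in C here, even though C is described as proprietary, because nothing in the list covers that combination.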

Here’s a more subjective, but perhaps more complete scale. Users are asked to select a number from 1 to 5 representing where they assess their data lies, on a scale whose ends are defined as

1 Completely usable, not deemed at risk, and
5 Completely inaccessible

And here’s a more Likert-like scale. Users are asked to nominate their agreement with a statement such as “My data are completely inaccessible”, using the Likert scale:

1 Strongly disagree
2 Disagree
3 Neither agree nor disagree
4 Agree
5 Strongly Agree

Hmmm. How could you neither agree nor disagree with that statement? “Excuse me, I haven’t a clue”?

Comments on this blog, please; but if you’re commenting elsewhere, can you use the tag “Obsolescence scale”, and maybe drop me a line? Thanks!

Rothenberg, J. (1995). Ensuring the longevity of digital documents. Scientific American, 272(1), 42. (Accessed through Ebscohost)
Rusbridge, C. (2006). Excuse Me... Some Digital Preservation Fallacies? [Electronic Version]. Ariadne from http://www.ariadne.ac.uk/issue46/rusbridge/.
Slebodnick, E. B., Anderson, C. D., Fagan, P., & Lisez, L. (1998). Development of Alternative Continuing Educational Systems for Preventing the Technological Obsolescence of Air Force Scientists and Engineers. Volume I. Basic Study.

Tuesday, 8 July 2008

Old and not so old peripherals needed

The DCC has started to get a number of help-desk queries from people who want to know how to read information from obsolete media, and therefore want to find someone with a working example of the appropriate peripheral. These days, this can be as simple as finding a working 3.5 inch diskette drive (I'm not sure there is one on this entire floor!). We have also been asked to report the whereabouts of 5.25 inch diskette drives, 80-column punched card readers, open reel magnetic tape drives and Jaz and other "high capacity" portable media drives. I have a bunch of Jaz media myself, although I think (!) that everything important is now on my hard drive AND backed up. It can be surprisingly hard to find the answer to such questions.

In 1999 the eLib Programme commissioned a study that resulted in a useful report: ROSS, S. & GOW, A. (1999) Digital Archaeology: Rescuing Neglected and Damaged Data Resources. Appendix 3 of that report lists then current data recovery companies, some of which are still in operation (perhaps it would be useful for JISC to commission someone to update this report?). These companies would certainly be one place to start, albeit an expensive one (as they are aimed at the corporate disaster recovery market).

Perhaps institutions should have a considered strategy for continued access to the more common examples. As staff retire and remember possible valuable content amongst their pile of obsolete media, it would be valuable to have a simple means at least to check them...

Does anyone know of any register of the less common such peripherals available within the public sector, preferably available for external use at a reasonable fee?

As a starter for 10, I believe there is an 80-column punched card reader at the UK Data Archive...
