The perishability of Word
In re:
http://ptsefton.com/blog/2006/11/08/self_preservation_1
Peter Sefton’s trying to recover his 1994 Word thesis into a sustainable document format, and migrating from 10-year-old Word formats and media is no fun at all. He’s right: act now, while Mac Classic is still somewhat accessible. Been there, doing that again soon with my PhD (Word 5, 1998). I did styles like Pete did, so I was fairly virtuous, but I did go somewhat ape, so I’ll be making life difficult for myself anyway.
I have two major problems Peter didn’t. One, I used Endnote 4. Proprietary bibliographical software which didn’t migrate well: the author names in the Endnote library itself autovanished long ago, and there was a serious compatibility issue resulting in Endnote not talking to the migrated version of the document. I’ve decided to cut my losses, go with Bookends as biblio software (more proprietary software, but I’m not switching to TeX in a hurry), not bother about migrating, and convert the version of the thesis with the Endnote references spelt out. Problem here is, Endnote 4 used control characters to delimit references, which when you migrate the Word file turn up as ugly splotchy fields. Fields you cannot globally find and delete — you cannot search inside the field for text, so you’d end up deleting all fields. And I don’t want to do that, because I occasionally used fields in mathematical typesetting, to get diacritics positioned correctly. *snarl*
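What I actually want is a delete that looks inside each field before deciding. Here is a minimal sketch of that logic in Python, nothing that Word or NeoOffice will run for you. It assumes the document has somehow been dumped into a list of plain strings and fields, that the diacritic fields are Word EQ fields, and that everything else is Endnote residue; the `Field` record and the `CITE` code are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Field:
    """Hypothetical stand-in for a Word field (not a real Word API)."""
    code: str    # the field instruction, e.g. "EQ \\o(a,b)"
    result: str  # the text the field displays


Token = Union[str, Field]


def is_math_field(field: Field) -> bool:
    # Assumption: the diacritic-positioning fields are EQ fields,
    # so their instruction starts with the word "EQ".
    parts = field.code.split()
    return bool(parts) and parts[0].upper() == "EQ"


def flatten_non_math_fields(tokens: List[Token]) -> List[Token]:
    """Replace every non-EQ field with its displayed text; keep EQ fields."""
    out: List[Token] = []
    for tok in tokens:
        if isinstance(tok, Field) and not is_math_field(tok):
            out.append(tok.result)  # keep the spelt-out reference as plain text
        else:
            out.append(tok)         # plain text and EQ fields pass through
    return out
```

The point of flattening rather than deleting is that the spelt-out reference text survives and only the field wrapper goes.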
Second problem is the thesis predates Unicode — or rather, Microsoft allowing Unicode into the Mac version. So lots of non-future-proof 8-bit fonts: Ismini for the Greek, SILDoulosIPA 93 for the IPA, TimesDiacrit for Latin-2 characters, and (because I went ape) the occasional instance of Arabic, Hebrew, Cyrillic, and Linear B. Lots of tedious global replaces. And some hurdles:
* Word 2004 will import the Word 5 files, but is UNUSABLE on a MacBook.
* Word 2004 will do Unicode alright, but it will not even display SILDoulosIPA 93: turns it to blank squares.
* NeoOffice is usable on a MacBook, but OpenOffice has forgotten so far to implement “replace in all open documents”. We’re talking 10 documents here. This means macros.
* NeoOffice LOSES the font information for 8-bit fonts. And yes, I used styles, but I didn’t use character styles (the main reason being that char styles weren’t supported in Word 5). Which means I’ll open these files in Word 2000 (so I can still see the 8-bit fonts), globally replace each font with a different colour, and then do the global replaces in NeoOffice based on those colours. (I just did that with someone else, and the colours didn’t always come through; maybe I’ll try char styles after all instead.)
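Whatever tool ends up doing the work, the global replaces boil down to one operation: for each run of text in a legacy font, map every 8-bit character to its Unicode equivalent and switch the run to a Unicode font. A minimal sketch in Python, assuming the text is already available as (font name, text) runs. The mapping tables are placeholders I made up, not the real layouts of Ismini or SILDoulosIPA 93, and `UNICODE_FONT` is a stand-in name.

```python
from typing import Dict, List, Tuple

# A run is (font_name, text): a stretch of text set in a single font.
Run = Tuple[str, str]

# HYPOTHETICAL tables: placeholder mappings for illustration only,
# not the actual encodings of these fonts.
LEGACY_FONT_MAPS: Dict[str, Dict[str, str]] = {
    "Ismini": {"a": "\u03b1", "b": "\u03b2", "g": "\u03b3"},
    "SILDoulosIPA 93": {"N": "\u014b"},
}

UNICODE_FONT = "Unicode"  # placeholder name for the target font


def convert_run(run: Run) -> Run:
    """Transcode one run if its font has a table; otherwise leave it alone."""
    font, text = run
    table = LEGACY_FONT_MAPS.get(font)
    if table is None:
        return run
    # Characters missing from the table pass through unchanged.
    converted = "".join(table.get(ch, ch) for ch in text)
    return (UNICODE_FONT, converted)


def convert_runs(runs: List[Run]) -> List[Run]:
    return [convert_run(run) for run in runs]
```

This is why losing the font information is fatal: the same byte means a different character in each font, so the lookup has to be keyed on the font name, or on whatever colour or character style is standing in for it.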
You can see why I’ve been putting this off for so long. But again: a couple of years from now is probably too late. A couple of years ago, as a research assistant, I was asked to recover a file of Don Laycock’s from Word for DOS 2. It was a published dictionary of a Papuan language, but we couldn’t grep a dead tree. Nothing on campus would read Word 84; Microsoft had taken their converter offline months before, and was showing no inclination to put it back up. The only way I was able to get anything out of it was … opening it in Word 5, minted in 1991. And in a couple of years, with Classic going extinct, even that will be impossible. Needless to say, the IPA font Don had used was unrecoverable and long gone; I ended up having to infer the engmas by elimination.
Yeah, proprietary, binary Word processing formats really do bite. Thank God I went easy on the diagrams: the preservability of old MacDraw PICTs is even worse…