: Surely there is a method for extracting the data from MS docs on a
: *nix Perl installation. Has anyone had experience with this?

There are at least three freeware utilities that can convert Word files
to something readable:

1. catdoc (by me, http://www.ice.ru/~vitus/catdoc )
   A small C program that produces ASCII on standard output.
   Benefits: small, supports various character sets.
   Drawbacks: doesn't handle formatting, and produces some garbage
   because it doesn't try to parse the OLE structure correctly.

2. word2x (by Duncan Simpson)
   A C++ program, somewhat larger and less portable than catdoc. It
   doesn't support non-Latin character sets, but it attempts to parse
   the Word file properly.

3. mswordview (I don't remember the author)
   A C program that delegates the dirty work to the Perl LAOLA library.
   Produces HTML.

Links to these utilities, as well as to the LAOLA library, can be found
on the catdoc page mentioned above.

====

Followup from the author of the tip and author of catdoc:

The situation has changed a bit since then, and the catdoc page now
lists more ways to read MS Word/Excel files. In fact, there is now some
(very alpha) code for reading Excel.
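
Since catdoc writes its text to standard output, calling it from a script mostly comes down to running the program and capturing what it prints. Below is a minimal sketch in Python (the same idea carries over to backticks or a piped open in Perl). It assumes a catdoc binary is installed and on the PATH, and that its output is UTF-8; the function name extract_text and its parameters are made up for this illustration and are not part of catdoc.

```python
import subprocess
import sys


def extract_text(path, command=("catdoc",), encoding="utf-8"):
    """Run a converter on `path` and return whatever it prints to stdout.

    `command` defaults to plain "catdoc" with no options (assumption: the
    binary is on PATH). `encoding` is a guess at the output charset; bytes
    that do not decode are replaced rather than raising an error.
    """
    result = subprocess.run(
        list(command) + [path],
        capture_output=True,  # collect stdout and stderr as bytes
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.decode(encoding, errors="replace")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python extract.py report.doc
    print(extract_text(sys.argv[1]))
```

If the converter is not installed, subprocess.run raises FileNotFoundError, so a caller can catch that and fall back to one of the other utilities listed above.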