How to extract data from MS Word and Excel files?

Extracted from comp.lang.perl.modules
Tip provided by Victor B Wagner
: Surely there is a method for extracting the data from ms docs on a
: *nix perl installation.  Has anyone had experience with this?

There are at least three freeware utilities that can convert
Word files to something readable:

1. catdoc (by me, http://www.ice.ru/~vitus/catdoc )
A small C program that writes ASCII text to standard output.
Benefits: small, supports various character sets.
Drawbacks: doesn't handle formatting, and produces some garbage because
it doesn't try to parse the OLE structure correctly.

2. word2x (by Duncan Simpson)
A C++ program, somewhat larger and less portable than catdoc. It doesn't
support non-Latin character sets, but it attempts to parse the Word file
properly.

3. mswordview (I don't remember the author)
A C program that delegates the dirty work to the Perl LAOLA library.
Produces HTML.

Links to these utilities, as well as to the LAOLA library, can be found on
the catdoc page mentioned above.
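
Since catdoc writes its result to standard output, a script can simply run
it and capture what it prints. The following is a minimal sketch in Python,
not part of the original tip. It assumes catdoc is installed and on the
PATH and is invoked as "catdoc FILE"; the function name doc_to_text and its
parameters are made up for illustration, and the output encoding is a
guess, since it depends on how catdoc is configured.

```python
import shutil
import subprocess


def doc_to_text(path, converter=("catdoc",), encoding="utf-8"):
    """Run an external converter on `path` and return its standard output.

    `converter` is the command (plus any leading arguments) to run; the
    file name is appended as the last argument. Both the default command
    and the default encoding are assumptions, not documented behaviour.
    """
    cmd = list(converter) + [path]
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(cmd[0] + " not found on PATH")
    # check=True raises CalledProcessError if the converter exits non-zero.
    result = subprocess.run(cmd, capture_output=True, check=True)
    # Undecodable bytes are replaced rather than raising, because the
    # converter may emit some garbage for files it cannot parse cleanly.
    return result.stdout.decode(encoding, errors="replace")
```

A call such as doc_to_text("report.doc") would then return the plain text,
where report.doc is a placeholder file name. The same function can wrap any
other converter that takes a file name and prints to standard output, by
passing a different command as the converter argument.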

====
Followup from the author of the tip and author of catdoc:

Since then the situation has changed a bit, and the catdoc page now
lists more ways to read MS Word/Excel files.
There is now also some (very alpha) code for reading Excel files.


Appears in section(s) : windows text
Tip recorded : 07-02-1999 15:49:14
HTML page last changed : 27-07-1999 20:11:34