Microsoft recently (as of Feb 15) released the full specs for the various Office file formats, in particular Word DOC files, Excel XLS files and PowerPoint PPT files.
The specs are available here.
If you’re a file format geek, or your job actually revolves around manipulating these types of files, this is surely a welcome addition to your arsenal.
Some of this information has been available, more or less, for quite some time. Most everyone who’s interested knows, for instance, that these files are actually OLE structured storage files, essentially entire “mini file systems” unto themselves.
But much of the detail has been sketchy at best, till now.
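As a small illustration of the structured storage point, here is a minimal sketch of how you might check whether a file is an OLE compound file at all. It relies on the well-known 8-byte signature that opens these files. The function names are my own, made up for this example, and the check says nothing about whether the rest of the file is valid.

```python
# Signature bytes that open an OLE compound (structured storage) file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def looks_like_ole(data: bytes) -> bool:
    """Return True if the buffer starts with the compound-file signature."""
    return data[:8] == OLE_MAGIC


def file_looks_like_ole(path: str) -> bool:
    """Read only the first 8 bytes of a file and test the signature.

    Hypothetical helper for illustration; it does not validate anything
    beyond the signature.
    """
    with open(path, "rb") as fh:
        return looks_like_ole(fh.read(8))
```

Something like `file_looks_like_ole("report.doc")` (the filename is just a placeholder) would be a cheap first test before handing a file to a real parser.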
One word of warning, though. Don’t expect these to be simple cookbook recipes for rolling your own version of Word or Excel. The specs are huge and the formats unbelievably complex, having evolved over the course of 10 or more years. Further, these are not nice, friendly, Web 2.0-style XML/HTML/text human-readable files by any stretch. They’re tangled, binary, pointer-ridden globs of structures that take a good deal of code just to render somewhat intelligible.
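To give a feel for what "binary, pointer-ridden" means in practice, here is a rough sketch that pulls a handful of fields out of the 512-byte compound file header. The offsets are my reading of the published compound file layout, so treat them as assumptions to verify against the spec, and `parse_header_fields` is a name invented for this example. Note that even the "pointers" here are just sector numbers you have to chase yourself.

```python
import struct


def parse_header_fields(header: bytes) -> dict:
    """Unpack a few little-endian fields from a compound file header.

    Offsets are assumed from the published compound file layout;
    check them against the spec before relying on this.
    """
    if len(header) < 76:
        raise ValueError("need at least the first 76 bytes of the header")

    # Five consecutive 16-bit fields starting at offset 24.
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from(
        "<5H", header, 24
    )
    # Two consecutive 32-bit fields starting at offset 44.
    fat_sector_count, first_dir_sector = struct.unpack_from("<II", header, 44)

    return {
        "minor_version": minor,
        "major_version": major,
        "byte_order": byte_order,
        # Sizes are stored as powers of two.
        "sector_size": 1 << sector_shift,
        "mini_sector_size": 1 << mini_shift,
        "fat_sector_count": fat_sector_count,
        # A sector number, not a byte offset: the "pointer" to the directory.
        "first_dir_sector": first_dir_sector,
    }
```

And that is only the header. Finding an actual document stream means following the sector chains from there, before you have even touched a single Word or Excel structure.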
That said, official documentation is far better than none, and surely beats reverse-engineering the formats, as has been done up to this point.
I know I’ll be digging into this more in the coming weeks.