Vladimir Klepov as a Coder

Quick Tip: docx is a zip Archive

Microsoft Office’s docx files are actually zip archives with a bunch of XML files and all the attached media. Super useful, everyone should know it!

When I tell my colleagues, friends, or students about it, they don’t take me seriously the first time. So, here we go again. If you have a docx (or xlsx, or pptx) file, you can unpack it with any unarchiver, for example by running unzip proj.docx -d proj, and get a folder with all the stuff that makes up the document. For a docx, that typically means [Content_Types].xml, _rels/, docProps/, and a word/ folder holding the document itself.
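If you would rather peek inside without unpacking anything to disk, here is a minimal sketch using Python's standard zipfile module. The helper names list_docx_parts and make_demo_docx are made up for this example, and the demo archive is only a tiny stand-in, not a document Word would open. Swap in a path to a real docx to see the real layout.

```python
import io
import zipfile


def list_docx_parts(docx):
    """Return the names of all files stored inside an Office archive.

    `docx` can be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as archive:
        return archive.namelist()


def make_demo_docx():
    """Build a tiny stand-in archive in memory (hypothetical demo data,
    not a complete Word document)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("[Content_Types].xml", "<Types/>")
        archive.writestr("word/document.xml", "<w:document/>")
        archive.writestr("word/media/image1.png", b"not a real png")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Replace make_demo_docx() with e.g. "proj.docx" to inspect a real file.
    for name in list_docx_parts(make_demo_docx()):
        print(name)
```

The same function should work unchanged on xlsx and pptx files, since they are zip archives too.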

From here, you can:

  • quickly grab all the media from word/media
  • work with the document (word/document.xml) via an XML parser (or grep / sed, but it’s a secret)
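Both ideas from the list can be sketched in a few lines of Python with only the standard library. This is a rough sketch under some assumptions: read_media, read_paragraphs and make_demo_docx are names invented for this example, the demo archive is a hand-made stand-in rather than a full Word file, and the text extraction only joins w:t text runs per w:p paragraph. It ignores tables-as-structure, tabs, line breaks, and nested paragraphs in text boxes.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, used by elements such as <w:p> and <w:t>.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_media(docx):
    """Return {archive path: bytes} for every file under word/media/."""
    with zipfile.ZipFile(docx) as archive:
        return {
            name: archive.read(name)
            for name in archive.namelist()
            if name.startswith("word/media/") and not name.endswith("/")
        }


def read_paragraphs(docx):
    """Return the plain text of each <w:p> paragraph in word/document.xml."""
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join all <w:t> pieces.
        runs = paragraph.iter(f"{{{W_NS}}}t")
        paragraphs.append("".join(run.text or "" for run in runs))
    return paragraphs


def make_demo_docx():
    """Build a tiny stand-in archive in memory (hypothetical demo data)."""
    document = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello, </w:t></w:r><w:r><w:t>docx!</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", document)
        archive.writestr("word/media/image1.png", b"not a real png")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    # Replace make_demo_docx() with e.g. "proj.docx" to try a real file.
    print(read_paragraphs(make_demo_docx()))
    print(sorted(read_media(make_demo_docx())))
```

Going through a namespace-aware parser is a bit more typing than grep, but it keeps working when Word splits one sentence across several runs, which it tends to do.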

And do all the other marvellous stuff — no Office or even GUI needed. Now go and spread the light of this newfound knowledge and never complain about docx again!

Written by your friend, Vladimir.