Friday, April 25, 2008

.docx Considered Stupid

Recently I received an email with a ".docx" file attached. This is the new format for Microsoft Word 2007 documents. I won't even start about how annoying it is to get Word documents attached to emails. Unfortunately it's something you have to put up with.

Being a Mac user and not having the latest and *cough* greatest word processor from Microsoft, I had to figure out how to read this document. I found out that the .docx file format is actually a zipped directory tree containing XML files.

Ok, first step was to unzip the .docx file via the command line. This resulted in eleven files in a handful of directories. So far, so good.
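For anyone who would rather do this step from a script than from the command line, here is a minimal sketch using Python's standard zipfile module. The function name list_docx_parts is just something I made up for illustration; it only lists what is inside the archive and extracts nothing.

```python
import zipfile


def list_docx_parts(docx_path):
    """Return the names of the files stored inside a .docx archive.

    docx_path may be a filename or an open binary file-like object,
    since zipfile.ZipFile accepts both.
    """
    with zipfile.ZipFile(docx_path) as archive:
        return archive.namelist()


# Example use (the filename is hypothetical):
# for name in list_docx_parts("attachment.docx"):
#     print(name)
```

Calling archive.extractall() inside the same with block would unpack everything to the current directory, much like the unzip command does.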

Next step was to find which file contained the actual text of the document and not just metadata. Looking through the filenames, I found one called "document.xml" which appeared promising. So I opened it up in my trusty text editor, TextMate. Suddenly my computer began grinding to a crawl as the file was loaded. It turned out that the entire file consisted of just two lines: the first line contained the XML declaration, and the second line contained the entire XML markup for the document! No wonder TextMate struggled, since it dutifully created a 70,000+ character line for the document. Why couldn't the XML file have newlines to make it more manageable? Anyway, on to the final step...
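One way to tame a single 70,000-character line before opening it in an editor is to re-indent the XML first. This is only a sketch, assuming the file is well-formed XML; pretty_print_xml is a name I picked for the example, and it relies on the standard xml.dom.minidom module.

```python
import xml.dom.minidom


def pretty_print_xml(xml_text):
    """Return the XML re-indented so that each element starts on its own line.

    xml_text may be a str or bytes holding a well-formed XML document.
    """
    document = xml.dom.minidom.parseString(xml_text)
    return document.toprettyxml(indent="  ")


# Example use (the path is hypothetical):
# with open("word/document.xml", "rb") as f:
#     print(pretty_print_xml(f.read()))
```

The result is meant for reading only. The added whitespace is not part of the original document, so I would not zip the reformatted file back into a .docx.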

My plan was to write a regexp search-and-replace to strip out all the XML markup so I could read the content of the document. But then I discovered that the markup was peppered with <w:proofErr w:type="spellStart"/> tags around almost every single word! I should mention that the contents of the document were in a foreign language, hence all the spelling "errors". For some bizarre reason, Microsoft Word marks up spelling mistakes in the docx file itself, not just on-screen. Why? Shouldn't it be left to the individual application (and platform) loading the document to decide whether or not words are misspelled? I can accept all the other hassles with the docx format, the zipped XML files and the incredibly long lines, but the encoding of spelling errors is crazy stuff.
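In hindsight, an XML parser would probably have been less painful than a regexp. The sketch below is an alternative to what I actually tried, not a tested converter. It assumes the main text lives in "word/document.xml" inside the archive, that paragraphs are w:p elements and that the visible text sits in w:t elements. The function name docx_to_text is my own invention. Because it only collects w:t text, empty tags such as w:proofErr drop out without any special handling.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace that the "w:" prefix is bound to in WordprocessingML.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(docx_path):
    """Return the plain text of a .docx file, one paragraph per line.

    Assumes the body text is stored in word/document.xml (an assumption;
    a robust reader would look the part up in the package relationships).
    """
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join the text of every w:t run in this paragraph; elements
        # without text (such as w:proofErr) contribute nothing.
        runs = (node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


# Example use (the filename is hypothetical):
# print(docx_to_text("attachment.docx"))
```

This ignores headers, footers, footnotes and anything else stored outside the main document part, and text boxes nested inside a paragraph may come out twice, so treat it as a quick way to read the text rather than a faithful conversion.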

After all that, I gave up trying to clean up the XML to read the file. Luckily, I found a web site that offers free conversion of docx files: zamzar.com.

PS: It turns out that Word documents created using MS Office 2007 do not even conform to Microsoft's own OOXML standard:
OOXML and Office 2007 Conformance: a Smoke Test


Thursday, April 10, 2008

Our Precious Gold Medals

Australia loves its sporting heroes. But I think they deserve to have their egos pricked every now and then.

With the Tibet and other human rights issues spurring some people to suggest a boycott of the Olympic Games:
Swimming legend to boycott Olympics

some Australian sports "stars" won't have a bar of it:
Hackett dismisses Olympics boycott

It's interesting that these people, who are heavily subsidised by (our) public money to swim up and down a pool all day, so easily dismiss the political protests of their fellow Australians.

If they financed themselves solely using their own money, then they could do whatever they wanted. But, according to this report:
Olympic medals or long life: what’s the bottom line?

Australia spent $280 million on its athletes during the 2000 Sydney Olympics. Each medal cost $4.82 million!

The medals are not the only benefits that athletes can receive. They get lucrative sponsorship deals and jump to the head of the line for plum jobs in the sports media.

I'd like to see athletes self-finance their activities more. Maybe they should contribute through a scheme like the ones that exist for tertiary education in Australia (HECS/HELP)? The money raised could go to helping the underprivileged or into the general health system. Isn't it ironic that at a time when Australia performs so well in international sport, the country is going through an unprecedented obesity epidemic?


Tuesday, April 08, 2008

Reality Bites and the ABC of Crime Investigation

I know it sounds like a broken record: the state of television programming is getting worse. Too much reality TV is a given. Arguably, the genre has reached its nadir with the aptly-named "The Moment of Truth".

When someone hits on a winning formula, everyone else seems to react by producing copycat programs. Witness the recent resurgence of the murder investigation genre: CSI, NCIS, Law & Order: SVU/CI, etc. ad nauseam. Networks and producers are more willing to exhaust a "winning" formula than take a chance on something truly original.

What can be done? The average viewer doesn't have a lot of choice. All we can do is vote with our feet. I've decided to leave the TV turned off unless there's a program I think is worth watching. No more flicking through the channels in the hope of finding something interesting. The time saved not settling for mediocre viewing can then be spent on reading books, listening to music, writing, coding and blogging :)
