Thursday, September 15, 2005

unicode encoding woes

Firefox has issues with rendering devanaagarii (case in point: check out how the i-ligature gets displaced on pages). The fix (as documented on another useful page on Wikipedia) for Windows is simple: go to Control Panel -> Regional and Language Options; on the Languages tab, check the "Install files for complex script and right-to-left languages (including Thai)" option; be prepared to insert your Windows XP installer CD-ROM, and restart once the file copy is done. I've tried it. It works.

I still haven't applied the fix on my current machine. And an email to my Yahoo! account containing devanaagarii text showed up as garbage. The fix took me a few minutes to find: View->Character Encoding->UTF-8 (the default was "Western [ISO-8859-1]"). I am told that setting UTF-8 as the default for pages that don't specify their encoding (bad page, bad page!) should cause no issues in general.
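
For the curious, here is a rough sketch of why the email looked like garbage. It assumes the message bytes were UTF-8 and the browser read them as ISO-8859-1; the sample word is my own stand-in, not the actual email text.

```python
# Sample devanaagarii text (an assumed stand-in for the email's contents).
text = "देवनागरी"

# The bytes as they would travel if the sender used UTF-8.
raw = text.encode("utf-8")

# Reading those bytes as ISO-8859-1 turns every byte into its own
# (wrong) character, which is the "garbage" on screen.
garbled = raw.decode("iso-8859-1")

# Switching the view to UTF-8 amounts to reinterpreting the same bytes.
restored = garbled.encode("iso-8859-1").decode("utf-8")

print(garbled)
print(restored)
```

Nothing is lost in the garbled view; the bytes are intact and only the interpretation is wrong, which is why flipping the Character Encoding menu is enough.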

My bugbear Rediff (Movies) proved me wrong. I went to a report on the Toronto Film Festival by Arthur J Pais and kept seeing the article punctuated randomly by black diamonds, each with a white question mark in it. Changing the Character Encoding back to the default (Western) made them go away. But curious (pun intended) one that I am, I checked the source of the page, and what I saw blew my mind. These old pages rely heavily on <FONT> tags. But that's not really as bad as seeing strange tags that seem to be the cause of the problem: <SPAN style="mso-fareast-language: JA">. Predictably, this page was created in every idiot's favourite XML-junk-spewing HTML editor, Microsoft Word. Proof: <?xml:namespace prefix = o ns = "urn:schemas-microsoft-com:office:office" />. What have they done? Outsourced page edits and design to Japan? Or is everyone in the department using pirated copies of Microsoft Word that came from there?
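
A small sketch of how those black diamonds can come about. This is a guess on my part, not something the page source confirms: it assumes the article carried Word-style curly quotes saved as single Windows-1252 bytes, and the sample sentence is made up.

```python
# Made-up sentence with curly quotes of the kind Word likes to insert.
sample = "the festival\u2019s \u201cbest\u201d film"

# Assumption: the page stored them as single Windows-1252 bytes.
page_bytes = sample.encode("cp1252")

# Forcing UTF-8 on those bytes: each stray byte is not valid UTF-8,
# so the decoder substitutes U+FFFD, drawn as a diamond with a "?".
as_utf8 = page_bytes.decode("utf-8", errors="replace")

# Reading the same bytes with a Western encoding shows them correctly.
as_western = page_bytes.decode("cp1252")

print(as_utf8)
print(as_western)
```

If that guess is right, the diamonds are the browser's way of saying "this byte is not UTF-8", which fits with them vanishing once the encoding goes back to Western.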

