Saturday, December 23, 2006 3:03 PM
Michael S. Kaplan
Do not adjust your browser, a.k.a. sometimes two wrongs DO make a right, a.k.a. dumb quotes
(The first part of the title is meant to be an allusion to The Outer Limits and the narration of the Control Voice, which is what I thought of when the inspiration for this post presented itself!)
The other day I was looking at a web site and it contained a bunch of text like the following:
as far as I’m concerned, we’d give
and so on.
Now people who have been dealing with international issues for a while (and maybe people who have read this post or maybe this one) might think they know what is going on here -- we are looking at text that is UTF-8 being looked at as if it were in Windows code page 1252.
Good guess. But no. In fact when I right clicked on the page to look at the encoding, I got the same results as you see in this blog:
So it wasn't UTF-8 encoded text being displayed as code page 1252. The browser already thought the page was UTF-8, which means the underlying encoded text had to be wrong.
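Here is a minimal Python sketch of how content can end up in that state. It is only a guess at the kind of pipeline involved, not a claim about what this particular site did: the sample string and the variable names are illustrative, and the assumption is that some tool read UTF-8 bytes as code page 1252 and then saved the result as UTF-8 again.

```python
# Text containing U+2019 RIGHT SINGLE QUOTATION MARK, written with escapes
# so the example does not depend on how this page itself is encoded.
original = "as far as I\u2019m concerned, we\u2019d give"

# Wrong #1: the UTF-8 bytes are read as if they were code page 1252.
# U+2019 is the three bytes E2 80 99, which 1252 maps to three characters.
utf8_bytes = original.encode("utf-8")
misread = utf8_bytes.decode("cp1252")

# The misread text is then stored as UTF-8, baking the damage into the bytes.
double_encoded = misread.encode("utf-8")

# A browser that (correctly) treats the page as UTF-8 faithfully shows the damage.
shown = double_encoded.decode("utf-8")
print(shown)
print(len(utf8_bytes), "bytes before,", len(double_encoded), "bytes after")
```

Once the bytes look like `double_encoded`, no encoding choice in the browser's menu brings the original quotes back, which is why adjusting the browser does not help.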
Clearly, adjusting the browser was not going to improve the experience. Which is what made me think of the Control Voice....
Now, using the encoding investigative techniques I talked about in Behind 'How to break Windows Notepad' and this post, first we'll put the text in Notepad:

then save it, close Notepad, and open the file in Notepad again. You will see:
So clearly there is a UTF-8 problem in the heritage here (by the way, the above steps will only work for you if your default system code page is 1252). The only thing that makes this problem harder is that there is no easy way to fix broken content, since the only fix is to interpret the text in a way that is technically wrong in order to correct the wrong that screwed it up in the first place (proving that, if carefully planned, two wrongs can indeed make a right)....
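The Notepad round trip above (save as code page 1252, reopen and let the bytes be read as UTF-8) can be sketched in Python. The `unmangle` helper is a hypothetical name for illustration, not an existing API, and it assumes the damage came from exactly one UTF-8-as-1252 misreading.

```python
def unmangle(text: str) -> str:
    """Hypothetical helper: undo one round of UTF-8-read-as-cp1252 damage.

    Encoding as cp1252 is the deliberate second "wrong": it recovers the
    bytes that were misread, which can then be decoded as the UTF-8 they
    always were. If either step fails, the text was probably not mangled
    this way, so it is returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# The mangled sample, written with escapes: U+00E2 U+20AC U+2122 is what
# U+2019 turns into after being misread.
mangled = "as far as I\u00e2\u20ac\u2122m concerned, we\u00e2\u20ac\u2122d give"
print(unmangle(mangled))
```

One caveat with this sketch: Python's strict `cp1252` codec leaves a few bytes (0x9D, for example) unassigned, so some mangled characters, such as a damaged U+201D, will not round-trip this way and the helper will simply hand the text back untouched.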
What would we call that feature -- targeted code page mangling? How'd that look on a right click menu?
I guess we could also blame the problem on Microsoft Word (since the web page appeared to be a copy of an email written in Outlook via Word mail) and its conversion of ' (U+0027, a.k.a. APOSTROPHE) into ‘ (U+2018, a.k.a. LEFT SINGLE QUOTATION MARK) and ’ (U+2019, a.k.a. RIGHT SINGLE QUOTATION MARK) via that exciting "smart quotes" feature. Given how it is affected by this encoding problem in some cases, we could easily name it "dumb quotes". :-)
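To see why only the "smart" quotes suffer, here is a small Python sketch comparing the three characters under the same misreading. The `mangle` function is a hypothetical helper that reproduces the UTF-8-read-as-1252 mistake.

```python
def mangle(text: str) -> str:
    """Hypothetical helper: read the UTF-8 bytes of text as code page 1252."""
    return text.encode("utf-8").decode("cp1252")


# U+0027 is a single ASCII byte, identical in UTF-8 and code page 1252,
# so it survives. U+2018 and U+2019 are three bytes each in UTF-8 and
# come out as three unrelated characters.
for ch in ("\u0027", "\u2018", "\u2019"):
    utf8_hex = ch.encode("utf-8").hex(" ").upper()
    print(f"U+{ord(ch):04X}  UTF-8 bytes: {utf8_hex:<8}  misread as: {mangle(ch)!r}")
```

Had the plain apostrophe been left alone, the page would have looked fine no matter which of the two encodings anyone guessed.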
This post brought to you by ‘ (U+2018, a.k.a. LEFT SINGLE QUOTATION MARK)