Wednesday, June 14, 2006 8:47 AM
Michael S. Kaplan
Behind 'How to break Windows Notepad'
Larry Osterman pointed me at an article entitled How to break Windows Notepad that makes for an interesting experiment:
Here's how to do it:
1. Open up Notepad (not Wordpad, not Word or any other word processor)
2. Type in this sentence exactly (without quotes): "this app can break"
3. Save the file to your hard drive.
4. Close Notepad
5. Open the saved file by double clicking it.
Instead of seeing your sentence, you should see a series of squares. For whatever reason, Notepad can't figure out what to do with that series of characters and breaks.
Now if you have East Asian language support installed, instead of seeing squares (NULL glyphs), you will see:
桴獩愠灰挠湡戠敲歡
And if you look at the code points under those characters, you will likely see what happened:
6874 7369 6120 7070 6320 6e61 6220 6572 6b61
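Ah, each byte is a letter, and each pair of bytes, read as a single little-endian UTF-16 code unit, just so happens to line up with a CJK ideograph! The letters "th" are the bytes 0x74 0x68, for example, which read that way become U+6874.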
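For anyone who wants to see the byte pairing without Notepad in the picture, here is a minimal Python sketch. It only imitates the misreading (ASCII bytes decoded as little-endian UTF-16); it is not what Notepad itself runs, and the helper names reinterpret_as_utf16le and code_points are made up for illustration.

```python
def reinterpret_as_utf16le(text):
    """Encode text as single-byte ASCII, then misread the bytes as UTF-16LE.

    Hypothetical helper: it mimics the wrong guess, not Notepad's own code.
    An odd number of bytes would leave a dangling byte and raise
    UnicodeDecodeError, so the input needs an even length.
    """
    raw = text.encode("ascii")       # one byte per letter, as saved on disk
    return raw.decode("utf-16-le")   # two bytes per code unit, low byte first


def code_points(s):
    """Return the code point of each character as four lowercase hex digits."""
    return [f"{ord(ch):04x}" for ch in s]


if __name__ == "__main__":
    garbled = reinterpret_as_utf16le("this app can break")
    print(" ".join(code_points(garbled)))
    # prints: 6874 7369 6120 7070 6320 6e61 6220 6572 6b61
```

The sentence is 18 bytes long, so it splits cleanly into nine code units, and every one of them happens to land in the CJK ideograph range.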
I have talked recently about the encoding detection mechanisms that Notepad uses, and this is another example of the problem, one that is more fun since the repro steps are so much fun (in fact the only improvement would be text insulting Microsoft or one of its rivals, which Notepad would then appear to censor in an example of a big bad monopoly, etc.!).
Now I have pointed out in the past that I do not like the IsTextUnicode function, and I suppose this could be considered a good reason why (IsTextUnicode returns TRUE here, which is why Notepad guesses as it does).
This post brought to you by 桴 (U+6874, a CJK ideograph)