
Sorting it all Out
Michael Kaplan's random stuff of dubious value
Be sure to read the disclaimer here first!
  • Blog Post: "Now that its been saved, how do I open it?"

    Over six years ago, on Wednesday, February 9th, 2005, I blogged the blog in this Blog entitled What is the difference between Big Endian and Little Endian Unicode? Cue gratuitous art of the File Save dialog and the "Big Endian Unicode" option Notepad adds: About six years and one day later...
  • Blog Post: Sometimes things are extended in the wrong direction....

    SQL Server's code page, collation, casing, locale, and resource model are all direct attempts to extend the things that Windows provides in ways that make sense for SQL Server. This sentence bears repeating, I think. Because it seems (in the words of the late, great George Carlin, vaguely important...
  • Blog Post: You can do CESU-8 if you need to; we went in a slightly different direction....

    Regular reader Dan asked me via the Contact link: We just upgraded our customer desktops from Windows 2000 to Windows 7, and we're seeing a major break in our text processing app. We've debugged the problem pretty thoroughly, and it doesn't look it's our app at all. Notepad seems to be breaking...
  • Blog Post: It's not that they're putting the Pressure on Windows, but maybe the Pressure.Net? :-)

    The other day when I blogged about how If someone blathers on about how Windows supports Unicode, you can suggest they just ZIP it, if you like!, I didn't tell the whole story. I focused on Windows, and a peculiar engineered schizophrenia that is Windows compressed folders, the most non-Unicode piece...
  • Blog Post: If someone blathers on about how Windows supports Unicode, you can suggest they just ZIP it, if you like!

    At Microsoft, we support Unicode. Particularly in Office and Windows - Unicode, Unicode, Unicode. Unicode, Unicode, Unicode, Unicode, Unicode, Unicode, Unicode, Unicode, Unicode, Unicode, Unicode, Unicode, Unicode. Did I mention that we support Unicode? Well, occasionally we do silly stuff...
  • Blog Post: The evolving Story of Locale Support, part 12: Logic dictates that we keep a sense of proportion about the RATIO

    Previous blogs from this series: part 11: What language is that keyboard for? part 10: Perhaps it is best to think of it as unintelligent design? part 9: Nastaleeq vs. Nastaliq? Either way, Windows 8 has got it! part 8: [Finally] taking care of some [more] languages in Pakistan part...
  • Blog Post: Not so KS, if you know what I mean....

    A few months ago, Ken Lunde of Adobe noted some mapping differences between several different vendors and the Korean KS X 1001 standard: The small table below shows the fourteen KS X 1001 code points that map differently, depending on the platform: KS X 1001 (GR) Unicode (W = Windows; M = Macintosh;...
  • Blog Post: On not being well served by the mantra "must support Unicode"

    Yesterday, I was having an interesting conversation, one that has given me pause. We were talking about Unicode, and the need for components in the OS to do a better job of directly embracing it. This is obviously nothing new around these parts, but a new twist was interjected into the conversation...
  • Blog Post: Every character has a story #34: PILE OF POO (U+1F4A9). Or not....

    Over in the Suggestion Box, "Eli the Bearded" asked: I see a need for a Unicode character story behind 'PILE OF POO' (U+1F4A9). Ah yes -- good old U+1f4a9. U+d83d U+dca9, as a UTF-16 "surrogates pair". Pictorially, scatologically, this one: Originally known as DUNG, or "dog dirt...
  • Blog Post: A Hindi default system locale isn't gonna happen

    So over in my Explaining the Windows XP/Server 2003 Regional and Language Options Dialog post (from 2004), Mahima Natarajan asked (this weekend): Hi Michael..I want a non-unicode program "Toolbook" to display hindi. I don't have the option "Hindi" in the Select a language for Non-Unicode program under...
  • Blog Post: Reports of mangling were mangled?

    Over in the Suggestion Box, Aakash Mehendale asked: Today's DailyWTF has an image of a newspaper article in which the text has come out mangled with a mangling I've never seen before. It seems to have become *mostly* x and z. Any ideas on what's happened here? Bad OCR? Some really oddball re-encoding...
  • Blog Post: Unicode in the console (including STDIN)?

    Over in the Suggestion box, Alf P. Steinbach asked: In an earlier topic you discussed how to do Unicode output to a console window at the C library level, by using _setmode. It would be nice with a discussion also of input at the C level (which does not seem to work). And then, how to do this...
  • Blog Post: There's no "I" in IDN, part 11: There's no place like ::1, not even 127.0.0.1!

    Previous parts in this series: part 1: If you're not Unicode, you're just wrong! part 2: Try not to use the wrong functions! part 3: There's no "I" in DIY, either! part 4: the 'path' to Hell is paved with IDN bugs part 5: Stephen Colbert's job is not in any jeopardy part 6: It isn...
  • Blog Post: Every rose has it's Þ....

    Over on the Unicore List, the question was a familiar one: I am converting text in an ANSI-encoded document to Unicode using search and replace in Notepad on Windows Vista SP2. The source document contains text in the 8-bit CSX+ encoding for Indic transliteration. A chart of the CSX+ encoding is available...
  • Blog Post: Never doubt that a program like Notepad can change the world. Indeed, it is of the only things that ever has!

    The question came in just the other day: ...customer says that when they specify a page as “UTF-8N”, it doesn’t render correctly on IE, but does on other browsers. I searched for “UTF-8N” and found references only on Japanese-language pages. The one English-language reference...
  • Blog Post: I guess you can say we welched on the promise of code pages?

    Stephen asks: Hi, We have been trying to use Welsh and found that the NLS table ( http://msdn.microsoft.com/en-gb/goglobal/bb896001.aspx ) maps Welsh to codepage 1252. This has confused me because Welsh can contain Ux0174 (and others) that are not in 1252. ( http://www.microsoft.com/globaldev...
  • Blog Post: Whodvethunk it'd be GDI+ injecting a little sanity into digit shenanigans?

    Reader Angel asked about Digit Substitution: Hello Michael! I randomly found your blog while googling about digit substitution. I read all the entries in your blog about this subject, but didn’t find an answer for my problem. However, I think you might be able to help me. I’m working...
  • Blog Post: Consoling oneself with TrueType

    It was not quite a year ago, in Myth busting in the console, where Myth #12 was: Myth #12: You can't change the setting of whether a console window is using a TrueType font. This myth too is quite untrue. I have used a few console API functions and that IsConsoleFontTrueType function from this...
  • Blog Post: There's no "I" in IDN, part 10: Who needs IDN support? How much? When? (Part 2)

    Previous parts in this series: part 1: If you're not Unicode, you're just wrong! part 2: Try not to use the wrong functions! part 3: There's no "I" in DIY, either! part 4: the 'path' to Hell is paved with IDN bugs part 5: Stephen Colbert's job is not in any jeopardy part 6: It isn...
  • Blog Post: There's no "I" in IDN, part 9: Who needs IDN support? How much? When? (Part 1)

    Previous parts in this series: part 1: If you're not Unicode, you're just wrong! part 2: Try not to use the wrong functions! part 3: There's no "I" in DIY, either! part 4: the 'path' to Hell is paved with IDN bugs part 5: Stephen Colbert's job is not in any jeopardy part 6: It isn...
  • Blog Post: It's a good thing the UTC has access to a thesaurus!

    Unicode is a complex standard. Not everyone finds it easy or exciting to read (there are parts that I have read that I would recommend to insomniacs who fail to respond to strong drugs!), but part of documenting a complex standard is sucking it up and trying to capture the issue. It can be quite...
  • Blog Post: At this point, it's a doc bug....

    Andrew asked: Is it a bug in RtlUTF8ToUnicodeN, or a doc bug in MSDN? Thanks. User Mode Codes: WCHAR wsUtf16[100]; ULONG cwUtf16Written; CHAR chTest1 = 'A'; CHAR chTest2[] = {'A','B'}; status = RtlUTF8ToUnicodeN(wsUtf16, ARRAYSIZE(wsUtf16), &cwUtf16Written, &chTest1, sizeof(chTest1)...
  • Blog Post: Every character has a story #34: LATIN LETTER T WITH CEDILLA (U+0162/U+0163)

    It's possible to go a long way when you don't even exist. Look how it worked out for the Capital Sharp S? :-) Some of them get baked into ISO 8859 and Unicode much earlier though. Like Ţ (U+0162, aka LATIN CAPITAL LETTER T WITH CEDILLA) ţ (U+0163, aka LATIN SMALL LETTER T WITH CEDILLA...
  • Blog Post: The SQL app that works fine until you have to support Chinese....

    The other day, Serge asked me via the Contact link: I coming to you based on a question I have post on SQL forum relative to how can I handle Chinese characters in SQL server tables. I have been pointed to your blog by a guy, hoping you could help me cause I could not find any answer for now. ...
  • Blog Post: The history of messing up Romanian on computers

    It was years ago that I first predicted about Romanian (in blogs like Be careful what you wish for (just in case it comes true!) aka When a Cedilla needs to be a Comma Below (and vice versa) ) that despite claiming to be pleased that people would continue for some time to note problems. The most recent...