(special thanks to James for pointing out this bug)

It is amazing how sometimes one can be so busy trying to make a point that one can miss the point.

A few days ago, I pointed out that CharNext(ch) != ch+1, a lot of the time.

That ought to be true. It is true if you are running Windows NT 3.51, Windows NT 4.0, or Windows 2000.

But in XP, things seem to have changed a bit.

It used to be that if one took combining characters like U+0308 (COMBINING DIAERESIS) and passed them to the GetStringTypeW or GetStringTypeEx APIs with the CT_CTYPE3 dwInfoType, the call would return (C3_NONSPACING | C3_DIACRITIC). If you look at the Platform SDK topics for these APIs, the types are defined as follows:

Name             Value     Meaning
C3_NONSPACING    0x0001    Nonspacing mark.
C3_DIACRITIC     0x0002    Diacritic nonspacing mark.

Starting with Windows XP and continuing on with Windows Server 2003, it now just returns C3_DIACRITIC. Looking at the definitions, this makes sense -- C3_DIACRITIC claims it is for nonspacing marks, too. So the relevant part of the change is:

  1. There used to be no characters marked with just C3_DIACRITIC.
  2. There are no characters that are marked with just C3_NONSPACING now (there used to be several).
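
To make the change concrete, here is a small Python model of the two return values described above. The flag constants are the SDK values from the table; the names OLD_CTYPE3 and NEW_CTYPE3 and the helper is_nonspacing are made up for illustration and are not part of any Windows API.

```python
# Flag values from the Platform SDK table above.
C3_NONSPACING = 0x0001
C3_DIACRITIC = 0x0002

# Hypothetical names: the CT_CTYPE3 result for U+0308 before and after the change.
OLD_CTYPE3 = C3_NONSPACING | C3_DIACRITIC  # Windows NT 3.51, NT 4.0, 2000
NEW_CTYPE3 = C3_DIACRITIC                  # Windows XP, Windows Server 2003


def is_nonspacing(ctype3):
    """Hypothetical check that looks only at the C3_NONSPACING bit."""
    return bool(ctype3 & C3_NONSPACING)


print(is_nonspacing(OLD_CTYPE3))  # True
print(is_nonspacing(NEW_CTYPE3))  # False
```

Any caller that tests only the C3_NONSPACING bit sees U+0308 change category, even though the character itself did not change.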

This would all be fine given the above definitions (well, not really -- but we'll let that lie for a bit). The problem is that the CharNext and CharPrev APIs are relying on that C3_NONSPACING definition to figure out when to skip characters.
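
A rough sketch of why that matters, in Python. This is not the real CharNext code; char_next and the two lookup tables are hypothetical stand-ins that only model the behavior described here: step forward one character, then skip anything flagged C3_NONSPACING.

```python
C3_NONSPACING = 0x0001
C3_DIACRITIC = 0x0002

# Hypothetical CT_CTYPE3 tables covering only U+0308 (COMBINING DIAERESIS).
CTYPE3_BEFORE_XP = {"\u0308": C3_NONSPACING | C3_DIACRITIC}
CTYPE3_XP = {"\u0308": C3_DIACRITIC}


def char_next(text, index, ctype3_table):
    """Model of a CharNext-style step: advance one character, then skip
    any following characters whose type includes C3_NONSPACING."""
    if index >= len(text):
        return index
    index += 1
    while index < len(text) and ctype3_table.get(text[index], 0) & C3_NONSPACING:
        index += 1
    return index


text = "a\u0308b"  # 'a' + COMBINING DIAERESIS + 'b'
print(char_next(text, 0, CTYPE3_BEFORE_XP))  # 2: the diaeresis travels with the 'a'
print(char_next(text, 0, CTYPE3_XP))         # 1: same as ch+1
```

With the older table the step lands on 'b'; with the XP-style table it lands on the combining mark, which is exactly the ch+1 behavior the earlier post said you should not expect.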

I'm not sure what scares me more -- that this bug has been around since October of 2000, or that it was found due to a blog post that I might not have thought to do had not someone suggested it to me.

I'll see about making sure this bug gets filed on Monday.

So, between this one and the one I found myself (described in the answer to Guess #3 in Why I don't like the IsTextUnicode API), two longstanding bugs in Windows have been found through the act of blogging.

This answers the question I posted in OT -- They taste like chicken, don't they? once and for all. Blogging may annoy me, but that's not really relevant anymore. It helps me make the product better. So I think I'd better keep doing it....

Scoble, you reading this? :-)

 

This post sponsored by all 792 of the nonspacing marks in Unicode