'A' and 'W' are sometimes living in two different worlds

Sorting it all Out
Michael Kaplan's random stuff of dubious value



When you think about the 'A' and 'W' decorated versions of functions, in most cases the 'A' version is a simple wrapper that converts the strings to Unicode, calls the 'W' version, and then, when needed, converts the results back.
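A minimal sketch of that usual "convert and call" pattern, written in Python for brevity. The names `to_upper_w` and `to_upper_a` are hypothetical stand-ins, not Win32 functions; `bytes.decode` and `str.encode` play the role that MultiByteToWideChar and WideCharToMultiByte play in the real APIs, and the default code page (cp1252) is just an assumption.

```python
# Hypothetical 'W' function: works on Unicode text (a Python str stands in
# for a WCHAR string).
def to_upper_w(text: str) -> str:
    return text.upper()


# Hypothetical 'A' wrapper in the usual "convert and call" style.
def to_upper_a(data: bytes, codepage: str = "cp1252") -> bytes:
    wide = data.decode(codepage)        # 1. code page bytes -> Unicode
    result = to_upper_w(wide)           # 2. call the 'W' version
    # 3. Unicode -> code page bytes; unmappable characters become '?'
    return result.encode(codepage, errors="replace")
```

For example, `to_upper_a(b"caf\xe9")` returns `b"CAF\xc9"` ("café" becomes "CAFÉ" in cp1252). All of the real work lives in the 'W' function; the 'A' side only translates.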

However, there are exceptions to this general principle....

The other day, when I posted How long is that non-Unicode string?, Mihai commented at one point (in relation to potentially using CharNextExA in the solution):

"CharNextExA would work here"
Unless you run it on Win XP, where it is broken :-)

He was referring of course to what was posted way back in January of last year in We broke CharNext/CharPrev (or, bugs found through blogging?). The bug that made me decide I enjoyed having a technical blog....

And the issue here seemed worthy of a wee little post of its own -- the post you are reading right now, in fact!

Now if you look at CharNext and CharPrev, you will notice they do not have separate topics for CharNextW/CharNextA and CharPrevW/CharPrevA, because from a text-description standpoint, what the two versions of each function do is about the same.

But if you look at what each function has to do:

  • The 'A' versions have to go one byte at a time, skipping an extra byte any time two bytes together make up a double-byte CJK ideograph (that is, whenever the first of them is a lead byte). This never has to happen in the 'W' versions.
  • The 'W' versions have to go one WCHAR at a time, continuing past any surrogate pair (as of Vista) or any combining/nonspacing character (in all versions of Windows, except while the aforementioned bug was in effect). This never happens in the 'A' versions.
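To make the contrast concrete, here is a rough Python sketch of the two stepping rules. It does not call the real CharNextA or CharNextW: `is_lead_byte_cp932`, `char_next_a`, `to_wchars`, and `char_next_w` are hypothetical helpers, the lead-byte ranges assume code page 932 (Shift-JIS), and treating every Unicode "Mark" character as combining is a simplification of whatever Windows actually checks.

```python
import unicodedata


def is_lead_byte_cp932(b: int) -> bool:
    # Assumed code page 932 (Shift-JIS) lead-byte ranges, hard-coded here
    # instead of asking the system (e.g. via IsDBCSLeadByteEx).
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def char_next_a(data: bytes, pos: int) -> int:
    """'A'-style step (hypothetical): byte offset of the next character."""
    if pos >= len(data):
        return pos
    if is_lead_byte_cp932(data[pos]) and pos + 1 < len(data):
        return pos + 2  # lead byte + trail byte form one double-byte character
    return pos + 1


def to_wchars(text: str) -> list[int]:
    """Split text into UTF-16 code units, the way a WCHAR string stores it."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def _code_point_at(units: list[int], i: int) -> tuple[int, int]:
    """Return (code point, number of code units) starting at index i."""
    u = units[i]
    if 0xD800 <= u <= 0xDBFF and i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
        return 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00), 2
    return u, 1


def _is_combining(cp: int) -> bool:
    # Simplification: any Unicode "Mark" category counts as combining/nonspacing.
    return unicodedata.category(chr(cp)).startswith("M")


def char_next_w(units: list[int], i: int) -> int:
    """'W'-style step (hypothetical): index of the next character."""
    if i >= len(units):
        return i
    _, size = _code_point_at(units, i)  # a surrogate pair counts as one unit of text
    i += size
    while i < len(units):               # then skip any following combining marks
        cp, size = _code_point_at(units, i)
        if not _is_combining(cp):
            break
        i += size
    return i
```

For example, `char_next_a(b"A\x93\xfa", 1)` returns 3 because 0x93 is a cp932 lead byte, while `char_next_w(to_wchars("e\u0301x"), 0)` returns 2 because U+0301 is a combining acute accent. Neither function needs any of the other's logic.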

(Of course there is also the fact that the Vietnamese code page has some combining characters in it, as I mentioned previously, but CharNext and CharPrev have never handled this case properly. I suppose one could call this a bug, though I am unaware of any plans to fix it.)

In any case, since there is no real overlap of the functionality needed in the 'A' and 'W' versions, the functions are kept entirely separate. There is no "convert and call" logic, and it would not be useful if there were.
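To see why "convert and call" would buy nothing here, consider this hypothetical Python sketch of such a CharNextA (`char_next_a_convert_and_call` is an invented name, and this is NOT how Windows implements it). The wrapper has to turn a byte offset into a character index before calling the 'W' step, and turn the answer back into a byte offset afterward; both translations depend on knowing which bytes are lead bytes (here that knowledge is hidden inside Python's cp932 codec), which is exactly the work a native 'A' step does directly. Code page 932 and its lead-byte ranges are assumptions.

```python
def is_lead_byte_cp932(b: int) -> bool:
    # Assumed code page 932 (Shift-JIS) lead-byte ranges.
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC


def char_next_a_native(data: bytes, pos: int) -> int:
    # Direct 'A' step: one byte, or two when the current byte is a lead byte.
    if pos >= len(data):
        return pos
    if is_lead_byte_cp932(data[pos]) and pos + 1 < len(data):
        return pos + 2
    return pos + 1


def char_next_w_simple(text: str, i: int) -> int:
    # Stand-in 'W' step: one Python character at a time.
    return min(i + 1, len(text))


def char_next_a_convert_and_call(data: bytes, pos: int) -> int:
    # Hypothetical wrapper; assumes pos sits on a character boundary.
    wide = data.decode("cp932")             # convert: the codec must find lead bytes
    i = len(data[:pos].decode("cp932"))     # byte offset -> character index
    j = char_next_w_simple(wide, i)         # call the 'W' step
    return len(wide[:j].encode("cp932"))    # character index -> byte offset
```

Both functions agree on well-formed input, but the wrapper converts the whole string on every call and still leans on lead-byte knowledge for the offset bookkeeping, so there is nothing to gain from routing through the 'W' version.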

Funny how the descriptions are the same though, huh?

You can think of it as the difference between riding a bicycle and riding a unicycle -- similar principles, but a very different set of skills.

The bonus trivia question -- do you know why the "convert and call" logic would actually be hard to do here if it had been a good idea?

And the bonus hard trivia question -- can you think of a function where it is a good idea even with that difficulty?


This post brought to you by À (U+00c0, a.k.a. LATIN CAPITAL LETTER A WITH GRAVE)

Comments
  • The problem seems fairly obvious to me: to convert the string to Unicode, you still need to know how to get from one character to the next, thus you have to implement the functionality anyway.
  • Hi Sebastian --

    Exactly! You got the bonus trivia question....

    Now do you know the answer to the bonus hard trivia question? A function that actually has to do that conversion and return the right answer related to position(s) within that string?