Friday, May 09, 2008 11:36 AM
Michael S. Kaplan
It is okay to not be in favor of termination
No, this is not a post about abortion. I am talking about NULL termination, a way to end strings that is legal in all fifty states and that is widely used by software in regular use by people all over the world, regardless of their views about the other kind of termination. So it is a universally safe topic, even if this disclaimeresque introduction isn't, necessarily.
There are many patterns for calling functions in the NLS API; most (but not all) of them follow these rules:
- You can pass a NULL buffer and a 0 length and get back the exact required length, plus one for the terminating NULL.
- You can pass a buffer of that size or larger to have the buffer filled; the return value is the exact length plus one for the terminating NULL.
- You can pass a buffer of just the required length, without room for the terminating NULL; the buffer is filled, and the return value is the exact length with no terminating NULL.
This behavior can easily get confusing.
It is even a security topic, since a filled buffer without a terminating NULL can lead to real bugs if the caller then treats it as a NULL-terminated string.
But there are times when it is exactly what you want.
Say, for example, that you are using LCMapString to do case conversion inline. Inserting a random NULL into the middle of a string whose remaining content you wish to preserve is seldom a good idea.
Other calls can also require this behavior.
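To make the inline case concrete, here is a minimal C sketch. It does not call LCMapString: map_upper_in_place and map_upper_then_terminate are hypothetical functions, and towupper() merely stands in for the real case mapping. The first maps an explicit-length span and writes no NULL, so the rest of the string survives; the second appends a terminator after the span, the way a result that counts the NULL would, and that truncates the caller's string.

```c
#include <assert.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for an inline LCMapString(..., LCMAP_UPPERCASE, ...)
   call made with an explicit length. Exactly cch characters are mapped and
   no terminating NULL is written, so whatever follows the span is preserved.
   towupper() is only an illustration: it is not how Windows maps case and
   it ignores locale-specific casing rules. */
static int map_upper_in_place(wchar_t *buf, int cch)
{
    assert(buf != NULL && cch >= 0);
    for (int i = 0; i < cch; i++)
        buf[i] = (wchar_t)towupper((wint_t)buf[i]);
    return cch;  /* characters written; no NULL is counted */
}

/* Hypothetical contrast: the same mapping, but a NULL is written right after
   the span. Done inline, that NULL lands in the middle of the caller's string
   and cuts off everything after it. The buffer must hold at least cch + 1
   characters. */
static int map_upper_then_terminate(wchar_t *buf, int cch)
{
    int written = map_upper_in_place(buf, cch);
    buf[written] = L'\0';
    return written + 1;  /* count now includes the NULL */
}
```

Uppercasing the first five characters of L"hello world" with the first function leaves L"HELLO world"; the second leaves only L"HELLO".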
The same rules also explain the following calls and results:
LCMapString(LOCALE_USER_DEFAULT, LCMAP_FULLWIDTH, L"\u00c4\u0170", -1, NULL, 0)
return value: 3
LCMapString(LOCALE_USER_DEFAULT, LCMAP_FULLWIDTH, L"\u00c4\u0170", -1, wz, 3)
return value: 3
wz value: L"\u00c4\u0170\u0000"
LCMapString(LOCALE_USER_DEFAULT, LCMAP_FULLWIDTH, L"\u00c4\u0170", -1, wz, 10)
return value: 3
wz value: L"\u00c4\u0170\u0000"
LCMapString(LOCALE_USER_DEFAULT, LCMAP_FULLWIDTH, L"\u00c4\u0170", 2, wz, 3)
return value: 2
wz value: L"\u00c4\u0170"
LCMapString(LOCALE_USER_DEFAULT, LCMAP_FULLWIDTH, L"\u00c4\u0170", 2, wz, 2)
return value: 2
wz value: L"\u00c4\u0170"
(It helps to know that L"\u00c4\u0170" has an implicit NULL at the end; when the source length is passed as -1, that NULL is what the function sees from inside, and it gets counted in the result!)
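The five results above can be reproduced with a small model of the contract. This is a sketch, not LCMapString: model_map_passthrough is a hypothetical function for a mapping that leaves every character unchanged, the way LCMAP_FULLWIDTH treats these two characters. A source length of -1 means the terminating NULL is counted and copied; an explicit length means it is neither counted nor written. Returning 0 for a too-small buffer is an assumption modeled on how LCMapString reports ERROR_INSUFFICIENT_BUFFER; none of the calls above exercise that case.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical model of the NLS buffer contract for a pass-through mapping.
   It is NOT LCMapString; it only mimics the return values listed above.
     cchSrc == -1      : src is NULL-terminated; the NULL is counted and copied
     cchSrc >= 0       : exactly cchSrc characters; no NULL counted or written
     cchDest == 0      : size query; dest is not touched
     cchDest too small : returns 0 and writes nothing (assumed failure) */
static int model_map_passthrough(const wchar_t *src, int cchSrc,
                                 wchar_t *dest, int cchDest)
{
    assert(src != NULL && cchSrc >= -1 && cchDest >= 0);
    int needed = (cchSrc == -1) ? (int)wcslen(src) + 1 : cchSrc;

    if (cchDest == 0)
        return needed;                   /* just report the required size */
    if (dest == NULL || cchDest < needed)
        return 0;                        /* buffer too small: fail */

    wmemcpy(dest, src, (size_t)needed);  /* copies the NULL only if counted */
    return needed;
}
```

With cchSrc set to 2, the model fills wz[0] and wz[1] and leaves wz[2] exactly as it was, so anything consuming that result has to rely on the returned count rather than search for a NULL, which is the security concern raised earlier.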
Now in this case the two characters:
Ä (U+00c4, aka LATIN CAPITAL LETTER A WITH DIAERESIS)
Ű (U+0170, aka LATIN CAPITAL LETTER U WITH DOUBLE ACUTE)
have no full-width equivalents and thus pass through the function completely unchanged with this flag, which provides an ideal opportunity to see all of this behavior under the microscope, so to speak....
This blog sponsored by those two characters along for the ride, above