Sorting it all Out Michael Kaplan's random stuff of dubious value Be sure to read the disclaimer here first!
The other day, Jeremy asked me:
I thought that with your wealth of unicode knowledge, you may be able to answer a few questions for me.
In a C/C++ program, is it necessary to wrap single character conversions in a _T( ) macro?
For instance...
TCHAR tch = _T('S');
The MSVC compiler happily converts the literal 'S' to a double byte '\0''S' for UNICODE builds, so that the following appears to compile fine...
TCHAR tch = 'S';
I'm unclear if the compiler is actually substituting L'S' for 'S', or simply promoting the value of 'S' to a double byte.
Is there any case in which a single-byte character has a Unicode representation that is not a simple double byte promotion?
If you are not building both Unicode and non-Unicode versions of a program, then all of the _T()/TEXT() macro stuff, as well as all of the TCHAR stuff, is fairly superfluous. As I mentioned a few days ago, the new functions that NLS adds in Vista are not going to have non-Unicode versions (a trend that started in Server 2003).
To answer the specific question about whether the macro is required (and keeping the last paragraph in mind), I would always suggest using the L prefix on Unicode characters and strings, even though the compiler does not seem to feel the need for it on characters. It is definitely still needed any time you specify a string literal, and the consistency seems like a good thing, doesn't it?
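Here is a rough, portable sketch of the difference between the two spellings. It assumes a Unicode build and a typical compiler whose narrow execution character set is ASCII-compatible. The names my_tchar, MY_T and MY_T_PASTE are hypothetical stand-ins for TCHAR and _T(), whose real definitions live in <tchar.h> and exist only on Windows.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for TCHAR and _T() in a Unicode build.
typedef wchar_t my_tchar;
#define MY_T_PASTE(x) L##x
#define MY_T(x) MY_T_PASTE(x)

// The two literals have different types: 'S' is a char, L'S' is a wchar_t.
static_assert(std::is_same<decltype('S'), char>::value, "narrow literal is char");
static_assert(std::is_same<decltype(L'S'), wchar_t>::value, "wide literal is wchar_t");

// Compares a narrow literal converted on assignment with a true wide literal.
inline bool ascii_literals_agree() {
    my_tchar converted = 'S';    // ordinary integral conversion of the char value
    my_tchar wide = MY_T('S');   // expands to L'S'
    return converted == wide;
}

// String literals get no such conversion, so the prefix is mandatory:
//   const my_tchar* bad = "S";  // error: const char* is not const wchar_t*
inline const my_tchar* wide_string() {
    return MY_T("S");            // expands to L"S"
}

// Small usage example for the helpers above.
inline void run_literal_checks() {
    assert(ascii_literals_agree());
    assert(wide_string()[0] == L'S');
}
```

So the character case compiles because a char value converts implicitly to a wider integer type, not because the compiler rewrites the literal. The string case has no such escape hatch.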
For the ASCII range, you will not find a difference between that "double byte promotion" and a Unicode representation. However, for anything single byte that is outside of ASCII but inside of the default system code page, I would go so far as to say that the "promotion" would usually be wrong, and possibly also subject to different interpretations depending on what the default system code page happens to be. If you are going to write UNICODE/_UNICODE applications, then it seems best to keep them using Unicode everywhere...
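To make that concrete, here is a minimal sketch comparing plain promotion with a code-page-aware conversion for two bytes. The CodePage enum and the promote and decode functions are hypothetical and cover only the handful of values shown. On Windows, a real conversion would go through something like MultiByteToWideChar rather than a hand-written table.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, deliberately tiny model of two Windows code pages.
enum class CodePage { Windows1252, Windows1251 };

// What "promotion" does: widen the byte value and nothing else.
inline std::uint32_t promote(unsigned char byte) {
    return byte;
}

// What a code-page-aware conversion would return for the bytes in this sketch.
inline std::uint32_t decode(CodePage cp, unsigned char byte) {
    if (byte < 0x80) return byte;  // the ASCII range is the same everywhere
    if (byte == 0x80)              // EURO SIGN vs. CYRILLIC CAPITAL LETTER DJE
        return cp == CodePage::Windows1252 ? 0x20ACu : 0x0402u;
    if (byte == 0xE9)              // LATIN SMALL E WITH ACUTE vs. CYRILLIC SHORT I
        return cp == CodePage::Windows1252 ? 0x00E9u : 0x0439u;
    return 0xFFFDu;                // not covered by this sketch
}

// Small usage example for the helpers above.
inline void run_code_page_checks() {
    // ASCII: promotion and decoding agree.
    assert(promote(0x53) == decode(CodePage::Windows1252, 0x53));
    // Outside ASCII: promotion yields U+0080, a control code, not the euro sign.
    assert(promote(0x80) != decode(CodePage::Windows1252, 0x80));
    // The same byte means different characters under different code pages.
    assert(decode(CodePage::Windows1252, 0xE9) != decode(CodePage::Windows1251, 0xE9));
}
```

Byte 0xE9 happens to promote correctly under Windows-1252 only because that range coincides with Latin-1. The same promotion is wrong under Windows-1251, which is exactly the code page dependence described above.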
This post brought to you by "S" (U+0053, a.k.a. LATIN CAPITAL LETTER S)