Monday, May 28, 2007 11:01 PM
Michael S. Kaplan
Nothing stinks worse than the thread locale, other than the thread code page
The piece of mail I got (via the Contact link) from Ken was:
Hi Michael,
I have run into what I believe is a bug in MultiByteToWideChar() and WideCharToMultiByte() when the code page parameter is set to CP_THREAD_ACP and 'default language for non-Unicode applications' has been set to Hebrew. This is seen when using the utility macros in atlconv.h like T2WC.
I've created a simple test app that shows unexpected results on some systems. The code page inferred from CP_THREAD_ACP is not the same as GetACP(). I have reproduced this on two different systems set to use Hebrew, but not on two other systems set to use Traditional Chinese - one of which was set to the full Traditional Chinese localized UI. The source is part of a default .net 2003 generated console project.
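#include "stdafx.h"
#include <windows.h>   // GetACP, GetCPInfoEx, CPINFOEX
#include <tchar.h>     // _tmain, _TCHAR
#include <atlbase.h>   // ATL core; brings in atlconv.h for ATL::_AtlGetConversionACP
#include <iostream>    // std::cout (not declared by <ostream> alone)

int _tmain(int argc, _TCHAR* argv[])
{
    // System ANSI code page
    std::cout << "default code page is " << GetACP() << std::endl;
    // Code page identifier that the atlconv.h macros pass to the conversion APIs
    std::cout << "_AtlGetConversionACP code page is " << ATL::_AtlGetConversionACP() << std::endl;
    // Ask the system which real code page that identifier resolves to
    CPINFOEX cpinfo = {};
    GetCPInfoEx(ATL::_AtlGetConversionACP(), 0, &cpinfo);
    std::cout << "Thread code page is " << cpinfo.CodePage << std::endl;
    return 0;
}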
My results are:
default code page is 1255
_AtlGetConversionACP code page is 3
Thread code page is 1252
When my application calls T2WC, the results are incorrect and the codepoints are extended to 16 bits, but not converted to their Hebrew codepoints. We are getting around this by using _CONVERSION_DONT_USE_THREAD_LOCALE, but I had wondered if others have heard of this problem before.
Thanks for your time,
Ken
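To make the symptom Ken describes concrete, here is a minimal, portable sketch of what "extended to 16 bits, but not converted" looks like. It assumes the Hebrew text was decoded as code page 1252 rather than 1255, which is consistent with the output above but is an inference, not something the mail states. The function names are hypothetical, and only ASCII and the Hebrew letter range are modelled.

```cpp
// Portable sketch (no Win32 calls). Bytes outside the modelled ranges come
// back as U+FFFD, which here just means "not covered by this sketch".
constexpr char32_t kNotModelled = 0xFFFD;

// Windows-1255: bytes 0xE0..0xFA are the Hebrew letters U+05D0..U+05EA.
constexpr char32_t DecodeByteAs1255(unsigned char b)
{
    return b < 0x80 ? static_cast<char32_t>(b)
         : (b >= 0xE0 && b <= 0xFA) ? static_cast<char32_t>(0x05D0 + (b - 0xE0))
         : kNotModelled;
}

// Windows-1252: bytes 0xA0..0xFF match Latin-1, so decoding them is plain
// widening of the byte value. That widening is the "extended to 16 bits" effect.
constexpr char32_t DecodeByteAs1252(unsigned char b)
{
    return (b < 0x80 || b >= 0xA0) ? static_cast<char32_t>(b) : kNotModelled;
}
```

For example, the byte 0xF9 decodes to U+05E9 (HEBREW LETTER SHIN) under 1255, but to U+00F9 (LATIN SMALL LETTER U WITH GRAVE) under 1252.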
Regular readers may recall when I pointed out Why I think the thread locale really stinks.
(In fact, I was asked not too long ago to help clean up some of the bad usages of the thread locale in various parts of Windows in shell32.dll and shlwapi.dll, something I will probably be working on shortly!)
Anyway, after Ken pointed out that the use of _CONVERSION_DONT_USE_THREAD_LOCALE works around the problem, it seems pretty obvious that CP_THREAD_ACP is none other than the LOCALE_IDEFAULTANSICODEPAGE value that GetLocaleInfo returns when it is passed the result of GetThreadLocale as the LCID.
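That lookup can be sketched directly. The following is a rough illustration rather than what Windows does internally; the helper names are hypothetical, and the block is guarded because it only builds against the Win32 headers.

```cpp
#ifdef _WIN32   // Win32-only sketch
#include <windows.h>

// Hypothetical helper: the ANSI code page of the calling thread's locale.
// Returns 0 if the call fails; note that locales with no ANSI code page
// also report 0, so 0 does not always mean an error.
UINT ThreadLocaleAnsiCodePage()
{
    DWORD codePage = 0;
    // LOCALE_RETURN_NUMBER asks for a DWORD instead of a string; the buffer
    // size is still expressed in TCHARs.
    const int ok = ::GetLocaleInfo(::GetThreadLocale(),
                                   LOCALE_IDEFAULTANSICODEPAGE | LOCALE_RETURN_NUMBER,
                                   reinterpret_cast<LPTSTR>(&codePage),
                                   sizeof(codePage) / sizeof(TCHAR));
    return ok ? static_cast<UINT>(codePage) : 0;
}

// Hypothetical helper: true when the thread's code page and the system ANSI
// code page agree, which is exactly what failed on Ken's Hebrew machines.
bool ThreadAndSystemCodePagesAgree()
{
    return ThreadLocaleAnsiCodePage() == ::GetACP();
}
#endif
```

On a machine like Ken's, one would expect ThreadLocaleAnsiCodePage() to report 1252 while GetACP() reports 1255, assuming the thread locale there was still an English one.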
Now the thread code page is a pretty shaky thing, and not only for the reasons that make me feel like the thread locale stinks. Imagine basing code page conversions on something that any code running in the thread can change at any time. Yuck!
In fact, it is downright nasty that ATL and MFC made a breaking change in version 7.0 in this area (as described here):
String Conversions
In versions of ATL up to and including ATL 3.0 in Visual C++ 6.0, string conversions using the macros in atlconv.h were always performed using the ANSI code page of the system (CP_ACP). Starting with ATL 7.0 in Visual C++ .NET, string conversions are performed using the default ANSI code page of the current thread, unless _CONVERSION_DONT_USE_THREAD_LOCALE is defined, in which case the ANSI code page of the system is used as before.
Note that the string conversion classes, such as CW2AEX, allow you to pass a code page to use for the conversion to their constructors. If a code page is not specified, the classes use the same code page as the macros.
For more information, see ATL and MFC String Conversion Macros.
Yuck. I hate breaking changes that are bad. And this is definitely one of them. :-(
Sorry Ken, the strange differences here are kind of by [bad] design. And your workaround is actually the fix here -- it works around what I consider a breaking change that breaks a little bit of ATL here.
In the end, my best advice is to NEVER use either the thread locale or the thread code page. For anything. Ever....
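One way to follow that advice is to name the code page explicitly at every conversion instead of letting CP_THREAD_ACP pick one. The sketch below is only an illustration under that assumption: DecodeWithCodePage is a hypothetical helper, not part of ATL or Win32, and the block is guarded because it needs the Win32 headers.

```cpp
#ifdef _WIN32   // Win32-only sketch
#include <windows.h>
#include <string>

// Hypothetical helper: decode bytes using a code page the caller names
// explicitly (for example 1255 or CP_ACP), never CP_THREAD_ACP.
// Returns an empty string for empty input or on failure.
std::wstring DecodeWithCodePage(const std::string& bytes, UINT codePage)
{
    if (bytes.empty())
        return std::wstring();   // a zero-length source is an error for the API

    const int srcLen = static_cast<int>(bytes.size());

    // First call: ask how many UTF-16 code units the result needs.
    const int needed = ::MultiByteToWideChar(codePage, MB_ERR_INVALID_CHARS,
                                             bytes.data(), srcLen, nullptr, 0);
    if (needed <= 0)
        return std::wstring();

    // Second call: convert into a buffer of exactly that size.
    std::wstring out(static_cast<std::wstring::size_type>(needed), L'\0');
    const int written = ::MultiByteToWideChar(codePage, MB_ERR_INVALID_CHARS,
                                              bytes.data(), srcLen, &out[0], needed);
    out.resize(written > 0 ? static_cast<std::wstring::size_type>(written) : 0);
    return out;
}
#endif
```

Called as DecodeWithCodePage("\xF9", 1255), this should yield U+05E9 regardless of what the thread locale happens to be at the time.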
This post brought to you by װ (U+05F0, a.k.a. HEBREW LIGATURE YIDDISH DOUBLE VAV)