Previous posts in this series (including today's!):
(If you are just tuning in and want to start now you can grab the current source from here.)
Ok, let's take the gloves off.
This project was not really built to be compiled as Unicode, so we have to do it the hard way.
Just clean up with the following line (if you have compiled already):
nmake clean
And then to do the Unicode compile, you can try:
nmake cflags="$(cflags) /DUNICODE /D_UNICODE"
(We won't keep doing this, for a whole bunch of reasons!)
Now it is going to fail -- we still have some work to do, after all. The first error is easy enough:
Aha, let's look up strtoul and look for the TCHAR-version here. Scroll down a bit, and Aha! We need that to be _tcstoul, instead.
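For anyone who has not run into the generic-text routines before, here is a minimal sketch of the idea behind _tcstoul, written so that it compiles without tchar.h. The names SketchChar, SK_T, sketch_tcstoul and parseNumber are made up for illustration and are not in the setup.exe source; the real mapping lives in tchar.h, which (as far as I know) keys off _UNICODE in the same way.

```cpp
#include <cassert>
#include <cstdlib>
#include <cwchar>

#ifdef _UNICODE
typedef wchar_t SketchChar;                 // stand-in for TCHAR
#define SK_T(s) L##s                        // stand-in for _T()
inline unsigned long sketch_tcstoul(const SketchChar* s, SketchChar** end, int base)
{
    return std::wcstoul(s, end, base);      // wide build: the wcs* routine
}
#else
typedef char SketchChar;                    // stand-in for TCHAR
#define SK_T(s) s                           // stand-in for _T()
inline unsigned long sketch_tcstoul(const SketchChar* s, SketchChar** end, int base)
{
    return std::strtoul(s, end, base);      // narrow build: the str* routine
}
#endif

// Hypothetical caller: the same source line compiles in either build.
unsigned long parseNumber(const SketchChar* text)
{
    assert(text != 0);
    SketchChar* end = 0;
    return sketch_tcstoul(text, &end, 10);
}
```

Calling parseNumber(SK_T("200")) compiles unchanged whether or not _UNICODE is defined, which is the whole point of switching the call site to the _tcs name.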
And to make this all easier let's add this line to the makefile:
cflags = $(cflags) /DUNICODE /D_UNICODE
rather than passing it on the command line (note that this is the usual way that Unicode versions of makefiles are done in MSDN samples, in separate files entitled makefile.uni that include the old makefile!). Here goes:
Now you may have already fixed this yesterday when you noticed yourself fixing string constants that were being used with CharNextA. Well, we are not going to follow the advice here because using a reinterpret_cast, a C-style cast, and a function-style cast would all be inappropriate. Let's just change them all to CharNext rather than CharNextW, so that we can compile either with Unicode or not and keep working.... then we'll run nmake again:
Ok, the first problem is easy, just turn CharPrevA into CharPrev.
The second problem requires a bit more thought, though. The code is in the LoadResourceString function in utils.cpp, and it already loads the resource as Unicode (this is one of those spots in Part 1 that I suggested we think about later).
In this case, we need to add some #ifdef UNICODE specific code, so that in a Unicode build we copy the loaded string straight into the caller-supplied buffer, and otherwise leave the code as it was. The new function will be something like this (the new code is the #ifdef UNICODE branch):
/////////////////////////////////////////////////////////////////////////////
// LoadResourceString
//
UINT LoadResourceString(HINSTANCE hInst, LPCTSTR lpType, LPCTSTR lpName, LPTSTR lpBuf, DWORD *pdwBufSize)
{
    HRSRC   hRsrc   = 0;
    HGLOBAL hGlobal = 0;
    WCHAR   *pch    = 0;

    if ((hRsrc = WIN::FindResource(hInst, lpName, lpType)) != 0
        && (hGlobal = WIN::LoadResource(hInst, hRsrc)) != 0)
    {
        // resource exists
        if ((pch = (WCHAR*)LockResource(hGlobal)) != 0)
        {
            unsigned int cch;
#ifdef UNICODE
            cch = lstrlen(pch);
            if(FAILED(StringCchCopy(lpBuf, *pdwBufSize, pch)))
            {
                *pdwBufSize = cch;
                return ERROR_FUNCTION_FAILED;
            }
#else
            cch = WideCharToMultiByte(CP_ACP, 0, pch, -1, NULL, 0, NULL, NULL);
            if (cch > *pdwBufSize)
            {
                *pdwBufSize = cch;
                return ERROR_MORE_DATA;
            }

            if (0 == WideCharToMultiByte(CP_ACP, 0, pch, -1, lpBuf, *pdwBufSize, NULL, NULL))
                return ERROR_FUNCTION_FAILED;
#endif // UNICODE
            *pdwBufSize = cch;
        }
        else
        {
            if (1 > *pdwBufSize)
            {
                *pdwBufSize = 1;
                return ERROR_MORE_DATA;
            }

            *pdwBufSize = 1;
            *lpBuf = 0;
        }

        DebugMsg(TEXT("[Resource] lpName = %s, lpBuf = %s\n"), lpName, lpBuf);
        return ERROR_SUCCESS;
    }

    // resource does not exist
    DebugMsg(TEXT("[Resource] lpName = %s NOT FOUND\n"), lpName);
    return ERROR_RESOURCE_NAME_NOT_FOUND;
}
Ok, let's try compiling again with this error fixed too. We'll try a clean compile and just see how far we get:
Ah, here is another such problem (these vertrust.cpp cases are the ones where there are WCHAR variables defined). In this case, there are two pairs of calls in vertrust.cpp, both in the IsPackageTrusted function. The function basically converts the szPackage and the szSetupExe to Unicode and then calls the IsFileTrusted function on each one.
Of course the biggest problem is that (in my opinion) we really do not have the same situation as with the previous resource loading situation -- we should ideally just pass through the string as is and not allocate or copy anything. The minimal change function I came up with, you can take a look at tomorrow (or you can try out writing your own if you like!). I mainly tried to avoid the extra allocates and copies.
Once this function is fixed up:
Success!
Hey, does that mean we're done?
Well, not quite. Anyone want to guess at what else we need to do before we can get into all the advanced stuff like MSLU integration and so on?
Or does anyone want to try to figure out the best way to write that IsPackageTrusted?
:-)
These last two issues were mercifully few, and they point to the two types of issues that require special handling -- the times when Unicode was involved before the conversion, and what to do with them.
(Coming up -- tomorrow will center on answering the "what's next" question a little bit, but mostly it will be a discussion of the different experience you will have if you don't (didn't?) do the things I suggested in Parts 2 and 3 and instead skipped right to Part 4 -- because it leads to a very different conversion experience!)
This post brought to you by ఙ (U+0c19, a.k.a. TELUGU LETTER NGA)
As the year 2006 draws to a close and as I set my sights on what applications I need to try to get on the ball about Unicode support over the next two to five years, I realized a major impediment to this goal.
I'll blame a fellow Technical Lead who shall remain nameless, for pointing out (in response to an unrelated question) the logo requirements documentation for XP and for Vista.
With a sense of dread I read through both of them, and neither one contains references to either requirements or recommendations that have anything to do with international support of software on Windows.
(ref: Our non-Unicode heritage, with the George Carlin riff that could act as a call to arms if we could get the full bit written and recorded!).
I guess getting international features to start getting at least optional bullet item status might be the first step to being taken seriously in this space....
This post brought to you by ඤ (U+0da4, a.k.a. SINHALA LETTER TAALUJA NAASIKYAYA)
So the question Gary (the Program Manager on the MSKLC update) asked me was:
Are we planning on updating the UI to be Vista UX guidelines compliant?
Luckily he provided me with a[n internal] link so I would know what he was talking about! :-)
I think that we support Aero nicely enough:
In fact, I can't imagine much more we could do, really (switching the fonts would kind of block it from ever being installed downlevel at all, since the Vista UI font (Segoe UI) may not be on downlevel platforms), and the distance between "supported with a warning about functionality" and "completely disallowed from being installed" is a pretty big one. I am happy to not force any decisions, and keep options open...
And beyond that I'd hate to embed the font and use all the private font stuff I have talked about previously -- across a whole huge UI is just a bigger effort than most people would want to take on.
I guess one could imagine the surface underneath the layout, the piece highlighted in fuchsia below (or a significant piece of it):
getting that transparent look like the Windows Mobility Center has in some of its "dead" UI space:
But even the simplest prototypes of that showed it to be pretty distracting to actually using MSKLC.
And I am as big of a fan of using the latest cool UI tricks as the next person, but not if it is going to affect productivity!
(Which is not to malign the work of people like Kenny Kerr, Daniel Moth, and others, because I am sure there are UI cases where it can look really cool -- and a modified version of Daniel's solution is what made the prototype so easy!)
Luckily for the rest of the guidelines, they aren't too different than the XP ones that we have already gone through (and the big offending problem we had, the tab order, is now fixed). So I guess you could say that we are following the guidelines as well as we are able to....
And one doesn't have to actually become airborne to support Aero (one doesn't even have to be high!).
Anyway, Gary agreed with this assessment, so that stuff is pretty much all set. :-)
This post brought to you by ✈ (U+2708, a.k.a. AIRPLANE)
The biggest source of actual changes in most conversions of legacy projects to Unicode is handling hard-coded strings. The simple fact is that what you might have in your code as
"This is a string"
and which in a purely Unicode application would be
L"This is a string"
now will have to be either
1) TEXT("This is a string")
or
2) _TEXT("This is a string")
3) _T("This is a string")
I myself prefer the third one but many people like the first or the second. You can look at the MSDN topic Using Generic-Text Mappings to get more information on _T and _TEXT; the one with no underscore prefix is actually defined in the Platform SDK header file winnt.h and this is the reason why it is used by Windows header files that do not want to include tchar.h in their source files.
(If you are bored, the section of winnt.h with the // Neutral ANSI/UNICODE types and macros comment is where these all are.)
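If you would rather not go digging through the header, here is a small sketch of the shape of those macros. The SKETCH_ names are stand-ins I made up so that the snippet compiles without windows.h; the two-level definition is, as far as I recall, how winnt.h does it, but treat the details as an assumption rather than a quote from the header.

```cpp
#include <cassert>
#include <cstddef>

#ifdef UNICODE
typedef wchar_t SketchTChar;                    // stand-in for TCHAR
#define SKETCH_TEXT_IMPL(quote) L##quote        // paste an L prefix onto the literal
#else
typedef char SketchTChar;                       // stand-in for TCHAR
#define SKETCH_TEXT_IMPL(quote) quote           // leave the literal alone
#endif

// The extra level lets a macro argument expand to its literal before the
// L gets pasted on; with one level, SKETCH_APP_NAME would become LSKETCH_APP_NAME.
#define SKETCH_TEXT(quote) SKETCH_TEXT_IMPL(quote)

#define SKETCH_APP_NAME "setup"                 // hypothetical macro that expands to a literal

// Counts characters (not bytes) up to the terminator, in either build.
std::size_t sketchLength(const SketchTChar* s)
{
    assert(s != 0);
    std::size_t n = 0;
    while (s[n] != 0)
        ++n;
    return n;
}
```

Both sketchLength(SKETCH_TEXT("This is a string")) and sketchLength(SKETCH_TEXT(SKETCH_APP_NAME)) compile in either build; only the character type underneath changes.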
Since we will need tchar.h for a few CRT functions you can pretty much take your pick -- the other reason some people prefer the shorter one is that they consider it less distracting (I consider them all to be about the same in that respect).
I am going to use TEXT() for reasons I will point out shortly.
There are several different ways to approach this kind of change:
If you prefer #1, then you may want to skip today's post and wait for Part 4, tomorrow (which is when we will be doing that). Today is dedicated to taking care of over 100 cases without the compile-time checking....
I tend to prefer #3 myself, so your find/replace box in VS will look something like this:
The most important things to note are the syntax for tagging an expression (in VS, surround it in curly braces) and the syntax for using the tagged expression in the replacement string (in VS, \# where # is 1-9 and selects which tagged expression to use).
There are many strings you won't want to affect, including obvious ones like
#include "common.h"
and there are even a few "already done" strings like this one from common.h in the source:
#define ISETUPPROPNAME_BASEURL TEXT("BASEURL")
Note that this code is probably shared with other Windows Installer source code projects (like maybe msistuff.exe's?), which would be why it is written with Unicode in mind even though most of the rest of the project is not. And why it would have a name like common.h.
If this convinces you that you would rather use TEXT() to be able to use the same thing in the rest of the project then like I said you can use whatever you like (it is what I chose here!).
The other bonus is that it will keep us from having to add tchar.h to files for just this definition (if you have been trying it you will see that the source is still compiling right now, before we move it to Unicode).
Of course function names are another case where you do not want to wrap them in TEXT macros since the function names will go to GetProcAddress calls. So you would wrap "advapi32.dll" but you would not wrap "CheckTokenMembership" (a function inside advapi32.dll). Though if you mess this up don't worry, it will be a simple compile error later, very easy to fix....
One other interesting string that needs special handling:
"\""
Which we want to become:
TEXT("\"")
and not
TEXT("\")"
obviously. The simple regular expression is not quite smart enough for the escaped quote case (there are like five of these). If you want to try and create a more complex regular expression you are welcome to!
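For the curious, here is one way such a "more complex" expression could look. This is only a sketch: it uses std::regex (ECMAScript syntax) rather than the Visual Studio find/replace syntax, and wrapLiterals is a hypothetical helper of mine, not anything in the project. It skips #include lines and literals already preceded by TEXT(, but it knows nothing about comments, character literals, _T(), or names headed for GetProcAddress, so you would still review every hit by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Wraps each string literal on one source line in TEXT(...).
std::string wrapLiterals(const std::string& line)
{
    // Leave #include "..." lines alone.
    std::size_t first = line.find_first_not_of(" \t");
    if (first != std::string::npos && line.compare(first, 8, "#include") == 0)
        return line;

    // A quote, then any run of (non-quote, non-backslash) or (backslash + any char),
    // then the closing quote. The second alternative is what handles \" correctly.
    static const std::regex literal(R"rx("(?:[^"\\]|\\.)*")rx");

    std::string out;
    std::size_t last = 0;
    for (std::sregex_iterator it(line.begin(), line.end(), literal), end; it != end; ++it)
    {
        std::size_t pos = static_cast<std::size_t>(it->position());
        assert(pos >= last);                       // matches arrive in order, without overlap
        out.append(line, last, pos - last);        // copy the text before the literal

        // Skip literals that are already wrapped, such as TEXT("BASEURL").
        bool wrapped = pos >= 5 && line.compare(pos - 5, 5, "TEXT(") == 0;
        out += wrapped ? it->str() : "TEXT(" + it->str() + ")";

        last = pos + static_cast<std::size_t>(it->length());
    }
    out.append(line, last, std::string::npos);     // copy whatever follows the last literal
    return out;
}
```

Fed the one-line source text "\"", this gives back TEXT("\"") rather than the broken TEXT("\")" form.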
In any case, hopefully I have convinced you that you will definitely want to be careful about your use of Find Next vs. Replace -- and definitely not be tempted by Replace All. :-)
Other things you do not need to "fix" are pretty much anything in the makefile, or anything in a comment (unless you want to amuse future code reviewers).
Now after you go through all of these, you will have noticed that 56 of the strings to be edited were calling one of the three overloads of the DebugMsg function found in utils.cpp. I would recommend you go ahead and fix them up too, since (a) you have already changed their datatypes anyway, (b) they all call OutputDebugString which will map to OutputDebugStringW after we compile UNICODE, and (c) there is no harm in seeing Unicode text if you run a debugger that supports Unicode. :-)
Amazingly, we are much, much closer now!
We'll do one more big find/replace in today's post. There are several places in the source code where GetProcAddress is being called to get the address of an ANSI function rather than a Unicode one. Let's fix those up right now. You could search for GetProcAddress, but in this project (as in most other projects) it just goes to constants. Just remember (like I said before) -- you always want to make sure that you do not put the TEXT() macro wrapper around function names since GetProcAddress's second parameter never expects a Unicode string. You DO want them around library names and just about everything else.
The easiest way to find all of the occurrences is the following search:
It is pretty rare to ever have a string that ends with a capital A that you wouldn't want to become a capital W, so although you will want to check each one, you are unlikely to have a ton of noise in the results....
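If you would rather script the hunt than click through it, the same idea can be sketched in a few lines. To be clear, this is not the search from the dialog; it is my own guess at an equivalent using std::regex, and findAnsiNameCandidates is a made-up helper name. It just lists quoted identifiers that end in a capital A so that you can look at each one.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Returns every quoted identifier in the text that ends with a capital A.
std::vector<std::string> findAnsiNameCandidates(const std::string& source)
{
    // A quote, an identifier whose last character is 'A', then the closing quote.
    static const std::regex ansiName(R"rx("([A-Za-z_][A-Za-z0-9_]*A)")rx");

    std::vector<std::string> names;
    for (std::sregex_iterator it(source.begin(), source.end(), ansiName), end; it != end; ++it)
    {
        assert(it->size() == 2);            // whole match plus the one capture group
        names.push_back((*it)[1].str());    // keep the identifier without its quotes
    }
    return names;
}
```

A constant like "DATA" would show up in the results too, which is exactly the kind of noise you want to eyeball rather than blindly turn into a W.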
Believe it or not we are getting rather close now (tomorrow we're going to take the next step to find major things to look at).
Stay tuned....
This post brought to you by ဃ (U+1003, a.k.a. MYANMAR LETTER GHA)
Back in early October, Yao Ziyuan (a.k.a. 'Booted Cat') posted a suggestion for Microsoft in the microsoft.public.word.international.features newsgroup. Although I believe the suggestion has indeed been forwarded on appropriately, the message is about to scroll off the group and I thought it would be better to get it somewhere a bit more visible that doesn't have quite the same 'scrolling' characteristic....
Plus it inspired a few things I wanted to say something about. :-)
The post was titled: A Feature Suggestion for Microsoft Chinese PinYin IME. And here is the content of the post:
I wish there can be a mode in which the homophone candidates can be displayed in multiple rows. On the first row are the most frequently used word candidates and homophone character candidates. The subsequent rows divide the other homophone character candidates according to a characteristic. The characteristic can be:
By type of tone. Chinese characters have 5 types of tones: type-1 through type-5.
By common radical. Homophone characters usually can be grouped according to radicals commonly shared. That is, some of them can have a common radical X, some others Y, yet some others Z, and so on. This is like that in a set of integers, some of them have a common divisor, some others have another common divisor, and so on. And the rest which can't be classified into any prior group are put on a last row.
By semantic category. This is tricky and may only stay in theoretical speculation. Top semantic categories are like "concrete objects", "abstract concepts", "verbs", "adjectives", "grammatical auxiliary characters", and more specific categories can be derived from an existing category. Thus character selection would look like exploring a tree.
In case there are too many rows to display, a vertical scroll bar can come to help.
On each row, candidate characters can be sorted by computed probability of occurring in the current context.
This idea could improve the efficiency for the user to select a desired character candidate significantly.
Regards,
Yao Ziyuan
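To make the first variant (one row per tone, each row sorted by probability) a bit more concrete, here is a toy sketch. It is in no way how any shipping IME works: the Candidate struct, groupByTone, sampleMa and the probability numbers are all invented for illustration, and the "most frequent candidates" first row from the suggestion is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Candidate
{
    std::string glyph;      // the character, UTF-8 encoded
    int tone;               // 1 through 5, as in the suggestion
    double probability;     // made-up score for the current context
};

// Builds one row per tone; each row is sorted by descending probability.
std::map<int, std::vector<Candidate> > groupByTone(const std::vector<Candidate>& all)
{
    std::map<int, std::vector<Candidate> > rows;
    for (const Candidate& c : all)
    {
        assert(c.tone >= 1 && c.tone <= 5);
        rows[c.tone].push_back(c);
    }
    for (auto& row : rows)
    {
        std::stable_sort(row.second.begin(), row.second.end(),
                         [](const Candidate& a, const Candidate& b)
                         { return a.probability > b.probability; });
    }
    return rows;
}

// Illustrative input: a few characters read "ma"; the scores are invented.
std::vector<Candidate> sampleMa()
{
    return {
        {"妈", 1, 0.30}, {"麻", 2, 0.10}, {"码", 3, 0.05},
        {"马", 3, 0.25}, {"骂", 4, 0.10}, {"吗", 5, 0.20}
    };
}
```

With the sample input, groupByTone(sampleMa()) yields five rows, and the tone-3 row lists 马 ahead of 码 because of its higher (invented) score.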
This is an interesting idea, though one that is different enough that, were it ever implemented by Microsoft, I would hope it would be a new PinYin IME rather than a feature enhancement for the one currently available.
Though for the most part people actually type the tone number, so using that first idea of having different rows for different tones would quickly lead to just two rows whose principal "feature" might be blocking the screen in a new way, which could lead to less productivity since people might be used to what is being blocked now. Just something to consider, but obviously some thought about the actual results of having such an input method will need to take place before anything might be considered or prototyped or whatever.
Clearly as one is typing one is using language, which makes it hard to try to quantify specific phonemic or orthographic or semantic or other dimensions of language and simply make them the second dimension, ignoring the others. This gets us into the area of how the mind works, and how language works -- how do we think about language as we write, and with that how does a person using ideographs find the right ideograph in their mind?
An input method that could capture that would have an edge over just about anything else out there, couldn't it? :-)
I don't know how much study has been done in this area in academic circles, or whether there is work that could be captured in an input method that would feel appropriate to users. Does anyone know?
On the other hand, when I look at my keyboard I don't see nearly that much of a connection between language as my brain processes it and the way it is laid out. So perhaps thinking this would be a great idea is kind of flawed reasoning even in the case of an IME? I doubt that the way these ideographs are grouped really relates to how people think about language while they are typing....
People tend to get quite attached to their methods of input, as they represent a particular stability in their computing life that can be quite traumatic to upset. And while many people could probably see the obvious benefit to both speed and accuracy that the above could bring, there will always be people who really don't want the extra complication that more information can bring.
I know a bit about some of this after having switched to do a lot more of my writing via Dragon Dictate, saving my typing time for code (and thus avoiding exhausting my ability to do so too early!). Dealing with the display of the various candidate lists to maximize productivity and minimize annoyance is a complicated affair, whether it is built in IMEs or expensive bits of custom software.
In their own way, my comments are a slightly more constructive extension to the ideas I pointed out in the provocative post Your layout (in all likelihood) bores me.
That post was of course aimed at a different audience -- the people who were basically hoping to sell some unique keyboard layout to Microsoft since they were sure it would be a great thing for everyone (including them, given Microsoft's cash reserves?). And Yao Ziyuan's post, which was presented more constructively, is in my mind more deserving of a constructive response?
Terribly judgmental of me, I will admit. Though given the subjective doctrines under which this blog is run (i.e. stuff that interests me), I guess my judgment is the arbiter of what will show up here, and the idea of my judgment being judgmental hardly seems all that unreasonable? :-)
Now input method editors have a complicated task (no matter how easy I might think they have it in other contexts!). When one considers the problems of making available thousands or even tens of thousands of possible characters, I think that the free flow of ideas on ways to improve the experience cannot ever do harm whether the ideas lead to solutions or not.
I even have a naive hope that it will help add a little perspective to those who feel a need to muck with keyboard layouts. In actuality it won't help, but that's why it is a naive hope. :-)
In its own way, the suggestion might even really be pointing to separate IMEs rather than trying to bundle them into the same IME. They are all trying to find ways to "break the ties" between characters that are identical according to what the user types such that it is easier for the user to have the character they want, and it is unclear how often all of the methods would be used by the same person such that glomming them into one IME would itself help productivity.
The principal suggestion that I think is somewhat unique among the IMEs that Microsoft ships is the idea of expanding the candidate list in a new dimension, adding the X axis to the usual Y axis as a way to represent the information. Now I may be actually wrong in thinking this is unique (I don't actually know about all of the IMEs) and perhaps there are even users of IMEs who feel the same way about such an innovation as I feel about people who muck with the CAPS LOCK key. But it seems like an interesting idea, from the outside, at least....
My point? I may not actually have one (or perhaps I just have too many so they are all milling about aimlessly on your screen). But I thought that the idea of a [visible] two dimensional candidate list could probably use a few more eyes.
Maybe it already exists? It seems like one of those ideas that is so obvious to everyone after someone comes up with it, doesn't it?
Or maybe some other company will try to get a patent on this idea after reading about it in the newsgroups or in this little blog, and then if Microsoft actually implements it a few years later they can go sue Microsoft for violating the patent. In which case this blog post can perhaps help them to feel foolish in court. I think if there were more opportunity for people trying to patent things to feel foolish that the world can only get better.... :-)
This post brought to you by ё (U+0451, a.k.a. CYRILLIC SMALL LETTER IO)
My New Year's Resolutions for 2007 (an intentionally more positive version than the ones I posted in 2005 and of course better than 2006 when I posted nothing!):
This post brought to you by B (U+0042 a.k.a. LATIN CAPITAL LETTER B)
The alternate title of this post is not something I would ever recommend using in a conversation about a relationship!
Ok, first we'll start with the source code, you can either get it from the Platform SDK as I pointed out in Part 1, in the Samples\SysMgmt\Msi\setup.exe\ folder, or you can just download it from right here. So far nothing has happened to it -- this is our Tabula Rasa.
So now let's start dirtying it. :-)
I am going to stick it in its own folder (E:\SETUP.EXE\), navigate there from within the Visual Studio 2005 Command Prompt and use NMAKE to do a build (I have tested with the Platform SDK command prompt and others as well; the steps should work either way).
For starters, I'll make sure it can build:
Ok, good start. We probably won't be seeing success much until we are done so let's not get too used to it!
The first thing that has to happen is a search for uses of Unicode that are happening now -- obvious ones like types such as WCHAR, LPWSTR, and LPCWSTR. Clearly if there is anything that supports Unicode now we want to know where it is so that we can look at it more closely later when we decide what to do with it. Here are the results:
Find all "WCHAR", Whole word, Subfolders, Find Results 1, "All Open Documents"
E:\setup.exe\vertrust.cpp(193): WCHAR *szwSetup = 0;
E:\setup.exe\vertrust.cpp(194): WCHAR *szwPackage = 0;
E:\setup.exe\vertrust.cpp(226): szwSetup = new WCHAR[cchWide];
E:\setup.exe\vertrust.cpp(256): szwPackage = new WCHAR[cchWide];
E:\setup.exe\utils.cpp(232): WCHAR *pch = 0;
E:\setup.exe\utils.cpp(238): if ((pch = (WCHAR*)LockResource(hGlobal)) != 0)
Matching lines: 6 Matching files: 6 Total files searched: 10

Find all "LPCWSTR", Whole word, Subfolders, Find Results 1, "All Open Documents"
E:\setup.exe\vertrust.cpp(62):itvEnum IsFileTrusted(LPCWSTR lpwFile, HWND hwndParent, DWORD dwUIChoice, bool *pfIsSigned, PCCERT_CONTEXT *ppcSigner)
E:\setup.exe\setupui.h(87): HRESULT __stdcall OnProgress(ULONG ulProgress, ULONG ulProgressMax, ULONG ulStatusCode, LPCWSTR szStatusText);
E:\setup.exe\setupui.h(88): HRESULT __stdcall OnStopBinding(HRESULT, LPCWSTR ) {return S_OK;}
E:\setup.exe\setupui.cpp(372):HRESULT CDownloadBindStatusCallback::OnProgress(ULONG ulProgress, ULONG ulProgressMax, ULONG ulStatusCode, LPCWSTR /*szStatusText*/)
E:\setup.exe\setup.h(85):itvEnum IsFileTrusted(LPCWSTR szwFile, HWND hwndParent, DWORD dwUIChoice, bool *pfIsSigned, PCCERT_CONTEXT *ppcSigner);
Matching lines: 5 Matching files: 6 Total files searched: 10

Find all "LPWSTR", Whole word, Subfolders, Find Results 1, "All Open Documents"
Matching lines: 0 Matching files: 0 Total files searched: 10

Find all "MultiByteToWideChar", Whole word, Subfolders, Find Results 1, "All Open Documents"
E:\setup.exe\vertrust.cpp(225): cchWide = MultiByteToWideChar(CP_ACP, 0, szSetupExe, -1, 0, 0);
E:\setup.exe\vertrust.cpp(233): if (0 == MultiByteToWideChar(CP_ACP, 0, szSetupExe, -1, szwSetup, cchWide))
E:\setup.exe\vertrust.cpp(255): cchWide = MultiByteToWideChar(CP_ACP, 0, szPackage, -1, 0, 0);
E:\setup.exe\vertrust.cpp(263): if (0 == MultiByteToWideChar(CP_ACP, 0, szPackage, -1, szwPackage, cchWide))
Matching lines: 4 Matching files: 6 Total files searched: 10
We'll set these lists aside for later, now that we know what they are.
Now of course this sets us up to the first set of changes -- using the handy dandy Replace in Files dialog:
Note the important settings here -- all open documents, match whole words, and using the Find Next button to review followed by the Replace button if we like the change.
LPSTR -> LPTSTR (fewer than 25 occurrences, no surprises other than perhaps a few that are on our "review later" list)
LPCSTR -> LPCTSTR (fewer than 100 occurrences, no surprises)
char -> TCHAR (over 100 occurrences, no big surprises in any of them)
Well, there are actually a few issues here that I will be talking about in later posts, but we're starting with the simple approach. :-)
And of course note that we are not doing Unicode builds yet, so with these changes the build will keep working.
Some things to look forward to here that I'll be talking about in upcoming posts:
Believe it or not, what was done today is likely the second most widespread change we'll have to do for this project!
Wasn't that easy? :-)
Stay tuned for the next step....
This post brought to you by ഗ (U+0d17, a.k.a. MALAYALAM LETTER GA)
So the message I got yesterday from someone with the handle IDisposable was:
RE: http://blogs.msdn.com/michkap/archive/2006/03/25/560838.aspx
This code doesn't work under Vista mostly because the Language Packs won't install (says they are already there), but the load of the Satellite assembly doesn't work (obviously) since the localized directories don't exist.
Help me, Obi Wan.
Ok, I'm kidding, it is actually the same Marc Brooks who came up during that blog post he is pointing to. IDisposable is just Marc's handle. I don't think he has it as a license plate though, so he is not as far gone as Mark Davis (president of the Unicode Consortium) whose license plate is UNICODE. :-)
Anyway, I tried it out with several of the language packs, both for 1.1 and 2.0. Now allowing for the fact that I had to of course run them from an elevated command prompt, I had a 100% success rate for the 1.1 language packs, and a 0% success rate for the 2.0 language packs.
Marc is right. Even though the 2.0 langpack setups claim that they are already installed, they are not.
This is, unfortunately, by design.
An excerpt from a soon-to-be-available KB article:
When attempting to install an x86 .NET Framework 2.0 redistributable Language Pack (langpack.exe) on Vista x86, the installation will not fail but will result in no action. When attempting an interactive install of an ia64 or x64 .NET Framework 2.0 redistributable Language Pack (langpack.exe) on the corresponding Vista OS, the install will block with a dialog message: "The product has been already installed." Silent ia64 and x64 installs will succeed, but result in no action.
In order to support existing application installers, including Visual Studio deployment projects, Vista's registry will reflect a system with all .NET Framework 2.0 Language Packs installed (see standard keys below). This allows existing installers to succeed when probing the registry to detect language pack existence, installation or repair success. Attempts to install or repair version 2.0 language packs will result in a success (0) being returned immediately, but will result in no actions to the underlying system configuration. Whether .NET Framework 2.0 language components are, in fact, actually installed on a given Vista system will depend upon the set of Vista UI languages installed.
Basically, given that the .NET Framework is considered to be a component of the operating system, the language packs are something that can only be installed as a part of the Windows user interface language support intrinsic in the MUI and LIP technologies.
So Marc is technically incorrect that the code I posted fails; it will work if you install a Vista MUI langpack or a Vista Language Interface Pack, for every language that you have installed. But the current plan of record for the behavior of the .NET Framework language packs is that they go hand-in-glove with the operating system's, starting in Vista.
I verified this on my machine with all of the pre-ship versions of the MUI langpacks -- this is working as designed and I got the following output running my code:
file:///E:/Windows/Microsoft.NET/Framework64/v2.0.50727/mscorlib.dll ar zh-CHS cs da el es fi fr he hu it ja ko nl no pl pt ru sv tr pt-BR zh-CHT
So if you have the Vista MUI langpacks (assuming they are available to you), then the .NET Framework langpacks will work.
(The preliminary Vista LIP langpacks are not done enough for me to know whether they include the .NET Framework resources or not; given that the LIP list is larger than the .NET Framework list, I assume in many cases they may not be there.)
In my opinion, given that design, at a minimum the fact that the langpack setups claim that the component is already installed (rather than the more accurate claim that the component cannot be installed on this platform) is a legitimate bug.
The fact that the download pages claim that Vista is supported is also (in my opinion) an obvious bug, since as far as these packages are concerned Vista is not supported.
Now the last point is the fact that these langpacks are designed not to install on Vista. That part was intentional, so it is (technically) by design.
I won't go so far as to claim that I am thrilled about the design here, FWIW.
Though to be honest the whole MUI licensing model is something that I won't claim to be thrilled about, either (and have not been since the Windows 2000 MUI licensing was worked out). So linking one to the other with no other option is a bad idea, in my opinion, not just for the sake of this one bug, but for the same reason why a person doesn't go to the boring party if they are used to ones that are fun. If you know what I mean.
Neither of them is one I have a whole lot of control over, of course.
In any case, I will see what I can do to keep raising the issue, and anyone who would like to see something different should probably feel free to do the same (here or in the issue Marc raised on the Feedback site here).
But just so you know, the current behavior is considered by design.
This post brought to you by 𒁁 (U+12041, a.k.a. CUNEIFORM SIGN BAD)
Before we get to jump in and work on code, there are a few items to take care of.
Any time the idea of converting an existing project to Unicode comes up, one really has to stop and consider both the benefits and the drawbacks and decide whether it makes sense to do it.
In this case (with MSKLC), the current behavior of the product (in version 1.3) has you double click on an MSI and it will just install. Thus the new requirement that we add a bootstrap setup.exe actually leads to a regression in functionality since clicking on setup.exe if there is a character off of the default system code page in the path will lead to the reported error occurring:
Given our team's firm public stance on the importance of supporting Unicode, there are clear strategic disadvantages to not fixing the problem, which is the main reason that trying to look at whether there are specific mitigations like using short file names (which won't work on some systems anyway) may not be the best workaround.
But obviously not every software development team will have those same pressures, so mitigating the problem (or just documenting it as a limitation) is always an option worth considering.
In some cases, another team can have the same set (or at least a similar set) of pressures if they are trying to support a particular market and would otherwise have to admit that in specific cases they cannot support the language of that market.
As a tool that has both UI and engine elements, we will be dealing with many interesting scenarios here with setup.exe.
Now one obvious strategic reason to not support Unicode (or to not only support Unicode) would be if Win9x support is also important.
I'll talk more about the Win9x issues later in the series (as even now before I have started, Dean Harding has suggested talking about MSLU integration).
But for now I'll stick to our current project (MSKLC 1.4), which creates keyboard layouts that can only be installed on NT-based platforms, thus meaning the only thing that Win9x support in SETUP.EXE gives a person is seeing the nice friendly error message telling the person who tried to install anyway to go get bent. So the preliminary triage assumption is that telling people who create a keyboard layout that allows users to type in a particular language that in some cases no character in the path can contain letters in that language is MUCH worse than not being able to tell people who can't follow directions (in a friendly way) to go home until they learn how to follow them.
Each triage will be an individual thing, of course. In this case there are also other non-fatal but still glaring problems, such as dialogs put up by SETUP.EXE showing question marks in non-blocking ways, such as when displaying the layout description.
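To see where those question marks come from, here is a minimal sketch in Python (not the C++ of SETUP.EXE itself). It assumes code page 1252 as the default system code page and uses a made-up path; the helper name `through_code_page` is mine. A round trip through a code page that lacks the character is lossy, which is roughly what happens to a path or a description string inside a non-Unicode binary.

```python
def through_code_page(text, code_page="cp1252"):
    """Round-trip text through a legacy code page, the way an ANSI-only
    program effectively sees it. Characters the code page cannot
    represent come back as '?'."""
    return text.encode(code_page, errors="replace").decode(code_page)

# Hypothetical working directory containing DEVANAGARI LETTER KA (U+0915).
path = "C:\\layouts\\\u0915\\setup.exe"

print(through_code_page(path))
```

Once the real directory name has been replaced by a question mark, there is no way to get back to the actual folder, so looking for the MSI next to setup.exe cannot succeed.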
In fact, on a tangentially related note, it is actually support of Unicode strings in keyboard layout descriptions, company names, and copyright strings that led me to originally convert kbdtool.exe to kbdutool.exe on a morning several years ago in a Unicode Technical Committee meeting, where a discussion centering on WG-20 issues that was otherwise threatening to put me to sleep led me to wonder how long such a conversion might take (in fact, it took just over 90 minutes).
I'll talk more about triage issues in later posts after the actual work has been done.
For now, I'll explain the basic decisions planned for the conversion itself. Starting off, I have six of them:
Now, where to get the source from?
I got it from the Windows® Server 2003 R2 Platform SDK Full Download, which I already had installed, and I found the source in the
C:\Program Files\Microsoft Platform SDK for Windows Server 2003 R2\Samples\SysMgmt\Msi\setup.exe\
directory.
I believe the part of the directory in red should be in any Platform SDK download.
My current plan is to post the code (14 files including the readme) at the beginning of each post before I go to work on it, so that people following along can treat it like a crossword puzzle and check their answers the next day with a simple tool like WinDiff. The project will not successfully compile as a Unicode binary for most of these posts (I'll warn ahead of time whenever that is the case and tell you how many errors I am getting), so you can consider the fact that I am posting the code to be a good way to follow along without doing any actual work. :-)
Ok, enough blather, let's get to the fun part -- starting tomorrow we're going to be jumping in and handling the code....
This post brought to you by ଖ (U+0b16, a.k.a. ORIYA LETTER KHA)
The other day Paul asked me:
Hi Michael,
Been playing around with MUISETUP from XP SP2 (with just Australian English installed), trying to get Japanese installed as the default language. I'm running the following from the command line:
muisetup /i 0411 /d 0411 /l /f /r /s
Followed by a reboot, which changes the logoff/logon screens to Japanese, but when I log back on as the same user I ran MUISETUP as, the menus are in English.
I'm sure I'm missing something, but how do I change this for that user so all the menus etc change to Japanese from the command line?
Rgds
Paul
Paul, you aren't missing anything -- the documented MUISETUP.EXE command line parameters do not support setting the user default UI language of the user in whose context the code is run. Per the documentation, here are all the supported command line parameters for Windows 2000/XP/Server 2003:
Command Prompt Setup
To enable quiet mode installations, Muisetup.exe accepts parameters entered at the command line. This can be useful either during an unattended installation of the Windows 2000 MultiLanguage Version or simply during the addition and/or removal of user interface languages.
To use the command line parameters, use the command prompt to navigate to the directory containing the Muisetup program, and then type:
muisetup.exe
followed by any of the following switches:
/i (specifies the user interface language(s) to be installed)
/d (specifies the default user interface language that will be applied to all new user accounts)
/u (specifies the user interface language(s) to be uninstalled)
/r (specifies that the reboot message should not be displayed)
/s (specifies that the installation complete message should not be displayed)
When using the /i, /d, and /u switches, the languages must be entered using their four-digit hexadecimal Language ID values. Language IDs should be separated by a space, as in the following example:
muisetup.exe /i 0411 0409 0c0a /d 0411 /u 0414 040c
Kind of says it all -- you can install/uninstall UI languages and you can change the UI language for new accounts (which also handles .DEFAULT which does the logon screen UI language). No integrated logon user UI language changing.
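As an aside on those four-digit hexadecimal Language ID values: each one packs a primary language in its low 10 bits and a sublanguage in its high 6 bits, which is what the PRIMARYLANGID and SUBLANGID macros in the Windows headers pull apart. Here is a small sketch in Python that does the same arithmetic on the IDs from the documentation example (the function name `split_langid` is mine, not part of any API):

```python
def split_langid(text):
    """Split a four-digit hex LANGID string into (primary, sublanguage)."""
    langid = int(text, 16)
    primary = langid & 0x3FF  # low 10 bits, same arithmetic as PRIMARYLANGID
    sub = langid >> 10        # high 6 bits, same arithmetic as SUBLANGID
    return primary, sub

# The language IDs used in the documented muisetup.exe example.
for arg in "0411 0409 0c0a 0414 040c".split():
    primary, sub = split_langid(arg)
    print(f"{arg}: primary=0x{primary:02x} sublanguage=0x{sub:02x}")
```

So 0411 is primary language 0x11 (Japanese) with sublanguage 1, which is why Paul's command line uses it for both /i and /d.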
For that setting Paul is looking for, you have to go the route of the unattend file and intl.cpl....
Now of course in Vista there must be a whole lot of changes -- I'll have to cover those changes soon!
This post brought to you by М (U+041c, a.k.a. CYRILLIC CAPITAL LETTER EM)
It was over a year ago that Jeff D. asked in the Suggestion Box:
Michael,
What I'd be interested in reading (assuming you haven't already given us one) is a primer on how to go about converting a non-Unicode app to Unicode compliance... for those of us who are starting to "see the light", as it were.
Cheers,
~Jeff D.
P.S. Forgive me if you've already written an article like this and I simply haven't found it yet.
First, let me say that I hope Jeff is still around and that he wasn't holding his breath waiting for me to answer. :-)
Although I left the question "on the books" for all this time, I was hesitant to post about it because it really seems more like a tutorial than one might expect and less like a blog topic.
But I kind of thought that the next time I had to take a purely non-Unicode app and convert it to Unicode that it would be cool to perhaps try to start an answer to Jeff's question by covering what was involved, and what I did to minimize the amount of work while maximizing the productivity.
The last app that I had to do this with (long before Jeff posted his question) was the Windows build tool kbdtool.exe, which I converted to the kbdutool.exe that MSKLC ships and uses. I specifically decided not to write about the kbdtool --> kbdutool conversion though, for a few reasons. First of all, the source for kbdtool.exe doesn't ship in the SDK, so no one could really see the project I did the work on, and second of all, the work was long done and it is easy to forget lots of details that long after the fact.
Plus I didn't want to just do something that wasn't related to what I'm doing.
And finally, I decided if I was ever going to do it I wanted it to be something that might be generally useful for others, too.
Tall order? Definitely.
And then earlier today, a project that fit my criteria pretty much fell in my lap!
In response to a bug that was reported earlier today:
BUG Title: MSKLC 1.4: The bootstrapper, setup.exe, is not Unicode?
Repro Steps:
1. Launch MSKLC 1.4
2. Customize any key so we can build a keyboard layout
3. Point the working directory to a path that contains Unicode only chars like a Hindi char
4. Build the keyboard layout
5. Launch the created setup.exe
Result: Setup was unable to find the msi package or patch.
Suddenly I had a project to convert -- the SETUP.EXE Bootstrap sample from the Platform SDK that Heath Stewart mentioned a while back!
As a bonus, perhaps one day the Windows Installer folks might want to pick it up and include in the SDK. :-)
Now I realized there are many ways to do this kind of thing in a blog.
I decided to take the approach of multiple posts that anyone could follow along with if they have the project on their machine.
This is a nice small project (~3500 lines of code or thereabouts) that I can use to show how I approached the idea.
It also has some other cool aspects I'll be able to point out as I go along (and also a few aspects that don't apply here that I will be able to point out as well!).
The actual amount of time it took me for this project going from A to Zed to a compiling Unicode version that works was two hours and three minutes. But this series will take a bit longer as I do it again and take the time to comment on what I am doing each day.
Hopefully everyone will find this a fun and/or useful way to spend some time over the next week or so....
Each day, the sponsoring character will be a random character not found in any Windows code page that would be suitable for step #3 in the repro steps of the bug, given earlier. :-)
This post brought to you by क (U+0915, a.k.a. DEVANAGARI LETTER KA)
Clearly the City Elders in Athens and Sparta and Thebes and Argos, being long dead, have decomposed at this point.
Now yesterday I posted about how The city elders won't give this string weight, either (aka On being consistently dead wrong, aka Ordinal or bust?).
Perhaps we can use the wisdom of ancient Greece to help us?
Although the attributed cause is indeed what was behind the reported problem, and although one can take advantage of Vista's support of Unicode 5.0 as I stated, there is another solution that can work here for a wider range of cases.
One can actually take advantage of normalization in this case -- taking inspiration from the current state of the City Elders long past, you can decompose the Greek text to help out here.
You see, it goes something like this (using Richard's example of U+1F96, rendering support may vary for you depending on all sorts of OS/browser/font issues):
ᾖ U+1f96 (GREEK SMALL LETTER ETA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI) decomposes to:
ᾖ U+1F26 U+0345 (GREEK SMALL LETTER ETA WITH PSILI AND PERISPOMENI + COMBINING GREEK YPOGEGRAMMENI) which decomposes to:
ᾖ U+1f20 U+0342 U+0345 (GREEK SMALL LETTER ETA WITH PSILI + COMBINING GREEK PERISPOMENI + COMBINING GREEK YPOGEGRAMMENI) which decomposes to:
ᾖ U+03b7 U+0313 U+0342 U+0345 (GREEK SMALL LETTER ETA + COMBINING COMMA ABOVE + COMBINING GREEK PERISPOMENI + COMBINING GREEK YPOGEGRAMMENI)
and everything in that bottom row does have weight in both Server 2003 and the .NET Framework.
Thus "\u1f96".Normalize(NormalizationForm.FormD) will give you something entirely sortable....
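For anyone who wants to watch the decomposition happen one step at a time, here is a minimal sketch in Python rather than C#, using the standard `unicodedata` module (the helper name `decomposition_chain` is mine). It only ever expands the first character of the string, which is enough for this example but is not a general normalization algorithm.

```python
import unicodedata

def decomposition_chain(ch):
    """Repeatedly apply the canonical decomposition of the leading character."""
    steps = [ch]
    current = ch
    while True:
        mapping = unicodedata.decomposition(current[0])
        # Stop on no mapping, or on a compatibility mapping such as "<compat> ...".
        if not mapping or mapping.startswith("<"):
            break
        expanded = "".join(chr(int(cp, 16)) for cp in mapping.split())
        current = expanded + current[1:]
        steps.append(current)
    return steps

for step in decomposition_chain("\u1f96"):
    print(" ".join(f"U+{ord(c):04X}" for c in step))

# The last step should match what full canonical decomposition (form D) gives.
print(decomposition_chain("\u1f96")[-1] == unicodedata.normalize("NFD", "\u1f96"))
```

The four printed rows correspond to the four rows listed above, ending in U+03B7 U+0313 U+0342 U+0345.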
And this can be extended to the rest of the extended Greek text!
Now obviously this will not work for characters using scripts in Unicode that versions of Windows prior to Vista don't handle at all, like Tibetan or Mongolian. But Greek has worked for a long time, and Unicode Normalization gives a solution to the problem that will work quite well in Microsoft and third party products not yet running on Vista! :-)
This post brought to you by ᾖ (U+1f96, a.k.a. GREEK SMALL LETTER ETA WITH PSILI AND PERISPOMENI AND YPOGEGRAMMENI)
The question Richard Wilson asked me via the contact link:
I am having problems with string comparisons with the Unicode Greek Extended characters, and someone suggested writing to you. The thread is
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=1049188&SiteID=1
but basically any comparison between two Greek Extended characters says they are equal, and any comparison between a normal and Extended character says that the Extended character comes first. (Excluding of course an Ordinal comparison, which doesn't help me because the ordinal order for Greek extended characters is not the alphabetical order). This is true for all the types of string comparisons in .NET, and someone has said that there is the same problem with the API call CompareString()
Is this an error in Windows, or is there a way to correctly compare and sort Greek characters with accents?
Thanks in advance.
Well, somewhere between the title of this post and the one it alludes to (this one), the answer lies.
These characters have no weight in collation, in either Windows or the .NET Framework, until Vista.
Though as a bonus in managed code, if you are running on Vista, any of the 72 Windows-only cultures that are synthesized from new locales in Vista will also give those characters weight....
Short of that, it's ordinal or bust.
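To make Richard's complaint about ordinal order concrete, here is a tiny sketch in Python, whose built-in string comparison goes by code point and so behaves like an ordinal comparison for these characters. The three-letter sample is mine, not from Richard's data.

```python
# alpha with psili (U+1F00), beta (U+03B2), omega (U+03C9)
letters = ["\u1f00", "\u03b2", "\u03c9"]

# Plain sorted() compares code points: an ordinal sort.
ordinal = sorted(letters)

print([f"U+{ord(c):04X}" for c in ordinal])
```

The Greek Extended block (U+1F00 and up) sits far above the basic Greek letters, so the accented alpha lands after omega instead of next to alpha.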
As a postscript, this does not change in Orcas, given the whole red bits/green bits mess I have ranted about before. The powers that be are convinced that all customers prefer results that are consistent with prior versions even if that means being consistently dead wrong (if you disagree then I won't disagree with you!).
This post brought to you by ἀ (U+1f00, a.k.a. GREEK SMALL LETTER ALPHA WITH PSILI)
In Open it all up, get out of the way, and then what happens?, I talked about the challenges that my team faces as we add features that we can't always get our clients who use our functionality to pick up.
As serious as that problem may be (and clearly I think it is serious if I am willing to try and push readers to push applications for the functionality they would like to see picked up!), this post is not about that issue at all.
Instead, this post is going to give another problem, that of features that our clients would like for us to pick up, and the sorts of things they sometimes do to keep from being blocked by our timetable if we can't do it on their timeline.
Now clearly this is an area where I have delved before, pointing out the efforts that teams like the Shell folks make when they really don't want to wait for us.
But today, the application I am going to talk about is Microsoft Outlook.
I am not going to talk about their time zone issues today, that will be for a future post. :-)
Today, since it is technically a holiday, I am going to talk about holiday support in Outlook.
To get to it, just choose Tools|Options within Outlook, which will give you this dialog:
Click on that Calendar Options... button, which will give you this dialog:
From there you will get to the way to add holidays to your calendar.
Now the next dialog is for the most part a list of locations based on a list of locales (specifically not the locale's maligned little brother, the GEOID), though it is based on a static list that Outlook contains since they would have no knowledge of the holidays of locales not on their list and the holiday support is based on a closed list....
By default it will select the location that corresponds to your default user locale, whether it is English (United States) or French (France) or whatever:
If no location on its list corresponds then nothing is selected (I verified this with Outlook 2003 and Bosnian).
When I said for the most part a list of locations based on a list of locales there is a notable exception with a few entries about religious holidays:
(And there are no other religious holiday categories listed in case you were curious.)
As a bit of social engineering, note there is no Select All button -- they clearly are not encouraging people to select all of the locations. Unluckily enough for them, I am both ornery and patient enough to select them all!
(And the text in the dialog talks about locations with no mention of religious holidays, for that matter!)
And here is where the wheels start to fall off the wagon.
If you select Israel, you won't get several holidays that are actually treated as National holidays there -- to get them you have to choose that Jewish Religious Holidays option. That will get you these holidays, like the very top of my calendar last week in five-day view shows:
Though looking at this week shows a problem in the other direction:
Now I did choose to add Christian religious holidays along with everything else, so we are left with either a space limitation scrolling items off the page or else a decision to make Christmas not a Christian Religious Holiday since it is in so many other locales.
(I'll let you decide which is worse -- it is actually, in fact, the former, though the fact that neither my own locale's nor the religious holiday appears high enough up on the list also seems like a problem.)
So if Christmas is a recognized holiday in so many countries and that is why it is there, then what about the recognized holidays in Israel? And of course there are the many other religious holidays -- from Diwali and Ganesh Chaturthi to Magha Puja and Vesak to Pioneer Day to the various National Founding Days and so on, ad infinitum.
There are quite a few inconsistencies here!
Some of these inconsistencies may be based on feedback or planned. But they all look weird and make the feature look like it has troubles....
Now I am not going to pick on the decisions here; the fact is that they are all a bunch of compromises intended to make up for the fact that this whole area is incredibly messy and there are a lot of problems trying to add such a feature. Making fun of their choices is taking cheap shots, and I'd rather underscore that no matter what options they chose, the consequences of making choices in such a messy area would make them look foolish in one way or another.
We have actually had the request to add this support to NLS made previously, most recently by someone who is now an Architect, if that gives a hint at the level at which this request has been made.
But the honest truth is that the standard for what we do in the Win32 API (and specifically the NLS API) is and by necessity has to be higher than that of a user interface application. From our point of view, the generous offer of "all of the data that Outlook has been shipping for years" is not much of a motivator, especially since the collected data is tainted by what from our point of view would be very bad design were we to simply add it to each locale or to start jumping into the slippery slope of user religious preferences or to ignore our own location support despite the obvious connection.
Now would it be possible to do such a feature at the NLS API kind of level? I mean, pointing out that one implementation is too flawed to adopt does not mean all implementations are flawed, right?
Well, maybe.
But this is not a feature that can be created in a vacuum; you might be able to convincingly argue (and some have, to me!) that the single biggest cause of this problem is our strong push to try to meet customer requirements not being matched by an equally strong push to try to get applications signed up at the same time.
So where is the guarantee that if we were to add such a feature that it would be useful and used by applications like Outlook, that now have to deal with the legacy issues attached to their existing implementation?
Holiday support in NLS is not oozing with opportunity to add the support for users to see the way custom locales are -- I mean, there is no Holiday Options control panel applet like there is a Regional and Language Options.... and custom locales are indeed a feature requested of us quite directly, from our UI piece (RLO).
This post is a long-winded way of pointing out that the things we don't do yet are often so because they are simply not something that we are in a position to do yet. We may never be, or we might just be waiting for the compelling customer and application scenario to come along....
If it was going to happen as a feature, the clearest change that would have to be made would be a bit of combining, obviously -- there is no specific benefit to repeating the same holiday over and over again as is happening for Christmas, or even Boxing Day. The way that the mechanisms work in Outlook (a static process that will simply insert all the holidays for each chosen location element) is also not suited to a function, and thus even if there were an NLS "holiday" feature much of the actual processing would still belong elsewhere in the callers of such functionality.
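Just to illustrate what "a bit of combining" could look like, here is a rough sketch in Python. Everything in it is hypothetical: the sample entries are a handful I typed in for illustration (not Outlook's data), and `combine` is not any existing or planned NLS function.

```python
from collections import defaultdict

# Illustrative sample only: (location, month-day, holiday name).
SAMPLE = [
    ("United States", "12-25", "Christmas Day"),
    ("United Kingdom", "12-25", "Christmas Day"),
    ("United Kingdom", "12-26", "Boxing Day"),
    ("Canada", "12-25", "Christmas Day"),
    ("Canada", "12-26", "Boxing Day"),
]

def combine(entries):
    """Group per-location entries so each (date, name) appears only once."""
    merged = defaultdict(list)
    for location, date, name in entries:
        merged[(date, name)].append(location)
    return dict(merged)

combined = combine(SAMPLE)
for (date, name), locations in sorted(combined.items()):
    print(f"{date} {name}: {', '.join(locations)}")
```

Five calendar entries collapse to two, each carrying the list of locations that observe it. The harder questions (who owns the data, and where religious holidays fit) are untouched by a sketch like this.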
This post brought to you by ᆒ (U+1192, a.k.a. HANGUL JUNGSEONG YU-YE)
The report was that a specific sequence of bytes was failing conversion via code page 50220 to Unicode when using MultiByteToWideChar but succeeding when using MLang. The bug only repros on XP; in Server 2003 the conversion was working in both technologies.
Now as I pointed out in All code page architectures are created equal, some really are more equal than others. So let's take a look at this "MLang being more equal than Win32 on XP" case, shall we?
An excerpt of the byte sequence that shows the problem is:
1B 24 42 2D 21 2D 22 2D 23 2D 24 2D 25 2D 26 2D 27 2D 28 2D 29 2D 2A 1B 28 4A 1B 24 42 2D 2B 2D 2C
Breaking it down a bit:
Now you will notice that the sequence in pink and the later one in red are the same. Yung-Shin did an analysis of what was going on:
The 3-byte escape sequence in pink switches the mode to JIS X 0208-1983 mode, and two additional 3-byte sequence (1b, 28, 4a and 1b, 24, 42) switches the mode to JIS-Roman and back to JIS mode X 0208-1983 again. This is actually unnecessary because it’s already in JIS mode already. However, in XP, these bogus escape sequence causes it to exit the loop and returns from the MB2WC call. In a word, if bogus escape sequence like this, the bogus sequence will truncate from the bogus escape sequence. If there are no bogus escape sequence (i.e. removing the red bytes), XP will convert the string just fine.
This bug is fixed in Server from Shawn by continuing the loop so its mode is switched correctly.
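To make the analysis easier to follow, here is a small sketch in Python that splits the excerpt into escape sequences and double-byte pairs and flags any switch that is immediately switched back. It is my own illustration, not how MultiByteToWideChar or MLang work internally; it knows only four common escape sequences and does no actual character conversion. The names `tokenize` and `redundant_round_trips` are made up.

```python
# A few ISO-2022-JP escape sequences and the modes they select.
ESCAPES = {
    b"\x1b$B": "JIS X 0208-1983",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b(J": "JIS-Roman",
    b"\x1b(B": "ASCII",
}

def tokenize(data):
    """Split bytes into ('escape', mode), ('double', hex), ('single', hex)."""
    tokens, mode, i = [], "ASCII", 0
    while i < len(data):
        if data[i] == 0x1B:
            mode = ESCAPES.get(data[i:i + 3])
            if mode is None:
                raise ValueError(f"unknown escape sequence at offset {i}")
            tokens.append(("escape", mode))
            i += 3
        elif mode.startswith("JIS X 0208"):
            if i + 2 > len(data):
                raise ValueError("truncated double-byte character")
            tokens.append(("double", data[i:i + 2].hex()))
            i += 2
        else:
            tokens.append(("single", data[i:i + 1].hex()))
            i += 1
    return tokens

def redundant_round_trips(tokens):
    """Indexes of escapes that are undone by the very next token."""
    hits, mode = [], "ASCII"
    for k, (kind, value) in enumerate(tokens):
        if kind != "escape":
            continue
        following = tokens[k + 1] if k + 1 < len(tokens) else None
        if following == ("escape", mode) and value != mode:
            hits.append(k)
        mode = value
    return hits

SAMPLE = bytes.fromhex(
    "1B 24 42 2D 21 2D 22 2D 23 2D 24 2D 25 2D 26 2D 27 2D 28 2D 29 2D 2A "
    "1B 28 4A 1B 24 42 2D 2B 2D 2C"
)
tokens = tokenize(SAMPLE)
print(redundant_round_trips(tokens))
```

For the excerpt this reports a single index, the switch to JIS-Roman that is followed straight away by a switch back to JIS X 0208-1983 with no characters in between. A prefilter would have to do at least this much mode tracking, which is part of why writing one is a pain.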
Now I write blog posts in tools that have no problem inserting HTML like
<font face=Tahoma>Hello </font><font face=Tahoma>Dolly!</font>
(note the completely bogus bit converting out of and right back into the exact same font in red)
So I can believe there might indeed be editing tools that might do the same thing with encodings that use escape sequences to switch in and out of different modes.
The file containing the errant sequences may well have just been set up to test the specific case, so it may or may not be proof of a real need to do something better here in XP.
But I was wondering if anyone out there had run into this specific bug before, and whether it was blocking them (because writing code to prefilter the bytes would actually be quite a pain to do for obvious reasons).
So, is there anyone using ISO 2022 code pages running into this problem?
This post brought to you by U+000e and U+000f (a.k.a. C0 control characters representing SHIFT OUT and SHIFT IN)