Postings are provided as is with no warranties, and confer no rights. Opinions expressed here are my own delusions; my employers at best shake their heads and sigh, at worst repudiate the content with extreme prejudice, whenever it manages to appear on their radar.
This blog is unsuitable for overly sensitive persons with low self-esteem and/or no sense of humour. Proceed at your own risk. Use as directed. Do not spray directly into eyes. Caution: filling may be hot. Do not give to children under 60 years of age. Not labeled for individual sale. Do not read 'natas teews ym' backwards. Objects in mirror are closer than they appear. Chew before swallowing. Do not bend, fold, spindle or mutilate. Do not take orally unless directed by a physician. Remove baby before folding stroller. Not for use on unexplained calf pain.
A nice FLAIR (FLuid Attenuated Inversion Recovery) view from the not-too-distant past. Every abnormality you can see on this scan (and there is more than one!) is asymptomatic at present. Alongside is a picture of me walking the walls at Fremont Studios, a sign of a damaged brain.
The NLS API function IsNLSDefinedString is an exercise in social engineering within software.
Perhaps I should explain what the hell I am talking about. :-)
This function takes a string and essentially gives you a judgment about whether this string is one you can pass to the collation functions in the NLS API and expect to have something along the lines of reasonable, supportable results.
The process is simple. It enumerates every UTF-16 code unit in the string, and uses the following tests to make its decision.
Clearly, this is not a linguistic judgment, since the conditions are easily stated. Every UTF-16 code unit in the string:
Calling a string that does not pass this test INVALID has interesting consequences, since it means that IsNLSDefinedString is not just returning whether to expect determinism in collation function results. If that were the case, then only point #1 would be needed.
Two questions come up at this point:
Question #1: Why judge the PUA so harshly, if NLS collation functions will return deterministic results?
The issue here is that the private use area has no real context or meaning beyond that created by private agreement. Therefore, there is no way that NLS collation functions can treat such a string as being valid, since its meaning is unknown to the operating system.
So IsNLSDefinedString makes sure that situations that require an answer to the question of determinism are not given false answers based on strings that do not have a known, valid value.
Question #2: Why judge unpaired surrogate code points so harshly, if NLS collation functions will return deterministic results?
The issue here is that an unpaired surrogate is given the same status in Unicode as an undefined code point, so IsNLSDefinedString returns FALSE here just as it would for any other undefined code point.
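For those who think better in code, here is a minimal sketch of the two checks discussed in the questions above: private use code points and unpaired surrogates. To be clear, this is not the real implementation of IsNLSDefinedString. The function name LooksNlsDefined is made up, the sketch does not consult any table of defined code points (so it cannot catch undefined ones), and rejecting the supplementary private use planes is an assumption on my part.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for two of the checks discussed above. It is NOT the
// real IsNLSDefinedString and it does not detect undefined code points.
bool LooksNlsDefined(const std::u16string& text) {
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char16_t unit = text[i];

        // BMP Private Use Area: meaning exists only by private agreement.
        if (unit >= 0xE000 && unit <= 0xF8FF) return false;

        if (unit >= 0xD800 && unit <= 0xDBFF) {
            // A high surrogate must be followed by a low surrogate.
            if (i + 1 >= text.size()) return false;
            const char16_t next = text[i + 1];
            if (next < 0xDC00 || next > 0xDFFF) return false;

            // Assumption: planes 15 and 16 (supplementary private use) are
            // treated the same way as the BMP Private Use Area.
            const char32_t cp = 0x10000
                + ((static_cast<char32_t>(unit) - 0xD800) << 10)
                + (static_cast<char32_t>(next) - 0xDC00);
            if (cp >= 0xF0000) return false;

            ++i;  // The low surrogate has been consumed as part of the pair.
        } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
            // A low surrogate with no preceding high surrogate.
            return false;
        }
    }
    return true;
}
```

Ordinary text and a properly paired surrogate pass; a lone surrogate or anything in a private use range does not.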
So if you use IsNLSDefinedString, you are being influenced to do certain things with your application to make sure that these "undesirable" code units are not treated as being valid.
A very geeky form of social engineering, as NLS tries to make the character "neighborhood" a nicer place for the other characters to live!
Could this be expanded in the future to take care of other sequences such as too many diacritics and other potential undesirables? Well, perhaps -- in a new major version only though, of course -- but the line so far has been drawn to differentiate between what has clear meaning in Unicode vs. what does not; it is unclear whether it makes sense in the long run to extend the coverage to handle implementation-specific limits....
This post brought to you by U+00ad, a.k.a. SOFT HYPHEN
It occurred to me that some could apply this cartoon of Hugh Macleod's to Sorting It All Out....
I'll see what I can do to make sure I stay focused on solutions. :-)
In other news, and there is no whining here, I agree with Jim Glass that 20 language versions of Microsoft Dynamics CRM 3.0 is a good start.
And I won't even whine about the need for the next 100. :-)
This post brought to you by Z (U+005a, a.k.a. LATIN CAPITAL LETTER Z)
Back in early 2005 (in the post Keeping it simple with complex scripts), I talked a little bit about the way that the Uniscribe documentation gave several examples of how complex script shaping rules would be used by giving examples with Latin script cursive writing.
In retrospect this is kind of ironic, since Latin is not conventionally thought of as a 'complex script', certainly not when this documentation was written and to most people not even now.
(most of what I talk about here applies in interesting ways to complex scripts in general)
It is a funny thing, but if you use a Latin script language like English and you read something that someone wrote out in cursive, you do not give the subtle differences in how the letters connect a second thought. You just assume that these small differences exist, and have no problem reading it.
And clearly the person writing has no trouble with these small differences.
Developing a font that uses cursive (basically a 'handwriting' font) is a bit more challenging.
There is clearly no way, for example, to emulate the writer, who often needs to change the shape of the current letter based on the next letter. Because no rendering engine can read minds, the simple truth is that the initial form of the letter that is written may not match the final form once the next letter is typed -- unless you sacrifice the quality of what the final rendering will be by producing "position neutral cursive."
Of course if you choose to sacrifice that quality, it affects the reader's experience.
You could think of these two methods, where one favors the writer by keeping the letters consistent and the other favors the reader by looking more like actual handwriting, as a typography issue that lots of people don't really consider very often....
In general the font has to choose one of these two approaches.
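A toy sketch may make the two approaches more concrete. Everything in it is invented for illustration: the glyph names ("w.base", "w.mid") and the rule that a letter followed by e exits at mid-height are assumptions, and real fonts express this through their own shaping tables rather than code like this.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// "Reader" approach: the glyph chosen for a letter depends on the NEXT letter.
// Hypothetical rule: a letter followed by 'e' uses a mid-height exit stroke.
std::vector<std::string> ShapeContextual(const std::string& text) {
    std::vector<std::string> glyphs;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const bool nextIsE = (i + 1 < text.size()) && text[i + 1] == 'e';
        glyphs.push_back(std::string(1, text[i]) + (nextIsE ? ".mid" : ".base"));
    }
    return glyphs;
}

// "Writer" approach: position neutral cursive, one fixed glyph per letter.
std::vector<std::string> ShapePositionNeutral(const std::string& text) {
    std::vector<std::string> glyphs;
    for (char c : text) {
        glyphs.push_back(std::string(1, c) + ".base");
    }
    return glyphs;
}
```

With the contextual version, shaping "w" gives w.base, but shaping "we" gives w.mid followed by e.base, so a glyph that was already on screen changes once the next letter arrives. The position neutral version never revisits a letter.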
With that said, let's take a look at Segoe Script, one of the new fonts that ships with Vista.
Let's take a simple, common phrase that I am sure you find yourself using all the time, such as we welcome werewolves. Let's type it into Notepad on Vista. Every phrase starts with the first letter....
Ok, simple enough. Now let's add that second letter -- note that the connection points between w and e are such that they should be at the midpoint of the letter, not the baseline. So we can watch the w glyph change:
Just like one might do in handwriting. Ok, we'll finish the sentence:
And there are a few other examples there with other letters, too. Ok, so clearly Segoe Script is one of those fonts that is better for the readers than the writers.
Or is it?
Let's do the same thing in Wordpad, which uses a RICHEDIT control rather than an EDIT control:
Hmmm... right out of the gate there is something different. I'm almost afraid to continue:
It looks like that w did not even blink! Let's look at the whole phrase:
Clearly, our "reader" font has become a "writer" font. What happened?
It gets worse, actually. Let's look at a small managed application that renders our phrase using four different techniques and the two different kinds of Edit controls:
Suddenly everything looks more complicated than it did a moment ago, doesn't it? The EDIT control with ExtTextOutW betrays the pattern for the EDIT control, in much the same way that TabbedTextOutW does for RICHEDIT.
What is the underlying issue causing these seemingly pathologically diverse results?
Well, the issue is simple enough, and it is that Latin, unlike Arabic, Devanagari, Tibetan, Sinhalese, and many others, is only sometimes considered to be a complex script. And by sometimes it is clear that I am saying in some code paths.
Fun wrappers around text rendering like Uniscribe, TabbedTextOutW, and DrawTextExW will end up treating Latin as a complex script in XP SP2, Vista, and other recent platforms, while both the simpler (e.g. SetWindowTextW) and lower level (e.g. ExtTextOutW) functions will treat it like it is not.
And although ExtTextOutW has ETO_IGNORELANGUAGE, which is essentially an ETO_STOPTREATINGMELIKEIAMSOCOMPLEXYOUMISERABLECONTROLFREAK flag, it has no ETO_DONTIGNORELANGUAGE, which would be more of an ETO_IAMCOMPLEXHEARMEROARYOUMISERABLESIMPLETON kind of flag.
(Hard to believe that they don't have me authoring the names of more constants in the Platform SDK headers, isn't it?)
So the behavior you get here will be very much dependent on what method you use to get the text drawn and what control you use.
Unlike the situation in scripts that are pretty much always considered complex, which do not depend on the function called to know this extra work is needed....
Of course treating Latin as a complex script was not done exclusively for the sake of cursive Latin fonts; it was instead done for the support of so many African languages that need the text to be considered complex to get all of the right shaping for diacritics.
Which means that the controls and functions that screw this up are being all that they can be for some languages, a topic I will talk about more another day.
Another topic I'll hit on in a future post is some of the additional issues with cursive fonts.
(Special thanks to MVP 'Ted' for first pointing out this issue to me, and Peter Constable for his help in getting the understandable if not intuitive explanation together!)
This post brought to you by w (U+0077, a.k.a. LATIN SMALL LETTER W)
Kimberly L. Tripp has a great post entitled Changing Database Collation and dealing with TempDB Objects.
Definitely worth a read -- this is a problem I have been bitten by in the past and have been defensively using per-column collation in TempDB operations when the solution Kim suggests is actually much simpler in just about all of the cases one might hit.
Thanks, Kim!
To her advice I will add my old reminder that at least for testing purposes, case/kana/accent/width sensitive collations are better. Because it is MUCH easier to go from sensitive to insensitive than the other way around....
This post sponsored by K (U+004b, a.k.a. LATIN CAPITAL LETTER K)
For a long time, the National Language Support Reference contained all of the information for the Multilingual User Interface. But that is changing now, and in upcoming releases of the Platform SDK, with these two new links (one for NLS, and one for MUI), both under the International Text Display node.
That MUI node is brand new, and contains both all of the MUI info that used to be mixed in with NLS and a ton of new info under the About Multilingual User Interface (MUI) node.
How much new and cool and previously disparate info is there, you might ask?
About 2.6 buttloads, at least!
Check out these topics:
I'll be talking about some of these topics further in future posts, especially the updates to the resource model, fallbacks, and other improvements/changes (in both Vista and in downlevel components).
But for those who want to start looking right away, the docs are available above.
Don't miss that note on the top of many of these new pages:
Note: This documentation is preliminary and subject to change.
Enjoy, and strap in for some of the future upcoming posts!
This post brought to you by M (U+004d, a.k.a. LATIN CAPITAL LETTER M)
In previous posts, I have talked about the unattend mode of Regional and Language Options.
And in the most recent of those, I promised to talk about the changes in Vista. So that is what is happening in this very post!
WARNING: The stuff covered here will not work in Windows prior to Vista....
Some of the exciting features that were added may be small items, but a few of them have been requested for a very very long time:
One of the big new changes, to go along with all of the changes to Vista's unattend during setup story in general, is a move to an XML file format rather than the old text file format used previously....
To run it, make sure you follow that advice I gave about control.exe, instead of rundll32.exe like that KB article suggests:
control.exe intl.cpl,,/f:"c:\Unattend.xml"
A sample of the XML file that should work in Beta 2 of Vista (and probably most other builds, this functionality has been in there for a while now!) can be seen below:
<gs:GlobalizationServices xmlns:gs="urn:longhornGlobalizationUnattend">
<!-- user list --> <gs:UserList> <gs:User UserID="Current" CopySettingsToDefaultUserAcct="true" CopySettingsToSystemAcct="true"/> </gs:UserList>
<!-- GeoID --> <gs:LocationPreferences> <gs:GeoID Value="134"/> </gs:LocationPreferences>
<!-- UI Language Preferences --> <gs:MUILanguagePreferences> <gs:MUILanguage Value="cy-GB"/> <gs:MUIFallback Value="en-GB"/> </gs:MUILanguagePreferences>
<!-- system locale --> <gs:SystemLocale Name="en-GB"/>
<!-- input preferences --> <gs:InputPreferences> <gs:InputLanguageID Action="add" ID="0409:00000409"/> <gs:InputLanguageID Action="remove" ID="0409:00000409"/> <!--bs-Latn-BA--><gs:InputLanguageID Action="add" ID="141a:0000041a"/> <!--cy-GB--><gs:InputLanguageID Action="add" ID="0452:00000452"/> <!--cy-GB--><gs:InputLanguageID Action="add" ID="0809:00000809"/> </gs:InputPreferences>
<!-- user locale --> <gs:UserLocale> <gs:Locale Name="en-US" SetAsCurrent="true" ResetAllSettings="false"> <gs:Win32> <gs:iCalendarType>1</gs:iCalendarType> <gs:sList>...</gs:sList> <gs:sDecimal>;;</gs:sDecimal> <gs:sThousand>::</gs:sThousand> <gs:sGrouping>1</gs:sGrouping> <gs:iDigits>2</gs:iDigits> <gs:iNegNumber>2</gs:iNegNumber> <gs:sNegativeSign>(</gs:sNegativeSign> <gs:sPositiveSign>=</gs:sPositiveSign> <gs:sCurrency>kr</gs:sCurrency> <gs:sMonDecimalSep>,,</gs:sMonDecimalSep> <gs:sMonThousandSep>...</gs:sMonThousandSep> <gs:sMonGrouping>3</gs:sMonGrouping> <gs:iCurrDigits>1</gs:iCurrDigits> <gs:iCurrency>3</gs:iCurrency> <gs:iNegCurr>3</gs:iNegCurr> <gs:iLZero>0</gs:iLZero> <gs:sTimeFormat>:HH:m:s tt:</gs:sTimeFormat> <gs:s1159>a.m.</gs:s1159> <gs:s2359>p.m.</gs:s2359> <gs:sShortDate>d/M/yy</gs:sShortDate> <gs:sLongDate>dddd, MMMM yyyy</gs:sLongDate> <gs:iFirstDayOfWeek>6</gs:iFirstDayOfWeek> <gs:iFirstWeekOfYear>2</gs:iFirstWeekOfYear> <gs:sNativeDigits>0246813579</gs:sNativeDigits> <gs:iDigitSubstitution>1</gs:iDigitSubstitution> <gs:iMeasure>0</gs:iMeasure> <gs:iTwoDigitYearMax>2021</gs:iTwoDigitYearMax> </gs:Win32> </gs:Locale> </gs:UserLocale>
</gs:GlobalizationServices>
Note the use of comments in the XML. :-)
Now as was true in the old format, you only have to include the sections that you have changes in. However, at a minimum you must always have the following skeleton:
<gs:GlobalizationServices xmlns:gs="urn:longhornGlobalizationUnattend"> <!-- user list --> <gs:UserList> <gs:User UserID="Current"/> </gs:UserList> </gs:GlobalizationServices>
WARNING: Since I am a person who did not look at the event log when I could not get a file to work, you can save yourself a lot of time if you don't forget to add that "gs:UserList" tag!
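If you generate these files from code, a small helper that always emits the required skeleton makes it hard to forget that tag. The sketch below is only that, a sketch: BuildUnattendXml is a made-up name and not part of any Windows API, the element and attribute names are copied from the sample above, and whether an empty gs:Locale element with no gs:Win32 child is accepted is an assumption I have not verified. It also does no escaping of the locale name.

```cpp
#include <string>

// Hypothetical helper: builds the minimal unattend skeleton, plus an optional
// user locale section. Not part of any Windows API.
std::string BuildUnattendXml(const std::string& userLocale = "") {
    std::string xml =
        "<gs:GlobalizationServices xmlns:gs=\"urn:longhornGlobalizationUnattend\">\n"
        "  <!-- user list: required even when nothing else is present -->\n"
        "  <gs:UserList>\n"
        "    <gs:User UserID=\"Current\"/>\n"
        "  </gs:UserList>\n";

    if (!userLocale.empty()) {
        // Assumption: a Locale element without a gs:Win32 child is acceptable.
        xml += "  <!-- user locale -->\n"
               "  <gs:UserLocale>\n"
               "    <gs:Locale Name=\"" + userLocale +
               "\" SetAsCurrent=\"true\" ResetAllSettings=\"false\"/>\n"
               "  </gs:UserLocale>\n";
    }

    xml += "</gs:GlobalizationServices>\n";
    return xml;
}
```

Calling BuildUnattendXml() with no argument gives just the skeleton; passing something like "en-GB" adds the user locale section after the user list.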
Anticipating another question -- at present only the "Current" user is supported.
But don't worry. Think of this as another one of those instances where potential features in future versions are poking their heads out for people to gawk at! :-)
Now of course if anything changes between now and Vista shipping I will post again with whatever the updates are....
Anticipating another response -- the current plan of record is that the old text files will have to be converted to the new format, though of course that is a bigger issue than just Regional and Language Options -- it affects the unattend files for all of Vista. But I'll keep you updated on this issue if I hear anything else.
Now I am sure there are things I am forgetting, but I will make sure to add those in the future, too. :-)
This post brought to you by "ᠽ" (U+183d, a.k.a. MONGOLIAN LETTER ZA)
I can't deny it, I am a fan of Bill and Karolyn, the Slowskys.
In case you haven't seen their commercials on TV, here is one of the first ones, actually the first one I saw.
Now it is easy to get a little too caught up in this sort of thing.
I mean, I'm certainly not going to use their AIM handle to start chatting with them or anything -- I have enough slow typists on my IM list as it is, enough to make me wonder if we don't all have a little turtle in us....
And I probably won't even add their blog (The Slowskys: a nullus volito) to the list of blogs I read, as there is something about fictional blogs that doesn't really appeal to me (even if they have an RSS feed!)
But I am interested, speaking as a moderately intelligent person who cares about language and who runs Microsoft Office and who also has Comcast High Speed Internet, why the Comcast-Slowsky ad campaign appealed to me so much in a way that Microsoft's Have you evolved? / Don't be a Dinosaur Head campaign did not....
Obviously I am not one of those who sits in the "Anything But Microsoft" camp, so that is not the reason (though there are some who do not need to look any deeper than that!).
Looking at both ads, at what each campaign was trying to communicate to me, there are some clear differences. Let's look at the message that each one imparts to me:
The MS ad has an overall negative message for those who do not upgrade, who do not evolve. It is actually saying that there is something wrong with you if you do not upgrade -- you are a dinosaur, one of those creatures who (in the words of Johnny from the movie Airplane II: The Sequel) "...got too big and fat, so they all died and they turned into oil." You are a throwback, and you will become extinct. Unless you are in county government or something, in which case you probably aren't even up past DOS yet, you are simply either someone who evolves or someone who (like the dinosaurs) will become extinct.
But the Slowskys are charming and funny, and to be honest there is nothing wrong with finding the whole internet making you feel like you are being rushed. As turtles go they seem to be doing rather well for themselves, and they are quite happy with things as they are. Turtles are not extinct at all, and the campaign is not explicitly saying "Don't be a Slowsky!" at all -- even if there is an implicit suggestion, they are quite clear that it is not stupidity or poor judgment that leads them to their conclusion.
I think I like the subtle differences in approach here, a lot.
Now of course I may not be the target for either company -- I was using both products before their ad campaigns started, so neither one really affected my purchasing decision. And I know enough to not judge either company by the folks who do their advertising anyway.
And, truth be told, I know enough about Microsoft (its good and its bad) to keep a firm "net neutral to positive stance" without really breaking a sweat (certainly my being on the payroll does not keep me from pointing out flaws, but I am obviously a fan overall), while my upset about Comcast refusing to pick up SoapNet in my area has me tempted to go to DIRECTV more and more each day. And Bill and Karolyn do not tempt me to stay, any more than that Gecko would keep me at Geico if they were my auto insurance company (which they are not, and not only because the Gecko blog is three years old yet has no RSS feed)....
But I can't help enjoying Karolyn and Bill Slowsky. :-)
This post brought to you by ⺫ ♥ ⻱ (U+2eab, U+2665, and U+2ef1 a.k.a. CJK RADICAL EYE, BLACK HEART SUIT, and CJK RADICAL TURTLE)
I have mentioned the ALT+X mechanism for entering Unicode code points in passing previously, e.g. in posts such as Typing in random Unicode code points.
Michael O'Henly asked via the Contacting Me... link:
Hi Michael...
I'm starting to learn Mandarin and want to be able to take notes using pinyin romanization in Word. Basically, I want to be able to easily create the following characters in a Word document.
ā á ǎ à ō ó ǒ ò ē é ě è ī í ǐ ì ū ú ǔ ù ǖ ǘ ǚ ǜ ü Ā Á Ǎ À Ō Ó Ǒ Ò Ē É Ě È
I think what I'm going to describe is a Word (2003 and 2007) bug, but I'd appreciate your opinion.
In a new document and using a unicode font, enter 01CE followed by Alt-X. You get "ǎ".
Enter 01DA followed by Alt-X, then 01CE followed by Alt-X. You get "ǚǎ". So far so good.
Enter "asdf" followed by 01CE followed by Alt-X. You get "asdf01CE". The substitution of the unicode character doesn't happen.
This behaviour _doesn't_ occur in WordPad -- only in Word.
Any thoughts about why this is happening and/or how I can get around it?
Thank you.
It is funny, I was over in building 36 a few weeks ago, while some of the folks from Office were giving a presentation to some customers about some of the international features in the next version of Office.
Chris Pratley was there, and although I probably hadn't seen him for a while, he actually helped me out recently when he pointed out a feature I did not know about in Office ( >= 2002) that is behind the very behavior that Michael hit....
Believe it or not, it is by design!
The <ALT>+X behavior in Word that toggles between Unicode code points and characters only supported UTF-16 code units when it was first introduced, but starting with Office 2002 it will accept code points across the entire Unicode code space!
In this particular case, because what was typed was asdf01ce, where the last four characters could all be potential hexadecimal numbers (and where you had not selected the four specific characters you wanted to be converted), Word could not really figure out what you wanted -- and thus did nothing.
As a workaround, if you select the last four characters of asdf01ce there that represent the code point you want to convert and then hit <ALT>+X, then you will end up with the asdfǎ that you were looking for.
The reason this does not happen in Wordpad is due in part to the fact that the version of Wordpad that ships with some versions of Windows [1] still only has support for UTF-16 -- so you cannot repro the "problem" there, and in part because the "feature" of guessing what you want to convert without the selection does not seem to work exactly the same way.
But if you select the code unit or code point, it will always convert for you!
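To make the hex-to-character step a bit more concrete, here is a small sketch of what converting a selected run of hex digits involves once the whole Unicode code space is allowed. I should stress that Word's actual matching rules are not something I am describing here; ParseCodePoint, EncodeUtf16, and TrailingHexRun are made-up names, and the trailing-run helper only illustrates why unselected text can be ambiguous.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Returns the value of a hex digit, or -1 if the character is not one.
static int HexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Parses 1 to 6 hex digits as a code point; rejects values above U+10FFFF
// and the surrogate range, which are not valid scalar values.
bool ParseCodePoint(const std::string& hex, char32_t& out) {
    if (hex.empty() || hex.size() > 6) return false;
    std::uint32_t value = 0;
    for (char c : hex) {
        const int digit = HexValue(c);
        if (digit < 0) return false;
        value = value * 16 + static_cast<std::uint32_t>(digit);
    }
    if (value > 0x10FFFF) return false;
    if (value >= 0xD800 && value <= 0xDFFF) return false;
    out = static_cast<char32_t>(value);
    return true;
}

// Encodes a code point as one UTF-16 code unit or a surrogate pair.
std::u16string EncodeUtf16(char32_t cp) {
    assert(cp <= 0x10FFFF);
    std::u16string units;
    if (cp < 0x10000) {
        units.push_back(static_cast<char16_t>(cp));
    } else {
        const char32_t offset = cp - 0x10000;
        units.push_back(static_cast<char16_t>(0xD800 + (offset >> 10)));
        units.push_back(static_cast<char16_t>(0xDC00 + (offset & 0x3FF)));
    }
    return units;
}

// Counts how many characters at the end of the text are hex digits.
std::size_t TrailingHexRun(const std::string& text) {
    std::size_t count = 0;
    while (count < text.size() &&
           HexValue(text[text.size() - 1 - count]) >= 0) {
        ++count;
    }
    return count;
}
```

Note that in asdf01ce the trailing run of hex digits is six characters long (df01ce), not four, since d and f are hex digits too, and df01ce is not a valid code point. Passing just the selected 01ce to ParseCodePoint is unambiguous, which is the sketch's version of the selection workaround.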
[1] How far the Windows version of RichEdit lags behind the one in Office is a constant pain point that Office PARTNER SDE Murray Sargent and I have commiserated about in the past!
This post brought to you by ǎ (U+01ce, a.k.a. LATIN SMALL LETTER A WITH CARON)
Indeed, this is the question I came to ask the mirror, while the wicked queen was napping (or maybe she was off getting a facial?):
Mirror, mirror... before whom I fuss,
Does mirroring work well in GDI+?
Of course the mirror cannot lie, so it is forced to answer truthfully:
Thy international features are very powerful, 'tis true;
But GDI+'s mirroring capabilities will simply never do.
Sometimes it reverses one time too many for thee,
Other times it is off by a pixel or three.
Even when it does work, this won't add up to beans;
For Michael you, of all people, know what 'unsupported' means!
Annoyed, I decide to risk seven years of bad luck and break the freaking mirror. But anticipating that it might eventually come to this, I spoke with an attorney prior to the B&E, and he is convinced he can get the 7 years dropped down to 2⅓, possibly even with suspended sentence....
As I slip out of the wicked queen's chamber, I wonder whether it would have been safer to try to look in MSDN for the answer. :-)
(I don't know whether to be proud of the intro to this post or embarrassed about the fact that it took more time to write than the remaining text!)
I have talked a little about mirroring in previous posts.
But a question I have been asked before and which comes up from time to time on discussion lists is how to make mirroring work properly in GDI+ -- or in the managed world.
The short answer (covered in the GlobalDev article Mirroring Awareness) is that it doesn't:
Windows Forms don't support mirroring directly, as they do the RightToLeft property. Instead, you create your own controls to be displayed as you need. You can develop your own RTLTreeview control, for example, which inherits from the Treeview control and changes its style to be mirrored.
In other words, the suggestion is to create two controls, and hide one of them. :-(
The real truth is, as I said, that GDI+ does not support mirroring properly, and there are at present no plans to change that.
If you do the work to mirror a device context and the DC is clipped, then GDI+ will write to the wrong location (a pixel or two off). A few code workarounds have been suggested to fix the problem, such as using the OffsetRect function to adjust the clipping rectangle:
RECT rcHDC, rcClip;
::GetClientRect(hwnd, &rcHDC);
::GetClipBox(hdc, &rcClip);
::OffsetRect(rcDestination, (rcClip.left - rcHDC.left) - (rcHDC.right - rcClip.right), 0);
Other people actually just move it over one pixel, but doing the calculation feels a bit safer to do (and it doesn't always look like a single pixel to me!).
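The arithmetic in that OffsetRect call is easy to check in isolation. The sketch below uses a plain Rect struct as a stand-in for the Win32 RECT so it can be compiled anywhere; the struct and both function names are made up for illustration, and the numbers in the usage note are arbitrary.

```cpp
// Stand-in for the Win32 RECT structure so the arithmetic runs anywhere.
struct Rect {
    long left;
    long top;
    long right;
    long bottom;
};

// Horizontal correction from the workaround: the clipped margin on the left
// minus the clipped margin on the right.
long MirrorClipOffset(const Rect& client, const Rect& clip) {
    return (clip.left - client.left) - (client.right - clip.right);
}

// Same effect as calling OffsetRect with a dy of 0: shift horizontally only.
void OffsetHorizontally(Rect& r, long dx) {
    r.left += dx;
    r.right += dx;
}
```

For a client area of 0 to 200 with a clip box of 10 to 150, the correction is (10 - 0) - (200 - 150), or -40. When the clipping is symmetric (say 20 to 180), the two margins cancel and the correction is 0, which fits with the problem only showing up when the DC is clipped unevenly.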
Now note that this only solves the 'slight pixel off' problem. The other problem that can occur is that the text can simply not flip at all (presumably due to the strange interactions of the RightToLeft property and the mirroring property for all new windows, set via the SetProcessDefaultLayout function).
The result? Everything can end up not being mirrored, since the mirror image of a mirror image looks like the original. :-(
Luckily, the fix for this is also reasonably straightforward -- you can simply remove the RTL layout flag if it is in there, so you can let the RightToLeft property do its thing without interference.
// Disable flipping if necessary, to avoid being 'double-flipped'
DWORD dwLayout = ::GetLayout(hdc);
if (dwLayout & LAYOUT_RTL)
{
    ::SetLayout(hdc, dwLayout & ~LAYOUT_RTL);
}
(I suppose you can also go the other way around -- keeping the layout LTR in the property and using the mirroring stuff to do the flipping -- the key is to make sure that you do not use both -- though I have not actually tried this method, myself)
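The 'double-flipped' effect is simple enough to demonstrate without a device context at all. In this sketch MirrorX and ClearRtl are made-up names, and kLayoutRtl just repeats the value that LAYOUT_RTL has in the Windows headers so that the code compiles on any platform.

```cpp
// Mirrors an x coordinate within a surface that is `width` pixels wide.
int MirrorX(int x, int width) {
    return width - 1 - x;
}

// Same value as LAYOUT_RTL in the Windows headers, redefined here so the
// sketch does not need them.
const unsigned long kLayoutRtl = 0x00000001;

// Clears the RTL bit and leaves every other layout bit alone, like the
// dwLayout & ~LAYOUT_RTL expression above.
unsigned long ClearRtl(unsigned long layout) {
    return layout & ~kLayoutRtl;
}
```

Applying MirrorX twice to the same coordinate gives back the original value, which is exactly the "mirror image of a mirror image" problem. ClearRtl shows that the mask only removes the one bit, so any other layout flags that were set stay set.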
Now, for both of these problems, my advice is not to do anything unless you are actually seeing the problems in question (e.g. you may actually be using GDI rather than GDI+ without realizing it!). The complaints come up often enough that I am pretty sure you will notice it pretty quickly if you try to mirror a managed application....
For direct text writing to a device context, it may be worth considering using the TextRenderer class, which neatly avoids the bulk of these issues.
There are other controls that have additional issues.
This post brought to you by "ק" (U+05e7, a.k.a. HEBREW LETTER QOF)
Regular reader Ivan Petrov asked in the Suggestion Box:
Hi Michael I'm wondering have you ever thinked about adding to the Windows basic text shortcut menu, which consists of the folowing items:
Undo <separator> Cut Copy Paste Delete <separator> Select All
an 'Change Case' item? So, did you know such a program, which adds this extension to the Windows basic text shortcut menu? Or maybe better - This to be a good idea for making another PowerToy program for windows ;-) http://www.microsoft.com/windowsxp/downloads/powertoys/xppowertoys.mspx
Regards, Ivan.
Well, I usually like to use (as a good indication of items that might belong on these sorts of right-click 'context sensitive' menus) a pretty high standard -- like is it something that people will commonly need?
Or, to look at it a different way -- is it a commonly used feature in a program like Word? I don't even think there is a toggle for this at all, is there?
I am not sure this idea meets the test there -- how common would the operation really be?
Looked at yet another way, and to answer the other question -- I do not know of anyone who does this now, for any control....
Now if a common usage was found, there is the problem of how the text would be changed by the "Change Case" item -- upper? Or lower? Or reversed from the current casing? Obviously there would have to be separate operations -- multiple menu items for at least the upper and lower choices. That is a lot of real estate in a menu that does need to be kept simple....
Of course even for the default controls like the EDIT control and items like the Windows Explorer, these are all separate menu handlers, each of which would need this code added somewhat globally. This can also obviously be problematic.
As luck would have it, the team that has to think about issues with this type of scope -- the Shell team -- has a few members who read this blog. So I can say that I have passed on the idea by even having this blog post!
Though I can't say I would recommend this one myself; I don't think usage would be all that common for the vast majority of people....
Can anyone think of a situation where this would be commonly needed?
This post brought to you by "೫" (U+0ceb, a.k.a. KANNADA DIGIT FIVE) (A letter with no casing worries, whatsoever!)
I was being very hopeful when I posted "It may not always end with ի" and "It may not always end with ის or ისა, either" and hoped that someone would answer my request for better information.
As I somewhat intentionally attempted to apply a rule-based process for genitive forms of month names, I may not fit the ideal model of the young child who says things because they don't know any better. But I thought some kind native speaker of one of the languages would reply back with either the more complex rule(s) or the specific exception(s) to cover the data.
I guess I took it for granted that someone who knows one of the languages would be one of the blog's readers; I guess I sort of take it for granted that there is enough of a following here, especially when people tell me about whole communities of people who are readers (I was recently told that I am widely read by software developers in Japan, for example -- something I never would have believed).
There could of course be people who are not wanting to directly contribute to Microsoft software.
And it would be foolish of me to claim that I wouldn't be interested in adding new data if feedback comes in that makes it through the review process, since it would be a lie -- I would like to improve what the products I have any influence on do, and that would be true no matter where I happen to be working. That is just a part of wanting to properly support language and culture in the products.
Though of course this is not the sort of feature that would impact the price of Windows, or even the marketing of it (can you imagine seeing Now Including Genitive Month Names! on the actual box of Windows sold in Armenia?!?). For the most part it is a feature that people would simply not notice -- just as we never notice when language is correct, compared to when it is not.
So, if any Georgians or Armenians are out there reading this, do you have light to shed on my "suppositions of a young child" about the genitive forms of month names in Georgian and Armenian?
And if you want you can even tell people that you goed to the blog and corrected me on my grammar mistakes! :-)
This post brought to you by "Շ" (U+0547, a.k.a. ARMENIAN CAPITAL LETTER SHA)
Changing the user interface language really is a rather disruptive operation.
Prior to the multilanguage version of Windows, you would actually have to install a fresh version of Windows based on the desired language. Even once MUI was made available, a change to this per-session setting required logging out and logging back in again.
In Office, changes to the user interface language in the multilanguage version required starting a new session of the Office application in question. Which is truly the application equivalent of that same requirement that the OS has -- starting over to get the update.
Now neither of these cases literally required that firm line from a technical perspective, other than a desire to keep the user experience clean and consistent. And to fully understand the consequences of unclean and inconsistent here, you have to imagine all of the changes like font linking/fallback, mirroring, already loaded string resources, dialog templates, and so on, being mixed in terms of supported languages. The results would be catastrophic even for Microsoft's own applications, let alone for third parties.
You can think of that line as a sort of "wall of sanity" that keeps the experiences consistent. :-)
So now we come to the .NET Framework....
I swear not a week goes by that the question does not get asked. Here is an example of the question that keeps coming up:
I would like to change the language at runtime. I do not found any topic about that in the documentation. Although restarting the program with a new UIculture change to the according resources by putting:
    ' Sets the UI culture to French (France)
    Thread.CurrentThread.CurrentUICulture = New CultureInfo("fr-FR")
in the "Public Sub New()" procedure as an example, I do not know how we can reset all the strings at the runtime by selecting a menu option to the desired language.
If people look at their VB.Net or C# WinForms code, they see a call to InitializeComponent() in their form constructor that literally builds the form from scratch.
Calling this procedure again is not hard, but it would be pretty unwieldy as a method since it basically recreates every control on the form. At that point, why not just unload and reload the form? You may have a weird experience related to different forms having different languages, but I have to assume you are okay with that or you wouldn't be asking the question, right? :-)
But assuming you wanted to not reset the whole form, the real hard question is how much of a subset can you create, given that a localizer may have resized or moved controls. How much of the InitializeComponent() procedure can you recreate, and how can you keep your custom version in sync with the one that the designer keeps in sync for you?
The distance between the possible and the practical here is substantial; the distance between the possible and the maintainable is orders of magnitude greater still.
How often would it really and truly be worth it?
I know that some people will answer that they agree with this post but their situation is an exception that really requires the functionality, so I will actually pony up some examples of how to make this sort of thing work if you have to do it. Keep your eyes out for an upcoming post on this soon!
This post brought to you by "჻" (U+10fb, a.k.a. GEORGIAN PARAGRAPH SEPARATOR)
Mushy asked in the Suggestion Box:
Michael, You may have an old post that explains the following situation, so point me to it if you do. Here is the issue. If you look on my blog page you'll see some quotation marks printed as ’ instead of ". When I first posted the material they weren't there. Now, most of the " marks are in some other code. How do you get rid of them? Is this caused by Word, .doc, .txt, .rtc formats? Which is best to post in? P.S. I like your blog and have bookmarked it. Thanks, Mushy
(For those who are interested, Mushy's blog is Cross+Hairs.)
Some people may be familiar with the byte sequence; if you are then you are a geek. :-)
But to see what is going on, let us first consider Microsoft Notepad's detection behavior around encodings:
So, armed with this knowledge, let's try the following:
What you will find is that the byte sequence 0xE2 0x80 0x99, which in code page 1252 (and the original saved file) looks like:
’
has been interpreted by the new instance of Notepad as:
’
in UTF-8 -- because the sequence 0xE2 0x80 0x99 is what U+2019 (RIGHT SINGLE QUOTATION MARK) looks like in the underlying UTF-8 datastream.
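The same effect can be reproduced outside of Notepad. What follows is only a minimal Python sketch: the language and its codec names (`utf-8`, `cp1252`) are my choice for illustration, and none of it claims to show how Notepad itself does its detection.

```python
# U+2019 RIGHT SINGLE QUOTATION MARK, encoded as UTF-8.
quote = "\u2019"
utf8_bytes = quote.encode("utf-8")
print(" ".join(f"0x{b:02X}" for b in utf8_bytes))

# The same three bytes, misread as code page 1252:
# each byte becomes its own character.
misread = utf8_bytes.decode("cp1252")
print(misread)

# Reversing the mistake: turn the three characters back into
# their cp1252 bytes, then read those bytes as UTF-8.
repaired = misread.encode("cp1252").decode("utf-8")
print(repaired)
```

The repair step only works while the misread text is still intact; once software has replaced or dropped any of the three characters, the original bytes are gone.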
If a web page is showing such sequences, this is usually caused by incorrect charset meta tag info on the page, incorrect header info from the server, incorrect code page detection on the client, or some combination of those issues....
If the problem occurs with other code pages, the exact representation will be different:
Slight differences in some of them, but it helps point out why strange garbage character sequences are often just UTF-8 that was not properly detected....
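The variation from one code page to the next is easy to see by decoding the same three bytes several ways. This is a rough Python sketch; the selection of code pages is arbitrary, and Python's codec tables are an assumption here in the sense that they may not match the Windows tables byte for byte.

```python
# The UTF-8 bytes for U+2019: 0xE2 0x80 0x99.
utf8_bytes = "\u2019".encode("utf-8")

# A handful of Windows code pages, by Python codec name.
code_pages = ["cp1250", "cp1251", "cp1252", "cp1253", "cp1254", "cp1257"]

misreadings = {}
for name in code_pages:
    # errors="replace" substitutes U+FFFD for any byte that a
    # particular codec leaves undefined, rather than raising.
    misreadings[name] = utf8_bytes.decode(name, errors="replace")
    print(name, misreadings[name])
```

Every one of these is a single-byte code page, so each misreading is always three characters long; only the characters themselves change.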
This post brought to you by ’ (U+2019, a.k.a. RIGHT SINGLE QUOTATION MARK)
Over on the BCLTeam's WebLog, Ryan Byington wrote a mostly excellent post entitled SerialPort Encoding.
At the end of the post he went a bit too far when he mentioned that
The only encoding that converts all characters with a value 0-255 to a single byte with the corresponding value and vice versa when converting bytes to characters is the “Western European (ISO)” encoding.
As regular readers here may know, the vast majority of the code pages supported in Windows and the .NET Framework are single byte character sets (SBCS). This includes 1250, 1251, 1252, 1253, 1254, 1255, 1256, 1257, 1258, 874, 437, 720, 737, 775, 850, 852, 855, 857, 858, 862, 866, 28591 (the one Ryan was thinking of), 28592, 28593, 28594, 28595, 28596, 28597, 28598, 28599, and 28605.
Plus there are all of those EBCDIC code pages if a device was using them -- they are SBCS as well.
And a few miscellaneous ones, too.
There are a lot of them....
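There are really two separate properties in play here: whether a code page is single byte, and whether it maps each byte to the code point with the same numeric value. A quick Python sketch can check both. The helper names are made up for this example, `latin-1` is Python's name for what Windows calls 28591, and the Python tables for some other code pages leave a few bytes undefined where Windows does not, so I stuck to ones that are fully populated.

```python
all_bytes = bytes(range(256))
same_value_text = "".join(chr(i) for i in range(256))

def is_single_byte(name):
    """True if all 256 bytes decode, and each character encodes back to its one byte."""
    text = all_bytes.decode(name)
    return len(text) == 256 and all(
        ch.encode(name) == bytes([i]) for i, ch in enumerate(text)
    )

def is_value_preserving(name):
    """True if byte N always decodes to the code point U+00NN."""
    return all_bytes.decode(name) == same_value_text

for name in ("latin-1", "cp437", "cp850"):
    print(name, is_single_byte(name), is_value_preserving(name))
```

All three pass the single-byte check, which is the point of the big list above; what differs is which characters the upper 128 bytes turn into.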
The last code sample is also a little incomplete (it treats Encoding.GetEncoding like a property rather than a method) but we can assume that is pseudo code and focus on the big list of code pages, instead. :-)
This post brought to you by "ਊ" (U+0a0a, a.k.a. GURMUKHI LETTER UU)
A while back, regular reader 'Maurits' noted in the Suggestion Box:
Just submitted my first PSS support case (for an unrelated issue.) The email confirmation I received had the following amusing snippet in the headers:
    Content-Type: text/plain; charset="utf-8"
    Content-Transfer-Encoding: 7bit
I was a little amused, myself. But in searching around I found an Internet Draft or two which seemed a little bit relevant. It seems like this might even be possible, and may have a valid meaning?
Or maybe not. I suppose there is some meaning -- like UTF-8 that contains only 7-bit stuff (that weird state that causes Notepad to add the UTF-8 BOM that everyone hates so much!).
Call me crazy, but I thought that is what UTF-7 was intended for. :-)
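For the curious, here is a small Python sketch of the difference. The helper `is_7bit_clean` and the sample strings are made up for illustration, and the check is a simplification: it only looks for bytes of 0x80 and above, and ignores the other things "7bit" asks for, such as line length limits and no NULs.

```python
def is_7bit_clean(data: bytes) -> bool:
    """True if no byte has its high bit set (a simplified '7bit' check)."""
    return all(b < 0x80 for b in data)

ascii_only = "Thank you for contacting support."
with_quote = "It\u2019s a little amusing."

# UTF-8 that happens to contain only ASCII is already 7-bit clean.
print(is_7bit_clean(ascii_only.encode("utf-8")))

# One non-ASCII character and UTF-8 needs the eighth bit.
print(is_7bit_clean(with_quote.encode("utf-8")))

# UTF-7 keeps everything within seven bits regardless.
utf7_bytes = with_quote.encode("utf-7")
print(is_7bit_clean(utf7_bytes), utf7_bytes.decode("utf-7") == with_quote)
```

So the header pair is only truthful for as long as the message body stays within ASCII.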
This post brought to you by "॥" (U+0965, a.k.a. DEVANAGARI DOUBLE DANDA)