Earlier today in the post "Just when you think you know a function..." I talked about the secret way to use two U+200f (RIGHT-TO-LEFT MARK) characters in the MessageBox function to put the MB_RTLREADING flag behavior in the hands of localizers, where it may often belong.

While I was talking to people about that post, I got a question about what U+200f was for when it was being used correctly, and what made me so sure that it was not dangerous to put two in a row that way.

I figured I should answer that question (since several native speakers of bidirectional languages helped give me some information too!).

The easiest way to explain it is to first look at how I talked about reading order in the post "Sticky Keys vs. Reading Order". Basically, this 'Reading Order' setting allows you to set the context for the text before you even type it. It is non-destructive (in the sense that it does not alter the text in a harmful way) and easily changeable.

Then, you start typing. And now we will look at Unicode Standard Annex #9 - The Bidirectional Algorithm. It talks about how every character has a Bidi class that says what directionality the character has (if any) and how strong that directionality is.

Now most letters have what is known as a strong directionality, but the strength is very local and has very little effect on anything but the characters right around it. And this is where U+200e and U+200f come in -- they are just as strong as (but no stronger than) one of those letters might be (Left-to-Right and Right-to-Left, respectively). As UAX #9 says:

2.4 Implicit Directional Marks

These characters are very light-weight codes. They act exactly like right-to-left or left-to-right characters, except that they do not display or have any other semantic effect. Their use is generally more convenient than the explicit embeddings or overrides since their scope is much more local.

RLM   Right-to-Left Mark   Right-to-left zero-width character

LRM   Left-to-Right Mark   Left-to-right zero-width character

There is no special mention of the implicit directional marks in the following algorithm. That is because their effect on bidirectional ordering is exactly the same as a corresponding strong directional character; the only difference is that they do not appear in the display.

In fact, the only difference between them and the letters is that LRM and RLM are not visible -- so two in a row have no more effect than two letters in a row -- which is to say none of any significance.
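If you want to see this for yourself, here is a minimal Python sketch using the standard unicodedata module. It is only an illustration: the describe helper is a made-up name for this post, and the class values come from whatever version of the Unicode data your Python build ships with. The point is that the marks report the same Bidi class as an ordinary letter of the same direction, and only their general category (a format character rather than a letter) sets them apart.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK


def describe(ch):
    """Hypothetical helper: return (name, Bidi class, general category)."""
    return (
        unicodedata.name(ch),
        unicodedata.bidirectional(ch),
        unicodedata.category(ch),
    )


if __name__ == "__main__":
    # Compare each mark with a letter that has the same directionality.
    for ch in ("A", LRM, "\u05d0", RLM):
        name, bidi_class, category = describe(ch)
        print("U+%04X  %-4s %-3s %s" % (ord(ch), bidi_class, category, name))
```

The Latin letter and LRM share one Bidi class, and the Hebrew letter and RLM share another, which is all the bidirectional algorithm looks at.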

And as "More on cursor support: the rest of the answer" certainly showed, even a misplaced LRM, RLM, or random letter with strong directionality will not convince any character with strong directionality to change its stripes. The only characters that have anything to fear are the weaker characters, though as UAX #9 indicates, those do exist. So it makes sense to put the marks in when you want to give an extra hint and you are not as sure of the context.
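To get a feel for which characters are the "weaker" ones, here is a rough Python sketch that sorts characters into the strong, weak, and neutral groupings that UAX #9 uses for Bidi classes. The bidi_strength function name and the "other" bucket are assumptions made for this illustration; the code only classifies characters and does not run the bidirectional algorithm itself.

```python
import unicodedata

# Groupings of Bidi classes as laid out in UAX #9.
STRONG = {"L", "R", "AL"}
WEAK = {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"}
NEUTRAL = {"B", "S", "WS", "ON"}


def bidi_strength(ch):
    """Hypothetical helper: report how strong a character's directionality is."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class in STRONG:
        return "strong"
    if bidi_class in WEAK:
        return "weak"
    if bidi_class in NEUTRAL:
        return "neutral"
    # Explicit embeddings, overrides, isolates, or unassigned code points.
    return "other"


if __name__ == "__main__":
    for ch in ("A", "\u05d0", "\u200f", "1", ",", "!", " "):
        print("U+%04X  %s" % (ord(ch), bidi_strength(ch)))
```

Letters and the two marks land in the strong group, while digits, separators, punctuation, and spaces land in the weak or neutral groups, and those are the characters whose placement a nearby mark can influence.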

I'll talk more about how functions use (and perhaps misuse?) this functionality soon....

 

This post brought to you by U+200e, LEFT-TO-RIGHT MARK