More on cursor movement

Sorting it all Out
Michael Kaplan's random stuff of dubious value

James Brown asked in the microsoft.public.win32.programmer.international and microsoft.public.win32.programmer.gdi newsgroups:

Suppose I have the following two Arabic codepoints:

U+0648 "arabic letter waw"
U+0650 "arabic letter kasra"

These render as a single glyph with Uniscribe

When pasted into Notepad, the cursor (and selection highlight) can traverse into the middle of the cluster.

When pasted into Wordpad, the cursor _cannot_ move into the middle of these characters.

Which is the correct (or desirable) behaviour?

Maybe someone can even explain, what significance does it have for the cursor to move into the *middle* of a grapheme cluster - how does the user know which character he/she has selected??

thanks,
James

Excellent question, James!

The desirable behavior is what you are describing as the WordPad behavior, to a point. Although if I paste a string of 12 of these pairs of characters (وِوِوِوِوِوِوِوِوِوِوِوِ) into WordPad, it will treat them as a single unit, which is not what I would call desirable. :-(

The Notepad behavior you describe is also not preferred; in all cases other than the BACKSPACE key (for the reasons I describe here), you would want movement to jump from one text element boundary to the next, and here the text element is the pair of characters you mentioned....
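To make the idea of jumping between text element boundaries concrete, here is a minimal sketch in Python. It is not how Uniscribe, Notepad, or WordPad work internally; it assumes a deliberately simplified rule (a text element is a base character plus any following characters with a nonzero canonical combining class), and the function names `caret_stops`, `next_stop`, `prev_stop`, and `backspace` are made up for illustration. Offsets are logical (code point) positions, not visual left/right ones. The `backspace` helper assumes, as the exception above suggests, that BACKSPACE removes one code point rather than the whole element.

```python
import bisect
import unicodedata


def caret_stops(text):
    """Return the code point offsets where the caret may rest.

    Simplifying assumption: a text element is a base character plus
    any following characters with a nonzero canonical combining class.
    Real text segmentation has more rules than this.
    """
    stops = [0]
    for i, ch in enumerate(text):
        # A new text element starts at every non-combining character.
        if i > 0 and unicodedata.combining(ch) == 0:
            stops.append(i)
    if text:
        stops.append(len(text))
    return stops


def next_stop(text, pos):
    """First caret stop strictly after pos (pos itself at end of text)."""
    stops = caret_stops(text)
    i = bisect.bisect_right(stops, pos)
    return stops[i] if i < len(stops) else pos


def prev_stop(text, pos):
    """Last caret stop strictly before pos (pos itself at start of text)."""
    stops = caret_stops(text)
    i = bisect.bisect_left(stops, pos)
    return stops[i - 1] if i > 0 else pos


def backspace(text, pos):
    """Delete the single code point before pos; return (new_text, new_pos)."""
    if pos == 0:
        return text, pos
    return text[:pos - 1] + text[pos:], pos - 1


sample = "\u0648\u0650" * 3       # three WAW + KASRA pairs
print(caret_stops(sample))        # [0, 2, 4, 6]
print(next_stop(sample, 0))       # 2
print(prev_stop(sample, 4))       # 2
```

With this rule the caret never lands at offsets 1, 3, or 5, which are the "middle of the cluster" positions James describes, while `backspace(sample, 2)` still removes only the KASRA and leaves the WAW in place.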

The bad news is that I can reproduce the Notepad behavior you describe in Windows Server 2003 SP1.

The worse news is that I can reproduce the WordPad behavior I describe above in Windows Server 2003 SP1 and XP SP2.

But the good news is that in XP SP2, Notepad behaves correctly and the cursor does not appear in the middle of the character....

In IE6, I currently get the character splitting behavior. You can test out your own browser and version with the textbox below -- put the cursor in and move back and forth to see what happens:

At least products are getting better though (the Vista version of Uniscribe has all of the XP SP2 updates and more!).

This post brought to you by "و" (U+0648, a.k.a. ARABIC LETTER WAW)

Blog - Comment List
  • A further observation. With Notepad, I can use the mouse to move into the center of the glyph (i.e. between the two codepoints). But when I use the keyboard the caret always moves over the glyph-clusters.

    Strange. The problem is I don't understand the Arabic language so I have no idea if positioning the caret (with mouse) in the middle of the cluster is at all meaningful..

    James
  • Well, meaningful is a relative term, of course. :-)

    Although I would not expect a click right in the middle of a cluster to be respected *that* much (since, after all, I cannot click in the middle of a U and have that respected).

    Or more to the point, since I cannot click in the middle of "Ů" (U+0055 U+030A), I would not expect it to work for other text elements. The cursor movement is a lot more intuitive than the direct insertion via mouse click....

  • For what it is worth, if any testers from typography are around this may be worth putting a bug in, unless I am misunderstanding how it ought to work.... :-)
  • This is interesting. Using IE 6.0, I can only click between and select the individual waws. However, using the arrow keys it moves one codepoint for each keypress. This means that if I press Left once it moves to the middle of the letter waw, then Left again to put the caret between letters. If I hold down shift, it will only select whole graphemes, but I have to press Left twice to get the caret to move.
  • Here is IBM's view on things:

    http://www-306.ibm.com/software/globalization/topics/bidiui/arabic.jsp

    I think what they are saying is that the Notepad behaviour I described is desirable?!!
  • I don't think so James, although their choice of example in mentioning Notepad is probably bad.

    A ligature is not the same thing as a character. Look at the squiggle "ﬁ" (fi) in Latin script. It's a typographic convention rather than a basic unit of the language. There's no way for me to type it on this keyboard, but my typesetting software uses it automatically in printed documents when the alternative would be an ugly "near miss" of two alphabetic characters.

    So the example in this post by Michael is correct: the U+0650 KASRA isn't a separate character, so your cursor should ignore it and you shouldn't be able to select it. But the U+0627 ALEF and U+0644 LAM in the IBM example are separate characters, so you can move the cursor between them even though they're drawn as a single squiggle.

    They're both correct, and there's software out there which gets this right; GTK+, for example, provides default entry and text input controls which do this correctly. Presumably if Notepad functions properly in XP SP2 then this means the Windows Common Controls now also get it right. Is that right Michael?
  • "Presumably if Notepad functions properly in XP SP2 then this means the Windows Common Controls now also get it right. Is that right Michael?"

    Possibly, Nick -- although I would be afraid to try and predict what the Shell common controls will do on a given day. :-)
  • I've been doing some more experimentation with Uniscribe and I've found that the two Arabic characters I originally mentioned are in fact rendered as *two* glyphs. Uniscribe does however classify them as a single cluster but they are drawn as two - it just looks as if it is a single character. Perhaps this is what is causing Notepad (and the ScriptString API) to allow the cursor to be placed in the middle?
  • Hi James,

    If connection points between a Latin letter and a diacritic are poorly defined then they will sometimes not appear to be connected at all -- yet we would never think of that as two characters. The behavior in Notepad must have some kind of explicit history....
  • No character-splitting in Firefox 1.0.7 on Ubuntu.
  • Neither type of splitting?
  • What's the story about XP x64? I would imagine that it should be treated like Win2003 SP1 in this matter, but it would be far more interesting if it actually behaves like normal XP.
  • That is an excellent question, CN. I believe it would act more like Server 2003 SP1 since it was built out of that code tree, rather than picking up the features of XP SP2....
  • Today I was able to briefly experiment with WordPad on a PC running XP. The behaviour was most remarkable. The sense of the cursor keys seemed to be reversed in the RTL text?
  • Indeed, Nick! And believe it or not, that is what people using RTL languages expect on computers.
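
The distinction drawn in the comments above, between a combining mark such as KASRA and two full letters that merely happen to be drawn as one ligature (LAM plus ALEF), can be checked against the Unicode character database. The following is only a rough sketch using Python's standard `unicodedata` module; the helper names `is_combining_mark` and `caret_allowed_between` are hypothetical, and the rule "no caret stop before a mark" is a simplification rather than anything Uniscribe or GTK+ is known to implement.

```python
import unicodedata


def is_combining_mark(ch):
    """True when the general category is Mn, Mc, or Me."""
    return unicodedata.category(ch).startswith("M")


def caret_allowed_between(first, second):
    """Simplified rule: the caret may sit between two code points
    unless the second one is a combining mark attached to the first."""
    return not is_combining_mark(second)


WAW, KASRA = "\u0648", "\u0650"
LAM, ALEF = "\u0644", "\u0627"

for ch in (WAW, KASRA, LAM, ALEF):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

print(caret_allowed_between(WAW, KASRA))  # False: KASRA is a mark on WAW
print(caret_allowed_between(LAM, ALEF))   # True: two letters, one ligature
```

Whether the renderer draws one glyph or two does not enter into it; the decision here depends only on character properties, which fits the observation that WAW plus KASRA is drawn as two glyphs yet treated as one cluster.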