Hello. My name is Wujun Wang and I am a tester on the IE team in Beijing. My area of test focus is BiDi. Wait, what is BiDi? When I searched for “bidi” at http://www.Dictionary.com, it defined bidi as "A thin, often flavored Indian cigarette made of tobacco wrapped in a tendu leaf." So you know what I do. I am testing cigarettes! Well, not really. BiDi is a short name often used for the Unicode Bidirectional Algorithm (http://www.unicode.org/reports/tr9/).
When text is presented in horizontal lines, most scripts display characters from left to right. However, in several languages (such as Arabic, Divehi, Hebrew and Syriac) the natural ordering of horizontal text is right to left. When text flowing in two directions (hence "bidirectional") is present, ambiguities can arise in determining the order in which characters are displayed. For example, Hebrew text containing Latin letters and/or digits flows in two directions. Here is a short example:
<HTML DIR=RTL> <BODY> <P>1+2+ש</P> </BODY></HTML>
This HTML example displays a short string. The DIR attribute specifies the default directionality to apply to the characters. <HTML DIR=RTL> means that all elements in this file have a default right-to-left directionality. This seems pretty straightforward until right-to-left and left-to-right text appear next to each other. For example, let’s look at the string "1+2+ש". “ש” is the HEBREW LETTER SHIN. Since Hebrew’s natural ordering is right to left, you might guess that this string will be displayed from right to left as "ש+2+1". However, this is not correct! Although Hebrew is written from right to left, digits are written from left to right. Another tricky thing about this string is that “+” is a special character in the BiDi algorithm: its BiDi type is ES (European Number Separator), and a single separator between two digits is treated as part of the number. So "1+2" stays together as one left-to-right run, while the final “+”, which sits between that number and the Hebrew letter, takes the right-to-left direction. Therefore, according to the Unicode BiDi algorithm, the correct visual rendering of the sample, read left to right on the screen, is "ש+1+2".
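To make those steps concrete, here is a minimal Python sketch of only the rules this example exercises: W4 (a single ES between two digits joins the number), W6 (leftover separators become neutral), W7, N1/N2 (neutrals take the surrounding direction, with digits counting as right-to-left), I1/I2 (resolving levels) and L2 (reordering). It is a simplified subset of UAX #9, not a conforming implementation and not how IE does it: it assumes a single line with no explicit embeddings or overrides, rejects types such as AL, AN, CS and ET, and skips rule L1 (trailing whitespace). The bidi class of each character comes from the standard `unicodedata.bidirectional` function; `resolve_levels` and `reorder` are hypothetical names chosen for illustration.

```python
import unicodedata

# Bidi classes this simplified sketch understands; anything else
# (AL, AN, CS, ET, NSM, explicit embeddings, ...) is rejected.
SUPPORTED = {"L", "R", "EN", "ES", "WS", "ON"}


def resolve_levels(text: str, paragraph_level: int) -> list[int]:
    """Resolve embedding levels for one line with no explicit embeddings (L1 omitted)."""
    types = [unicodedata.bidirectional(ch) for ch in text]
    for ch, t in zip(text, types):
        if t not in SUPPORTED:
            raise ValueError(f"bidi class {t!r} of {ch!r} is not handled by this sketch")
    n = len(types)
    # With no explicit embeddings, sos, eos and the embedding direction
    # all equal the paragraph direction.
    para_dir = "R" if paragraph_level % 2 else "L"

    # W4: a single ES between two ENs becomes EN ("1+2" stays one number).
    for i in range(1, n - 1):
        if types[i] == "ES" and types[i - 1] == "EN" and types[i + 1] == "EN":
            types[i] = "EN"
    # W6: any remaining separator becomes Other Neutral.
    types = ["ON" if t == "ES" else t for t in types]
    # W7: an EN whose nearest preceding strong type (or sos) is L becomes L.
    last_strong = para_dir
    for i, t in enumerate(types):
        if t in ("L", "R"):
            last_strong = t
        elif t == "EN" and last_strong == "L":
            types[i] = "L"

    # N1/N2: a run of neutrals takes the direction of the strong text on both
    # sides when they agree (EN counts as R here); otherwise the embedding direction.
    def strong(t):
        return "L" if t == "L" else "R" if t in ("R", "EN") else None

    i = 0
    while i < n:
        if strong(types[i]) is not None:
            i += 1
            continue
        j = i
        while j < n and strong(types[j]) is None:
            j += 1
        before = strong(types[i - 1]) if i > 0 else para_dir
        after = strong(types[j]) if j < n else para_dir
        resolved = before if before == after else para_dir
        types[i:j] = [resolved] * (j - i)
        i = j

    # I1/I2: raise the level of characters whose direction differs from it.
    levels = []
    for t in types:
        if paragraph_level % 2 == 0:
            levels.append(paragraph_level + {"L": 0, "R": 1, "EN": 2}[t])
        else:
            levels.append(paragraph_level + (1 if t in ("L", "EN") else 0))
    return levels


def reorder(text: str, levels: list[int]) -> str:
    """L2: reverse runs from the highest level down to the lowest odd level."""
    if not text:
        return text
    order = list(range(len(text)))  # logical index shown at each visual slot
    lowest_odd = min(levels) | 1
    for level in range(max(levels), lowest_odd - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] < level:
                i += 1
                continue
            j = i
            while j < len(order) and levels[order[j]] >= level:
                j += 1
            order[i:j] = order[i:j][::-1]
            i = j
    return "".join(text[k] for k in order)
```

For the sample, `resolve_levels("1+2+\u05e9", 1)` gives the levels `[2, 2, 2, 1, 1]`: the digits and the first “+” end up one level above the trailing “+” and the Hebrew letter. Reversing the level-2 run and then the whole line yields `"\u05e9+1+2"`, which matches the rendering described above, while the same string in a left-to-right paragraph (level 0) keeps its logical order.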
You might wonder how we went about coming up with tests to make sure we are going the right direction in bidi land. First, we spent time going through the Unicode Bidi Algorithm document step by step and figured out what it was saying. (Warning: reading the bidi algorithm can make your head hurt.) Next, we identified the characters and character combinations that would exercise each of the rules. Then we worked out on paper what we thought the bidi levels and the reordering should be. Finally, we made HTML test cases and checked that the browser gave us the same result that we had figured out on paper.
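The last step of that workflow, turning hand-computed results into HTML test cases, could look something like the Python sketch below. It is a hypothetical helper, not the IE team's actual harness: `make_test_page`, `write_cases`, the `CASES` table and the `bidi_case_N.html` file names are all assumptions for illustration. Each page shows the string as the browser renders it, followed by the expected visual order wrapped in a `<BDO DIR=LTR>` element. Because BDO overrides the bidi algorithm, the expected line appears exactly in the order written, so a tester can compare the two lines by eye.

```python
import html
from pathlib import Path


def make_test_page(logical: str, direction: str, expected_visual: str) -> str:
    """Return an HTML page showing a string next to its expected rendering.

    The first paragraph lets the browser apply the bidi algorithm. The second
    wraps the hand-computed result in <BDO DIR=LTR>, which overrides the
    algorithm so the characters appear left to right in the order given.
    """
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    return (
        f"<HTML DIR={direction.upper()}>\n"
        '<HEAD><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8"></HEAD>\n'
        "<BODY>\n"
        f"<P>{html.escape(logical)}</P>\n"
        f"<P><BDO DIR=LTR>{html.escape(expected_visual)}</BDO></P>\n"
        "</BODY>\n"
        "</HTML>\n"
    )


# Hypothetical test table: (logical text, paragraph direction, expected visual order).
CASES = [
    ("1+2+\u05e9", "rtl", "\u05e9+1+2"),
    ("1+2+\u05e9", "ltr", "1+2+\u05e9"),
]


def write_cases(cases, directory="."):
    """Write one UTF-8 HTML file per test case into `directory`."""
    out = Path(directory)
    out.mkdir(parents=True, exist_ok=True)
    for n, (logical, direction, expected) in enumerate(cases, start=1):
        page = make_test_page(logical, direction, expected)
        (out / f"bidi_case_{n}.html").write_text(page, encoding="utf-8")
```

Calling `write_cases(CASES, "bidi_tests")` would produce `bidi_case_1.html` and `bidi_case_2.html`. Keeping the expected results in a table like this makes it easy to add a row for every rule and character combination worked out on paper.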
Improving our support for the Unicode BiDi Algorithm fixes a number of problems our customers have had with IE. Because we live on a small planet, having correct support for all languages is important to us.
If you are interested in authoring content for right-to-left languages, you might also find these articles useful:
Authoring HTML for Middle Eastern Content: http://www.microsoft.com/globaldev/handson/dev/Mideast.mspx
Justifying Text using Cascading Style Sheets (CSS) in Internet Explorer 5.5: http://www.microsoft.com/middleeast/msdn/JustifyingText-CSS.aspx
Thanks in advance for your feedback!
- Wujun Wang