Welcome to MSDN Blogs Sign in | Join | Help
UNICODE-ASCII-Arabic and conversion problems? Use this algorithm

So, you've used Microsoft Office applications like Excel or Access and SQL Server to store and retrieve Arabic text. There are times though, when your data seems to be corrupt or you get dummy latin characters when you're expecting the correct Arabic text. Providing configuration information regarding how to set SQL Server, Excel or Access to hold Arabic information correctly is not the topic of interest for this blog post. Instead, I'd like to provide you with an algorithm and sample VBA code to help you convert the latin text back to Arabic.

There are many reasons for the Arabic text to appear in Latin. One of the reasons may be your SQL Server Collation settings, or the fact that you have used a VARCHAR (or in this context CHAR, TEXT, etc.) when you're supposed to use NVARCHAR (or NTEXT, NCHAR, etc.). The N prefix in the type makes sure that the information in the column is stored in UNICODE and hence, your Arabic text will surely display exactly as it is everywhere. However, if you are getting something like ÇáßãÈíæÊÑ when you are supposed to get الكمبيوتر, then it's already too late.

Good news for you guys! I have figured out the conversion table between such Latin characters and the corresponding Arabic text in Unicode. The algorithm is simple, there is a difference in the ASCII codes between the Latin text and those in Arabic (based on codepage 1256: Arabic Windows). Unfortunately, this is not a constant value, as the order of characters is not the same either. Hence, in order to convert those Latin letters into UNICODE, all you have to do is scan the text taking one character at a time, then get the UNICODE value for the character and add the deficit corresponding the range value from the table below:

ASCII range for Latin Text

ASCII range for Latin text (in hex)

Deficit between ASCII and Unicode for Arabic

From

To

From

To

192

214

C0

D6

1376

216

219

D8

DB

1375

221

223

DD

DF

1380

225

225

E1

E1

1379

227

230

E3

E6

1378

236

237

EC

ED

1373

Note: This table is a draft one. Although I tried all possible Arabic characters, including special ones, I'm not sure if this is the complete table. However, it is guaranteed that all textual characters are included.

Here's a VBA code you can use within Excel or Access to convert the Latin Text to Arabic. I have used Excel as an example:

Sub convert2ara()
    For Each s In Selection
        strNew = ""
        For i = 1 To Len(s.Text)
            j = AscW(Mid(s.Text, i, 1))
            Select Case j
                Case &HC0 To &HD6: j = j + 1376
                Case &HD8 To &HDB: j = j + 1375
                Case &HDD To &HDF: j = j + 1380
                Case &HE1: j = j + 1379
                Case &HE3 To &HE6: j = j + 1378
                Case &HEC To &HED: j = j + 1373
            End Select
            strnew = strnew & ChrW(j)
        Next
        Cells(s.Row, s.Column + 1) = strnew
    Next
End Sub

Here's also a T-SQL Stored Procedure that does the same thing:

CREATE PROCEDURE Convert2Ara
 @Latin VarChar(100),
 @Arabic NVarChar(100) = N'' OUTPUT
AS
BEGIN
 DECLARE @ind int, @Len int, @Src int
 SET @Len = Len(@Latin)
 SET @Ind = 1
 WHILE @ind <= @Len
  BEGIN
   SET @Src = ASCII(SUBSTRING(@Latin, @ind, 1))
   Set @Src = Case
    WHEN @Src BETWEEN 192 and 214 THEN @Src + 1376
    WHEN @Src BETWEEN 216 and 219 THEN @Src + 1375
    WHEN @Src BETWEEN 221 and 223 THEN @Src + 1380
    WHEN @Src = 225 THEN @Src + 1379
    WHEN @Src BETWEEN 227 and 230 THEN @Src + 1378
    WHEN @Src BETWEEN 236 and 237 THEN @Src + 1373
   END
   SET @Arabic = @Arabic + NCHAR(@Src)
   SET @Ind = @Ind + 1
  END
END

Try it and tell me if it works for you.

Special thanks to my friend, Mahdi Al-Saffar, whose request for some help inspired me to find a solution and publish this post.

Posted: Monday, December 31, 2007 8:20 AM by hovsep

Comments

No Comments

Anonymous comments are disabled
Page view tracker