UNICODE-ASCII-Arabic and conversion problems? Use this algorithm
So, you've used Microsoft Office applications like Excel or Access and SQL Server to store and retrieve Arabic text. There are times though, when your data seems to be corrupt or you get dummy latin characters when you're expecting the correct Arabic text. Providing configuration information regarding how to set SQL Server, Excel or Access to hold Arabic information correctly is not the topic of interest for this blog post. Instead, I'd like to provide you with an algorithm and sample VBA code to help you convert the latin text back to Arabic.
There are many reasons for the Arabic text to appear in Latin. One of the reasons may be your SQL Server Collation settings, or the fact that you have used a VARCHAR (or in this context CHAR, TEXT, etc.) when you're supposed to use NVARCHAR (or NTEXT, NCHAR, etc.). The N prefix in the type makes sure that the information in the column is stored in UNICODE and hence, your Arabic text will surely display exactly as it is everywhere. However, if you are getting something like ÇáßãÈíæÊÑ when you are supposed to get الكمبيوتر, then it's already too late.
Good news for you guys! I have figured out the conversion table between such Latin characters and the corresponding Arabic text in Unicode. The algorithm is simple, there is a difference in the ASCII codes between the Latin text and those in Arabic (based on codepage 1256: Arabic Windows). Unfortunately, this is not a constant value, as the order of characters is not the same either. Hence, in order to convert those Latin letters into UNICODE, all you have to do is scan the text taking one character at a time, then get the UNICODE value for the character and add the deficit corresponding the range value from the table below:
|
ASCII range for Latin Text |
ASCII range for Latin text (in hex) |
Deficit between ASCII and Unicode for Arabic |
|
From |
To |
From |
To |
|
192 |
214 |
C0 |
D6 |
1376 |
|
216 |
219 |
D8 |
DB |
1375 |
|
221 |
223 |
DD |
DF |
1380 |
|
225 |
225 |
E1 |
E1 |
1379 |
|
227 |
230 |
E3 |
E6 |
1378 |
|
236 |
237 |
EC |
ED |
1373 |
Note: This table is a draft one. Although I tried all possible Arabic characters, including special ones, I'm not sure if this is the complete table. However, it is guaranteed that all textual characters are included.
Here's a VBA code you can use within Excel or Access to convert the Latin Text to Arabic. I have used Excel as an example:
Sub convert2ara()
For Each s In Selection
strNew = ""
For i = 1 To Len(s.Text)
j = AscW(Mid(s.Text, i, 1))
Select Case j
Case &HC0 To &HD6: j = j + 1376
Case &HD8 To &HDB: j = j + 1375
Case &HDD To &HDF: j = j + 1380
Case &HE1: j = j + 1379
Case &HE3 To &HE6: j = j + 1378
Case &HEC To &HED: j = j + 1373
End Select
strnew = strnew & ChrW(j)
Next
Cells(s.Row, s.Column + 1) = strnew
Next
End Sub
Here's also a T-SQL Stored Procedure that does the same thing:
CREATE PROCEDURE Convert2Ara
@Latin VarChar(100),
@Arabic NVarChar(100) = N'' OUTPUT
AS
BEGIN
DECLARE @ind int, @Len int, @Src int
SET @Len = Len(@Latin)
SET @Ind = 1
WHILE @ind <= @Len
BEGIN
SET @Src = ASCII(SUBSTRING(@Latin, @ind, 1))
Set @Src = Case
WHEN @Src BETWEEN 192 and 214 THEN @Src + 1376
WHEN @Src BETWEEN 216 and 219 THEN @Src + 1375
WHEN @Src BETWEEN 221 and 223 THEN @Src + 1380
WHEN @Src = 225 THEN @Src + 1379
WHEN @Src BETWEEN 227 and 230 THEN @Src + 1378
WHEN @Src BETWEEN 236 and 237 THEN @Src + 1373
END
SET @Arabic = @Arabic + NCHAR(@Src)
SET @Ind = @Ind + 1
END
END
Try it and tell me if it works for you.
Special thanks to my friend, Mahdi Al-Saffar, whose request for some help inspired me to find a solution and publish this post.