Sorting it all Out: Michael Kaplan's random stuff of dubious value
Suzanne has been riffing on me in relation to Vietnamese and then she shifted over to talk about Google and other languages, so I thought I would riff off of her a bit. :-)
By the way Suzanne -- I did not find your terminology to be inaccurate; it was just different. I was explaining why I was confused!
I am going to take her string bãi biển from that first article and run it through various search engines in various forms, trying to look for some patterns. I am not using Suzanne's test (using Google Image and looking for pictures of beaches) since not all of the search engines have a comparable service. It is pretty easy to tell from the text excerpts whether the search has found Vietnamese sites or not....
First, the engines:
Second, the strings to test:
Now if you look at these five strings being tested, #1, #2, and #4 are all canonically equivalent and thus should give identical results in any search engine that conforms to Unicode and its principles of canonical equivalence.
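For anyone who wants to check the canonical equivalence claim themselves, here is a minimal sketch in Python using the standard `unicodedata` module. The exact code points behind each test string are my assumption (one precomposed, one fully decomposed, one mixed), and the constant and function names are hypothetical, made up purely for illustration.

```python
import unicodedata

# Hypothetical encodings of the same visible text "bãi biển".
# The exact code points of the test strings are an assumption here.
PRECOMPOSED = "b\u00e3i bi\u1ec3n"          # ã = U+00E3, ể = U+1EC3
DECOMPOSED = "ba\u0303i bie\u0302\u0309n"   # base letters + combining marks
MIXED = "b\u00e3i bi\u00ea\u0309n"          # ê = U+00EA + combining hook above
PARTLY_STRIPPED = "bai bi\u00ean"           # tilde and hook removed, circumflex kept
PLAIN_ASCII = "bai bien"                    # every diacritic removed


def canonically_equivalent(a: str, b: str) -> bool:
    """True when two strings have the same canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def strip_marks(text: str) -> str:
    """Accent-fold by dropping combining marks after decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Under these assumptions, `canonically_equivalent` treats the first three strings as the same text and rejects the two stripped ones. `strip_marks` is the separate, lossy folding step an engine would need if it wanted the plain ASCII form to match as well; that is a search policy choice, not something canonical equivalence gives you.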
I will put the strings in double quotes for all search engines.
And here are the comparison results:
#1
bãi biển
#2
bãi biển
#3
bai biên
#4
bãi biển
#5
bai bien
Conclusions:
This post brought to you by "ổ" (U+1ed5, a.k.a. LATIN SMALL LETTER O WITH CIRCUMFLEX AND HOOK ABOVE)