Sorting it all Out Michael Kaplan's random stuff of dubious value Be sure to read the disclaimer here first!
Now I'll admit it has been a long time coming, but it is now here....
As mentioned by Lisa Shieh in "IDN now supported by Google AdWords":
Now those last two links are pretty important ones.
The first one says:
How Much Text Can I have In My Ads?
Ads can show, including spaces, 25 characters for the title, 70 characters for the ad text, and 35 characters for a display URL (or approximately 17 for languages that use non-ASCII (multi-byte) characters).
On Google, text ads are displayed on four lines: a title, two lines of ad text (each with 35 characters), and a URL line. However, the format may differ on Google partner sites.
Some Eastern European and Asian countries also support longer text ads containing up to 30 characters in the title and 76 characters in the ad text.
I. Ad Text
If your ad text contains any wide characters, such as certain capital letters and punctuation marks, fewer characters may fit on the line. The system will notify you if you exceed a character limit. Also, some of Google's syndication partners may not display non-standard characters if you include them in your ad.
If you create text ads using non-Latin characters, please be aware that the character limit may vary. Ads in languages with non-Latin (double-byte) characters, such as Chinese, Japanese, and Korean, can contain the following number of characters, including spaces: 12 characters in the title, 17 characters in each line of ad text, and 17 characters in the display URL. Countries that support longer text ads have higher double-byte character limits.
II. Display URL
Google can only display up to 35 characters of your display URL, due to limited space. If your display URL is longer than 35 characters, it will appear shortened when your ad is displayed. WAP mobile ads can show up to 20 characters in a display URL, so any longer domain will be truncated to fit within those limits. For non-ASCII (multi-byte) languages such as Japanese or Korean, the width of these characters can vary, so the display URL might be shortened if it’s longer than 17 characters.
If your display URL is longer than 35 characters (or 20, for WAP mobile ads), you may consider using a shortened version of your URL, such as your homepage. Please be sure that your display URL accurately represents your destination URL, the page within your site to which users are taken via your ad. The display URL should have the same domain (such as example.com) as your landing page.
Also, please note that your display URL must be an actual web address, appearing in the form of a valid URL. It must include the extension (such as .com, .net, or .org). It does not need to include the prefix (such as http:// or www.).
Since your ad space is limited, try to create compelling and targeted ad text that is highly relevant to the products or services you're promoting. You can optimize your ad text to create the most effective ads.
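The numbers in that first quote hang together under one simple model: if a wide (double-byte) character is assumed to cost two units of width, then 25, 35, and 35 units give exactly the 12, 17, and 17 characters quoted for Chinese, Japanese, and Korean. Here is a minimal Python sketch of that model plus the display-URL rules. The LIMITS table, the two-unit weighting, and the function names are my assumptions for illustration, not anything AdWords documents as its actual algorithm.

```python
import unicodedata
from urllib.parse import urlsplit

# Assumed limits in width units; a wide character is assumed to cost 2 units.
LIMITS = {"title": 25, "line": 35, "display_url": 35}


def display_width(text: str) -> int:
    """Narrow characters count 1 unit; wide and fullwidth ones count 2."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
               for ch in text)


def fits(field: str, text: str) -> bool:
    """True if text fits the (assumed) width limit for the given field."""
    return display_width(text) <= LIMITS[field]


def _strip_www(host: str) -> str:
    return host[4:] if host.startswith("www.") else host


def display_url_ok(display_url: str, landing_url: str) -> bool:
    """Hypothetical check of the quoted display-URL rules."""
    # The prefix (http://) is optional, so add one before parsing.
    if "://" not in display_url:
        display_url = "http://" + display_url
    display_host = urlsplit(display_url).hostname or ""
    landing_host = urlsplit(landing_url).hostname or ""
    # The display URL must include an extension such as .com, .net or .org.
    if "." not in display_host:
        return False
    # Same domain as the landing page, ignoring a leading "www.".
    return _strip_www(display_host) == _strip_www(landing_host)


print(fits("title", "日本語" * 4))  # 12 wide characters = 24 units -> True
print(fits("line", "日本語" * 6))   # 18 wide characters = 36 units -> False
print(display_url_ok("example.com/shoes", "http://www.example.com/shop?id=1"))  # True
```

The same-domain check is deliberately naive (it only ignores a leading www.), and the width model ignores the proportional-font effects the quote mentions for wide capital letters.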
And the second one? It says:
You can use non-ASCII (multi-byte) characters (such as those used in Japanese, Korean, and Chinese) in your URLs, but note that some of these characters need nearly twice the display space as single-byte characters. So, the exact number of characters you can use in a destination URL might be less than the character limit shown in the preview counter. To mitigate against URL spoofing, non-ASCII characters will be displayed only when the user’s interface language matches the characters in the visible URL. In all other cases, the URL will render as ASCII punycode. For example, if your Google interface language isn't in a language that uses Cyrillic characters (e.g. Russian), these characters won't render (e.g. http://пример.испытание will display as http://xn--e1a...).
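The display rule in that second quote is easy to sketch. In the Python below, the UI_SCRIPTS table is a hypothetical mapping, and taking the first word of each letter's Unicode character name as its "script" is a rough heuristic rather than real script detection. The Punycode conversion uses Python's built-in idna codec (IDNA 2003).

```python
import unicodedata

# Hypothetical mapping from UI language to the scripts it is written in.
UI_SCRIPTS = {
    "en": {"LATIN"},
    "ru": {"CYRILLIC", "LATIN"},
    "el": {"GREEK", "LATIN"},
    "ja": {"CJK", "HIRAGANA", "KATAKANA", "LATIN"},
    "ko": {"HANGUL", "CJK", "LATIN"},
}


def rough_scripts(text: str) -> set:
    """Rough heuristic: first word of each letter's Unicode character name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in text if ch.isalpha()}


def render_host(host: str, ui_lang: str) -> str:
    """Show the Unicode host only if its non-ASCII letters match the UI's scripts."""
    non_ascii = "".join(ch for ch in host if ord(ch) > 127)
    if not non_ascii:
        return host  # plain LDH name, nothing to hide
    scripts = rough_scripts(non_ascii)
    if scripts and scripts <= UI_SCRIPTS.get(ui_lang, set()):
        return host  # the user can presumably read it
    # Otherwise fall back to ASCII Punycode via the built-in IDNA 2003 codec.
    return host.encode("idna").decode("ascii")


print(render_host("пример.испытание", "ru"))  # shown as Unicode
print(render_host("пример.испытание", "en"))  # shown as xn--... Punycode
```

Note how a bilingual user with an English UI still gets the unreadable form, which is exactly the "each person knows only one language" assumption at work.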
Aha, very informative! And there is now a little insight into why it took so long for us to see this....
Google, like Microsoft, is a big place, and it takes time to get every team interested in something new, no matter how important you might think it is....
Given how long domain names were limited to just LDH (letters A-Z, digits 0-9, hyphen), it's easy to see how any given technology might have such limitations in its own DNA, and how reluctant its owners would be to make changes that could lead to service degradation or customer confusion.
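For reference, the LDH rule for a host name is small enough to write down, and it is also why IDNs are carried as Punycode: the xn-- form of an internationalized name is itself a plain LDH name. A minimal Python sketch, assuming the usual RFC 1123 limits (labels of 1 to 63 characters that do not begin or end with a hyphen, 253 characters overall); the function name is hypothetical.

```python
import re

# One LDH label: letters, digits, hyphen; 1-63 characters;
# no leading or trailing hyphen (RFC 1123 host-name rules).
_LDH_LABEL = re.compile(r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def is_ldh_hostname(host: str) -> bool:
    """True if every label of host is a plain LDH label."""
    host = host.rstrip(".")  # allow a trailing root dot
    if not host or len(host) > 253:
        return False
    return all(_LDH_LABEL.fullmatch(label) for label in host.split("."))


print(is_ldh_hostname("example.com"))                    # True
print(is_ldh_hostname("пример.испытание"))               # False: not LDH
print(is_ldh_hostname("xn--e1afmkfd.xn--80akhbyknj4f"))  # True: Punycode is LDH
```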
Overall, I think it's good that AdWords has taken these steps.
Though I will feel better when the very natural annoyance with provincial assumptions like "each person knows only one language" also penetrates the AdWords folks and they take the next step, like finding a more intelligible way to show IDNs that are in a different script.
The current solution is a great rudimentary first step, but it can't be the last one. Showing Punycode so readily is never the best answer, so hopefully that is a temporary plan (this one, moderated by UI language, has some obvious flaws in it)....
Showing Punycode prevents some very nasty attacks, such as Greek glyphs that match Latin ones and make a URL appear to go somewhere it does not.
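One step smarter than hiding everything behind Punycode is to flag only the suspicious cases. The sketch below (Python, with the same rough character-name heuristic standing in for real script detection, and a hypothetical function name) flags any label that mixes Latin letters with Greek or Cyrillic ones. It is an illustration only: it misses whole-script spoofs spelled entirely in Cyrillic or Greek, which is part of why Unicode Technical Standard #39 defines more thorough confusable checks.

```python
import unicodedata

CONFUSABLE_WITH_LATIN = {"GREEK", "CYRILLIC"}


def rough_scripts(label: str) -> set:
    """Rough heuristic: first word of each letter's Unicode character name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0]
            for ch in label if ch.isalpha()}


def mixes_lookalike_scripts(host: str) -> bool:
    """Flag any label that mixes Latin letters with Greek or Cyrillic ones."""
    for label in host.split("."):
        scripts = rough_scripts(label)
        if "LATIN" in scripts and scripts & CONFUSABLE_WITH_LATIN:
            return True
    return False


print(mixes_lookalike_scripts("\u0430pple.com"))    # True: Cyrillic a + Latin "pple"
print(mixes_lookalike_scripts("g\u03bfogle.com"))   # True: Greek omicron
print(mixes_lookalike_scripts("пример.испытание"))  # False: all Cyrillic
```

The last case is the interesting one: an all-Cyrillic name passes, which is right for a genuine Russian domain and wrong for an all-Cyrillic lookalike of a Latin one.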
Yet it is not a good solution, since text that cannot be read cannot be judged. The final solution will need to be smarter than this.
Previous parts in this series:
part 1: If you're not Unicode, you're just wrong!
part 2: Try
part 11: There's no place like ::1, not even 127.0.0.1!
part 12: Emoji + IDN == U+1F4A9 (PILE OF POO)
part 13: Desktop and Managed and Metro; oh my!
part 14: It turns out there's no "I" in IE, either
part 15: Still no 'I' in EAI.... but we could use an US sometime