Larry Osterman's WebLog

Confessions of an Old Fogey

What is localization anyway?

I may be stomping on Michael Kaplan's toes with this one, but...

I was reading the February 2005 issue of Dr. Dobb's Journal this morning and I ran into the article "Automating Localization" by Hew Wolff (you may have to subscribe to get access to the article).

When I was reading the article, I was struck by the following comment:

 I didn't think we could, because the localization process is pretty straightforward. By "localization", I mean the same thing as "globalization" (oddly) or "internationalization." You go through the files looking for English text strings, and pull them into a big "language table," assigning each one a unique key.

The first thing I thought was what an utterly wrong statement.  The author of the article is conflating five different concepts and calling them the same thing.  The five concepts are: localizability, translation, localization, internationalization, and globalization.

What Hew's talking about is "localizability" - the process of making the product localizable.

Given that caveat, he's totally right in his definition of localizability - localizability is the process of extracting all the language-dependent strings in your binary and putting them in a separate location that can later be modified by a translator.
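To make the "language table" idea concrete, here's a minimal sketch in Python. Everything in it (the `STRINGS` table, the keys, the `get_string` helper, and the French text) is made up for illustration and isn't taken from Hew's article; a real product would keep these tables in resource files that a translator can edit without touching the code.

```python
# Hypothetical "language table": every user-visible string lives here, keyed by
# a unique ID, instead of being hard-coded at the place where it is displayed.
STRINGS = {
    "en-US": {
        "greeting": "Hello, {name}!",
        "file_not_found": "The file {path} could not be found.",
    },
    "fr-FR": {
        "greeting": "Bonjour, {name} !",
        "file_not_found": "Le fichier {path} est introuvable.",
    },
}

DEFAULT_LOCALE = "en-US"


def get_string(key, locale=DEFAULT_LOCALE, **params):
    """Look up a string by key, falling back to the default locale's table."""
    template = STRINGS.get(locale, {}).get(key)
    if template is None:
        # Unknown locale or untranslated key: use the default language.
        template = STRINGS[DEFAULT_LOCALE][key]
    return template.format(**params)


print(get_string("greeting", "fr-FR", name="Larry"))  # Bonjour, Larry !
print(get_string("greeting", "de-DE", name="Larry"))  # Hello, Larry!
```

The point of the sketch is only that the code refers to keys, never to English text, so adding a language means adding a table rather than recompiling.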

But he totally missed the boat on the rest of the concepts.

The first three (localizability, translation, and localization) are about resources:

  • Localizability is about enabling translation and localization.  It's about ensuring that a translator has the ability to modify your application to work in a new country without recompiling your binary.
  • Translation is about converting the words in his big "language table" from one language to another.  Researchers love this one because they think that they can automate this process (see Google's language tools as an example of this).
  • Localization is the next step past translation.  As Yoshihiko Sakurai mentioned to Michael in a related discussion this morning, "[localization] is a step past translation, taking the certain communication code associated with a certain culture.  There are so many aspects you have to think about such as their moral values, working styles, social structures, etc... in order to get desired (or non-desired) outputs."  This is one of the big reasons that automated translation tools leave so much to be desired - humans know about the cultural issues involved in a language, computers don't.

Internationalization is about code.  It's about ensuring that the code in your application can handle strings in a language-sensitive manner.  Michael's blog is FULL of examples of internationalization.  Michael's article about Tamil numbers, or deeptanshuv's article about the four versions of "I" in Turkish are great examples of this.  Another example is respecting the date and time format of the user - even though users in the US and the UK both speak English (I know that the Brits reading this take issue with the concept of Americans speaking English, but bear with me here), they use different date formats.  Today is 26/01/2005 in Great Britain, but it's 01/26/2005 here in the US.  If your application displays dates, it should automatically adjust them.
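As a rough illustration of the date example, here's a small Python sketch. The `DATE_PATTERNS` table and the `format_short_date` helper are hypothetical and hard-coded purely to show the idea; a real application should ask the operating system or a locale library for the user's preferred format rather than maintain its own table.

```python
from datetime import date

# Hypothetical per-locale short-date patterns, hard-coded for illustration only.
DATE_PATTERNS = {
    "en-US": "%m/%d/%Y",  # month/day/year
    "en-GB": "%d/%m/%Y",  # day/month/year
}


def format_short_date(value, locale_tag):
    """Format a date using the short-date pattern for the given locale tag."""
    return value.strftime(DATE_PATTERNS[locale_tag])


today = date(2005, 1, 26)
print(format_short_date(today, "en-GB"))  # 26/01/2005
print(format_short_date(today, "en-US"))  # 01/26/2005
```

The same `date` value produces two different strings, which is the whole point: the code stores the date once and picks the presentation from the user's settings.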

Globalization is about politics.  It's about ensuring that your application doesn't step on the policies of a country, so you don't ever highlight national borders in your graphics, because you might upset your customers living on one side or another of a disputed border.  I do want to be clear that this isn't the traditional use of globalization.  Maybe a better word would be geopoliticization, but that's too many letters to type, even for me, and since globalization was almost always used as a synonym for internationalization, I figured it wouldn't mind being co-opted in this manner :)

Having said that, his article is an interesting discussion about localization and the process of localization.  I think that the process he went through was a fascinating one, with some merit.  But that one phrase REALLY stuck in my craw.

Edit: Fixed incorrect reference to UK dates - I should have checked first :)  Oh, and it's 2005, not 2004.

Edit2: Added Sakurai-san's name.

Edit3: Added comment about the term "globalization"

  • "Today is 2004/01/26 in Great Britain"

    Small point: it's 26/01/2004. I've only ever seen the date the other way in relation to ISO. :-)
  • Weird - for some reason, I thought that the date order in the UK was y/m/d, not m/d/y.

    But the point still holds - the date order's different in the UK than it is in the US.
  • "But the point still holds - the date order's different in the UK than it is in the US."

    Definitely, I was just nitpicking the format (d/m/y). ;-)

    In the past I've had to find a bug in a timesheet/expense LOB application where someone had changed the default database language from British to US English - and thus the parsing. As the interface code was using the user's regional settings to display the date, strange things happened for a few days!

    Localisation is hard. I doff my hat to all localisation teams everywhere!
  • I've never understood why the US seems to want to make things different just for the sake of difference. There's no logic whatsoever in m/d/y - since when is any measurement written "mid-significance/least-significance/greatest-significance"? You don't see pricetags with "$26 and $2000, and 49c" or distances like "motel .5 and 7 miles ahead". Either use least-to-most (d/m/y) or most-to-least (y/m/d), don't just muddle it up because you want to be contrary toward every other country.
  • "contrary toward every other country"? Um. Just about every country does this slightly differently. Bring up the regional options control panel applet and start scrolling down. There are countries out there that use HUGELY different orders for just about everything - I picked the easiest one (dates), there are others.
  • I think the US format is based on how we say dates.
    January 26th 2005 -> 1/26/2005
    But some people will say 26th of January 2005.

    It always seemed to make more sense to me to use d/m/y, from smallest to largest.
    To remove doubt, whenever I can, I use the month name and not a number.



  • Yesterday was the 26/1/05 in Australia, and 235 years and three days ago Captain Phillip with the first fleet landed just south of me (and didn't like it at Botany Bay so went to Sydney Harbour just north of me) and on the 26/1/1788 claimed Australia for Great Britain. It's called Australia Day here.

    Why oh why don't MS programming languages recognise "colour" as a legal word? It would be so simple to make English-language versions of their languages and they could be in the same version. It generates syntax error after syntax error. If I spelt colour as color at school I would still be repeating year 2 40 years later. In year 4 we would be beaten with a ruler for spelling color (Miss Tutt if anyone knows her address?). There are other things like this. And it's so simple to fix - accept colour or color.
  • When I was at primary school some 20 years ago (gawd, I'm showing my age), I was told "spell it one way, and be consistent". So I spell in the US fashion (color, jail, program, -ize, etc).

    It really irks me that anti-US types don't realize that the formalization of "-ise" endings and so on only really happened after the 1930's. Australia (and Britain) used both until around the second World War, and then suddenly stopped. For no reason.

    I find it particularly difficult to understand many English accents on TV, whereas I don't have any trouble even with the deepest South US accent. Australian English, particularly in common spoken usage is far, far closer to US English than "British English".

    I don't know why the nitpickers insist on archaic constructions such as "gaol" when they no longer use "connexion" or "to-day" (which Tolkien used as recently as the 1960's). Language is a living breathing entity.

    As long as we all understand each other, it's good enough for me.

    On topic, I'm currently refactoring a PHP codebase to support right-to-left languages and date forms. PHP is a nightmare to get localization / internationalization done at all. Things such as daylight savings alone cause hundreds of lines of code. O for culture and locale properties written by people who understand these things! .NET makes it so much easier.

    Andrew
  • Also, localizability is about more than just putting everything into a string table. Are your dialog boxes and other controls big enough to handle the French translation? French words tend to be longer than English words. Do you have bitmaps that contain text? Do you have bitmaps that contain pictures that make no sense in other cultures? (Stop signs, red lights, those yellow triangle caution signs -- not all of those are universal.)

    Regular readers of my blog know that I'm a big fan of pseudolocalization. If I had my druthers, we'd all be dogfooding pseudolocalized builds every day.
  • I can get used to spelling "colour" as "color" (along with some of the other funny mis-spellings in the U.S. dictionary), but I just can't get my head wrapped around that bizarre m/d/y date format...

    Anyway, that's all completely off-topic, and I couldn't agree more with respect to incorrect use of the different terms. They all have very different meanings, but there're still too many people who use them interchangeably.
  • > localizability is the process of extracting
    > all the language-dependant strings in your
    > binary [...]

    That's only part of it. You start by going through files looking for Japanese strings and pulling them out, but that isn't enough. After translating all the strings to (usually) English, the default locale and default fonts display the translated results perfectly, but if you deliver the result to a customer in (fairly often) an English-speaking country then the program fails because the customer's system sees an application's request for a Japanese locale or fonts and the customer's system doesn't have those installed. In addition to translating the words, you have to set the locales and fonts for display of the words you've translated.

    And while doing that, you have to avoid setting the locales and fonts for display of words that you haven't translated. If the program gets strings from the OS, or filenames from disk, etc., they'd better be displayed the way the user's system ordinarily displays them. Even the "PrivBar" from one of your colleagues doesn't do that, (and of course tons of applications and drivers from other vendors), and it's not exactly possible to guess what the thing was supposed to display.

    1/26/2005 12:39 PM Ian

    > I've only ever seen the date the other way
    > in relation to ISO. :-)

    And in the world's largest country by population, and in other countries near that one.
  • well, never seen these sorts of definitions before. coming from a web app (coldfusion and java) environment, you seem to be splitting/mixing some pretty commonly accepted definitions. i would have thought that i18n was the process of making an app locale neutral (text and date formatting as you mentioned, but also stuff like number formatting, calendars, GUI layouts, etc.), l10n was the process of translations, etc. for a specific locale ("skinning" it if you will) and g11n was the process of moving your i18n app across many locales via repeated l10n. i've never seen g11n only associated w/politics except maybe in the field of economics/trade. in fact, i recall seeing g11n & i18n being used almost as synonyms on the dr. international website.

    i do agree though that machine translation is for the birds. try round-tripping something like "this side towards enemy" thru translation s/w to see the potentially dangerous issues.

  • "What is localization, anyway?" (http://blogs.msdn.com/larryosterman/archive/2005/01/26/361015.aspx) Larry Osterman of Microsoft has a good short piece on five different things that are each often called "localization": (a) localizability: designing an app to easily accept alternate text; (b) translation: actually providing...
  • Gah, I hate the i18n, l10n, and g11n terminology. Why on earth are people so unwilling to type the silly letters?

    Anyway, Paul, you may be right. I ran the text past the GIFT team at Microsoft before posting and they didn't have problems with my definitions. And there's a really critical distinction between globalization and internationalization. There needs to be a step somewhere in the process of supporting multiple cultures that involves meta-information - it's not the information in the text strings, it's not the order of dates and times. These issues are almost always political rather than technical - national boundaries, time zone boundaries, country/province names, etc. I'm using globalization to refer to that specific aspect of the "world-ready" problem - there needs to be a word to reflect that part of the process, which goes beyond the mechanics of supporting multiple cultures.
  • It's very funny how we all cannot get past what we already know. Just as the UK and Aussies cannot get past the m/d/y format the US uses, I always mess up the customs form when traveling to the UK or elsewhere, always writing my DOB as m/d/y. Creatures of habit I guess. Also, although it always seems odd to read, when I see the term "colour" or "favourite" I think it's so cool looking.

    Larry, cool post. Thanks for clarifying. As a developer, these are the things that I know I have to deal with, but truly hate to think about because it makes my brain hurt.

    JW