Making your VB code ready to go global (Matt Gertz)


Greetings, all!

I’m Matt Gertz, the Dev Manager for the Visual Basic team.  I’ve been on the team for a bit over 12 years, via the Blackbird/Visual InterDev side of the product, and in that time have been a dev on various features (mostly IDE-related), dev lead of deployment, dev lead of compiler, and box lead before my current responsibilities as DM.  I’ve been somewhat remiss in not having posted to this blog before, being content to hang out on the VB IDE Forum, but it’s my intent to correct that situation and to throw out a few thoughts here and there.  Many of our clever folk here have been writing about some of the exciting new features upcoming for VB9 (such as Scott’s excellent series on extension methods), so rather than stealing a measure from them, I’m going to instead focus on things that you can do with Visual Basic right now.

Going global

Many of the questions I’ve been asked by customers over the years revolve around the concept of globalization – making your code work in different languages – so I’m going to write about this today.  I’m actually fairly passionate about this subject – if there were such a thing as a Unicode groupie, that might well describe me.  A lot of this enthusiasm comes from years of development on code that needs to go international.  In Ye Olde Days, this was frankly very hard to do, but fortunately, using Visual Basic & .NET, making your code ready for a global audience is far simpler than it used to be.

Consider the following (admittedly contrived) code:

    Sub test(ByVal value As Double)
        Dim s As String = value.ToString
        ' InStr is 1-based, so Substring(positionOfDot) starts just past the dot.
        Dim positionOfDot = InStr(s, ".")
        If positionOfDot > 0 Then
            Console.WriteLine("The number " & s.Substring(positionOfDot) _
                & " is the decimal portion.")
        Else
            Console.WriteLine("There is no decimal portion.")
        End If
    End Sub

 

This code will work fine in English (although there are certainly better ways to get the decimal portion of a number -- please don't do it this way for real :-) ).  However, there are several problems with the code if you want to take it global, problems which I’m sure many folks will have spotted:

(1)    Many cultures don’t use “.” as a decimal separator – many cultures use (for example) a comma.

(2)    The output strings are hard-coded to English – you’d have to change the code for each language.

(3)    The first output string assumes an English grammar ordering – some languages might need the argument at the end or the beginning, not in the middle.

So, some of you might be thinking, “What’s so wrong about 2 and 3?  I’m going to have to translate the strings anyway if I go global, right?”  Sure, but you don’t want to have to touch your code files unless necessary, and you sure don’t want to have different code paths for different languages – it will make your supportability nightmarish for any given version of your application.

Fortunately, there are solutions:

(1)    When dealing with cultural information, rely on the System.Globalization namespace.  For example, instead of hard-coding the “.”, you can instead call the following code:

    Imports System.Globalization

    (…)

        Dim c As String = _
            CultureInfo.CurrentCulture.NumberFormat.NumberDecimalSeparator

to get the decimal separator for the current culture – in the contrived example above, you’d then use the result as your search string instead of “.”.  Use CurrentCulture and NumberDecimalSeparator here, since those are what ToString uses when it formats a Double; CurrentUICulture only decides which language’s resources get loaded, and CurrencyDecimalSeparator applies to currency formatting.  There are many, many pieces of information you can pull out of CurrentCulture and CurrentUICulture, and you can even determine information about other cultures as well from System.Globalization.  I’ve attached a ZIP file to this post which gives examples of the cool things you can do with that namespace.  (The example also leverages the Encoding class in the System.Text namespace to show how you can translate your strings from UTF-8 to Unicode to DBCS, something you might need to do in global code.)

(2)    Move your strings to your project’s resources.  This is pretty darn simple in VS2005.  Simply right-click the project and choose “Properties,” and then navigate to the “Resources” page.  Add your strings to the resource table (each string will need a name as well as its text value), and then in code, simply type “My.Resources.MyResourceName” to use the string.  Later, when you need to change the string to a different language, you simply update your resources rather than your code.  The resource manager also can hold icons, images, and even whole text files that you might be using in your UI.  (If you’re not using VS2005, then check out my attached example to see how to access the resources without “My” -- I wrote that code before the “My” functionality was available, and simply threw my resources into a handy resx file.)

(3)    Use replacement characters in strings rather than concatenating pieces of strings together when writing full grammatical sentences.  Trust me; this will make your life a lot easier when localizing your code.   In the resources, make your string resource (let’s name it “MyOutput”) something like:

 

    “The number {0} is the decimal portion.”

And then the call becomes:

    Console.WriteLine(String.Format(My.Resources.MyOutput, _
        s.Substring(positionOfDot)))

 

Now, your localization team won’t need to touch your code to change the grammar of the sentence – they can just move around the “{0}” in the resource string when translating.
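As a rough sketch of how the three fixes fit together, here is the contrived example reworked as a small console module.  It is only an illustration under a few assumptions: the two constants are hypothetical stand-ins for string resources (in a real project they would be My.Resources entries), the function name is made up, and the culture is passed in explicitly so the behavior is easy to try out.  It searches for NumberDecimalSeparator because that is the separator Double.ToString emits.

```vb
Imports System
Imports System.Globalization

Module GlobalReadyExample

    ' Hypothetical stand-ins for string resources such as My.Resources.MyOutput;
    ' in a real project these would live in Resources.resx, not in code.
    Private Const MyOutput As String = "The number {0} is the decimal portion."
    Private Const NoDecimalOutput As String = "There is no decimal portion."

    Function DescribeDecimalPortion(ByVal value As Double, _
                                    ByVal culture As CultureInfo) As String
        ' Format with the same culture we take the separator from,
        ' so the string we search for always matches the formatted text.
        Dim s As String = value.ToString(culture)
        Dim separator As String = culture.NumberFormat.NumberDecimalSeparator
        Dim position As Integer = s.IndexOf(separator, StringComparison.Ordinal)

        If position >= 0 Then
            ' The placeholder lets a translator move the argument freely.
            Return String.Format(culture, MyOutput, _
                s.Substring(position + separator.Length))
        End If
        Return NoDecimalOutput
    End Function

    Sub Main()
        Console.WriteLine(DescribeDecimalPortion(3.25, CultureInfo.CurrentCulture))
        Console.WriteLine(DescribeDecimalPortion(3.25, New CultureInfo("de-DE")))
        Console.WriteLine(DescribeDecimalPortion(4.0, CultureInfo.InvariantCulture))
    End Sub

End Module
```

With the de-DE culture the value is formatted as “3,25”, so the second line should read “The number 25 is the decimal portion.”, while the last call finds no separator at all.  Very large or very small values that format in scientific notation are not handled by this sketch.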

The resources on your forms can also be automatically managed from the resource manager, making it similarly easy to update the default strings in your controls.  Simply change the “Localizable” property of the form to “True,” and your form’s strings will no longer be hard-coded in InitializeComponent, but will now be stored instead in {form’s name}.resx (you may have to choose “Show All Files” to see that file).  Those resources are kept separate from the code resources (which are stored in Resources.resx), so you would need to translate both when going global.

The downside of everything I’ve just mentioned is that you still need to rebuild your product for each language after translating the resources – the code hasn’t changed, but the resources have, so compilation is necessary.  To get around this and to avoid all possibility of code binaries being different from language to language, you can create separate assemblies for your resources and refer to them from your main project, leveraging the assembly’s ability to identify its culture.  I’m not going to go into it since there’s information already about this on the net if you’re interested – here’s one, for example.

One big win in using VB from a global point of view is that everything is Unicode – you don’t have to do anything special to support the wide variety of characters out there.  Even the editor supports Unicode, so you can have a wide variety of (for example) variable names.  However, I should point out that we do not support Unicode characters greater than &H00FFFF, the so-called “surrogate character” combinations, as variable names.  (If you don’t know what a “surrogate character” is… well, that’s a topic for another time.)
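For readers who want a quick look at what sits above &H00FFFF, the following is a minimal sketch rather than a full treatment.  It assumes nothing beyond the standard Char helpers in .NET 2.0, and the code point U+1D11E (a musical G clef) is just an arbitrary choice.  It shows that such a character occupies two Char values in a VB string.

```vb
Imports System

Module SurrogateExample

    Sub Main()
        ' U+1D11E (MUSICAL SYMBOL G CLEF) lies above &HFFFF,
        ' so it cannot fit in a single 16-bit Char.
        Dim clef As String = Char.ConvertFromUtf32(&H1D11E)

        Console.WriteLine(clef.Length)                       ' 2
        Console.WriteLine(Char.IsHighSurrogate(clef(0)))     ' True
        Console.WriteLine(Char.IsLowSurrogate(clef(1)))      ' True

        ' Reassemble the pair into the original code point.
        Console.WriteLine(Char.ConvertToUtf32(clef, 0).ToString("X"))   ' 1D11E
    End Sub

End Module
```

The practical consequence is that String.Length counts Char values, not user-visible characters, which matters if you slice strings that may contain such pairs.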

Hopefully you will find this information on globalization useful – please feel free to comment or ask questions, and I’ll do my best to follow up.  The book "Developing International Software" from Microsoft Press is also a very good reference on how we deal with globalization issues here.  The fully comprehensive "Unicode Standard 5.0" volume from The Unicode Consortium is also a handy thing to have around for specific questions on Unicode usage (and it also makes a great booster seat for kids when they're sitting at a high table) -- if it's outside your budget, you can read it online at www.unicode.org.

Going forward, my plan for my blog entries is to walk through a fairly complex card game I wrote last year, in order to point out some of the other very useful functionality in Visual Basic 2005 that you might not know as much about.

Until next time,

--Matt--*

Attachment: VB-Globalization-usage.zip
  • Thanks for that - I'm currently moving several of my apps from VB6 to VB2005, so this kind of info is very useful.

    I do have one question, though. In my VB6 apps I'm using an external translation file in plain text. Each line in the file is English followed by a separator followed by the foreign language equivalent, with a file per language. When you run one of my apps, it scans the language folder and displays the available translations in a drop-down menu, and it also remembers the previous language.

    The thing that bothers me with resource files is this ... how can users (especially non-programmer users) modify and use their own translation files for a particular program? I have users who downloaded one of my apps, then created a translation file so their non-English-speaking relative could also use the program. No recompilation is necessary - in fact, they don't have to contact me at all, although many send me the translation files they've put together.

    I'm planning to keep using the exact same system, to be honest. The fact all controls have a 'text' property instead of 'title' or 'text' or 'caption' just makes the runtime translation of every form easier in VB2005. (On loading a form, I call the translation routine with the instance of the form. The routine does a string replace on the text property for each control before the form is displayed.)

    For large companies with departments dedicated to official translations, building the languages into resource files is ideal. For a smaller coder with no translation teams it's easier to rely on the generosity of users. Example: one of my apps has over thirty translation files, and I usually get a new one every week.

  • Hi, Simon,

     So, before I answer your question, I'll just provide the obligatory "here's where we're coming from" statement.  Microsoft uses resource assemblies not just because it's easier to update resources that way, but because we have a need to have a consistent look and feel, as well as a legal and ethical obligation to make sure that nothing offensive gets into the product.  So, in that situation (or a situation where you can contract out the language work), all you need to do is make sure that the new resource assembly has the same signature as the other resource assemblies, except for the cultural flag.  The link I gave above is a good start down that trail.

    However, I understand that your case is a bit different -- you don't even want users to recompile resources, you don't want to ship a separate assembly, and so on -- you want it to be brain-dead easy for them. This does involve a certain amount of trust that your product won't look silly if a third party translates it, and if you have that sort of trust, that's great. (It sounds like you have a great customer relationship -- that's super!)  What *I* would do in such a case is pretty much what you're suggesting above -- I would create an XML file which contained the resource names and values, read it into a nice list that you can index, and set the controls as appropriate.  Making it XML allows customers to use various XML editors to make their translations easier.

    I'm sure you're well aware of the downside, but for everyone else I'll mention it:  since your third party has no access to the actual resources, they can't tell if everything will line up or overrun on the dialogs when they change the text except by trial-and-error.  This can be made easier for them if each label is made a little bigger than it needs to be for English (rule of thumb is about 33%).  Icons and images (which might also need to change from culture to culture) might also be problematic unless you ship the files as stand-alone.

    --Matt--*

  • Thanks for that, and yes - I understand why you need signed language files. The press would give you a roasting if you let something slip in unintentionally, whereas they've never heard of me.

    I've had to deal with the UI problems you mention, often by increasing the button or label size just a fraction. Most users are happy to play with contractions until it fits, though. (My conversion routine also handles the tooltips, so they can put a fuller explanation there.)

    I'm looking at XML for data storage in my current project, which is a conversion of my novel-writing software. However, for language files I think it's overkill.  Right now anyone can create a language file for my apps, regardless of their expertise. They just duplicate Template.txt, rename the file, then start changing strings in wordpad. If the file was stuffed with XML tags I can see things going wrong very quickly - assuming they even attempted the translation in the first place.

    I do have a great relationship with my users, yes. Mind you, I give most of my software away so they're already in my debt ;-)  Many are just happy to see that adding their own language is quite easy.

    One gotcha I've found - under VB2005 I've had to use UTF7 to read in the language files, since many of them are for extended languages.

    I'm enjoying VB2005, by the way. The earlier incarnations of .net were just too different from VB6 to waste my time with, and things like no tags on controls, no pause-and-edit, were deal-killers for me. This one's got it right, and so the massive conversion of all my code has begun.

    (The one I'm least looking forward to? My stock charting program, which is very heavy on the graphics & picturebox methods - and has 150,000 lines of code. I'm leaving that one 'til last. Way last.)

  • Glad to hear you're enjoying VB2005!  I like it quite a lot myself (though I'm admittedly partisan), and do all of my hobby programming in it.  (Ironically, I've been getting used to the new features in VB9, particularly the enhanced intellisense, and I find that I miss them when I switch back to VB2005.)  It really helps to get that sort of feedback.  We deliberately focused on the VB6 customer for VB2005, and it's good to hear whether we succeeded or not.

    When you said UTF-7, did you mean UTF-8?  You could certainly use UTF-7, since the System.Text.Encoding class supports it, but UTF-8 is just as useful and is certainly more human-readable since there are no interdependencies on adjacent characters in the encoding algorithm.  You could just use Unicode as well and save a bit of time on performance, since VB supports it natively and even Notepad gives you the opportunity to work with Unicode files -- even if you don't have the appropriate languages installed on your machines, everything still works -- the files won't be corrupted or anything, though you may see little boxes where your font doesn't cover the characters.  Your customers will certainly have those characters already installed by default (else how would they translate?), and that's the important thing.  (I personally install the languages & supporting fonts just so that I can see something more interesting looking than little boxes when looking at non-Western text -- you can do this from the "Regional and Language Options" applet.)

    Good luck with the 150,000-line app. :-)

    --Matt--*
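
The two suggestions from this thread (an XML translation file read into an indexable list, stored as UTF-8) can be sketched roughly as follows.  This is only one possible shape, not the approach either poster actually shipped: the element and attribute names, the function names, and the resource names such as "SizeLabel" are all hypothetical, and the sketch writes its own temporary file so that it runs stand-alone.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text
Imports System.Xml

Module TranslationFileExample

    ' Hypothetical layout:
    '   <translations><string name="..." value="..."/></translations>
    Function LoadTranslations(ByVal fileName As String) As Dictionary(Of String, String)
        Dim doc As New XmlDocument()
        doc.Load(fileName)   ' picks up the encoding from the file itself
        Dim table As New Dictionary(Of String, String)()
        For Each node As XmlNode In doc.SelectNodes("/translations/string")
            ' Assumes every entry carries both attributes.
            table(node.Attributes("name").Value) = node.Attributes("value").Value
        Next
        Return table
    End Function

    ' Falls back to the built-in text when the translator left an entry out.
    Function Translate(ByVal table As Dictionary(Of String, String), _
                       ByVal name As String, ByVal fallback As String) As String
        Dim text As String = Nothing
        If table.TryGetValue(name, text) Then Return text
        Return fallback
    End Function

    Sub Main()
        ' German for "size", built with ChrW so this source file stays plain ASCII.
        Dim size As String = "Gr" & ChrW(&HF6) & ChrW(&HDF) & "e"
        Dim xml As String = "<?xml version=""1.0"" encoding=""utf-8""?>" & _
            "<translations>" & _
            "<string name=""SizeLabel"" value=""" & size & """/>" & _
            "</translations>"

        Dim fileName As String = Path.GetTempFileName()
        Try
            ' Name the encoding explicitly when writing the translation file.
            File.WriteAllText(fileName, xml, Encoding.UTF8)
            Dim table As Dictionary(Of String, String) = LoadTranslations(fileName)
            Console.WriteLine(Translate(table, "SizeLabel", "Size") = size)   ' True
            Console.WriteLine(Translate(table, "HelpButton", "Help"))         ' Help
        Finally
            File.Delete(fileName)
        End Try
    End Sub

End Module
```

In a Windows Forms application the Translate call would be made once per control when the form loads, setting each control's Text property from the table.  A plain-text file like the one described in the comments could feed the same dictionary; only the loading function would change.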
