Internationalizing Applications

A recently posted article on CodeProject had this to say about Visual Studio’s support for multilingual applications:

Conclusion: Visual Studio .NET does not offer any multilanguage support which is worth thinking about it.
You will only waste your time with Microsoft’s approach!

Check the whole article/project out here.

I can’t verify whether the author has the credentials to justify such a claim, but from the sound of it, he did at least start down the MS-sanctioned path, and I have to agree with him.

Way back in the dark ages, pre-internet, around 1990 or so, I was managing development of a CRM system (customer relationship management software).

We’d picked up some resellers overseas and needed to get the product internationalized. This was at the very beginning of Windows (Win 3.1 and WFW, if you remember those!). Our internationalization efforts had to apply equally well to our DOS app and our Windows version.

Several of the developers and I went to a conference, the precursor to VBits (I don’t remember exactly what it was called back then), and I got a chance to talk with one of the MS internationalization engineers directly.

I’d played with the whole “separate resources for each language” technique and found it workable, but so labor-intensive that I couldn’t imagine anyone but the largest shops actually doing it that way.

The MS guy confirmed that suspicion. He said (and I’m paraphrasing), “The core team finishes up the project and ships it, and then the whole project base is ‘thrown over the wall’ and each internationalization team takes over, internationalizes the project into their respective language, re-tests, etc.”

Ouch.

Now, I’m sure times have changed at MS, but if the comment from Elmue in the article on CodeProject is any indication, they haven’t changed that much.

The internationalization functions I helped build way back then were based on three simple concepts:

  1. The original English strings had to remain in the code
  2. Those strings had to actually be used by the code
  3. Those strings had to be easily searchable (say, by a GREP or similar utility) for extraction and translation

Why those first two points?

Because if the English strings remained in the code, the code remained relatively easy for the programmers to debug and maintain.

Further, if the strings were actually used by the code, then during dev and alpha/beta testing we wouldn’t have any disconnect issues with resources not matching what was needed in the code itself. This is akin to the age-old concept of eating your own dog food. The idea of socking strings away in resources and just having a comment in the code as to what the string contained scared the hell out of me.

It also meant that if the translatable resources were lost for whatever reason, the program would still be able to run based on the “compiled in” English strings. Not ideal, but better than simply throwing errors or displaying blanks.

We accomplished all these goals by wrapping every translatable string (including those in dialogs, etc.) in a function call. Something like:

MyString$ = TX$("This is the english version", stringcode, groupcode)

Where stringcode and groupcode were optional arguments that indicated, basically, the resource ID of the string and an arbitrary group ID of the string.

Originally, when you were writing or working on code, you’d never even bother entering the stringcode or groupcode args, so your call would look like:

MyString$ = TX$("This is the english version")
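To make the lookup-with-fallback idea concrete, here is a minimal sketch in Python rather than the original BASIC. It is not the code we actually shipped: the tx name, the CATALOG dictionary, and the sample German entry are all hypothetical, and a real version would load the catalog from a translation file for the active language.

```python
# Hypothetical in-memory catalog: (groupcode, stringcode) -> translated text.
# A real system would load this from a per-language translation file.
CATALOG = {
    (1, 100): "Dies ist die deutsche Version",
}


def tx(english, stringcode=None, groupcode=None, catalog=CATALOG):
    """Return the translated string, or the English text as a fallback."""
    if stringcode is None:
        # Call not yet processed by the scanner: use the compiled-in English.
        return english
    # A missing or lost translation also falls back to the English string.
    return catalog.get((groupcode, stringcode), english)


print(tx("This is the english version"))          # English (no codes yet)
print(tx("This is the english version", 100, 1))  # German (found in catalog)
print(tx("This is the english version", 999, 1))  # English (ID not in catalog)
```

The point of the design is that the English text is always a live argument, so the worst case for a missing resource is an untranslated string, never a blank.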

But because it was trivially easy to scan for TX$(), when our scanner was run on the code, it could:

  1. Extract the strings
  2. Give them IDs
  3. Rewrite the source code with the appropriate string and group codes as necessary
  4. Generate a “translator’s file” that contained the string, its IDs, and potentially developer comments indicating the context and intention of the string (to assist translators with the translation)
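The scanner steps above can be sketched roughly as follows, again in Python and again as an illustration, not the original tool. The regular expression, the scan and translator_file names, the tab-separated output format, and the rule for picking new IDs (one past the highest existing code) are all assumptions, and the sketch assumes string literals contain no embedded double quotes. Developer comments are left out.

```python
import re

# Matches TX$("...") with optional numeric stringcode and groupcode arguments.
# Assumption: string literals never contain an embedded double quote.
TX_CALL = re.compile(r'TX\$\("([^"]*)"(?:\s*,\s*(\d+))?(?:\s*,\s*(\d+))?\)')


def scan(source, group=1):
    """Return (rewritten source, entries) where entries are (code, group, text)."""
    # Start numbering above any string code already present in the source.
    existing = [int(m.group(2)) for m in TX_CALL.finditer(source) if m.group(2)]
    next_id = max(existing, default=0) + 1
    entries = []

    def rewrite(match):
        nonlocal next_id
        text, code, grp = match.group(1), match.group(2), match.group(3)
        if code is None:
            # Uncoded call: assign a fresh ID.
            code = next_id
            next_id += 1
        else:
            code = int(code)
        grp = int(grp) if grp is not None else group
        entry = (code, grp, text)
        if entry not in entries:
            entries.append(entry)
        # Write the call back with both codes filled in.
        return f'TX$("{text}", {code}, {grp})'

    return TX_CALL.sub(rewrite, source), entries


def translator_file(entries):
    """Format entries as tab-separated lines: stringcode, groupcode, English text."""
    return "\n".join(f"{code}\t{grp}\t{text}" for code, grp, text in entries)


source = 'A$ = TX$("Hello")\nB$ = TX$("Bye", 7, 2)\n'
rewritten, entries = scan(source)
print(rewritten)
print(translator_file(entries))
```

Because the rewritten calls still carry the English text, running the scanner again on already-coded source leaves it unchanged.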

Nowadays, with OO, extension methods, reflection, and all the other .NET goodies, it seems like this whole process could be vastly more efficient than even what we did back then.

But in the end, we translated the product into 8 or more languages in just a few months with this technique, using no additional developers and a few native-speaker volunteer translators. It didn’t require any code rewrites, and the code was just as easy to debug as if you’d left the strings in place and done nothing about translation at all.

Now, granted, there’s a lot more to internationalizing an app than translating text. You have to worry about icons, input methods (when applicable), date, time, and number formats, and even subtle things like color choices. But this technique was a huge timesaver for an otherwise arduous task.

Closing comment: Why TX$(), you might ask <g>? Basically, it was because we didn’t want a huge function name taking up tons of space in the code when it would be used as often as it was. That’s all. As I recall, it’s about the only two-letter function I’ve ever authored in code. I was never a big fan of the BASICA two-letter name restriction!

Any translation war stories out there? How have you translated applications?
