Last update January 17, 2007

Idea Discussion /
Internationalization



Table of contents of this page
Internationalization vs. Localization   
Compile-time   
Runtime   
Proposal for D   
gettext   
Common Locale Data Repository   
More links   

Internationalization vs. Localization    

  • i18n: internationalization
  • l10n: localization
Also, very roughly said, when it comes to multi-lingual messages, internationalization is usually taken care of by programmers, and localization is usually taken care of by translators.
Source: http://www.gnu.org/software/gettext/manual/html_mono/gettext.html#SEC3

This page deals with both internationalization and localization. The two terms may occasionally be used interchangeably here, since the distinction between them was new to the editor.

There seem to be two schools of thought on internationalization/localization:

  1. Compile-time: localized text is selected using version blocks.
  2. Runtime: localized text is loaded through some sort of plugin architecture or language resource files.
Since D is capable of either method, both parties can be happy.

Compile-time    

Version identifiers should have a prefix (such as "locale_" or "loc_") so that most readers of the code can see what is happening. Not everyone would intuitively know that ky_KG is a locale, but locale_ky_KG is clear. (By the way, "ky" is a language code, here Kyrgyz; "KG" is a country code, here Kyrgyzstan. The combination "ky_KG" is a locale.)

version (locale_en_GB) {
    // ...
} else version (locale_en_US) {
    // ...
} else version (locale_en) {
    // ...
}

This is just a start...
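As a minimal sketch of the compile-time approach, the block below picks a string constant according to which version identifier is set. The identifiers locale_en_GB and locale_en_US follow the prefix convention suggested above; the names pickPrompt and promptFor are hypothetical and exist only for illustration.

```d
// Select a locale at build time, for example:
//     dmd -version=locale_en_GB app.d
// The locale_* identifiers are a naming convention, not built-in D versions.
version (locale_en_GB) {
    enum string pickPrompt = "Pick a colour";
} else version (locale_en_US) {
    enum string pickPrompt = "Pick a color";
} else {
    // Fallback used when no locale_* identifier is passed to the compiler.
    enum string pickPrompt = "Pick a color";
}

/// Builds a prompt for the given item using the compile-time selected text.
/// (Hypothetical helper, shown only to demonstrate use of the constant.)
string promptFor(string item) {
    return pickPrompt ~ " for " ~ item;
}
```

Because the choice is made by the compiler, only one set of strings ends up in the binary; switching locale means rebuilding with a different -version flag.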

Runtime    

Proposal for D    

We need a class Locale (or possibly a struct Locale) containing those ISO codes. Java might use strings internally, but there are a whole bunch of reasons why that's not such a good idea. One is that "fr", "fra" and "fre" are all, equivalently, the language code for French, and should all compare as equal. Another is case and punctuation ("en-us" == "en-US" == "en_us" == "en_US", etc.). I'd vote for putting enums inside the class (enum Language and enum Country; the variant field will still need to be a string). I imagine that the gettext implementation will need to use our yet-to-be-invented Locale class, and the Unicode lib certainly will (and soon).
Source: NG:digitalmars.D/6502
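One possible shape for such a type is sketched below. This is an illustration under stated assumptions, not an existing Phobos API: the names Locale, Language, Country, parseLanguage and parseCountry are hypothetical, the enums list only a handful of codes, and the enums are kept at module level for brevity rather than nested inside the type. Parsing normalizes case and separators and maps the two- and three-letter language codes onto one enum member, so equivalent spellings compare as equal.

```d
import std.array : replace, split;
import std.uni : toLower, toUpper;

// Tiny illustrative subsets; a real implementation would cover the full ISO lists.
enum Language { unknown, en, fr, ky }
enum Country { unknown, GB, US, FR, KG }

/// Maps ISO 639 two- and three-letter codes onto a single enum member.
Language parseLanguage(string code) {
    switch (code.toLower) {
        case "en", "eng":        return Language.en;
        case "fr", "fra", "fre": return Language.fr;
        case "ky", "kir":        return Language.ky;
        default:                 return Language.unknown;
    }
}

/// Maps an ISO 3166 two-letter country code, ignoring case.
Country parseCountry(string code) {
    switch (code.toUpper) {
        case "GB": return Country.GB;
        case "US": return Country.US;
        case "FR": return Country.FR;
        case "KG": return Country.KG;
        default:   return Country.unknown;
    }
}

struct Locale {
    Language language;
    Country country;
    string variant; // free-form, so it stays a string

    /// Accepts "en_US", "en-us", "fr", "fre", and similar spellings.
    static Locale parse(string tag) {
        auto parts = tag.replace("-", "_").split("_");
        Locale loc;
        if (parts.length > 0) loc.language = parseLanguage(parts[0]);
        if (parts.length > 1) loc.country = parseCountry(parts[1]);
        if (parts.length > 2) loc.variant = parts[2];
        return loc;
    }
}
```

With this sketch, Locale.parse("en-us") == Locale.parse("en_US") holds because both reduce to the same pair of enum values, which is the property that plain string comparison does not give.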

gettext    

What about just porting GNU gettext to Phobos? This way you have a semi-standard way of localizing programs (which a lot of translators know about), and a set of pre-written tools (even nice GUI ones).
Looking at the Python implementation, it should not be difficult; it is only 493 lines (gettext.py). I'll see if I can take enough time to do it in the next weeks.
Source: NG:digitalmars.D/6467

GNU gettext documentation: http://www.gnu.org/software/gettext/manual/html_chapter/gettext_toc.html
The Python implementation (the one of about 500 lines) doesn't use any external C lib at all; it's 100% pure Python.
Source: NG:digitalmars.D/6494
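To give a feel for what a gettext-style lookup could look like in D, here is a minimal sketch. It is not a port of GNU gettext: the Catalog type and its translate method are hypothetical, and the catalog is a plain in-memory associative array rather than something loaded from a .mo file. It only demonstrates the core convention that the original message (the msgid) is the lookup key and is returned unchanged when no translation is available.

```d
/// Hypothetical in-memory message catalog: msgid -> translated string.
struct Catalog {
    string[string] messages;

    /// Returns the translation of msgid, or msgid itself if the catalog
    /// has no entry for it, so untranslated text still shows up readable.
    string translate(string msgid) {
        if (auto translated = msgid in messages)
            return *translated;
        return msgid;
    }
}
```

A call such as Catalog(["Hello": "Bonjour"]).translate("Hello") yields the French string, while an unknown msgid is passed through as-is. A real port would also need catalog loading and plural handling, which are left out here.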

Common Locale Data Repository    

D should define locales exclusively in terms of ISO language and country codes, plus variant extensions. Unicode defines locales that way, and the etc.unicode library will have no choice but to use the ISO codes. Collation and similar locale-dependent behavior will need to rely on data from the CLDR (Common Locale Data Repository, see http://www.unicode.org/cldr/).
Source: NG:digitalmars.D/6493

More links    

