Boost.Locale
Message formatting is probably the most important part of localization: making your application speak the user's language.
Boost.Locale uses the GNU Gettext localization model. We recommend you read the general documentation of GNU Gettext, as it is outside the scope of this document.
The model is as follows:

- First, the application foo is prepared for localization by calling the translate function for each message used in the user interface. For example,
      cout << "Hello World" << endl;
  is changed to
      cout << translate("Hello World") << endl;
- Then all messages are extracted from the source code, and a special foo.po file is generated that contains all of the original English strings.
      ...
      msgid "Hello World"
      msgstr ""
      ...
- The foo.po file is translated for the supported locales, for example de.po, ar.po, en_CA.po, and he.po.
      ...
      msgid "Hello World"
      msgstr "שלום עולם"
- The translated files are compiled to the binary mo format and stored in the following file structure:
      de
      de/LC_MESSAGES
      de/LC_MESSAGES/foo.mo
      en_CA/
      en_CA/LC_MESSAGES
      en_CA/LC_MESSAGES/foo.mo
      ...
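- When the application starts, it loads the required dictionaries. Then, when the translate function is called and the message is written to an output stream, a dictionary lookup is performed and the localized message is written out instead.

All the dictionaries are loaded by the generator class. Using localized strings in the application requires specifying the following parameters:

- The search path of the dictionaries
- The application domain (or name)

This is done by calling the following member functions of the generator class:

- add_messages_path: adds the root path to the dictionaries. For example, if the dictionary is located at /usr/share/locale/ar/LC_MESSAGES/foo.mo, then the path should be /usr/share/locale.
- add_messages_domain: adds the domain (or name) of the application. In the above case it would be "foo".

This is an example of our first fully localized program: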
    #include <boost/locale.hpp>
    #include <iostream>

    using namespace std;
    using namespace boost::locale;

    int main()
    {
        generator gen;

        // Specify location of dictionaries
        gen.add_messages_path(".");
        gen.add_messages_domain("hello");

        // Generate locales and imbue them to iostream
        locale::global(gen(""));
        cout.imbue(locale());

        // Display a message using current system locale
        cout << translate("Hello World") << endl;
    }
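To make the relationship between the messages path, the locale name, and the domain more concrete, the following sketch builds a catalog location by hand. The catalog_path function is a hypothetical helper written only for this illustration and is not part of Boost.Locale; the library performs the real search itself and may also try other variants of the locale name.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of Boost.Locale: builds the location of a
// compiled catalog from the root path, the locale name and the domain,
// following the <root>/<locale>/LC_MESSAGES/<domain>.mo layout.
std::string catalog_path(const std::string& root,
                         const std::string& locale_name,
                         const std::string& domain)
{
    assert(!locale_name.empty() && !domain.empty());
    return root + "/" + locale_name + "/LC_MESSAGES/" + domain + ".mo";
}
```

With the root /usr/share/locale, the locale name ar and the domain foo, this yields the path used in the example above.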
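There are two ways to translate messages:

- Using the boost::locale::translate() family of functions. These functions create a special proxy object, basic_message, that can be converted to a string according to a given locale or written to a std::ostream, formatting the message in the std::ostream's locale. This is very convenient for working with a std::ostream object and for postponing message translation.
- Using the GNU Gettext-like family of functions. These functions translate a message directly: they receive the original message and return a std::basic_string in the given locale.

The basic function that allows us to translate a message is the boost::locale::translate() family of functions.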
These functions use a character type CharType as template parameter and receive either CharType const * or std::basic_string<CharType> as input.
These functions receive an original message and return a special proxy object - basic_message<CharType>. This object holds all the required information for the message formatting.
When this object is written to an output ostream, it performs a dictionary lookup of the message according to the locale imbued in the iostream.
If the message is found in the dictionary it is written to the output stream, otherwise the original string is written to the stream.
For example:
    // Translate a simple message "Hello World!"
    std::cout << boost::locale::translate("Hello World!") << std::endl;
This allows the program to postpone translation of the message until the translation is actually needed, even to different locale targets.
    // Several output streams that we write a message to:
    // English, Japanese, Hebrew etc.
    // Each one of them has an installed std::locale object that represents
    // its specific locale
    std::ofstream en,ja,he,de,ar;

    // Send a single message to multiple streams
    void send_to_all(message const &msg)
    {
        // in each of the cases below
        // the message is translated to a different
        // language
        en << msg;
        ja << msg;
        he << msg;
        de << msg;
        ar << msg;
    }

    int main()
    {
        ...
        send_to_all(translate("Hello World"));
    }
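The idea of postponing the translation until output time, with a fallback to the original string, can be illustrated without Boost.Locale. The sketch below is a toy model only: message_sketch, dictionary, and render are hypothetical names, and a std::map stands in for a real message catalog.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for a message catalog: original string -> translation.
using dictionary = std::map<std::string, std::string>;

// Toy stand-in for the proxy object: it only remembers the original text.
struct message_sketch {
    std::string id;
};

// Translation happens here, at "output" time, against whichever dictionary
// belongs to the target. Unknown messages fall back to the original string.
std::string render(const message_sketch& msg, const dictionary& dict)
{
    assert(!msg.id.empty());
    auto it = dict.find(msg.id);
    return it != dict.end() ? it->second : msg.id;
}
```

The same message_sketch object can be rendered against several dictionaries, which mirrors how one message object is written to streams with different locales.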
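A message object can also be converted to a string. It is implicitly converted to the appropriate std::basic_string using the global locale:

    std::wstring msg = translate(L"Do you want to open the file?");

It can be explicitly converted to a string for a specific locale using the str() member function:

    std::locale ru_RU = ... ;
    std::string msg = translate("Do you want to open the file?").str(ru_RU);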
GNU Gettext catalogs have simple, robust, and yet powerful plural forms support. We recommend reading the original GNU documentation on plural forms.
Let's try to solve a simple problem, displaying a message to the user:
    if(files == 1)
        cout << translate("You have 1 file in the directory") << endl;
    else
        cout << format(translate("You have {1} files in the directory")) % files << endl;
This very simple task becomes quite complicated when we deal with languages other than English. Many languages have more than two plural forms. For example, in Hebrew there are special forms for single, double, plural, and plural above 10. They can't be distinguished by the simple rule "is n 1 or not".
The correct solution is to let the translator choose the plural form. For this purpose the translate function can receive two additional parameters, the English plural form and a number: translate(single,plural,count)
For example:
    cout << format(translate(
                "You have {1} file in the directory",
                "You have {1} files in the directory",
                files)) % files
         << endl;
A special entry in the dictionary specifies the rule for choosing the correct plural form in the target language. For example, the Slavic language family has 3 plural forms, which can be chosen using the following expression:
plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
This expression is stored in the message catalog itself and is evaluated during translation to supply the correct form.
So the code above would display 3 different forms in a Russian locale for the values 1, 3, and 5:

    У вас есть 1 файл в каталоге
    У вас есть 3 файла в каталоге
    У вас есть 5 файлов в каталоге
For Japanese, which does not have plural forms at all, it would display the same message for any numeric value.
For more detailed information please refer to GNU Gettext: 11.2.6 Additional functions for plural forms
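To see how the Slavic rule quoted above selects a form, it can be transcribed into an ordinary function. This is only an illustrative sketch: slavic_plural_index is a hypothetical name, and at run time the rule is read from the catalog rather than hard-coded.

```cpp
#include <cassert>

// Index of the plural form for n, transcribed from the catalog rule:
// plural=n%10==1 && n%100!=11 ? 0 :
//        n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;
int slavic_plural_index(long n)
{
    assert(n >= 0);
    if (n % 10 == 1 && n % 100 != 11)
        return 0;
    if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20))
        return 1;
    return 2;
}
```

For the values 1, 3, and 5 the function returns 0, 1, and 2, matching the three Russian forms shown above; 11 falls into the last form, while 21 selects the first one again.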
In many cases it is not sufficient to provide only the original English string to get the correct translation. You sometimes need to provide some context information. In German, for example, a button labeled "open" is translated to "öffnen" in the context of "opening a file", or to "aufbauen" in the context of opening an internet connection.
In these cases you must add some context information to the original string, by adding a comment.
button->setLabel(translate("File","open"));
The context information is provided as the first parameter to the translate function in both singular and plural forms. The translator would see this context information and would be able to translate the "open" string correctly.
For example, this is how the po file would look:

    msgctxt "File"
    msgid "open"
    msgstr "öffnen"

    msgctxt "Internet Connection"
    msgid "open"
    msgstr "aufbauen"
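The effect of the context can be modeled as a lookup keyed by the pair of context and original message, so that the same English string maps to different translations. The sketch below is a toy model with hypothetical names (context_dictionary, translate_in_context); it does not show how Boost.Locale stores catalogs internally.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Toy catalog keyed by (context, original message).
using context_key = std::pair<std::string, std::string>;
using context_dictionary = std::map<context_key, std::string>;

// Returns the translation for the given context, or the original message
// when the (context, message) pair is not in the dictionary.
std::string translate_in_context(const context_dictionary& dict,
                                 const std::string& context,
                                 const std::string& id)
{
    assert(!id.empty());
    auto it = dict.find(context_key(context, id));
    return it != dict.end() ? it->second : id;
}
```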
In some cases it is useful to work with multiple message domains.
For example, if an application consists of several independent modules, it may have several domains - a separate domain for each module.
For example, when developing a FooBar office suite, each of its modules could have a domain of its own.
There are three ways to use non-default domains:

- When working with iostream, you can use the parameterized manipulator as::domain(std::string const &), which allows switching domains in a stream:
      cout << as::domain("foo") << translate("Hello") << as::domain("bar") << translate("Hello");
      // First translation is taken from dictionary foo and the other from dictionary bar
- You can specify the domain explicitly when converting a message object to a string.
- You can specify the domain directly using a gettext-like convenience function:
      MessageBox(dgettext("gui","Error Occurred"));
Many applications do not write messages directly to an output stream or use only one locale in the process, so calling translate("Hello World").str() for every single message would be annoying. Thus Boost.Locale provides GNU Gettext-like localization functions for direct translation of messages. However, unlike the GNU Gettext functions, the Boost.Locale translation functions take an additional optional parameter (locale) and support wide, u16, and u32 strings.
The prototypes of the GNU Gettext-like functions can be found in the Boost.Locale reference documentation.
All of these functions can have different prefixes for different forms:

- d - translation in a specific domain
- n - plural form translation
- p - translation in a specific context

There are many tools to extract messages from the source code into the .po file format. The most popular and "native" tool is xgettext, which is installed by default on most Unix systems and freely downloadable for Windows (see Using Gettext Tools on Windows).
For example, we have a source file called dir.cpp that prints:

    cout << format(translate("Listing of catalog {1}:")) % file_name << endl;
    cout << format(translate("Catalog {1} contains 1 file","Catalog {1} contains {2,num} files",files_no)) % file_name % files_no << endl;
Now we run:
xgettext --keyword=translate:1,1t --keyword=translate:1,2,3t dir.cpp
A file called messages.po is created that looks like this (approximately):

    #: dir.cpp:1
    msgid "Listing of catalog {1}:"
    msgstr ""

    #: dir.cpp:2
    msgid "Catalog {1} contains 1 file"
    msgid_plural "Catalog {1} contains {2,num} files"
    msgstr[0] ""
    msgstr[1] ""
This file can be given to translators to adapt it to specific languages.
We used the --keyword parameter of xgettext to make it suitable for extracting messages from source code localized with Boost.Locale, searching for translate() function calls instead of the default gettext() and ngettext() ones. The first parameter, --keyword=translate:1,1t, provides the template for basic messages: a translate function that is called with 1 argument (1t), whose first argument is taken as the key. The second one, --keyword=translate:1,2,3t, is used for plural forms. It tells xgettext to look for a translate() function call with 3 parameters (3t) and to take the 1st and 2nd parameters as keys. An additional marker, Nc, can be used to mark context information.
The full set of xgettext parameters suitable for Boost.Locale is:
    xgettext --keyword=translate:1,1t --keyword=translate:1c,2,2t \
             --keyword=translate:1,2,3t --keyword=translate:1c,2,3,4t \
             --keyword=gettext:1 --keyword=pgettext:1c,2 \
             --keyword=ngettext:1,2 --keyword=npgettext:1c,2,3 \
             source_file_1.cpp ... source_file_N.cpp
Of course, if you do not use the gettext-like translation functions, you may omit the corresponding parameters.
When access to the actual file system is limited, as in ActiveX controls, or when the developer wants to ship an all-in-one executable file, it is useful to be able to load gettext catalogs from a custom location: a custom file system.
Boost.Locale provides an option to install the boost::locale::message_format facet with customized options provided in the boost::locale::gnu_gettext::messages_info structure.
This structure contains a boost::function based callback that allows the user to provide custom functionality for loading message catalog files.
For example:
    // Configure all options for message catalog
    namespace blg = boost::locale::gnu_gettext;
    blg::messages_info info;
    info.language = "he";
    info.country = "IL";
    info.encoding = "UTF-8";
    info.paths.push_back(""); // You need some even empty path
    info.domains.push_back(blg::messages_info::domain("my_app"));
    info.callback = some_file_loader; // Provide a callback

    // Create a basic locale without messages support
    boost::locale::generator gen;
    std::locale base_locale = gen("he_IL.UTF-8");

    // Install messages catalogs for "char" support to the final locale
    // we are going to use
    std::locale real_locale(base_locale,blg::create_messages_facet<char>(info));
To set up the language, country, and other members, you may use the boost::locale::info facet for convenience:

    // Configure all options for message catalog
    namespace blg = boost::locale::gnu_gettext;
    blg::messages_info info;
    info.paths.push_back(""); // At least one path is needed, even an empty one
    info.domains.push_back(blg::messages_info::domain("my_app"));
    info.callback = some_file_loader; // Provide a callback

    // Create an object with default locale
    boost::locale::generator gen;
    std::locale base_locale = gen("");

    // Use boost::locale::info to configure all parameters
    boost::locale::info const &properties = std::use_facet<boost::locale::info>(base_locale);
    info.language = properties.language();
    info.country = properties.country();
    info.encoding = properties.encoding();
    info.variant = properties.variant();

    // Install messages catalogs to the final locale
    std::locale real_locale(base_locale,blg::create_messages_facet<char>(info));
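The examples above refer to some_file_loader without showing it. A minimal sketch of such a loader, backed by an in-memory table instead of the real file system, might look as follows. The signature used here (file name and encoding in, file contents out, empty result when the file is missing) is an assumption about the callback type and should be checked against the messages_info declaration in your Boost version. The embedded_catalogs table and the file name used to fill it are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory "file system": catalog file name -> raw .mo bytes.
// In a real program this could be filled from resources embedded in the
// executable.
std::map<std::string, std::vector<char>>& embedded_catalogs()
{
    static std::map<std::string, std::vector<char>> files;
    return files;
}

// Sketch of a loader callback. Assumed shape: it receives the catalog file
// name and the requested encoding, and returns the file contents, or an
// empty vector if the file does not exist.
std::vector<char> some_file_loader(const std::string& file_name,
                                   const std::string& encoding)
{
    assert(!file_name.empty());
    (void)encoding; // this toy file system is encoding agnostic
    auto it = embedded_catalogs().find(file_name);
    return it != embedded_catalogs().end() ? it->second : std::vector<char>();
}
```

Which file names the library actually passes to the callback depends on the configured paths, locale, and domains, so it is worth logging them once before filling the table.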
Boost.Locale assumes that you use English for the original text messages, and the best practice is to use US-ASCII characters for the original keys.
However, in some cases it is useful to insert some Unicode characters in the text, for example the copyright character "©".
As long as your narrow character string encoding is UTF-8, nothing further needs to be done.
Boost.Locale assumes that your sources are encoded in UTF-8 and that the input narrow strings use UTF-8, which is the default for most compilers (with the notable exception of Microsoft Visual C++).
However, if the encoding of the narrow strings in your source file is not UTF-8 but some other encoding, such as windows-1252, the strings would be misinterpreted.
You can specify the character set of the original strings when you specify the domain name for the application.
    #include <boost/locale.hpp>
    #include <iostream>

    using namespace std;
    using namespace boost::locale;

    int main()
    {
        generator gen;

        // Specify location of dictionaries
        gen.add_messages_path(".");

        // Specify the encoding of the source string
        gen.add_messages_domain("copyrighted/windows-1255");

        // Generate locales and imbue them to iostream
        locale::global(gen(""));
        cout.imbue(locale());

        // In Windows 1255 (C) symbol is encoded as 0xA9
        cout << translate("© 2001 All Rights Reserved") << endl;
    }
Thus, if the program runs in a UTF-8 locale, the copyright symbol is automatically converted to the appropriate UTF-8 sequence if the key is missing from the dictionary.
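As a worked illustration of that conversion: the single byte 0xA9 corresponds to the code point U+00A9, which UTF-8 represents as the two bytes 0xC2 0xA9. The sketch below encodes small code points by hand. It is not how Boost.Locale performs the conversion, and encode_utf8_small is a hypothetical helper that only covers code points below U+0800.

```cpp
#include <cassert>
#include <string>

// Encodes a code point below U+0800 as UTF-8 (one or two bytes).
std::string encode_utf8_small(unsigned code_point)
{
    assert(code_point < 0x800);
    std::string out;
    if (code_point < 0x80) {
        // ASCII range: a single byte, unchanged
        out += static_cast<char>(code_point);
    } else {
        // Two-byte form: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (code_point >> 6));
        out += static_cast<char>(0x80 | (code_point & 0x3F));
    }
    return out;
}
```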
GNU Gettext provides tools such as xgettext, msgfmt, and msgmerge that do a very fine job, especially as they are freely available for download and support almost any platform. All Linux distributions, BSD flavors, Mac OS X, and other Unix-like operating systems provide the GNU Gettext tools as a standard package.