Locale::TextDomain(3)  User Contributed Perl Documentation  Locale::TextDomain(3)

NAME
Locale::TextDomain - Perl Interface to Uniforum Message Translation

SYNOPSIS
    use Locale::TextDomain ('my-package', @locale_dirs);

    use Locale::TextDomain qw (my-package);

    my $translated = __"Hello World!\n";

    my $alt = $__{"Hello World!\n"};

    my $alt2 = $__->{"Hello World!\n"};

    my @list = (N__"Hello",
                N__"World");

    printf (__n ("one file read",
                 "%d files read",
                 $num_files),
            $num_files);

    print __nx ("one file read", "{num} files read", $num_files,
                num => $num_files);

    my $translated_context = __p ("Verb, to view", "View");

    printf (__np ("Files read from filesystems",
                  "one file read",
                  "%d files read",
                  $num_files),
            $num_files);

    print __npx ("Files read from filesystems",
                 "one file read",
                 "{num} files read",
                 $num_files,
                 num => $num_files);

DESCRIPTION
The module Locale::TextDomain(3pm) provides a high-level interface to
Perl message translation.

Textdomains
When you request a translation for a given string, the system used in
libintl-perl follows a standard strategy to find a suitable message
catalog containing the translation. Unless you explicitly define a
name for the message catalog, libintl-perl will assume that your
catalog is called 'messages'. You can change this default with the
function textdomain() of Locale::Messages(3pm).

You might think that this default strategy leaves room for optimization,
and you are right. It would be a lot smarter if multiple software
packages, each with its own message catalogs, could be installed on one
system. It should also be possible for third-party components of your
software (like Perl modules) to load their message catalogs without
interfering with yours.

The solution is clear: you have to assign a unique name to your message
database, and you have to specify that name at run-time. That unique
name is the so-called textdomain of your software package. The name is
actually arbitrary, but you should follow these best-practice guidelines
to ensure maximum interoperability:

File System Safety
    In practice, textdomains get mapped into file names. You should
    therefore make sure that the textdomain you choose is a valid
    filename on every system that will run your software.

Case-sensitivity
    Textdomains are always case-sensitive (i.e. 'Package' and 'PACKAGE'
    are not the same). However, the message catalogs will be stored on
    file systems that may or may not distinguish case when looking up
    file names. You should therefore avoid textdomains that differ only
    in case.

Textdomain Should Match CPAN Name
    If your software is listed as a module on CPAN, you should simply
    choose the name on CPAN as your textdomain. The textdomain for
    libintl-perl is hence 'libintl-perl'. But please replace all
    periods ('.') in your package name with an underscore because ...

Internet Domain Names as a Fallback
    ... if your software is not a module listed on CPAN, as a last
    resort you should use the Java(tm) package scheme. That is, choose
    an internet domain that you own (or ask the owner of an internet
    domain) and prepend the reversed internet domain to your preferred
    textdomain. Example: your company runs the web-site
    'www.foobar.org' and owns the domain 'foobar.org'. The textdomain
    for your company's software 'barfoos' should hence be
    'org.foobar.barfoos'.

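The two naming rules above can be expressed as a small helper. This is a minimal sketch, not part of libintl-perl. The function name suggest_textdomain() and its named arguments are hypothetical. It covers only the period-to-underscore rule and the reversed-domain fallback.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of libintl-perl) applying the naming
# guidelines: use the CPAN name with periods replaced by underscores,
# or fall back to the reversed internet domain plus the package name.
sub suggest_textdomain {
    my (%args) = @_;
    if (defined $args{cpan_name}) {
        (my $domain = $args{cpan_name}) =~ tr/./_/;
        return $domain;
    }
    # Java-style fallback: 'foobar.org' + 'barfoos' => 'org.foobar.barfoos'
    my @labels = reverse split /\./, $args{internet_domain};
    return join '.', @labels, $args{package};
}

print suggest_textdomain(cpan_name => 'libintl-perl'), "\n";
print suggest_textdomain(internet_domain => 'foobar.org',
                         package         => 'barfoos'), "\n";
```
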
If your software is likely to be installed in different versions on the
same system, it is probably a good idea to append some version
information to your textdomain.

Other systems are less strict about the naming scheme for textdomains.
However, the phenomenon known as Perl is actually a plethora of small,
specialized modules, so it is probably wisest to postulate some
namespace model in order to avoid chaos.

Binding textdomains to directories
Once the system knows the textdomain of the message that you want to
get translated into the user's language, it still has to find the
correct message catalog. By default, libintl-perl will look up the
string in the translation database found in the directories
/usr/share/locale and /usr/local/share/locale (in that order).

These directories are not guaranteed to exist on the target machine.
Nor can you be sure that the installation routine has write access to
these locations. You can therefore instruct libintl-perl to search
other directories before the default directories. Specifying a
different search directory is called binding a textdomain to a
directory.

Locale::TextDomain extends the default strategy with a Perl-specific
approach. Unless told otherwise, it will look for a directory
LocaleData in every component of the standard include path @INC, and
it will check there for a database containing the message for your
textdomain. Example: if the path /usr/lib/perl/5.8.0/site_perl is in
your @INC, you can install your translation files in
/usr/lib/perl/5.8.0/site_perl/LocaleData, and they will be found at
run-time.

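The search order described in this section can be sketched as a list of candidate file names. The function candidate_catalogs() is hypothetical and does not touch the file system. It assumes the usual gettext catalog layout DIR/LANG/LC_MESSAGES/TEXTDOMAIN.mo, and the real module may differ in details.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical sketch of the search order described above; not the
# actual libintl-perl code.  Assumption: catalogs follow the usual
# gettext layout DIR/LANG/LC_MESSAGES/TEXTDOMAIN.mo.
sub candidate_catalogs {
    my ($textdomain, $lang, @bound_dirs) = @_;

    # Explicitly bound directories replace the Perl-specific LocaleData
    # directories; otherwise use LocaleData below every @INC entry
    # (code references in @INC are skipped).
    my @dirs = @bound_dirs
        ? @bound_dirs
        : map { File::Spec->catdir($_, 'LocaleData') } grep { !ref } @INC;

    # The system directories are searched last.
    push @dirs, '/usr/share/locale', '/usr/local/share/locale';

    return map {
        File::Spec->catfile($_, $lang, 'LC_MESSAGES', "$textdomain.mo")
    } @dirs;
}

print "$_\n" for candidate_catalogs('barfoos', 'de', './dir1', './dir2');
```
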
USAGE
It is crucial to remember that you must use Locale::TextDomain(3) as
specified in the section "SYNOPSIS". You have to use it, not require
it. The module behaves quite differently from other modules.

The most significant difference is the meaning of the list passed as an
argument to the use() function. It actually works like this:

    use Locale::TextDomain (TEXTDOMAIN, DIRECTORY, ...)

The first argument (the first string passed to use()) is the textdomain
of your package. It can be followed by a list of directories to search
instead of the Perl-specific directories (see above: /LocaleData
appended to every part of @INC).

If you are the author of a package 'barfoos', you will probably put the
line

    use Locale::TextDomain 'barfoos';

or, for non-CPAN modules,

    use Locale::TextDomain 'org.foobar.barfoos';

in every module of your package that contains translatable strings. If
your module has been installed properly, including the message
catalogs, it will then be able to retrieve these translations at
run-time.

You may not have installed the translation database in a directory
LocaleData in the standard include path @INC, or in the system
directories /usr/share/locale or /usr/local/share/locale. In that case
you have to explicitly specify a search path. Give the names of
directories (as strings!) as additional arguments to use():

    use Locale::TextDomain qw (barfoos ./dir1 ./dir2);

Alternatively you can call the function bindtextdomain() with suitable
arguments (see the entry for bindtextdomain() in "FUNCTIONS" in
Locale::Messages). If you do so, you should pass "undef" as an
additional argument in order to avoid unnecessary lookups:

    use Locale::TextDomain ('barfoos', undef);

So the arguments given to use() have nothing to do with what is
imported into your namespace. Instead, they are arguments to
textdomain() and bindtextdomain(). Does that mean that
Locale::TextDomain exports nothing into your namespace? Umh, not
exactly ... in fact it imports all functions listed below into your
namespace. You should therefore not define conflicting functions (or
variables) yourself.

So, why does Locale::TextDomain have to be different from other
modules? If you have ever written software in C and prepared it for
internationalization (i18n), you have probably defined some
preprocessor macros like these:

    #define _(String) dgettext ("my-textdomain", String)
    #define N_(String) String

In C you only have to define that once, and the textdomain for your
package is automatically inserted into all gettext functions. Perl has
no such mechanism, or at least no portable one (the -P option, which
runs the script through the C preprocessor, is not portable). Without
some extra fiddling, using the gettext functions could become quite
cumbersome:

    print dgettext ("my-textdomain", "Hello world!\n");

This is no fun. In C it would merely be a

    printf (_("Hello world!\n"));

Perl has to be more concise and shorter than C ... see the next section
for how you can use Locale::TextDomain to end up in Perl with a mere

    print __"Hello World!\n";

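A toy version of the mechanism makes the difference between use() arguments and imports clearer. Everything in the sketch is invented for illustration: the package Mini::TextDomain, its in-memory %CATALOG and its dgettext() stand-in. The real Locale::TextDomain delegates to Locale::Messages, reads compiled catalogs and exports many more functions.

```perl
use strict;
use warnings;

# A much simplified, hypothetical stand-in for the idea behind
# Locale::TextDomain.  The arguments given to use() (i.e. to import())
# name the textdomain.  import() then installs a short translation
# function with that textdomain baked in, much like the C macro _().
package Mini::TextDomain;

# Invented in-memory "catalogs": textdomain => { msgid => msgstr }.
our %CATALOG = (
    'barfoos' => { "Hello world!\n" => "Hallo Welt!\n" },
);

# Stand-in for dgettext(): look up MSGID in DOMAIN, fall back to MSGID.
sub dgettext {
    my ($domain, $msgid) = @_;
    my $msgstr = $CATALOG{$domain}{$msgid};
    return defined $msgstr ? $msgstr : $msgid;
}

sub import {
    my ($class, $textdomain, @dirs) = @_;    # @dirs is ignored here
    my $caller = caller;
    no strict 'refs';
    # Install __ into the calling package, closing over its textdomain.
    *{"${caller}::__"} = sub { return dgettext($textdomain, $_[0]) };
    return;
}

package main;

# Same effect as "use Mini::TextDomain 'barfoos';" if the package lived
# in its own Mini/TextDomain.pm file.
BEGIN { Mini::TextDomain->import('barfoos') }

print __("Hello world!\n");    # Hallo Welt!
```

The textdomain is captured when import() runs. Every call to __() in the calling package therefore uses it implicitly, which is what the C macro achieves with the preprocessor.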
EXPORTED FUNCTIONS
All functions have quite funny names on purpose. The reason is simple:
they should be short and operator-like, and they should not provoke
conflicts with existing functions in your namespace. You will
understand this when you internationalize your first Perl program or
module. Preparing it is more like marking strings as being translatable
than inserting function calls. Here we go:

__ MSGID
    NOTE: This is a double underscore!

    The basic and most-used function. It is a short-cut for a call to
    gettext() or dgettext(), and simply returns the translation for
    MSGID. If your old code reads like this:

        print "permission denied";

    You will now write:

        print __"permission denied";

    That's all. The string will be output in the user's preferred
    language, provided that you have installed a translation for it.

    Of course you can also use parentheses:

        print __("permission denied");

    Or even:

        print (__("permission denied"));

    In my eyes, the first version without parentheses looks best.

__x MSGID, ID1 => VAL1, ID2 => VAL2, ...
    One of the nicest features in Perl is its capability to interpolate
    variables into strings:

        print "This is the $color $thing.\n";

    This nice feature might con you into thinking that you could now
    write

        print __"This is the $color $thing.\n";

    Alas, that would be nice, but it is not possible. Remember that
    the function __() serves both as an operator for translating
    strings and as a mark for translatable strings. If the above
    string were extracted from your Perl code, the un-interpolated
    form would end up in the message catalog. When your code is parsed,
    there is no way to predict what values the variables $thing and
    $color will have at run-time (this fact is most probably one of the
    reasons you wrote your program in the first place).

    At run-time, however, Perl will have interpolated the values before
    __() (or the underlying gettext() function) sees the original
    string. Consequently something like "This is the red car.\n" will
    be looked up in the message catalog. It will not be found, because
    only "This is the $color $thing.\n" is included in the database, so
    the original, untranslated string will be returned. In fact, this
    is almost always an error, so the xgettext(1) program will bail out
    with a fatal error when it comes across that string in your code.

    There are two workarounds for that. The first one is:

        printf __"This is the %s %s.\n", $color, $thing;

    But that has several disadvantages. Your translator will only see
    the isolated string, and without the surrounding code it is almost
    impossible to interpret it correctly. Of course, GNU emacs and
    other software capable of editing PO translation files will allow
    you to examine the context in the source code. Still, if your
    translator frequently comes across such messages, she is more
    likely to look for a less challenging translation project.

    And even if she does understand the underlying programming, what if
    she has to reorder the color and the thing, as in French:

        msgid "This is the red car.\n"
        msgstr "Cela est la voiture rouge.\n"

    Zut alors! No way! You cannot portably reorder the arguments to
    printf() and friends in Perl (it is possible in C, but at the time
    of this writing not supported in Perl, and it would lead to other
    problems anyway).

    So what? The Perl backend to GNU gettext has defined an alternative
    format for interpolatable strings:

        "This is the {color} {thing}.\n";

    Instead of Perl variables you use place-holders in curly braces
    (legal Perl variable names are also legal place-holders), and then
    you call

        print __x ("This is the {color} {thing}.\n",
                   thing => $thing,
                   color => $color);

    The function __x() takes the additional hash and replaces all
    occurrences of the hash keys in curly braces with the corresponding
    values. Simple, readable, understandable to translators, what else
    would you want? The translator may forget, misspell or otherwise
    mess up some "variables". In that case msgfmt(1), the program that
    compiles the textual translation file into its binary
    representation, will even choke on these errors and refuse to
    compile the translation.

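The substitution step behind __x() can be sketched in a few lines of plain Perl. This is a minimal sketch assuming that place-holders consist of word characters only. The function expand_placeholders() is a hypothetical name, not part of libintl-perl. In this sketch, unknown place-holders are left untouched, and the template is assumed to be already translated.

```perl
use strict;
use warnings;

# Hypothetical sketch of the {placeholder} expansion performed by __x();
# it is not the libintl-perl source.  The real function translates the
# template first; here we pass already "translated" templates.
# Placeholders without a matching key are left untouched (an assumption
# of this sketch).
sub expand_placeholders {
    my ($template, %vars) = @_;
    $template =~ s/\{(\w+)\}/exists $vars{$1} ? $vars{$1} : "{$1}"/ge;
    return $template;
}

# English original.
print expand_placeholders("This is the {color} {thing}.\n",
                          color => 'red', thing => 'car');

# A French translation may reorder the placeholders freely.
print expand_placeholders("Cela est la {thing} {color}.\n",
                          color => 'rouge', thing => 'voiture');
```

Named place-holders are looked up by key, not by position, so the translator can reorder them. Positional printf() arguments offer no such freedom.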
__n MSGID, MSGID_PLURAL, COUNT
    Whew! That looks complicated ... It is best explained with an
    example. Let's have another look at your vintage code:

        if ($files_deleted > 1) {
            print "All files have been deleted.\n";
        } else {
            print "One file has been deleted.\n";
        }

    Your intent is clear: you wanted to avoid the cumbersome "1 files
    deleted". This is okay for English, but other languages have more
    than one plural form. In Russian, for example, it makes a
    difference whether you want to say 1 file, 3 files or 6 files. You
    will use three different forms of the noun 'file' in each case.
    [Note: Yep, very smart you are, the Russian word for 'file' is in
    fact the English word, and it is an invariable noun, but if you
    know that, you will also understand the rest despite this little
    simplification ...].

    That is the reason for the existence of the function ngettext(),
    for which __n() is a short-cut:

        print __n"One file has been deleted.\n",
                 "All files have been deleted.\n",
                 $files_deleted;

    Alternatively:

        print __n ("One file has been deleted.\n",
                   "All files have been deleted.\n",
                   $files_deleted);

    The effect is always the same: libintl-perl will find out which
    plural form to pick for your user's language, and the output string
    will always look okay.

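A per-language rule decides which plural form is picked. The sketch below hard-codes two rules commonly found in the Plural-Forms header of catalogs, one for English and one for Russian. The helpers plural_index() and untranslated_plural() are hypothetical; Locale::TextDomain does not export them. Real catalogs supply the rule themselves.

```perl
use strict;
use warnings;

# Hypothetical sketch of how a plural form is chosen.  In real catalogs
# the rule comes from the "Plural-Forms" header of the PO/MO file; the
# two rules below are the ones commonly used for English and Russian.
my %plural_rule = (
    en => sub { my $n = shift; return $n != 1 ? 1 : 0 },
    ru => sub {
        my $n = shift;
        return $n % 10 == 1 && $n % 100 != 11 ? 0
             : $n % 10 >= 2 && $n % 10 <= 4
               && ($n % 100 < 10 || $n % 100 >= 20) ? 1
             : 2;
    },
);

# Index of the plural form (0, 1, 2, ...) for language LANG and count N.
sub plural_index {
    my ($lang, $n) = @_;
    return $plural_rule{$lang}->($n);
}

# Without a translation, ngettext()-style functions fall back to the
# msgid for a count of 1 and to the msgid_plural otherwise.
sub untranslated_plural {
    my ($msgid, $msgid_plural, $n) = @_;
    return $n == 1 ? $msgid : $msgid_plural;
}

printf "ru: %d -> form %d\n", $_, plural_index(ru => $_) for 1, 3, 6, 21;
```

The loop prints forms 0, 1, 2 and 0 for the counts 1, 3, 6 and 21. Russian reuses the "one" form for 21.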
__nx MSGID, MSGID_PLURAL, COUNT, VAR1 => VAL1, VAR2 => VAL2, ...
    Bringing it all together:

        print __nx ("One file has been deleted.\n",
                    "{count} files have been deleted.\n",
                    $num_files,
                    count => $num_files);

    The function __nx() picks the correct plural form (also for
    English!) and it is capable of interpolating variables into
    strings.

    Have a close look at the order of arguments. The first argument is
    the string in the singular, the second one is the plural string.
    The third one is an integer indicating the number of items. This
    third argument is only used to pick the correct translation. The
    optionally following arguments make up the hash used for
    interpolation. In the beginning it is often a little confusing
    that the variable holding the number of items will usually be
    repeated somewhere in the interpolation hash.

__xn MSGID, MSGID_PLURAL, COUNT, VAR1 => VAL1, VAR2 => VAL2, ...
    Does exactly the same thing as __nx(). In fact it is a common typo
    promoted to a feature.

__p MSGCTXT, MSGID
    This is much like __. The "p" stands for "particular", and the
    MSGCTXT is used to provide context to the translator. This may be
    necessary when your string is short and could stand for multiple
    things. For example:

        print __p"Verb, to view", "View";
        print __p"Noun, a view", "View";

    The above may be "View" entries in a menu, where View->Source and
    File->View are different forms of "View", and likely need to be
    translated differently.

    GUI programs are a typical use case. Imagine a program with a main
    menu and the notorious "Open" entry in the "File" menu. Now imagine
    there is another menu entry Preferences->Advanced->Policy where you
    have a choice between the alternatives "Open" and "Closed". In
    English, "Open" is the adequate text in both places. In other
    languages, it is very likely that you need two different
    translations. Therefore, you would now write:

        __p"File|", "Open";
        __p"Preferences|Advanced|Policy", "Open";

    In English, or if no translation can be found, the second argument
    (MSGID) is returned.

    This function was introduced in libintl-perl 1.17.

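The sketch below shows how a context keeps the two "View" entries apart at lookup time. In GNU gettext's compiled catalogs, a message with context is stored under the key MSGCTXT followed by the byte "\x04" and then MSGID. Everything else in the sketch is invented: the tiny German catalog and the helper lookup_with_context().

```perl
use strict;
use warnings;

# Hypothetical sketch of a context-aware lookup.  GNU gettext stores a
# message with context under the key MSGCTXT . "\x04" . MSGID in compiled
# catalogs; the tiny German "catalog" below is invented for illustration.
my %catalog = (
    "Verb, to view\x04View" => 'Anzeigen',
    "Noun, a view\x04View"  => 'Ansicht',
);

sub lookup_with_context {
    my ($msgctxt, $msgid) = @_;
    my $msgstr = $catalog{"$msgctxt\x04$msgid"};
    # Like __p(): fall back to MSGID if no translation exists.
    return defined $msgstr ? $msgstr : $msgid;
}

print lookup_with_context('Verb, to view', 'View'), "\n";   # Anzeigen
print lookup_with_context('Noun, a view', 'View'), "\n";    # Ansicht
print lookup_with_context('File|', 'Open'), "\n";           # Open
```
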
__px MSGCTXT, MSGID, VAR1 => VAL1, VAR2 => VAL2, ...
    Like __p(), but supports variable substitution in the string, like
    __x().

        print __px("Verb, to view", "View {file}", file => $filename);

    See __p() and __x() for more details.

    This function was introduced in libintl-perl 1.17.

__np MSGCTXT, MSGID, MSGID_PLURAL, COUNT
    This adds context to plural calls. Thanks to the __nx() function,
    it should rarely be needed, if at all. Other gettext libraries use
    sprintf-like symbols such as %s or %1 for variable substitution,
    and that style sometimes required context. For a (bad) example of
    this:

        printf (__np("[count] files have been deleted",
                     "One file has been deleted.\n",
                     "%s files have been deleted.\n",
                     $num_files),
                $num_files);

    NOTE: The above usage is discouraged. Just use the __nx() call,
    which provides inline context via the key names.

    This function was introduced in libintl-perl 1.17.

__npx MSGCTXT, MSGID, MSGID_PLURAL, COUNT, VAR1 => VAL1, VAR2 => VAL2, ...
    This is provided for completeness. It adds variable interpolation
    into the string to the previous function, __np().

    Its usage would be like so:

        print __npx ("Files being permanently removed",
                     "One file has been deleted.\n",
                     "{count} files have been deleted.\n",
                     $num_files,
                     count => $num_files);

    I cannot think of any situations requiring this, but we can easily
    support it, so here it is.

    This function was introduced in libintl-perl 1.17.

N__ (ARG1, ARG2, ...)
    A no-op function that simply echoes its arguments to the caller.
    Take the following piece of Perl:

        my @options = (
            "Open",
            "Save",
            "Save As",
        );

        ...

        my $option = $options[1];

    Now say that you want to make this translatable. You could
    sometimes simply do:

        my @options = (
            __"Open",
            __"Save",
            __"Save As",
        );

        ...

        my $option = $options[1];

    But often this will not be what you want, for example when you
    also need the unmodified original string. Sometimes it may not
    even work, for example when the user's preferred language has not
    yet been determined at the time the list is initialized.

    In these cases you would write:

        my @options = (
            N__"Open",
            N__"Save",
            N__"Save As",
        );

        ...

        my $option = __($options[1]);
        # or: my $option = dgettext ('my-domain', $options[1]);

    Now all the strings in @options will be left alone, since N__()
    returns its arguments (one or more) unmodified. Nevertheless, the
    string extractor will be able to recognize the strings as being
    translatable. And you can still get the translation later by
    passing the variable instead of the string to one of the above
    translation functions.

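In the sketch below, strings are marked at initialization time and translated only once the language is known. N__ simply returns its arguments, as described above. Everything else is a hypothetical stand-in: translate() replaces __(), and %catalogs with its French entries replaces the message catalogs. Neither comes from libintl-perl.

```perl
use strict;
use warnings;

# Hypothetical sketch of deferred translation with N__-style marking.
# N__ only marks strings for the extractor and returns them unchanged;
# translate() and %catalogs are invented stand-ins for __() and the
# message catalogs.
sub N__ { return @_ }

my %catalogs = (
    fr => { 'Open'    => 'Ouvrir',
            'Save'    => 'Enregistrer',
            'Save As' => 'Enregistrer sous' },
);
my $lang;    # not yet known when @options is initialized

sub translate {
    my ($msgid) = @_;
    my $msgstr = defined $lang ? $catalogs{$lang}{$msgid} : undef;
    return defined $msgstr ? $msgstr : $msgid;
}

my @options = (N__("Open"), N__("Save"), N__("Save As"));

$lang = 'fr';    # e.g. determined later from the user's settings
print translate($options[1]), "\n";    # Enregistrer
print $options[1], "\n";               # the original "Save" is untouched
```
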
N__n (MSGID, MSGID_PLURAL, COUNT)
    Does exactly the same as N__(). You will use this form if you have
    to mark the strings as having plural forms.

N__p (MSGCTXT, MSGID)
    Marks MSGID as N__() does, but in the context MSGCTXT.

N__np (MSGCTXT, MSGID, MSGID_PLURAL, COUNT)
    Marks MSGID as N__n() does, but in the context MSGCTXT.

EXPORTED VARIABLES
The module exports several variables into your namespace:

%__
    A tied hash. Its keys are your original messages, the values are
    their translations:

        my $title = "<h1>$__{'My Homepage'}</h1>";

    This is much better for your translation team than

        my $title = __"<h1>My Homepage</h1>";

    In the second case the HTML code will make it into the translation
    database, and your translators will have to be aware of HTML syntax
    when translating strings.

    Warning: Do not use this hash outside of double-quoted strings!
    The code in the tied hash object relies on the correct working of
    the function caller() (see "perldoc -f caller"). This function
    will report incorrect results if the tied hash value is the
    argument to a function from another package, for example:

        my $result = Other::Package::do_it ($__{'Some string'});

    The tied hash code will see "Other::Package" as the calling
    package, instead of your own package. Consequently it will look up
    the message in the wrong text domain. There is no workaround for
    this bug. Therefore:

    Never use the tied hash outside of interpolated strings!

$__
    A reference to "%__", in case you prefer:

        my $title = "<h1>$__->{'My Homepage'}</h1>";

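A tied hash can translate its keys on access, as %__ does. The class Sketch::TranslationHash and its German catalog are hypothetical. The real module determines the textdomain through caller(), which is exactly what makes use outside interpolated strings unsafe. To keep things simple, this sketch instead binds one fixed catalog when the hash is tied.

```perl
use strict;
use warnings;

# Hypothetical sketch of a translating tied hash in the spirit of %__.
# Unlike the real module, which finds the textdomain via caller(), this
# sketch binds one fixed (invented) catalog when the hash is tied.
package Sketch::TranslationHash;

sub TIEHASH {
    my ($class, $catalog) = @_;
    return bless { catalog => $catalog }, $class;
}

# Return the translation, or the key itself if there is none.
sub FETCH {
    my ($self, $msgid) = @_;
    my $msgstr = $self->{catalog}{$msgid};
    return defined $msgstr ? $msgstr : $msgid;
}

# Report every key as existing, since FETCH always returns something.
sub EXISTS { return 1 }

package main;

our %__;
tie %__, 'Sketch::TranslationHash', { 'My Homepage' => 'Meine Homepage' };
our $__ = \%__;    # a reference, like the exported $__

print "<h1>$__{'My Homepage'}</h1>\n";     # <h1>Meine Homepage</h1>
print "<h1>$__->{'My Homepage'}</h1>\n";   # same, via the reference
print "<h1>$__{'Contact'}</h1>\n";         # no entry: <h1>Contact</h1>
```
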
PERFORMANCE
Message translation can be a time-consuming task. Take this little
example:

    1: use Locale::TextDomain ('my-domain');
    2: use POSIX qw (:locale_h);
    3:
    4: setlocale (LC_ALL, '');
    5: print __"Hello world!\n";

This will usually be quite fast, but in pathological cases it may run
for several seconds. A worst-case scenario would be a Chinese user at
a terminal that understands the codeset Big5-HKSCS. Your translator
for Chinese, however, has chosen to encode the translations in the
codeset EUC-TW.

What will happen at run-time? First, the library will search for and
load a (maybe large) message catalog for your textdomain 'my-domain'.
Then it will look up the translation for "Hello world!\n" and find that
it is encoded in EUC-TW. Since that differs from the output codeset
Big5-HKSCS, the library first loads a conversion table containing tens
of thousands of code points for EUC-TW. It then does the same with the
smaller, but still very large, conversion table for Big5-HKSCS. Next
it converts the translation on the fly from EUC-TW into Big5-HKSCS,
and finally it returns the converted translation.

This is a worst-case scenario, but a realistic one. And for these five
lines of code, there is not much you can do to make it any faster. You
should understand, however, when the different steps will take place,
so that you can arrange your code for it.

You have learned in the section "DESCRIPTION" that line 1 is
responsible for locating your message database. However, the use()
will do nothing more than remember your settings. It will not search
any directories, and it will not load any catalogs or conversion
tables.

Somewhere in your code you will always have a call to
POSIX::setlocale(), and this call may be time-consuming, depending on
the architecture of your system. On some systems it will take very
little time. On others it will take considerable time only for the
first call, and on still others it may always be time-consuming. You
cannot know how setlocale() is implemented on the target system, so
you should keep the calls to setlocale() to a minimum.

Line 5 requests the translation for your string. Only now will the
library actually load the message catalog, and only now will it load
any conversion tables that may be needed. From then on, all this
information is cached in memory. This strategy is used throughout
libintl-perl, and you may describe it as 'load-on-first-access'.
Getting the next translation will consume very few resources.

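The 'load-on-first-access' strategy boils down to a lazily filled cache. This is a minimal sketch with hypothetical names. load_catalog() only fakes the expensive work of locating and parsing a catalog, and a counter shows that this work happens once per textdomain.

```perl
use strict;
use warnings;

# Hypothetical sketch of the 'load-on-first-access' strategy: the first
# lookup for a textdomain loads (here: fakes) the catalog and caches it;
# later lookups only hit the in-memory cache.  load_catalog() stands in
# for the expensive work (finding and parsing the catalog, loading
# conversion tables).
my %cache;
my $loads = 0;

sub load_catalog {
    my ($textdomain) = @_;
    $loads++;
    return $textdomain eq 'my-domain'
        ? { "Hello world!\n" => "Hallo Welt!\n" }
        : {};
}

sub cached_gettext {
    my ($textdomain, $msgid) = @_;
    my $catalog = $cache{$textdomain} ||= load_catalog($textdomain);
    my $msgstr  = $catalog->{$msgid};
    return defined $msgstr ? $msgstr : $msgid;
}

cached_gettext('my-domain', "Hello world!\n") for 1 .. 1000;
print "catalog loads: $loads\n";    # catalog loads: 1
```
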
The retrieval of a translation is disguised as an operator, but it is
still a function call, and in fact it even involves a chain of
function calls. Consequently, the following example is probably bad
practice:

    foreach (1 .. 100_000) {
        print __"Hello world!\n";
    }

This example introduces a lot of overhead into your program. Better do
this:

    my $string = __"Hello world!\n";
    foreach (1 .. 100_000) {
        print $string;
    }

The translation will never change, so there is no need to retrieve it
over and over again. Of course, libintl-perl caches the translation it
reads from the file system, but you can still avoid the overhead of
the function calls.

AUTHOR
Copyright (C) 2002-2009, Guido Flohr <guido@imperia.net>, all rights
reserved. See the source code for details.

This software is contributed to the Perl community by Imperia
(<http://www.imperia.net/>).

SEE ALSO
Locale::Messages(3pm), Locale::gettext_pp(3pm), perl(1), gettext(1),
gettext(3)

perl v5.12.0                    2010-05-02           Locale::TextDomain(3)