Locale::Maketext(3pm)    Perl Programmers Reference Guide    Locale::Maketext(3pm)

NAME
Locale::Maketext - framework for localization

SYNOPSIS
    package MyProgram;
    use strict;
    use MyProgram::L10N;
     # ...which inherits from Locale::Maketext
    my $lh = MyProgram::L10N->get_handle() || die "What language?";
    ...
    # And then any messages your program emits, like:
    warn $lh->maketext( "Can't open file [_1]: [_2]\n", $f, $! );
    ...

DESCRIPTION
It is a common feature of applications (whether run directly, or via
the Web) for them to be "localized" -- i.e., for them to present an
English interface to an English-speaker, a German interface to a
German-speaker, and so on for every language they support.
Locale::Maketext is a framework for software localization; it provides
you with the tools for organizing and accessing the bits of text and
text-processing code that you need for producing localized
applications.

In order to make sense of Maketext and how all its components fit
together, you should probably read Locale::Maketext::TPJ13 first, and
then read the following documentation.

You may also want to read over the source for "File::Findgrep" and its
constituent modules -- they are a complete (if small) example
application that uses Maketext.

QUICK OVERVIEW
The basic design of Locale::Maketext is object-oriented, and
Locale::Maketext is an abstract base class, from which you derive a
"project class". The project class (with a name like
"TkBocciBall::Localize", which you then use in your module) is in turn
the base class for all the "language classes" for your project (with
names "TkBocciBall::Localize::it", "TkBocciBall::Localize::en",
"TkBocciBall::Localize::fr", etc.).

A language class is a class containing a lexicon of phrases as class
data, and possibly also some methods that are of use in interpreting
phrases in the lexicon, or otherwise dealing with text in that
language.

An object belonging to a language class is called a "language handle";
it's typically a flyweight object.

The normal course of action is to call:

    use TkBocciBall::Localize;  # the localization project class
    $lh = TkBocciBall::Localize->get_handle();
     # Depending on the user's locale, etc., this will
     # make a language handle from among the classes available,
     # and any defaults that you declare.
    die "Couldn't make a language handle??" unless $lh;

From then on, you use the "maketext" method to access entries in
whatever lexicon(s) belong to the language handle you got. So, this:

    print $lh->maketext("You won!"), "\n";

...emits the right text for this language. If the object in $lh
belongs to class "TkBocciBall::Localize::fr" and
%TkBocciBall::Localize::fr::Lexicon contains ("You won!" => "Tu as
gagné!"), then the above code happily tells the user "Tu as gagné!".
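
The overview above can be tried in a single file. This is only a minimal
sketch: the classes are declared inline rather than in separate .pm
files (which is how a real project would normally lay them out), and the
two lexicon entries are made up for the illustration.

```perl
use strict;
use warnings;

# Project class: derives from Locale::Maketext.
{
    package TkBocciBall::Localize;
    use parent 'Locale::Maketext';
}

# Language classes: each derives from the project class and
# carries its lexicon as class data in %Lexicon.
{
    package TkBocciBall::Localize::en;
    use parent -norequire, 'TkBocciBall::Localize';
    our %Lexicon = ( 'You won!' => 'You won!' );
}
{
    package TkBocciBall::Localize::fr;
    use parent -norequire, 'TkBocciBall::Localize';
    our %Lexicon = ( 'You won!' => 'Tu as gagné!' );
}

package main;

# Ask explicitly for French so the result does not depend on the
# environment this script happens to run in.
my $lh = TkBocciBall::Localize->get_handle('fr')
    or die "Couldn't make a language handle??";

my $text = $lh->maketext('You won!');
print "$text\n";
```

Passing 'en' to get_handle instead yields a handle of class
TkBocciBall::Localize::en, and the same maketext call returns the
English entry.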

METHODS
Locale::Maketext offers a variety of methods, which fall into three
categories:

· Methods to do with constructing language handles.

· "maketext" and other methods to do with accessing %Lexicon data for
  a given language handle.

· Methods that you may find it handy to use, from routines of yours
  that you put in %Lexicon entries.

These are covered in the following sections.

Construction Methods
These are to do with constructing a language handle:

· $lh = YourProjClass->get_handle( ...langtags... ) || die "lg-handle?";

  This tries loading classes based on the language-tags you give
  (like ("en-US", "sk", "kon", "es-MX", "ja", "i-klingon")), and for
  the first class that loads successfully, returns
  YourProjClass::language->new().

  If it runs through the entire given list of language-tags, and finds
  no classes for those exact terms, it then tries "superordinate"
  language classes. So if no "en-US" class (i.e.,
  YourProjClass::en_us) was found, nor classes for anything else in
  that list, we then try its superordinate, "en" (i.e.,
  YourProjClass::en), and so on through the other language-tags in the
  given list: "es". (The other language-tags in our example list
  happen to have no superordinates.)

  If none of those language-tags leads to loadable classes, we then
  try classes derived from YourProjClass->fallback_languages() and
  then, if nothing comes of that, we use classes named by
  YourProjClass->fallback_language_classes(). Then in the (probably
  quite unlikely) event that that fails, we just return undef.

· $lh = YourProjClass->get_handle() || die "lg-handle?";

  When "get_handle" is called with an empty parameter list, magic
  happens:

  If "get_handle" senses that it's running in a program that was
  invoked as a CGI, then it tries to get language-tags out of the
  environment variable "HTTP_ACCEPT_LANGUAGE", and it pretends that
  those were the languages passed as parameters to "get_handle".

  Otherwise (i.e., if not a CGI), this tries various OS-specific ways
  to get the language-tags for the current locale/language, and then
  pretends that those were the value(s) passed to "get_handle".

  Currently this OS-specific stuff consists of looking in the
  environment variables "LANG" and "LANGUAGE"; and on MSWin machines
  (where those variables are typically unused), this also tries using
  the module Win32::Locale to get a language-tag for whatever
  language/locale is currently selected in the "Regional Settings"
  (or "International"?) Control Panel. I welcome further
  suggestions for making this do the Right Thing under other
  operating systems that support localization.

  If you're using localization in an application that keeps a
  configuration file, you might consider something like this in your
  project class:

      sub get_handle_via_config {
        my $class = $_[0];
        # %Config_settings stands for wherever your program keeps
        # its parsed configuration.
        my $chosen_language = $Config_settings{'language'};
        my $lh;
        if($chosen_language) {
          $lh = $class->get_handle($chosen_language)
           || die "No language handle for \"$chosen_language\" or the like";
        } else {
          # Config file missing, maybe?
          $lh = $class->get_handle()
           || die "Can't get a language handle";
        }
        return $lh;
      }

· $lh = YourProjClass::langname->new();

  This constructs a language handle. You usually don't call this
  directly, but instead let "get_handle" find a language class to
  "use" and then call ->new on.

· $lh->init();

  This is called by ->new to initialize newly-constructed language
  handles. If you define an init method in your class, remember that
  it's usually considered a good idea to call $lh->SUPER::init in it
  (presumably at the beginning), so that all classes get a chance to
  initialize a new object however they see fit.

· YourProjClass->fallback_languages()

  "get_handle" appends the return value of this to the end of
  whatever list of languages you pass "get_handle". Unless you
  override this method, your project class will inherit
  Locale::Maketext's "fallback_languages", which currently returns
  ('i-default', 'en', 'en-US'). ("i-default" is defined in RFC
  2277.)

  This method (by having it return the name of a language-tag that
  has an existing language class) can be used for making sure that
  "get_handle" will always manage to construct a language handle
  (assuming your language classes are in an appropriate @INC
  directory). Or you can use the next method:

· YourProjClass->fallback_language_classes()

  "get_handle" appends the return value of this to the end of the
  list of classes it will try using. Unless you override this
  method, your project class will inherit Locale::Maketext's
  "fallback_language_classes", which currently returns an empty list,
  (). By setting this to some value (namely, the name of a
  loadable language class), you can be sure that "get_handle" will
  always manage to construct a language handle.
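
To make the lookup order described above concrete, here is a small
sketch. MyApp::L10N and its two language classes are hypothetical and
declared inline; the project class overrides fallback_languages so that
German is the last resort. Explicit tags are passed to get_handle, so
the environment is not consulted.

```perl
use strict;
use warnings;

{
    package MyApp::L10N;                 # hypothetical project class
    use parent 'Locale::Maketext';

    # Replace the inherited fallback list with a single tag that
    # is known to have a language class.
    sub fallback_languages { return ('de') }
}
{
    package MyApp::L10N::fr;
    use parent -norequire, 'MyApp::L10N';
    our %Lexicon = ( _AUTO => 1 );
}
{
    package MyApp::L10N::de;
    use parent -norequire, 'MyApp::L10N';
    our %Lexicon = ( _AUTO => 1 );
}

package main;

# There is no fr_ca class, so the superordinate tag "fr" is tried.
my $from_super = MyApp::L10N->get_handle('fr-CA');

# Nothing matches "i-klingon" and it has no superordinate, so the
# tag returned by fallback_languages() decides.
my $from_fallback = MyApp::L10N->get_handle('i-klingon');

print ref($from_super), "\n";
print ref($from_fallback), "\n";
```

Overriding fallback_languages this way is one means of making sure that
get_handle never returns undef for this project.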

The "maketext" Method
This is the most important method in Locale::Maketext:

    $text = $lh->maketext(key, ...parameters for this phrase...);

This looks in the %Lexicon of the language handle $lh and all its
superclasses, looking for an entry whose key is the string key.
Assuming such an entry is found, various things then happen, depending
on the value found:

If the value is a scalarref, the scalar is dereferenced and returned
(and any parameters are ignored).

If the value is a coderef, we return &$value($lh, ...parameters...).

If the value is a string that doesn't look like it's in Bracket
Notation, we return it (after replacing it with a scalarref, in its
%Lexicon).

If the value does look like it's in Bracket Notation, then we compile
it into a sub, replace the string in the %Lexicon with the new coderef,
and then we return &$new_sub($lh, ...parameters...).
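
The four kinds of lexicon value can be seen side by side in a short
sketch. The Demo::L10N classes and all four entries are invented for
the illustration.

```perl
use strict;
use warnings;

{
    package Demo::L10N;                  # hypothetical project class
    use parent 'Locale::Maketext';
}
{
    package Demo::L10N::en;
    use parent -norequire, 'Demo::L10N';
    our %Lexicon = (
        # scalarref: dereferenced and returned, parameters ignored
        'raw'   => \ 'Use [_1] as a placeholder',
        # coderef: called with the handle and the parameters
        'shout' => sub { my ($lh, @args) = @_; return uc "@args" },
        # plain string without brackets: returned unchanged
        'plain' => 'Nothing to interpolate here',
        # Bracket Notation: compiled into a sub on first use
        'greet' => 'Hello, [_1]!',
    );
}

package main;

my $lh = Demo::L10N->get_handle('en') or die "no language handle";

my $raw   = $lh->maketext('raw', 'ignored');
my $shout = $lh->maketext('shout', 'hey', 'there');
my $plain = $lh->maketext('plain');
my $greet = $lh->maketext('greet', 'World');

print "$_\n" for $raw, $shout, $plain, $greet;
```

Note that the scalarref entry comes back with its brackets intact: a
scalarref is a way of storing text that must never be treated as
Bracket Notation.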

Bracket Notation is discussed in a later section. Note that trying to
compile a Bracket Notation string can throw an exception if the
string is not syntactically valid (say, because its brackets are not
balanced).

Also, calling &$coderef($lh, ...parameters...) can throw any sort of
exception (if, say, code in that sub tries to divide by zero). But a
very common exception occurs when you have Bracket Notation text that
says to call a method "foo", but there is no such method. (E.g., "You
have [quatn,_1,ball]." will throw an exception on trying to call
$lh->quatn($_[1],'ball') -- you presumably meant "quant".) "maketext"
catches these exceptions, but only to make the error message more
readable, at which point it rethrows the exception.
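
A misspelled method name is easy to reproduce. In this sketch (the
Typo::L10N classes and the 'balls' key are hypothetical) the call is
wrapped in eval so that the script survives and can report what
happened:

```perl
use strict;
use warnings;

{
    package Typo::L10N;                  # hypothetical project class
    use parent 'Locale::Maketext';
}
{
    package Typo::L10N::en;
    use parent -norequire, 'Typo::L10N';
    our %Lexicon = (
        'balls' => 'You have [quatn,_1,ball].',   # typo for "quant"
    );
}

package main;

my $lh = Typo::L10N->get_handle('en') or die "no language handle";

# The string compiles, but calling the compiled sub fails because
# no method named "quatn" exists.
my $text  = eval { $lh->maketext('balls', 3) };
my $error = $@;

print defined $text ? "got: $text\n" : "maketext died: $error\n";
```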

An exception may be thrown if key is not found in any of $lh's %Lexicon
hashes. What happens if a key is not found is discussed in a later
section, "Controlling Lookup Failure".

Note that you might find it useful in some cases to override the
"maketext" method with an "after method", if you want to translate
encodings, or even scripts:

    package YrProj::zh_cn; # Chinese with PRC-style glyphs
    use base ('YrProj::zh_tw');  # Taiwan-style
    sub maketext {
      my $self = shift(@_);
      # Call the inherited method; calling $self->maketext here
      # would recurse forever.
      my $value = $self->SUPER::maketext(@_);
      return Chineeze::taiwan2mainland($value);
    }

Or you may want to override it with something that traps any
exceptions, if that's critical to your program:

    sub maketext {
      my($lh, @stuff) = @_;
      my $out;
      eval { $out = $lh->SUPER::maketext(@stuff) };
      return $out unless $@;
      # ...otherwise deal with the exception...
    }

Other than those two situations, I don't imagine that it's useful to
override the "maketext" method. (If you run into a situation where it
is useful, I'd be interested in hearing about it.)

$lh->fail_with or $lh->fail_with(PARAM)
$lh->failure_handler_auto
    These two methods are discussed in the section "Controlling Lookup
    Failure".

Utility Methods
These are methods that you may find it handy to use, generally from
%Lexicon routines of yours (whether expressed as Bracket Notation or
not).

$language->quant($number, $singular)
$language->quant($number, $singular, $plural)
$language->quant($number, $singular, $plural, $negative)
    This is generally meant to be called from inside Bracket Notation
    (which is discussed later), as in

        "Your search matched [quant,_1,document]!"

    It's for quantifying a noun (i.e., saying how much of it there is,
    while giving the correct form of it). The behavior of this method
    is handy for English and a few other Western European languages,
    and you should override it for languages where it's not suitable.
    You can feel free to read the source, but the current
    implementation is basically as this pseudocode describes:

        if $number is 0 and there's a $negative,
           return $negative;
        elsif $number is 1,
           return "1 $singular";
        elsif there's a $plural,
           return "$number $plural";
        else
           return "$number " . $singular . "s";
        #
        # ...except that we actually call numf to
        #  stringify $number before returning it.

    So for English (with Bracket Notation) "...[quant,_1,file]..." is
    fine (for 0 it returns "0 files", for 1 it returns "1 file", and
    for more it returns "2 files", etc.)

    But for "directory", you'd want "[quant,_1,directory,directories]"
    so that our elementary "quant" method doesn't think that the plural
    of "directory" is "directorys". And you might find that the output
    sounds better if you specify a negative form, as in:

        "[quant,_1,file,files,No files] matched your query.\n"

    Keep verb agreement in mind (and adjective agreement too, in
    other languages), as in:

        "[quant,_1,document] were matched.\n"

    If _1 is one, that yields "1 document were matched". An
    acceptable hack here is to do something like this (with no space
    after the comma, since items are split on the comma alone):

        "[quant,_1,document was,documents were] matched.\n"

$language->numf($number)
    This returns the given number formatted nicely according to this
    language's conventions. Maketext's default method is mostly to
    just take the normal string form of the number (applying sprintf
    "%G" for only very large numbers), and then to add commas as
    necessary. (Except that we apply "tr/,./.,/" if
    $language->{'numf_comma'} is true; that's a bit of a hack that's
    useful for languages that express two million as "2.000.000" and
    not as "2,000,000".)

    If you want anything fancier, consider overriding this with
    something that uses Number::Format, or does something else
    entirely.

    Note that numf is called by quant for stringifying all quantifying
    numbers.

$language->sprintf($format, @items)
    This is just a wrapper around Perl's normal "sprintf" function.
    It's provided so that you can use "sprintf" in Bracket Notation:

        "Couldn't access datanode [sprintf,%10s=~[%s~],_1,_2]!\n"

    returning...

        Couldn't access datanode      Stuff=[thangamabob]!

$language->language_tag()
    Currently this just takes the last bit of "ref($language)", turns
    underscores to dashes, and returns it. So if $language is an
    object of class Hee::HOO::Haw::en_us, $language->language_tag()
    returns "en-us". (Yes, the usual representation for that language
    tag is "en-US", but case is never considered meaningful in
    language-tag comparison.)

    You may override this as you like; Maketext doesn't use it for
    anything.

$language->encoding()
    Currently this isn't used for anything, but it's provided (with
    a default value of (ref($language) && $language->{'encoding'}) or
    "iso-8859-1") as a sort of suggestion that it may be
    useful/necessary to associate encodings with your language handles
    (whether on a per-class or even per-handle basis).
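
The utility methods above can be exercised directly. This sketch
assumes a hypothetical Util::L10N project with a single en_us class
whose lexicon sets _AUTO, so that each Bracket Notation string can
serve as its own key.

```perl
use strict;
use warnings;

{
    package Util::L10N;                  # hypothetical project class
    use parent 'Locale::Maketext';
}
{
    package Util::L10N::en_us;
    use parent -norequire, 'Util::L10N';
    our %Lexicon = ( _AUTO => 1 );       # any key is its own value
}

package main;

my $lh = Util::L10N->get_handle('en-US') or die "no language handle";

# quant: singular only, singular plus plural, and a form for zero
my $one  = $lh->maketext('[quant,_1,file]', 1);
my $many = $lh->maketext('[quant,_1,directory,directories]', 2);
my $none = $lh->maketext('[quant,_1,file,files,No files] matched.', 0);

# numf: commas by default, swapped with periods when numf_comma is set
my $plain = $lh->numf(2000000);
$lh->{numf_comma} = 1;
my $swapped = $lh->numf(2000000);
$lh->{numf_comma} = 0;

# sprintf from inside Bracket Notation; ~[ and ~] are literal brackets
my $node = $lh->maketext('[sprintf,%10s=~[%s~],_1,_2]',
                         'Stuff', 'thangamabob');

# language_tag comes from the class name, encoding from the handle
my $tag = $lh->language_tag;
my $enc = $lh->encoding;

print "$_\n" for $one, $many, $none, $plain, $swapped, $node, $tag, $enc;
```

Setting $lh->{numf_comma} on the handle affects only that handle; a
language class that always wants the swapped form could set it in its
init method instead.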

Language Handle Attributes and Internals
A language handle is a flyweight object -- i.e., it doesn't
(necessarily) carry any data of interest, other than just being a
member of whatever class it belongs to.

A language handle is implemented as a blessed hash. Subclasses of
yours can store whatever data you want in the hash. Currently the only
hash entry used by any crucial Maketext method is "fail", so feel free
to use anything else as you like.

Remember: Don't be afraid to read the Maketext source if there's any
point on which this documentation is unclear. This documentation is
vastly longer than the module source itself.

LANGUAGE CLASS HIERARCHIES
These are Locale::Maketext's assumptions about the class hierarchy
formed by all your language classes:

· You must have a project base class, which you load, and which you
  then use as the first argument in the call to
  YourProjClass->get_handle(...). It should derive (whether directly
  or indirectly) from Locale::Maketext. It doesn't matter how you
  name this class, although assuming this is the localization
  component of your Super Mega Program, good names for your project
  class might be SuperMegaProgram::Localization,
  SuperMegaProgram::L10N, SuperMegaProgram::I18N,
  SuperMegaProgram::International, or even
  SuperMegaProgram::Languages or SuperMegaProgram::Messages.

· Language classes are what YourProjClass->get_handle will try to
  load. It will look for them by taking each language-tag (skipping
  it if it doesn't look like a language-tag or locale-tag!), turning
  it to all lowercase, turning dashes to underscores, and appending
  it to YourProjClass . "::". So this:

      $lh = YourProjClass->get_handle(
        'en-US', 'fr', 'kon', 'i-klingon', 'i-klingon-romanized'
      );

  will try loading the classes YourProjClass::en_us (note
  lowercase!), YourProjClass::fr, YourProjClass::kon,
  YourProjClass::i_klingon and YourProjClass::i_klingon_romanized.
  (And it'll stop at the first one that actually loads.)

· I assume that each language class derives (directly or indirectly)
  from your project class, and also defines its @ISA, its %Lexicon,
  or both. But I anticipate no dire consequences if these
  assumptions do not hold.

· Language classes may derive from other language classes (although
  they should have "use Thatclassname" or "use base
  qw(...classes...)"). They may derive from the project class. They
  may derive from some other class altogether. Or, via multiple
  inheritance, they may derive from any mixture of these.

· I foresee no problems with having multiple inheritance in your
  hierarchy of language classes. (As usual, however, Perl will
  complain bitterly if you have a cycle in the hierarchy: i.e., if
  any class is its own ancestor.)
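
The tag-to-class-name rule in the second item above is simple enough to
sketch on its own. The helper class_for_tag is hypothetical and is not
part of Locale::Maketext; it only mirrors the described steps for tags
that already look like language-tags, and it does not attempt to load
anything.

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase the tag, turn dashes into
# underscores, and append the result to the project class name.
sub class_for_tag {
    my ($project_class, $tag) = @_;
    (my $suffix = lc $tag) =~ tr/-/_/;
    return $project_class . '::' . $suffix;
}

my @classes = map { class_for_tag('YourProjClass', $_) }
    'en-US', 'fr', 'kon', 'i-klingon', 'i-klingon-romanized';

print "$_\n" for @classes;
```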

ENTRIES IN EACH LEXICON
A typical %Lexicon entry is meant to signify a phrase, taking some
number (0 or more) of parameters. An entry is meant to be accessed
via a string key in $lh->maketext(key, ...parameters...), which should
return a string that is generally meant to be used for "output" to the
user -- regardless of whether this actually means printing to STDOUT,
writing to a file, or putting into a GUI widget.

While the key must be a string value (since that's a basic restriction
that Perl places on hash keys), the value in the lexicon can currently
be of several types: a defined scalar, scalarref, or coderef. The use
of these is explained above, in the section 'The "maketext" Method',
and Bracket Notation for strings is discussed in the next section.

While you can use arbitrary unique IDs for lexicon keys (like
"_min_larger_max_error"), it is often useful if an entry's key is
itself a valid value, like this example error message:

    "Minimum ([_1]) is larger than maximum ([_2])!\n",

Compare this code that uses an arbitrary ID...

    die $lh->maketext( "_min_larger_max_error", $min, $max )
     if $min > $max;

...to this code that uses a key-as-value:

    die $lh->maketext(
     "Minimum ([_1]) is larger than maximum ([_2])!\n",
     $min, $max
    ) if $min > $max;

The second is, in short, more readable. In particular, it's obvious
that the number of parameters you're feeding to that phrase (two) is
the number of parameters that it wants to be fed. (Since you see _1
and a _2 being used in the key there.)

Also, once a project is otherwise complete and you start to localize
it, you can scrape together all the various keys you use, and pass them
to a translator; and then the translator's work will go faster if what
he's presented is this:

    "Minimum ([_1]) is larger than maximum ([_2])!\n"
     => "",   # fill in something here, Jacques!

rather than this more cryptic mess:

    "_min_larger_max_error"
     => "",   # fill in something here, Jacques

I think that keys as lexicon values make the completed lexicon entries
more readable:

    "Minimum ([_1]) is larger than maximum ([_2])!\n"
     => "Le minimum ([_1]) est plus grand que le maximum ([_2])!\n",

Also, having valid values as keys becomes very useful if you set up an
_AUTO lexicon. _AUTO lexicons are discussed in a later section.

I almost always use keys that are themselves valid lexicon values. One
notable exception is when the value is quite long. For example, to get
the screenful of data that a command-line program might return when
given an unknown switch, I often just use a brief, self-explanatory key
such as "_USAGE_MESSAGE". At that point I then go and immediately
define that lexicon entry in the ProjectClass::L10N::en lexicon (since
English is always my "project language"):

    '_USAGE_MESSAGE' => <<'EOSTUFF',
    ...long long message...
    EOSTUFF

and then I can use it as:

    getopt('oDI', \%opts) or die $lh->maketext('_USAGE_MESSAGE');

Incidentally, note that each class's %Lexicon inherits-and-extends the
lexicons in its superclasses. This is not because these are special
hashes per se, but because you access them via the "maketext" method,
which looks for entries across all the %Lexicon hashes in a language
class and all its ancestor classes. (This is because the idea of
"class data" isn't directly implemented in Perl, but is instead left to
individual class-systems to implement as they see fit.)
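
The inherit-and-extend behavior can be seen with two language classes,
one deriving from the other. The Inherit::L10N classes and their
entries are hypothetical.

```perl
use strict;
use warnings;

{
    package Inherit::L10N;               # hypothetical project class
    use parent 'Locale::Maketext';
}
{
    package Inherit::L10N::en;
    use parent -norequire, 'Inherit::L10N';
    our %Lexicon = (
        'Hello'     => 'Hello',
        'Favourite' => 'Favourite',
    );
}
{
    # en_us derives from en and only lists what differs.
    package Inherit::L10N::en_us;
    use parent -norequire, 'Inherit::L10N::en';
    our %Lexicon = ( 'Favourite' => 'Favorite' );
}

package main;

my $lh = Inherit::L10N->get_handle('en-US') or die "no language handle";

my $own       = $lh->maketext('Favourite');  # found in en_us
my $inherited = $lh->maketext('Hello');      # found in the ancestor en

print "$own\n$inherited\n";
```

Neither %Lexicon hash is modified to contain the other's entries; the
lookup simply walks the class hierarchy.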

Note that you may have things stored in a lexicon besides just phrases
for output: for example, if your program takes input from the
keyboard, asking a "(Y/N)" question, you probably need to know what the
equivalent of "Y[es]/N[o]" is in whatever language. You probably also
need to know what the equivalents of the answers "y" and "n" are. You
can store that information in the lexicon (say, under the keys
"~answer_y" and "~answer_n", and the long forms as "~answer_yes" and
"~answer_no", where "~" is just an ad-hoc character meant to indicate
to programmers/translators that these are not phrases for output).

Or instead of storing this in the language class's lexicon, you can
(and, in some cases, really should) represent the same bit of knowledge
as code in a method in the language class. (That leaves a tidy
distinction between the lexicon as the things we know how to say, and
the rest of the things in the language class as things that we know how
to do.) Consider this example of a processor for responses to French
"oui/non" questions:

    sub y_or_n {
      return undef unless defined $_[1] and length $_[1];
      my $answer = lc $_[1];  # smash case
      return 1 if $answer eq 'o' or $answer eq 'oui';
      return 0 if $answer eq 'n' or $answer eq 'non';
      return undef;
    }

...which you'd then call in a construct like this:

    my $response;
    until(defined $response) {
      print $lh->maketext("Open the pod bay door (y/n)? ");
      $response = $lh->y_or_n( get_input_from_keyboard_somehow() );
    }
    if($response) { $pod_bay_door->open()         }
    else          { $pod_bay_door->leave_closed() }

Other data worth storing in a lexicon might be things like filenames
for language-targeted resources:

    ...
    "_main_splash_png"
      => "/styles/en_us/main_splash.png",
    "_main_splash_imagemap"
      => "/styles/en_us/main_splash.incl",
    "_general_graphics_path"
      => "/styles/en_us/",
    "_alert_sound"
      => "/styles/en_us/hey_there.wav",
    "_forward_icon"
     => "left_arrow.png",
    "_backward_icon"
     => "right_arrow.png",
    # In some other languages, left equals
    #  BACKwards, and right is FOREwards.
    ...

You might want to do the same thing for expressing key bindings or the
like (since hardwiring "q" as the binding for the function that quits a
screen/menu/program is useful only if your language happens to
associate "q" with "quit"!)

BRACKET NOTATION
Bracket Notation is a crucial feature of Locale::Maketext. I mean
Bracket Notation to provide a replacement for the use of sprintf
formatting. Everything you do with Bracket Notation could be done with
a sub block, but bracket notation is meant to be much more concise.

Bracket Notation is like a miniature "template" system (in the sense
of Text::Template, not in the sense of C++ templates), where normal
text is passed through basically as is, but text in special regions is
specially interpreted. In Bracket Notation, you use square brackets
("[...]"), not curly braces ("{...}"), to note sections that are
specially interpreted.

For example, here all the areas that are taken literally are underlined
with a "^", and all the in-bracket special regions are underlined with
an X:

    "Minimum ([_1]) is larger than maximum ([_2])!\n",
     ^^^^^^^^^ XX ^^^^^^^^^^^^^^^^^^^^^^^^^^ XX ^^^^

When that string is compiled from bracket notation into a real Perl
sub, it's basically turned into:

    sub {
      my $lh = $_[0];
      my @params = @_;
      return join '',
        "Minimum (",
        ...some code here...
        ") is larger than maximum (",
        ...some code here...
        ")!\n",
    }
    # to be called by $lh->maketext(KEY, params...)

In other words, text outside bracket groups is turned into string
literals. Text in brackets is rather more complex, and currently
follows these rules:

· Bracket groups that are empty, or which consist only of whitespace,
  are ignored. (Examples: "[]", "[ ]", or a [ and a ] with
  returns and/or tabs and/or spaces between them.)

  Otherwise, each group is taken to be a comma-separated group of
  items, and each item is interpreted as follows:

· An item that is "_digits" or "_-digits" is interpreted as
  $_[value]. I.e., "_1" is interpreted as $_[1], and "_-3" is
  interpreted as $_[-3] (in which case @_ should have at least three
  elements in it). Note that $_[0] is the language handle, and is
  typically not named directly.

· An item "_*" is interpreted to mean "all of @_ except $_[0]".
  I.e., @_[1..$#_]. Note that this is an empty list in the case of
  calls like $lh->maketext(key) where there are no parameters (except
  $_[0], the language handle).

· Otherwise, each item is interpreted as a string literal.

The group as a whole is interpreted as follows:

· If the first item in a bracket group looks like a method name, then
  that group is interpreted like this:

      $lh->that_method_name(
        ...rest of items in this group...
      ),

· If the first item in a bracket group is "*", it's taken as
  shorthand for the very commonly called "quant" method. Similarly, if
  the first item in a bracket group is "#", it's taken to be
  shorthand for "numf".

· If the first item in a bracket group is the empty-string, or "_*"
  or "_digits" or "_-digits", then that group is interpreted as just
  the interpolation of all its items:

      join('',
        ...rest of items in this group...
      ),

  Examples: "[_1]" and "[,_1]", which are synonymous; and
  "[,ID-(,_4,-,_2,)]", which compiles as join "", "ID-(", $_[4],
  "-", $_[2], ")".

· Otherwise this bracket group is invalid. For example, in the group
  "[!@#,whatever]", the first item "!@#" is neither the empty-string,
  "_number", "_-number", "_*", nor a valid method name; and so
  Locale::Maketext will throw an exception if you try compiling an
  expression containing this bracket group.
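
The rules above can be checked against a few strings. This sketch uses
a hypothetical BN::L10N project whose English lexicon sets _AUTO, so
each Bracket Notation string is used as its own key.

```perl
use strict;
use warnings;

{
    package BN::L10N;                    # hypothetical project class
    use parent 'Locale::Maketext';
}
{
    package BN::L10N::en;
    use parent -norequire, 'BN::L10N';
    our %Lexicon = ( _AUTO => 1 );
}

package main;

my $lh = BN::L10N->get_handle('en') or die "no language handle";

# Plain parameter interpolation
my $minmax = $lh->maketext(
    'Minimum ([_1]) is larger than maximum ([_2])!', 9, 5);

# Empty first item: the group is just the join of its items
my $id = $lh->maketext('[,ID-(,_4,-,_2,)]', 'a', 'b', 'c', 'd');

# _* is every parameter; _-1 counts from the end
my $all  = $lh->maketext('All: [_*]', 'x', 'y', 'z');
my $last = $lh->maketext('Last: [_-1]', 'x', 'y', 'z');

# "*" is shorthand for quant, "#" for numf
my $files  = $lh->maketext('[*,_1,file]', 3);
my $number = $lh->maketext('[#,_1]', 1234567);

print "$_\n" for $minmax, $id, $all, $last, $files, $number;
```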

Note, incidentally, that items in each group are comma-separated, not
"/\s*,\s*/"-separated. That is, you might expect that this bracket
group:

    "Hoohah [foo, _1 , bar ,baz]!"

would compile to this:

    sub {
      my $lh = $_[0];
      return join '',
        "Hoohah ",
        $lh->foo( $_[1], "bar", "baz"),
        "!",
    }

But it actually compiles as this:

    sub {
      my $lh = $_[0];
      return join '',
        "Hoohah ",
        $lh->foo(" _1 ", " bar ", "baz"),  # note the <space> in " bar "
        "!",
    }
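
The effect of stray whitespace is easy to observe with a method that
makes its arguments visible. In this sketch the Space::L10N classes and
the "show" method are hypothetical; "show" simply wraps each argument
in angle brackets.

```perl
use strict;
use warnings;

{
    package Space::L10N;                 # hypothetical project class
    use parent 'Locale::Maketext';
}
{
    package Space::L10N::en;
    use parent -norequire, 'Space::L10N';
    our %Lexicon = ( _AUTO => 1 );

    # Hypothetical method: wrap each argument in angle brackets.
    sub show {
        my ($lh, @items) = @_;
        return join '', map { "<$_>" } @items;
    }
}

package main;

my $lh = Space::L10N->get_handle('en') or die "no language handle";

# With spaces around the items, " _1 " is a string literal,
# not a reference to the first parameter.
my $spaced = $lh->maketext('Hoohah [show, _1 , bar ,baz]!', 'X');

# Without the spaces, _1 is replaced by the parameter.
my $tight = $lh->maketext('Hoohah [show,_1,bar,baz]!', 'X');

print "$spaced\n$tight\n";
```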

In the notation discussed so far, the characters "[" and "]" are given
special meaning, for opening and closing bracket groups, and "," has a
special meaning inside bracket groups, where it separates items in the
group. This raises the question of how you'd express a literal "[" or
"]" in a Bracket Notation string, and how you'd express a literal comma
inside a bracket group. For this purpose I've adopted "~" (tilde) as
an escape character: "~[" means a literal '[' character anywhere in
Bracket Notation (i.e., regardless of whether you're in a bracket group
or not), and ditto for "~]" meaning a literal ']', and "~," meaning a
literal comma. (Although "," means a literal comma outside of bracket
groups -- it's only inside bracket groups that commas are special.)

And on the off chance you need a literal tilde in a bracket expression,
you get it with "~~".

Currently, an unescaped "~" before a character other than a bracket or
a comma is taken to mean just a "~" and that character. I.e., "~X"
means the same as "~~X" -- i.e., one literal tilde, and then one
literal "X". However, by using "~X", you are assuming that no future
version of Maketext will use "~X" as a magic escape sequence. In
practice this is not a great problem, since first off you can just
write "~~X" and not worry about it; second off, I doubt I'll add lots
of new magic characters to bracket notation; and third off, you aren't
likely to want literal "~" characters in your messages anyway, since
it's not a character with wide use in natural language text.
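
A short sketch of the three escapes, again assuming a hypothetical
project (Tilde::L10N) whose English lexicon sets _AUTO so that each
string is its own key:

```perl
use strict;
use warnings;

{
    package Tilde::L10N;                 # hypothetical project class
    use parent 'Locale::Maketext';
}
{
    package Tilde::L10N::en;
    use parent -norequire, 'Tilde::L10N';
    our %Lexicon = ( _AUTO => 1 );
}

package main;

my $lh = Tilde::L10N->get_handle('en') or die "no language handle";

# ~[ and ~] give literal brackets around an ordinary [_1] group
my $brackets = $lh->maketext('Price ~[[_1]~]', 5);

# ~, gives a literal comma inside a bracket group
my $comma = $lh->maketext('[,a~,b]');

# ~~ gives a literal tilde
my $tilde = $lh->maketext('50~~60');

print "$brackets\n$comma\n$tilde\n";
```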

Brackets must be balanced -- every openbracket must have one matching
closebracket, and vice versa. So these are all invalid:

    "I ate [quant,_1,rhubarb pie."
    "I ate [quant,_1,rhubarb pie[."
    "I ate quant,_1,rhubarb pie]."
    "I ate quant,_1,rhubarb pie[."

Currently, bracket groups do not nest. That is, you cannot say:

    "Foo [bar,baz,[quux,quuux]]\n";

If you need a notation that's that powerful, use normal Perl:

    %Lexicon = (
      ...
      "some_key" => sub {
        my $lh = $_[0];
        join '',
          "Foo ",
          $lh->bar('baz', $lh->quux('quuux')),
          "\n",
      },
      ...
    );

Or write the "bar" method so you don't need to pass it the output from
calling quux.

I do not anticipate that you will need (or particularly want) to nest
bracket groups, but you are welcome to email me with convincing
(real-life) arguments to the contrary.

746 If maketext goes to look in an individual %Lexicon for an entry for key
747 (where key does not start with an underscore), and sees none, but does
748 see an entry of "_AUTO" => some_true_value, then we actually define
749 $Lexicon{key} = key right then and there, and then use that value as if
750 it had been there all along. This happens before we even look in any
751 superclass %Lexicons!
752
753 (This is meant to be somewhat like the AUTOLOAD mechanism in Perl's
754 function call system -- or, looked at another way, like the AutoLoader
755 module.)
756
757 I can picture all sorts of circumstances where you just do not want
758 lookup to be able to fail (since failing normally means that maketext
759 throws a "die", although see the next section for greater control over
760 that). But here's one circumstance where _AUTO lexicons are meant to
761 be especially useful:
762
763 As you're writing an application, you decide as you go what messages
764 you need to emit. Normally you'd go to write this:
765
766 if(-e $filename) {
767 go_process_file($filename)
768 } else {
769 print qq{Couldn't find file "$filename"!\n};
770 }
771
772 but since you anticipate localizing this, you write:
773
774 use ThisProject::I18N;
775 my $lh = ThisProject::I18N->get_handle();
776 # For the moment, assume that things are set up so
777 # that we load class ThisProject::I18N::en
778 # and that that's the class that $lh belongs to.
779 ...
780 if(-e $filename) {
781 go_process_file($filename)
782 } else {
783 print $lh->maketext(
784 qq{Couldn't find file "[_1]"!\n}, $filename
785 );
786 }
787
788 Now, right after you've just written the above lines, you'd normally
789 have to go open the file ThisProject/I18N/en.pm, and immediately add an
790 entry:
791
792 "Couldn't find file \"[_1]\"!\n"
793 => "Couldn't find file \"[_1]\"!\n",
794
795 But I consider that somewhat of a distraction from the work of getting
796 the main code working -- to say nothing of the fact that I often have
797 to play with the program a few times before I can decide exactly what
798 wording I want in the messages (which in this case would require me to
799 go changing three lines of code: the call to maketext with that key,
800 and then the two lines in ThisProject/I18N/en.pm).
801
However, if you set "_AUTO => 1" in the %Lexicon in
ThisProject/I18N/en.pm (assuming that English (en) is the language that
all your programmers will be using for this project's internal message
keys), then you don't ever have to go adding lines like this
806
807 "Couldn't find file \"[_1]\"!\n"
808 => "Couldn't find file \"[_1]\"!\n",
809
810 to ThisProject/I18N/en.pm, because if _AUTO is true there, then just
811 looking for an entry with the key "Couldn't find file \"[_1]\"!\n" in
812 that lexicon will cause it to be added, with that value!
813
814 Note that the reason that keys that start with "_" are immune to _AUTO
815 isn't anything generally magical about the underscore character -- I
816 just wanted a way to have most lexicon keys be autoable, except for
817 possibly a few, and I arbitrarily decided to use a leading underscore
818 as a signal to distinguish those few.
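
The behavior described in this section can be sketched in one file, assuming the ThisProject::I18N class names used above. The filename value and the "_not_autoable" key are made up for the demonstration.

```perl
use strict;
use warnings;

package ThisProject::I18N;
use parent 'Locale::Maketext';

package ThisProject::I18N::en;
use parent -norequire, 'ThisProject::I18N';

# No phrases are listed; _AUTO lets any key that does not start
# with "_" serve as its own value.
our %Lexicon = ( '_AUTO' => 1 );

package main;

my $lh = ThisProject::I18N->get_handle('en') or die "What language?";

my $filename = 'report.txt';   # made-up value for the demonstration
my $message  = $lh->maketext(
    qq{Couldn't find file "[_1]"!\n}, $filename
);
print $message;

# A key with a leading underscore is not auto-made, so with no
# "fail" attribute set, this lookup throws an exception.
my $made = eval { $lh->maketext('_not_autoable'); 1 };
print "Lookup of _not_autoable failed, as expected\n" unless $made;
```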
819
If you call $lh->maketext(key, ...parameters...), and there's no entry
key in $lh's class's %Lexicon, nor in any superclass %Lexicon hash, and
if we can't auto-make key (because either it starts with a "_", or
because none of its lexicons have "_AUTO => 1"), then we have failed
to find a normal way to maketext key. What happens in these failure
conditions depends on the $lh object's "fail" attribute.
827
If the language handle has no "fail" attribute, maketext will simply
throw an exception (i.e., it calls "die", mentioning the key whose
lookup failed, and naming the line number where the calling
$lh->maketext(key,...) was).
832
833 If the language handle has a "fail" attribute whose value is a coderef,
834 then $lh->maketext(key,...params...) gives up and calls:
835
836 return $that_subref->($lh, $key, @params);
837
838 Otherwise, the "fail" attribute's value should be a string denoting a
839 method name, so that $lh->maketext(key,...params...) can give up with:
840
return $lh->$that_method_name($key, @params);
842
843 The "fail" attribute can be accessed with the "fail_with" method:
844
845 # Set to a coderef:
846 $lh->fail_with( \&failure_handler );
847
848 # Set to a method name:
849 $lh->fail_with( 'failure_method' );
850
851 # Set to nothing (i.e., so failure throws a plain exception)
852 $lh->fail_with( undef );
853
854 # Get the current value
855 $handler = $lh->fail_with();
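
Here is a small sketch that exercises each of those calls in turn. The Demo::L10N class names, the "failure_method" body, and the strings the handlers return are all hypothetical; only "fail_with" and the calling conventions described above come from Maketext.

```perl
use strict;
use warnings;

package Demo::L10N;                 # hypothetical project class
use parent 'Locale::Maketext';

package Demo::L10N::en;             # hypothetical language class
use parent -norequire, 'Demo::L10N';
our %Lexicon = ( 'Hello' => 'Hello' );

# Used below by name; receives the failed key and the parameters.
sub failure_method {
    my ($lh, $key, @params) = @_;
    return "<<$key>>";
}

package main;

my $lh = Demo::L10N->get_handle('en') or die "What language?";

# 1) A coderef handler: called as $code->($lh, $key, @params).
$lh->fail_with( sub {
    my ($failing_lh, $key, @params) = @_;
    return "[no entry for '$key']";
} );
my $via_coderef = $lh->maketext('Goodbye');

# 2) A method name: called as $lh->failure_method($key, @params).
$lh->fail_with('failure_method');
my $via_method = $lh->maketext('Goodbye');
my $current    = $lh->fail_with();   # reads back the method name

# 3) No handler: lookup failure is a plain exception again.
$lh->fail_with(undef);
my $survived = eval { $lh->maketext('Goodbye'); 1 };

print "$via_coderef\n$via_method\n$current\n";
print "died without a handler\n" unless $survived;
```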
856
857 Now, as to what you may want to do with these handlers: Maybe you'd
858 want to log what key failed for what class, and then die. Maybe you
859 don't like "die" and instead you want to send the error message to
860 STDOUT (or wherever) and then merely "exit()".
861
862 Or maybe you don't want to "die" at all! Maybe you could use a handler
863 like this:
864
# Make all lookups fall back onto an English value,
# but only after we log it for later fingerpointing.
my $lh_backup = ThisProject->get_handle('en');
open(LEX_FAIL_LOG, ">>wherever/lex.log") || die "GNAARGH $!";
sub lex_fail {
    # Collect the remaining arguments in @params, which is what
    # gets passed along to the backup handle below.
    my($failing_lh, $key, @params) = @_;
    print LEX_FAIL_LOG scalar(localtime), "\t",
        ref($failing_lh), "\t", $key, "\n";
    return $lh_backup->maketext($key, @params);
}
875
Some users have said that they think this whole mechanism of
having a "fail" attribute at all seems a rather pointless
complication. But I want Locale::Maketext to be usable for software
879 projects of any scale and type; and different software projects have
880 different ideas of what the right thing is to do in failure conditions.
881 I could simply say that failure always throws an exception, and that if
882 you want to be careful, you'll just have to wrap every call to
883 $lh->maketext in an eval { }. However, I want programmers to reserve
884 the right (via the "fail" attribute) to treat lookup failure as
885 something other than an exception of the same level of severity as a
886 config file being unreadable, or some essential resource being
887 inaccessible.
888
889 One possibly useful value for the "fail" attribute is the method name
890 "failure_handler_auto". This is a method defined in the class
891 Locale::Maketext itself. You set it with:
892
893 $lh->fail_with('failure_handler_auto');
894
895 Then when you call $lh->maketext(key, ...parameters...) and there's no
896 key in any of those lexicons, maketext gives up with
897
898 return $lh->failure_handler_auto($key, @params);
899
900 But failure_handler_auto, instead of dying or anything, compiles $key,
901 caching it in
902
$lh->{'failure_lex'}{$key} = $compiled
904
and then calls the compiled value, and returns that. (I.e., if $key
looks like bracket notation, $compiled is a sub, and we return
&{$compiled}($lh, @params); but if $key is just a plain string, we just
return that.)
909
The effect of using "failure_handler_auto" is like an _AUTO lexicon,
except that 1) it compiles $key even if it starts with "_", and 2) you
have a record in the new hashref $lh->{'failure_lex'} of all the keys
that have failed for this object. This should avoid your program dying
-- as long as your keys aren't actually invalid as bracket code, and as
long as they don't try calling methods that don't exist.
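
A short sketch of "failure_handler_auto" in use follows. The Demo::L10N class names and both keys are hypothetical; the lexicon is deliberately empty so that every lookup fails and reaches the handler.

```perl
use strict;
use warnings;

package Demo::L10N;                 # hypothetical project class
use parent 'Locale::Maketext';

package Demo::L10N::en;             # hypothetical language class
use parent -norequire, 'Demo::L10N';
our %Lexicon = ();                  # deliberately empty, and not _AUTO

package main;

my $lh = Demo::L10N->get_handle('en') or die "What language?";
$lh->fail_with('failure_handler_auto');

# Neither key is in the lexicon, so both go through the handler.
# The second starts with "_", which an _AUTO lexicon would refuse.
my $plain     = $lh->maketext('Nothing to do.');
my $bracketed = $lh->maketext('_note: [_1] items skipped', 3);

print "$plain\n$bracketed\n";

# Every key that failed is now recorded on the handle.
my @failed = sort keys %{ $lh->{'failure_lex'} };
print "failed key: $_\n" for @failed;
```

Dumping the keys of $lh->{'failure_lex'} at program exit is one cheap way to find phrases that still need lexicon entries.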
916
"failure_handler_auto" may not be exactly what you want, but I hope it
918 at least shows you that maketext failure can be mitigated in any number
919 of very flexible ways. If you can formalize exactly what you want, you
920 should be able to express that as a failure handler. You can even make
921 it default for every object of a given class, by setting it in that
922 class's init:
923
924 sub init {
925 my $lh = $_[0]; # a newborn handle
926 $lh->SUPER::init();
927 $lh->fail_with('my_clever_failure_handler');
928 return;
929 }
930 sub my_clever_failure_handler {
...your clever things here...
932 }
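
To make the init idiom concrete, here is a runnable sketch under assumed names: Demo::L10N, its "en" subclass, and the handler's policy of returning the key wrapped in question marks are all invented for illustration.

```perl
use strict;
use warnings;

package Demo::L10N;                 # hypothetical project class
use parent 'Locale::Maketext';

# Runs for every newly constructed handle of any language class.
sub init {
    my $lh = $_[0];
    $lh->SUPER::init();
    $lh->fail_with('my_clever_failure_handler');
    return;
}

# Illustrative policy only: show the key itself, visibly marked.
sub my_clever_failure_handler {
    my ($lh, $key, @params) = @_;
    return "?? $key ??";
}

package Demo::L10N::en;             # hypothetical language class
use parent -norequire, 'Demo::L10N';
our %Lexicon = ( 'Saved.' => 'Saved.' );

package main;

my $lh = Demo::L10N->get_handle('en') or die "What language?";
my $known   = $lh->maketext('Saved.');     # found in the lexicon
my $unknown = $lh->maketext('Deleted.');   # falls to the handler
print "$known\n$unknown\n";
```

Because the handler is set in the project class's init, every language class that inherits from it gets the same failure policy without further setup.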
933
935 Here is a brief checklist on how to use Maketext to localize
936 applications:
937
938 · Decide what system you'll use for lexicon keys. If you insist, you
939 can use opaque IDs (if you're nostalgic for "catgets"), but I have
940 better suggestions in the section "Entries in Each Lexicon", above.
941 Assuming you opt for meaningful keys that double as values (like
942 "Minimum ([_1]) is larger than maximum ([_2])!\n"), you'll have to
943 settle on what language those should be in. For the sake of
944 argument, I'll call this English, specifically American English,
945 "en-US".
946
947 · Create a class for your localization project. This is the name of
948 the class that you'll use in the idiom:
949
950 use Projname::L10N;
951 my $lh = Projname::L10N->get_handle(...) || die "Language?";
952
953 Assuming you call your class Projname::L10N, create a class
954 consisting minimally of:
955
956 package Projname::L10N;
957 use base qw(Locale::Maketext);
958 ...any methods you might want all your languages to share...
959
# And, assuming you want the base class to be an _AUTO lexicon,
# as is discussed a few sections up:
%Lexicon = ( '_AUTO' => 1 );

1;
964
965 · Create a class for the language your internal keys are in. Name
966 the class after the language-tag for that language, in lowercase,
967 with dashes changed to underscores. Assuming your project's first
968 language is US English, you should call this Projname::L10N::en_us.
969 It should consist minimally of:
970
971 package Projname::L10N::en_us;
972 use base qw(Projname::L10N);
973 %Lexicon = (
974 '_AUTO' => 1,
975 );
976 1;
977
(For the rest of this section, I'll assume that this "first
language class" of Projname::L10N::en_us has an _AUTO lexicon.)
980
981 · Go and write your program. Everywhere in your program where you
982 would say:
983
984 print "Foobar $thing stuff\n";
985
986 instead do it thru maketext, using no variable interpolation in the
987 key:
988
989 print $lh->maketext("Foobar [_1] stuff\n", $thing);
990
991 If you get tired of constantly saying "print $lh->maketext",
992 consider making a functional wrapper for it, like so:
993
994 use Projname::L10N;
995 use vars qw($lh);
996 $lh = Projname::L10N->get_handle(...) || die "Language?";
997 sub pmt (@) { print( $lh->maketext(@_)) }
998 # "pmt" is short for "Print MakeText"
999 $Carp::Verbose = 1;
# so if maketext fails, we see who made the call to pmt
1001
1002 Besides whole phrases meant for output, anything language-dependent
1003 should be put into the class Projname::L10N::en_us, whether as
1004 methods, or as lexicon entries -- this is discussed in the section
1005 "Entries in Each Lexicon", above.
1006
1007 · Once the program is otherwise done, and once its localization for
1008 the first language works right (via the data and methods in
1009 Projname::L10N::en_us), you can get together the data for
1010 translation. If your first language lexicon isn't an _AUTO
1011 lexicon, then you already have all the messages explicitly in the
1012 lexicon (or else you'd be getting exceptions thrown when you call
1013 $lh->maketext to get messages that aren't in there). But if you
1014 were (advisedly) lazy and are using an _AUTO lexicon, then you've
1015 got to make a list of all the phrases that you've so far been
1016 letting _AUTO generate for you. There are very many ways to
1017 assemble such a list. The most straightforward is to simply grep
1018 the source for every occurrence of "maketext" (or calls to wrappers
1019 around it, like the above "pmt" function), and to log the following
1020 phrase.
1021
1022 · You may at this point want to consider whether your base class
(Projname::L10N), from which all lexicons inherit
1024 (Projname::L10N::en, Projname::L10N::es, etc.), should be an _AUTO
1025 lexicon. It may be true that in theory, all needed messages will
1026 be in each language class; but in the presumably unlikely or
1027 "impossible" case of lookup failure, you should consider whether
1028 your program should throw an exception, emit text in English (or
1029 whatever your project's first language is), or some more complex
1030 solution as described in the section "Controlling Lookup Failure",
1031 above.
1032
1033 · Submit all messages/phrases/etc. to translators.
1034
1035 (You may, in fact, want to start with localizing to one other
1036 language at first, if you're not sure that you've properly
1037 abstracted the language-dependent parts of your code.)
1038
1039 Translators may request clarification of the situation in which a
1040 particular phrase is found. For example, in English we are
1041 entirely happy saying "n files found", regardless of whether we
1042 mean "I looked for files, and found n of them" or the rather
1043 distinct situation of "I looked for something else (like lines in
1044 files), and along the way I saw n files." This may involve
1045 rethinking things that you thought quite clear: should "Edit" on a
1046 toolbar be a noun ("editing") or a verb ("to edit")? Is there
1047 already a conventionalized way to express that menu option,
1048 separate from the target language's normal word for "to edit"?
1049
1050 In all cases where the very common phenomenon of quantification
1051 (saying "N files", for any value of N) is involved, each translator
1052 should make clear what dependencies the number causes in the
1053 sentence. In many cases, dependency is limited to words adjacent
1054 to the number, in places where you might expect them ("I found
1055 the-?PLURAL N empty-?PLURAL directory-?PLURAL"), but in some cases
1056 there are unexpected dependencies ("I found-?PLURAL ..."!) as well
as long-distance dependencies ("The N directory-?PLURAL could not be
1058 deleted-?PLURAL"!).
1059
1060 Remind the translators to consider the case where N is 0: "0 files
1061 found" isn't exactly natural-sounding in any language, but it may
1062 be unacceptable in many -- or it may condition special kinds of
1063 agreement (similar to English "I didN'T find ANY files").
1064
1065 Remember to ask your translators about numeral formatting in their
1066 language, so that you can override the "numf" method as
1067 appropriate. Typical variables in number formatting are: what to
1068 use as a decimal point (comma? period?); what to use as a thousands
1069 separator (space? nonbreaking space? comma? period? small middot?
1070 prime? apostrophe?); and even whether the so-called "thousands
1071 separator" is actually for every third digit -- I've heard reports
1072 of two hundred thousand being expressible as "2,00,000" for some
1073 Indian (Subcontinental) languages, besides the less surprising
1074 "200 000", "200.000", "200,000", and "200'000". Also, using a set
1075 of numeral glyphs other than the usual ASCII "0"-"9" might be
1076 appreciated, as via "tr/0-9/\x{0966}-\x{096F}/" for getting digits
1077 in Devanagari script (for Hindi, Konkani, others).
1078
1079 The basic "quant" method that Locale::Maketext provides should be
1080 good for many languages. For some languages, it might be useful to
1081 modify it (or its constituent "numerate" method) to take a plural
1082 form in the two-argument call to "quant" (as in "[quant,_1,files]")
1083 if it's all-around easier to infer the singular form from the
1084 plural, than to infer the plural form from the singular.
1085
1086 But for other languages (as is discussed at length in
Locale::Maketext::TPJ13), simple "quant"/"numerate" is not enough.
1088 For the particularly problematic Slavic languages, what you may
1089 need is a method which you provide with the number, the citation
1090 form of the noun to quantify, and the case and gender that the
1091 sentence's syntax projects onto that noun slot. The method would
1092 then be responsible for determining what grammatical number that
1093 numeral projects onto its noun phrase, and what case and gender it
1094 may override the normal case and gender with; and then it would
1095 look up the noun in a lexicon providing all needed inflected forms.
1096
1097 · You may also wish to discuss with the translators the question of
1098 how to relate different subforms of the same language tag,
considering how this interacts with "get_handle"'s treatment of these.
1100 For example, if a user accepts interfaces in "en, fr", and you have
1101 interfaces available in "en-US" and "fr", what should they get?
1102 You may wish to resolve this by establishing that "en" and "en-US"
1103 are effectively synonymous, by having one class zero-derive from
1104 the other.
1105
1106 For some languages this issue may never come up (Danish is rarely
1107 expressed as "da-DK", but instead is just "da"). And for other
1108 languages, the whole concept of a "generic" form may verge on being
1109 uselessly vague, particularly for interfaces involving voice media
1110 in forms of Arabic or Chinese.
1111
1112 · Once you've localized your program/site/etc. for all desired
1113 languages, be sure to show the result (whether live, or via
1114 screenshots) to the translators. Once they approve, make every
1115 effort to have it then checked by at least one other speaker of
1116 that language. This holds true even when (or especially when) the
1117 translation is done by one of your own programmers. Some kinds of
1118 systems may be harder to find testers for than others, depending on
1119 the amount of domain-specific jargon and concepts involved -- it's
1120 easier to find people who can tell you whether they approve of your
1121 translation for "delete this message" in an email-via-Web
1122 interface, than to find people who can give you an informed opinion
1123 on your translation for "attribute value" in an XML query tool's
1124 interface.
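
The checklist above can be condensed into a one-file sketch. It assumes the Projname::L10N names used in the checklist; the German class, its single lexicon entry, and its "numf" override are hypothetical stand-ins for what a translator might supply. A real project would put each class in its own .pm file.

```perl
use strict;
use warnings;

package Projname::L10N;
use parent 'Locale::Maketext';

# First-language class: an _AUTO lexicon, so keys double as values.
package Projname::L10N::en_us;
use parent -norequire, 'Projname::L10N';
our %Lexicon = ( '_AUTO' => 1 );

# "en" zero-derives from "en_us": same lexicon and methods, new name.
package Projname::L10N::en;
use parent -norequire, 'Projname::L10N::en_us';

# A hypothetical German class, standing in for a translator's work.
package Projname::L10N::de;
use parent -norequire, 'Projname::L10N';
our %Lexicon = (
    "[quant,_1,file,files,No files] found.\n"
        => "[quant,_1,Datei,Dateien,Keine Dateien] gefunden.\n",
);

# Override numf to group digits with "." instead of ",".
# (Integers only, to keep the sketch short.)
sub numf {
    my ($lh, $num) = @_;
    1 while $num =~ s/^(-?\d+)(\d{3})/$1.$2/;
    return $num;
}

package main;

my $key = "[quant,_1,file,files,No files] found.\n";

my $en = Projname::L10N->get_handle('en-US') or die "Language?";
my $de = Projname::L10N->get_handle('de')    or die "Language?";

my @english = map { $en->maketext($key, $_) } (0, 1, 12345);
my @german  = map { $de->maketext($key, $_) } (0, 1, 12345);
print @english, @german;

# A plain "en" request resolves to the zero-derived class.
my $plain_en = Projname::L10N->get_handle('en') or die "Language?";
print ref($plain_en), "\n";
```

The third form given to "quant" covers the N = 0 case, and the "numf" override changes only how the number is written, leaving the lexicon entry untouched.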
1125
1127 I recommend reading all of these:
1128
Locale::Maketext::TPJ13 -- my article in The Perl Journal about
Maketext. It explains many important concepts underlying
Locale::Maketext's design, and offers some insight into why Maketext is
better than the plain old approach of having message catalogs that are
just databases of sprintf formats.
1134
1135 File::Findgrep is a sample application/module that uses
1136 Locale::Maketext to localize its messages. For a larger
1137 internationalized system, see also Apache::MP3.
1138
1139 I18N::LangTags.
1140
1141 Win32::Locale.
1142
RFC 3066, Tags for the Identification of Languages, is at
1144 http://sunsite.dk/RFC/rfc/rfc3066.html
1145
1146 RFC 2277, IETF Policy on Character Sets and Languages is at
1147 http://sunsite.dk/RFC/rfc/rfc2277.html -- much of it is just things of
1148 interest to protocol designers, but it explains some basic concepts,
1149 like the distinction between locales and language-tags.
1150
The manual for GNU "gettext". The gettext dist is available in
"ftp://prep.ai.mit.edu/pub/gnu/" -- get a recent gettext tarball and
look in its "doc/" directory; there's an easily browsable HTML version
in there. The gettext documentation asks lots of questions worth
thinking about, even if some of its answers are sometimes wonky,
particularly where it starts talking about pluralization.
1157
The Locale/Maketext.pm source. Observe that the module is much shorter
1159 than its documentation!
1160
1162 Copyright (c) 1999-2004 Sean M. Burke. All rights reserved.
1163
1164 This library is free software; you can redistribute it and/or modify it
1165 under the same terms as Perl itself.
1166
1167 This program is distributed in the hope that it will be useful, but
1168 without any warranty; without even the implied warranty of
1169 merchantability or fitness for a particular purpose.
1170
1172 Sean M. Burke "sburke@cpan.org"
1173
1174
1175
1176perl v5.10.1 2009-02-12 Locale::Maketext(3pm)