Locale::Maketext(3)     User Contributed Perl Documentation     Locale::Maketext(3)

NAME
Locale::Maketext - framework for localization

SYNOPSIS
    package MyProgram;
    use strict;
    use MyProgram::L10N;
      # ...which inherits from Locale::Maketext
    my $lh = MyProgram::L10N->get_handle() || die "What language?";
    ...
    # And then any messages your program emits, like:
    warn $lh->maketext( "Can't open file [_1]: [_2]\n", $f, $! );
    ...

DESCRIPTION
It is a common feature of applications (whether run directly, or via
the Web) for them to be "localized" -- i.e., for them to present an
English interface to an English-speaker, a German interface to a
German-speaker, and so on for all the languages they're programmed
with. Locale::Maketext is a framework for software localization; it
provides you with the tools for organizing and accessing the bits of
text and text-processing code that you need for producing localized
applications.

In order to make sense of Maketext and how all its components fit
together, you should probably go read Locale::Maketext::TPJ13, and then
read the following documentation.

You may also want to read over the source for "File::Findgrep" and its
constituent modules -- they are a complete (if small) example
application that uses Maketext.

QUICK OVERVIEW
The basic design of Locale::Maketext is object-oriented, and
Locale::Maketext is an abstract base class, from which you derive a
"project class". The project class (with a name like
"TkBocciBall::Localize", which you then use in your module) is in turn
the base class for all the "language classes" for your project (with
names like "TkBocciBall::Localize::it", "TkBocciBall::Localize::en",
"TkBocciBall::Localize::fr", etc.).

A language class is a class containing a lexicon of phrases as class
data, and possibly also some methods that are of use in interpreting
phrases in the lexicon, or otherwise dealing with text in that
language.

An object belonging to a language class is called a "language handle";
it's typically a flyweight object.

The normal course of action is to call:

    use TkBocciBall::Localize;  # the localization project class
    $lh = TkBocciBall::Localize->get_handle();
      # Depending on the user's locale, etc., this will
      # make a language handle from among the classes available,
      # and any defaults that you declare.
    die "Couldn't make a language handle??" unless $lh;

From then on, you use the "maketext" method to access entries in
whatever lexicon(s) belong to the language handle you got. So, this:

    print $lh->maketext("You won!"), "\n";

...emits the right text for this language. If the object in $lh
belongs to class "TkBocciBall::Localize::fr" and
%TkBocciBall::Localize::fr::Lexicon contains ("You won!" => "Tu as
gagné!"), then the above code happily tells the user "Tu as gagné!".
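To make the overview concrete, here is a minimal sketch that runs as a single script. The project class and its French class are declared inline rather than in TkBocciBall/Localize.pm and TkBocciBall/Localize/fr.pm; this assumes, as the current loader does, that get_handle accepts a class that already has an @ISA or %Lexicon without looking for a .pm file. The language is requested explicitly ('fr') instead of being taken from the user's locale.

```perl
use strict;
use warnings;

# Project class: derives from Locale::Maketext.
{
    package TkBocciBall::Localize;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}

# French language class: derives from the project class and keeps
# its phrases in %Lexicon (declared inline here instead of in fr.pm).
{
    package TkBocciBall::Localize::fr;
    our @ISA = ('TkBocciBall::Localize');
    our %Lexicon = (
        "You won!" => "Tu as gagn\x{e9}!",   # \x{e9} is e-acute
    );
}

binmode STDOUT, ':encoding(UTF-8)';

# Ask for French explicitly instead of relying on the user's locale.
my $lh = TkBocciBall::Localize->get_handle('fr')
    or die "Couldn't make a language handle??";
my $text = $lh->maketext("You won!");
print ref($lh), ": $text\n";
```

In a real project each package would live in its own .pm file under @INC, and get_handle() would usually be called with no arguments.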

METHODS
Locale::Maketext offers a variety of methods, which fall into three
categories:

· Methods to do with constructing language handles.

· "maketext" and other methods to do with accessing %Lexicon data for
  a given language handle.

· Methods that you may find it handy to use, from routines of yours
  that you put in %Lexicon entries.

These are covered in the following sections.

Construction Methods
These are to do with constructing a language handle:

· $lh = YourProjClass->get_handle( ...langtags... ) || die "lg-handle?";

  This tries loading classes based on the language-tags you give
  (like ("en-US", "sk", "kon", "es-MX", "ja", "i-klingon")), and for
  the first class that succeeds, returns
  YourProjClass::language->new().

  If it runs thru the entire given list of language-tags, and finds
  no classes for those exact terms, it then tries "superordinate"
  language classes. So if no "en-US" class (i.e.,
  YourProjClass::en_us) was found, nor classes for anything else in
  that list, we then try its superordinate, "en" (i.e.,
  YourProjClass::en), and so on thru the other language-tags in the
  given list: "es" (the superordinate of "es-MX"). (The other
  language-tags in our example list happen to have no
  superordinates.)

  If none of those language-tags leads to loadable classes, we then
  try classes derived from YourProjClass->fallback_languages() and
  then if nothing comes of that, we use classes named by
  YourProjClass->fallback_language_classes(). Then in the (probably
  quite unlikely) event that that fails, we just return undef.

· $lh = YourProjClass->get_handle() || die "lg-handle?";

  When "get_handle" is called with an empty parameter list, magic
  happens:

  If "get_handle" senses that it's running in a program that was
  invoked as a CGI, then it tries to get language-tags out of the
  environment variable "HTTP_ACCEPT_LANGUAGE", and it pretends that
  those were the languages passed as parameters to "get_handle".

  Otherwise (i.e., if not a CGI), this tries various OS-specific ways
  to get the language-tags for the current locale/language, and then
  pretends that those were the value(s) passed to "get_handle".

  Currently this OS-specific stuff consists of looking in the
  environment variables "LANG" and "LANGUAGE"; and on MSWin machines
  (where those variables are typically unused), this also tries using
  the module Win32::Locale to get a language-tag for whatever
  language/locale is currently selected in the "Regional Settings"
  (or "International"?) Control Panel. I welcome further
  suggestions for making this do the Right Thing under other
  operating systems that support localization.

  If you're using localization in an application that keeps a
  configuration file, you might consider something like this in your
  project class:

      sub get_handle_via_config {
          my $class = $_[0];
          # %Config_settings is assumed to hold the settings that your
          # application has already read from its configuration file.
          my $chosen_language = $Config_settings{'language'};
          my $lh;
          if($chosen_language) {
              $lh = $class->get_handle($chosen_language)
                || die "No language handle for \"$chosen_language\""
                     . " or the like";
          } else {
              # Config file missing, maybe?
              $lh = $class->get_handle()
                || die "Can't get a language handle";
          }
          return $lh;
      }

· $lh = YourProjClass::langname->new();

  This constructs a language handle. You usually don't call this
  directly, but instead let "get_handle" find a language class to
  "use" and to then call ->new on.

· $lh->init();

  This is called by ->new to initialize newly-constructed language
  handles. If you define an init method in your class, remember that
  it's usually considered a good idea to call $lh->SUPER::init in it
  (presumably at the beginning), so that all classes get a chance to
  initialize a new object however they see fit.

· YourProjClass->fallback_languages()

  "get_handle" appends the return value of this to the end of
  whatever list of languages you pass "get_handle". Unless you
  override this method, your project class will inherit
  Locale::Maketext's "fallback_languages", which currently returns
  ('i-default', 'en', 'en-US'). ("i-default" is defined in RFC 2277.)

  This method (by having it return the name of a language-tag that
  has an existing language class) can be used for making sure that
  "get_handle" will always manage to construct a language handle
  (assuming your language classes are in an appropriate @INC
  directory). Or you can use the next method:

· YourProjClass->fallback_language_classes()

  "get_handle" appends the return value of this to the end of the
  list of classes it will try using. Unless you override this
  method, your project class will inherit Locale::Maketext's
  "fallback_language_classes", which currently returns an empty list,
  (). By setting this to some value (namely, the name of a loadable
  language class), you can be sure that "get_handle" will always
  manage to construct a language handle.
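A sketch exercising the construction methods above. MyApp::L10N and its 'fr' and 'de' classes are hypothetical and declared inline. The project class overrides fallback_languages to return ('de'), and the German class overrides init, calling SUPER::init first, to set the numf_comma entry that numf consults (see Utility Methods). Since there is no 'fr_ca' class, 'fr-CA' should resolve to its superordinate 'fr'; and since no class exists for 'kon' (nor an 'en' class), that request should end up at the 'de' fallback.

```perl
use strict;
use warnings;

# Hypothetical project class.
{
    package MyApp::L10N;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');

    # Replace the inherited default ('i-default', 'en', 'en-US'):
    # when nothing else matches, fall back to German.
    sub fallback_languages { return ('de') }
}

{
    package MyApp::L10N::fr;
    our @ISA = ('MyApp::L10N');
}

{
    package MyApp::L10N::de;
    our @ISA = ('MyApp::L10N');

    # Called by ->new on every freshly blessed handle.
    sub init {
        my $lh = shift;
        $lh->SUPER::init(@_);       # give superclasses their turn first
        $lh->{'numf_comma'} = 1;    # write 2.000.000, not 2,000,000
        return;
    }
}

# No MyApp::L10N::fr_ca exists, so the superordinate "fr" is used.
my $fr = MyApp::L10N->get_handle('fr-CA') or die "lg-handle?";

# Nothing matches "kon", so get_handle reaches the fallback language.
my $de = MyApp::L10N->get_handle('kon') or die "lg-handle?";

print ref($fr), "\n";
print ref($de), "\n";
print $de->numf(2_000_000), "\n";
```

Overriding fallback_languages (rather than relying on the default 'en') is one way to guarantee a handle when your project's base language isn't English.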

The "maketext" Method
This is the most important method in Locale::Maketext:

    $text = $lh->maketext(key, ...parameters for this phrase...);

This looks in the %Lexicon of the language handle $lh and all its
superclasses, looking for an entry whose key is the string key.
Assuming such an entry is found, various things then happen, depending
on the value found:

If the value is a scalarref, the scalar is dereferenced and returned
(and any parameters are ignored).

If the value is a coderef, we return &$value($lh, ...parameters...).

If the value is a string that doesn't look like it's in Bracket
Notation, we return it (after replacing it with a scalarref, in its
%Lexicon).

If the value does look like it's in Bracket Notation, then we compile
it into a sub, replace the string in the %Lexicon with the new coderef,
and then we return &$new_sub($lh, ...parameters...).
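The four cases can be seen side by side in the following sketch (class names hypothetical, declared inline). It also peeks into %Lexicon after the calls to show the replace-on-first-use behavior: the plain string becomes a scalarref and the Bracket Notation string becomes a coderef.

```perl
use strict;
use warnings;

{
    package MyProj::L10N;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    package MyProj::L10N::en;
    our @ISA = ('MyProj::L10N');
    our %Lexicon = (
        # Scalarref: returned as-is; brackets are not interpreted and
        # any parameters are ignored.
        '_raw' => \'Use [brackets] freely',

        # Plain string with no Bracket Notation: returned as-is.
        'Done.' => 'Done.',

        # Bracket Notation string: compiled into a sub on first use.
        'Hello, [_1]!' => 'Hello, [_1]!',

        # Coderef: called with the handle followed by the parameters.
        '_sum' => sub { my ($lh, $x, $y) = @_; return $x + $y },
    );
}

my $lh = MyProj::L10N::en->new;

my $raw   = $lh->maketext('_raw', 'ignored');
my $plain = $lh->maketext('Done.');
my $hello = $lh->maketext('Hello, [_1]!', 'Ana');
my $sum   = $lh->maketext('_sum', 2, 3);

# What the entries look like in %Lexicon after first use.
my $cached   = ref $MyProj::L10N::en::Lexicon{'Done.'};
my $compiled = ref $MyProj::L10N::en::Lexicon{'Hello, [_1]!'};

print "$raw\n$plain\n$hello\n$sum\n$cached\n$compiled\n";
```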

Bracket Notation is discussed in a later section. Note that compiling
a Bracket Notation string can throw an exception if the string is not
syntactically valid (say, by not balancing brackets right).

Also, calling &$coderef($lh, ...parameters...) can throw any sort of
exception (if, say, code in that sub tries to divide by zero). But a
very common exception occurs when you have Bracket Notation text that
says to call a method "foo", but there is no such method. (E.g., "You
have [quatn,_1,ball]." will throw an exception on trying to call
$lh->quatn($_[1],'ball') -- you presumably meant "quant".) "maketext"
catches these exceptions, but only to make the error message more
readable, at which point it rethrows the exception.

An exception may be thrown if key is not found in any of $lh's %Lexicon
hashes. What happens if a key is not found is discussed in a later
section, "Controlling Lookup Failure".
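The sketch below (hypothetical class names, no _AUTO entry) triggers each of the failures just described: a missing key, unbalanced brackets, and a misspelled method name. It records only whether maketext threw, since the exact wording of the messages may differ between versions.

```perl
use strict;
use warnings;

{
    package MyProj::L10N;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    package MyProj::L10N::en;
    our @ISA = ('MyProj::L10N');
    # No _AUTO entry, so unknown keys are lookup failures.
    our %Lexicon = (
        'I ate [quant,_1,rhubarb pie.' => 'I ate [quant,_1,rhubarb pie.',  # unbalanced
        'You have [quatn,_1,ball].'    => 'You have [quatn,_1,ball].',     # typo for quant
    );
}

my $lh = MyProj::L10N::en->new;

# Returns 1 if the maketext call died, 0 if it returned normally.
sub throws {
    my ($key, @params) = @_;
    return eval { $lh->maketext($key, @params); 1 } ? 0 : 1;
}

my %threw = (
    missing_key => throws('No such phrase anywhere'),
    unbalanced  => throws('I ate [quant,_1,rhubarb pie.', 2),
    bad_method  => throws('You have [quatn,_1,ball].', 3),
);
printf "%-12s %s\n", $_, ($threw{$_} ? 'threw' : 'did not throw')
    for sort keys %threw;
```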

Note that you might find it useful in some cases to override the
"maketext" method with an "after method", if you want to translate
encodings, or even scripts:

    package YrProj::zh_cn;  # Chinese with PRC-style glyphs
    use base ('YrProj::zh_tw');  # Taiwan-style
    sub maketext {
        my $self = shift(@_);
        # Call the inherited method; calling $self->maketext here
        # would recurse forever.
        my $value = $self->SUPER::maketext(@_);
        # Chineeze::taiwan2mainland stands in for whatever
        # conversion routine you actually use.
        return Chineeze::taiwan2mainland($value);
    }

Or you may want to override it with something that traps any
exceptions, if that's critical to your program:

    sub maketext {
        my($lh, @stuff) = @_;
        my $out;
        return $out if eval { $out = $lh->SUPER::maketext(@stuff); 1 };
        # ...otherwise deal with the exception, which is in $@...
    }

Other than those two situations, I don't imagine that it's useful to
override the "maketext" method. (If you run into a situation where it
is useful, I'd be interested in hearing about it.)
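Here is a runnable version of the "after method" pattern, with uc standing in for a real encoding or script conversion; MyProj::L10N::en_shout is a made-up class. The essential detail is calling SUPER::maketext rather than $self->maketext.

```perl
use strict;
use warnings;

{
    package MyProj::L10N;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    package MyProj::L10N::en;
    our @ISA = ('MyProj::L10N');
    our %Lexicon = ( _AUTO => 1 );   # keys stand for themselves
}
{
    # Hypothetical variant that post-processes every message.
    package MyProj::L10N::en_shout;
    our @ISA = ('MyProj::L10N::en');

    sub maketext {
        my $self = shift;
        # Let the inherited maketext do the lookup and interpolation...
        my $value = $self->SUPER::maketext(@_);
        # ...then transform the finished string.
        return uc $value;
    }
}

my $lh  = MyProj::L10N::en_shout->new;
my $out = $lh->maketext('You won [_1] games!', 3);
print "$out\n";
```

Because en_shout inherits from en, it still finds the English lexicon; only the final string is changed.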

$lh->fail_with or $lh->fail_with(PARAM)
$lh->failure_handler_auto
    These two methods are discussed in the section "Controlling Lookup
    Failure".

Utility Methods
These are methods that you may find it handy to use, generally from
%Lexicon routines of yours (whether expressed as Bracket Notation or
not).

$language->quant($number, $singular)
$language->quant($number, $singular, $plural)
$language->quant($number, $singular, $plural, $negative)
    This is generally meant to be called from inside Bracket Notation
    (which is discussed later), as in

        "Your search matched [quant,_1,document]!"

    It's for quantifying a noun (i.e., saying how much of it there is,
    while giving the correct form of it). The behavior of this method
    is handy for English and a few other Western European languages,
    and you should override it for languages where it's not suitable.
    Feel free to read the source, but the current implementation is
    basically as this pseudocode describes:

        if $number is 0 and there's a $negative,
          return $negative;
        elsif $number is 1,
          return "1 $singular";
        elsif there's a $plural,
          return "$number $plural";
        else
          return "$number " . $singular . "s";
        #
        # ...except that we actually call numf to
        #  stringify $number before returning it.

    So for English (with Bracket Notation) "...[quant,_1,file]..." is
    fine (for 0 it returns "0 files", for 1 it returns "1 file", and
    for more it returns "2 files", etc.)

    But for "directory", you'd want "[quant,_1,directory,directories]"
    so that our elementary "quant" method doesn't think that the plural
    of "directory" is "directorys". And you may find that the output
    sounds better if you specify a negative form, as in:

        "[quant,_1,file,files,No files] matched your query.\n"

    Remember to keep in mind verb agreement (or adjectives too, in
    other languages), as in:

        "[quant,_1,document] were matched.\n"

    Because if _1 is one, you get "1 document were matched". An
    acceptable hack here is to do something like this (with no space
    after the comma, since spaces inside a bracket group are kept):

        "[quant,_1,document was,documents were] matched.\n"

$language->numf($number)
    This returns the given number formatted nicely according to this
    language's conventions. Maketext's default method is mostly to
    just take the normal string form of the number (applying sprintf
    "%G" for only very large numbers), and then to add commas as
    necessary. (Except that we apply "tr/,./.,/" if
    $language->{'numf_comma'} is true; that's a bit of a hack that's
    useful for languages that express two million as "2.000.000" and
    not as "2,000,000".)

    If you want anything fancier, consider overriding this with
    something that uses Number::Format, or does something else
    entirely.

    Note that numf is called by quant for stringifying all quantifying
    numbers.

$language->numerate($number, $singular, $plural, $negative)
    This returns the given noun form which is appropriate for the
    quantity $number according to this language's conventions.
    "numerate" is used internally by "quant" to quantify nouns. Use it
    directly -- usually from bracket notation -- to avoid "quant"'s
    implicit call to "numf" and output of a numeric quantity.

$language->sprintf($format, @items)
    This is just a wrapper around Perl's normal "sprintf" function.
    It's provided so that you can use "sprintf" in Bracket Notation:

        "Couldn't access datanode [sprintf,%s=~[%s~],_1,_2]!\n"

    which, called with the parameters "Stuff" and "thangamabob",
    returns...

        Couldn't access datanode Stuff=[thangamabob]!

$language->language_tag()
    Currently this just takes the last bit of "ref($language)", turns
    underscores to dashes, and returns it. So if $language is an
    object of class Hee::HOO::Haw::en_us, $language->language_tag()
    returns "en-us". (Yes, the usual representation for that language
    tag is "en-US", but case is never considered meaningful in
    language-tag comparison.)

    You may override this as you like; Maketext doesn't use it for
    anything.

$language->encoding()
    Currently this isn't used for anything, but it's provided (with a
    default value of (ref($language) && $language->{'encoding'}) ||
    "iso-8859-1") as a sort of suggestion that it may be
    useful/necessary to associate encodings with your language handles
    (whether on a per-class or even per-handle basis.)
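The following sketch (hypothetical class names; the %Lexicon uses _AUTO so that each key stands for itself) runs the quant examples above, along with numf, numerate, and language_tag, so you can compare the actual output with the descriptions.

```perl
use strict;
use warnings;

{
    package MyProj::L10N;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    package MyProj::L10N::en_us;
    our @ISA = ('MyProj::L10N');
    our %Lexicon = ( _AUTO => 1 );   # keys stand for themselves
}

my $lh = MyProj::L10N::en_us->new;

my @results = (
    # quant: the number (via numf) plus the right noun form
    $lh->maketext('[quant,_1,file]', 0),
    $lh->maketext('[quant,_1,file]', 1),
    $lh->maketext('[quant,_1,directory,directories]', 3),
    $lh->maketext('[quant,_1,file,files,No files] matched.', 0),
    $lh->maketext('[quant,_1,document was,documents were] matched.', 1),
    # numf on its own, and quant's use of it
    $lh->numf(1234567),
    $lh->quant(1234567, 'file'),
    # numerate: just the noun form, no number
    $lh->maketext('[numerate,_1,file,files] found', 2),
    # language_tag: last part of the class name, "_" turned into "-"
    $lh->language_tag,
);
print "$_\n" for @results;
```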

Language Handle Attributes and Internals
A language handle is a flyweight object -- i.e., it doesn't
(necessarily) carry any data of interest, other than just being a
member of whatever class it belongs to.

A language handle is implemented as a blessed hash. Subclasses of
yours can store whatever data you want in the hash. Currently the only
hash entry used by any crucial Maketext method is "fail", so feel free
to use anything else as you like.

Remember: Don't be afraid to read the Maketext source if there's any
point on which this documentation is unclear. This documentation is
vastly longer than the module source itself.

LANGUAGE CLASS HIERARCHIES
These are Locale::Maketext's assumptions about the class hierarchy
formed by all your language classes:

· You must have a project base class, which you load, and which you
  then use as the first argument in the call to
  YourProjClass->get_handle(...). It should derive (whether directly
  or indirectly) from Locale::Maketext. It doesn't matter how you
  name this class, although assuming this is the localization
  component of your Super Mega Program, good names for your project
  class might be SuperMegaProgram::Localization,
  SuperMegaProgram::L10N, SuperMegaProgram::I18N,
  SuperMegaProgram::International, or even
  SuperMegaProgram::Languages or SuperMegaProgram::Messages.

· Language classes are what YourProjClass->get_handle will try to
  load. It will look for them by taking each language-tag (skipping
  it if it doesn't look like a language-tag or locale-tag!), turning
  it to all lowercase, turning dashes to underscores, and appending
  it to YourProjClass . "::". So this:

      $lh = YourProjClass->get_handle(
        'en-US', 'fr', 'kon', 'i-klingon', 'i-klingon-romanized'
      );

  will try loading the classes YourProjClass::en_us (note
  lowercase!), YourProjClass::fr, YourProjClass::kon,
  YourProjClass::i_klingon and YourProjClass::i_klingon_romanized.
  (And it'll stop at the first one that actually loads.)

· I assume that each language class derives (directly or indirectly)
  from your project class, and also defines its @ISA, its %Lexicon,
  or both. But I anticipate no dire consequences if these
  assumptions do not hold.

· Language classes may derive from other language classes (although
  they should have "use Thatclassname" or "use base
  qw(...classes...)"). They may derive from the project class. They
  may derive from some other class altogether. Or, via multiple
  inheritance, they may derive from any mixture of these.

· I foresee no problems with having multiple inheritance in your
  hierarchy of language classes. (As usual, however, Perl will
  complain bitterly if you have a cycle in the hierarchy: i.e., if
  any class is its own ancestor.)
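A sketch of the tag-to-class mapping, using the get_handle call above. Only YourProjClass::i_klingon exists here (declared inline), so none of the earlier tags can load a class and the handle should come from that one; language_tag then turns the class name back into a tag.

```perl
use strict;
use warnings;

{
    package YourProjClass;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    # Tag "i-klingon" maps to class YourProjClass::i_klingon:
    # lowercased, with the dash turned into an underscore.
    package YourProjClass::i_klingon;
    our @ISA = ('YourProjClass');
    our %Lexicon = ( _AUTO => 1 );
}

# No class exists for the earlier tags, so the first class that
# actually loads is YourProjClass::i_klingon.
my $lh = YourProjClass->get_handle(
    'en-US', 'fr', 'kon', 'i-klingon', 'i-klingon-romanized'
) or die "lg-handle?";

print ref($lh), "\n";
print $lh->language_tag, "\n";
```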

ENTRIES IN EACH LEXICON
A typical %Lexicon entry is meant to signify a phrase, taking some
number (0 or more) of parameters. An entry is meant to be accessed via
a string key in $lh->maketext(key, ...parameters...), which should
return a string that is generally meant to be used for "output" to the
user -- regardless of whether this actually means printing to STDOUT,
writing to a file, or putting into a GUI widget.

While the key must be a string value (since that's a basic restriction
that Perl places on hash keys), the value in the lexicon can currently
be of several types: a defined scalar, scalarref, or coderef. The use
of these is explained above, in the section 'The "maketext" Method',
and Bracket Notation for strings is discussed in the next section.

While you can use arbitrary unique IDs for lexicon keys (like
"_min_larger_max_error"), it is often useful if an entry's key is
itself a valid value, like this example error message:

    "Minimum ([_1]) is larger than maximum ([_2])!\n",

Compare this code that uses an arbitrary ID...

    die $lh->maketext( "_min_larger_max_error", $min, $max )
      if $min > $max;

...to this code that uses a key-as-value:

    die $lh->maketext(
      "Minimum ([_1]) is larger than maximum ([_2])!\n",
      $min, $max
    ) if $min > $max;

The second is, in short, more readable. In particular, it's obvious
that the number of parameters you're feeding to that phrase (two) is
the number of parameters that it wants to be fed. (You can see _1
and _2 being used in the key there.)

Also, once a project is otherwise complete and you start to localize
it, you can scrape together all the various keys you use, and pass
them to a translator; and then the translator's work will go faster if
what he's presented is this:

    "Minimum ([_1]) is larger than maximum ([_2])!\n"
      => "",   # fill in something here, Jacques!

rather than this more cryptic mess:

    "_min_larger_max_error"
      => "",   # fill in something here, Jacques

I think that using keys as lexicon values makes the completed lexicon
entries more readable:

    "Minimum ([_1]) is larger than maximum ([_2])!\n"
      => "Le minimum ([_1]) est plus grand que le maximum ([_2])!\n",

Also, having valid values as keys becomes very useful if you set up an
_AUTO lexicon. _AUTO lexicons are discussed in a later section.

I almost always use keys that are themselves valid lexicon values. One
notable exception is when the value is quite long. For example, to get
the screenful of data that a command-line program might return when
given an unknown switch, I often just use a brief, self-explanatory key
such as "_USAGE_MESSAGE". At that point I immediately go and define
that lexicon entry in the ProjectClass::L10N::en lexicon (since
English is always my "project language"):

    '_USAGE_MESSAGE' => <<'EOSTUFF',
    ...long long message...
    EOSTUFF

and then I can use it as:

    use Getopt::Std;
    # getopts() returns false if it sees an unknown switch.
    getopts('o:D:I:', \%opts) or die $lh->maketext('_USAGE_MESSAGE');

Incidentally, note that each class's %Lexicon inherits-and-extends the
lexicons in its superclasses. This is not because these are special
hashes per se, but because you access them via the "maketext" method,
which looks for entries across all the %Lexicon hashes in a language
class and all its ancestor classes. (This is because the idea of
"class data" isn't directly implemented in Perl, but is instead left to
individual class-systems to implement as they see fit.)
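To see the inherit-and-extend behavior, consider a hypothetical British English class that overrides one entry from ProjectClass::L10N::en and inherits everything else, including a long-value key like _USAGE_MESSAGE. The usage text is invented for illustration.

```perl
use strict;
use warnings;

{
    package ProjectClass::L10N;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    package ProjectClass::L10N::en;
    our @ISA = ('ProjectClass::L10N');
    our %Lexicon = (
        'Color:'         => 'Color:',
        'Goodbye!'       => 'Goodbye!',
        '_USAGE_MESSAGE' => "Usage: frob [-o file] [-D level] [-I dir]\n",
    );
}
{
    # Hypothetical British class: overrides one entry only.
    package ProjectClass::L10N::en_gb;
    our @ISA = ('ProjectClass::L10N::en');
    our %Lexicon = (
        'Color:' => 'Colour:',
    );
}

my $lh = ProjectClass::L10N::en_gb->new;

my $color   = $lh->maketext('Color:');          # from en_gb's own %Lexicon
my $goodbye = $lh->maketext('Goodbye!');        # inherited from en
my $usage   = $lh->maketext('_USAGE_MESSAGE');  # inherited from en
print "$color\n$goodbye\n$usage";
```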

Note that you may have things stored in a lexicon besides just phrases
for output: for example, if your program takes input from the
keyboard, asking a "(Y/N)" question, you probably need to know what the
equivalent of "Y[es]/N[o]" is in whatever language. You probably also
need to know what the equivalents of the answers "y" and "n" are. You
can store that information in the lexicon (say, under the keys
"~answer_y" and "~answer_n", and the long forms as "~answer_yes" and
"~answer_no", where "~" is just an ad-hoc character meant to indicate
to programmers/translators that these are not phrases for output).

Or instead of storing this in the language class's lexicon, you can
(and, in some cases, really should) represent the same bit of knowledge
as code in a method in the language class. (That leaves a tidy
distinction between the lexicon as the things we know how to say, and
the rest of the things in the language class as things that we know
how to do.) Consider this example of a processor for responses to
French "oui/non" questions:

    sub y_or_n {
        # $_[0] is the language handle; $_[1] is the user's answer.
        return undef unless defined $_[1] and length $_[1];
        my $answer = lc $_[1];  # smash case
        return 1 if $answer eq 'o' or $answer eq 'oui';
        return 0 if $answer eq 'n' or $answer eq 'non';
        return undef;
    }

...which you'd then call in a construct like this:

    my $response;
    until(defined $response) {
        print $lh->maketext("Open the pod bay door (y/n)? ");
        # get_input_from_keyboard_somehow() stands in for your own
        # input routine.
        $response = $lh->y_or_n( get_input_from_keyboard_somehow() );
    }
    if($response) { $pod_bay_door->open()         }
    else          { $pod_bay_door->leave_closed() }

Other data worth storing in a lexicon might be things like filenames
for language-targeted resources:

    ...
    "_main_splash_png"
      => "/styles/en_us/main_splash.png",
    "_main_splash_imagemap"
      => "/styles/en_us/main_splash.incl",
    "_general_graphics_path"
      => "/styles/en_us/",
    "_alert_sound"
      => "/styles/en_us/hey_there.wav",
    "_forward_icon"
      => "left_arrow.png",
    "_backward_icon"
      => "right_arrow.png",
      # In some other languages, left equals
      #  BACKwards, and right is FOREwards.
    ...

You might want to do the same thing for expressing key bindings or the
like (since hardwiring "q" as the binding for the function that quits a
screen/menu/program is useful only if your language happens to
associate "q" with "quit"!)

BRACKET NOTATION
Bracket Notation is a crucial feature of Locale::Maketext. I mean
Bracket Notation to provide a replacement for the use of sprintf
formatting. Everything you do with Bracket Notation could be done with
a sub block, but bracket notation is meant to be much more concise.

Bracket Notation is like a miniature "template" system (in the sense
of Text::Template, not in the sense of C++ templates), where normal
text is passed thru basically as is, but text in special regions is
specially interpreted. In Bracket Notation, you use square brackets
("[...]"), not curly braces ("{...}") to note sections that are
specially interpreted.

For example, here all the areas that are taken literally are underlined
with a "^", and all the in-bracket special regions are underlined with
an X:

    "Minimum ([_1]) is larger than maximum ([_2])!\n",
     ^^^^^^^^^ XX ^^^^^^^^^^^^^^^^^^^^^^^^^^ XX ^^^^

When that string is compiled from bracket notation into a real Perl
sub, it's basically turned into:

    sub {
        my $lh = $_[0];
        my @params = @_;
        return join '',
          "Minimum (",
          ...some code here...
          ") is larger than maximum (",
          ...some code here...
          ")!\n",
    }
    # to be called by $lh->maketext(KEY, params...)

In other words, text outside bracket groups is turned into string
literals. Text in brackets is rather more complex, and currently
follows these rules:

· Bracket groups that are empty, or which consist only of whitespace,
  are ignored. (Examples: "[]", "[ ]", or a [ and a ] with
  returns and/or tabs and/or spaces between them.)

Otherwise, each group is taken to be a comma-separated group of
items, and each item is interpreted as follows:

· An item that is "_digits" or "_-digits" is interpreted as
  $_[value]. I.e., "_1" becomes $_[1], and "_-3" is interpreted
  as $_[-3] (in which case @_ should have at least three elements in
  it). Note that $_[0] is the language handle, and is typically not
  named directly.

· An item "_*" is interpreted to mean "all of @_ except $_[0]".
  I.e., @_[1..$#_]. Note that this is an empty list in the case of
  calls like $lh->maketext(key) where there are no parameters (except
  $_[0], the language handle).

· Otherwise, each item is interpreted as a string literal.

The group as a whole is interpreted as follows:

· If the first item in a bracket group looks like a method name, then
  that group is interpreted like this:

      $lh->that_method_name(
        ...rest of items in this group...
      ),

· If the first item in a bracket group is "*", it's taken as
  shorthand for the commonly called "quant" method. Similarly, if
  the first item in a bracket group is "#", it's taken to be
  shorthand for "numf".

· If the first item in a bracket group is the empty-string, or "_*"
  or "_digits" or "_-digits", then that group is interpreted as just
  the interpolation of all its items:

      join('',
        ...all the items in this group...
      ),

  Examples: "[_1]" and "[,_1]", which are synonymous; and
  "[,ID-(,_4,-,_2,)]", which compiles as join("", "ID-(", $_[4],
  "-", $_[2], ")").

· Otherwise this bracket group is invalid. For example, in the group
  "[!@#,whatever]", the first item "!@#" is neither the empty-string,
  "_number", "_-number", "_*", nor a valid method name; and so
  Locale::Maketext will throw an exception if you try compiling an
  expression containing this bracket group.
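The rules above can be checked against a few small phrases. This sketch uses a hypothetical en class with _AUTO (so each key doubles as its value) and prints what each bracket group expands to.

```perl
use strict;
use warnings;

{
    package MyProj::L10N;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    package MyProj::L10N::en;
    our @ISA = ('MyProj::L10N');
    our %Lexicon = ( _AUTO => 1 );   # each key is also its own value
}

my $lh = MyProj::L10N::en->new;

my %out = (
    plain  => $lh->maketext('[_1]', 'x'),                     # same as [,_1]
    comma  => $lh->maketext('[,_1]', 'x'),
    joined => $lh->maketext('[,ID-(,_4,-,_2,)]', qw(a b c d)),
    all    => $lh->maketext('[_*]', qw(a b c)),               # @_[1..$#_]
    star   => $lh->maketext('[*,_1,file]', 3),                # shorthand for quant
    hash   => $lh->maketext('[#,_1]', 1234567),               # shorthand for numf
    empty  => $lh->maketext('A[ ]B'),                         # whitespace-only group
);
print "$_ => $out{$_}\n" for sort keys %out;
```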

Note, incidentally, that items in each group are comma-separated, not
/\s*,\s*/-separated. That is, you might expect that this bracket
group:

    "Hoohah [foo, _1 , bar ,baz]!"

would compile to this:

    sub {
        my $lh = $_[0];
        return join '',
          "Hoohah ",
          $lh->foo( $_[1], "bar", "baz"),
          "!",
    }

But it actually compiles as this:

    sub {
        my $lh = $_[0];
        return join '',
          "Hoohah ",
          $lh->foo(" _1 ", " bar ", "baz"),  # note the spaces in " _1 " and " bar "
          "!",
    }
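To watch the spaces survive, this sketch gives a hypothetical en class a foo method that wraps each argument it receives in <...>. The parameter passed to maketext goes unused, because " _1 " (with spaces) is a literal, not the item "_1".

```perl
use strict;
use warnings;

{
    package MyProj::L10N;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    package MyProj::L10N::en;
    our @ISA = ('MyProj::L10N');
    our %Lexicon = ( _AUTO => 1 );

    # Hypothetical method that shows exactly which arguments it got,
    # each wrapped in <...> so that surrounding spaces stay visible.
    sub foo {
        my ($lh, @args) = @_;
        return join '|', map { "<$_>" } @args;
    }
}

my $lh  = MyProj::L10N::en->new;
my $out = $lh->maketext('Hoohah [foo, _1 , bar ,baz]!', 'first param');
print "$out\n";
```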

In the notation discussed so far, the characters "[" and "]" are given
special meaning, for opening and closing bracket groups, and "," has a
special meaning inside bracket groups, where it separates items in the
group. This raises the question of how you'd express a literal "[" or
"]" in a Bracket Notation string, and how you'd express a literal comma
inside a bracket group. For this purpose I've adopted "~" (tilde) as
an escape character: "~[" means a literal '[' character anywhere in
Bracket Notation (i.e., regardless of whether you're in a bracket group
or not), and ditto for "~]" meaning a literal ']', and "~," meaning a
literal comma. (Altho "," means a literal comma outside of bracket
groups -- it's only inside bracket groups that commas are special.)

And on the off chance you need a literal tilde in a bracket expression,
you get it with "~~".

Currently, an unescaped "~" before a character other than a bracket or
a comma is taken to mean just a "~" and that character. I.e., "~X"
means the same as "~~X" -- i.e., one literal tilde, and then one
literal "X". However, by using "~X", you are assuming that no future
version of Maketext will use "~X" as a magic escape sequence. In
practice this is not a great problem, since first off you can just
write "~~X" and not worry about it; second off, I doubt I'll add lots
of new magic characters to bracket notation; and third off, you aren't
likely to want literal "~" characters in your messages anyway, since
it's not a character with wide use in natural language text.
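A quick sketch of the escapes inside and outside bracket groups. join_items is a made-up method that joins its items with "|", which makes the item boundaries visible. Only "~," inside a group is exercised, since commas are special only there.

```perl
use strict;
use warnings;

{
    package MyProj::L10N;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    package MyProj::L10N::en;
    our @ISA = ('MyProj::L10N');
    our %Lexicon = ( _AUTO => 1 );

    # Hypothetical helper: joins its items with "|" so that item
    # boundaries are visible in the output.
    sub join_items {
        my ($lh, @items) = @_;
        return join '|', @items;
    }
}

my $lh = MyProj::L10N::en->new;
my @out = (
    $lh->maketext('Array ~[[_1]~]', 3),     # literal brackets around a group
    $lh->maketext('[join_items,a~,b,c]'),   # ~, keeps "a,b" as one item
    $lh->maketext('[join_items,x~~y,z]'),   # ~~ is one literal tilde
);
print "$_\n" for @out;
```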

Brackets must be balanced -- every openbracket must have one matching
closebracket, and vice versa. So these are all invalid:

    "I ate [quant,_1,rhubarb pie."
    "I ate [quant,_1,rhubarb pie[."
    "I ate quant,_1,rhubarb pie]."
    "I ate quant,_1,rhubarb pie[."

Currently, bracket groups do not nest. That is, you cannot say:

    "Foo [bar,baz,[quux,quuux]]\n";

If you need a notation that's that powerful, use normal Perl:

    %Lexicon = (
      ...
      "some_key" => sub {
        my $lh = $_[0];
        join '',
          "Foo ",
          $lh->bar('baz', $lh->quux('quuux')),
          "\n",
      },
      ...
    );

Or write the "bar" method so you don't need to pass it the output from
calling quux.
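Here the coderef entry runs as written, with trivial stand-in bar and quux methods (both hypothetical). The sketch also confirms that the nested Bracket Notation version is rejected when maketext tries to compile it.

```perl
use strict;
use warnings;

{
    package MyProj::L10N;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    package MyProj::L10N::en;
    our @ISA = ('MyProj::L10N');

    # Hypothetical methods standing in for "bar" and "quux" above.
    sub quux { my ($lh, $x) = @_; return "<$x>" }
    sub bar  { my ($lh, @items) = @_; return join '+', @items }

    our %Lexicon = (
        _AUTO => 1,
        # A coderef entry can nest calls freely; Bracket Notation cannot.
        'some_key' => sub {
            my $lh = $_[0];
            return join '',
                "Foo ",
                $lh->bar('baz', $lh->quux('quuux')),
                "\n";
        },
    );
}

my $lh  = MyProj::L10N::en->new;
my $out = $lh->maketext('some_key');

# The nested Bracket Notation version should fail to compile.
my $nested_ok = eval { $lh->maketext("Foo [bar,baz,[quux,quuux]]\n"); 1 };

print $out;
print $nested_ok ? "nested group compiled\n" : "nested group rejected\n";
```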

I do not anticipate that you will need (or particularly want) to nest
bracket groups, but you are welcome to email me with convincing
(real-life) arguments to the contrary.

AUTO LEXICONS
If maketext goes to look in an individual %Lexicon for an entry for key
(where key does not start with an underscore), and sees none, but does
see an entry of "_AUTO" => some_true_value, then we actually define
$Lexicon{key} = key right then and there, and then use that value as if
it had been there all along. This happens before we even look in any
superclass %Lexicons!

(This is meant to be somewhat like the AUTOLOAD mechanism in Perl's
function call system -- or, looked at another way, like the AutoLoader
module.)
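A minimal sketch of an _AUTO lexicon, using the ThisProject::I18N::en class name from the scenario that follows as a hypothetical inline class. It shows the key standing in for its own value, the entry appearing in %Lexicon afterwards, and a key that starts with an underscore still failing.

```perl
use strict;
use warnings;

{
    package ThisProject::I18N;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    package ThisProject::I18N::en;
    our @ISA = ('ThisProject::I18N');
    our %Lexicon = ( _AUTO => 1 );
}

my $lh  = ThisProject::I18N::en->new;
my $key = qq{Couldn't find file "[_1]"!\n};

# Not in %Lexicon yet, but _AUTO makes the key stand for itself.
my $msg = $lh->maketext($key, 'notes.txt');
print $msg;

# The entry has now been defined in the class's %Lexicon.
my $defined_now = exists $ThisProject::I18N::en::Lexicon{$key};

# Keys starting with "_" are never auto-defined, so this still fails.
my $underscore_ok = eval { $lh->maketext('_USAGE_MESSAGE'); 1 };
print $underscore_ok ? "auto-defined\n" : "lookup failed\n";
```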
764
765 I can picture all sorts of circumstances where you just do not want
766 lookup to be able to fail (since failing normally means that maketext
767 throws a "die", although see the next section for greater control over
768 that). But here's one circumstance where _AUTO lexicons are meant to
769 be especially useful:
770
As you're writing an application, you decide as you go what messages
you need to emit. Normally you'd write this:
773
774 if(-e $filename) {
775 go_process_file($filename)
776 } else {
777 print qq{Couldn't find file "$filename"!\n};
778 }
779
780 but since you anticipate localizing this, you write:
781
782 use ThisProject::I18N;
783 my $lh = ThisProject::I18N->get_handle();
784 # For the moment, assume that things are set up so
785 # that we load class ThisProject::I18N::en
786 # and that that's the class that $lh belongs to.
787 ...
788 if(-e $filename) {
789 go_process_file($filename)
790 } else {
791 print $lh->maketext(
792 qq{Couldn't find file "[_1]"!\n}, $filename
793 );
794 }
795
796 Now, right after you've just written the above lines, you'd normally
797 have to go open the file ThisProject/I18N/en.pm, and immediately add an
798 entry:
799
800 "Couldn't find file \"[_1]\"!\n"
801 => "Couldn't find file \"[_1]\"!\n",
802
803 But I consider that somewhat of a distraction from the work of getting
804 the main code working -- to say nothing of the fact that I often have
805 to play with the program a few times before I can decide exactly what
806 wording I want in the messages (which in this case would require me to
807 go changing three lines of code: the call to maketext with that key,
808 and then the two lines in ThisProject/I18N/en.pm).
809
810 However, if you set "_AUTO => 1" in the %Lexicon in,
811 ThisProject/I18N/en.pm (assuming that English (en) is the language that
812 all your programmers will be using for this project's internal message
813 keys), then you don't ever have to go adding lines like this
814
815 "Couldn't find file \"[_1]\"!\n"
816 => "Couldn't find file \"[_1]\"!\n",
817
818 to ThisProject/I18N/en.pm, because if _AUTO is true there, then just
819 looking for an entry with the key "Couldn't find file \"[_1]\"!\n" in
820 that lexicon will cause it to be added, with that value!
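Below is a single-file sketch of that workflow. It uses the ThisProject::I18N class names from the text, defined inline instead of in separate .pm files. The file name "notes.txt" is made up. The en lexicon has no explicit entries at all, yet the message still comes out. Afterward, the key is present in %Lexicon:

```perl
use strict;
use warnings;

{
    package ThisProject::I18N;             # project class named in the text
    use parent 'Locale::Maketext';
}
{
    package ThisProject::I18N::en;         # first-language class with _AUTO
    use parent -norequire, 'ThisProject::I18N';
    our %Lexicon = ( _AUTO => 1 );         # no explicit entries at all
}

my $lh = ThisProject::I18N->get_handle('en') || die "What language?";

my $key = qq{Couldn't find file "[_1]"!\n};
my $msg = $lh->maketext($key, 'notes.txt');   # 'notes.txt' is a made-up name
print $msg;                                   # Couldn't find file "notes.txt"!

# The lookup itself added the key to the lexicon (holding its compiled form).
my $now_present = exists $ThisProject::I18N::en::Lexicon{$key};
```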
821
822 Note that the reason that keys that start with "_" are immune to _AUTO
823 isn't anything generally magical about the underscore character -- I
824 just wanted a way to have most lexicon keys be autoable, except for
825 possibly a few, and I arbitrarily decided to use a leading underscore
826 as a signal to distinguish those few.
827
If your lexicon is a tied hash, the simple act of caching the compiled
value can be fatal.

For example, a GDBM_File hash tied with the GDBM_READER flag will die
with something like:
834
835 gdbm store returned -1, errno 2, key "..." at ...
836
837 All you need to do is turn on caching outside of the lexicon hash
838 itself like so:
839
840 sub init {
841 my ($lh) = @_;
842 ...
843 $lh->{'use_external_lex_cache'} = 1;
844 ...
845 }
846
And then, instead of storing the compiled value in the lexicon hash, it
will store it in $lh->{'_external_lex_cache'}.
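The following sketch lets you try this without a GDBM database. ReadOnlyLexicon is a hypothetical tie class built on Tie::StdHash. Lookups work, but every store dies, much like a hash opened with GDBM_READER. MyProj::L10N and MyProj::L10N::en are also hypothetical. With the flag set in init, the compiled entry lands in the handle. A second handle with the flag turned off dies on the attempted store:

```perl
use strict;
use warnings;
use Tie::Hash;    # defines Tie::StdHash

{
    # Hypothetical stand-in for a read-only tied lexicon: every STORE dies.
    package ReadOnlyLexicon;
    our @ISA = ('Tie::StdHash');
    sub TIEHASH { my ($class, %data) = @_; return bless {%data}, $class }
    sub STORE   { die "read-only lexicon\n" }
}
{
    package MyProj::L10N;                  # hypothetical project class
    use parent 'Locale::Maketext';
}
{
    package MyProj::L10N::en;              # hypothetical language class
    use parent -norequire, 'MyProj::L10N';
    our %Lexicon;
    tie %Lexicon, 'ReadOnlyLexicon', 'Hello, [_1]!' => 'Hello, [_1]!';

    sub init {
        my ($lh) = @_;
        $lh->SUPER::init();
        # Cache compiled entries in the handle, not in the tied hash.
        $lh->{'use_external_lex_cache'} = 1;
        return;
    }
}

my $lh  = MyProj::L10N::en->new;
my $out = $lh->maketext('Hello, [_1]!', 'world');    # 'Hello, world!'
my $cached = exists $lh->{'_external_lex_cache'}{'Hello, [_1]!'};

# For contrast: with the flag off, caching tries to STORE into the tie and dies.
my $plain = MyProj::L10N::en->new;
$plain->{'use_external_lex_cache'} = 0;
my $died = !eval { $plain->maketext('Hello, [_1]!', 'world'); 1 };
```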
849
If you call $lh->maketext(key, ...parameters...), and there's no entry
key in $lh's class's %Lexicon, nor in any superclass %Lexicon hash, and
if we can't auto-make key (because either it starts with a "_", or
because none of its lexicons have "_AUTO => 1,"), then we have failed
to find a normal way to maketext key. What then happens in these
failure conditions depends on the $lh object's "fail" attribute.
857
858 If the language handle has no "fail" attribute, maketext will simply
859 throw an exception (i.e., it calls "die", mentioning the key whose
860 lookup failed, and naming the line number where the calling
861 $lh->maketext(key,...) was.
862
863 If the language handle has a "fail" attribute whose value is a coderef,
864 then $lh->maketext(key,...params...) gives up and calls:
865
866 return $that_subref->($lh, $key, @params);
867
868 Otherwise, the "fail" attribute's value should be a string denoting a
869 method name, so that $lh->maketext(key,...params...) can give up with:
870
871 return $lh->$that_method_name($phrase, @params);
872
873 The "fail" attribute can be accessed with the "fail_with" method:
874
875 # Set to a coderef:
876 $lh->fail_with( \&failure_handler );
877
878 # Set to a method name:
879 $lh->fail_with( 'failure_method' );
880
881 # Set to nothing (i.e., so failure throws a plain exception)
882 $lh->fail_with( undef );
883
884 # Get the current value
885 $handler = $lh->fail_with();
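A minimal sketch of the three settings follows. The names failure_handler and failure_method are taken from the fragment above but are hypothetical here, as are MyProj::L10N and MyProj::L10N::en. Both handler forms receive the key followed by the maketext parameters as a flat list. With the attribute unset, maketext dies:

```perl
use strict;
use warnings;

{
    package MyProj::L10N;                       # hypothetical project class
    use parent 'Locale::Maketext';
}
{
    package MyProj::L10N::en;                   # hypothetical language class
    use parent -norequire, 'MyProj::L10N';
    our %Lexicon = ( 'Hello!' => 'Hello!' );    # no _AUTO, so unknown keys fail

    # Method-name handler: called as $lh->failure_method($key, @params).
    sub failure_method {
        my ($lh, $key, @params) = @_;
        return "method saw: $key (@params)";
    }
}

# Coderef handler: called as failure_handler($lh, $key, @params).
sub failure_handler {
    my ($lh, $key, @params) = @_;
    return "coderef saw: $key (@params)";
}

my $lh = MyProj::L10N::en->new;

$lh->fail_with(\&failure_handler);
my $via_code = $lh->maketext('No such key', 1, 2);    # 'coderef saw: No such key (1 2)'

$lh->fail_with('failure_method');
my $via_method = $lh->maketext('No such key', 1, 2);  # 'method saw: No such key (1 2)'

$lh->fail_with(undef);                                # back to a plain exception
my $died = !eval { $lh->maketext('No such key'); 1 };
```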
886
887 Now, as to what you may want to do with these handlers: Maybe you'd
888 want to log what key failed for what class, and then die. Maybe you
889 don't like "die" and instead you want to send the error message to
890 STDOUT (or wherever) and then merely "exit()".
891
892 Or maybe you don't want to "die" at all! Maybe you could use a handler
893 like this:
894
# Make all lookups fall back onto an English value,
# but only after we log it for later fingerpointing.
my $lh_backup = ThisProject->get_handle('en');
open(LEX_FAIL_LOG, '>>', 'wherever/lex.log') || die "GNAARGH $!";
sub lex_fail {
# The parameters arrive as a flat list after the key.
my($failing_lh, $key, @params) = @_;
print LEX_FAIL_LOG scalar(localtime), "\t",
ref($failing_lh), "\t", $key, "\n";
return $lh_backup->maketext($key,@params);
}
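A self-contained version of this fallback idea appears below. The @fail_log array is an in-memory stand-in for the log file. MyProj::L10N, MyProj::L10N::en (with an _AUTO lexicon) and an incomplete MyProj::L10N::fr are hypothetical inline classes. The French handle answers what it knows and hands everything else to the English handle:

```perl
use strict;
use warnings;

{
    package MyProj::L10N;                       # hypothetical project class
    use parent 'Locale::Maketext';
}
{
    package MyProj::L10N::en;                   # English, with an _AUTO lexicon
    use parent -norequire, 'MyProj::L10N';
    our %Lexicon = ( _AUTO => 1 );
}
{
    package MyProj::L10N::fr;                   # incomplete French lexicon
    use parent -norequire, 'MyProj::L10N';
    our %Lexicon = ( 'Hello!' => 'Bonjour !' );
}

my @fail_log;                                   # in-memory stand-in for lex.log
my $lh_backup = MyProj::L10N->get_handle('en') || die "No English handle";

sub lex_fail {
    my ($failing_lh, $key, @params) = @_;
    push @fail_log, ref($failing_lh) . "\t" . $key;
    return $lh_backup->maketext($key, @params);
}

my $lh = MyProj::L10N->get_handle('fr') || die "No French handle";
$lh->fail_with(\&lex_fail);

my $hello = $lh->maketext('Hello!');                  # 'Bonjour !'
my $found = $lh->maketext('Found [_1] files.', 2);    # 'Found 2 files.' (English)
```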
905
Some users have said that they think this whole mechanism of
having a "fail" attribute at all seems a rather pointless
complication. But I want Locale::Maketext to be usable for software
909 projects of any scale and type; and different software projects have
910 different ideas of what the right thing is to do in failure conditions.
911 I could simply say that failure always throws an exception, and that if
912 you want to be careful, you'll just have to wrap every call to
913 $lh->maketext in an eval { }. However, I want programmers to reserve
914 the right (via the "fail" attribute) to treat lookup failure as
915 something other than an exception of the same level of severity as a
916 config file being unreadable, or some essential resource being
917 inaccessible.
918
919 One possibly useful value for the "fail" attribute is the method name
920 "failure_handler_auto". This is a method defined in the class
921 Locale::Maketext itself. You set it with:
922
923 $lh->fail_with('failure_handler_auto');
924
925 Then when you call $lh->maketext(key, ...parameters...) and there's no
926 key in any of those lexicons, maketext gives up with
927
928 return $lh->failure_handler_auto($key, @params);
929
930 But failure_handler_auto, instead of dying or anything, compiles $key,
931 caching it in
932
$lh->{'failure_lex'}{$key} = $compiled
934
and then calls the compiled value, and returns that. (I.e., if $key
looks like bracket notation, $compiled is a sub, and we return
&{$compiled}($lh, @params); but if $key is just a plain string, we just
return that.)
939
940 The effect of using "failure_auto_handler" is like an AUTO lexicon,
941 except that it 1) compiles $key even if it starts with "_", and 2) you
942 have a record in the new hashref $lh->{'failure_lex'} of all the keys
943 that have failed for this object. This should avoid your program dying
944 -- as long as your keys aren't actually invalid as bracket code, and as
945 long as they don't try calling methods that don't exist.
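Here is a minimal sketch of failure_handler_auto at work. It assumes hypothetical classes MyProj::L10N and MyProj::L10N::en, where the latter's lexicon has no _AUTO. Both a bracket-notation key and an "_"-prefixed key come back compiled and interpolated instead of dying. Each one is recorded in $lh->{'failure_lex'}:

```perl
use strict;
use warnings;

{
    package MyProj::L10N;                       # hypothetical project class
    use parent 'Locale::Maketext';
}
{
    package MyProj::L10N::en;                   # hypothetical class, no _AUTO
    use parent -norequire, 'MyProj::L10N';
    our %Lexicon = ( 'Hello!' => 'Hello!' );
}

my $lh = MyProj::L10N::en->new;
$lh->fail_with('failure_handler_auto');

my $deleted = $lh->maketext('Deleted [_1] messages.', 4);   # 'Deleted 4 messages.'
my $note    = $lh->maketext('_internal_note');             # '_internal_note'

# Every key that fell through to the handler is recorded here.
my @failed = sort keys %{ $lh->{'failure_lex'} };
```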
946
947 "failure_auto_handler" may not be exactly what you want, but I hope it
948 at least shows you that maketext failure can be mitigated in any number
949 of very flexible ways. If you can formalize exactly what you want, you
950 should be able to express that as a failure handler. You can even make
951 it default for every object of a given class, by setting it in that
952 class's init:
953
954 sub init {
955 my $lh = $_[0]; # a newborn handle
956 $lh->SUPER::init();
957 $lh->fail_with('my_clever_failure_handler');
958 return;
959 }
sub my_clever_failure_handler {
my ($lh, $key, @params) = @_;
# ...your clever things here...
}
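The next sketch makes that pattern concrete. It assumes hypothetical classes MyProj::L10N and MyProj::L10N::en. The body of my_clever_failure_handler is made up: it merely marks untranslated keys instead of dying. Because new calls init, every handle of the class starts out with the handler already set:

```perl
use strict;
use warnings;

{
    package MyProj::L10N;                       # hypothetical project class
    use parent 'Locale::Maketext';
}
{
    package MyProj::L10N::en;                   # hypothetical language class
    use parent -norequire, 'MyProj::L10N';
    our %Lexicon = ( 'Hello!' => 'Hello!' );

    sub init {
        my $lh = $_[0];                         # a newborn handle
        $lh->SUPER::init();
        $lh->fail_with('my_clever_failure_handler');
        return;
    }

    # Hypothetical handler body: mark untranslated keys instead of dying.
    sub my_clever_failure_handler {
        my ($lh, $key, @params) = @_;
        return "<<untranslated: $key>>";
    }
}

my $lh      = MyProj::L10N::en->new;
my $handler = $lh->fail_with;                   # 'my_clever_failure_handler'
my $msg     = $lh->maketext('Goodbye!');        # '<<untranslated: Goodbye!>>'
```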
963
965 Here is a brief checklist on how to use Maketext to localize
966 applications:
967
968 · Decide what system you'll use for lexicon keys. If you insist, you
969 can use opaque IDs (if you're nostalgic for "catgets"), but I have
970 better suggestions in the section "Entries in Each Lexicon", above.
971 Assuming you opt for meaningful keys that double as values (like
972 "Minimum ([_1]) is larger than maximum ([_2])!\n"), you'll have to
973 settle on what language those should be in. For the sake of
974 argument, I'll call this English, specifically American English,
975 "en-US".
976
977 · Create a class for your localization project. This is the name of
978 the class that you'll use in the idiom:
979
980 use Projname::L10N;
981 my $lh = Projname::L10N->get_handle(...) || die "Language?";
982
983 Assuming you call your class Projname::L10N, create a class
984 consisting minimally of:
985
package Projname::L10N;
use base qw(Locale::Maketext);
# ...any methods you might want all your languages to share...

# And, assuming you want the base class to be an _AUTO lexicon,
# as is discussed a few sections up:
our %Lexicon = ( '_AUTO' => 1 );

1;
994
995 · Create a class for the language your internal keys are in. Name
996 the class after the language-tag for that language, in lowercase,
997 with dashes changed to underscores. Assuming your project's first
998 language is US English, you should call this Projname::L10N::en_us.
999 It should consist minimally of:
1000
1001 package Projname::L10N::en_us;
1002 use base qw(Projname::L10N);
our %Lexicon = (
1004 '_AUTO' => 1,
1005 );
1006 1;
1007
1008 (For the rest of this section, I'll assume that this "first
1009 language class" of Projname::L10N::en_us has _AUTO lexicon.)
1010
1011 · Go and write your program. Everywhere in your program where you
1012 would say:
1013
1014 print "Foobar $thing stuff\n";
1015
instead do it through maketext, using no variable interpolation in the
1017 key:
1018
1019 print $lh->maketext("Foobar [_1] stuff\n", $thing);
1020
1021 If you get tired of constantly saying "print $lh->maketext",
1022 consider making a functional wrapper for it, like so:
1023
1024 use Projname::L10N;
1025 use vars qw($lh);
1026 $lh = Projname::L10N->get_handle(...) || die "Language?";
1027 sub pmt (@) { print( $lh->maketext(@_)) }
1028 # "pmt" is short for "Print MakeText"
1029 $Carp::Verbose = 1;
# so if maketext fails, we see what made the call to pmt
1031
1032 Besides whole phrases meant for output, anything language-dependent
1033 should be put into the class Projname::L10N::en_us, whether as
1034 methods, or as lexicon entries -- this is discussed in the section
1035 "Entries in Each Lexicon", above.
1036
1037 · Once the program is otherwise done, and once its localization for
1038 the first language works right (via the data and methods in
1039 Projname::L10N::en_us), you can get together the data for
1040 translation. If your first language lexicon isn't an _AUTO
1041 lexicon, then you already have all the messages explicitly in the
1042 lexicon (or else you'd be getting exceptions thrown when you call
1043 $lh->maketext to get messages that aren't in there). But if you
1044 were (advisedly) lazy and are using an _AUTO lexicon, then you've
1045 got to make a list of all the phrases that you've so far been
1046 letting _AUTO generate for you. There are very many ways to
1047 assemble such a list. The most straightforward is to simply grep
1048 the source for every occurrence of "maketext" (or calls to wrappers
1049 around it, like the above "pmt" function), and to log the following
1050 phrase.
1051
· You may at this point want to consider whether your base class
(Projname::L10N), from which all lexicons (Projname::L10N::en,
Projname::L10N::es, etc.) inherit, should be an _AUTO lexicon. It
may be true that in theory, all needed messages will be in each
language class; but in the presumably unlikely or "impossible" case
of lookup failure, you should consider whether your program should
throw an exception, emit text in English (or whatever your
project's first language is), or use some more complex solution as
described in the section "Controlling Lookup Failure", above.
1062
1063 · Submit all messages/phrases/etc. to translators.
1064
1065 (You may, in fact, want to start with localizing to one other
1066 language at first, if you're not sure that you've properly
1067 abstracted the language-dependent parts of your code.)
1068
1069 Translators may request clarification of the situation in which a
1070 particular phrase is found. For example, in English we are
1071 entirely happy saying "n files found", regardless of whether we
1072 mean "I looked for files, and found n of them" or the rather
1073 distinct situation of "I looked for something else (like lines in
1074 files), and along the way I saw n files." This may involve
1075 rethinking things that you thought quite clear: should "Edit" on a
1076 toolbar be a noun ("editing") or a verb ("to edit")? Is there
1077 already a conventionalized way to express that menu option,
1078 separate from the target language's normal word for "to edit"?
1079
1080 In all cases where the very common phenomenon of quantification
1081 (saying "N files", for any value of N) is involved, each translator
1082 should make clear what dependencies the number causes in the
1083 sentence. In many cases, dependency is limited to words adjacent
1084 to the number, in places where you might expect them ("I found
1085 the-?PLURAL N empty-?PLURAL directory-?PLURAL"), but in some cases
1086 there are unexpected dependencies ("I found-?PLURAL ..."!) as well
1087 as long-distance dependencies "The N directory-?PLURAL could not be
1088 deleted-?PLURAL"!).
1089
1090 Remind the translators to consider the case where N is 0: "0 files
1091 found" isn't exactly natural-sounding in any language, but it may
1092 be unacceptable in many -- or it may condition special kinds of
1093 agreement (similar to English "I didN'T find ANY files").
1094
1095 Remember to ask your translators about numeral formatting in their
1096 language, so that you can override the "numf" method as
1097 appropriate. Typical variables in number formatting are: what to
1098 use as a decimal point (comma? period?); what to use as a thousands
1099 separator (space? nonbreaking space? comma? period? small middot?
1100 prime? apostrophe?); and even whether the so-called "thousands
1101 separator" is actually for every third digit -- I've heard reports
1102 of two hundred thousand being expressible as "2,00,000" for some
1103 Indian (Subcontinental) languages, besides the less surprising
1104 "200 000", "200.000", "200,000", and "200'000". Also, using a set
1105 of numeral glyphs other than the usual ASCII "0"-"9" might be
1106 appreciated, as via "tr/0-9/\x{0966}-\x{096F}/" for getting digits
1107 in Devanagari script (for Hindi, Konkani, others).
1108
1109 The basic "quant" method that Locale::Maketext provides should be
1110 good for many languages. For some languages, it might be useful to
1111 modify it (or its constituent "numerate" method) to take a plural
1112 form in the two-argument call to "quant" (as in "[quant,_1,files]")
1113 if it's all-around easier to infer the singular form from the
1114 plural, than to infer the plural form from the singular.
1115
1116 But for other languages (as is discussed at length in
1117 Locale::Maketext::TPJ13), simple "quant"/"numf" is not enough. For
1118 the particularly problematic Slavic languages, what you may need is
1119 a method which you provide with the number, the citation form of
1120 the noun to quantify, and the case and gender that the sentence's
1121 syntax projects onto that noun slot. The method would then be
1122 responsible for determining what grammatical number that numeral
1123 projects onto its noun phrase, and what case and gender it may
1124 override the normal case and gender with; and then it would look up
1125 the noun in a lexicon providing all needed inflected forms.
1126
1127 · You may also wish to discuss with the translators the question of
1128 how to relate different subforms of the same language tag,
1129 considering how this reacts with "get_handle"'s treatment of these.
1130 For example, if a user accepts interfaces in "en, fr", and you have
1131 interfaces available in "en-US" and "fr", what should they get?
1132 You may wish to resolve this by establishing that "en" and "en-US"
1133 are effectively synonymous, by having one class zero-derive from
1134 the other.
1135
1136 For some languages this issue may never come up (Danish is rarely
1137 expressed as "da-DK", but instead is just "da"). And for other
1138 languages, the whole concept of a "generic" form may verge on being
1139 uselessly vague, particularly for interfaces involving voice media
1140 in forms of Arabic or Chinese.
1141
1142 · Once you've localized your program/site/etc. for all desired
1143 languages, be sure to show the result (whether live, or via
1144 screenshots) to the translators. Once they approve, make every
1145 effort to have it then checked by at least one other speaker of
1146 that language. This holds true even when (or especially when) the
1147 translation is done by one of your own programmers. Some kinds of
1148 systems may be harder to find testers for than others, depending on
1149 the amount of domain-specific jargon and concepts involved -- it's
1150 easier to find people who can tell you whether they approve of your
1151 translation for "delete this message" in an email-via-Web
1152 interface, than to find people who can give you an informed opinion
1153 on your translation for "attribute value" in an XML query tool's
1154 interface.
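Several steps of this checklist fit into one single-file sketch. Projname::L10N and Projname::L10N::en_us come from the checklist. They are defined inline here rather than in .pm files. The de class, its lexicon entry, and its numf override are hypothetical illustrations of a second language with its own numeral formatting. Note how get_handle('en-US') finds en_us by lowercasing the tag and turning dashes into underscores:

```perl
use strict;
use warnings;

{
    package Projname::L10N;                 # project class
    use parent 'Locale::Maketext';
}
{
    package Projname::L10N::en_us;          # first language, _AUTO lexicon
    use parent -norequire, 'Projname::L10N';
    our %Lexicon = ( _AUTO => 1 );
}
{
    package Projname::L10N::de;             # hypothetical second language
    use parent -norequire, 'Projname::L10N';
    our %Lexicon = (
        'Found [quant,_1,file].' => '[quant,_1,Datei,Dateien] gefunden.',
    );

    # German-style numerals: swap the separators that the default numf uses.
    sub numf {
        my ($lh, $num) = @_;
        my $s = $lh->SUPER::numf($num);
        $s =~ tr/.,/,./;
        return $s;
    }
}

my %out;
for my $tag ('en-US', 'de') {
    my $lh = Projname::L10N->get_handle($tag) || die "Language?";
    $out{$tag} = $lh->maketext('Found [quant,_1,file].', 1234);
}
print "$out{'en-US'}\n";    # Found 1,234 files.
print "$out{de}\n";         # 1.234 Dateien gefunden.
```

Overriding numf in the language class is enough here, because quant formats its number through $lh->numf.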
1155
1157 I recommend reading all of these:
1158
1159 Locale::Maketext::TPJ13 -- my The Perl Journal article about Maketext.
1160 It explains many important concepts underlying Locale::Maketext's
design, and gives some insight into why Maketext is better than the plain old
1162 approach of having message catalogs that are just databases of sprintf
1163 formats.
1164
1165 File::Findgrep is a sample application/module that uses
1166 Locale::Maketext to localize its messages. For a larger
1167 internationalized system, see also Apache::MP3.
1168
1169 I18N::LangTags.
1170
1171 Win32::Locale.
1172
1173 RFC 3066, Tags for the Identification of Languages, as at
1174 http://sunsite.dk/RFC/rfc/rfc3066.html
1175
1176 RFC 2277, IETF Policy on Character Sets and Languages is at
1177 http://sunsite.dk/RFC/rfc/rfc2277.html -- much of it is just things of
1178 interest to protocol designers, but it explains some basic concepts,
1179 like the distinction between locales and language-tags.
1180
1181 The manual for GNU "gettext". The gettext dist is available in
1182 "ftp://prep.ai.mit.edu/pub/gnu/" -- get a recent gettext tarball and
1183 look in its "doc/" directory, there's an easily browsable HTML version
1184 in there. The gettext documentation asks lots of questions worth
1185 thinking about, even if some of their answers are sometimes wonky,
1186 particularly where they start talking about pluralization.
1187
The Locale/Maketext.pm source. Observe that the module is much shorter
1189 than its documentation!
1190
1192 Copyright (c) 1999-2004 Sean M. Burke. All rights reserved.
1193
1194 This library is free software; you can redistribute it and/or modify it
1195 under the same terms as Perl itself.
1196
1197 This program is distributed in the hope that it will be useful, but
1198 without any warranty; without even the implied warranty of
1199 merchantability or fitness for a particular purpose.
1200
1202 Sean M. Burke "sburke@cpan.org"
1203
1204
1205
1206perl v5.16.3 2012-11-27 Locale::Maketext(3)