Locale::Maketext(3pm)  Perl Programmers Reference Guide  Locale::Maketext(3pm)

NAME

       Locale::Maketext - framework for localization

SYNOPSIS

         package MyProgram;
         use strict;
         use MyProgram::L10N;
          # ...which inherits from Locale::Maketext
         my $lh = MyProgram::L10N->get_handle() || die "What language?";
         ...
         # And then any messages your program emits, like:
         warn $lh->maketext( "Can't open file [_1]: [_2]\n", $f, $! );
         ...

DESCRIPTION

       It is a common feature of applications (whether run directly, or via
       the Web) for them to be "localized" -- i.e., for them to present an
       English interface to an English-speaker, a German interface to a
       German-speaker, and so on for all the languages they are programmed
       with.  Locale::Maketext is a framework for software localization; it
       provides you with the tools for organizing and accessing the bits of
       text and text-processing code that you need for producing localized
       applications.
28
29       In order to make sense of Maketext and how all its components fit
30       together, you should probably go read Locale::Maketext::TPJ13, and then
31       read the following documentation.
32
33       You may also want to read over the source for "File::Findgrep" and its
34       constituent modules -- they are a complete (if small) example
35       application that uses Maketext.
36

QUICK OVERVIEW

38       The basic design of Locale::Maketext is object-oriented, and
39       Locale::Maketext is an abstract base class, from which you derive a
40       "project class".  The project class (with a name like
41       "TkBocciBall::Localize", which you then use in your module) is in turn
42       the base class for all the "language classes" for your project (with
43       names "TkBocciBall::Localize::it", "TkBocciBall::Localize::en",
44       "TkBocciBall::Localize::fr", etc.).
45
46       A language class is a class containing a lexicon of phrases as class
47       data, and possibly also some methods that are of use in interpreting
48       phrases in the lexicon, or otherwise dealing with text in that
49       language.
50
51       An object belonging to a language class is called a "language handle";
52       it's typically a flyweight object.
53
54       The normal course of action is to call:
55
         use TkBocciBall::Localize;  # the localization project class
         $lh = TkBocciBall::Localize->get_handle();
          # Depending on the user's locale, etc., this will
          # make a language handle from among the classes available,
          # and any defaults that you declare.
         die "Couldn't make a language handle??" unless $lh;
62
       From then on, you use the "maketext" method to access entries in
       whatever lexicon(s) belong to the language handle you got.  So, this:

         print $lh->maketext("You won!"), "\n";

       ...emits the right text for this language.  If the object in $lh
       belongs to class "TkBocciBall::Localize::fr" and
       %TkBocciBall::Localize::fr::Lexicon contains ("You won!" => "Tu as
       gagne!"), then the above code happily tells the user "Tu as gagne!".
72
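       As a minimal sketch of the pieces described above, the following
       puts a project class and two language classes into a single file.
       Keeping everything in one file and asking for "fr" explicitly are
       assumptions made for illustration; a real project would normally
       keep each class in its own module and let "get_handle" consult the
       environment.

```perl
use strict;
use warnings;

# Project class: the common base for every language class.
{
    package TkBocciBall::Localize;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}

# English language class; its %Lexicon maps keys to phrases.
{
    package TkBocciBall::Localize::en;
    our @ISA = ('TkBocciBall::Localize');
    our %Lexicon = ( 'You won!' => 'You won!' );
}

# French language class.
{
    package TkBocciBall::Localize::fr;
    our @ISA = ('TkBocciBall::Localize');
    our %Lexicon = ( 'You won!' => 'Tu as gagne!' );
}

# Ask for French explicitly instead of relying on the user's locale.
my $lh = TkBocciBall::Localize->get_handle('fr')
  or die "Couldn't make a language handle??";

my $message = $lh->maketext('You won!');
print $message, "\n";
```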

METHODS

74       Locale::Maketext offers a variety of methods, which fall into three
75       categories:
76
77       ·   Methods to do with constructing language handles.
78
79       ·   "maketext" and other methods to do with accessing %Lexicon data for
80           a given language handle.
81
82       ·   Methods that you may find it handy to use, from routines of yours
83           that you put in %Lexicon entries.
84
85       These are covered in the following section.
86
87   Construction Methods
88       These are to do with constructing a language handle:
89
       ·   $lh = YourProjClass->get_handle( ...langtags... ) || die "lg-handle?";

           This tries loading classes based on the language-tags you give
           (like ("en-US", "sk", "kon", "es-MX", "ja", "i-klingon")), and for
           the first class that succeeds, returns
           YourProjClass::language->new().

           If it runs thru the entire given list of language-tags, and finds
           no classes for those exact terms, it then tries "superordinate"
           language classes.  So if no "en-US" class (i.e.,
           YourProjClass::en_us) was found, nor classes for anything else in
           that list, we then try its superordinate, "en" (i.e.,
           YourProjClass::en), and so on thru the other language-tags in the
           given list: "es".  (The other language-tags in our example list
           happen to have no superordinates.)
106
107           If none of those language-tags leads to loadable classes, we then
108           try classes derived from YourProjClass->fallback_languages() and
109           then if nothing comes of that, we use classes named by
110           YourProjClass->fallback_language_classes().  Then in the (probably
111           quite unlikely) event that that fails, we just return undef.
112
113       ·   $lh = YourProjClass->get_handle() || die "lg-handle?";
114
115           When "get_handle" is called with an empty parameter list, magic
116           happens:
117
           If "get_handle" senses that it's running in a program that was
           invoked as a CGI, then it tries to get language-tags out of the
           environment variable "HTTP_ACCEPT_LANGUAGE", and it pretends that
           those were the languages passed as parameters to "get_handle".
122
123           Otherwise (i.e., if not a CGI), this tries various OS-specific ways
124           to get the language-tags for the current locale/language, and then
125           pretends that those were the value(s) passed to "get_handle".
126
127           Currently this OS-specific stuff consists of looking in the
128           environment variables "LANG" and "LANGUAGE"; and on MSWin machines
129           (where those variables are typically unused), this also tries using
130           the module Win32::Locale to get a language-tag for whatever
131           language/locale is currently selected in the "Regional Settings"
132           (or "International"?)  Control Panel.  I welcome further
133           suggestions for making this do the Right Thing under other
134           operating systems that support localization.
135
136           If you're using localization in an application that keeps a
137           configuration file, you might consider something like this in your
138           project class:
139
             sub get_handle_via_config {
               my $class = $_[0];
               my $chosen_language = $Config_settings{'language'};
               my $lh;
               if($chosen_language) {
                 $lh = $class->get_handle($chosen_language)
                  || die "No language handle for \"$chosen_language\" or the like";
               } else {
                 # Config file missing, maybe?
                 $lh = $class->get_handle()
                  || die "Can't get a language handle";
               }
               return $lh;
             }
154
155       ·   $lh = YourProjClass::langname->new();
156
157           This constructs a language handle.  You usually don't call this
158           directly, but instead let "get_handle" find a language class to
159           "use" and to then call ->new on.
160
161       ·   $lh->init();
162
163           This is called by ->new to initialize newly-constructed language
164           handles.  If you define an init method in your class, remember that
165           it's usually considered a good idea to call $lh->SUPER::init in it
166           (presumably at the beginning), so that all classes get a chance to
167           initialize a new object however they see fit.
168
169       ·   YourProjClass->fallback_languages()
170
171           "get_handle" appends the return value of this to the end of
172           whatever list of languages you pass "get_handle".  Unless you
173           override this method, your project class will inherit
174           Locale::Maketext's "fallback_languages", which currently returns
175           "('i-default', 'en', 'en-US')".  ("i-default" is defined in RFC
176           2277).
177
178           This method (by having it return the name of a language-tag that
179           has an existing language class) can be used for making sure that
180           "get_handle" will always manage to construct a language handle
181           (assuming your language classes are in an appropriate @INC
182           directory).  Or you can use the next method:
183
184       ·   YourProjClass->fallback_language_classes()
185
186           "get_handle" appends the return value of this to the end of the
187           list of classes it will try using.  Unless you override this
188           method, your project class will inherit Locale::Maketext's
189           "fallback_language_classes", which currently returns an empty list,
190           "()".  By setting this to some value (namely, the name of a
191           loadable language class), you can be sure that "get_handle" will
192           always manage to construct a language handle.
193
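       The lookup order described above can be sketched as follows.  The
       choice of "pt" and "es" classes, and the override that makes
       "fallback_languages" return "es", are assumptions made for this
       example only; no English class is defined, so that the fallback is
       what actually gets used.

```perl
use strict;
use warnings;

{
    package YourProjClass;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');

    # Assumption for this sketch: fall back to Spanish.
    sub fallback_languages { return ('es') }
}
{
    package YourProjClass::pt;
    our @ISA = ('YourProjClass');
    our %Lexicon = ( 'Hello' => 'Ola' );
}
{
    package YourProjClass::es;
    our @ISA = ('YourProjClass');
    our %Lexicon = ( 'Hello' => 'Hola' );
}

# There is no YourProjClass::pt_br, so the superordinate "pt" is used.
my $lh_super = YourProjClass->get_handle('pt-BR');

# Nothing is loadable for "i-klingon"; fallback_languages() supplies "es".
my $lh_fallback = YourProjClass->get_handle('i-klingon');

print ref($lh_super), "\n", ref($lh_fallback), "\n";
```
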
194   The "maketext" Method
195       This is the most important method in Locale::Maketext:
196
           $text = $lh->maketext(key, ...parameters for this phrase...);
198
199       This looks in the %Lexicon of the language handle $lh and all its
200       superclasses, looking for an entry whose key is the string key.
201       Assuming such an entry is found, various things then happen, depending
202       on the value found:
203
204       If the value is a scalarref, the scalar is dereferenced and returned
205       (and any parameters are ignored).
206
207       If the value is a coderef, we return &$value($lh, ...parameters...).
208
209       If the value is a string that doesn't look like it's in Bracket
210       Notation, we return it (after replacing it with a scalarref, in its
211       %Lexicon).
212
213       If the value does look like it's in Bracket Notation, then we compile
214       it into a sub, replace the string in the %Lexicon with the new coderef,
215       and then we return &$new_sub($lh, ...parameters...).
216
       Bracket Notation is discussed in a later section.  Note that trying to
       compile a Bracket Notation string into a sub can throw an exception if
       the string is not syntactically valid (say, by not balancing brackets
       right).
221
222       Also, calling &$coderef($lh, ...parameters...) can throw any sort of
223       exception (if, say, code in that sub tries to divide by zero).  But a
224       very common exception occurs when you have Bracket Notation text that
225       says to call a method "foo", but there is no such method.  (E.g., "You
226       have [quatn,_1,ball]." will throw an exception on trying to call
227       $lh->quatn($_[1],'ball') -- you presumably meant "quant".)  "maketext"
228       catches these exceptions, but only to make the error message more
229       readable, at which point it rethrows the exception.
230
231       An exception may be thrown if key is not found in any of $lh's %Lexicon
232       hashes.  What happens if a key is not found, is discussed in a later
233       section, "Controlling Lookup Failure".
234
235       Note that you might find it useful in some cases to override the
236       "maketext" method with an "after method", if you want to translate
237       encodings, or even scripts:
238
           package YrProj::zh_cn; # Chinese with PRC-style glyphs
           use base ('YrProj::zh_tw');  # Taiwan-style
           sub maketext {
             my $self = shift(@_);
             # Call the inherited method; calling $self->maketext here
             # would recurse forever.
             my $value = $self->SUPER::maketext(@_);
             return Chineeze::taiwan2mainland($value);
           }
246
247       Or you may want to override it with something that traps any
248       exceptions, if that's critical to your program:
249
         sub maketext {
           my($lh, @stuff) = @_;
           my $out;
           eval { $out = $lh->SUPER::maketext(@stuff) };
           return $out unless $@;
           # ...otherwise deal with the exception...
         }
257
258       Other than those two situations, I don't imagine that it's useful to
259       override the "maketext" method.  (If you run into a situation where it
260       is useful, I'd be interested in hearing about it.)
261
262       $lh->fail_with or $lh->fail_with(PARAM)
263       $lh->failure_handler_auto
264           These two methods are discussed in the section "Controlling Lookup
265           Failure".
266
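       The four kinds of lexicon value described above, and the default
       behavior for a missing key, can be sketched like this.  The class
       names and lexicon keys are made up for the example.

```perl
use strict;
use warnings;

{
    package MyProgram::L10N;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    package MyProgram::L10N::en;
    our @ISA = ('MyProgram::L10N');
    our %Lexicon = (
        # Scalarref: dereferenced and returned; parameters are ignored.
        'motto'    => \'Keep it [_1] simple',
        # Coderef: called with the handle and then the parameters.
        'shout'    => sub { my ($lh, $word) = @_; return uc $word },
        # Plain string with no brackets: returned as is.
        'greeting' => 'Hello, world',
        # Bracket Notation: compiled into a sub on first use.
        'open_err' => "Can't open file [_1]: [_2]",
    );
}

my $lh = MyProgram::L10N->get_handle('en') or die "No language handle";

my @results = (
    $lh->maketext('motto', 'ignored'),
    $lh->maketext('shout', 'hey'),
    $lh->maketext('greeting'),
    $lh->maketext('open_err', 'notes.txt', 'No such file'),
);
print "$_\n" for @results;

# A key that is in none of the lexicons throws an exception by default.
my $missing = eval { $lh->maketext('no such key') };
my $error   = $@;
print "Lookup failed as expected\n" if !defined $missing and $error;
```
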
267   Utility Methods
268       These are methods that you may find it handy to use, generally from
269       %Lexicon routines of yours (whether expressed as Bracket Notation or
270       not).
271
272       $language->quant($number, $singular)
273       $language->quant($number, $singular, $plural)
274       $language->quant($number, $singular, $plural, $negative)
275           This is generally meant to be called from inside Bracket Notation
276           (which is discussed later), as in
277
278                "Your search matched [quant,_1,document]!"
279
280           It's for quantifying a noun (i.e., saying how much of it there is,
281           while giving the correct form of it).  The behavior of this method
282           is handy for English and a few other Western European languages,
283           and you should override it for languages where it's not suitable.
284           You can feel free to read the source, but the current
285           implementation is basically as this pseudocode describes:
286
287                if $number is 0 and there's a $negative,
288                   return $negative;
289                elsif $number is 1,
290                   return "1 $singular";
291                elsif there's a $plural,
292                   return "$number $plural";
293                else
294                   return "$number " . $singular . "s";
295                #
296                # ...except that we actually call numf to
297                #  stringify $number before returning it.
298
299           So for English (with Bracket Notation) "...[quant,_1,file]..." is
300           fine (for 0 it returns "0 files", for 1 it returns "1 file", and
301           for more it returns "2 files", etc.)
302
303           But for "directory", you'd want "[quant,_1,directory,directories]"
304           so that our elementary "quant" method doesn't think that the plural
305           of "directory" is "directorys".  And you might find that the output
306           may sound better if you specify a negative form, as in:
307
308                "[quant,_1,file,files,No files] matched your query.\n"
309
310           Remember to keep in mind verb agreement (or adjectives too, in
311           other languages), as in:
312
313                "[quant,_1,document] were matched.\n"
314
           Because if _1 is one, you get "1 document were matched".  An
           acceptable hack here is to do something like this (with no space
           after the comma, since items in a bracket group are not trimmed):

                "[quant,_1,document was,documents were] matched.\n"
319
320       $language->numf($number)
321           This returns the given number formatted nicely according to this
322           language's conventions.  Maketext's default method is mostly to
323           just take the normal string form of the number (applying sprintf
324           "%G" for only very large numbers), and then to add commas as
325           necessary.  (Except that we apply "tr/,./.,/" if
326           $language->{'numf_comma'} is true; that's a bit of a hack that's
327           useful for languages that express two million as "2.000.000" and
328           not as "2,000,000").
329
330           If you want anything fancier, consider overriding this with
331           something that uses Number::Format, or does something else
332           entirely.
333
334           Note that numf is called by quant for stringifying all quantifying
335           numbers.
336
337       $language->sprintf($format, @items)
338           This is just a wrapper around Perl's normal "sprintf" function.
339           It's provided so that you can use "sprintf" in Bracket Notation:
340
                "Couldn't access datanode [sprintf,%10s=~[%s~],_1,_2]!\n"

           returning...

                Couldn't access datanode      Stuff=[thangamabob]!
346
347       $language->language_tag()
348           Currently this just takes the last bit of "ref($language)", turns
349           underscores to dashes, and returns it.  So if $language is an
350           object of class Hee::HOO::Haw::en_us, $language->language_tag()
351           returns "en-us".  (Yes, the usual representation for that language
352           tag is "en-US", but case is never considered meaningful in
353           language-tag comparison.)
354
355           You may override this as you like; Maketext doesn't use it for
356           anything.
357
358       $language->encoding()
           Currently this isn't used for anything, but it's provided (with a
           default value of (ref($language) && $language->{'encoding'}) or
           "iso-8859-1") as a sort of suggestion that it may be
           useful/necessary to associate encodings with your language handles
           (whether on a per-class or even per-handle basis).
364
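       Here is a short sketch that exercises the utility methods above with
       their default (English-oriented) behavior.  The class names, the
       lexicon keys, and the sample numbers are assumptions chosen for
       illustration.

```perl
use strict;
use warnings;

{
    package MyProgram::L10N;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    package MyProgram::L10N::en_us;
    our @ISA = ('MyProgram::L10N');
    our %Lexicon = (
        'files'  => '[quant,_1,file,files,No files] matched your query.',
        'dirs'   => 'Scanned [quant,_1,directory,directories].',
        'padded' => 'Node [sprintf,%5s=%s,_1,_2]',
    );
}

my $lh = MyProgram::L10N->get_handle('en-US') or die "No language handle";

# quant: negative (zero) form, singular, and plural.
my @files = map { $lh->maketext('files', $_) } (0, 1, 3);
my $dirs  = $lh->maketext('dirs', 2);

# sprintf, called from Bracket Notation.
my $padded = $lh->maketext('padded', 'ab', 'cd');

# numf: commas by default; swapped with periods if numf_comma is set.
my $english = $lh->numf(2000000);
$lh->{'numf_comma'} = 1;
my $continental = $lh->numf(2000000);

my $tag      = $lh->language_tag();
my $encoding = $lh->encoding();

print "$_\n" for @files, $dirs, $padded, $english, $continental,
                 $tag, $encoding;
```
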
365   Language Handle Attributes and Internals
366       A language handle is a flyweight object -- i.e., it doesn't
367       (necessarily) carry any data of interest, other than just being a
368       member of whatever class it belongs to.
369
370       A language handle is implemented as a blessed hash.  Subclasses of
371       yours can store whatever data you want in the hash.  Currently the only
372       hash entry used by any crucial Maketext method is "fail", so feel free
373       to use anything else as you like.
374
375       Remember: Don't be afraid to read the Maketext source if there's any
376       point on which this documentation is unclear.  This documentation is
377       vastly longer than the module source itself.
378

LANGUAGE CLASS HIERARCHIES

380       These are Locale::Maketext's assumptions about the class hierarchy
381       formed by all your language classes:
382
383       ·   You must have a project base class, which you load, and which you
384           then use as the first argument in the call to
385           YourProjClass->get_handle(...).  It should derive (whether directly
386           or indirectly) from Locale::Maketext.  It doesn't matter how you
387           name this class, although assuming this is the localization
388           component of your Super Mega Program, good names for your project
389           class might be SuperMegaProgram::Localization,
390           SuperMegaProgram::L10N, SuperMegaProgram::I18N,
391           SuperMegaProgram::International, or even
392           SuperMegaProgram::Languages or SuperMegaProgram::Messages.
393
394       ·   Language classes are what YourProjClass->get_handle will try to
395           load.  It will look for them by taking each language-tag (skipping
396           it if it doesn't look like a language-tag or locale-tag!), turning
397           it to all lowercase, turning dashes to underscores, and appending
398           it to YourProjClass . "::".  So this:
399
400             $lh = YourProjClass->get_handle(
401               'en-US', 'fr', 'kon', 'i-klingon', 'i-klingon-romanized'
402             );
403
404           will try loading the classes YourProjClass::en_us (note
405           lowercase!), YourProjClass::fr, YourProjClass::kon,
406           YourProjClass::i_klingon and YourProjClass::i_klingon_romanized.
407           (And it'll stop at the first one that actually loads.)
408
409       ·   I assume that each language class derives (directly or indirectly)
410           from your project class, and also defines its @ISA, its %Lexicon,
411           or both.  But I anticipate no dire consequences if these
412           assumptions do not hold.
413
       ·   Language classes may derive from other language classes (although
           they should have "use Thatclassname" or "use base
           qw(...classes...)").  They may derive from the project class.  They
           may derive from some other class altogether.  Or, via multiple
           inheritance, they may derive from any mixture of these.
419
420       ·   I foresee no problems with having multiple inheritance in your
421           hierarchy of language classes.  (As usual, however, Perl will
422           complain bitterly if you have a cycle in the hierarchy: i.e., if
423           any class is its own ancestor.)
424
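       The way language-tags are turned into class names can be sketched as
       follows.  Only "fr" and "i_klingon" classes are defined here (an
       assumption for the example), so the first call skips "en-US" and
       stops at the first class that is actually available.

```perl
use strict;
use warnings;

{
    package YourProjClass;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    package YourProjClass::fr;
    our @ISA = ('YourProjClass');
    our %Lexicon = ( 'Yes' => 'Oui' );
}
{
    # The tag "i-klingon" is lowercased and its dash becomes "_".
    package YourProjClass::i_klingon;
    our @ISA = ('YourProjClass');
    our %Lexicon = ( 'Yes' => 'HIja' );
}

# No en_us (or en) class exists, so "fr" is the first one that loads.
my $first = YourProjClass->get_handle('en-US', 'fr', 'i-klingon');

# Here the i_klingon class is found straight away.
my $second = YourProjClass->get_handle('i-klingon', 'fr');

print ref($first), "\n", ref($second), "\n";
```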

ENTRIES IN EACH LEXICON

       A typical %Lexicon entry is meant to signify a phrase, taking some
       number (0 or more) of parameters.  An entry is meant to be accessed
       via a string key in $lh->maketext(key, ...parameters...), which should
       return a string that is generally meant to be used for "output" to the
       user -- regardless of whether this actually means printing to STDOUT,
       writing to a file, or putting into a GUI widget.
432
433       While the key must be a string value (since that's a basic restriction
434       that Perl places on hash keys), the value in the lexicon can currently
435       be of several types: a defined scalar, scalarref, or coderef.  The use
436       of these is explained above, in the section 'The "maketext" Method',
437       and Bracket Notation for strings is discussed in the next section.
438
       While you can use arbitrary unique IDs for lexicon keys (like
       "_min_larger_max_error"), it is often useful if an entry's key is
       itself a valid value, like this example error message:
442
443         "Minimum ([_1]) is larger than maximum ([_2])!\n",
444
445       Compare this code that uses an arbitrary ID...
446
447         die $lh->maketext( "_min_larger_max_error", $min, $max )
448          if $min > $max;
449
450       ...to this code that uses a key-as-value:
451
452         die $lh->maketext(
453          "Minimum ([_1]) is larger than maximum ([_2])!\n",
454          $min, $max
455         ) if $min > $max;
456
457       The second is, in short, more readable.  In particular, it's obvious
458       that the number of parameters you're feeding to that phrase (two) is
459       the number of parameters that it wants to be fed.  (Since you see _1
460       and a _2 being used in the key there.)
461
       Also, once a project is otherwise complete and you start to localize
       it, you can scrape together all the various keys you use, and pass
       them to a translator; and then the translator's work will go faster if
       what he's presented with is this:

        "Minimum ([_1]) is larger than maximum ([_2])!\n"
         => "",   # fill in something here, Jacques!

       rather than this more cryptic mess:

        "_min_larger_max_error"
         => "",   # fill in something here, Jacques

       I think that using keys that are themselves valid lexicon values makes
       the completed lexicon entries more readable:

        "Minimum ([_1]) is larger than maximum ([_2])!\n"
         => "Le minimum ([_1]) est plus grand que le maximum ([_2])!\n",
480
481       Also, having valid values as keys becomes very useful if you set up an
482       _AUTO lexicon.  _AUTO lexicons are discussed in a later section.
483
       I almost always use keys that are themselves valid lexicon values.  One
       notable exception is when the value is quite long.  For example, to get
       the screenful of data that a command-line program might return when
       given an unknown switch, I often just use a brief, self-explanatory key
       such as "_USAGE_MESSAGE".  At that point I then immediately define
       that lexicon entry in the ProjectClass::L10N::en lexicon (since
       English is always my "project language"):
491
492         '_USAGE_MESSAGE' => <<'EOSTUFF',
493         ...long long message...
494         EOSTUFF
495
496       and then I can use it as:
497
498         getopt('oDI', \%opts) or die $lh->maketext('_USAGE_MESSAGE');
499
       Incidentally, note that each class's %Lexicon inherits-and-extends the
       lexicons in its superclasses.  This is not because these are special
       hashes per se, but because you access them via the "maketext" method,
       which looks for entries across all the %Lexicon hashes in a language
       class and all its ancestor classes.  (This is because the idea of
       "class data" isn't directly implemented in Perl, but is instead left to
       individual class-systems to implement as they see fit.)
507
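       A small sketch of that inherit-and-extend behavior, using a
       hypothetical British English class that derives from an English
       class and overrides a single entry:

```perl
use strict;
use warnings;

{
    package SuperMegaProgram::L10N;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    package SuperMegaProgram::L10N::en;
    our @ISA = ('SuperMegaProgram::L10N');
    our %Lexicon = (
        'Pick a color' => 'Pick a color',
        'Quit'         => 'Quit',
    );
}
{
    # British English overrides one entry and inherits the rest.
    package SuperMegaProgram::L10N::en_gb;
    our @ISA = ('SuperMegaProgram::L10N::en');
    our %Lexicon = ( 'Pick a color' => 'Pick a colour' );
}

my $lh = SuperMegaProgram::L10N->get_handle('en-GB')
  or die "No language handle";

my $overridden = $lh->maketext('Pick a color');  # found in en_gb's %Lexicon
my $inherited  = $lh->maketext('Quit');          # found in en's %Lexicon
print "$overridden\n$inherited\n";
```
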
508       Note that you may have things stored in a lexicon besides just phrases
509       for output:  for example, if your program takes input from the
510       keyboard, asking a "(Y/N)" question, you probably need to know what the
511       equivalent of "Y[es]/N[o]" is in whatever language.  You probably also
512       need to know what the equivalents of the answers "y" and "n" are.  You
513       can store that information in the lexicon (say, under the keys
514       "~answer_y" and "~answer_n", and the long forms as "~answer_yes" and
515       "~answer_no", where "~" is just an ad-hoc character meant to indicate
516       to programmers/translators that these are not phrases for output).
517
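       As a rough sketch of the lexicon-based approach just described, the
       answers can be stored under "~"-prefixed keys and compared against
       the user's input.  The helper "answer_from_lexicon" and the class
       names are hypothetical, invented for this example.

```perl
use strict;
use warnings;

{
    package MyProgram::L10N;
    use Locale::Maketext;
    our @ISA = ('Locale::Maketext');
}
{
    package MyProgram::L10N::fr;
    our @ISA = ('MyProgram::L10N');
    our %Lexicon = (
        '~answer_y'   => 'o',
        '~answer_yes' => 'oui',
        '~answer_n'   => 'n',
        '~answer_no'  => 'non',
    );
}

# Hypothetical helper: returns 1 for yes, 0 for no, undef otherwise.
sub answer_from_lexicon {
    my ($lh, $input) = @_;
    return undef unless defined $input and length $input;
    my $answer = lc $input;  # smash case
    for my $key ('~answer_y', '~answer_yes') {
        return 1 if $answer eq lc $lh->maketext($key);
    }
    for my $key ('~answer_n', '~answer_no') {
        return 0 if $answer eq lc $lh->maketext($key);
    }
    return undef;
}

my $lh = MyProgram::L10N->get_handle('fr') or die "No language handle";

my $yes     = answer_from_lexicon($lh, 'Oui');
my $no      = answer_from_lexicon($lh, 'n');
my $unknown = answer_from_lexicon($lh, 'yes');
```
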
       Or instead of storing this in the language class's lexicon, you can
       (and, in some cases, really should) represent the same bit of knowledge
       as code in a method in the language class.  (That leaves a tidy
       distinction between the lexicon as the things we know how to say, and
       the rest of the things in the language class as things that we know
       how to do.)  Consider this example of a processor for responses to
       French "oui/non" questions:
525
         sub y_or_n {
           return undef unless defined $_[1] and length $_[1];
           my $answer = lc $_[1];  # smash case
           return 1 if $answer eq 'o' or $answer eq 'oui';
           return 0 if $answer eq 'n' or $answer eq 'non';
           return undef;
         }

       ...which you'd then call in a construct like this:

         my $response;
         until(defined $response) {
           print $lh->maketext("Open the pod bay door (y/n)? ");
           $response = $lh->y_or_n( get_input_from_keyboard_somehow() );
         }
         if($response) { $pod_bay_door->open()         }
         else          { $pod_bay_door->leave_closed() }
543
       Other data worth storing in a lexicon might be things like filenames
       for language-targeted resources:
546
547         ...
548         "_main_splash_png"
549           => "/styles/en_us/main_splash.png",
550         "_main_splash_imagemap"
551           => "/styles/en_us/main_splash.incl",
552         "_general_graphics_path"
553           => "/styles/en_us/",
554         "_alert_sound"
555           => "/styles/en_us/hey_there.wav",
556         "_forward_icon"
557          => "left_arrow.png",
558         "_backward_icon"
559          => "right_arrow.png",
560         # In some other languages, left equals
561         #  BACKwards, and right is FOREwards.
562         ...
563
564       You might want to do the same thing for expressing key bindings or the
565       like (since hardwiring "q" as the binding for the function that quits a
566       screen/menu/program is useful only if your language happens to
567       associate "q" with "quit"!)
568

BRACKET NOTATION

570       Bracket Notation is a crucial feature of Locale::Maketext.  I mean
571       Bracket Notation to provide a replacement for the use of sprintf
572       formatting.  Everything you do with Bracket Notation could be done with
573       a sub block, but bracket notation is meant to be much more concise.
574
       Bracket Notation is like a miniature "template" system (in the sense
       of Text::Template, not in the sense of C++ templates), where normal
       text is passed thru basically as is, but text in special regions is
       specially interpreted.  In Bracket Notation, you use square brackets
       ("[...]"), not curly braces ("{...}") to note sections that are
       specially interpreted.
581
582       For example, here all the areas that are taken literally are underlined
583       with a "^", and all the in-bracket special regions are underlined with
584       an X:
585
586         "Minimum ([_1]) is larger than maximum ([_2])!\n",
587          ^^^^^^^^^ XX ^^^^^^^^^^^^^^^^^^^^^^^^^^ XX ^^^^
588
589       When that string is compiled from bracket notation into a real Perl
590       sub, it's basically turned into:
591
592         sub {
593           my $lh = $_[0];
594           my @params = @_;
595           return join '',
596             "Minimum (",
597             ...some code here...
598             ") is larger than maximum (",
599             ...some code here...
600             ")!\n",
601         }
602         # to be called by $lh->maketext(KEY, params...)
603
604       In other words, text outside bracket groups is turned into string
605       literals.  Text in brackets is rather more complex, and currently
606       follows these rules:
607
       ·   Bracket groups that are empty, or which consist only of whitespace,
           are ignored.  (Examples: "[]", "[    ]", or a [ and a ] with
           returns and/or tabs and/or spaces between them.)
611
612           Otherwise, each group is taken to be a comma-separated group of
613           items, and each item is interpreted as follows:
614
       ·   An item that is "_digits" or "_-digits" is interpreted as
           $_[value].  I.e., "_1" becomes $_[1], and "_-3" is interpreted
           as $_[-3] (in which case @_ should have at least three elements in
           it).  Note that $_[0] is the language handle, and is typically not
           named directly.
620
621       ·   An item "_*" is interpreted to mean "all of @_ except $_[0]".
622           I.e., @_[1..$#_].  Note that this is an empty list in the case of
623           calls like $lh->maketext(key) where there are no parameters (except
624           $_[0], the language handle).
625
626       ·   Otherwise, each item is interpreted as a string literal.
627
628       The group as a whole is interpreted as follows:
629
630       ·   If the first item in a bracket group looks like a method name, then
631           that group is interpreted like this:
632
633             $lh->that_method_name(
634               ...rest of items in this group...
635             ),
636
637       ·   If the first item in a bracket group is "*", it's taken as
638           shorthand for the so commonly called "quant" method.  Similarly, if
639           the first item in a bracket group is "#", it's taken to be
640           shorthand for "numf".
641
642       ·   If the first item in a bracket group is the empty-string, or "_*"
643           or "_digits" or "_-digits", then that group is interpreted as just
644           the interpolation of all its items:
645
646             join('',
647               ...rest of items in this group...
648             ),
649
           Examples:  "[_1]" and "[,_1]", which are synonymous; and
           "[,ID-(,_4,-,_2,)]", which compiles as join "", "ID-(", $_[4],
           "-", $_[2], ")".
653
       ·   Otherwise this bracket group is invalid.  For example, in the group
           "[!@#,whatever]", the first item "!@#" is neither the empty-string,
           "_number", "_-number", "_*", nor a valid method name; and so
           Locale::Maketext will throw an exception if you try compiling an
           expression containing this bracket group.
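       As a minimal sketch of the group rules above, the following script
       defines two hypothetical classes inline (Demo::L10N and Demo::L10N::en
       are names invented for illustration, as are the lexicon keys) and runs
       one entry per rule through maketext.  It assumes a stock
       Locale::Maketext as shipped with Perl.

```perl
use strict;
use warnings;

# Hypothetical project class; the name Demo::L10N is invented for this sketch.
package Demo::L10N;
use parent 'Locale::Maketext';

# Hypothetical English language class holding bracket-notation entries.
package Demo::L10N::en;
use parent -norequire, 'Demo::L10N';
our %Lexicon = (
  'greet' => 'Hello, [_1]!',            # "_1" is interpolated as $_[1]
  'all'   => 'Args: [_*]',              # "_*" is all of @_ except $_[0]
  'count' => 'Found [quant,_1,file].',  # method call: $lh->quant($_[1], "file")
  'star'  => 'Found [*,_1,file].',      # "*" is shorthand for "quant"
  'id'    => '[,ID-(,_4,-,_2,)]',       # empty first item: join of the rest
);

package main;

my $lh = Demo::L10N->get_handle('en') or die "No language handle for 'en'";

# Collect one result per lexicon entry.
my %out = (
  greet => $lh->maketext('greet', 'World'),
  all   => $lh->maketext('all', 'a', 'b'),
  count => $lh->maketext('count', 3),
  star  => $lh->maketext('star', 1),
  id    => $lh->maketext('id', 'a', 'b', 'c', 'd'),
);
print "$_: $out{$_}\n" for sort keys %out;
```

       The "id" entry takes four parameters only so that "_4" and "_2" have
       something to refer to; the unused parameters are simply ignored.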
659
660       Note, incidentally, that items in each group are comma-separated, not
661       "/\s*,\s*/"-separated.  That is, you might expect that this bracket
662       group:
663
664         "Hoohah [foo, _1 , bar ,baz]!"
665
666       would compile to this:
667
668         sub {
669           my $lh = $_[0];
670           return join '',
671             "Hoohah ",
672             $lh->foo( $_[1], "bar", "baz"),
673             "!",
674         }
675
676       But it actually compiles as this:
677
678         sub {
679           my $lh = $_[0];
680           return join '',
681             "Hoohah ",
682             $lh->foo(" _1 ", " bar ", "baz"),  # note the <space> in " bar "
683             "!",
684         }
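       The whitespace behavior described above can be checked with a small
       script.  This is only a sketch: Demo::L10N, Demo::L10N::en, and the
       "show" method are hypothetical names made up for the demonstration;
       "show" just wraps each argument it receives in angle brackets so that
       any surrounding spaces become visible.

```perl
use strict;
use warnings;

# Hypothetical project class with a helper that exposes its arguments.
package Demo::L10N;
use parent 'Locale::Maketext';

sub show {
  my ($lh, @args) = @_;
  return join '', map { "<$_>" } @args;   # e.g. "<X><bar>"
}

# Hypothetical language class; _AUTO lets us pass bracket code as the key.
package Demo::L10N::en;
use parent -norequire, 'Demo::L10N';
our %Lexicon = ( '_AUTO' => 1 );

package main;

my $lh = Demo::L10N->get_handle('en') or die "No language handle for 'en'";

# Items are split on bare commas, so the padded items stay literal strings.
my $spaced = $lh->maketext('[show, _1 , bar ,baz]', 'X');
# Without padding, "_1" is recognized and replaced by the first parameter.
my $tight  = $lh->maketext('[show,_1,bar,baz]', 'X');

print "spaced: $spaced\n";
print "tight:  $tight\n";
```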
685
       In the notation discussed so far, the characters "[" and "]" are given
       special meaning, for opening and closing bracket groups, and "," has a
       special meaning inside bracket groups, where it separates items in the
       group.  This raises the question of how you'd express a literal "[" or
       "]" in a Bracket Notation string, and how you'd express a literal comma
       inside a bracket group.  For this purpose I've adopted "~" (tilde) as
       an escape character:  "~[" means a literal '[' character anywhere in
       Bracket Notation (i.e., regardless of whether you're in a bracket group
       or not), and ditto for "~]" meaning a literal ']', and "~," meaning a
       literal comma.  (Altho "," means a literal comma outside of bracket
       groups -- it's only inside bracket groups that commas are special.)
697
698       And on the off chance you need a literal tilde in a bracket expression,
699       you get it with "~~".
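       Here is a short sketch of the escapes in use.  The class names
       Demo::L10N and Demo::L10N::en are hypothetical, and the example uses
       "~," only inside a bracket group, which is where a literal comma needs
       escaping.

```perl
use strict;
use warnings;

# Hypothetical project and language classes, defined inline for the demo.
package Demo::L10N;
use parent 'Locale::Maketext';

package Demo::L10N::en;
use parent -norequire, 'Demo::L10N';
our %Lexicon = ( '_AUTO' => 1 );   # lets the key double as the value

package main;

my $lh = Demo::L10N->get_handle('en') or die "No language handle for 'en'";

# ~[ and ~] give literal brackets, ~, gives a literal comma inside a
# group, and ~~ gives a literal tilde.
my $text = $lh->maketext(
  'Slot ~[[_1]~] holds [_2,~, ,_3] (~~ is a tilde)',
  7, 'a', 'b',
);
print "$text\n";
```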
700
701       Currently, an unescaped "~" before a character other than a bracket or
702       a comma is taken to mean just a "~" and that character.  I.e., "~X"
703       means the same as "~~X" -- i.e., one literal tilde, and then one
704       literal "X".  However, by using "~X", you are assuming that no future
705       version of Maketext will use "~X" as a magic escape sequence.  In
706       practice this is not a great problem, since first off you can just
707       write "~~X" and not worry about it; second off, I doubt I'll add lots
708       of new magic characters to bracket notation; and third off, you aren't
709       likely to want literal "~" characters in your messages anyway, since
710       it's not a character with wide use in natural language text.
711
712       Brackets must be balanced -- every openbracket must have one matching
713       closebracket, and vice versa.  So these are all invalid:
714
715         "I ate [quant,_1,rhubarb pie."
716         "I ate [quant,_1,rhubarb pie[."
717         "I ate quant,_1,rhubarb pie]."
718         "I ate quant,_1,rhubarb pie[."
719
720       Currently, bracket groups do not nest.  That is, you cannot say:
721
722         "Foo [bar,baz,[quux,quuux]]\n";
723
724       If you need a notation that's that powerful, use normal Perl:
725
726         %Lexicon = (
727           ...
728           "some_key" => sub {
729             my $lh = $_[0];
730             join '',
731               "Foo ",
732               $lh->bar('baz', $lh->quux('quuux')),
733               "\n",
734           },
735           ...
736         );
737
738       Or write the "bar" method so you don't need to pass it the output from
739       calling quux.
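       A runnable version of the coderef approach might look like the
       following sketch.  The "bar" and "quux" methods here are hypothetical
       stand-ins (the text above does not say what they do), and the class
       names are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical project class; bar and quux are stand-in methods.
package Demo::L10N;
use parent 'Locale::Maketext';

sub quux { my ($lh, $word)  = @_; return uc $word }
sub bar  { my ($lh, @parts) = @_; return join '+', @parts }

package Demo::L10N::en;
use parent -norequire, 'Demo::L10N';
our %Lexicon = (
  # A coderef value is called with the handle first, then the
  # parameters that were passed to maketext.
  'some_key' => sub {
    my $lh = $_[0];
    return join '',
      'Foo ',
      $lh->bar('baz', $lh->quux('quuux')),   # nesting done in plain Perl
      "\n";
  },
);

package main;

my $lh  = Demo::L10N->get_handle('en') or die "No language handle for 'en'";
my $out = $lh->maketext('some_key');
print $out;
```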
740
741       I do not anticipate that you will need (or particularly want) to nest
742       bracket groups, but you are welcome to email me with convincing (real-
743       life) arguments to the contrary.
744

AUTO LEXICONS

746       If maketext goes to look in an individual %Lexicon for an entry for key
747       (where key does not start with an underscore), and sees none, but does
748       see an entry of "_AUTO" => some_true_value, then we actually define
749       $Lexicon{key} = key right then and there, and then use that value as if
750       it had been there all along.  This happens before we even look in any
751       superclass %Lexicons!
752
753       (This is meant to be somewhat like the AUTOLOAD mechanism in Perl's
754       function call system -- or, looked at another way, like the AutoLoader
755       module.)
756
757       I can picture all sorts of circumstances where you just do not want
758       lookup to be able to fail (since failing normally means that maketext
759       throws a "die", although see the next section for greater control over
760       that).  But here's one circumstance where _AUTO lexicons are meant to
761       be especially useful:
762
763       As you're writing an application, you decide as you go what messages
764       you need to emit.  Normally you'd go to write this:
765
766         if(-e $filename) {
767           go_process_file($filename)
768         } else {
769           print qq{Couldn't find file "$filename"!\n};
770         }
771
772       but since you anticipate localizing this, you write:
773
774         use ThisProject::I18N;
775         my $lh = ThisProject::I18N->get_handle();
776          # For the moment, assume that things are set up so
777          # that we load class ThisProject::I18N::en
778          # and that that's the class that $lh belongs to.
779         ...
780         if(-e $filename) {
781           go_process_file($filename)
782         } else {
783           print $lh->maketext(
784             qq{Couldn't find file "[_1]"!\n}, $filename
785           );
786         }
787
788       Now, right after you've just written the above lines, you'd normally
789       have to go open the file ThisProject/I18N/en.pm, and immediately add an
790       entry:
791
792         "Couldn't find file \"[_1]\"!\n"
793         => "Couldn't find file \"[_1]\"!\n",
794
795       But I consider that somewhat of a distraction from the work of getting
796       the main code working -- to say nothing of the fact that I often have
797       to play with the program a few times before I can decide exactly what
798       wording I want in the messages (which in this case would require me to
799       go changing three lines of code: the call to maketext with that key,
800       and then the two lines in ThisProject/I18N/en.pm).
801
       However, if you set "_AUTO => 1" in the %Lexicon in
       ThisProject/I18N/en.pm (assuming that English (en) is the language that
       all your programmers will be using for this project's internal message
       keys), then you don't ever have to go adding lines like this
806
807         "Couldn't find file \"[_1]\"!\n"
808         => "Couldn't find file \"[_1]\"!\n",
809
810       to ThisProject/I18N/en.pm, because if _AUTO is true there, then just
811       looking for an entry with the key "Couldn't find file \"[_1]\"!\n" in
812       that lexicon will cause it to be added, with that value!
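       The following sketch shows the effect on a live lexicon.  The classes
       Demo::L10N and Demo::L10N::en and the file name "notes.txt" are
       hypothetical; the script checks whether the key is present in %Lexicon
       before and after the first lookup, and confirms that a key with a
       leading underscore is not auto-made.

```perl
use strict;
use warnings;

# Hypothetical project and language classes, defined inline for the demo.
package Demo::L10N;
use parent 'Locale::Maketext';

package Demo::L10N::en;
use parent -norequire, 'Demo::L10N';
our %Lexicon = ( '_AUTO' => 1 );

package main;

my $lh  = Demo::L10N->get_handle('en') or die "No language handle for 'en'";
my $key = qq{Couldn't find file "[_1]"!\n};

# The key is absent until the first lookup creates it.
my $before = exists $Demo::L10N::en::Lexicon{$key} ? 1 : 0;
my $text   = $lh->maketext($key, 'notes.txt');
my $after  = exists $Demo::L10N::en::Lexicon{$key} ? 1 : 0;

# Keys that start with "_" are immune to _AUTO, so this lookup fails.
my $underscore_ok = eval { $lh->maketext('_private key'); 1 } ? 1 : 0;

print $text;
print "in lexicon before: $before, after: $after\n";
print "underscore key auto-made: $underscore_ok\n";
```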
813
814       Note that the reason that keys that start with "_" are immune to _AUTO
815       isn't anything generally magical about the underscore character -- I
816       just wanted a way to have most lexicon keys be autoable, except for
817       possibly a few, and I arbitrarily decided to use a leading underscore
818       as a signal to distinguish those few.
819

CONTROLLING LOOKUP FAILURE

       If you call $lh->maketext(key, ...parameters...), and there's no entry
       key in $lh's class's %Lexicon, nor in the superclass %Lexicon hash, and
       if we can't auto-make key (because either it starts with a "_", or
       because none of its lexicons have "_AUTO => 1,"), then we have failed
       to find a normal way to maketext key.  What then happens in these
       failure conditions depends on the $lh object's "fail" attribute.

       If the language handle has no "fail" attribute, maketext will simply
       throw an exception (i.e., it calls "die", mentioning the key whose
       lookup failed, and naming the line number where the calling
       $lh->maketext(key,...) was).
832
833       If the language handle has a "fail" attribute whose value is a coderef,
834       then $lh->maketext(key,...params...) gives up and calls:
835
836         return $that_subref->($lh, $key, @params);
837
838       Otherwise, the "fail" attribute's value should be a string denoting a
839       method name, so that $lh->maketext(key,...params...) can give up with:
840
         return $lh->$that_method_name($key, @params);
842
843       The "fail" attribute can be accessed with the "fail_with" method:
844
845         # Set to a coderef:
846         $lh->fail_with( \&failure_handler );
847
848         # Set to a method name:
849         $lh->fail_with( 'failure_method' );
850
851         # Set to nothing (i.e., so failure throws a plain exception)
852         $lh->fail_with( undef );
853
854         # Get the current value
855         $handler = $lh->fail_with();
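       As a sketch of the three settings in action, the script below installs
       a coderef, then a method name, then undef, and looks up the same
       missing key each time.  Demo::L10N, Demo::L10N::en, and the
       "mark_missing" method are hypothetical names invented for the example.

```perl
use strict;
use warnings;

# Hypothetical project class with a fallback method for failed lookups.
package Demo::L10N;
use parent 'Locale::Maketext';

sub mark_missing {
  my ($lh, $key, @params) = @_;
  return "[[missing: $key]]";
}

package Demo::L10N::en;
use parent -norequire, 'Demo::L10N';
our %Lexicon = ( 'Hello' => 'Hello' );   # no _AUTO, so unknown keys fail

package main;

my $lh = Demo::L10N->get_handle('en') or die "No language handle for 'en'";

# 1. Coderef: called as $coderef->($lh, $key, @params).
my @seen;
$lh->fail_with( sub {
  my ($failing_lh, $key, @params) = @_;
  push @seen, [ ref($failing_lh), $key, @params ];
  return "?$key?";
} );
my $via_code = $lh->maketext('No such key', 42);

# 2. Method name: called as $lh->mark_missing($key, @params).
$lh->fail_with('mark_missing');
my $via_method = $lh->maketext('No such key', 42);

# 3. No handler: lookup failure throws an exception again.
$lh->fail_with(undef);
my $died = eval { $lh->maketext('No such key'); 1 } ? 0 : 1;

print "coderef: $via_code\nmethod:  $via_method\ndied:    $died\n";
```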
856
857       Now, as to what you may want to do with these handlers:  Maybe you'd
858       want to log what key failed for what class, and then die.  Maybe you
859       don't like "die" and instead you want to send the error message to
860       STDOUT (or wherever) and then merely "exit()".
861
862       Or maybe you don't want to "die" at all!  Maybe you could use a handler
863       like this:
864
         # Make all lookups fall back onto an English value,
         #  but only after we log it for later fingerpointing.
         my $lh_backup = ThisProject->get_handle('en');
         open(LEX_FAIL_LOG, ">>wherever/lex.log") || die "GNAARGH $!";
         sub lex_fail {
           # The handler receives the failing handle, the key, and then
           # the remaining maketext parameters as a list.
           my($failing_lh, $key, @params) = @_;
           print LEX_FAIL_LOG scalar(localtime), "\t",
              ref($failing_lh), "\t", $key, "\n";
           return $lh_backup->maketext($key,@params);
         }
875
       Some users have said that this whole mechanism of having a "fail"
       attribute at all seems a rather pointless complication.  But I want
       Locale::Maketext to be usable for software projects of any scale and
       type; and different software projects have different ideas of what the
       right thing is to do in failure conditions.  I could simply say that
       failure always throws an exception, and that if you want to be careful,
       you'll just have to wrap every call to $lh->maketext in an eval { }.
       However, I want programmers to reserve the right (via the "fail"
       attribute) to treat lookup failure as something other than an exception
       of the same level of severity as a config file being unreadable, or
       some essential resource being inaccessible.
888
889       One possibly useful value for the "fail" attribute is the method name
890       "failure_handler_auto".  This is a method defined in the class
891       Locale::Maketext itself.  You set it with:
892
893         $lh->fail_with('failure_handler_auto');
894
895       Then when you call $lh->maketext(key, ...parameters...) and there's no
896       key in any of those lexicons, maketext gives up with
897
898         return $lh->failure_handler_auto($key, @params);
899
900       But failure_handler_auto, instead of dying or anything, compiles $key,
901       caching it in
902
           $lh->{'failure_lex'}{$key} = $compiled
904
905       and then calls the compiled value, and returns that.  (I.e., if $key
906       looks like bracket notation, $compiled is a sub, and we return
907       &{$compiled}(@params); but if $key is just a plain string, we just
908       return that.)
909
       The effect of using "failure_handler_auto" is like an AUTO lexicon,
       except that 1) it compiles $key even if it starts with "_", and 2) you
       have a record in the new hashref $lh->{'failure_lex'} of all the keys
       that have failed for this object.  This should avoid your program dying
       -- as long as your keys aren't actually invalid as bracket code, and as
       long as they don't try calling methods that don't exist.
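       A small sketch of this behavior follows.  The class names and the keys
       are hypothetical; the script looks up two keys that are in no lexicon
       (one of them starting with "_") and then lists what was recorded in
       $lh->{'failure_lex'}.

```perl
use strict;
use warnings;

# Hypothetical project and language classes, defined inline for the demo.
package Demo::L10N;
use parent 'Locale::Maketext';

package Demo::L10N::en;
use parent -norequire, 'Demo::L10N';
our %Lexicon = ( 'Hello' => 'Hello' );   # no _AUTO here

package main;

my $lh = Demo::L10N->get_handle('en') or die "No language handle for 'en'";
$lh->fail_with('failure_handler_auto');

# Neither key is in the lexicon, so both go through the failure handler.
my $plain  = $lh->maketext('Plain text');
my $filled = $lh->maketext('_hi [_1]', 'Bob');   # leading "_" is fine here

# The handler keeps a per-handle record of every key that failed.
my @failed = sort keys %{ $lh->{'failure_lex'} };

print "$plain\n$filled\n";
print "failed keys: ", join(' | ', @failed), "\n";
```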
916
       "failure_handler_auto" may not be exactly what you want, but I hope it
       at least shows you that maketext failure can be mitigated in any number
       of very flexible ways.  If you can formalize exactly what you want, you
       should be able to express that as a failure handler.  You can even make
       it default for every object of a given class, by setting it in that
       class's init:
923
924         sub init {
925           my $lh = $_[0];  # a newborn handle
926           $lh->SUPER::init();
927           $lh->fail_with('my_clever_failure_handler');
928           return;
929         }
930         sub my_clever_failure_handler {
           ...your clever things here...
932         }
933

HOW TO USE MAKETEXT

935       Here is a brief checklist on how to use Maketext to localize
936       applications:
937
938       ·   Decide what system you'll use for lexicon keys.  If you insist, you
939           can use opaque IDs (if you're nostalgic for "catgets"), but I have
940           better suggestions in the section "Entries in Each Lexicon", above.
941           Assuming you opt for meaningful keys that double as values (like
942           "Minimum ([_1]) is larger than maximum ([_2])!\n"), you'll have to
943           settle on what language those should be in.  For the sake of
944           argument, I'll call this English, specifically American English,
945           "en-US".
946
947       ·   Create a class for your localization project.  This is the name of
948           the class that you'll use in the idiom:
949
950             use Projname::L10N;
951             my $lh = Projname::L10N->get_handle(...) || die "Language?";
952
953           Assuming you call your class Projname::L10N, create a class
954           consisting minimally of:
955
             package Projname::L10N;
             use base qw(Locale::Maketext);
             ...any methods you might want all your languages to share...

             # And, assuming you want the base class to be an _AUTO lexicon,
             # as is discussed a few sections up:
             %Lexicon = (
               '_AUTO' => 1,
             );

             1;
964
965       ·   Create a class for the language your internal keys are in.  Name
966           the class after the language-tag for that language, in lowercase,
967           with dashes changed to underscores.  Assuming your project's first
968           language is US English, you should call this Projname::L10N::en_us.
969           It should consist minimally of:
970
971             package Projname::L10N::en_us;
972             use base qw(Projname::L10N);
973             %Lexicon = (
974               '_AUTO' => 1,
975             );
976             1;
977
           (For the rest of this section, I'll assume that this "first
           language class" of Projname::L10N::en_us has an _AUTO lexicon.)
980
981       ·   Go and write your program.  Everywhere in your program where you
982           would say:
983
984             print "Foobar $thing stuff\n";
985
986           instead do it thru maketext, using no variable interpolation in the
987           key:
988
989             print $lh->maketext("Foobar [_1] stuff\n", $thing);
990
991           If you get tired of constantly saying "print $lh->maketext",
992           consider making a functional wrapper for it, like so:
993
             use Projname::L10N;
             use vars qw($lh);
             $lh = Projname::L10N->get_handle(...) || die "Language?";
             sub pmt (@) { print( $lh->maketext(@_)) }
              # "pmt" is short for "Print MakeText"
             $Carp::Verbose = 1;
              # so if maketext fails, we see what made the call to pmt
1001
1002           Besides whole phrases meant for output, anything language-dependent
1003           should be put into the class Projname::L10N::en_us, whether as
1004           methods, or as lexicon entries -- this is discussed in the section
1005           "Entries in Each Lexicon", above.
1006
1007       ·   Once the program is otherwise done, and once its localization for
1008           the first language works right (via the data and methods in
1009           Projname::L10N::en_us), you can get together the data for
1010           translation.  If your first language lexicon isn't an _AUTO
1011           lexicon, then you already have all the messages explicitly in the
1012           lexicon (or else you'd be getting exceptions thrown when you call
1013           $lh->maketext to get messages that aren't in there).  But if you
1014           were (advisedly) lazy and are using an _AUTO lexicon, then you've
1015           got to make a list of all the phrases that you've so far been
1016           letting _AUTO generate for you.  There are very many ways to
1017           assemble such a list.  The most straightforward is to simply grep
1018           the source for every occurrence of "maketext" (or calls to wrappers
1019           around it, like the above "pmt" function), and to log the following
1020           phrase.
1021
       ·   You may at this point want to consider whether your base class
           (Projname::L10N), from which all lexicons inherit
           (Projname::L10N::en, Projname::L10N::es, etc.), should be an _AUTO
           lexicon.  It may be true that in theory, all needed messages will
           be in each language class; but in the presumably unlikely or
           "impossible" case of lookup failure, you should consider whether
           your program should throw an exception, emit text in English (or
           whatever your project's first language is), or use some more
           complex solution as described in the section "Controlling Lookup
           Failure", above.
1032
1033       ·   Submit all messages/phrases/etc. to translators.
1034
1035           (You may, in fact, want to start with localizing to one other
1036           language at first, if you're not sure that you've properly
1037           abstracted the language-dependent parts of your code.)
1038
1039           Translators may request clarification of the situation in which a
1040           particular phrase is found.  For example, in English we are
1041           entirely happy saying "n files found", regardless of whether we
1042           mean "I looked for files, and found n of them" or the rather
1043           distinct situation of "I looked for something else (like lines in
1044           files), and along the way I saw n files."  This may involve
1045           rethinking things that you thought quite clear: should "Edit" on a
1046           toolbar be a noun ("editing") or a verb ("to edit")?  Is there
1047           already a conventionalized way to express that menu option,
1048           separate from the target language's normal word for "to edit"?
1049
           In all cases where the very common phenomenon of quantification
           (saying "N files", for any value of N) is involved, each translator
           should make clear what dependencies the number causes in the
           sentence.  In many cases, dependency is limited to words adjacent
           to the number, in places where you might expect them ("I found
           the-?PLURAL N empty-?PLURAL directory-?PLURAL"), but in some cases
           there are unexpected dependencies ("I found-?PLURAL ..."!) as well
           as long-distance dependencies ("The N directory-?PLURAL could not
           be deleted-?PLURAL"!).
1059
1060           Remind the translators to consider the case where N is 0: "0 files
1061           found" isn't exactly natural-sounding in any language, but it may
1062           be unacceptable in many -- or it may condition special kinds of
1063           agreement (similar to English "I didN'T find ANY files").
1064
1065           Remember to ask your translators about numeral formatting in their
1066           language, so that you can override the "numf" method as
1067           appropriate.  Typical variables in number formatting are:  what to
1068           use as a decimal point (comma? period?); what to use as a thousands
1069           separator (space? nonbreaking space? comma? period? small middot?
1070           prime? apostrophe?); and even whether the so-called "thousands
1071           separator" is actually for every third digit -- I've heard reports
1072           of two hundred thousand being expressible as "2,00,000" for some
1073           Indian (Subcontinental) languages, besides the less surprising
1074           "200 000", "200.000", "200,000", and "200'000".  Also, using a set
1075           of numeral glyphs other than the usual ASCII "0"-"9" might be
1076           appreciated, as via "tr/0-9/\x{0966}-\x{096F}/" for getting digits
1077           in Devanagari script (for Hindi, Konkani, others).
1078
1079           The basic "quant" method that Locale::Maketext provides should be
1080           good for many languages.  For some languages, it might be useful to
1081           modify it (or its constituent "numerate" method) to take a plural
1082           form in the two-argument call to "quant" (as in "[quant,_1,files]")
1083           if it's all-around easier to infer the singular form from the
1084           plural, than to infer the plural form from the singular.
1085
           But for other languages (as is discussed at length in
           Locale::Maketext::TPJ13), simple "quant"/"numerate" is not enough.
           For the particularly problematic Slavic languages, what you may
           need is a method which you provide with the number, the citation
           form of the noun to quantify, and the case and gender that the
           sentence's syntax projects onto that noun slot.  The method would
           then be responsible for determining what grammatical number that
           numeral projects onto its noun phrase, and what case and gender it
           may override the normal case and gender with; and then it would
           look up the noun in a lexicon providing all needed inflected forms.
1096
       ·   You may also wish to discuss with the translators the question of
           how to relate different subforms of the same language tag,
           considering how this interacts with "get_handle"'s treatment of
           these.  For example, if a user accepts interfaces in "en, fr", and
           you have interfaces available in "en-US" and "fr", what should they
           get?  You may wish to resolve this by establishing that "en" and
           "en-US" are effectively synonymous, by having one class zero-derive
           from the other.
1105
1106           For some languages this issue may never come up (Danish is rarely
1107           expressed as "da-DK", but instead is just "da").  And for other
1108           languages, the whole concept of a "generic" form may verge on being
1109           uselessly vague, particularly for interfaces involving voice media
1110           in forms of Arabic or Chinese.
1111
1112       ·   Once you've localized your program/site/etc. for all desired
1113           languages, be sure to show the result (whether live, or via
1114           screenshots) to the translators.  Once they approve, make every
1115           effort to have it then checked by at least one other speaker of
1116           that language.  This holds true even when (or especially when) the
1117           translation is done by one of your own programmers.  Some kinds of
1118           systems may be harder to find testers for than others, depending on
1119           the amount of domain-specific jargon and concepts involved -- it's
1120           easier to find people who can tell you whether they approve of your
1121           translation for "delete this message" in an email-via-Web
1122           interface, than to find people who can give you an informed opinion
1123           on your translation for "attribute value" in an XML query tool's
1124           interface.
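       The checklist above can be compressed into one self-contained sketch.
       Everything beyond the Projname::L10N and Projname::L10N::en_us names
       used in the text is an assumption made for illustration: the "en"
       class that zero-derives from "en_us", the "de" class with its sample
       translation, and the "numf" override that swaps commas and periods on
       the assumption that the target language groups thousands with a
       period.  In a real project each class would live in its own file.

```perl
use strict;
use warnings;

# The project class.
package Projname::L10N;
use parent 'Locale::Maketext';

# The first-language class: keys double as values via _AUTO.
package Projname::L10N::en_us;
use parent -norequire, 'Projname::L10N';
our %Lexicon = ( '_AUTO' => 1 );

# "en" zero-derives from "en_us", making the two tags synonymous.
package Projname::L10N::en;
use parent -norequire, 'Projname::L10N::en_us';

# A translated class (illustrative) with its own number formatting.
package Projname::L10N::de;
use parent -norequire, 'Projname::L10N';
our %Lexicon = (
  'Found [quant,_1,file].' => '[quant,_1,Datei,Dateien] gefunden.',
);

sub numf {
  my ($lh, $num) = @_;
  my $text = $lh->SUPER::numf($num);   # default formatting, e.g. "1,234"
  $text =~ tr/,./.,/;                  # swap separators, giving "1.234"
  return $text;
}

package main;

# Same key, same parameter, three different language tags.
my %out;
for my $tag ('en-US', 'en', 'de') {
  my $lh = Projname::L10N->get_handle($tag) or die "No handle for '$tag'";
  $out{$tag} = $lh->maketext('Found [quant,_1,file].', 1234);
  print "$tag: $out{$tag}\n";
}
```

       Because "quant" formats its number through "numf", overriding "numf"
       in the language class is enough to change how every quantified
       phrase in that language is rendered.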
1125

SEE ALSO

1127       I recommend reading all of these:
1128
       Locale::Maketext::TPJ13 -- my article in The Perl Journal about
       Maketext.  It explains many important concepts underlying
       Locale::Maketext's design, and offers some insight into why Maketext
       is better than the plain old approach of having message catalogs that
       are just databases of sprintf formats.
1134
1135       File::Findgrep is a sample application/module that uses
1136       Locale::Maketext to localize its messages.  For a larger
1137       internationalized system, see also Apache::MP3.
1138
1139       I18N::LangTags.
1140
1141       Win32::Locale.
1142
       RFC 3066, Tags for the Identification of Languages, is at
       http://sunsite.dk/RFC/rfc/rfc3066.html
1145
1146       RFC 2277, IETF Policy on Character Sets and Languages is at
1147       http://sunsite.dk/RFC/rfc/rfc2277.html -- much of it is just things of
1148       interest to protocol designers, but it explains some basic concepts,
1149       like the distinction between locales and language-tags.
1150
       The manual for GNU "gettext".  The gettext dist is available in
       "ftp://prep.ai.mit.edu/pub/gnu/" -- get a recent gettext tarball and
       look in its "doc/" directory; there's an easily browsable HTML version
       in there.  The gettext documentation asks lots of questions worth
       thinking about, even if some of their answers are sometimes wonky,
       particularly where they start talking about pluralization.
1157
       The Locale/Maketext.pm source.  Observe that the module is much shorter
       than its documentation!
1160

COPYRIGHT AND DISCLAIMER

1162       Copyright (c) 1999-2004 Sean M. Burke.  All rights reserved.
1163
1164       This library is free software; you can redistribute it and/or modify it
1165       under the same terms as Perl itself.
1166
1167       This program is distributed in the hope that it will be useful, but
1168       without any warranty; without even the implied warranty of
1169       merchantability or fitness for a particular purpose.
1170

AUTHOR

1172       Sean M. Burke "sburke@cpan.org"
1173
1174
1175
1176perl v5.10.1                      2009-02-12             Locale::Maketext(3pm)