Locale::Maketext(3)   User Contributed Perl Documentation  Locale::Maketext(3)

NAME

       Locale::Maketext - framework for localization

SYNOPSIS

         package MyProgram;
         use strict;
         use MyProgram::L10N;
          # ...which inherits from Locale::Maketext
         my $lh = MyProgram::L10N->get_handle() || die "What language?";
         ...
         # And then any messages your program emits, like:
         warn $lh->maketext( "Can't open file [_1]: [_2]\n", $f, $! );
         ...

DESCRIPTION

       It is a common feature of applications (whether run directly, or via
       the Web) for them to be "localized" -- i.e., for them to present an
       English interface to an English-speaker, a German interface to a
       German-speaker, and so on for all languages they're programmed with.
       Locale::Maketext is a framework for software localization; it provides
       you with the tools for organizing and accessing the bits of text and
       text-processing code that you need for producing localized
       applications.
28
29       In order to make sense of Maketext and how all its components fit
30       together, you should probably go read Locale::Maketext::TPJ13, and then
31       read the following documentation.
32
33       You may also want to read over the source for "File::Findgrep" and its
34       constituent modules -- they are a complete (if small) example
35       application that uses Maketext.
36

QUICK OVERVIEW

38       The basic design of Locale::Maketext is object-oriented, and
39       Locale::Maketext is an abstract base class, from which you derive a
40       "project class".  The project class (with a name like
41       "TkBocciBall::Localize", which you then use in your module) is in turn
42       the base class for all the "language classes" for your project (with
43       names "TkBocciBall::Localize::it", "TkBocciBall::Localize::en",
44       "TkBocciBall::Localize::fr", etc.).
45
46       A language class is a class containing a lexicon of phrases as class
47       data, and possibly also some methods that are of use in interpreting
48       phrases in the lexicon, or otherwise dealing with text in that
49       language.
50
51       An object belonging to a language class is called a "language handle";
52       it's typically a flyweight object.
53
54       The normal course of action is to call:
55
         use TkBocciBall::Localize;  # the localization project class
         $lh = TkBocciBall::Localize->get_handle();
          # Depending on the user's locale, etc., this will
          # make a language handle from among the classes available,
          # and any defaults that you declare.
         die "Couldn't make a language handle??" unless $lh;
62
       From then on, you use the "maketext" method to access entries in
       whatever lexicon(s) belong to the language handle you got.  So, this:

         print $lh->maketext("You won!"), "\n";

       ...emits the right text for this language.  If the object in $lh
       belongs to class "TkBocciBall::Localize::fr" and
       %TkBocciBall::Localize::fr::Lexicon contains "("You won!"  => "Tu as
       gagne!")", then the above code happily tells the user "Tu as gagne!".
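
       The flow above can be sketched as a single runnable file.  This is
       only an illustration: the project and language classes are defined
       inline (normally each would live in its own .pm file), the English
       lexicon entry is made up for the example, and French is requested
       explicitly rather than detected from the environment.

```perl
use strict;
use warnings;

# Project class: derives from Locale::Maketext.
{
    package TkBocciBall::Localize;
    use parent 'Locale::Maketext';
}

# Language classes: each derives from the project class and carries
# its own %Lexicon.  Defined inline here only to keep the sketch short.
{
    package TkBocciBall::Localize::en;
    use parent -norequire, 'TkBocciBall::Localize';
    our %Lexicon = ( 'You won!' => 'You won!' );
}
{
    package TkBocciBall::Localize::fr;
    use parent -norequire, 'TkBocciBall::Localize';
    our %Lexicon = ( 'You won!' => 'Tu as gagne!' );
}

package main;

# Ask explicitly for French instead of relying on the environment.
my $lh = TkBocciBall::Localize->get_handle('fr')
    or die "Couldn't make a language handle??";

my $text = $lh->maketext('You won!');
print "$text\n";    # prints "Tu as gagne!"
```

       Because the language classes already exist in memory, "get_handle"
       should find them without loading any file from @INC.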
72

METHODS

74       Locale::Maketext offers a variety of methods, which fall into three
75       categories:
76
77       ·   Methods to do with constructing language handles.
78
79       ·   "maketext" and other methods to do with accessing %Lexicon data for
80           a given language handle.
81
82       ·   Methods that you may find it handy to use, from routines of yours
83           that you put in %Lexicon entries.
84
85       These are covered in the following section.
86
87   Construction Methods
88       These are to do with constructing a language handle:
89
       ·   $lh = YourProjClass->get_handle( ...langtags... ) || die "lg-handle?";

           This tries loading classes based on the language-tags you give
           (like "("en-US", "sk", "kon", "es-MX", "ja", "i-klingon")"), and
           for the first class that succeeds, returns
           YourProjClass::language->new().

           If it runs thru the entire given list of language-tags, and finds
           no classes for those exact terms, it then tries "superordinate"
           language classes.  So if no "en-US" class (i.e.,
           YourProjClass::en_us) was found, nor classes for anything else in
           that list, we then try its superordinate, "en" (i.e.,
           YourProjClass::en), and so on thru the other language-tags in the
           given list: "es".  (The other language-tags in our example list
           happen to have no superordinates.)
106
107           If none of those language-tags leads to loadable classes, we then
108           try classes derived from YourProjClass->fallback_languages() and
109           then if nothing comes of that, we use classes named by
110           YourProjClass->fallback_language_classes().  Then in the (probably
111           quite unlikely) event that that fails, we just return undef.
112
113       ·   $lh = YourProjClass->get_handle() || die "lg-handle?";
114
115           When "get_handle" is called with an empty parameter list, magic
116           happens:
117
           If "get_handle" senses that it's running in a program that was
           invoked as a CGI, then it tries to get language-tags out of the
           environment variable "HTTP_ACCEPT_LANGUAGE", and it pretends that
           those were the languages passed as parameters to "get_handle".
122
123           Otherwise (i.e., if not a CGI), this tries various OS-specific ways
124           to get the language-tags for the current locale/language, and then
125           pretends that those were the value(s) passed to "get_handle".
126
127           Currently this OS-specific stuff consists of looking in the
128           environment variables "LANG" and "LANGUAGE"; and on MSWin machines
129           (where those variables are typically unused), this also tries using
130           the module Win32::Locale to get a language-tag for whatever
131           language/locale is currently selected in the "Regional Settings"
132           (or "International"?)  Control Panel.  I welcome further
133           suggestions for making this do the Right Thing under other
134           operating systems that support localization.
135
136           If you're using localization in an application that keeps a
137           configuration file, you might consider something like this in your
138           project class:
139
             sub get_handle_via_config {
               my $class = $_[0];
               my $chosen_language = $Config_settings{'language'};
               my $lh;
               if($chosen_language) {
                 $lh = $class->get_handle($chosen_language)
                  || die "No language handle for \"$chosen_language\""
                       . " or the like";
               } else {
                 # Config file missing, maybe?
                 $lh = $class->get_handle()
                  || die "Can't get a language handle";
               }
               return $lh;
             }
155
156       ·   $lh = YourProjClass::langname->new();
157
158           This constructs a language handle.  You usually don't call this
159           directly, but instead let "get_handle" find a language class to
160           "use" and to then call ->new on.
161
162       ·   $lh->init();
163
164           This is called by ->new to initialize newly-constructed language
165           handles.  If you define an init method in your class, remember that
166           it's usually considered a good idea to call $lh->SUPER::init in it
167           (presumably at the beginning), so that all classes get a chance to
168           initialize a new object however they see fit.
169
170       ·   YourProjClass->fallback_languages()
171
172           "get_handle" appends the return value of this to the end of
173           whatever list of languages you pass "get_handle".  Unless you
174           override this method, your project class will inherit
175           Locale::Maketext's "fallback_languages", which currently returns
176           "('i-default', 'en', 'en-US')".  ("i-default" is defined in RFC
177           2277).
178
179           This method (by having it return the name of a language-tag that
180           has an existing language class) can be used for making sure that
181           "get_handle" will always manage to construct a language handle
182           (assuming your language classes are in an appropriate @INC
183           directory).  Or you can use the next method:
184
185       ·   YourProjClass->fallback_language_classes()
186
187           "get_handle" appends the return value of this to the end of the
188           list of classes it will try using.  Unless you override this
189           method, your project class will inherit Locale::Maketext's
190           "fallback_language_classes", which currently returns an empty list,
191           "()".  By setting this to some value (namely, the name of a
192           loadable language class), you can be sure that "get_handle" will
193           always manage to construct a language handle.
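
       The superordinate and fallback behavior described in this list can
       be sketched as follows.  The class names (MyApp::L10N and its
       language classes) are hypothetical, the classes are defined inline
       for brevity, and no English class is provided on purpose so that
       the overridden "fallback_languages" is what finally produces a
       handle.

```perl
use strict;
use warnings;

{
    package MyApp::L10N;            # hypothetical project class
    use parent 'Locale::Maketext';
    # Override the inherited ('i-default', 'en', 'en-US') default.
    sub fallback_languages { return ('fr') }
}
{
    package MyApp::L10N::es;
    use parent -norequire, 'MyApp::L10N';
    our %Lexicon = ( 'Hello' => 'Hola' );
}
{
    package MyApp::L10N::fr;
    use parent -norequire, 'MyApp::L10N';
    our %Lexicon = ( 'Hello' => 'Bonjour' );
}

package main;

# There is no MyApp::L10N::es_mx, so the superordinate "es" is used.
my $lh_es = MyApp::L10N->get_handle('es-MX') or die "lg-handle?";

# There is no class for "ja" (nor one for English), so get_handle ends
# up at the language named by fallback_languages().
my $lh_fb = MyApp::L10N->get_handle('ja') or die "lg-handle?";

print ref($lh_es), ": ", $lh_es->maketext('Hello'), "\n";
print ref($lh_fb), ": ", $lh_fb->maketext('Hello'), "\n";
```

       Overriding "fallback_language_classes" instead would work the same
       way, except that it returns full class names rather than
       language-tags.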
194
195   The "maketext" Method
196       This is the most important method in Locale::Maketext:
197
           $text = $lh->maketext(key, ...parameters for this phrase...);
199
200       This looks in the %Lexicon of the language handle $lh and all its
201       superclasses, looking for an entry whose key is the string key.
202       Assuming such an entry is found, various things then happen, depending
203       on the value found:
204
205       If the value is a scalarref, the scalar is dereferenced and returned
206       (and any parameters are ignored).
207
208       If the value is a coderef, we return &$value($lh, ...parameters...).
209
210       If the value is a string that doesn't look like it's in Bracket
211       Notation, we return it (after replacing it with a scalarref, in its
212       %Lexicon).
213
214       If the value does look like it's in Bracket Notation, then we compile
215       it into a sub, replace the string in the %Lexicon with the new coderef,
216       and then we return &$new_sub($lh, ...parameters...).
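
       The four kinds of lexicon value can be seen side by side in a small
       sketch.  The Demo::L10N classes and all of the keys are invented for
       illustration, and the classes are defined inline rather than in
       separate files.

```perl
use strict;
use warnings;

{
    package Demo::L10N;             # hypothetical project class
    use parent 'Locale::Maketext';
}
{
    package Demo::L10N::en;
    use parent -norequire, 'Demo::L10N';
    our %Lexicon = (
        # scalarref: dereferenced and returned; brackets are NOT interpreted
        'raw'   => \ 'Use [_1] for the first parameter.',
        # coderef: called with the handle followed by the parameters
        'count' => sub { my ($lh, @args) = @_; return scalar(@args) . ' given' },
        # plain string with no bracket groups
        'plain' => 'Nothing to interpolate here.',
        # string in Bracket Notation: compiled into a sub on first use
        'greet' => 'Hello, [_1]!',
    );
}

package main;

my $lh = Demo::L10N->get_handle('en') or die "lg-handle?";

my $raw   = $lh->maketext('raw', 'ignored');
my $count = $lh->maketext('count', 'a', 'b', 'c');
my $plain = $lh->maketext('plain');
my $greet = $lh->maketext('greet', 'World');

print "$_\n" for $raw, $count, $plain, $greet;

# After the first lookup, the Bracket Notation string has been replaced
# in the lexicon by the compiled coderef.
my $cached_type = ref $Demo::L10N::en::Lexicon{'greet'};
print "$cached_type\n";
```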
217
       Bracket Notation is discussed in a later section.  Note that trying to
       compile a Bracket Notation string into a sub can throw an exception if
       the string is not syntactically valid (say, by not balancing brackets
       right).
222
223       Also, calling &$coderef($lh, ...parameters...) can throw any sort of
224       exception (if, say, code in that sub tries to divide by zero).  But a
225       very common exception occurs when you have Bracket Notation text that
226       says to call a method "foo", but there is no such method.  (E.g., "You
227       have [quatn,_1,ball]." will throw an exception on trying to call
228       $lh->quatn($_[1],'ball') -- you presumably meant "quant".)  "maketext"
229       catches these exceptions, but only to make the error message more
230       readable, at which point it rethrows the exception.
231
232       An exception may be thrown if key is not found in any of $lh's %Lexicon
233       hashes.  What happens if a key is not found, is discussed in a later
234       section, "Controlling Lookup Failure".
235
236       Note that you might find it useful in some cases to override the
237       "maketext" method with an "after method", if you want to translate
238       encodings, or even scripts:
239
           package YrProj::zh_cn; # Chinese with PRC-style glyphs
           use base ('YrProj::zh_tw');  # Taiwan-style
           sub maketext {
             my $self = shift(@_);
             # Call the inherited method; a plain $self->maketext(@_)
             # here would recurse forever.
             my $value = $self->SUPER::maketext(@_);
             return Chineeze::taiwan2mainland($value);
           }
247
248       Or you may want to override it with something that traps any
249       exceptions, if that's critical to your program:
250
         sub maketext {
           my($lh, @stuff) = @_;
           my $out;
           eval { $out = $lh->SUPER::maketext(@stuff) };
           return $out unless $@;
           # ...otherwise deal with the exception...
         }
258
259       Other than those two situations, I don't imagine that it's useful to
260       override the "maketext" method.  (If you run into a situation where it
261       is useful, I'd be interested in hearing about it.)
262
263       $lh->fail_with or $lh->fail_with(PARAM)
264       $lh->failure_handler_auto
265           These two methods are discussed in the section "Controlling Lookup
266           Failure".
267
268   Utility Methods
269       These are methods that you may find it handy to use, generally from
270       %Lexicon routines of yours (whether expressed as Bracket Notation or
271       not).
272
273       $language->quant($number, $singular)
274       $language->quant($number, $singular, $plural)
275       $language->quant($number, $singular, $plural, $negative)
276           This is generally meant to be called from inside Bracket Notation
277           (which is discussed later), as in
278
279                "Your search matched [quant,_1,document]!"
280
281           It's for quantifying a noun (i.e., saying how much of it there is,
282           while giving the correct form of it).  The behavior of this method
283           is handy for English and a few other Western European languages,
284           and you should override it for languages where it's not suitable.
285           You can feel free to read the source, but the current
286           implementation is basically as this pseudocode describes:
287
288                if $number is 0 and there's a $negative,
289                   return $negative;
290                elsif $number is 1,
291                   return "1 $singular";
292                elsif there's a $plural,
293                   return "$number $plural";
294                else
295                   return "$number " . $singular . "s";
296                #
297                # ...except that we actually call numf to
298                #  stringify $number before returning it.
299
300           So for English (with Bracket Notation) "...[quant,_1,file]..." is
301           fine (for 0 it returns "0 files", for 1 it returns "1 file", and
302           for more it returns "2 files", etc.)
303
304           But for "directory", you'd want "[quant,_1,directory,directories]"
305           so that our elementary "quant" method doesn't think that the plural
306           of "directory" is "directorys".  And you might find that the output
307           may sound better if you specify a negative form, as in:
308
309                "[quant,_1,file,files,No files] matched your query.\n"
310
311           Remember to keep in mind verb agreement (or adjectives too, in
312           other languages), as in:
313
314                "[quant,_1,document] were matched.\n"
315
           Because if _1 is one, you get "1 document were matched".  An
           acceptable hack here is to do something like this (with no space
           after the comma, since items in a bracket group are not
           whitespace-trimmed):

                "[quant,_1,document was,documents were] matched.\n"
320
321       $language->numf($number)
322           This returns the given number formatted nicely according to this
323           language's conventions.  Maketext's default method is mostly to
324           just take the normal string form of the number (applying sprintf
325           "%G" for only very large numbers), and then to add commas as
326           necessary.  (Except that we apply "tr/,./.,/" if
327           $language->{'numf_comma'} is true; that's a bit of a hack that's
328           useful for languages that express two million as "2.000.000" and
329           not as "2,000,000").
330
331           If you want anything fancier, consider overriding this with
332           something that uses Number::Format, or does something else
333           entirely.
334
335           Note that numf is called by quant for stringifying all quantifying
336           numbers.
337
338       $language->numerate($number, $singular, $plural, $negative)
339           This returns the given noun form which is appropriate for the
340           quantity $number according to this language's conventions.
341           "numerate" is used internally by "quant" to quantify nouns.  Use it
342           directly -- usually from bracket notation -- to avoid "quant"'s
343           implicit call to "numf" and output of a numeric quantity.
344
       $language->sprintf($format, @items)
           This is just a wrapper around Perl's normal "sprintf" function.
           It's provided so that you can use "sprintf" in Bracket Notation:

                "Couldn't access datanode [sprintf,%10s=~[%s~],_1,_2]!\n"

           returning...

                Couldn't access datanode      Stuff=[thangamabob]!
354
355       $language->language_tag()
356           Currently this just takes the last bit of "ref($language)", turns
357           underscores to dashes, and returns it.  So if $language is an
358           object of class Hee::HOO::Haw::en_us, $language->language_tag()
359           returns "en-us".  (Yes, the usual representation for that language
360           tag is "en-US", but case is never considered meaningful in
361           language-tag comparison.)
362
363           You may override this as you like; Maketext doesn't use it for
364           anything.
365
       $language->encoding()
           Currently this isn't used for anything, but it's provided (with
           default value of "(ref($language) && $language->{'encoding'}) or
           "iso-8859-1"") as a sort of suggestion that it may be
           useful/necessary to associate encodings with your language handles
           (whether on a per-class or even per-handle basis).
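
       As a rough illustration of "quant", "numf" and "language_tag" as
       described above, here is a sketch using a hypothetical Util::L10N
       project with one inline language class; the lexicon keys are
       invented for the example.

```perl
use strict;
use warnings;

{
    package Util::L10N;             # hypothetical project class
    use parent 'Locale::Maketext';
}
{
    package Util::L10N::en_us;
    use parent -norequire, 'Util::L10N';
    our %Lexicon = (
        'files' => '[quant,_1,file] found.',
        'dirs'  => '[quant,_1,directory,directories] scanned.',
        'match' => '[quant,_1,file,files,No files] matched your query.',
    );
}

package main;

my $lh = Util::L10N->get_handle('en-US') or die "lg-handle?";

# Singular form only: the default plural just appends "s".
my @files = map { $lh->maketext('files', $_) } 0, 1, 2;

# Explicit plural, and an explicit form for zero.
my $dirs = $lh->maketext('dirs', 3);
my $none = $lh->maketext('match', 0);

# numf adds commas by default; numf_comma swaps "," and ".".
my $big = $lh->numf(2000000);
$lh->{'numf_comma'} = 1;
my $big_swapped = $lh->numf(2000000);

# Derived from the last component of the handle's class name.
my $tag = $lh->language_tag();

print "$_\n" for @files, $dirs, $none, $big, $big_swapped, $tag;
```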
372
373   Language Handle Attributes and Internals
374       A language handle is a flyweight object -- i.e., it doesn't
375       (necessarily) carry any data of interest, other than just being a
376       member of whatever class it belongs to.
377
378       A language handle is implemented as a blessed hash.  Subclasses of
379       yours can store whatever data you want in the hash.  Currently the only
380       hash entry used by any crucial Maketext method is "fail", so feel free
381       to use anything else as you like.
382
383       Remember: Don't be afraid to read the Maketext source if there's any
384       point on which this documentation is unclear.  This documentation is
385       vastly longer than the module source itself.
386

LANGUAGE CLASS HIERARCHIES

388       These are Locale::Maketext's assumptions about the class hierarchy
389       formed by all your language classes:
390
391       ·   You must have a project base class, which you load, and which you
392           then use as the first argument in the call to
393           YourProjClass->get_handle(...).  It should derive (whether directly
394           or indirectly) from Locale::Maketext.  It doesn't matter how you
395           name this class, although assuming this is the localization
396           component of your Super Mega Program, good names for your project
397           class might be SuperMegaProgram::Localization,
398           SuperMegaProgram::L10N, SuperMegaProgram::I18N,
399           SuperMegaProgram::International, or even
400           SuperMegaProgram::Languages or SuperMegaProgram::Messages.
401
402       ·   Language classes are what YourProjClass->get_handle will try to
403           load.  It will look for them by taking each language-tag (skipping
404           it if it doesn't look like a language-tag or locale-tag!), turning
405           it to all lowercase, turning dashes to underscores, and appending
406           it to YourProjClass . "::".  So this:
407
408             $lh = YourProjClass->get_handle(
409               'en-US', 'fr', 'kon', 'i-klingon', 'i-klingon-romanized'
410             );
411
412           will try loading the classes YourProjClass::en_us (note
413           lowercase!), YourProjClass::fr, YourProjClass::kon,
414           YourProjClass::i_klingon and YourProjClass::i_klingon_romanized.
415           (And it'll stop at the first one that actually loads.)
416
417       ·   I assume that each language class derives (directly or indirectly)
418           from your project class, and also defines its @ISA, its %Lexicon,
419           or both.  But I anticipate no dire consequences if these
420           assumptions do not hold.
421
       ·   Language classes may derive from other language classes (although
           they should have "use Thatclassname" or "use base
           qw(...classes...)").  They may derive from the project class.  They
           may derive from some other class altogether.  Or via multiple
           inheritance, they may derive from any mixture of these.
427
428       ·   I foresee no problems with having multiple inheritance in your
429           hierarchy of language classes.  (As usual, however, Perl will
430           complain bitterly if you have a cycle in the hierarchy: i.e., if
431           any class is its own ancestor.)
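
       The tag-to-classname mapping described in this list (lowercase,
       dashes to underscores, prefixed with the project class) can be
       sketched with a small helper.  The function name is made up, and
       this is not Locale::Maketext's internal code, which also filters
       and expands the tag list before trying to load anything.

```perl
use strict;
use warnings;

# Illustrative helper (not part of Locale::Maketext's API): maps a
# language-tag to the class name that get_handle would look for.
sub langtag_to_classname {
    my ($project_class, $tag) = @_;
    my $name = lc $tag;      # all lowercase
    $name =~ tr/-/_/;        # dashes to underscores
    return $project_class . '::' . $name;
}

my @classes = map { langtag_to_classname('YourProjClass', $_) }
    'en-US', 'fr', 'kon', 'i-klingon', 'i-klingon-romanized';

print "$_\n" for @classes;
```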
432

ENTRIES IN EACH LEXICON

       A typical %Lexicon entry is meant to signify a phrase, taking some
       number (0 or more) of parameters.  An entry is meant to be accessed
       via a string key in $lh->maketext(key, ...parameters...), which should
       return a string that is generally meant to be used for "output" to the
       user -- regardless of whether this actually means printing to STDOUT,
       writing to a file, or putting into a GUI widget.
440
441       While the key must be a string value (since that's a basic restriction
442       that Perl places on hash keys), the value in the lexicon can currently
443       be of several types: a defined scalar, scalarref, or coderef.  The use
444       of these is explained above, in the section 'The "maketext" Method',
445       and Bracket Notation for strings is discussed in the next section.
446
       While you can use arbitrary unique IDs for lexicon keys (like
       "_min_larger_max_error"), it is often useful if an entry's key is
       itself a valid value, like this example error message:
450
451         "Minimum ([_1]) is larger than maximum ([_2])!\n",
452
453       Compare this code that uses an arbitrary ID...
454
455         die $lh->maketext( "_min_larger_max_error", $min, $max )
456          if $min > $max;
457
458       ...to this code that uses a key-as-value:
459
460         die $lh->maketext(
461          "Minimum ([_1]) is larger than maximum ([_2])!\n",
462          $min, $max
463         ) if $min > $max;
464
465       The second is, in short, more readable.  In particular, it's obvious
466       that the number of parameters you're feeding to that phrase (two) is
467       the number of parameters that it wants to be fed.  (Since you see _1
468       and a _2 being used in the key there.)
469
       Also, once a project is otherwise complete and you start to localize
       it, you can scrape together all the various keys you use, and pass them
       to a translator; and then the translator's work will go faster if what
       he's presented is this:

        "Minimum ([_1]) is larger than maximum ([_2])!\n"
         => "",   # fill in something here, Jacques!

       rather than this more cryptic mess:

        "_min_larger_max_error"
         => "",   # fill in something here, Jacques

       I think that using valid values as keys makes the completed lexicon
       entries more readable:

        "Minimum ([_1]) is larger than maximum ([_2])!\n"
         => "Le minimum ([_1]) est plus grand que le maximum ([_2])!\n",
488
489       Also, having valid values as keys becomes very useful if you set up an
490       _AUTO lexicon.  _AUTO lexicons are discussed in a later section.
491
       I almost always use keys that are themselves valid lexicon values.  One
       notable exception is when the value is quite long.  For example, to get
       the screenful of data that a command-line program might return when
       given an unknown switch, I often just use a brief, self-explanatory key
       such as "_USAGE_MESSAGE".  At that point I then go and immediately
       define that lexicon entry in the ProjectClass::L10N::en lexicon (since
       English is always my "project language"):
499
500         '_USAGE_MESSAGE' => <<'EOSTUFF',
501         ...long long message...
502         EOSTUFF
503
504       and then I can use it as:
505
506         getopt('oDI', \%opts) or die $lh->maketext('_USAGE_MESSAGE');
507
       Incidentally, note that each class's %Lexicon inherits-and-extends the
       lexicons in its superclasses.  This is not because these are special
       hashes per se, but because you access them via the "maketext" method,
       which looks for entries across all the %Lexicon hashes in a language
       class and all its ancestor classes.  (This is because the idea of
       "class data" isn't directly implemented in Perl, but is instead left to
       individual class-systems to implement as they see fit.)
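
       A small sketch of this inherit-and-extend behavior, using a
       hypothetical Shop::L10N project in which a British English class
       derives from a general English class and overrides a single entry
       (all names and phrases are invented for the example):

```perl
use strict;
use warnings;

{
    package Shop::L10N;             # hypothetical project class
    use parent 'Locale::Maketext';
}
{
    package Shop::L10N::en;
    use parent -norequire, 'Shop::L10N';
    our %Lexicon = (
        'Color'   => 'Color',
        'Welcome' => 'Welcome',
    );
}
{
    package Shop::L10N::en_gb;
    use parent -norequire, 'Shop::L10N::en';
    # Only the entry that differs; everything else comes from ::en.
    our %Lexicon = ( 'Color' => 'Colour' );
}

package main;

my $lh = Shop::L10N->get_handle('en-GB') or die "lg-handle?";

my $overridden = $lh->maketext('Color');     # found in en_gb's %Lexicon
my $inherited  = $lh->maketext('Welcome');   # found in en's %Lexicon

print "$overridden, $inherited\n";
```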
515
516       Note that you may have things stored in a lexicon besides just phrases
517       for output:  for example, if your program takes input from the
518       keyboard, asking a "(Y/N)" question, you probably need to know what the
519       equivalent of "Y[es]/N[o]" is in whatever language.  You probably also
520       need to know what the equivalents of the answers "y" and "n" are.  You
521       can store that information in the lexicon (say, under the keys
522       "~answer_y" and "~answer_n", and the long forms as "~answer_yes" and
523       "~answer_no", where "~" is just an ad-hoc character meant to indicate
524       to programmers/translators that these are not phrases for output).
525
       Or instead of storing this in the language class's lexicon, you can
       (and, in some cases, really should) represent the same bit of knowledge
       as code in a method in the language class.  (That leaves a tidy
       distinction between the lexicon as the things we know how to say, and
       the rest of the things in the language class as things that we know how
       to do.)  Consider this example of a processor for responses to French
       "oui/non" questions:
533
534         sub y_or_n {
535           return undef unless defined $_[1] and length $_[1];
536           my $answer = lc $_[1];  # smash case
537           return 1 if $answer eq 'o' or $answer eq 'oui';
538           return 0 if $answer eq 'n' or $answer eq 'non';
539           return undef;
540         }
541
542       ...which you'd then call in a construct like this:
543
544         my $response;
545         until(defined $response) {
546           print $lh->maketext("Open the pod bay door (y/n)? ");
547           $response = $lh->y_or_n( get_input_from_keyboard_somehow() );
548         }
549         if($response) { $pod_bay_door->open()         }
550         else          { $pod_bay_door->leave_closed() }
551
       Other data worth storing in a lexicon might be things like filenames
       for language-targeted resources:
554
555         ...
556         "_main_splash_png"
557           => "/styles/en_us/main_splash.png",
558         "_main_splash_imagemap"
559           => "/styles/en_us/main_splash.incl",
560         "_general_graphics_path"
561           => "/styles/en_us/",
562         "_alert_sound"
563           => "/styles/en_us/hey_there.wav",
564         "_forward_icon"
565          => "left_arrow.png",
566         "_backward_icon"
567          => "right_arrow.png",
568         # In some other languages, left equals
569         #  BACKwards, and right is FOREwards.
570         ...
571
572       You might want to do the same thing for expressing key bindings or the
573       like (since hardwiring "q" as the binding for the function that quits a
574       screen/menu/program is useful only if your language happens to
575       associate "q" with "quit"!)
576

BRACKET NOTATION

578       Bracket Notation is a crucial feature of Locale::Maketext.  I mean
579       Bracket Notation to provide a replacement for the use of sprintf
580       formatting.  Everything you do with Bracket Notation could be done with
581       a sub block, but bracket notation is meant to be much more concise.
582
       Bracket Notation is like a miniature "template" system (in the sense
       of Text::Template, not in the sense of C++ templates), where normal
       text is passed thru basically as is, but text in special regions is
       specially interpreted.  In Bracket Notation, you use square brackets
       ("[...]"), not curly braces ("{...}") to note sections that are
       specially interpreted.
589
590       For example, here all the areas that are taken literally are underlined
591       with a "^", and all the in-bracket special regions are underlined with
592       an X:
593
594         "Minimum ([_1]) is larger than maximum ([_2])!\n",
595          ^^^^^^^^^ XX ^^^^^^^^^^^^^^^^^^^^^^^^^^ XX ^^^^
596
597       When that string is compiled from bracket notation into a real Perl
598       sub, it's basically turned into:
599
         sub {
           my $lh = $_[0];
           my @params = @_;
           return join '',
             "Minimum (",
             ...some code here...
             ") is larger than maximum (",
             ...some code here...
             ")!\n",
         }
         # to be called by $lh->maketext(KEY, params...)
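
       To make that concrete, here is a hand-written stand-in for such a
       sub, compared against what "maketext" returns for the same string.
       This is a sketch only: the Range::L10N classes are hypothetical,
       and the sub that Locale::Maketext actually generates need not look
       exactly like the hand-written one.

```perl
use strict;
use warnings;

{
    package Range::L10N;            # hypothetical project class
    use parent 'Locale::Maketext';
}
{
    package Range::L10N::en;
    use parent -norequire, 'Range::L10N';
    our %Lexicon = (
        "Minimum ([_1]) is larger than maximum ([_2])!\n"
            => "Minimum ([_1]) is larger than maximum ([_2])!\n",
    );
}

package main;

my $lh = Range::L10N->get_handle('en') or die "lg-handle?";

# Hand-written equivalent: literal text becomes string literals, and
# each [_N] group becomes a reference to $_[N].
my $by_hand = sub {
    return join '',
        "Minimum (", $_[1],
        ") is larger than maximum (", $_[2],
        ")!\n";
};

my $from_maketext = $lh->maketext(
    "Minimum ([_1]) is larger than maximum ([_2])!\n", 10, 5
);
my $from_hand = $by_hand->($lh, 10, 5);

print $from_maketext;
print $from_hand;
```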
611
612       In other words, text outside bracket groups is turned into string
613       literals.  Text in brackets is rather more complex, and currently
614       follows these rules:
615
       ·   Bracket groups that are empty, or which consist only of whitespace,
           are ignored.  (Examples: "[]", "[    ]", or a [ and a ] with
           returns and/or tabs and/or spaces between them.)

           Otherwise, each group is taken to be a comma-separated group of
           items, and each item is interpreted as follows:

       ·   An item that is "_digits" or "_-digits" is interpreted as
           $_[value].  I.e., "_1" becomes $_[1], and "_-3" is interpreted
           as $_[-3] (in which case @_ should have at least three elements in
           it).  Note that $_[0] is the language handle, and is typically not
           named directly.
628
629       ·   An item "_*" is interpreted to mean "all of @_ except $_[0]".
630           I.e., @_[1..$#_].  Note that this is an empty list in the case of
631           calls like $lh->maketext(key) where there are no parameters (except
632           $_[0], the language handle).
633
634       ·   Otherwise, each item is interpreted as a string literal.
635
636       The group as a whole is interpreted as follows:
637
638       ·   If the first item in a bracket group looks like a method name, then
639           that group is interpreted like this:
640
641             $lh->that_method_name(
642               ...rest of items in this group...
643             ),
644
645       ·   If the first item in a bracket group is "*", it's taken as
646           shorthand for the so commonly called "quant" method.  Similarly, if
647           the first item in a bracket group is "#", it's taken to be
648           shorthand for "numf".
649
650       ·   If the first item in a bracket group is the empty-string, or "_*"
651           or "_digits" or "_-digits", then that group is interpreted as just
652           the interpolation of all its items:
653
654             join('',
655               ...rest of items in this group...
656             ),
657
           Examples:  "[_1]" and "[,_1]", which are synonymous; and
           "[,ID-(,_4,-,_2,)]", which compiles as
           join "", "ID-(", $_[4], "-", $_[2], ")".
661
       ·   Otherwise this bracket group is invalid.  For example, in the group
           "[!@#,whatever]", the first item "!@#" is neither the empty-string,
           "_number", "_-number", "_*", nor a valid method name; and so
           Locale::Maketext will throw an exception if you try compiling an
           expression containing this bracket group.
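
       The rules above can be tried out with a short script.  This is only
       a minimal sketch: the class names Demo::Groups::L10N and
       Demo::Groups::L10N::en are hypothetical, and an _AUTO lexicon is
       assumed so that each key compiles itself on first use.

```perl
use strict;
use warnings;

# Hypothetical project class; only the inheritance matters here.
package Demo::Groups::L10N;
use parent 'Locale::Maketext';

# Hypothetical English class whose lexicon auto-compiles any key.
package Demo::Groups::L10N::en;
use parent -norequire, 'Demo::Groups::L10N';
our %Lexicon = ( '_AUTO' => 1 );

package main;

my $lh = Demo::Groups::L10N->get_handle('en')
  or die "Can't get a language handle";

# "_digits" and "_-digits" index into @_ ($_[0] is the handle itself).
my $first = $lh->maketext('First: [_1]', 'x', 'y', 'z');    # "First: x"
my $last  = $lh->maketext('Last: [_-1]', 'x', 'y', 'z');    # "Last: z"

# "_*" interpolates every parameter except the handle.
my $all = $lh->maketext('All: [_*]', 'x', 'y', 'z');        # "All: xyz"

# An empty first item means "just join the remaining items".
my $id = $lh->maketext('[,ID-(,_4,-,_2,)]', 'p', 'q', 'r', 's');  # "ID-(s-q)"

# "*" is shorthand for the quant method.
my $count = $lh->maketext('[*,_1,file]', 3);                # "3 files"

# A first item that is not a legal method name makes compilation die.
my $compiled_ok = eval { $lh->maketext('[!@#,whatever]'); 1 } ? 1 : 0;

print "$_\n" for $first, $last, $all, $id, $count;
print "invalid group compiled: $compiled_ok\n";
```

       The eval around the last call is there only so the script can report
       the failure instead of stopping at it.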
667
668       Note, incidentally, that items in each group are comma-separated, not
669       "/\s*,\s*/"-separated.  That is, you might expect that this bracket
670       group:
671
672         "Hoohah [foo, _1 , bar ,baz]!"
673
674       would compile to this:
675
676         sub {
677           my $lh = $_[0];
678           return join '',
679             "Hoohah ",
680             $lh->foo( $_[1], "bar", "baz"),
681             "!",
682         }
683
684       But it actually compiles as this:
685
686         sub {
687           my $lh = $_[0];
688           return join '',
689             "Hoohah ",
690             $lh->foo(" _1 ", " bar ", "baz"),  # note the <space> in " bar "
691             "!",
692         }
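
       To see the effect for yourself, a sketch along these lines may help.
       The method "show" and the Demo::Space class names are hypothetical;
       "show" simply wraps each argument it receives in angle brackets so
       that stray whitespace becomes visible.

```perl
use strict;
use warnings;

package Demo::Space::L10N;
use parent 'Locale::Maketext';

# Hypothetical helper: wraps every argument in <> so spaces are visible.
sub show {
    my ($lh, @items) = @_;
    return join '', map { "<$_>" } @items;
}

package Demo::Space::L10N::en;
use parent -norequire, 'Demo::Space::L10N';
our %Lexicon = ( '_AUTO' => 1 );

package main;

my $lh = Demo::Space::L10N->get_handle('en')
  or die "Can't get a language handle";

# The items are split on bare commas, so they are " _1 ", " bar " and "baz".
# " _1 " (with its spaces) is a string literal, not a reference to $_[1].
my $loose = $lh->maketext('Hoohah [show, _1 , bar ,baz]!', 'X');

# Without the extra whitespace, "_1" is replaced by the first parameter.
my $tight = $lh->maketext('Hoohah [show,_1,bar,baz]!', 'X');

print "$loose\n";   # Hoohah < _1 >< bar ><baz>!
print "$tight\n";   # Hoohah <X><bar><baz>!
```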
693
694       In the notation discussed so far, the characters "[" and "]" are given
695       special meaning, for opening and closing bracket groups, and "," has a
696       special meaning inside bracket groups, where it separates items in the
       group.  This raises the question of how you'd express a literal "[" or
698       "]" in a Bracket Notation string, and how you'd express a literal comma
699       inside a bracket group.  For this purpose I've adopted "~" (tilde) as
700       an escape character:  "~[" means a literal '[' character anywhere in
701       Bracket Notation (i.e., regardless of whether you're in a bracket group
702       or not), and ditto for "~]" meaning a literal ']', and "~," meaning a
703       literal comma.  (Altho "," means a literal comma outside of bracket
704       groups -- it's only inside bracket groups that commas are special.)
705
706       And on the off chance you need a literal tilde in a bracket expression,
707       you get it with "~~".
708
709       Currently, an unescaped "~" before a character other than a bracket or
710       a comma is taken to mean just a "~" and that character.  I.e., "~X"
711       means the same as "~~X" -- i.e., one literal tilde, and then one
712       literal "X".  However, by using "~X", you are assuming that no future
713       version of Maketext will use "~X" as a magic escape sequence.  In
714       practice this is not a great problem, since first off you can just
715       write "~~X" and not worry about it; second off, I doubt I'll add lots
716       of new magic characters to bracket notation; and third off, you aren't
717       likely to want literal "~" characters in your messages anyway, since
718       it's not a character with wide use in natural language text.
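
       A few of these escapes in action, as a hedged sketch.  The Demo::Tilde
       class names and the "show" method are hypothetical; "show" wraps each
       argument in angle brackets so that item boundaries are visible.

```perl
use strict;
use warnings;

package Demo::Tilde::L10N;
use parent 'Locale::Maketext';

# Hypothetical helper: wraps every argument in <> to show item boundaries.
sub show {
    my ($lh, @items) = @_;
    return join '', map { "<$_>" } @items;
}

package Demo::Tilde::L10N::en;
use parent -norequire, 'Demo::Tilde::L10N';
our %Lexicon = ( '_AUTO' => 1 );

package main;

my $lh = Demo::Tilde::L10N->get_handle('en')
  or die "Can't get a language handle";

# "~[" and "~]" are literal brackets, so the first "_1" is plain text.
my $brackets = $lh->maketext('Use ~[_1~] for [_1]', 'arg');

# "~," inside a group is a literal comma, not an item separator.
my $comma = $lh->maketext('[show,a~,b,c]');

# "~~" is a literal tilde.
my $tilde = $lh->maketext('Home: ~~[_1]', 'user');

print "$brackets\n";   # Use [_1] for arg
print "$comma\n";      # <a,b><c>
print "$tilde\n";      # Home: ~user
```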
719
720       Brackets must be balanced -- every openbracket must have one matching
721       closebracket, and vice versa.  So these are all invalid:
722
723         "I ate [quant,_1,rhubarb pie."
724         "I ate [quant,_1,rhubarb pie[."
725         "I ate quant,_1,rhubarb pie]."
726         "I ate quant,_1,rhubarb pie[."
727
728       Currently, bracket groups do not nest.  That is, you cannot say:
729
730         "Foo [bar,baz,[quux,quuux]]\n";
731
732       If you need a notation that's that powerful, use normal Perl:
733
734         %Lexicon = (
735           ...
736           "some_key" => sub {
737             my $lh = $_[0];
738             join '',
739               "Foo ",
740               $lh->bar('baz', $lh->quux('quuux')),
741               "\n",
742           },
743           ...
744         );
745
746       Or write the "bar" method so you don't need to pass it the output from
747       calling quux.
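
       As a runnable version of the coderef approach, here is a minimal
       sketch.  The "bar" and "quux" methods and the Demo::Nest class names
       are hypothetical stand-ins; each method just reports the arguments
       it was called with.

```perl
use strict;
use warnings;

package Demo::Nest::L10N;
use parent 'Locale::Maketext';

# Hypothetical methods that echo their arguments, to make nesting visible.
sub bar  { my ($lh, @items) = @_; return 'bar('  . join(',', @items) . ')' }
sub quux { my ($lh, @items) = @_; return 'quux(' . join(',', @items) . ')' }

package Demo::Nest::L10N::en;
use parent -norequire, 'Demo::Nest::L10N';
our %Lexicon = (
    # A lexicon value may be a coderef; maketext calls it with the handle
    # as $_[0], followed by any parameters.
    'some_key' => sub {
        my $lh = $_[0];
        return join '',
            'Foo ',
            $lh->bar('baz', $lh->quux('quuux')),
            "\n";
    },
);

package main;

my $lh = Demo::Nest::L10N->get_handle('en')
  or die "Can't get a language handle";

my $text = $lh->maketext('some_key');
print $text;   # Foo bar(baz,quux(quuux))
```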
748
749       I do not anticipate that you will need (or particularly want) to nest
750       bracket groups, but you are welcome to email me with convincing (real-
751       life) arguments to the contrary.
752

AUTO LEXICONS

754       If maketext goes to look in an individual %Lexicon for an entry for key
755       (where key does not start with an underscore), and sees none, but does
756       see an entry of "_AUTO" => some_true_value, then we actually define
757       $Lexicon{key} = key right then and there, and then use that value as if
758       it had been there all along.  This happens before we even look in any
759       superclass %Lexicons!
760
761       (This is meant to be somewhat like the AUTOLOAD mechanism in Perl's
762       function call system -- or, looked at another way, like the AutoLoader
763       module.)
764
765       I can picture all sorts of circumstances where you just do not want
766       lookup to be able to fail (since failing normally means that maketext
767       throws a "die", although see the next section for greater control over
768       that).  But here's one circumstance where _AUTO lexicons are meant to
769       be especially useful:
770
771       As you're writing an application, you decide as you go what messages
772       you need to emit.  Normally you'd go to write this:
773
774         if(-e $filename) {
775           go_process_file($filename)
776         } else {
777           print qq{Couldn't find file "$filename"!\n};
778         }
779
780       but since you anticipate localizing this, you write:
781
782         use ThisProject::I18N;
783         my $lh = ThisProject::I18N->get_handle();
784          # For the moment, assume that things are set up so
785          # that we load class ThisProject::I18N::en
786          # and that that's the class that $lh belongs to.
787         ...
788         if(-e $filename) {
789           go_process_file($filename)
790         } else {
791           print $lh->maketext(
792             qq{Couldn't find file "[_1]"!\n}, $filename
793           );
794         }
795
796       Now, right after you've just written the above lines, you'd normally
797       have to go open the file ThisProject/I18N/en.pm, and immediately add an
798       entry:
799
800         "Couldn't find file \"[_1]\"!\n"
801         => "Couldn't find file \"[_1]\"!\n",
802
803       But I consider that somewhat of a distraction from the work of getting
804       the main code working -- to say nothing of the fact that I often have
805       to play with the program a few times before I can decide exactly what
806       wording I want in the messages (which in this case would require me to
807       go changing three lines of code: the call to maketext with that key,
808       and then the two lines in ThisProject/I18N/en.pm).
809
       However, if you set "_AUTO => 1" in the %Lexicon in
811       ThisProject/I18N/en.pm (assuming that English (en) is the language that
812       all your programmers will be using for this project's internal message
813       keys), then you don't ever have to go adding lines like this
814
815         "Couldn't find file \"[_1]\"!\n"
816         => "Couldn't find file \"[_1]\"!\n",
817
818       to ThisProject/I18N/en.pm, because if _AUTO is true there, then just
819       looking for an entry with the key "Couldn't find file \"[_1]\"!\n" in
820       that lexicon will cause it to be added, with that value!
821
822       Note that the reason that keys that start with "_" are immune to _AUTO
823       isn't anything generally magical about the underscore character -- I
824       just wanted a way to have most lexicon keys be autoable, except for
825       possibly a few, and I arbitrarily decided to use a leading underscore
826       as a signal to distinguish those few.
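
       The following sketch shows an _AUTO lexicon filling itself in.  The
       Demo::Auto class names are hypothetical, and the classes are defined
       inline rather than in their own .pm files to keep the example
       self-contained.

```perl
use strict;
use warnings;

package Demo::Auto::L10N;
use parent 'Locale::Maketext';

package Demo::Auto::L10N::en;
use parent -norequire, 'Demo::Auto::L10N';
our %Lexicon = ( '_AUTO' => 1 );

package main;

my $lh = Demo::Auto::L10N->get_handle('en')
  or die "Can't get a language handle";

my $key = qq{Couldn't find file "[_1]"!\n};

# The key is not in the lexicon until the first lookup adds it.
my $known_before = exists $Demo::Auto::L10N::en::Lexicon{$key} ? 1 : 0;
my $message      = $lh->maketext($key, 'notes.txt');
my $known_after  = exists $Demo::Auto::L10N::en::Lexicon{$key} ? 1 : 0;

# Keys with a leading underscore are not autoable, so this lookup fails.
my $underscore_ok = eval { $lh->maketext('_hidden [_1]', 'x'); 1 } ? 1 : 0;

print $message;
print "in lexicon before: $known_before, after: $known_after\n";
print "underscore key found: $underscore_ok\n";
```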
827

READONLY LEXICONS

829       If your lexicon is a tied hash the simple act of caching the compiled
830       value can be fatal.
831
832       For example a GDBM_File GDBM_READER tied hash will die with something
833       like:
834
835          gdbm store returned -1, errno 2, key "..." at ...
836
837       All you need to do is turn on caching outside of the lexicon hash
838       itself like so:
839
840          sub init {
841              my ($lh) = @_;
842              ...
843              $lh->{'use_external_lex_cache'} = 1;
844              ...
845          }
846
       And then instead of storing the compiled value in the lexicon hash, it
       will store it in $lh->{'_external_lex_cache'}.
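
       A small sketch of the effect, using an ordinary (untied) hash so that
       it runs without GDBM.  The Demo::RO class names are hypothetical; the
       point is only that the lexicon entry stays a plain string while the
       compiled form lands in the handle.

```perl
use strict;
use warnings;

package Demo::RO::L10N;
use parent 'Locale::Maketext';

sub init {
    my ($lh) = @_;
    $lh->SUPER::init();
    # Keep compiled entries in the handle, not in %Lexicon.
    $lh->{'use_external_lex_cache'} = 1;
    return;
}

package Demo::RO::L10N::en;
use parent -norequire, 'Demo::RO::L10N';
our %Lexicon = ( 'Hello [_1]' => 'Hello, [_1]!' );

package main;

my $lh = Demo::RO::L10N->get_handle('en')
  or die "Can't get a language handle";

my $message = $lh->maketext('Hello [_1]', 'World');

# The lexicon value is still the uncompiled string...
my $lexicon_untouched = ref($Demo::RO::L10N::en::Lexicon{'Hello [_1]'}) ? 0 : 1;

# ...and the compiled sub is cached on the handle instead.
my $cached_externally =
    ref($lh->{'_external_lex_cache'}{'Hello [_1]'}) eq 'CODE' ? 1 : 0;

print "$message\n";   # Hello, World!
print "lexicon untouched: $lexicon_untouched\n";
print "cached externally: $cached_externally\n";
```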
849

CONTROLLING LOOKUP FAILURE

851       If you call $lh->maketext(key, ...parameters...), and there's no entry
852       key in $lh's class's %Lexicon, nor in the superclass %Lexicon hash, and
853       if we can't auto-make key (because either it starts with a "_", or
854       because none of its lexicons have "_AUTO => 1,"), then we have failed
855       to find a normal way to maketext key.  What then happens in these
856       failure conditions, depends on the $lh object's "fail" attribute.
857
       If the language handle has no "fail" attribute, maketext will simply
       throw an exception (i.e., it calls "die", mentioning the key whose
       lookup failed, and naming the line number where the calling
       $lh->maketext(key,...) was).
862
863       If the language handle has a "fail" attribute whose value is a coderef,
864       then $lh->maketext(key,...params...) gives up and calls:
865
866         return $that_subref->($lh, $key, @params);
867
868       Otherwise, the "fail" attribute's value should be a string denoting a
869       method name, so that $lh->maketext(key,...params...) can give up with:
870
         return $lh->$that_method_name($key, @params);
872
873       The "fail" attribute can be accessed with the "fail_with" method:
874
875         # Set to a coderef:
876         $lh->fail_with( \&failure_handler );
877
878         # Set to a method name:
879         $lh->fail_with( 'failure_method' );
880
881         # Set to nothing (i.e., so failure throws a plain exception)
882         $lh->fail_with( undef );
883
884         # Get the current value
885         $handler = $lh->fail_with();
886
887       Now, as to what you may want to do with these handlers:  Maybe you'd
888       want to log what key failed for what class, and then die.  Maybe you
889       don't like "die" and instead you want to send the error message to
890       STDOUT (or wherever) and then merely "exit()".
891
892       Or maybe you don't want to "die" at all!  Maybe you could use a handler
893       like this:
894
895         # Make all lookups fall back onto an English value,
896         #  but only after we log it for later fingerpointing.
         my $lh_backup = ThisProject->get_handle('en');
         open(LEX_FAIL_LOG, ">>wherever/lex.log") || die "GNAARGH $!";
         sub lex_fail {
           # Gather the remaining arguments into @params, since they
           # are passed on to the backup handle below.
           my($failing_lh, $key, @params) = @_;
           print LEX_FAIL_LOG scalar(localtime), "\t",
              ref($failing_lh), "\t", $key, "\n";
           return $lh_backup->maketext($key,@params);
         }
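
       A self-contained variation on that handler, as a sketch: it records
       failures in an array rather than a log file, and installs itself with
       fail_with.  The Demo::Fail class names and the French lexicon entry
       are hypothetical.

```perl
use strict;
use warnings;

package Demo::Fail::L10N;
use parent 'Locale::Maketext';

# English class: an _AUTO lexicon, so it can always produce something.
package Demo::Fail::L10N::en;
use parent -norequire, 'Demo::Fail::L10N';
our %Lexicon = ( '_AUTO' => 1 );

# French class: deliberately incomplete, with a single entry.
package Demo::Fail::L10N::fr;
use parent -norequire, 'Demo::Fail::L10N';
our %Lexicon = ( 'Hello [_1]' => 'Bonjour [_1]' );

package main;

my $lh_backup = Demo::Fail::L10N->get_handle('en') or die "No English handle";
my $lh        = Demo::Fail::L10N->get_handle('fr') or die "No French handle";

my @failed;   # one [class, key] pair per failed lookup

$lh->fail_with(sub {
    my ($failing_lh, $key, @params) = @_;
    push @failed, [ ref($failing_lh), $key ];
    return $lh_backup->maketext($key, @params);   # fall back to English
});

my $hit  = $lh->maketext('Hello [_1]', 'Ann');   # found in the French lexicon
my $miss = $lh->maketext('Bye [_1]',   'Ann');   # not found: handler runs

print "$hit\n";    # Bonjour Ann
print "$miss\n";   # Bye Ann
print "failed: $_->[0] / $_->[1]\n" for @failed;
```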
905
906       Some users have expressed that they think this whole mechanism of
907       having a "fail" attribute at all, seems a rather pointless
908       complication.  But I want Locale::Maketext to be usable for software
909       projects of any scale and type; and different software projects have
910       different ideas of what the right thing is to do in failure conditions.
911       I could simply say that failure always throws an exception, and that if
912       you want to be careful, you'll just have to wrap every call to
913       $lh->maketext in an eval { }.  However, I want programmers to reserve
914       the right (via the "fail" attribute) to treat lookup failure as
915       something other than an exception of the same level of severity as a
916       config file being unreadable, or some essential resource being
917       inaccessible.
918
919       One possibly useful value for the "fail" attribute is the method name
920       "failure_handler_auto".  This is a method defined in the class
921       Locale::Maketext itself.  You set it with:
922
923         $lh->fail_with('failure_handler_auto');
924
925       Then when you call $lh->maketext(key, ...parameters...) and there's no
926       key in any of those lexicons, maketext gives up with
927
928         return $lh->failure_handler_auto($key, @params);
929
930       But failure_handler_auto, instead of dying or anything, compiles $key,
931       caching it in
932
           $lh->{'failure_lex'}{$key} = $compiled
934
935       and then calls the compiled value, and returns that.  (I.e., if $key
936       looks like bracket notation, $compiled is a sub, and we return
937       &{$compiled}(@params); but if $key is just a plain string, we just
938       return that.)
939
       The effect of using "failure_handler_auto" is like an AUTO lexicon,
941       except that it 1) compiles $key even if it starts with "_", and 2) you
942       have a record in the new hashref $lh->{'failure_lex'} of all the keys
943       that have failed for this object.  This should avoid your program dying
944       -- as long as your keys aren't actually invalid as bracket code, and as
945       long as they don't try calling methods that don't exist.
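
       A brief sketch of this handler at work.  The Demo::FHA class names
       are hypothetical, and the English lexicon is left empty on purpose so
       that every lookup fails over to the handler.

```perl
use strict;
use warnings;

package Demo::FHA::L10N;
use parent 'Locale::Maketext';

package Demo::FHA::L10N::en;
use parent -norequire, 'Demo::FHA::L10N';
our %Lexicon = ();   # no entries and no _AUTO: every lookup fails

package main;

my $lh = Demo::FHA::L10N->get_handle('en')
  or die "Can't get a language handle";

$lh->fail_with('failure_handler_auto');

# Unlike an _AUTO lexicon, this also handles keys starting with "_".
my $message = $lh->maketext('_odd key [_1]', 'x');

# Every key that failed is recorded in the handle's failure_lex hash.
my @recorded = sort keys %{ $lh->{'failure_lex'} };

print "$message\n";                 # _odd key x
print "recorded: $_\n" for @recorded;
```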
946
       "failure_handler_auto" may not be exactly what you want, but I hope it
948       at least shows you that maketext failure can be mitigated in any number
949       of very flexible ways.  If you can formalize exactly what you want, you
950       should be able to express that as a failure handler.  You can even make
951       it default for every object of a given class, by setting it in that
952       class's init:
953
954         sub init {
955           my $lh = $_[0];  # a newborn handle
956           $lh->SUPER::init();
957           $lh->fail_with('my_clever_failure_handler');
958           return;
959         }
960         sub my_clever_failure_handler {
           ...your clever things here...
962         }
963

HOW TO USE MAKETEXT

965       Here is a brief checklist on how to use Maketext to localize
966       applications:
967
968       ·   Decide what system you'll use for lexicon keys.  If you insist, you
969           can use opaque IDs (if you're nostalgic for "catgets"), but I have
970           better suggestions in the section "Entries in Each Lexicon", above.
971           Assuming you opt for meaningful keys that double as values (like
972           "Minimum ([_1]) is larger than maximum ([_2])!\n"), you'll have to
973           settle on what language those should be in.  For the sake of
974           argument, I'll call this English, specifically American English,
975           "en-US".
976
977       ·   Create a class for your localization project.  This is the name of
978           the class that you'll use in the idiom:
979
980             use Projname::L10N;
981             my $lh = Projname::L10N->get_handle(...) || die "Language?";
982
983           Assuming you call your class Projname::L10N, create a class
984           consisting minimally of:
985
986             package Projname::L10N;
987             use base qw(Locale::Maketext);
988             ...any methods you might want all your languages to share...
989
             # And, assuming you want the base class to be an _AUTO lexicon,
             # as is discussed a few sections up:
             %Lexicon = (
               '_AUTO' => 1,
             );

             1;
994
995       ·   Create a class for the language your internal keys are in.  Name
996           the class after the language-tag for that language, in lowercase,
997           with dashes changed to underscores.  Assuming your project's first
998           language is US English, you should call this Projname::L10N::en_us.
999           It should consist minimally of:
1000
1001             package Projname::L10N::en_us;
1002             use base qw(Projname::L10N);
1003             %Lexicon = (
1004               '_AUTO' => 1,
1005             );
1006             1;
1007
           (For the rest of this section, I'll assume that this "first
           language class" of Projname::L10N::en_us has an _AUTO lexicon.)
1010
1011       ·   Go and write your program.  Everywhere in your program where you
1012           would say:
1013
1014             print "Foobar $thing stuff\n";
1015
1016           instead do it thru maketext, using no variable interpolation in the
1017           key:
1018
1019             print $lh->maketext("Foobar [_1] stuff\n", $thing);
1020
1021           If you get tired of constantly saying "print $lh->maketext",
1022           consider making a functional wrapper for it, like so:
1023
1024             use Projname::L10N;
1025             use vars qw($lh);
1026             $lh = Projname::L10N->get_handle(...) || die "Language?";
1027             sub pmt (@) { print( $lh->maketext(@_)) }
1028              # "pmt" is short for "Print MakeText"
1029             $Carp::Verbose = 1;
              # so if maketext fails, we see what made the call to pmt
1031
1032           Besides whole phrases meant for output, anything language-dependent
1033           should be put into the class Projname::L10N::en_us, whether as
1034           methods, or as lexicon entries -- this is discussed in the section
1035           "Entries in Each Lexicon", above.
1036
1037       ·   Once the program is otherwise done, and once its localization for
1038           the first language works right (via the data and methods in
1039           Projname::L10N::en_us), you can get together the data for
1040           translation.  If your first language lexicon isn't an _AUTO
1041           lexicon, then you already have all the messages explicitly in the
1042           lexicon (or else you'd be getting exceptions thrown when you call
1043           $lh->maketext to get messages that aren't in there).  But if you
1044           were (advisedly) lazy and are using an _AUTO lexicon, then you've
1045           got to make a list of all the phrases that you've so far been
1046           letting _AUTO generate for you.  There are very many ways to
1047           assemble such a list.  The most straightforward is to simply grep
1048           the source for every occurrence of "maketext" (or calls to wrappers
1049           around it, like the above "pmt" function), and to log the following
1050           phrase.
1051
       ·   You may at this point want to consider whether your base class
           (Projname::L10N), from which all lexicons inherit
           (Projname::L10N::en, Projname::L10N::es, etc.), should be an _AUTO
1055           lexicon.  It may be true that in theory, all needed messages will
1056           be in each language class; but in the presumably unlikely or
1057           "impossible" case of lookup failure, you should consider whether
1058           your program should throw an exception, emit text in English (or
1059           whatever your project's first language is), or some more complex
1060           solution as described in the section "Controlling Lookup Failure",
1061           above.
1062
1063       ·   Submit all messages/phrases/etc. to translators.
1064
1065           (You may, in fact, want to start with localizing to one other
1066           language at first, if you're not sure that you've properly
1067           abstracted the language-dependent parts of your code.)
1068
1069           Translators may request clarification of the situation in which a
1070           particular phrase is found.  For example, in English we are
1071           entirely happy saying "n files found", regardless of whether we
1072           mean "I looked for files, and found n of them" or the rather
1073           distinct situation of "I looked for something else (like lines in
1074           files), and along the way I saw n files."  This may involve
1075           rethinking things that you thought quite clear: should "Edit" on a
1076           toolbar be a noun ("editing") or a verb ("to edit")?  Is there
1077           already a conventionalized way to express that menu option,
1078           separate from the target language's normal word for "to edit"?
1079
1080           In all cases where the very common phenomenon of quantification
1081           (saying "N files", for any value of N) is involved, each translator
1082           should make clear what dependencies the number causes in the
1083           sentence.  In many cases, dependency is limited to words adjacent
1084           to the number, in places where you might expect them ("I found
1085           the-?PLURAL N empty-?PLURAL directory-?PLURAL"), but in some cases
           there are unexpected dependencies ("I found-?PLURAL ..."!) as well
           as long-distance dependencies ("The N directory-?PLURAL could not
           be deleted-?PLURAL"!).
1089
1090           Remind the translators to consider the case where N is 0: "0 files
1091           found" isn't exactly natural-sounding in any language, but it may
1092           be unacceptable in many -- or it may condition special kinds of
1093           agreement (similar to English "I didN'T find ANY files").
1094
1095           Remember to ask your translators about numeral formatting in their
1096           language, so that you can override the "numf" method as
1097           appropriate.  Typical variables in number formatting are:  what to
1098           use as a decimal point (comma? period?); what to use as a thousands
1099           separator (space? nonbreaking space? comma? period? small middot?
1100           prime? apostrophe?); and even whether the so-called "thousands
1101           separator" is actually for every third digit -- I've heard reports
1102           of two hundred thousand being expressible as "2,00,000" for some
1103           Indian (Subcontinental) languages, besides the less surprising
1104           "200 000", "200.000", "200,000", and "200'000".  Also, using a set
1105           of numeral glyphs other than the usual ASCII "0"-"9" might be
1106           appreciated, as via "tr/0-9/\x{0966}-\x{096F}/" for getting digits
1107           in Devanagari script (for Hindi, Konkani, others).
1108
1109           The basic "quant" method that Locale::Maketext provides should be
1110           good for many languages.  For some languages, it might be useful to
1111           modify it (or its constituent "numerate" method) to take a plural
1112           form in the two-argument call to "quant" (as in "[quant,_1,files]")
1113           if it's all-around easier to infer the singular form from the
1114           plural, than to infer the plural form from the singular.
1115
1116           But for other languages (as is discussed at length in
1117           Locale::Maketext::TPJ13), simple "quant"/"numf" is not enough.  For
1118           the particularly problematic Slavic languages, what you may need is
1119           a method which you provide with the number, the citation form of
1120           the noun to quantify, and the case and gender that the sentence's
1121           syntax projects onto that noun slot.  The method would then be
1122           responsible for determining what grammatical number that numeral
1123           projects onto its noun phrase, and what case and gender it may
1124           override the normal case and gender with; and then it would look up
1125           the noun in a lexicon providing all needed inflected forms.
1126
1127       ·   You may also wish to discuss with the translators the question of
1128           how to relate different subforms of the same language tag,
           considering how this interacts with "get_handle"'s treatment of these.
1130           For example, if a user accepts interfaces in "en, fr", and you have
1131           interfaces available in "en-US" and "fr", what should they get?
1132           You may wish to resolve this by establishing that "en" and "en-US"
1133           are effectively synonymous, by having one class zero-derive from
1134           the other.
1135
1136           For some languages this issue may never come up (Danish is rarely
1137           expressed as "da-DK", but instead is just "da").  And for other
1138           languages, the whole concept of a "generic" form may verge on being
1139           uselessly vague, particularly for interfaces involving voice media
1140           in forms of Arabic or Chinese.
1141
1142       ·   Once you've localized your program/site/etc. for all desired
1143           languages, be sure to show the result (whether live, or via
1144           screenshots) to the translators.  Once they approve, make every
1145           effort to have it then checked by at least one other speaker of
1146           that language.  This holds true even when (or especially when) the
1147           translation is done by one of your own programmers.  Some kinds of
1148           systems may be harder to find testers for than others, depending on
1149           the amount of domain-specific jargon and concepts involved -- it's
1150           easier to find people who can tell you whether they approve of your
1151           translation for "delete this message" in an email-via-Web
1152           interface, than to find people who can give you an informed opinion
1153           on your translation for "attribute value" in an XML query tool's
1154           interface.
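
       Pulling the checklist together, here is a compact sketch of such a
       project in a single file.  The Projname class names follow the ones
       used above; the Spanish lexicon entry is purely illustrative, and in
       a real project each package would live in its own .pm file.

```perl
use strict;
use warnings;

# Project base class.
package Projname::L10N;
use parent 'Locale::Maketext';

# First-language class: keys double as values via _AUTO.
package Projname::L10N::en_us;
use parent -norequire, 'Projname::L10N';
our %Lexicon = ( '_AUTO' => 1 );

# "en" zero-derives from "en_us", making the two effectively synonymous.
package Projname::L10N::en;
use parent -norequire, 'Projname::L10N::en_us';

# A second language; this translation is only an illustration.
package Projname::L10N::es;
use parent -norequire, 'Projname::L10N';
our %Lexicon = (
    'I found [quant,_1,file].' => 'Hay [quant,_1,archivo].',
);

package main;

my %output;
for my $tag ('en-US', 'en', 'es') {
    # Each tag maps to a class: en-US => en_us, en => en, es => es.
    my $lh = Projname::L10N->get_handle($tag) or die "Language?";
    $output{$tag} = $lh->maketext('I found [quant,_1,file].', 3);
}

print "$_: $output{$_}\n" for sort keys %output;
```

       Here the default "quant" happens to produce an acceptable plural in
       both languages; for languages where it does not, the language class
       would override "quant" or "numerate" as discussed above.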
1155

SEE ALSO

1157       I recommend reading all of these:
1158
       Locale::Maketext::TPJ13 -- my The Perl Journal article about Maketext.
       It explains many important concepts underlying Locale::Maketext's
       design, and gives some insight into why Maketext is better than the
       plain old approach of having message catalogs that are just databases
       of sprintf formats.
1164
1165       File::Findgrep is a sample application/module that uses
1166       Locale::Maketext to localize its messages.  For a larger
1167       internationalized system, see also Apache::MP3.
1168
1169       I18N::LangTags.
1170
1171       Win32::Locale.
1172
       RFC 3066, Tags for the Identification of Languages, is at
1174       http://sunsite.dk/RFC/rfc/rfc3066.html
1175
1176       RFC 2277, IETF Policy on Character Sets and Languages is at
1177       http://sunsite.dk/RFC/rfc/rfc2277.html -- much of it is just things of
1178       interest to protocol designers, but it explains some basic concepts,
1179       like the distinction between locales and language-tags.
1180
1181       The manual for GNU "gettext".  The gettext dist is available in
1182       "ftp://prep.ai.mit.edu/pub/gnu/" -- get a recent gettext tarball and
1183       look in its "doc/" directory, there's an easily browsable HTML version
1184       in there.  The gettext documentation asks lots of questions worth
1185       thinking about, even if some of their answers are sometimes wonky,
1186       particularly where they start talking about pluralization.
1187
       The Locale/Maketext.pm source.  Observe that the module is much shorter
1189       than its documentation!

COPYRIGHT AND DISCLAIMER

1192       Copyright (c) 1999-2004 Sean M. Burke.  All rights reserved.
1193
1194       This library is free software; you can redistribute it and/or modify it
1195       under the same terms as Perl itself.
1196
1197       This program is distributed in the hope that it will be useful, but
1198       without any warranty; without even the implied warranty of
1199       merchantability or fitness for a particular purpose.
1200

AUTHOR

1202       Sean M. Burke "sburke@cpan.org"
1203
1204
1205
perl v5.16.3                      2012-11-27               Locale::Maketext(3)