Locale::Po4a::Xml(3pm)            Po4a Tools            Locale::Po4a::Xml(3pm)

NAME

       Locale::Po4a::Xml - convert XML documents and derivatives from/to PO
       files

DESCRIPTION

       The po4a (PO for anything) project goal is to ease translations (and
       more interestingly, the maintenance of translations) using gettext
       tools in areas where they were not expected, such as documentation.

       Locale::Po4a::Xml is a module to help the translation of XML documents
       into other [human] languages. It can also be used as a base to build
       modules for XML-based documents.
17

TRANSLATING WITH PO4A::XML

       This module can be used directly to handle generic XML documents.  It
       extracts the content of all tags, but no attributes, since that is
       where the text is written in most XML-based documents.

       There are some options (described in the next section) that can
       customize this behavior.  If this does not fit your document format,
       you are encouraged to write your own module derived from this one to
       describe your format's details.  See the section WRITING DERIVATIVE
       MODULES below for a description of the process.
28

OPTIONS ACCEPTED BY THIS MODULE

       The global debug option causes this module to show the excluded
       strings, so that you can check whether it skips something important.

       These are this module's particular options:

       nostrip
           Prevents the module from stripping the spaces around the extracted
           strings.

       wrap
           Canonicalizes the string to translate, considering that whitespace
           is not significant, and wraps the translated document. This option
           can be overridden by custom tag options. See the translated option
           below.

       unwrap_attributes
           Attributes are wrapped by default. This option disables wrapping.

       caseinsensitive
           Makes the search for tags and attributes case-insensitive.  If it
           is defined, <BooK>laNG and <BOOK>Lang are treated as <book>lang.
51
52       escapequotes
53           Escape quotes in output strings.  Necessary, for example, for
54           creating string resources for use by Android build tools.
55
56           See also:
57           https://developer.android.com/guide/topics/resources/string-resource.html
58
59       includeexternal
60           When defined, external entities are included in the generated
61           (translated) document, and for the extraction of strings.  If it's
62           not defined, you will have to translate external entities
63           separately as independent documents.
64
65       ontagerror
66           This option defines the behavior of the module when it encounters
67           invalid XML syntax (a closing tag which does not match the last
68           opening tag).  It can take the following values:
69
70           fail
71               This is the default value.  The module will exit with an error.
72
73           warn
74               The module will continue, and will issue a warning.
75
76           silent
77               The module will continue without any warnings.
78
79           Be careful when using this option.  It is generally recommended to
80           fix the input file.
81
82       tagsonly
83           Note: This option is deprecated.
84
85           Extracts only the specified tags in the "tags" option.  Otherwise,
86           it will extract all the tags except the ones specified.
87
       doctype
           String to match against the first line of the document's doctype
           (if defined). If it does not match, a warning indicates that the
           document might be of the wrong type.
92
93       addlang
94           String indicating the path (e.g. <bbb><aaa>) of a tag where a
95           lang="..." attribute shall be added. The language will be defined
96           as the basename of the PO file without any .po extension.
97
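           As a rough illustration of how the language name is derived for
           addlang, the following standalone sketch strips the directory and
           the .po extension from a PO file path.  The helper name
           lang_from_po and the example paths are hypothetical; this is not
           the module's own code.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: derive the value of the lang="..." attribute
# from a PO file path, as described for the addlang option.
sub lang_from_po {
    my ($po_path) = @_;
    # basename() drops the directory part and the given suffix.
    return basename($po_path, '.po');
}

print lang_from_po('po/fr.po'), "\n";
print lang_from_po('translations/pt_BR.po'), "\n";
```

           Under this reading, a PO file named po/fr.po would lead to
           lang="fr" being added to the tag given in addlang.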
98       optionalclosingtag
99           Boolean indicating whether closing tags are optional (as in HTML).
100           By default, missing closing tags raise an error handled according
101           to "ontagerror".
102
       tags
           Note: This option is deprecated.  You should use the translated and
           untranslated options instead.

           Space-separated list of tags you want to translate or skip.  By
           default, the specified tags will be excluded, but if you use the
           "tagsonly" option, the specified tags will be the only ones
           included.  The tags must be in the form <aaa>, but you can join
           some (<bbb><aaa>) to say that the content of the tag <aaa> will
           only be translated when it is inside a <bbb> tag.

           You can also specify some tag options by putting some characters in
           front of the tag hierarchy. For example, you can put 'w' (wrap) or
           'W' (don't wrap) to override the default behavior specified by the
           global "wrap" option.

           Example: W<chapter><title>

       attributes
           Space-separated list of tag attributes you want to translate.
           You can specify the attributes by their name (for example, "lang"),
           but you can prefix it with a tag hierarchy, to specify that this
           attribute will only be translated when it's in the specified tag.
           For example: <bbb><aaa>lang specifies that the lang attribute will
           only be translated if it is in an <aaa> tag that is itself inside a
           <bbb> tag.
129
130       foldattributes
131           Do not translate attributes in inline tags.  Instead, replace all
132           attributes of a tag by po4a-id=<id>.
133
134           This is useful when attributes shall not be translated, as this
135           simplifies the strings for translators, and avoids typos.
136
137       customtag
138           Space-separated list of tags which should not be treated as tags.
139           These tags are treated as inline, and do not need to be closed.
140
141       break
142           Space-separated list of tags which should break the sequence.  By
143           default, all tags break the sequence.
144
145           The tags must be in the form <aaa>, but you can join some
146           (<bbb><aaa>), if a tag (<aaa>) should only be considered when it's
147           within another tag (<bbb>).
148
           Please note that a tag should be listed in only one of the break,
           inline, placeholder, or customtag setting strings.
151
152       inline
153           Space-separated list of tags which should be treated as inline.  By
154           default, all tags break the sequence.
155
156           The tags must be in the form <aaa>, but you can join some
157           (<bbb><aaa>), if a tag (<aaa>) should only be considered when it's
158           within another tag (<bbb>).
159
160       placeholder
161           Space-separated list of tags which should be treated as
162           placeholders.  Placeholders do not break the sequence, but the
163           content of placeholders is translated separately.
164
165           The location of the placeholder in its block will be marked with a
166           string similar to:
167
168             <placeholder type=\"footnote\" id=\"0\"/>
169
170           The tags must be in the form <aaa>, but you can join some
171           (<bbb><aaa>), if a tag (<aaa>) should only be considered when it's
172           within another tag (<bbb>).
173
       break-pi
           By default, Processing Instructions (i.e., "<? ... ?>" tags) are
           handled as inline tags.  Pass this option if you want PIs to be
           handled as breaking tags.  Note that unprocessed PHP tags are
           handled as Processing Instructions by the parser.

       nodefault
           Space-separated list of tags that the module should not try to set
           by default in any category.

           If a subclass of this module gives a tag a default setting and you
           want to set an alternative one, you need to list that tag in the
           nodefault setting string.

       cpp Support C preprocessor directives.  When this option is set, po4a
           will consider preprocessor directives as paragraph separators.
           This is important if the XML file must be preprocessed, because
           otherwise the directives may be inserted in the middle of lines if
           po4a considers that they belong to the current paragraph, and they
           won't be recognized by the preprocessor.  Note: the preprocessor
           directives must only appear between tags (they must not break a
           tag).
195
196       translated
197           Space-separated list of tags you want to translate.
198
199           The tags must be in the form <aaa>, but you can join some
200           (<bbb><aaa>), if a tag (<aaa>) should only be considered when it's
201           within another tag (<bbb>).
202
           You can also specify some tag options by putting some characters in
           front of the tag hierarchy.  This overrides the default behavior
           specified by the global wrap and defaulttranslateoption options.

           w   Tags should be translated and content can be re-wrapped.

           W   Tags should be translated and content should not be re-wrapped.

           i   Tags should be translated inline.

           p   Tags should be translated as placeholders.

           Internally, the XML parser only cares about these four options: w W
           i p.

             * Tags listed in break are set to w or W depending on the wrap option.
             * Tags listed in inline are set to i.
             * Tags listed in placeholder are set to p.
             * Tags listed in untranslated are without any of these options set.

           You can verify the actual internal parameter behavior by invoking
           po4a with the --debug option.

           Example: W<chapter><title>

           Please note that a tag should be listed in only one of the
           translated or untranslated setting strings.
230
231       untranslated
232           Space-separated list of tags you do not want to translate.
233
234           The tags must be in the form <aaa>, but you can join some
235           (<bbb><aaa>), if a tag (<aaa>) should only be considered when it's
236           within another tag (<bbb>).
237
           Please note that a translatable inline tag in an untranslated tag
           is treated as a translatable breaking tag: the i setting is dropped
           and w or W is set depending on the wrap option.

       defaulttranslateoption
           The default categories for tags that are not in any of the
           translated, untranslated, break, inline, or placeholder lists.

           This is a set of letters as defined in translated, and this setting
           is only valid for translatable tags.
248

WRITING DERIVATIVE MODULES

   DEFINE WHAT TAGS AND ATTRIBUTES TO TRANSLATE
       The simplest customization is to define which tags and attributes you
       want the parser to translate.  This should be done in the initialize
       function.  First you should call the main initialize, to get the
       command-line options, and then append your custom definitions to the
       options hash.  If you want to handle new command-line options, you
       should define them before calling the main initialize:
257
         # Declare new options before calling the main initialize
         $self->{options}{'new_option'}='';
         $self->SUPER::initialize(%options);
         # Append this module's defaults to the options hash
         $self->{options}{'_default_translated'}.=' <p> <head><title>';
         $self->{options}{'attributes'}.=' <p>lang id';
         $self->{options}{'_default_inline'}.=' <br>';
         # Fill the internal structures from the options
         $self->treat_options;
264
       You should use the _default_inline, _default_break,
       _default_placeholder, _default_translated, _default_untranslated, and
       _default_attributes options in derivative modules. This allows users to
       override the default behavior defined in your module with command-line
       options.
270
271   OVERRIDE THE DEFAULT BEHAVIOR WITH COMMAND LINE OPTIONS
272       If you don't like the default behavior of this xml module and its
273       derivative modules, you can provide command line options to change
274       their behavior.
275
       See Locale::Po4a::Docbook(3pm).
277
   OVERRIDING THE found_string FUNCTION
       Another simple step is to override the function "found_string", which
       receives the extracted strings from the parser, in order to translate
       them.  There you can control which strings you want to translate, and
       perform transformations on them before or after the translation itself.

       It receives the extracted text, the reference to where it was found,
       and a hash that contains extra information to control which strings to
       translate, how to translate them, and how to generate the comment.

       The content of these options depends on the kind of string it is
       (specified in an entry of this hash):

       type="tag"
           The found string is the content of a translatable tag. The entry
           "tag_options" contains the option characters in front of the tag
           hierarchy in the module's "tags" option.

       type="attribute"
           The found string is the value of a translatable attribute. The
           entry "attribute" has the name of the attribute.

       It must return the text that will replace the original in the
       translated document. Here's a basic example of this function:
302
         sub found_string {
           my ($self,$text,$ref,$options)=@_;
           # Pass the string type along to translate()
           $text = $self->translate($text,$ref,"type ".$options->{'type'},
             'wrap'=>$self->{options}{'wrap'});
           return $text;
         }
309
310       There's another simple example in the new Dia module, which only
311       filters some strings.
312
   MODIFYING TAG TYPES (TODO)
       This is a more complex one, but it enables an (almost) total
       customization.  It's based on a list of hashes, each one defining a tag
       type's behavior. The list should be sorted so that the most general
       tags are after the most concrete ones (sorted first by the beginning
       and then by the end keys). To define a tag type you'll have to make a
       hash with the following keys:
320
321       beginning
322           Specifies the beginning of the tag, after the "<".
323
324       end Specifies the end of the tag, before the ">".
325
326       breaking
327           It says if this is a breaking tag class.  A non-breaking (inline)
328           tag is one that can be taken as part of the content of another tag.
329           It can take the values false (0), true (1) or undefined.  If you
330           leave this undefined, you'll have to define the f_breaking function
331           that will say whether a concrete tag of this class is a breaking
332           tag or not.
333
334       f_breaking
335           It's a function that will tell if the next tag is a breaking one or
336           not.  It should be defined if the breaking option is not.
337
       f_extract
           If you leave this key undefined, the generic extraction function
           will have to extract the tag itself.  It's useful for tags that can
           have other tags or special structures in them, so that the main
           parser does not get confused.  This function receives a boolean
           that says if the tag should be removed from the input stream or
           not.
344
345       f_translate
346           This function receives the tag (in the get_string_until() format)
347           and returns the translated tag (translated attributes or all needed
348           transformations) as a single string.
349
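       The keys above can be put together as in the following standalone
       sketch of a two-entry list: a concrete entry for XML comments, followed
       by a general catch-all entry.  The variable name, the handler names,
       and the stub bodies are hypothetical and only show the shape of the
       data; they are not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical stub handlers; a real module would extract and
# translate the tag here.
sub extract_comment_sketch   { my ($self, $remove) = @_; return (); }
sub translate_comment_sketch { my ($self, @tag) = @_; return 'comment'; }

# Most concrete entries first, most general entries last.
my @tag_types_sketch = (
    {   # XML comments: <!-- ... -->
        beginning   => '!--',
        end         => '--',
        breaking    => 0,
        f_extract   => \&extract_comment_sketch,
        f_translate => \&translate_comment_sketch,
    },
    {   # Catch-all for ordinary tags; no f_extract, so the generic
        # extraction function would be used.
        beginning   => '',
        end         => '',
        breaking    => undef,               # decided per tag instead
        f_breaking  => sub { return 1 },    # stub: always breaking
        f_translate => sub { return 'tag' },
    },
);

printf "%d tag types defined\n", scalar @tag_types_sketch;
```

       Leaving breaking undefined in the second entry is what makes the
       f_breaking key necessary there.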

INTERNAL FUNCTIONS used to write derivative parsers

351   WORKING WITH TAGS
352       get_path()
353           This function returns the path to the current tag from the
354           document's root, in the form <html><body><p>.
355
356           An additional array of tags (without brackets) can be passed as
357           argument.  These path elements are added to the end of the current
358           path.
359
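           The path format can be illustrated with a small standalone sketch.
           The function name path_sketch and the explicit array of open tags
           are assumptions made for the example; the real get_path() works on
           the parser's internal state.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get_path(): $open_tags plays the role of
# the parser's list of currently open tags, outermost first.
sub path_sketch {
    my ($open_tags, @extra) = @_;
    my $path = '';
    # Wrap each tag name in angle brackets and concatenate them.
    $path .= "<$_>" for @$open_tags, @extra;
    return $path;
}

my @open_tags = qw(html body);
print path_sketch(\@open_tags), "\n";        # <html><body>
print path_sketch(\@open_tags, 'p'), "\n";   # <html><body><p>
```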
360       tag_type()
361           This function returns the index from the tag_types list that fits
362           to the next tag in the input stream, or -1 if it's at the end of
363           the input file.
364
           Here, a tag starts with < and ends with >, and it can span
           multiple lines.

           This works on the array "@{$self->{TT}{doc_in}}", which holds the
           input document data and references, indirectly via
           "$self->shiftline()" and "$self->unshiftline($$)".
371
372       extract_tag($$)
373           This function returns the next tag from the input stream without
374           the beginning and end, in an array form, to maintain the references
375           from the input file.  It has two parameters: the type of the tag
376           (as returned by tag_type) and a boolean, that indicates if it
377           should be removed from the input stream.
378
           This works on the array "@{$self->{TT}{doc_in}}", which holds the
           input document data and references, indirectly via
           "$self->shiftline()" and "$self->unshiftline($$)".
382
383       get_tag_name(@)
384           This function returns the name of the tag passed as an argument, in
385           the array form returned by extract_tag.
386
387       breaking_tag()
388           This function returns a boolean that says if the next tag in the
389           input stream is a breaking tag or not (inline tag).  It leaves the
390           input stream intact.
391
392       treat_tag()
           This function translates the next tag from the input stream, using
           each tag type's custom translation functions.

           This works on the array "@{$self->{TT}{doc_in}}", which holds the
           input document data and references, indirectly via
           "$self->shiftline()" and "$self->unshiftline($$)".
399
400       tag_in_list($@)
401           This function returns a string value that says if the first
402           argument (a tag hierarchy) matches any of the tags from the second
403           argument (a list of tags or tag hierarchies). If it doesn't match,
404           it returns 0. Else, it returns the matched tag's options (the
405           characters in front of the tag) or 1 (if that tag doesn't have
406           options).
407
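           The return values described above can be sketched as follows.
           This is a standalone approximation, not the module's code: the
           function name, the plain list argument, and the rule that an entry
           matches when the path ends with its hierarchy are assumptions made
           for the example.

```perl
use strict;
use warnings;

# Sketch of the return contract: 0 when nothing matches, the option
# characters when the matching entry has some, and 1 otherwise.
sub tag_in_list_sketch {
    my ($path, @list) = @_;
    for my $entry (@list) {
        # Split e.g. "W<chapter><title>" into "W" and "<chapter><title>".
        my ($opts, $hierarchy) = $entry =~ /^([^<]*)(<.*>)$/ or next;
        # Assumed rule: the path must end with the entry's hierarchy.
        next unless $path =~ /\Q$hierarchy\E\z/;
        return length($opts) ? $opts : 1;
    }
    return 0;
}

my @list = ('W<chapter><title>', '<para>');
print tag_in_list_sketch('<book><chapter><title>', @list), "\n";
print tag_in_list_sketch('<book><para>', @list), "\n";
print tag_in_list_sketch('<book><note>', @list), "\n";
```

           With these inputs the three calls cover the three cases: an entry
           with options, an entry without options, and no match.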
408   WORKING WITH ATTRIBUTES
409       treat_attributes(@)
410           This function handles the translation of the tags' attributes. It
411           receives the tag without the beginning / end marks, and then it
412           finds the attributes, and it translates the translatable ones
413           (specified by the module option "attributes").  This returns a
414           plain string with the translated tag.
415
416   WORKING WITH TAGGED CONTENTS
417       treat_content()
           This function gets the text until the next breaking tag (not
           inline) from the input stream, and translates it using each tag
           type's custom translation functions.

           This works on the array "@{$self->{TT}{doc_in}}", which holds the
           input document data and references, indirectly via
           "$self->shiftline()" and "$self->unshiftline($$)".
425
426   WORKING WITH THE MODULE OPTIONS
427       treat_options()
428           This function fills the internal structures that contain the tags,
429           attributes and inline data with the options of the module
430           (specified in the command-line or in the initialize function).
431
432   GETTING TEXT FROM THE INPUT DOCUMENT
       get_string_until($%)
           This function returns an array with the lines (and references) from
           the input document until it finds the first argument.  The second
           argument is an options hash, in which the value 0 means disabled
           (the default) and 1 means enabled.

           The valid options are:

           include
               This makes the returned array contain the searched text

           remove
               This removes the returned stream from the input

           unquoted
               This ensures that the searched text is outside any quotes

           regex
               This denotes that the first argument is a regular expression
               rather than a plain string

       skip_spaces(\@)
           This function receives as argument the reference to a paragraph (in
           the format returned by get_string_until), skips its leading spaces
           and returns them as a simple string.
458
459       join_lines(@)
460           This function returns a simple string with the text from the
461           argument array (discarding the references).
462
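           A minimal standalone sketch of this behavior is shown below.  It
           assumes that the array alternates text and reference entries; that
           layout, the function name, and the sample references are
           assumptions made for the example rather than facts stated in this
           page.

```perl
use strict;
use warnings;

# Assumption: the array alternates text and reference entries, e.g.
# ("line 1\n", "doc.xml:1", "line 2\n", "doc.xml:2").
sub join_lines_sketch {
    my @paragraph = @_;
    my $text = '';
    # Keep the even-indexed entries (text), skip the references.
    for (my $i = 0; $i < @paragraph; $i += 2) {
        $text .= $paragraph[$i];
    }
    return $text;
}

my @paragraph = ("Hello \n", "doc.xml:3", "world\n", "doc.xml:4");
print join_lines_sketch(@paragraph);
```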

STATUS OF THIS MODULE

464       This module can translate tags and attributes.
465

TODO LIST

467       DOCTYPE (ENTITIES)
468
       There is minimal support for the translation of entities. They are
       translated as a whole, and tags are not taken into account. Multiline
       entities are not supported, and entities are always rewrapped during
       the translation.
473
474       MODIFY TAG TYPES FROM INHERITED MODULES (move the tag_types structure
475       inside the $self hash?)
476

SEE ALSO

478       Locale::Po4a::TransTractor(3pm), po4a(7)
479

AUTHORS

481        Jordi Vilalta <jvprat@gmail.com>
482        Nicolas François <nicolas.francois@centraliens.net>
483
485        Copyright © 2004 Jordi Vilalta  <jvprat@gmail.com>
486        Copyright © 2008-2009 Nicolas François <nicolas.francois@centraliens.net>
487
488       This program is free software; you may redistribute it and/or modify it
489       under the terms of GPL (see the COPYING file).
490
Po4a Tools                        2021-11-01            Locale::Po4a::Xml(3pm)