Locale::Po4a::Xml(3pm)            Po4a Tools            Locale::Po4a::Xml(3pm)

NAME

       Locale::Po4a::Xml - convert XML documents and derivatives from/to PO
       files

DESCRIPTION

       The goal of the po4a (PO for anything) project is to ease translations
       (and, more interestingly, the maintenance of translations) using
       gettext tools in areas where they were not expected, such as
       documentation.

       Locale::Po4a::Xml is a module to help the translation of XML documents
       into other [human] languages. It can also be used as a base to build
       modules for XML-based documents.

TRANSLATING WITH PO4A::XML

       This module can be used directly to handle generic XML documents.  By
       default, it extracts the content of all tags and no attributes, since
       that is where the text is written in most XML-based documents.

       Some options (described in the next section) can customize this
       behavior.  If this doesn't fit your document format, you're encouraged
       to write your own module derived from this one to describe your
       format's details.  See the section WRITING DERIVATIVE MODULES below for
       a description of the process.

OPTIONS ACCEPTED BY THIS MODULE

       The global debug option causes this module to show the excluded
       strings, so that you can check whether it skips something important.

       These are this module's particular options:

       nostrip
           Prevents the module from stripping the spaces around the extracted
           strings.

       wrap
           Canonicalizes the string to translate, considering that whitespace
           is not significant, and wraps the translated document. This option
           can be overridden by custom tag options. See the translated option
           below.

       unwrap_attributes
           Attributes are wrapped by default. This option disables wrapping.

       caseinsensitive
           Makes the search for tags and attributes case-insensitive.  If
           it's defined, <BooK>laNG and <BOOK>Lang are both treated as
           <book>lang.

       escapequotes
           Escape quotes in output strings.  Necessary, for example, for
           creating string resources for use by Android build tools.

           See also:
           https://developer.android.com/guide/topics/resources/string-resource.html

       includeexternal
           When defined, external entities are included in the generated
           (translated) document, and for the extraction of strings.  If it's
           not defined, you will have to translate external entities
           separately as independent documents.

       ontagerror
           This option defines the behavior of the module when it encounters
           invalid XML syntax (a closing tag which does not match the last
           opening tag, or a tag's attribute without value).  It can take the
           following values:

           fail
               This is the default value.  The module will exit with an error.

           warn
               The module will continue, and will issue a warning.

           silent
               The module will continue without any warnings.

           Be careful when using this option.  It is generally recommended to
           fix the input file.

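           The three values can be pictured with a small standalone sketch.
           It is not the module's parser: check_tags is a hypothetical helper
           that tracks open tag names with a simplified regular expression
           and reacts to a mismatched closing tag according to the mode.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Locale::Po4a::Xml: mimics the documented
# fail / warn / silent reactions to a closing tag that does not match.
sub check_tags {
    my ($xml, $mode) = @_;
    my (@stack, @warnings);
    # Simplified tag matcher: optional "/", a name, anything, optional "/".
    while ($xml =~ m{<(/?)([A-Za-z][\w:.-]*)[^>]*?(/?)>}g) {
        my ($closing, $name, $empty) = ($1, $2, $3);
        next if $empty;                              # <br/> opens nothing
        if (!$closing) { push @stack, $name; next; }
        my $open = pop @stack;
        next if defined $open && $open eq $name;     # well nested so far
        my $msg = "closing tag </$name> does not match <"
                . (defined $open ? $open : '') . ">";
        die "$msg\n" if $mode eq 'fail';             # default: stop here
        push @warnings, $msg if $mode eq 'warn';     # continue, but report
    }                                                # 'silent': just continue
    return @warnings;
}

# Usage: collect the warnings for a document with crossed closing tags.
my @w = check_tags('<a><b>text</a></b>', 'warn');
print "$_\n" for @w;
```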
       tagsonly
           Note: This option is deprecated.

           Extracts only the tags specified in the "tags" option.  Otherwise,
           it will extract all the tags except the ones specified.

       doctype
           String that is matched against the first line of the document's
           doctype (if defined). If it does not match, a warning indicates
           that the document might be of the wrong type.

       addlang
           String indicating the path (e.g. <bbb><aaa>) of a tag where a
           lang="..." attribute shall be added. The language will be defined
           as the basename of the PO file without any .po extension.

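           As an illustration of the addlang naming rule, here is a minimal
           sketch. The helpers lang_from_po and add_lang are hypothetical
           names, and the module's own implementation may differ.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: language code = basename of the PO file without ".po".
sub lang_from_po {
    my ($po_file) = @_;
    return basename($po_file, '.po');
}

# Hypothetical helper: append lang="..." to an opening tag that is given
# without its angle brackets.
sub add_lang {
    my ($tag, $lang) = @_;
    return $tag . ' lang="' . $lang . '"';
}

# Usage: a PO file named po/pt_BR.po yields the language "pt_BR".
print "<", add_lang('book id="guide"', lang_from_po('po/pt_BR.po')), ">\n";
```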
       tags
           Note: This option is deprecated.  You should use the translated and
           untranslated options instead.

           Space-separated list of tags you want to translate or skip.  By
           default, the specified tags will be excluded, but if you use the
           "tagsonly" option, the specified tags will be the only ones
           included.  The tags must be in the form <aaa>, but you can join
           some (<bbb><aaa>) to say that the content of the tag <aaa> will
           only be translated when it's inside a <bbb> tag.

           You can also specify some tag options by putting some characters in
           front of the tag hierarchy. For example, you can put 'w' (wrap) or
           'W' (don't wrap) to override the default behavior specified by the
           global "wrap" option.

           Example: W<chapter><title>

       attributes
           Space-separated list of tag attributes you want to translate.  You
           can specify the attributes by their name (for example, "lang"),
           but you can prefix it with a tag hierarchy, to specify that this
           attribute will only be translated when it's in the specified tag.
           For example: <bbb><aaa>lang specifies that the lang attribute will
           only be translated if it's in an <aaa> tag that is itself inside a
           <bbb> tag.

       foldattributes
           Do not translate attributes in inline tags.  Instead, replace all
           attributes of a tag by po4a-id=<id>.

           This is useful when attributes shall not be translated, as this
           simplifies the strings for translators, and avoids typos.

       customtag
           Space-separated list of tags which should not be treated as tags.
           These tags are treated as inline, and do not need to be closed.

       break
           Space-separated list of tags which should break the sequence.  By
           default, all tags break the sequence.

           The tags must be in the form <aaa>, but you can join some
           (<bbb><aaa>), if a tag (<aaa>) should only be considered when it's
           within another tag (<bbb>).

           Please note that a tag should be listed in only one of the break,
           inline, placeholder, or customtag setting strings.

       inline
           Space-separated list of tags which should be treated as inline.  By
           default, all tags break the sequence.

           The tags must be in the form <aaa>, but you can join some
           (<bbb><aaa>), if a tag (<aaa>) should only be considered when it's
           within another tag (<bbb>).

       placeholder
           Space-separated list of tags which should be treated as
           placeholders.  Placeholders do not break the sequence, but the
           content of placeholders is translated separately.

           The location of the placeholder in its block will be marked with a
           string similar to:

             <placeholder type=\"footnote\" id=\"0\"/>

           The tags must be in the form <aaa>, but you can join some
           (<bbb><aaa>), if a tag (<aaa>) should only be considered when it's
           within another tag (<bbb>).

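           The effect on the extracted strings can be sketched as follows.
           split_placeholders is a hypothetical helper, not the module's
           code: it handles only non-nested tags and emits the marker with
           plain quotes, as it would look before being escaped in a PO file.

```perl
use strict;
use warnings;

# Hypothetical helper: move the content of each <$tag> element out of the
# paragraph and leave a numbered marker behind (nesting is not handled).
sub split_placeholders {
    my ($text, $tag) = @_;
    my @extracted;
    $text =~ s{<\Q$tag\E\b[^>]*>(.*?)</\Q$tag\E>}{
        push @extracted, $1;
        sprintf '<placeholder type="%s" id="%d"/>', $tag, $#extracted;
    }gse;
    return ($text, @extracted);
}

# Usage: the paragraph and the footnote become two separate strings.
my ($main, @parts) = split_placeholders(
    'See the manual<footnote>Available online.</footnote> for details.',
    'footnote');
print "$main\n";
print "$_\n" for @parts;
```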
       nodefault
           Space-separated list of tags that the module should not try to set
           by default in any category.

           If a tag gets its default setting from a subclass of this module
           but you want to give it a different setting, you need to list that
           tag in the nodefault setting string.

       cpp Support C preprocessor directives.  When this option is set, po4a
           will consider preprocessor directives as paragraph separators.
           This is important if the XML file must be preprocessed because
           otherwise the directives may be inserted in the middle of lines if
           po4a considers that they belong to the current paragraph, and they
           won't be recognized by the preprocessor.  Note: the preprocessor
           directives must only appear between tags (they must not break a
           tag).

       translated
           Space-separated list of tags you want to translate.

           The tags must be in the form <aaa>, but you can join some
           (<bbb><aaa>), if a tag (<aaa>) should only be considered when it's
           within another tag (<bbb>).

           You can also specify some tag options by putting some characters in
           front of the tag hierarchy.  This overrides the default behavior
           specified by the global wrap and defaulttranslateoption options.

           w   Tags should be translated and content can be re-wrapped.

           W   Tags should be translated and content should not be re-wrapped.

           i   Tags should be translated inline.

           p   Tags should be translated as placeholders.

           Internally, the XML parser only cares about these four options: w W
           i p.

             * Tags listed in break are set to w or W depending on the wrap
               option.
             * Tags listed in inline are set to i.
             * Tags listed in placeholder are set to p.
             * Tags listed in untranslated are without any of these options
               set.

           You can verify the actual internal parameter behavior by invoking
           po4a with the --debug option.

           Example: W<chapter><title>

           Please note that a tag should be listed in either the translated
           or the untranslated setting string.

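           The mapping from the list options to the four internal letters
           can be written down as a short sketch. option_letter and the shape
           of its arguments are assumptions made for illustration only; the
           module keeps this information in its own internal structures.

```perl
use strict;
use warnings;

# Hypothetical helper: derive the internal option letter of a tag from the
# lists it appears in. Each list is a hash ref keyed by tag.
sub option_letter {
    my ($tag, %opt) = @_;
    return 'i' if $opt{inline}{$tag};                       # inline
    return 'p' if $opt{placeholder}{$tag};                  # placeholder
    return ($opt{wrap} ? 'w' : 'W') if $opt{break}{$tag};   # breaking tag
    return '';                        # untranslated: none of the letters
}

# Usage: print the letter chosen for a few sample tags.
my %opt = (
    wrap        => 1,
    break       => { '<para>'     => 1 },
    inline      => { '<emphasis>' => 1 },
    placeholder => { '<footnote>' => 1 },
);
for my $tag ('<para>', '<emphasis>', '<footnote>', '<screen>') {
    printf "%-12s %s\n", $tag, option_letter($tag, %opt) || '(none)';
}
```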
       untranslated
           Space-separated list of tags you do not want to translate.

           The tags must be in the form <aaa>, but you can join some
           (<bbb><aaa>), if a tag (<aaa>) should only be considered when it's
           within another tag (<bbb>).

           Please note that a translatable inline tag in an untranslated tag
           is treated as a translatable breaking tag: the i setting is dropped
           and w or W is set depending on the wrap option.

       defaulttranslateoption
           The default categories for tags that are not in any of the
           translated, untranslated, break, inline, or placeholder lists.

           This is a set of letters as defined in translated, and this setting
           is only valid for translatable tags.

WRITING DERIVATIVE MODULES

   DEFINE WHAT TAGS AND ATTRIBUTES TO TRANSLATE
       The simplest customization is to define which tags and attributes you
       want the parser to translate.  This should be done in the initialize
       function.  First you should call the main initialize to get the
       command-line options, and then append your custom definitions to the
       options hash.  If you want to handle new command-line options, you
       should define them before calling the main initialize:

         sub initialize {
             my ($self, %options) = @_;
             # Declare new command-line options before the parent call.
             $self->{options}{'new_option'}='';
             $self->SUPER::initialize(%options);
             # Append this module's defaults to the options hash.
             $self->{options}{'_default_translated'}.=' <p> <head><title>';
             $self->{options}{'attributes'}.=' <p>lang id';
             $self->{options}{'_default_inline'}.=' <br>';
             $self->treat_options;
         }

       You should use the _default_inline, _default_break,
       _default_placeholder, _default_translated, _default_untranslated, and
       _default_attributes options in derivative modules. This allows users
       to override the default behavior defined in your module with command
       line options.

   OVERRIDE THE DEFAULT BEHAVIOR WITH COMMAND LINE OPTIONS
       If you don't like the default behavior of this xml module and its
       derivative modules, you can provide command line options to change
       their behavior.

       See Locale::Po4a::Docbook(3pm).

   OVERRIDING THE found_string FUNCTION
       Another simple step is to override the function "found_string", which
       receives the extracted strings from the parser, in order to translate
       them.  There you can control which strings you want to translate, and
       perform transformations on them before or after the translation itself.

       It receives the extracted text, the reference to where it was found,
       and a hash that contains extra information to control which strings to
       translate, how to translate them, and how to generate the comment.

       The content of these options depends on the kind of string it is
       (specified in an entry of this hash):

       type="tag"
           The found string is the content of a translatable tag. The entry
           "tag_options" contains the option characters in front of the tag
           hierarchy in the module "tags" option.

       type="attribute"
           Means that the found string is the value of a translatable
           attribute. The entry "attribute" has the name of the attribute.

       It must return the text that will replace the original in the
       translated document. Here's a basic example of this function:

         sub found_string {
           my ($self,$text,$ref,$options)=@_;
           # Translate the string; the comment records the kind of string.
           $text = $self->translate($text,$ref,"type ".$options->{'type'},
             'wrap'=>$self->{options}{'wrap'});
           return $text;
         }

       There's another simple example in the Dia module, which only filters
       some strings.

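       A filtering variant of found_string might look like the sketch below.
       So that it runs on its own, it inherits from My::StubXml, a made-up
       stand-in whose translate method only brackets the text; a real module
       would inherit from Locale::Po4a::Xml instead. The filter rule (skip
       strings without any letter) is likewise only an example.

```perl
use strict;
use warnings;

# Made-up stand-in for Locale::Po4a::Xml, present only so the sketch runs
# without po4a installed. Its translate() is a fake.
package My::StubXml;
sub new       { return bless { options => { wrap => 1 } }, shift }
sub translate { my ($self, $text) = @_; return '[[' . $text . ']]' }

# Hypothetical derived parser that filters the strings it translates.
package My::Filtered;
our @ISA = ('My::StubXml');

sub found_string {
    my ($self, $text, $ref, $options) = @_;
    # Leave strings without any letter (numbers, punctuation) untouched.
    return $text unless $text =~ /\p{L}/;
    return $self->translate($text, $ref, "type " . $options->{'type'},
        'wrap' => $self->{options}{'wrap'});
}

package main;

# Usage: the number is returned as is, the word goes through translate().
my $parser = My::Filtered->new;
for my $string ('3.14', 'Hello') {
    print $parser->found_string($string, 'doc.xml:1', { type => 'tag' }), "\n";
}
```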
   MODIFYING TAG TYPES (TODO)
       This is more complex, but it enables (almost) total customization.
       It's based on a list of hashes, each one defining a tag type's
       behavior. The list should be sorted so that the most general tags come
       after the most concrete ones (sorted first by the beginning and then
       by the end keys). To define a tag type you'll have to make a hash with
       the following keys:

       beginning
           Specifies the beginning of the tag, after the "<".

       end Specifies the end of the tag, before the ">".

       breaking
           Says whether this is a breaking tag class.  A non-breaking (inline)
           tag is one that can be taken as part of the content of another tag.
           It can take the values false (0), true (1) or undefined.  If you
           leave this undefined, you'll have to define the f_breaking function
           that will say whether a concrete tag of this class is a breaking
           tag or not.

       f_breaking
           A function that tells whether the next tag is a breaking one or
           not.  It should be defined if the breaking option is not.

       f_extract
           If you leave this key undefined, the generic extraction function
           will have to extract the tag itself.  It's useful for tags that can
           have other tags or special structures in them, so that the main
           parser doesn't get confused.  This function receives a boolean that
           says if the tag should be removed from the input stream or not.

       f_translate
           This function receives the tag (in the get_string_until() format)
           and returns the translated tag (translated attributes or all needed
           transformations) as a single string.

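       To make the ordering rule concrete, here is a standalone sketch of
       such a list and of a lookup that returns the first fitting entry. The
       entries, the f_breaking stubs and tag_type_index are illustrative
       assumptions, not a copy of the module's table or of tag_type(), which
       works on the input stream rather than on a string.

```perl
use strict;
use warnings;

# Illustrative list only: most concrete types first, most general last.
my @tag_types = (
    { beginning => '!--', end => '--', breaking => 0 },    # comment
    { beginning => '?',   end => '?',  breaking => 1 },    # processing instr.
    { beginning => '/',   end => '',   breaking => undef,  # closing tag
      f_breaking => sub { 1 } },
    { beginning => '',    end => '/',  breaking => undef,  # empty element
      f_breaking => sub { 1 } },
    { beginning => '',    end => '',   breaking => undef,  # any other tag
      f_breaking => sub { 1 } },
);

# Return the index of the first type whose beginning and end fit the tag
# text (given with its angle brackets), or -1 if none fits.
sub tag_type_index {
    my ($tag) = @_;
    for my $i (0 .. $#tag_types) {
        my ($begin, $end) = @{ $tag_types[$i] }{qw(beginning end)};
        return $i if $tag =~ /^<\Q$begin\E/ && $tag =~ /\Q$end\E>$/s;
    }
    return -1;
}

# Usage: show which entry each sample tag falls into.
print "$_ => ", tag_type_index($_), "\n"
    for ('<!-- note -->', '</p>', '<br/>', '<p>');
```

       If the general entry (empty beginning and end) came first, it would
       match every tag and the more concrete entries would never be reached,
       which is why the list has to be sorted as described.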

INTERNAL FUNCTIONS used to write derivative parsers

   WORKING WITH TAGS
       get_path()
           This function returns the path to the current tag from the
           document's root, in the form <html><body><p>.

           An additional array of tags (without brackets) can be passed as
           argument.  These path elements are added to the end of the current
           path.

       tag_type()
           This function returns the index from the tag_types list that fits
           the next tag in the input stream, or -1 if it's at the end of the
           input file.

           Here, a tag is a structure that starts with < and ends with >, and
           it can span multiple lines.

           This works on the array "@{$self->{TT}{doc_in}}", which holds the
           input document data and references, accessing it indirectly via
           "$self->shiftline()" and "$self->unshiftline($$)".

       extract_tag($$)
           This function returns the next tag from the input stream without
           the beginning and end, in an array form, to maintain the references
           from the input file.  It has two parameters: the type of the tag
           (as returned by tag_type) and a boolean that indicates if it
           should be removed from the input stream.

           This works on the array "@{$self->{TT}{doc_in}}", which holds the
           input document data and references, accessing it indirectly via
           "$self->shiftline()" and "$self->unshiftline($$)".

       get_tag_name(@)
           This function returns the name of the tag passed as an argument, in
           the array form returned by extract_tag.

       breaking_tag()
           This function returns a boolean that says if the next tag in the
           input stream is a breaking tag or not (inline tag).  It leaves the
           input stream intact.

       treat_tag()
           This function translates the next tag from the input stream, using
           each tag type's custom translation functions.

           This works on the array "@{$self->{TT}{doc_in}}", which holds the
           input document data and references, accessing it indirectly via
           "$self->shiftline()" and "$self->unshiftline($$)".

       tag_in_list($@)
           This function returns a string value that says if the first
           argument (a tag hierarchy) matches any of the tags from the second
           argument (a list of tags or tag hierarchies). If it doesn't match,
           it returns 0. Otherwise, it returns the matched tag's options (the
           characters in front of the tag) or 1 (if that tag doesn't have
           options).

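           The documented return values can be reproduced with a standalone
           sketch. Both helpers are hypothetical, and the sketch assumes that
           a list entry matches when it equals the end of the path; the
           module's own data structures and matching may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "W<chapter><title> <para>" into a hash that maps
# each tag hierarchy to the option characters written in front of it.
sub parse_tag_list {
    my ($spec) = @_;
    my %list;
    for my $item (split ' ', $spec) {
        $list{$2} = $1 if $item =~ /^([^<]*)((?:<[^>]+>)+)$/;
    }
    return \%list;
}

# Hypothetical helper: return the options of the matching entry, 1 if the
# entry has no options, or 0 if nothing matches.
sub tag_in_list_sketch {
    my ($path, $list) = @_;
    while (1) {
        if (exists $list->{$path}) {
            return length($list->{$path}) ? $list->{$path} : 1;
        }
        # Drop the outermost tag and retry with the shorter hierarchy.
        last unless $path =~ s/^<[^>]*>(?=<)//;
    }
    return 0;
}

# Usage: look up three paths in a two-entry list.
my $list = parse_tag_list('W<chapter><title> <para>');
print tag_in_list_sketch($_, $list), "\n"
    for ('<book><chapter><title>', '<book><para>', '<book><title>');
```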
   WORKING WITH ATTRIBUTES
       treat_attributes(@)
           This function handles the translation of the tags' attributes. It
           receives the tag without the beginning / end marks, and then it
           finds the attributes, and it translates the translatable ones
           (specified by the module option "attributes").  This returns a
           plain string with the translated tag.

   WORKING WITH TAGGED CONTENTS
       treat_content()
           This function gets the text until the next breaking tag (not
           inline) from the input stream, and translates it using each tag
           type's custom translation functions.

           This works on the array "@{$self->{TT}{doc_in}}", which holds the
           input document data and references, accessing it indirectly via
           "$self->shiftline()" and "$self->unshiftline($$)".

   WORKING WITH THE MODULE OPTIONS
       treat_options()
           This function fills the internal structures that contain the tags,
           attributes and inline data with the options of the module
           (specified in the command-line or in the initialize function).

   GETTING TEXT FROM THE INPUT DOCUMENT
       get_string_until($%)
           This function returns an array with the lines (and references) from
           the input document until it finds the first argument.  The second
           argument is an options hash. Value 0 means disabled (the default)
           and 1, enabled.

           The valid options are:

           include
               Makes the returned array contain the searched text

           remove
               Removes the returned stream from the input

           unquoted
               Ensures that the searched text is outside any quotes

       skip_spaces(\@)
           This function receives as argument the reference to a paragraph (in
           the format returned by get_string_until), skips its leading spaces
           and returns them as a simple string.

       join_lines(@)
           This function returns a simple string with the text from the
           argument array (discarding the references).

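       The paragraph format handled by skip_spaces and join_lines can be
       pictured with a standalone sketch. It assumes that a paragraph is a
       flat array alternating text and reference (text, ref, text, ref, ...);
       the two functions below are illustrative re-implementations under
       that assumption, not the module's code.

```perl
use strict;
use warnings;

# Sketch of join_lines: concatenate the texts and discard the references.
sub join_lines_sketch {
    my @para = @_;
    my $text = '';
    while (@para >= 2) {
        my ($line, $ref) = splice(@para, 0, 2);
        $text .= $line;
    }
    return $text;
}

# Sketch of skip_spaces: strip leading whitespace from the paragraph (in
# place) and return the whitespace that was skipped.
sub skip_spaces_sketch {
    my ($para) = @_;                  # array reference
    my $skipped = '';
    while (@$para >= 2) {
        $skipped .= $1 if $para->[0] =~ s/^(\s+)//;
        last if length $para->[0];    # real text reached
        splice(@$para, 0, 2);         # empty line: drop it with its reference
    }
    return $skipped;
}

# Usage: skip the leading blank line and indentation, then join the rest.
my @para = ("  \n", 'doc.xml:3', "  Hello\n", 'doc.xml:4', "world\n", 'doc.xml:5');
my $spaces = skip_spaces_sketch(\@para);
print length($spaces), " whitespace characters skipped\n";
print join_lines_sketch(@para);
```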

STATUS OF THIS MODULE

       This module can translate tags and attributes.

TODO LIST

       DOCTYPE (ENTITIES)

       There is minimal support for the translation of entities. They are
       translated as a whole, and tags are not taken into account. Multiline
       entities are not supported and entities are always rewrapped during
       translation.

       MODIFY TAG TYPES FROM INHERITED MODULES (move the tag_types structure
       inside the $self hash?)

SEE ALSO

       Locale::Po4a::TransTractor(3pm), po4a(7)

AUTHORS

        Jordi Vilalta <jvprat@gmail.com>
        Nicolas François <nicolas.francois@centraliens.net>

COPYRIGHT AND LICENSE

        Copyright © 2004 Jordi Vilalta  <jvprat@gmail.com>
        Copyright © 2008-2009 Nicolas François <nicolas.francois@centraliens.net>

       This program is free software; you may redistribute it and/or modify it
       under the terms of GPL (see the COPYING file).

Po4a Tools                        2020-01-30            Locale::Po4a::Xml(3pm)