Locale::Po4a::Xml(3pm)            Po4a Tools            Locale::Po4a::Xml(3pm)

NAME

       Locale::Po4a::Xml - convert XML documents and derivatives from/to PO
       files

DESCRIPTION

       The goal of the po4a (PO for anything) project is to ease translations
       (and, more interestingly, the maintenance of translations) by using
       gettext tools in areas where they were not expected, such as
       documentation.

       Locale::Po4a::Xml is a module that helps translate XML documents into
       other [human] languages. It can also be used as a base to build
       modules for XML-based documents.

TRANSLATING WITH PO4A::XML

       This module can be used directly to handle generic XML documents.  By
       default it extracts the content of all tags and no attributes, since
       that is where the text is written in most XML-based documents.

       Some options (described in the next section) can customize this
       behavior.  If this does not fit your document format, you are
       encouraged to write your own module derived from this one to describe
       your format's details.  See the section WRITING DERIVATE MODULES
       below for a description of the process.

OPTIONS ACCEPTED BY THIS MODULE

       The global debug option causes this module to show the excluded
       strings, so that you can check whether it skips something important.

       These are this module's particular options:

       nostrip
           Prevents the module from stripping the spaces around the extracted
           strings.

       wrap
           Canonicalizes the strings to translate, considering that whitespace
           is not significant, and wraps the translated document. This option
           can be overridden by custom tag options. See the "tags" option
           below.

       unwrap_attributes
           Attributes are wrapped by default. This option disables wrapping.

       caseinsensitive
           Makes the search for tags and attributes case insensitive.  If it
           is defined, <BooK>laNG and <BOOK>Lang are both treated as
           <book>lang.

       escapequotes
           Escape quotes in output strings.  Necessary, for example, for
           creating string resources for use by Android build tools.

           See also:
           https://developer.android.com/guide/topics/resources/string-resource.html

       includeexternal
           When defined, external entities are included in the generated
           (translated) document and in the extraction of strings.  If it is
           not defined, you will have to translate external entities
           separately as independent documents.

       ontagerror
           Defines the behavior of the module when it encounters invalid XML
           syntax (a closing tag that does not match the last opening tag, or
           a tag attribute without a value).  It can take the following
           values:

           fail
               This is the default value.  The module exits with an error.

           warn
               The module continues and issues a warning.

           silent
               The module continues without any warning.

           Be careful when using this option.  It is generally recommended to
           fix the input file instead.

       tagsonly
           Extracts only the tags specified in the "tags" option.  Otherwise,
           it extracts all the tags except the ones specified.

           Note: This option is deprecated.

       doctype
           String that is matched against the first line of the document's
           doctype (if defined). If it does not match, a warning indicates
           that the document might be of the wrong type.

       addlang
           String indicating the path (e.g. <bbb><aaa>) of a tag where a
           lang="..." attribute shall be added. The language will be defined
           as the basename of the PO file without any .po extension.

       tags
           Space-separated list of tags you want to translate or skip.  By
           default, the specified tags are excluded, but if you use the
           "tagsonly" option, the specified tags are the only ones included.
           The tags must be in the form <aaa>, but you can join some
           (<bbb><aaa>) to say that the content of the tag <aaa> will only be
           translated when it is inside a <bbb> tag.

           You can also specify tag options by putting some characters in
           front of the tag hierarchy. For example, you can put 'w' (wrap) or
           'W' (don't wrap) to override the default behavior specified by the
           global "wrap" option.

           Example: W<chapter><title>

           Note: This option is deprecated.  You should use the translated and
           untranslated options instead.

       attributes
           Space-separated list of tag attributes you want to translate.  You
           can specify an attribute by its name (for example, "lang"), or
           prefix it with a tag hierarchy to specify that the attribute is
           only translated when it is in the specified tag.  For example,
           <bbb><aaa>lang specifies that the lang attribute is only
           translated if it is in an <aaa> tag that is itself in a <bbb> tag.

       foldattributes
           Do not translate attributes in inline tags.  Instead, replace all
           attributes of a tag by po4a-id=<id>.

           This is useful when attributes shall not be translated, as it
           simplifies the strings for translators and avoids typos.

       customtag
           Space-separated list of tags which should not be treated as tags.
           These tags are treated as inline, and do not need to be closed.

       break
           Space-separated list of tags which should break the sequence.  By
           default, all tags break the sequence.

           The tags must be in the form <aaa>, but you can join some
           (<bbb><aaa>), if a tag (<aaa>) should only be considered when it's
           within another tag (<bbb>).

       inline
           Space-separated list of tags which should be treated as inline.  By
           default, all tags break the sequence.

           The tags must be in the form <aaa>, but you can join some
           (<bbb><aaa>), if a tag (<aaa>) should only be considered when it's
           within another tag (<bbb>).

       placeholder
           Space-separated list of tags which should be treated as
           placeholders.  Placeholders do not break the sequence, but the
           content of placeholders is translated separately.

           The location of the placeholder in its block will be marked with a
           string similar to:

             <placeholder type=\"footnote\" id=\"0\"/>

           The tags must be in the form <aaa>, but you can join some
           (<bbb><aaa>), if a tag (<aaa>) should only be considered when it's
           within another tag (<bbb>).

       nodefault
           Space-separated list of tags that the module should not try to
           assign to any category by default.

       cpp Support C preprocessor directives.  When this option is set, po4a
           considers preprocessor directives as paragraph separators.  This
           is important if the XML file must be preprocessed: otherwise the
           directives may be inserted in the middle of lines if po4a
           considers that they belong to the current paragraph, and they
           will not be recognized by the preprocessor.  Note: the
           preprocessor directives must only appear between tags (they must
           not break a tag).

       translated
           Space-separated list of tags you want to translate.

           The tags must be in the form <aaa>, but you can join some
           (<bbb><aaa>), if a tag (<aaa>) should only be considered when it's
           within another tag (<bbb>).

           You can also specify some tag options by putting some characters in
           front of the tag hierarchy. For example, you can put 'w' (wrap) or
           'W' (don't wrap) to override the default behavior specified by the
           global "wrap" option.

           Example: W<chapter><title>

       untranslated
           Space-separated list of tags you do not want to translate.

           The tags must be in the form <aaa>, but you can join some
           (<bbb><aaa>), if a tag (<aaa>) should only be considered when it's
           within another tag (<bbb>).

       defaulttranslateoption
           The default categories for tags that are not listed in any of the
           translated, untranslated, break, inline, or placeholder options.

           This is a set of letters:

           w   Tags should be translated and content can be re-wrapped.

           W   Tags should be translated and content should not be re-wrapped.

           i   Tags should be translated inline.

           p   Tags should be translated as placeholders.

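       As an illustration of the rule stated for the addlang option (the
       language is the basename of the PO file without the .po extension),
       here is a minimal sketch.  The helper name lang_from_po and the file
       paths are hypothetical; this is not the module's own code, only one
       way to express the rule with the standard File::Basename module.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper (not part of po4a): derive the value of the lang
# attribute from a PO file path by dropping the directory part and the
# ".po" suffix.
sub lang_from_po {
    my ($po_path) = @_;
    return basename($po_path, '.po');
}

print lang_from_po('po/fr.po'), "\n";        # prints "fr"
print lang_from_po('l10n/pt_BR.po'), "\n";   # prints "pt_BR"
```

       Under this rule, the name given to the PO file determines the value
       written into the lang attribute.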

WRITING DERIVATE MODULES

   DEFINE WHAT TAGS AND ATTRIBUTES TO TRANSLATE
       The simplest customization is to define which tags and attributes you
       want the parser to translate.  This should be done in the initialize
       function.  First call the main initialize to get the command-line
       options, and then append your custom definitions to the options hash.
       If you want to handle new command-line options, define them before
       calling the main initialize:

         # Declare any new option before calling the parent initialize.
         $self->{options}{'new_option'}='';
         $self->SUPER::initialize(%options);
         # Append the defaults for this format.
         $self->{options}{'_default_translated'}.=' <p> <head><title>';
         $self->{options}{'attributes'}.=' <p>lang id';
         $self->{options}{'_default_inline'}.=' <br>';
         # Fill the internal structures from the options.
         $self->treat_options;

       You should use the _default_inline, _default_break,
       _default_placeholder, _default_translated, _default_untranslated, and
       _default_attributes options in derived modules. This allows users to
       override the default behavior defined in your module with command-line
       options.

   OVERRIDING THE found_string FUNCTION
       Another simple step is to override the function "found_string", which
       receives the strings extracted by the parser in order to translate
       them.  There you can control which strings you want to translate, and
       transform them before or after the translation itself.

       It receives the extracted text, a reference to where it was found,
       and a hash that contains extra information to control which strings
       to translate, how to translate them, and how to generate the comment.

       The content of these options depends on the kind of string it is
       (specified in an entry of this hash):

       type="tag"
           The found string is the content of a translatable tag. The entry
           "tag_options" contains the option characters in front of the tag
           hierarchy in the module "tags" option.

       type="attribute"
           The found string is the value of a translatable attribute. The
           entry "attribute" holds the name of the attribute.

       It must return the text that will replace the original in the
       translated document. Here's a basic example of this function:

         sub found_string {
           my ($self,$text,$ref,$options)=@_;
           $text = $self->translate($text,$ref,"type ".$options->{'type'},
             'wrap'=>$self->{options}{'wrap'});
           return $text;
         }

       There's another simple example in the Dia module, which only filters
       some strings.

   MODIFYING TAG TYPES (TODO)
       This is a more complex customization, but it enables an (almost)
       total customization.  It is based on a list of hashes, each one
       defining a tag type's behavior. The list should be sorted so that the
       most general tags come after the most concrete ones (sorted first by
       the beginning and then by the end keys). To define a tag type you
       have to make a hash with the following keys:

       beginning
           Specifies the beginning of the tag, after the "<".

       end Specifies the end of the tag, before the ">".

       breaking
           Says whether this is a breaking tag class.  A non-breaking
           (inline) tag is one that can be taken as part of the content of
           another tag.  It can take the values false (0), true (1) or
           undefined.  If you leave this undefined, you have to define the
           f_breaking function, which says whether a concrete tag of this
           class is a breaking tag or not.

       f_breaking
           A function that tells whether the next tag is a breaking one or
           not.  It should be defined if the breaking option is not.

       f_extract
           If you leave this key undefined, the generic extraction function
           will have to extract the tag itself.  It is useful for tags that
           can have other tags or special structures in them, so that the
           main parser does not get confused.  This function receives a
           boolean that says whether the tag should be removed from the
           input stream or not.

       f_translate
           This function receives the tag (in the get_string_until() format)
           and returns the translated tag (translated attributes or all
           needed transformations) as a single string.

INTERNAL FUNCTIONS used to write derived parsers

   WORKING WITH TAGS
       get_path()
           This function returns the path to the current tag from the
           document's root, in the form <html><body><p>.

           An additional array of tags (without brackets) can be passed as
           argument.  These path elements are added to the end of the current
           path.

       tag_type()
           This function returns the index in the tag_types list that matches
           the next tag in the input stream, or -1 if it is at the end of the
           input file.

       extract_tag($$)
           This function returns the next tag from the input stream without
           the beginning and end, in array form, to maintain the references
           from the input file.  It has two parameters: the type of the tag
           (as returned by tag_type) and a boolean that indicates whether it
           should be removed from the input stream.

       get_tag_name(@)
           This function returns the name of the tag passed as an argument, in
           the array form returned by extract_tag.

       breaking_tag()
           This function returns a boolean that says whether the next tag in
           the input stream is a breaking tag or an inline tag.  It leaves
           the input stream intact.

       treat_tag()
           This function translates the next tag from the input stream, using
           each tag type's custom translation functions.

       tag_in_list($@)
           This function returns a string value that says whether the first
           argument (a tag hierarchy) matches any of the tags from the second
           argument (a list of tags or tag hierarchies). If it does not
           match, it returns 0. Otherwise, it returns the matched tag's
           options (the characters in front of the tag) or 1 (if that tag
           does not have options).

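       The return convention described for tag_in_list (0 for no match, the
       option characters for a match, or 1 when the matched tag has none)
       can be sketched in standalone Perl.  The helpers parse_tag_list and
       match_tag below are hypothetical, and the hash layout and the
       strategy of dropping outer path elements until something matches are
       assumptions made for illustration; the module's own data structures
       may differ.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; this is not the module's code.

# Turn a space-separated value such as "W<chapter><title> <para>" into a
# hash that maps each tag hierarchy to its option characters ("" if none).
sub parse_tag_list {
    my ($spec) = @_;
    my %list;
    for my $item (split ' ', $spec) {
        my ($opts, $path) = $item =~ /^([^<]*)(<.*)$/ or next;
        $list{$path} = $opts;
    }
    return \%list;
}

# Return 0 when $path matches nothing, the option characters of the
# matched entry, or 1 when the matched entry has no option characters.
sub match_tag {
    my ($path, $list) = @_;
    while (1) {
        if (exists $list->{$path}) {
            my $opts = $list->{$path};
            return length($opts) ? $opts : 1;
        }
        # Drop the outermost element and retry with the shorter hierarchy.
        last unless $path =~ s/^<[^>]*>(?=<)//;
    }
    return 0;
}

my $list = parse_tag_list('W<chapter><title> <para>');

print match_tag('<book><chapter><title>', $list), "\n";   # prints "W"
print match_tag('<book><para>',           $list), "\n";   # prints "1"
print match_tag('<book><title>',          $list), "\n";   # prints "0"
```

       In this sketch a <title> directly under <book> is not matched,
       because the only entry for <title> requires a <chapter> parent.
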
   WORKING WITH ATTRIBUTES
       treat_attributes(@)
           This function handles the translation of the tags' attributes. It
           receives the tag without the beginning / end marks, finds the
           attributes, and translates the translatable ones (specified by
           the module option "attributes").  It returns a plain string with
           the translated tag.

   WORKING WITH THE MODULE OPTIONS
       treat_options()
           This function fills the internal structures that contain the tags,
           attributes and inline data with the options of the module
           (specified on the command line or in the initialize function).

   GETTING TEXT FROM THE INPUT DOCUMENT
       get_string_until($%)
           This function returns an array with the lines (and references) from
           the input document until it finds the first argument.  The second
           argument is an options hash. In this hash, the value 0 means
           disabled (the default) and 1 means enabled.

           The valid options are:

           include
               Makes the returned array contain the searched text

           remove
               Removes the returned stream from the input

           unquoted
               Ensures that the searched text is outside any quotes

       skip_spaces(\@)
           This function receives as argument the reference to a paragraph (in
           the format returned by get_string_until), skips its leading spaces
           and returns them as a simple string.

       join_lines(@)
           This function returns a simple string with the text from the
           argument array (discarding the references).

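       To make the "lines and references" array format more concrete, here
       is a small standalone sketch of what joining such an array amounts
       to.  It assumes the array alternates each line's text with its
       reference (text, reference, text, reference, ...); that layout, the
       helper name join_text, and the sample references are assumptions for
       illustration, not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical stand-in for join_lines, for illustration only.
# Assumed layout: the array alternates a line's text with its reference,
# e.g. ("text1", "doc.xml:3", "text2", "doc.xml:4").
sub join_text {
    my @pairs = @_;
    my $text  = '';
    while (@pairs) {
        my ($line, $ref) = splice(@pairs, 0, 2);
        $text .= $line;    # keep the text, discard the reference
    }
    return $text;
}

my @paragraph = ("<p>Hello\n", "doc.xml:3", "world</p>\n", "doc.xml:4");
print join_text(@paragraph);
```

       Keeping the reference next to each line is what lets a parser report
       where a string came from; the join step is only for the cases where
       the plain text is needed.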

STATUS OF THIS MODULE

       This module can translate tags and attributes.

TODO LIST

       DOCTYPE (ENTITIES)

       There is minimal support for the translation of entities. They are
       translated as a whole, and tags are not taken into account. Multiline
       entities are not supported, and entities are always rewrapped during
       the translation.

       MODIFY TAG TYPES FROM INHERITED MODULES (move the tag_types structure
       inside the $self hash?)

SEE ALSO

       Locale::Po4a::TransTractor(3pm), po4a(7)

AUTHORS

        Jordi Vilalta <jvprat@gmail.com>
        Nicolas François <nicolas.francois@centraliens.net>

        Copyright (c) 2004 by Jordi Vilalta  <jvprat@gmail.com>
        Copyright (c) 2008-2009 by Nicolas François <nicolas.francois@centraliens.net>

       This program is free software; you may redistribute it and/or modify it
       under the terms of GPL (see the COPYING file).

Po4a Tools                        2019-02-02            Locale::Po4a::Xml(3pm)