Locale::Po4a::Xml(3pm)            Po4a Tools            Locale::Po4a::Xml(3pm)

NAME
    Locale::Po4a::Xml - convert XML documents and derivatives from/to PO
    files

DESCRIPTION
    The po4a (PO for anything) project aims to ease translation (and, more
    interestingly, the maintenance of translations) by using gettext tools
    in areas where they were not originally expected, such as
    documentation.

    Locale::Po4a::Xml is a module to help translate XML documents into
    other [human] languages. It can also be used as a base for building
    modules for XML-based document formats.

TRANSLATING WITH PO4A::XML
    This module can be used directly to handle generic XML documents. By
    default it extracts the content of every tag and no attributes, since
    that is where the text is written in most XML-based documents.

    Some options (described in the next section) customize this behavior.
    If the result does not fit your document format, you are encouraged to
    write your own module derived from this one to describe your format's
    details. See the section WRITING DERIVATIVE MODULES below for a
    description of the process.

OPTIONS ACCEPTED BY THIS MODULE
    The global debug option causes this module to show the strings it
    excludes, so that you can check whether it skips something important.

    This module accepts the following specific options:

    nostrip
        Prevents the module from stripping the spaces around the
        extracted strings.

    wrap
        Canonicalizes the string to translate, treating whitespace as not
        significant, and wraps the translated document. This option can
        be overridden by custom tag options. See the translated option
        below.

    unwrap_attributes
        Attributes are wrapped by default. This option disables wrapping.

    caseinsensitive
        Makes the search for tags and attributes case insensitive. If it
        is defined, <BooK>laNG and <BOOK>Lang are both treated as
        <book>lang.

    escapequotes
        Escapes quotes in output strings. This is necessary, for example,
        when creating string resources for use by Android build tools.

        See also:
        https://developer.android.com/guide/topics/resources/string-resource.html

    includeexternal
        When defined, external entities are included in the generated
        (translated) document and in the extraction of strings. If it is
        not defined, you have to translate external entities separately,
        as independent documents.

    ontagerror
        Defines the behavior of the module when it encounters invalid XML
        syntax (a closing tag that does not match the last opening tag).
        It can take the following values:

        fail
            This is the default value. The module exits with an error.

        warn
            The module continues and issues a warning.

        silent
            The module continues without any warning.

        Be careful when using this option. It is generally better to fix
        the input file.

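The three ontagerror values can be pictured with a small stand-alone sketch. It is not the module's code: check_tags is a hypothetical helper that takes an already tokenized list of tags and only illustrates how a mismatched closing tag is treated in each mode.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Locale::Po4a::Xml: checks that each
# closing tag matches the last opening tag and reacts according to $mode.
sub check_tags {
    my ($mode, @tags) = @_;              # $mode: 'fail', 'warn' or 'silent'
    my (@open, @problems);
    for my $tag (@tags) {
        if ($tag =~ m{^</(.+)>$}) {      # closing tag
            my $expected = pop @open;
            next if defined $expected and $expected eq $1;
            my $msg = "unexpected closing tag </$1>";
            die "$msg\n"  if $mode eq 'fail';
            warn "$msg\n" if $mode eq 'warn';
            push @problems, $msg;        # recorded in every non-fatal mode
        }
        elsif ($tag =~ m{^<(.+)>$}) {    # opening tag
            push @open, $1;
        }
    }
    return @problems;
}

my @input = ('<a>', '<b>', '</a>');
my @found = check_tags('silent', @input);              # continues quietly
my $died  = !eval { check_tags('fail', @input); 1 };   # stops with an error
print scalar(@found), " problem(s); fail mode died: ",
    ($died ? "yes" : "no"), "\n";
```

With warn or silent the rest of the input is still processed, which is why the text recommends fixing the input file: whatever follows a mismatch may be attributed to the wrong tag.
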
    tagsonly
        Note: This option is deprecated.

        Extracts only the tags specified in the tags option. Otherwise,
        all tags are extracted except the ones specified.

    doctype
        String that is matched against the first line of the document's
        doctype (if defined). If it does not match, a warning indicates
        that the document might be of the wrong type.

    addlang
        String indicating the path (e.g. <bbb><aaa>) of a tag to which a
        lang="..." attribute shall be added. The language is defined as
        the basename of the PO file without any .po extension.

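As an illustration of the addlang rule, the following sketch derives a language code from a PO file name with the core File::Basename module. The helper name lang_from_po and the sample paths are assumptions made for the example; the module's own code may compute the value differently.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: derive the language code from a PO file path.
sub lang_from_po {
    my ($po_file) = @_;
    return basename($po_file, '.po');   # drop directories and a trailing .po
}

print lang_from_po('po/fr.po'), "\n";        # fr
print lang_from_po('l10n/pt_BR.po'), "\n";   # pt_BR
```
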
    optionalclosingtag
        Boolean indicating whether closing tags are optional (as in
        HTML). By default, missing closing tags raise an error that is
        handled according to ontagerror.

    tags
        Note: This option is deprecated. You should use the translated
        and untranslated options instead.

        Space-separated list of tags you want to translate or skip. By
        default, the specified tags are excluded, but if you use the
        "tagsonly" option, the specified tags are the only ones included.
        The tags must be in the form <aaa>, but you can join several of
        them (<bbb><aaa>) to say that the content of the tag <aaa> is
        only translated when it is inside a <bbb> tag.

        You can also specify some tag options by putting some characters
        in front of the tag hierarchy. For example, you can put w (wrap)
        or W (don't wrap) to override the default behavior specified by
        the global wrap option.

        Example: W<chapter><title>

    attributes
        Space-separated list of tag attributes you want to translate. You
        can specify an attribute by its name (for example, "lang"), and
        you can prefix it with a tag hierarchy to specify that the
        attribute is only translated when it is in the specified tag. For
        example, <bbb><aaa>lang specifies that the lang attribute is only
        translated if it is in an <aaa> tag that is itself in a <bbb>
        tag.

    foldattributes
        Do not translate attributes in inline tags. Instead, replace all
        attributes of a tag by po4a-id=<id>.

        This is useful when attributes shall not be translated, as it
        simplifies the strings for translators and avoids typos.

    customtag
        Space-separated list of tags which should not be treated as tags.
        These tags are treated as inline, and do not need to be closed.

    break
        Space-separated list of tags which should break the sequence. By
        default, all tags break the sequence.

        The tags must be in the form <aaa>, but you can join several of
        them (<bbb><aaa>) if a tag (<aaa>) should only be considered when
        it is within another tag (<bbb>).

        Please note that a tag should be listed in only one of the break,
        inline, placeholder, or customtag setting strings.

    inline
        Space-separated list of tags which should be treated as inline.
        By default, all tags break the sequence.

        The tags must be in the form <aaa>, but you can join several of
        them (<bbb><aaa>) if a tag (<aaa>) should only be considered when
        it is within another tag (<bbb>).

    placeholder
        Space-separated list of tags which should be treated as
        placeholders. Placeholders do not break the sequence, but the
        content of placeholders is translated separately.

        The location of the placeholder in its block is marked with a
        string similar to:

            <placeholder type=\"footnote\" id=\"0\"/>

        The tags must be in the form <aaa>, but you can join several of
        them (<bbb><aaa>) if a tag (<aaa>) should only be considered when
        it is within another tag (<bbb>).

    break-pi
        By default, Processing Instructions (i.e., "<? ... ?>" tags) are
        handled as inline tags. Pass this option if you want PIs to be
        handled as breaking tags. Note that unprocessed PHP tags are
        handled as Processing Instructions by the parser.

    nodefault
        Space-separated list of tags that the module should not try to
        put in any category by default.

        If a tag gets a default setting from a subclass of this module
        but you want to give it a different setting, you need to list
        that tag in the nodefault setting string.

    cpp
        Supports C preprocessor directives. When this option is set, po4a
        considers preprocessor directives to be paragraph separators.
        This is important if the XML file must be preprocessed: otherwise
        po4a may consider that the directives belong to the current
        paragraph and insert them in the middle of lines, where the
        preprocessor will not recognize them. Note: the preprocessor
        directives must only appear between tags (they must not break a
        tag).

    translated
        Space-separated list of tags you want to translate.

        The tags must be in the form <aaa>, but you can join several of
        them (<bbb><aaa>) if a tag (<aaa>) should only be considered when
        it is within another tag (<bbb>).

        You can also specify some tag options by putting some characters
        in front of the tag hierarchy. This overrides the default
        behavior specified by the global wrap and defaulttranslateoption
        options.

        w   Tags should be translated and content can be re-wrapped.

        W   Tags should be translated and content should not be
            re-wrapped.

        i   Tags should be translated inline.

        p   Tags should be translated as placeholders.

        Internally, the XML parser only cares about these four options:
        w W i p.

        * Tags listed in break are set to w or W depending on the wrap
          option.

        * Tags listed in inline are set to i.

        * Tags listed in placeholder are set to p.

        * Tags listed in untranslated have none of these options set.

        You can verify the actual internal parameter behavior by invoking
        po4a with the --debug option.

        Example: W<chapter><title>

        Please note that a tag should be listed in either the translated
        or the untranslated setting string.

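To make the entry syntax concrete, here is a minimal sketch that splits an entry such as W<chapter><title> into its option letters and its tag hierarchy. The function split_tag_entry and its regular expression are assumptions for illustration only; they are not the parser's own code, and the sketch accepts only the four letters listed above.

```perl
use strict;
use warnings;

# Hypothetical helper: split one entry of the "translated" list into its
# option letters (w, W, i, p) and its tag hierarchy.
sub split_tag_entry {
    my ($entry) = @_;
    my ($letters, $path) = $entry =~ /^([wWip]*)((?:<[^<>]+>)+)$/
        or die "malformed entry: $entry\n";
    return ($letters, $path);
}

# A setting string is a space-separated list of such entries.
for my $entry (split ' ', 'W<chapter><title> <para> i<emphasis>') {
    my ($letters, $path) = split_tag_entry($entry);
    printf "%-20s options=%-2s path=%s\n", $entry, $letters || '-', $path;
}
```

An entry with no leading letters falls back to the defaults given by the wrap and defaulttranslateoption options, as described above.
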
    untranslated
        Space-separated list of tags you do not want to translate.

        The tags must be in the form <aaa>, but you can join several of
        them (<bbb><aaa>) if a tag (<aaa>) should only be considered when
        it is within another tag (<bbb>).

        Please note that a translatable inline tag inside an untranslated
        tag is treated as a translatable breaking tag: the i setting is
        dropped and w or W is set depending on the wrap option.

    defaulttranslateoption
        The default categories for tags that are not listed in any of the
        translated, untranslated, break, inline, or placeholder options.

        This is a set of letters as defined in translated, and this
        setting is only valid for translatable tags.

WRITING DERIVATIVE MODULES
  DEFINE WHAT TAGS AND ATTRIBUTES TO TRANSLATE
    The simplest customization is to define which tags and attributes you
    want the parser to translate. This should be done in the initialize
    function. First call the main initialize to get the command-line
    options, then append your custom definitions to the options hash. If
    you want to handle new command-line options, define them before
    calling the main initialize:

        sub initialize {
            my ($self, %options) = @_;
            # Declare new options before the main initialize reads them.
            $self->{options}{'new_option'} = '';
            $self->SUPER::initialize(%options);
            # Append this module's defaults, then rebuild the internal lists.
            $self->{options}{'_default_translated'} .= ' <p> <head><title>';
            $self->{options}{'attributes'}          .= ' <p>lang id';
            $self->{options}{'_default_inline'}     .= ' <br>';
            $self->treat_options;
        }

    You should use the _default_inline, _default_break,
    _default_placeholder, _default_translated, _default_untranslated, and
    _default_attributes options in derivative modules. This allows users
    to override the default behavior defined in your module with
    command-line options.

  OVERRIDE THE DEFAULT BEHAVIOR WITH COMMAND LINE OPTIONS
    If you do not like the default behavior of this XML module and its
    derivative modules, you can provide command-line options to change
    their behavior.

    See Locale::Po4a::Docbook(3pm).

  OVERRIDING THE found_string FUNCTION
    Another simple step is to override the function "found_string", which
    receives the strings extracted by the parser in order to translate
    them. There you can control which strings you want to translate, and
    transform them before or after the translation itself.

    It receives the extracted text, the reference to where it was found,
    and a hash that contains extra information to control which strings
    to translate, how to translate them, and how to generate the comment.

    The content of these options depends on the kind of string it is
    (specified in an entry of this hash):

    type="tag"
        The found string is the content of a translatable tag. The entry
        "tag_options" contains the option characters in front of the tag
        hierarchy in the module "tags" option.

    type="attribute"
        The found string is the value of a translatable attribute. The
        entry "attribute" holds the name of the attribute.

    It must return the text that will replace the original in the
    translated document. Here is a basic example of this function:

        sub found_string {
            my ($self, $text, $ref, $options) = @_;
            # Translate the string; its type is passed as the comment.
            $text = $self->translate($text, $ref, "type " . $options->{'type'},
                'wrap' => $self->{options}{'wrap'});
            return $text;
        }

    There is another simple example in the new Dia module, which only
    filters some strings.

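A filtering override could look roughly like the sketch below. To keep it runnable on its own, it does not load po4a: Sketch::Parser is a made-up stand-in whose translate method merely upper-cases the text, and Sketch::Filtered plays the role of a derived module. In a real module the parent would be Locale::Po4a::Xml, and the filter rule shown here (skip strings without letters) is only one possible choice.

```perl
use strict;
use warnings;

# Stand-in for the real parser, present only so that this sketch runs on
# its own. In a real module the parent would be Locale::Po4a::Xml.
package Sketch::Parser;
sub new { return bless { options => { wrap => 1 } }, shift }
sub translate {                      # fake translation: upper-case the text
    my ($self, $text, $ref, $comment, %opts) = @_;
    return uc $text;
}

package Sketch::Filtered;
our @ISA = ('Sketch::Parser');

# Strings made only of digits, punctuation and spaces are passed through
# untouched; everything else goes to translate().
sub found_string {
    my ($self, $text, $ref, $options) = @_;
    return $text if $text =~ /^[\d\s[:punct:]]*$/;
    return $self->translate($text, $ref, "type " . $options->{'type'},
        'wrap' => $self->{options}{'wrap'});
}

package main;

my $parser = Sketch::Filtered->new;
print $parser->found_string('Hello', 'doc.xml:3', { type => 'tag' }), "\n";
print $parser->found_string('2022-01-21', 'doc.xml:4', { type => 'tag' }), "\n";
```

Returning the text unchanged keeps such strings out of the PO file, which spares translators entries that have nothing to translate.
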
  MODIFYING TAG TYPES (TODO)
    This is more complex, but it enables (almost) total customization. It
    is based on a list of hashes, each one defining the behavior of a tag
    type. The list should be sorted so that the most general tags come
    after the most specific ones (sorted first by the beginning key and
    then by the end key). To define a tag type, you have to make a hash
    with the following keys:

    beginning
        Specifies the beginning of the tag, after the "<".

    end
        Specifies the end of the tag, before the ">".

    breaking
        Says whether this is a breaking tag class. A non-breaking
        (inline) tag is one that can be taken as part of the content of
        another tag. It can take the values false (0), true (1) or
        undefined. If you leave it undefined, you have to define the
        f_breaking function, which says whether a particular tag of this
        class is a breaking tag or not.

    f_breaking
        A function that tells whether the next tag is a breaking one or
        not. It should be defined if the breaking option is not.

    f_extract
        If you leave this key undefined, the generic extraction function
        has to extract the tag itself. It is useful for tags that can
        contain other tags or special structures, so that the main parser
        does not get confused. This function receives a boolean that
        says whether the tag should be removed from the input stream or
        not.

    f_translate
        This function receives the tag (in the get_string_until() format)
        and returns the translated tag (with translated attributes and
        any other needed transformations) as a single string.

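The ordering rule is easier to see on a concrete list. The sketch below uses only the beginning, end and breaking keys described above; the entries themselves and the lookup function find_tag_type are illustrative assumptions and may differ from the list the module really uses.

```perl
use strict;
use warnings;

# Illustrative list only; the module's real tag type list may differ.
# More specific beginnings come before more general ones.
my @tag_types = (
    { beginning => '!--',  end => '--', breaking => 0 },   # <!-- ... -->
    { beginning => '?xml', end => '?',  breaking => 1 },   # <?xml ... ?>
    { beginning => '?',    end => '?',  breaking => 0 },   # other <? ... ?>
    { beginning => '',     end => '',   breaking => 1 },   # any other tag
);

# Hypothetical lookup: index of the first type whose beginning and end
# match the text found between "<" and ">", or -1 if no entry matches
# (which cannot happen here because the last entry is a catch-all).
sub find_tag_type {
    my ($inside) = @_;
    for my $i (0 .. $#tag_types) {
        my ($begin, $end) = @{ $tag_types[$i] }{qw(beginning end)};
        return $i if $inside =~ /^\Q$begin\E/ and $inside =~ /\Q$end\E$/;
    }
    return -1;
}

for my $inside ('!-- note --', '?xml version="1.0"?', '?php echo 1; ?',
                'p class="x"') {
    printf "%-22s -> type %d\n", $inside, find_tag_type($inside);
}
```

If the generic "?" entry came before "?xml", the XML declaration would be classified as an ordinary processing instruction, which is why the most specific entries must come first.
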
INTERNAL FUNCTIONS used to write derivative parsers
  WORKING WITH TAGS
    get_path()
        This function returns the path to the current tag from the
        document's root, in the form <html><body><p>.

        An additional array of tags (without brackets) can be passed as
        an argument. These path elements are added to the end of the
        current path.

    tag_type()
        This function returns the index in the tag_types list that fits
        the next tag in the input stream, or -1 if it is at the end of
        the input file.

        Here, a tag is a structure that starts with < and ends with >,
        and it can span multiple lines.

        This works on the array "@{$self->{TT}{doc_in}}", which holds the
        input document data and references, and does so indirectly via
        "$self->shiftline()" and "$self->unshiftline($$)".

    extract_tag($$)
        This function returns the next tag from the input stream, without
        its beginning and end, in array form so that the references to
        the input file are preserved. It takes two parameters: the type
        of the tag (as returned by tag_type) and a boolean that indicates
        whether the tag should be removed from the input stream.

        This works on the array "@{$self->{TT}{doc_in}}", which holds the
        input document data and references, and does so indirectly via
        "$self->shiftline()" and "$self->unshiftline($$)".

    get_tag_name(@)
        This function returns the name of the tag passed as an argument,
        in the array form returned by extract_tag.

    breaking_tag()
        This function returns a boolean that says whether the next tag in
        the input stream is a breaking tag or not (i.e. an inline tag).
        It leaves the input stream intact.

    treat_tag()
        This function translates the next tag from the input stream,
        using each tag type's custom translation functions.

        This works on the array "@{$self->{TT}{doc_in}}", which holds the
        input document data and references, and does so indirectly via
        "$self->shiftline()" and "$self->unshiftline($$)".

    tag_in_list($@)
        This function returns a string value that says whether the first
        argument (a tag hierarchy) matches any of the tags from the
        second argument (a list of tags or tag hierarchies). If it does
        not match, it returns 0. Otherwise, it returns the matched tag's
        options (the characters in front of the tag) or 1 (if that tag
        has no options).

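The path format returned by get_path() and the return convention of tag_in_list() can be mimicked in a few lines. This is a simplified sketch under stated assumptions, not the module's implementation: path_of and match_in_list are hypothetical stand-alone functions, the list is a plain array rather than the module's internal structure, and an entry is assumed to match when its tag hierarchy is the tail of the path.

```perl
use strict;
use warnings;

# Build a path such as <html><body><p> from a stack of tag names.
sub path_of { return join '', map { "<$_>" } @_ }

# Hypothetical, simplified matcher: an entry matches when its tag
# hierarchy is the tail of $path. Returns the entry's option letters,
# 1 if it has none, or 0 if nothing matches.
sub match_in_list {
    my ($path, @list) = @_;
    for my $entry (@list) {
        my ($letters, $tags) = $entry =~ /^([^<]*)(<.*>)$/ or next;
        next unless $path =~ /\Q$tags\E$/;
        return length($letters) ? $letters : 1;
    }
    return 0;
}

my $path = path_of(qw(book chapter title));     # <book><chapter><title>
print match_in_list($path, 'W<chapter><title>', '<para>'), "\n";   # W
print match_in_list($path, '<title>'), "\n";                        # 1
print match_in_list($path, '<section><title>'), "\n";               # 0
```

Returning the option letters rather than a plain boolean lets the caller learn in a single call both whether the tag is listed and how it should be handled.
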
  WORKING WITH ATTRIBUTES
    treat_attributes(@)
        This function handles the translation of the tags' attributes. It
        receives the tag without its beginning and end marks, finds the
        attributes, and translates the translatable ones (specified by
        the module option attributes). It returns a plain string with the
        translated tag.

  WORKING WITH TAGGED CONTENTS
    treat_content()
        This function gets the text up to the next breaking (non-inline)
        tag from the input stream and translates it using each tag
        type's custom translation functions.

        This works on the array "@{$self->{TT}{doc_in}}", which holds the
        input document data and references, and does so indirectly via
        "$self->shiftline()" and "$self->unshiftline($$)".

  WORKING WITH THE MODULE OPTIONS
    treat_options()
        This function fills the internal structures that contain the
        tags, attributes and inline data with the options of the module
        (specified on the command line or in the initialize function).

  GETTING TEXT FROM THE INPUT DOCUMENT
    get_string_until($%)
        This function returns an array with the lines (and references)
        from the input document up to the point where it finds the first
        argument. The second argument is an options hash. For each
        option, the value 0 means disabled (the default) and 1 means
        enabled.

        The valid options are:

        include
            Makes the returned array contain the searched text.

        remove
            Removes the returned stream from the input.

        unquoted
            Ensures that the searched text is outside any quotes.

        regex
            Denotes that the first argument is a regular expression
            rather than a plain string.

    skip_spaces(\@)
        This function receives as an argument a reference to a paragraph
        (in the format returned by get_string_until), skips its leading
        spaces and returns them as a simple string.

    join_lines(@)
        This function returns a simple string with the text from the
        argument array (discarding the references).

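The paragraph format shared by these functions can be illustrated with a stand-alone sketch. It assumes that a paragraph is a flat list alternating a piece of text with its reference; join_text and strip_leading_spaces are hypothetical helpers that only mimic what the descriptions of join_lines and skip_spaces say, and the references such as doc.xml:3 are made-up sample values.

```perl
use strict;
use warnings;

# Assumed paragraph layout: a flat list alternating text and reference,
# e.g. ("  <p>Hello\n", "doc.xml:3", "world</p>\n", "doc.xml:4").

# Concatenate the text elements, dropping the references.
sub join_text {
    my @paragraph = @_;
    my $text = '';
    while (@paragraph) {
        my ($line, $ref) = splice @paragraph, 0, 2;
        $text .= $line;
    }
    return $text;
}

# Remove leading whitespace from the paragraph in place and return it.
sub strip_leading_spaces {
    my ($paragraph) = @_;                        # array reference
    my $spaces = '';
    while (@$paragraph and $paragraph->[0] =~ s/^(\s+)//) {
        $spaces .= $1;
        last if length $paragraph->[0];          # reached real text
        splice @$paragraph, 0, 2;                # drop the emptied pair
    }
    return $spaces;
}

my @para = ("  \n", "doc.xml:2", "  <p>Hi\n", "doc.xml:3",
            "there</p>\n", "doc.xml:4");
my $lead = strip_leading_spaces(\@para);
print "skipped ", length($lead), " whitespace characters\n";
print join_text(@para);
```

Keeping the reference next to each piece of text is what allows a translated string to point back to the exact location it came from in the input file.
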
STATUS OF THIS MODULE
    This module can translate tags and attributes.

TODO LIST
    DOCTYPE (ENTITIES)

    There is minimal support for the translation of entities. They are
    translated as a whole, and tags are not taken into account. Multiline
    entities are not supported, and entities are always rewrapped during
    translation.

    MODIFY TAG TYPES FROM INHERITED MODULES (move the tag_types structure
    inside the $self hash?)

SEE ALSO
    Locale::Po4a::TransTractor(3pm), po4a(7)

AUTHORS
    Jordi Vilalta <jvprat@gmail.com>
    Nicolas François <nicolas.francois@centraliens.net>

COPYRIGHT AND LICENSE
    Copyright © 2004 Jordi Vilalta <jvprat@gmail.com>
    Copyright © 2008-2009 Nicolas François <nicolas.francois@centraliens.net>

    This program is free software; you may redistribute it and/or modify
    it under the terms of GPL (see the COPYING file).

Po4a Tools                        2022-01-21            Locale::Po4a::Xml(3pm)