Locale::Po4a::Xml(3)  User Contributed Perl Documentation  Locale::Po4a::Xml(3)

NAME

       Locale::Po4a::Xml - convert XML documents and derivatives from/to PO
       files

DESCRIPTION

       The goal of the po4a (PO for anything) project is to ease translations
       (and, more interestingly, the maintenance of translations) using
       gettext tools in areas where they were not expected, such as
       documentation.

       Locale::Po4a::Xml is a module to help the translation of XML documents
       into other [human] languages. It can also be used as a base to build
       modules for XML-based documents.

TRANSLATING WITH PO4A::XML

       This module can be used directly to handle generic XML documents. It
       extracts the content of all tags, and no attributes, since that is
       where the text is written in most XML-based documents.

       Some options (described in the next section) can customize this
       behavior. If this does not fit your document format, you are
       encouraged to write your own module derived from this one to describe
       your format's details. See the section WRITING DERIVATE MODULES below
       for a description of the process.

OPTIONS ACCEPTED BY THIS MODULE

       The global debug option causes this module to show the excluded
       strings, so that you can check whether it skips something important.

       These are this module's particular options:

       nostrip
           Prevents the module from stripping the spaces around the
           extracted strings.

       wrap
           Canonicalizes the string to translate, considering that whitespace
           is not significant, and wraps the translated document. This option
           can be overridden by custom tag options. See the "tags" option
           below.

       caseinsensitive
           Makes the search for tags and attributes case-insensitive. If it
           is defined, <BooK>laNG and <BOOK>Lang are treated as <book>lang.

       includeexternal
           When defined, external entities are included in the generated
           (translated) document and in the extraction of strings. If it is
           not defined, you will have to translate external entities
           separately as independent documents.

       ontagerror
           This option defines the behavior of the module when it encounters
           invalid XML syntax (a closing tag which does not match the last
           opening tag, or a tag's attribute without value). It can take the
           following values:

           fail
               This is the default value. The module will exit with an error.

           warn
               The module will continue, and will issue a warning.

           silent
               The module will continue without any warnings.

           Be careful when using this option. It is generally recommended to
           fix the input file.

       tagsonly
           Extracts only the tags specified in the "tags" option. Otherwise,
           it will extract all the tags except the ones specified.

           Note: This option is deprecated.

       doctype
           String that the module tries to match against the first line of
           the document's doctype (if defined). If it does not match, a
           warning indicates that the document might be of the wrong type.

       addlang
           String indicating the path (e.g. <bbb><aaa>) of a tag where a
           lang="..." attribute shall be added. The language will be defined
           as the basename of the PO file without any .po extension.

       tags
           Space-separated list of tags you want to translate or skip. By
           default, the specified tags will be excluded, but if you use the
           "tagsonly" option, the specified tags will be the only ones
           included. The tags must be in the form <aaa>, but you can join
           some (<bbb><aaa>) to say that the content of the tag <aaa> will
           only be translated when it is inside a <bbb> tag.

           You can also specify some tag options by putting some characters
           in front of the tag hierarchy. For example, you can put 'w' (wrap)
           or 'W' (don't wrap) to override the default behavior specified by
           the global "wrap" option.

           Example: W<chapter><title>

           Note: This option is deprecated. You should use the translated and
           untranslated options instead.

       attributes
           Space-separated list of tag attributes you want to translate. You
           can specify the attributes by their name (for example, "lang"),
           but you can prefix it with a tag hierarchy to specify that this
           attribute will only be translated when it is in the specified tag.
           For example: <bbb><aaa>lang specifies that the lang attribute will
           only be translated if it is in an <aaa> tag, and that tag is
           inside a <bbb> tag.

       foldattributes
           Do not translate attributes in inline tags. Instead, replace all
           attributes of a tag by po4a-id=<id>.

           This is useful when attributes shall not be translated, as this
           simplifies the strings for translators, and avoids typos.

       customtag
           Space-separated list of tags which should not be treated as tags.
           These tags are treated as inline, and do not need to be closed.

       break
           Space-separated list of tags which should break the sequence. By
           default, all tags break the sequence.

           The tags must be in the form <aaa>, but you can join some
           (<bbb><aaa>), if a tag (<aaa>) should only be considered when it
           is inside another tag (<bbb>).

       inline
           Space-separated list of tags which should be treated as inline. By
           default, all tags break the sequence.

           The tags must be in the form <aaa>, but you can join some
           (<bbb><aaa>), if a tag (<aaa>) should only be considered when it
           is inside another tag (<bbb>).

       placeholder
           Space-separated list of tags which should be treated as
           placeholders. Placeholders do not break the sequence, but the
           content of placeholders is translated separately.

           The location of the placeholder in its block will be marked with a
           string similar to:

             <placeholder type=\"footnote\" id=\"0\"/>

           The tags must be in the form <aaa>, but you can join some
           (<bbb><aaa>), if a tag (<aaa>) should only be considered when it
           is inside another tag (<bbb>).

       nodefault
           Space-separated list of tags that the module should not try to set
           by default in any category.

       cpp Support C preprocessor directives. When this option is set, po4a
           will consider preprocessor directives as paragraph separators.
           This is important if the XML file must be preprocessed, because
           otherwise the directives may be inserted in the middle of lines if
           po4a considers that they belong to the current paragraph, and they
           won't be recognized by the preprocessor. Note: the preprocessor
           directives must only appear between tags (they must not break a
           tag).

       translated
           Space-separated list of tags you want to translate.

           The tags must be in the form <aaa>, but you can join some
           (<bbb><aaa>), if a tag (<aaa>) should only be considered when it
           is inside another tag (<bbb>).

           You can also specify some tag options by putting some characters
           in front of the tag hierarchy. For example, you can put 'w' (wrap)
           or 'W' (don't wrap) to override the default behavior specified by
           the global "wrap" option.

           Example: W<chapter><title>

       untranslated
           Space-separated list of tags you do not want to translate.

           The tags must be in the form <aaa>, but you can join some
           (<bbb><aaa>), if a tag (<aaa>) should only be considered when it
           is inside another tag (<bbb>).

       defaulttranslateoption
           The default categories for tags that are not in any of the
           translated, untranslated, break, inline, or placeholder lists.

           This is a set of letters:

           w   Tags should be translated and content can be re-wrapped.

           W   Tags should be translated and content should not be re-wrapped.

           i   Tags should be translated inline.

           p   Tags should be translated as placeholders.
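
       The rule that addlang uses to name the language can be pictured with
       a few lines of Perl. This is only a sketch: lang_from_po is a
       hypothetical helper, not part of the module, and it assumes a plain
       "basename minus .po" rule as described above.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper (not part of Locale::Po4a::Xml): derive the value
# of the lang attribute from the path of a PO file.
sub lang_from_po {
    my ($po_path) = @_;
    # basename() drops the directory part and the given suffix.
    return basename($po_path, '.po');
}

print lang_from_po('po/fr.po'), "\n";       # fr
print lang_from_po('l10n/pt_BR.po'), "\n";  # pt_BR
```

       With such a rule, translating with po/fr.po would add lang="fr" to
       the tag selected by addlang.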

WRITING DERIVATE MODULES

   DEFINE WHAT TAGS AND ATTRIBUTES TO TRANSLATE
       The simplest customization is to define which tags and attributes you
       want the parser to translate. This should be done in the initialize
       function. First you should call the main initialize, to get the
       command-line options, and then append your custom definitions to the
       options hash. If you want to handle some new command-line options,
       you should define them before calling the main initialize:

         sub initialize {
           my ($self, %options) = @_;
           # Declare new options before calling the main initialize.
           $self->{options}{'new_option'} = '';
           $self->SUPER::initialize(%options);
           # Append this module's definitions to the options hash.
           $self->{options}{'_default_translated'} .= ' <p> <head><title>';
           $self->{options}{'attributes'} .= ' <p>lang id';
           $self->{options}{'_default_inline'} .= ' <br>';
           $self->treat_options;
         }

       You should use the _default_inline, _default_break,
       _default_placeholder, _default_translated, _default_untranslated, and
       _default_attributes options in derived modules. This allows users to
       override the default behavior defined in your module with command-line
       options.

   OVERRIDING THE found_string FUNCTION
       Another simple step is to override the function "found_string", which
       receives the extracted strings from the parser, in order to translate
       them. There you can control which strings you want to translate, and
       perform transformations on them before or after the translation
       itself.

       It receives the extracted text, the reference to where it was found,
       and a hash that contains extra information to control which strings to
       translate, how to translate them, and how to generate the comment.

       The content of these options depends on the kind of string it is
       (specified in the "type" entry of this hash):

       type="tag"
           The found string is the content of a translatable tag. The entry
           "tag_options" contains the option characters in front of the tag
           hierarchy in the module "tags" option.

       type="attribute"
           The found string is the value of a translatable attribute. The
           entry "attribute" has the name of the attribute.

       It must return the text that will replace the original in the
       translated document. Here's a basic example of this function:

         sub found_string {
           my ($self, $text, $ref, $options) = @_;
           # Translate the string; the comment records the kind of string.
           $text = $self->translate($text, $ref, "type " . $options->{'type'},
             'wrap' => $self->{options}{'wrap'});
           # The returned text replaces the original in the output document.
           return $text;
         }

       There's another simple example in the new Dia module, which only
       filters some strings.

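       As an illustration of filtering inside found_string, here is a
       minimal sketch that leaves blank strings and bare numbers
       untranslated. It is not taken from any existing module. So that it
       runs on its own, it inherits from Stub::Xml, a hypothetical stand-in
       whose translate() merely upper-cases the text; a real module would
       inherit from Locale::Po4a::Xml instead. My::Filtered is likewise an
       invented name.

```perl
use strict;
use warnings;

# Stand-in for Locale::Po4a::Xml so that the sketch is self-contained.
# Its translate() only upper-cases the text; the real method looks the
# string up in the PO file.
package Stub::Xml;
sub new { return bless { options => { wrap => 0 } }, shift }
sub translate {
    my ($self, $text, $ref, $comment, %opts) = @_;
    return uc $text;
}

# Hypothetical derived module.
package My::Filtered;
our @ISA = ('Stub::Xml');

sub found_string {
    my ($self, $text, $ref, $options) = @_;
    # Return blank strings and bare numbers unchanged.
    return $text if $text =~ /^\s*$/ || $text =~ /^\s*[\d.]+\s*$/;
    # Everything else goes through the normal translation.
    return $self->translate($text, $ref, "type " . $options->{'type'},
        'wrap' => $self->{options}{'wrap'});
}

package main;

my $parser = My::Filtered->new;
print $parser->found_string('Hello', 'doc.xml:3', { type => 'tag' }), "\n";
print $parser->found_string('42',    'doc.xml:4', { type => 'tag' }), "\n";
```

       The first call goes through translate(), while the second returns its
       input untouched, so numbers never reach the PO file.
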
   MODIFYING TAG TYPES (TODO)
       This is a more complex one, but it enables (almost) total
       customization. It is based on a list of hashes, each one defining a
       tag type's behavior. The list should be sorted so that the most
       general tags come after the most specific ones (sorted first by the
       beginning and then by the end keys). To define a tag type you have to
       make a hash with the following keys:

       beginning
           Specifies the beginning of the tag, after the "<".

       end Specifies the end of the tag, before the ">".

       breaking
           Says whether this is a breaking tag class. A non-breaking (inline)
           tag is one that can be taken as part of the content of another tag.
           It can take the values false (0), true (1) or undefined. If you
           leave this undefined, you have to define the f_breaking function,
           which says whether a specific tag of this class is a breaking tag
           or not.

       f_breaking
           A function that tells whether the next tag is a breaking one or
           not. It should be defined if the breaking option is not.

       f_extract
           If you leave this key undefined, the generic extraction function
           will have to extract the tag itself. It is useful for tags that can
           have other tags or special structures in them, so that the main
           parser does not get confused. This function receives a boolean that
           says whether the tag should be removed from the input stream or
           not.

       f_translate
           This function receives the tag (in the get_string_until() format)
           and returns the translated tag (translated attributes or all needed
           transformations) as a single string.
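
       The ordering rule above (specific types first, general ones last) can
       be sketched as follows. The entries and the find_tag_type helper are
       assumptions made for illustration, not the module's real table or
       lookup code; only the key names come from the description above.

```perl
use strict;
use warnings;

# Illustrative list only. Specific delimiters come first, and the
# catch-all entry with empty delimiters comes last.
my @tag_types = (
    { beginning => '!--', end => '--', breaking => 0 },  # comments
    { beginning => '?',   end => '?',  breaking => 1 },  # processing instructions
    { beginning => '/',   end => '',   breaking => 1 },  # closing tags
    { beginning => '',    end => '/',  breaking => 1 },  # empty-element tags
    { beginning => '',    end => '',   breaking => 1 },  # any other tag
);

# Hypothetical lookup: index of the first type whose delimiters fit
# $tag, or -1 when none fits.
sub find_tag_type {
    my ($tag) = @_;
    for my $i (0 .. $#tag_types) {
        my $begin = quotemeta $tag_types[$i]{beginning};
        my $end   = quotemeta $tag_types[$i]{end};
        return $i if $tag =~ /^<$begin.*$end>$/s;
    }
    return -1;
}

print find_tag_type('<!-- note -->'), "\n";  # 0
print find_tag_type('<br/>'), "\n";          # 3
print find_tag_type('<p>'), "\n";            # 4
```

       If the catch-all entry were placed first, every tag would be
       classified as a generic one, which is why the sort order matters.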

INTERNAL FUNCTIONS used to write derived parsers

   WORKING WITH TAGS
       get_path()
           This function returns the path to the current tag from the
           document's root, in the form <html><body><p>.

           An additional array of tags (without brackets) can be passed as
           an argument. These path elements are added to the end of the
           current path.

       tag_type()
           This function returns the index in the tag_types list of the entry
           that fits the next tag in the input stream, or -1 if it is at the
           end of the input file.

       extract_tag($$)
           This function returns the next tag from the input stream without
           the beginning and end, in array form, to maintain the references
           from the input file. It has two parameters: the type of the tag
           (as returned by tag_type) and a boolean that indicates whether it
           should be removed from the input stream.

       get_tag_name(@)
           This function returns the name of the tag passed as an argument, in
           the array form returned by extract_tag.

       breaking_tag()
           This function returns a boolean that says whether the next tag in
           the input stream is a breaking tag or not (inline tag). It leaves
           the input stream intact.

       treat_tag()
           This function translates the next tag from the input stream, using
           each tag type's custom translation functions.

       tag_in_list($@)
           This function returns a string value that says whether the first
           argument (a tag hierarchy) matches any of the tags from the second
           argument (a list of tags or tag hierarchies). If it does not
           match, it returns 0. Otherwise, it returns the matched tag's
           options (the characters in front of the tag) or 1 (if that tag
           does not have options).

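       The path notation returned by get_path() and the matching rule of
       tag_in_list() can be pictured with two stand-alone functions. Both
       build_path and match_tag_path are hypothetical names, and the sketch
       assumes that a list entry matches when the current path ends with the
       entry's tag hierarchy; the module's own code may differ.

```perl
use strict;
use warnings;

# Build a path such as <html><body><p> from the names of the open tags
# (outermost first), optionally followed by extra path elements.
sub build_path {
    my ($stack, @extra) = @_;
    return join '', map "<$_>", @$stack, @extra;
}

# Return 0 when $path matches no entry of @list, otherwise the entry's
# option characters, or 1 when the entry has none.
sub match_tag_path {
    my ($path, @list) = @_;
    for my $entry (@list) {
        # Split e.g. "W<chapter><title>" into "W" and "<chapter><title>".
        my ($opts, $hier) = $entry =~ /^([^<]*)(<.*)$/ or next;
        next if length($hier) > length($path);
        # An entry matches when the path ends with its hierarchy.
        if (substr($path, -length($hier)) eq $hier) {
            return length($opts) ? $opts : 1;
        }
    }
    return 0;
}

my @list = ('W<chapter><title>', '<para>');
print match_tag_path(build_path([qw(book chapter)], 'title'), @list), "\n"; # W
print match_tag_path(build_path([qw(book para)]), @list), "\n";             # 1
print match_tag_path(build_path([qw(book subtitle)]), @list), "\n";         # 0
```

       Because every hierarchy starts with "<", the comparison always lines
       up on a tag boundary, so <subtitle> is not mistaken for <title>.
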
   WORKING WITH ATTRIBUTES
       treat_attributes(@)
           This function handles the translation of the tags' attributes. It
           receives the tag without the beginning / end marks, finds the
           attributes, and translates the translatable ones (specified by the
           module option "attributes"). It returns a plain string with the
           translated tag.

   WORKING WITH THE MODULE OPTIONS
       treat_options()
           This function fills the internal structures that contain the tags,
           attributes and inline data with the options of the module
           (specified on the command line or in the initialize function).

   GETTING TEXT FROM THE INPUT DOCUMENT
       get_string_until($%)
           This function returns an array with the lines (and references) from
           the input document until it finds the first argument. The second
           argument is an options hash. Value 0 means disabled (the default)
           and 1, enabled.

           The valid options are:

           include
               This makes the returned array contain the searched text

           remove
               This removes the returned stream from the input

           unquoted
               This ensures that the searched text is outside any quotes

       skip_spaces(\@)
           This function receives as argument the reference to a paragraph (in
           the format returned by get_string_until), skips its leading spaces
           and returns them as a simple string.

       join_lines(@)
           This function returns a simple string with the text from the
           argument array (discarding the references).
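
       What join_lines() does can be sketched as below. The sketch assumes
       that the array alternates text chunks and references, i.e. (text,
       reference, text, reference, ...); join_text is a hypothetical
       stand-alone function, not the module's own method.

```perl
use strict;
use warnings;

# Concatenate the text chunks of a paragraph array and drop the
# references. Assumed layout: (text, reference, text, reference, ...).
sub join_text {
    my @lines = @_;
    my $text  = '';
    while (@lines) {
        # Take one (text, reference) pair and keep only the text.
        my ($chunk, $ref) = splice @lines, 0, 2;
        $text .= $chunk;
    }
    return $text;
}

my @para = ("<p>Hello\n", 'doc.xml:3', "world</p>\n", 'doc.xml:4');
print join_text(@para);
```

       Keeping the references in the array until this last step is what
       lets the parser report where each string came from.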

STATUS OF THIS MODULE

       This module can translate tags and attributes.

TODO LIST

       DOCTYPE (ENTITIES)

       There is minimal support for the translation of entities. They are
       translated as a whole, and tags are not taken into account. Multiline
       entities are not supported, and entities are always rewrapped during
       translation.

       MODIFY TAG TYPES FROM INHERITED MODULES (move the tag_types structure
       inside the $self hash?)

SEE ALSO

       po4a(7), Locale::Po4a::TransTractor(3pm).

AUTHORS

        Jordi Vilalta <jvprat@gmail.com>
        Nicolas Francois <nicolas.francois@centraliens.net>

COPYRIGHT AND LICENSE

        Copyright (c) 2004 by Jordi Vilalta  <jvprat@gmail.com>
        Copyright (c) 2008-2009 by Nicolas Francois <nicolas.francois@centraliens.net>

       This program is free software; you may redistribute it and/or modify it
       under the terms of GPL (see the COPYING file).

perl v5.12.2                      2010-12-01              Locale::Po4a::Xml(3)