Locale::Po4a::Xml(3pm)

Locale::Po4a::Xml(3)  User Contributed Perl Documentation Locale::Po4a::Xml(3)

NAME

       Locale::Po4a::Xml - Convert XML documents and derivatives from/to PO
       files

DESCRIPTION

       The goal of the po4a (PO for anything) project is to ease translation
       (and, more interestingly, the maintenance of translations) by using
       gettext tools in areas where they were not expected, such as
       documentation.

       Locale::Po4a::Xml is a module that helps translate XML documents into
       other [human] languages. It can also be used as a base to build
       modules for XML-based documents.

TRANSLATING WITH PO4A::XML

       This module can be used directly to handle generic XML documents.  It
       extracts the content of all tags, and no attributes, since that is
       where the text is written in most XML-based documents.

       Some options (described in the next section) customize this behavior.
       If this does not fit your document format, you are encouraged to write
       your own module derived from this one to describe your format's
       details.  See the section "WRITING DERIVATE MODULES" below for a
       description of the process.

OPTIONS ACCEPTED BY THIS MODULE

       The global debug option causes this module to show the excluded
       strings, so that you can check whether it skips something important.

       These are this module's particular options:

       nostrip
           Prevents the module from stripping the spaces around the extracted
           strings.

       wrap
           Canonicalizes the string to translate, considering that whitespace
           is not significant, and wraps the translated document. This option
           can be overridden by custom tag options. See the "tags" option
           below.

       caseinsensitive
           Makes the search for tags and attributes case insensitive.  If it
           is defined, <BooK>laNG and <BOOK>Lang are treated as <book>lang.

       includeexternal
           When defined, external entities are included in the generated
           (translated) document, and in the extraction of strings.  If it is
           not defined, you will have to translate external entities
           separately as independent documents.

       ontagerror
           Defines the behavior of the module when it encounters an invalid
           closing tag (a closing tag that does not match the last opening
           tag).  It can take the following values:

           fail
               This is the default value.  The module will exit with an error.

           warn
               The module will continue, and will issue a warning.

           silent
               The module will continue without any warnings.

           Be careful when using this option.  It is generally recommended to
           fix the input file.

       tagsonly
           Extracts only the tags specified in the "tags" option.  Otherwise,
           it will extract all the tags except the ones specified.

       doctype
           String that the module tries to match against the first line of
           the document's doctype (if defined). If it does not match, the
           document is considered to be of a bad type.

       tags
           Space-separated list of the tags you want to translate or skip.  By
           default, the specified tags will be excluded, but if you use the
           "tagsonly" option, the specified tags will be the only ones
           included.  The tags must be in the form <aaa>, but you can join
           some (<bbb><aaa>) to say that the content of the tag <aaa> will
           only be translated when it is inside a <bbb> tag.

           You can also specify some tag options by putting some characters in
           front of the tag hierarchy. For example, you can put 'w' (wrap) or
           'W' (don't wrap) to override the default behavior specified by the
           global "wrap" option.

           Example: W<chapter><title>

       attributes
           Space-separated list of the tag attributes you want to translate.
           You can specify the attributes by their name (for example, "lang"),
           but you can prefix it with a tag hierarchy, to specify that this
           attribute will only be translated when it is in the specified tag.
           For example: <bbb><aaa>lang specifies that the lang attribute will
           only be translated if it is in an <aaa> tag that is itself inside
           a <bbb> tag.

       inline
           Space-separated list of the tags you want to treat as inline.  By
           default, all tags break the sequence.  This follows the same syntax
           as the "tags" option.

       nodefault
           Space-separated list of tags that the module should not try to set
           by default in the "tags" or "inline" category.

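       As an illustration of how the "tags" and "tagsonly" options combine,
       here is a minimal standalone sketch.  It does not use the module: the
       function name is_translatable is hypothetical, and the rule that a
       listed hierarchy matches the end of the current tag path is an
       assumption made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Locale::Po4a::Xml): decide whether the
# content found at $path should be translated, given the "tagsonly" flag
# and the entries of the "tags" option.
sub is_translatable {
    my ($path, $tagsonly, @tags) = @_;
    my $listed = grep {
        my $hierarchy = $_;
        $hierarchy =~ s/^[^<]*//;          # drop option characters such as 'W'
        $path =~ /\Q$hierarchy\E$/;        # assumed rule: match the path's end
    } @tags;
    # With tagsonly, listed tags are the only ones included;
    # without it, listed tags are the ones excluded.
    return $tagsonly ? ($listed ? 1 : 0) : ($listed ? 0 : 1);
}

for my $tagsonly (0, 1) {
    printf "tagsonly=%d <book><title>: %d\n", $tagsonly,
        is_translatable('<book><title>', $tagsonly, '<title>');
}
```

       The same list therefore acts as an exclusion list or an inclusion
       list depending only on the "tagsonly" flag.
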

WRITING DERIVATE MODULES

       DEFINE WHAT TAGS AND ATTRIBUTES TO TRANSLATE

       The simplest customization is to define which tags and attributes you
       want the parser to translate.  This should be done in the initialize
       function.  First call the main initialize to get the command-line
       options, and then append your custom definitions to the options hash.
       If you want to handle new command-line options, define them before
       calling the main initialize:

         sub initialize {
           my ($self, %options) = @_;
           # Declare new options before the parent reads the command line.
           $self->{options}{'new_option'} = '';
           $self->SUPER::initialize(%options);
           # Append the custom definitions to the inherited ones.
           $self->{options}{'tags'} .= ' <p> <head><title>';
           $self->{options}{'attributes'} .= ' <p>lang id';
           $self->{options}{'inline'} .= ' <br>';
           $self->treat_options;
         }

       OVERRIDING THE found_string FUNCTION

       Another simple step is to override the function "found_string", which
       receives the strings extracted by the parser in order to translate
       them.  There you can control which strings you want to translate, and
       transform them before or after the translation itself.

       It receives the extracted text, the reference to where it was found,
       and a hash that contains extra information to control which strings to
       translate, how to translate them, and how to generate the comment.

       The content of these options depends on the kind of string (specified
       in the "type" entry of this hash):

       type="tag"
           The found string is the content of a translatable tag. The entry
           "tag_options" contains the option characters in front of the tag
           hierarchy in the module's "tags" option.

       type="attribute"
           The found string is the value of a translatable attribute. The
           entry "attribute" holds the name of the attribute.

       It must return the text that will replace the original in the
       translated document. Here is a basic example of this function:

         sub found_string {
           my ($self, $text, $ref, $options) = @_;
           # Translate the text, tagging the PO entry with the string type.
           $text = $self->translate($text, $ref, "type ".$options->{'type'},
             'wrap' => $self->{options}{'wrap'});
           return $text;
         }

       There is another simple example in the new Dia module, which only
       filters some strings.

       MODIFYING TAG TYPES (TODO)

       This is a more complex customization, but it enables (almost) total
       control.  It is based on a list of hashes, each one defining the
       behavior of a tag type. The list should be sorted so that the most
       general tags come after the most specific ones (sorted first by the
       beginning and then by the end keys). To define a tag type you have to
       create a hash with the following keys:

       beginning
           Specifies the beginning of the tag, after the "<".

       end Specifies the end of the tag, before the ">".

       breaking
           Says whether this is a breaking tag class.  A non-breaking (inline)
           tag is one that can be taken as part of the content of another tag.
           It can take the values false (0), true (1) or undefined.  If you
           leave this undefined, you have to define the f_breaking function,
           which says whether a particular tag of this class is a breaking
           tag or not.

       f_breaking
           A function that tells whether the next tag is a breaking one or
           not.  It should be defined if the "breaking" option is not.

       f_extract
           If you leave this key undefined, the generic extraction function
           will have to extract the tag itself.  It is useful for tags that
           can have other tags or special structures in them, so that the main
           parser does not get confused.  This function receives a boolean
           that says whether the tag should be removed from the input stream.

       f_translate
           This function receives the tag (in the get_string_until() format)
           and returns the translated tag (translated attributes or all needed
           transformations) as a single string.

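       A tag type list of the shape described above might look like the
       following standalone sketch.  The entries, the helper names
       find_tag_type and is_breaking, and the argument passed to f_breaking
       are hypothetical; the module's own list and function signatures may
       differ.  For brevity the sketch matches on the beginning key only and
       ignores the end key.

```perl
use strict;
use warnings;

# Hypothetical tag type list: specific types come before general ones,
# so the catch-all entry (empty "beginning") is last.
my @tag_types = (
    { beginning => '!--',  end => '--', breaking => 0 },   # comments
    { beginning => '?xml', end => '?',  breaking => 1 },   # XML declaration
    { beginning => '/',    end => '',   breaking => undef, # closing tags
      f_breaking => sub { return 1 } },
    { beginning => '',     end => '',   breaking => undef, # any other tag
      f_breaking => sub { my ($name) = @_; return $name ne 'b' } },
);

# Return the index of the first type whose "beginning" follows the "<",
# or -1 when the text does not start with a tag.
sub find_tag_type {
    my ($text) = @_;
    return -1 unless $text =~ /^</;
    for my $i (0 .. $#tag_types) {
        my $begin = $tag_types[$i]{beginning};
        return $i if substr($text, 1, length $begin) eq $begin;
    }
    return -1;
}

# Use the fixed "breaking" value when defined, else ask f_breaking.
sub is_breaking {
    my ($type, $name) = @_;
    return $type->{breaking} if defined $type->{breaking};
    return $type->{f_breaking}->($name);
}

print find_tag_type($_), "\n" for '<!-- note -->', '</p>', '<p>';
```

       Because the catch-all entry matches every tag, placing it anywhere
       but last would hide the more specific entries that follow it.
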

INTERNAL FUNCTIONS used to write derived parsers

       WORKING WITH TAGS

       get_path()
           This function returns the path to the current tag from the
           document's root, in the form <html><body><p>.

       tag_type()
           This function returns the index in the tag_types list of the type
           that fits the next tag in the input stream, or -1 if it is at the
           end of the input file.

       extract_tag($$)
           This function returns the next tag from the input stream without
           the beginning and end, in array form, to maintain the references
           from the input file.  It has two parameters: the type of the tag
           (as returned by tag_type) and a boolean that indicates whether it
           should be removed from the input stream.

       get_tag_name(@)
           This function returns the name of the tag passed as an argument
           (in the array form returned by extract_tag).

       breaking_tag()
           This function returns a boolean that says whether the next tag in
           the input stream is a breaking tag (as opposed to an inline tag).
           It leaves the input stream intact.

       treat_tag()
           This function translates the next tag from the input stream, using
           each tag type's custom translation functions.

       tag_in_list($@)
           This function returns a string value that says whether the first
           argument (a tag hierarchy) matches any of the tags from the second
           argument (a list of tags or tag hierarchies). If it does not match,
           it returns 0. Otherwise, it returns the matched tag's options (the
           characters in front of the tag) or 1 (if that tag has no options).

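       The return convention of tag_in_list can be illustrated with a
       standalone sketch.  The function name match_tag_path is hypothetical
       and the code does not call the module; it assumes that a list entry
       matches when its hierarchy is found at the end of the tag path.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the convention described above: returns 0 when
# nothing matches, the option characters of the matching entry when it has
# some, and 1 otherwise.
sub match_tag_path {
    my ($path, @specs) = @_;
    for my $spec (@specs) {
        # Split optional option characters from the tag hierarchy.
        my ($opts, $hierarchy) =
            $spec =~ /^([^<]*)(<.*)$/ ? ($1, $2) : ('', $spec);
        # Assumed rule: the hierarchy must match the end of the path.
        if ($path =~ /\Q$hierarchy\E$/) {
            return $opts ne '' ? $opts : 1;
        }
    }
    return 0;
}

print match_tag_path('<book><chapter><title>', 'W<chapter><title>'), "\n";
print match_tag_path('<html><body><p>', '<title>', '<p>'), "\n";
```

       Since both the option characters and 1 are true values in Perl, a
       caller can test the result as a boolean and still read the options
       from it when they are present.
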
       WORKING WITH ATTRIBUTES

       treat_attributes(@)
           This function handles the translation of the tags' attributes. It
           receives the tag without the beginning and end marks, finds the
           attributes, and translates the translatable ones (specified by the
           module option "attributes").  It returns a plain string with the
           translated tag.

       WORKING WITH THE MODULE OPTIONS

       treat_options()
           This function fills the internal structures that contain the tags,
           attributes and inline data with the options of the module
           (specified on the command line or in the initialize function).

       GETTING TEXT FROM THE INPUT DOCUMENT

       get_string_until($%)
           This function returns an array with the lines (and references) from
           the input document until it finds the first argument.  The second
           argument is an options hash. Value 0 means disabled (the default)
           and 1 means enabled.

           The valid options are:

           include
               Makes the returned array contain the searched text.

           remove
               Removes the returned stream from the input.

           unquoted
               Ensures that the searched text is outside any quotes.

       skip_spaces(\@)
           This function receives as its argument a reference to a paragraph
           (in the format returned by get_string_until), skips its leading
           spaces and returns them as a simple string.

       join_lines(@)
           This function returns a simple string with the text from the
           argument array (discarding the references).

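       The array format shared by these functions can be pictured with a
       standalone sketch.  It assumes that text and reference entries
       alternate in the array (text, reference, text, reference, ...), which
       is not stated above; the function names join_text and
       skip_leading_spaces and the reference strings are hypothetical and do
       not call the module.

```perl
use strict;
use warnings;

# Concatenate the text entries, discarding the references
# (assumed layout: text at even indexes, reference at odd indexes).
sub join_text {
    my @lines = @_;
    my $text = '';
    for (my $i = 0; $i < @lines; $i += 2) {
        $text .= $lines[$i];
    }
    return $text;
}

# Remove leading whitespace from the paragraph in place and return it.
sub skip_leading_spaces {
    my ($para) = @_;
    my $skipped = '';
    while (@$para) {
        if ($para->[0] =~ s/^(\s+)//) {
            $skipped .= $1;
        }
        last if $para->[0] ne '';
        splice @$para, 0, 2;    # drop the emptied text and its reference
    }
    return $skipped;
}

my @paragraph = ("  Hello\n", 'doc.xml:3', "world\n", 'doc.xml:4');
my $spaces = skip_leading_spaces(\@paragraph);
print join_text(@paragraph);
```

       Keeping the reference next to each piece of text is what lets the
       original position be reported for every extracted string.
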

STATUS OF THIS MODULE

       This module can translate tags and attributes.

       Support for entities and included files is in the TODO list.

       The writing of derivate modules is rather limited.

TODO LIST

       DOCTYPE (ENTITIES)

       There is minimal support for the translation of entities. They are
       translated as a whole, and tags are not taken into account. Multiline
       entities are not supported, and entities are always rewrapped during
       translation.

       INCLUDED FILES

       MODIFY TAG TYPES FROM INHERITED MODULES (move the tag_types structure
       inside the $self hash?)

       breaking tag inside non-breaking tag (possible?) causes ugly comments

AUTHORS

        Jordi Vilalta <jvprat@gmail.com>

COPYRIGHT AND LICENSE

       Copyright (c) 2004 by Jordi Vilalta  <jvprat@gmail.com>

       This program is free software; you may redistribute it and/or modify it
       under the terms of GPL (see the COPYING file).



perl v5.8.8                       2008-06-01              Locale::Po4a::Xml(3)