Locale::Po4a::Xml(3)  User Contributed Perl Documentation  Locale::Po4a::Xml(3)

NAME
       Locale::Po4a::Xml - Convert XML documents and derivatives from/to PO
       files

DESCRIPTION
       The goal of the po4a (PO for anything) project is to ease translation
       (and, more interestingly, the maintenance of translations) by using
       gettext tools in areas where they were not originally expected, such
       as documentation.

       Locale::Po4a::Xml is a module that helps translate XML documents into
       other [human] languages. It can also be used as a base for building
       modules for XML-based documents.

TRANSLATING WITH PO4A::XML
       This module can be used directly to handle generic XML documents. By
       default it extracts the content of every tag and no attributes, since
       that is where the text is written in most XML-based documents.

       Some options (described in the next section) customize this behavior.
       If that does not fit your document format, you are encouraged to
       write your own module derived from this one to describe the details
       of your format. See the section "Writing derivate modules" below for
       a description of the process.

OPTIONS ACCEPTED BY THIS MODULE
       The global debug option makes this module show the strings it
       excludes, so that you can check whether it skips something important.

       These are the options specific to this module:

       nostrip
           Prevents the module from stripping the whitespace around the
           extracted strings.

       wrap
           Canonicalizes the strings to translate, treating whitespace as
           insignificant, and rewraps the translated document. This option
           can be overridden by custom tag options. See the "tags" option
           below.

       caseinsensitive
           Makes the search for tags and attributes case-insensitive. If it
           is defined, <BooK>laNG and <BOOK>Lang are both treated as
           <book>lang.

       includeexternal
           When defined, external entities are included in the generated
           (translated) document and in the extraction of strings. If it is
           not defined, you have to translate external entities separately,
           as independent documents.

       ontagerror
           Defines the behavior of the module when it encounters an invalid
           closing tag (a closing tag that does not match the last opening
           tag). It can take the following values:

           fail
               This is the default value. The module exits with an error.

           warn
               The module continues and issues a warning.

           silent
               The module continues without any warning.

           Be careful when using this option. It is generally recommended to
           fix the input file instead.

       tagsonly
           Extracts only the tags specified in the "tags" option. Otherwise,
           all tags are extracted except the ones specified.

       doctype
           String that is matched against the first line of the document's
           doctype (if defined). If it does not match, the document is
           considered to be of a bad type.

       tags
           Space-separated list of the tags you want to translate or skip.
           By default, the specified tags are excluded, but if you use the
           "tagsonly" option, the specified tags are the only ones included.
           Tags must be written in the form <aaa>, but you can join several
           (<bbb><aaa>) to say that the content of the tag <aaa> is only
           translated when it is inside a <bbb> tag.

           You can also specify tag options by putting some characters in
           front of the tag hierarchy. For example, you can put 'w' (wrap)
           or 'W' (don't wrap) to override the default behavior specified by
           the global "wrap" option.

           Example: W<chapter><title>

       attributes
           Space-separated list of the tag attributes you want to translate.
           You can specify an attribute by its name (for example, "lang"),
           and you can prefix it with a tag hierarchy to specify that the
           attribute is only translated when it belongs to the specified
           tag. For example, <bbb><aaa>lang specifies that the lang
           attribute is only translated if it belongs to an <aaa> tag that
           is itself inside a <bbb> tag.

       inline
           Space-separated list of the tags you want to treat as inline. By
           default, all tags break the sequence. This follows the same
           syntax as the tags option.

       nodefault
           Space-separated list of tags that the module should not try to
           put by default in the "tags" or "inline" category.

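       The syntax of the "tags" and "attributes" entries described above can
       be illustrated with a small, self-contained sketch. It is not the
       module's own parser: the helper names split_tag_spec and
       split_attr_spec are hypothetical, and the regular expressions are
       assumptions based only on the syntax given above (optional option
       characters, then a tag hierarchy, then, for attributes, a name).

```perl
use strict;
use warnings;

# Hypothetical helper: split one entry of the "tags" option into its
# option characters (possibly empty) and its tag hierarchy.
sub split_tag_spec {
    my ($spec) = @_;
    # Option characters are whatever precedes the first "<".
    my ($opts, $hierarchy) = $spec =~ /^([^<]*)((?:<[^<>]+>)+)$/
        or die "not a tag specification: $spec\n";
    return ($opts, $hierarchy);
}

# Hypothetical helper: split one entry of the "attributes" option into
# its tag hierarchy (possibly empty) and the attribute name.
sub split_attr_spec {
    my ($spec) = @_;
    my ($hierarchy, $name) = $spec =~ /^((?:<[^<>]+>)*)([^<>\s]+)$/
        or die "not an attribute specification: $spec\n";
    return ($hierarchy, $name);
}

# Both option values are space-separated lists.
for my $spec (split ' ', 'W<chapter><title> <p>') {
    my ($opts, $hierarchy) = split_tag_spec($spec);
    print "tag hierarchy $hierarchy, options '$opts'\n";
}
for my $spec (split ' ', '<bbb><aaa>lang id') {
    my ($hierarchy, $name) = split_attr_spec($spec);
    print "attribute $name, hierarchy '$hierarchy'\n";
}
```

       Splitting each entry this way keeps the option characters separate
       from the hierarchy, which is the part that is compared with the path
       of the current tag.
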
WRITING DERIVATE MODULES
   DEFINE WHAT TAGS AND ATTRIBUTES TO TRANSLATE
       The simplest customization is to define which tags and attributes you
       want the parser to translate. This should be done in the initialize
       function. First call the main initialize to get the command-line
       options, and then append your custom definitions to the options hash.
       If you want to handle new command-line options, define them before
       calling the main initialize:

         # declare new options first, then let the parent read the options
         $self->{options}{'new_option'}='';
         $self->SUPER::initialize(%options);
         # append the definitions specific to this format
         $self->{options}{'tags'}.=' <p> <head><title>';
         $self->{options}{'attributes'}.=' <p>lang id';
         $self->{options}{'inline'}.=' <br>';
         $self->treat_options;

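       As a rough illustration of the call order described above, the
       following sketch places that code inside a complete initialize
       function. To keep it runnable without po4a installed, it inherits
       from a tiny stand-in class (Stub::Xml, hypothetical) whose methods
       only mimic the role of the real ones; a real module would inherit
       from Locale::Po4a::Xml instead. The package name My::Xhtml, the
       option new_option, and the stand-in's rejection of unknown options
       are all assumptions made for this example.

```perl
use strict;
use warnings;

# Stand-in for Locale::Po4a::Xml so that the sketch runs on its own.
package Stub::Xml;

sub new {
    my ($class, %options) = @_;
    my $self = bless {
        options => { tags => '', attributes => '', inline => '' },
    }, $class;
    $self->initialize(%options);
    return $self;
}

sub initialize {
    my ($self, %options) = @_;
    for my $name (keys %options) {
        # Assumption of this stand-in: options must be declared beforehand.
        die "Unknown option: $name\n"
            unless exists $self->{options}{$name};
        $self->{options}{$name} = $options{$name};
    }
}

sub treat_options {
    my ($self) = @_;
    # Split the space-separated lists into arrays for later lookups.
    $self->{$_} = [ split ' ', $self->{options}{$_} ]
        for qw(tags attributes inline);
}

# Hypothetical derived module.
package My::Xhtml;
our @ISA = ('Stub::Xml');

sub initialize {
    my ($self, %options) = @_;
    # 1. Declare the new option before the parent reads the options.
    $self->{options}{'new_option'} = '';
    # 2. Let the parent class process the options it was given.
    $self->SUPER::initialize(%options);
    # 3. Append the format-specific definitions.
    $self->{options}{'tags'}       .= ' <p> <head><title>';
    $self->{options}{'attributes'} .= ' <p>lang id';
    $self->{options}{'inline'}     .= ' <br>';
    # 4. Rebuild the internal structures from the final option values.
    $self->treat_options;
}

package main;

my $parser = My::Xhtml->new(new_option => 'yes', tags => '<li>');
print "tags: @{ $parser->{tags} }\n";
print "new_option: $parser->{options}{new_option}\n";
```

       Because the definitions are appended with ".=", values given by the
       user (here '<li>') are kept and the module defaults are added after
       them.
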
   OVERRIDING THE found_string FUNCTION
       Another simple step is to override the function "found_string", which
       receives the strings extracted by the parser in order to translate
       them. There you can control which strings are translated and
       transform them before or after the translation itself.

       It receives the extracted text, the reference to where it was found,
       and a hash with extra information that controls which strings to
       translate, how to translate them, and how to generate the comment.

       The content of this hash depends on the kind of string, which is
       given by its "type" entry:

       type="tag"
           The found string is the content of a translatable tag. The entry
           "tag_options" contains the option characters in front of the tag
           hierarchy in the module "tags" option.

       type="attribute"
           The found string is the value of a translatable attribute. The
           entry "attribute" has the name of the attribute.

       It must return the text that will replace the original in the
       translated document. Here is a basic example of this function:

         sub found_string {
           my ($self,$text,$ref,$options)=@_;
           $text = $self->translate($text,$ref,"type ".$options->{'type'},
             'wrap'=>$self->{options}{'wrap'});
           return $text;
         }

       There is another simple example in the new Dia module, which only
       filters some strings.

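       A found_string that filters strings before translating them might
       look like the following sketch. The translate method here is a
       stand-in that looks strings up in a small hash so that the example
       runs without po4a; the package names and the rule "only translate
       strings that contain a letter" are hypothetical choices made for
       illustration, not the behavior of any shipped module.

```perl
use strict;
use warnings;

package Stub::Parser;

# Stand-in for the inherited translate method: it looks the string up
# in a small in-memory catalog and ignores the other arguments.
my %catalog = ('Hello' => 'Bonjour', 'Title' => 'Titre');

sub new { my ($class) = @_; return bless { options => { wrap => 0 } }, $class }

sub translate {
    my ($self, $text, $ref, $comment, %opts) = @_;
    return exists $catalog{$text} ? $catalog{$text} : $text;
}

package My::Filter;
our @ISA = ('Stub::Parser');

sub found_string {
    my ($self, $text, $ref, $options) = @_;

    # Hypothetical filter: strings without any letter (numbers,
    # punctuation) are returned untouched and are never translated.
    return $text unless $text =~ /\p{L}/;

    # Build the comment from the kind of string that was found.
    my $comment = "type " . $options->{'type'};
    $comment .= " " . $options->{'attribute'}
        if $options->{'type'} eq 'attribute';

    return $self->translate($text, $ref, $comment,
                            'wrap' => $self->{options}{'wrap'});
}

package main;

my $filter = My::Filter->new;
print $filter->found_string('Hello', 'doc.xml:3', { type => 'tag' }), "\n";
print $filter->found_string('42',    'doc.xml:4', { type => 'tag' }), "\n";
print $filter->found_string('Title', 'doc.xml:5',
    { type => 'attribute', attribute => 'title' }), "\n";
```

       Returning the original text for filtered strings keeps the generated
       document complete while leaving those strings out of the translation
       step.
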
   MODIFYING TAG TYPES (TODO)
       This is a more complex approach, but it enables (almost) total
       customization. It is based on a list of hashes, each one defining the
       behavior of a tag type. The list should be sorted so that the most
       general tags come after the most specific ones (sorted first by the
       beginning and then by the end keys). To define a tag type, build a
       hash with the following keys:

       beginning
           Specifies the beginning of the tag, after the "<".

       end
           Specifies the end of the tag, before the ">".

       breaking
           Says whether this is a breaking tag class. A non-breaking
           (inline) tag is one that can be taken as part of the content of
           another tag. It can take the values false (0), true (1) or
           undefined. If you leave it undefined, you have to define the
           f_breaking function, which says whether a particular tag of this
           class is a breaking tag or not.

       f_breaking
           A function that tells whether the next tag is a breaking one or
           not. It should be defined if the "breaking" option is not.

       f_extract
           If you leave this key undefined, the generic extraction function
           will have to extract the tag itself. It is useful for tags that
           can contain other tags or special structures, so that the main
           parser does not get confused. This function receives a boolean
           that says whether the tag should be removed from the input stream
           or not.

       f_translate
           This function receives the tag (in the get_string_until() format)
           and returns the translated tag (with translated attributes or any
           other needed transformations) as a single string.

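       To make the structure more concrete, here is a minimal sketch of such
       a list, together with a lookup that picks the first entry matching
       the upcoming text. The entries, the find_tag_type helper and the
       first-match rule are assumptions chosen for illustration; they are
       not copied from the module, whose own list and matching may differ.

```perl
use strict;
use warnings;

# Hypothetical tag type list: the most specific beginnings come first
# and the catch-all entry (empty beginning and end) comes last.
my @tag_types = (
    { beginning => '!--', end => '--', breaking => 0 },  # comments
    { beginning => '?',   end => '?',  breaking => 1 },  # processing instructions
    { beginning => '/',   end => '',   breaking => 1 },  # closing tags
    { beginning => '',    end => '',   breaking => 1 },  # any other tag
);

# Hypothetical lookup: return the index of the first type whose
# beginning and end match the tag at the start of $input, or -1 if the
# input does not start with a tag.
sub find_tag_type {
    my ($input) = @_;
    for my $i (0 .. $#tag_types) {
        my ($begin, $end) = @{ $tag_types[$i] }{qw(beginning end)};
        return $i if $input =~ /^<\Q$begin\E.*?\Q$end\E>/s;
    }
    return -1;
}

for my $input ('<!-- a > b -->', '<?xml version="1.0"?>', '</p>',
               '<p class="x">', 'plain text') {
    printf "%-24s => type %d\n", $input, find_tag_type($input);
}
```

       With a first-match lookup like this one, the ordering matters: if the
       catch-all entry came first, it would also claim comments and closing
       tags, which is why the general entries have to be placed last.
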
INTERNAL FUNCTIONS used to write derivated parsers
   WORKING WITH TAGS
       get_path()
           Returns the path to the current tag from the document's root, in
           the form <html><body><p>.

       tag_type()
           Returns the index in the tag_types list of the entry that matches
           the next tag in the input stream, or -1 if it is at the end of
           the input file.

       extract_tag($$)
           Returns the next tag from the input stream without its beginning
           and end, in array form, so that the references to the input file
           are preserved. It takes two parameters: the type of the tag (as
           returned by tag_type) and a boolean that indicates whether the
           tag should be removed from the input stream.

       get_tag_name(@)
           Returns the name of the tag passed as an argument, in the array
           form returned by extract_tag.

       breaking_tag()
           Returns a boolean that says whether the next tag in the input
           stream is a breaking tag (as opposed to an inline tag). It leaves
           the input stream intact.

       treat_tag()
           Translates the next tag from the input stream, using the custom
           translation functions of each tag type.

       tag_in_list($@)
           Returns a string value that says whether the first argument (a
           tag hierarchy) matches any of the tags in the second argument (a
           list of tags or tag hierarchies). If it does not match, it
           returns 0. Otherwise, it returns the options of the matched tag
           (the characters in front of the tag) or 1 (if that tag has no
           options).

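       The return convention of tag_in_list can be sketched in a few lines.
       This is an illustration written for this page, not the module's code:
       the function name match_tag_list is hypothetical, and it assumes that
       a list entry matches when the path ends with the tag hierarchy of
       that entry.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the convention described above:
# returns 0 (no match), the option characters of the matching entry,
# or 1 when the matching entry has no option characters.
sub match_tag_list {
    my ($path, @list) = @_;
    for my $entry (@list) {
        # Separate the option characters from the tag hierarchy.
        my ($opts, $hierarchy) = $entry =~ /^([^<]*)(<.*)$/ or next;
        # Assumption: the entry matches if the path ends with it.
        next unless $path =~ /\Q$hierarchy\E$/;
        return length($opts) ? $opts : 1;
    }
    return 0;
}

my @tags = ('W<chapter><title>', '<para>');
print match_tag_list('<book><chapter><title>', @tags), "\n";  # entry with options
print match_tag_list('<book><chapter><para>',  @tags), "\n";  # entry without options
print match_tag_list('<book><title>',          @tags), "\n";  # no entry matches
```

       Since every successful match returns a true value, the result can be
       used directly in a condition and inspected for option characters only
       when they matter.
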
   WORKING WITH ATTRIBUTES
       treat_attributes(@)
           Handles the translation of the attributes of a tag. It receives
           the tag without its beginning and end marks, finds the
           attributes, and translates the translatable ones (specified by
           the module option "attributes"). It returns a plain string with
           the translated tag.

   WORKING WITH THE MODULE OPTIONS
       treat_options()
           Fills the internal structures that contain the tags, attributes
           and inline data with the options of the module (specified on the
           command line or in the initialize function).

   GETTING TEXT FROM THE INPUT DOCUMENT
       get_string_until($%)
           Returns an array with the lines (and references) from the input
           document up to the point where it finds the first argument. The
           second argument is an options hash. A value of 0 means disabled
           (the default) and 1 means enabled.

           The valid options are:

           include
               Makes the returned array contain the searched text

           remove
               Removes the returned stream from the input

           unquoted
               Ensures that the searched text is outside any quotes

       skip_spaces(\@)
           Receives a reference to a paragraph (in the format returned by
           get_string_until), skips its leading spaces and returns them as a
           simple string.

       join_lines(@)
           Returns a simple string with the text from the argument array
           (discarding the references).

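       The helpers above work on an array that carries both text and
       references. Assuming a flat layout in which each piece of text is
       followed by its reference (text, reference, text, reference, ...) and
       a "file:line" reference string, both of which are assumptions of this
       sketch rather than guarantees of the module, stand-alone equivalents
       of join_lines and skip_spaces could look like this. The names
       join_text and skip_leading_spaces are hypothetical.

```perl
use strict;
use warnings;

# Concatenate the text elements and drop the references.
sub join_text {
    my (@paragraph) = @_;
    my $text = '';
    while (@paragraph >= 2) {
        my ($line, $ref) = splice(@paragraph, 0, 2);
        $text .= $line;
    }
    return $text;
}

# Remove the leading whitespace from the paragraph (in place) and
# return it. Lines that become empty are dropped with their reference.
sub skip_leading_spaces {
    my ($paragraph) = @_;
    my $spaces = '';
    while (@$paragraph >= 2) {
        if ($paragraph->[0] =~ s/^(\s+)//) {
            $spaces .= $1;
        }
        last if length($paragraph->[0]);
        splice(@$paragraph, 0, 2);
    }
    return $spaces;
}

my @paragraph = (
    "  \n",          'doc.xml:1',
    "  <p>Hello\n",  'doc.xml:2',
    "world</p>\n",   'doc.xml:3',
);
my $spaces = skip_leading_spaces(\@paragraph);
print length($spaces), " whitespace characters skipped\n";
print join_text(@paragraph);
```

       Keeping the reference next to each line is what allows the origin of
       every extracted string to be reported, even after the text has been
       trimmed or joined.
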
STATUS OF THIS MODULE
       This module can translate tags and attributes.

       Support for entities and included files is in the TODO list.

       The writing of derivate modules is rather limited.

TODO
       DOCTYPE (ENTITIES)

       There is minimal support for the translation of entities. They are
       translated as a whole, and tags are not taken into account. Multiline
       entities are not supported, and entities are always rewrapped during
       the translation.

       INCLUDED FILES

       MODIFY TAG TYPES FROM INHERITED MODULES (move the tag_types structure
       inside the $self hash?)

       breaking tag inside non-breaking tag (possible?) causes ugly comments

SEE ALSO
       po4a(7), Locale::Po4a::TransTractor(3pm).

AUTHORS
       Jordi Vilalta <jvprat@gmail.com>

COPYRIGHT AND LICENSE
       Copyright (c) 2004 by Jordi Vilalta <jvprat@gmail.com>

       This program is free software; you may redistribute it and/or modify
       it under the terms of GPL (see the COPYING file).

perl v5.8.8                       2008-06-01              Locale::Po4a::Xml(3)