PO4A(7) Po4a Tools PO4A(7)

NAME
po4a - framework to translate documentation and other materials

DESCRIPTION
The goal of the po4a (PO for anything) project is to ease translations
(and, more interestingly, the maintenance of translations) by using
gettext tools in areas where they were not expected, such as
documentation.

Table of contents
This document is organized as follows:

1 Why should I use po4a? What is it good for?
This introductory chapter explains the motivation of the project and
its philosophy. You should read it first if you are in the process
of evaluating po4a for your own translations.

2 How to use po4a?
This chapter is a sort of reference manual, trying to answer the
users' questions and to give you a better understanding of the
whole process. It introduces how to do things with po4a and serves
as an introduction to the documentation of the specific tools.

HOWTO begin a new translation?
HOWTO change the translation back to a documentation file?
HOWTO update a po4a translation?
HOWTO convert a pre-existing translation to po4a?
HOWTO add extra text to translations (like translator's name)?
HOWTO do all this in one program invocation?
HOWTO customize po4a?

3 How does it work?
This chapter gives you a brief overview of the po4a internals, so
that you may feel more confident to help us maintain and improve
it. It may also help you understand why it does not do what you
expected, and how to solve your problems.

4 FAQ
This chapter groups the Frequently Asked Questions. In fact, most
of the questions for now could be formulated this way: "Why is it
designed this way, and not that one?" If you think po4a isn't the
right answer to documentation translation, you should consider
reading this section. If it does not answer your question, please
contact us on the <devel@lists.po4a.org> mailing list. We love
feedback.

5 Specific notes about modules
This chapter presents the specificities of each module from the
translator's and the original author's points of view. Read this to
learn the syntax you will encounter when translating stuff in this
module, or the rules you should follow in your original document to
make translators' lives easier.

Actually, this section is not really part of this document.
Instead, it is placed in each module's documentation. This helps
ensure that the information is up to date by keeping the
documentation and the code together.

Why should I use po4a? What is it good for?

I like the idea of open-source software, making it possible for
everybody to access software and its source code. But being French, I'm
well aware that licensing is not the only restriction on the openness
of software: untranslated free software is useless for non-English
speakers, and we still have some work to do to make it available to
really everybody out there.

The perception of this situation by the open-source actors has
dramatically improved recently. We, as translators, won the first
battle and convinced everybody of the importance of translations. But
unfortunately, that was the easy part. Now, we have to do the job and
actually translate all this stuff.

Actually, open-source programs themselves benefit from a rather decent
level of translation, thanks to the wonderful gettext tool suite. It is
able to extract the strings to translate from the program, present a
uniform format to translators, and then use the result of their work
at run time to display translated messages to the user.

But the situation is rather different when it comes to documentation.
Too often, the translated documentation is not visible enough (not
distributed as a part of the program), only partial, or not up to date.
This last situation is by far the worst. An outdated translation can
turn out to be worse for users than no translation at all, because it
describes old program behavior that no longer applies.

The problem to solve
Translating documentation is not very difficult in itself. Texts are
far longer than program messages and thus take longer to translate,
but no real technical skill is needed to do so. The difficult part
comes when you have to maintain your work. Detecting which parts
changed and need to be updated is very difficult, error-prone and
highly unpleasant. I guess that this explains why so much of the
translated documentation out there is outdated.

The po4a answers
So, the whole point of po4a is to make documentation translation
maintainable. The idea is to apply the gettext methodology to this new
field. As in gettext, texts are extracted from their original
locations in order to be presented in a uniform format to the
translators. The classical gettext tools help them update their work
when a new release of the original comes out. But unlike the classical
gettext model, the translations are then re-injected into the
structure of the original document so that they can be processed and
distributed just like the English version.

Thanks to this, discovering which parts of the document were changed
and need an update becomes very easy. Another good point is that the
tools will do almost all the work when the structure of the original
document gets fundamentally reorganized and when some chapters are
moved around, merged or split. By extracting the text to translate from
the document structure, po4a also keeps you away from the complexity of
text formatting and reduces your chances of getting a broken document
(even if it does not completely prevent you from doing so).

Please also see the FAQ below in this document for a more complete list
of the advantages and disadvantages of this approach.

Supported formats
Currently, this approach has been successfully implemented for several
text formatting formats:

man

The good old manual page format, used by so many programs out there.
The po4a support is very welcome here since this format is somewhat
difficult to use and not really friendly to newbies. The
Locale::Po4a::Man(3pm) module also supports the mdoc format, used by
the BSD man pages (they are also quite common on Linux).

pod

This is Perl's POD (Plain Old Documentation) format. The language and
extensions themselves are documented that way, as well as most of the
existing Perl scripts. It makes it easy to keep the documentation close
to the actual code by embedding them both in the same file. It makes
the programmer's life easier, but unfortunately, not the translator's.

sgml

Even if somewhat superseded by XML nowadays, this format is still used
rather often for documents which are more than a few screens long. It
allows you to write complete books. Updating the translation of such
long documents can turn out to be a real nightmare. diff often proves
useless when the original text was re-indented after an update.
Fortunately, po4a can help you in that process.

Currently, only the DebianDoc and DocBook DTDs are supported, but adding
support for a new one is really easy. It is even possible to use po4a on
an unknown SGML DTD without changing the code by providing the needed
information on the command line. See Locale::Po4a::Sgml(3pm) for
details.

TeX / LaTeX

The LaTeX format is a major documentation format used in the Free
Software world and for publications. The Locale::Po4a::LaTeX(3pm)
module was tested with the Python documentation, a book and some
presentations.

texinfo

All the GNU documentation is written in this format (that's even one of
the requirements to become an official GNU project). The support for
Locale::Po4a::Texinfo(3pm) in po4a is still in its early stages. Please
report bugs and feature requests.

xml

The XML format is a base format for many documentation formats.

Currently, the DocBook DTD is supported by po4a. See
Locale::Po4a::Docbook(3pm) for details.

others

Po4a can also handle some rarer or more specialized formats, such as the
documentation of compilation options for the 2.4+ Linux kernels or the
diagrams produced by the dia tool. Adding a new one is often very easy
and the main task is to come up with a parser for your target format.
See Locale::Po4a::TransTractor(3pm) for more information about this.

Unsupported formats
Unfortunately, po4a still lacks support for several documentation
formats.

There is a whole bunch of other formats we would like to support in
po4a, and not only documentation ones. Indeed, we aim at plugging all
"market holes" left by the classical gettext tools. This encompasses
package descriptions (deb and rpm), package installation script
questions, package changelogs, and all the specialized file formats
used by programs, such as game scenarios or wine resource files.

How to use po4a?

This chapter is a sort of reference manual, trying to answer the users'
questions and to give you a better understanding of the whole process.
It introduces how to do things with po4a and serves as an introduction
to the documentation of the specific tools.

Graphical overview
The following schema gives an overview of the process of translating
documentation using po4a. Do not be afraid of its apparent complexity;
it comes from the fact that the whole process is represented here. Once
you have converted your project to po4a, only the right part of the
graphic is relevant.

Note that master.doc is taken as an example for the documentation to be
translated and XX.doc is the corresponding translated text. The suffix
could be .pod, .xml, or .sgml depending on its format. Each part of the
picture will be detailed in the next sections.

                                  master.doc
                                       |
                                       V
     +<-----<----+<-----<-----<--------+------->-------->-------+
     :           |                     |                        :
{translation}    |         { update of master.doc }             :
     :           |                     |                        :
  XX.doc         |                     V                        V
(optional)       |                master.doc ->-------->------->+
     :           |                   (new)                      |
     V           V                     |                        |
  [po4a-gettextize] doc.XX.po -->+     |                        |
          |           (old)      |     |                        |
          |             ^        V     V                        |
          |             |    [po4a-updatepo]                    |
          V             |           |                           V
   translation.pot      ^           V                           |
          |             |       doc.XX.po                       |
          |             |        (fuzzy)                        |
   { translation }      |           |                           |
          |             ^           V                           V
          |             |   {manual editing}                    |
          |             |           |                           |
          V             |           V                           V
     doc.XX.po --->---->+<---<- doc.XX.po     addendum     master.doc
     (initial)                (up-to-date)   (optional)   (up-to-date)
          :                         |             |             |
          :                         V             |             |
          +----->----->----->-----> +             |             |
                                    |             |             |
                                    V             V             V
                                    +------>------+------<------+
                                                  |
                                                  V
                                          [po4a-translate]
                                                  |
                                                  V
                                               XX.doc
                                            (up-to-date)

On the left part, the conversion of a translation not using po4a to
this system is shown. At the top of the right part, the action of the
original author is depicted (updating the documentation). The middle
of the right part is where the automatic actions of po4a are depicted.
The new material is extracted and compared against the existing
translation. Parts which didn't change are found, and the previous
translation is used. Parts which were partially modified are also
connected to the previous translation, but with a specific marker
indicating that the translation must be updated. The bottom of the
figure shows how a formatted document is built.

Actually, as a translator, the only manual operation you have to do is
the part marked {manual editing}. Yeah, I'm sorry, but po4a helps you
translate. It does not translate anything for you…

HOWTO begin a new translation?
This section presents the steps required to begin a new translation
with po4a. The refinements involved in converting an existing project
to this system are detailed in the relevant section.

To begin a new translation using po4a, you have to do the following
steps:

- Extract the text which has to be translated from the original
<master.doc> document into a new translation template
<translation.pot> file (the gettext format). For that, use the
po4a-gettextize program this way:

$ po4a-gettextize -f <format> -m <master.doc> -p <translation.pot>

<format> is naturally the format used in the master.doc document. As
expected, the output goes into translation.pot. Please refer to
po4a-gettextize(1) for more details about the existing options.

- Actually translate what should be translated. For that, you have to
rename the POT file, for example to doc.XX.po (where XX is the ISO
639-1 code of the language you are translating to, e.g. fr for
French), and edit the resulting file. It is often a good idea not to
name the file XX.po, to avoid confusion with the translation of the
program messages, but this is your call. Don't forget to update the PO
file headers; they are important.

The actual translation can be done using the PO mode of Emacs or Vi,
Lokalize (KDE based), Gtranslator (GNOME based) or whichever program
you prefer to use (e.g. Virtaal).

If you wish to learn more about this, you definitely need to refer
to the gettext documentation, available in the gettext-doc package.
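The renaming step and the two most mechanical header fields can be scripted. The Python sketch below is only an illustration: the helper names fill_po_header and pot_to_po are hypothetical, and it assumes the template carries the usual gettext placeholders, an empty "Language:" field and "charset=CHARSET". Other header fields (Last-Translator, Language-Team, ...) are left for you to fill in; gettext's own msginit(1) does this job more thoroughly.

```python
from pathlib import Path


def fill_po_header(pot_text: str, lang: str, charset: str = "UTF-8") -> str:
    """Fill the empty Language field and the CHARSET placeholder of a template.

    Hypothetical helper: fields that are absent are simply left untouched.
    """
    # In a PO file, "\n" is written as a literal backslash followed by "n".
    text = pot_text.replace('"Language: \\n"', f'"Language: {lang}\\n"', 1)
    return text.replace("charset=CHARSET", f"charset={charset}", 1)


def pot_to_po(pot_path: Path, lang: str) -> Path:
    """Create doc.<lang>.po next to the template (a copy, not a rename)."""
    po_path = pot_path.with_name(f"doc.{lang}.po")
    pot_text = pot_path.read_text(encoding="utf-8")
    po_path.write_text(fill_po_header(pot_text, lang), encoding="utf-8")
    return po_path


if __name__ == "__main__":
    import sys

    # Usage: python pot_to_po.py translation.pot fr
    print(pot_to_po(Path(sys.argv[1]), sys.argv[2]))
```

Copying instead of renaming keeps translation.pot around; that is a choice of this sketch, not something po4a requires.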

HOWTO change the translation back to a documentation file?
Once you're done with the translation, you want to get the translated
documentation and distribute it to users along with the original one.
For that, use the po4a-translate(1) program like this (where XX is the
language code):

$ po4a-translate -f <format> -m <master.doc> -p <doc.XX.po> -l <XX.doc>

As before, <format> is the format used in the master.doc document. But
this time, the PO file provided with the -p flag is part of the input.
This is your translation. The output goes into XX.doc.

Please refer to po4a-translate(1) for more details.

HOWTO update a po4a translation?
To update your translation when the original master.doc file has
changed, use the po4a-updatepo(1) program like this:

$ po4a-updatepo -f <format> -m <new_master.doc> -p <old_doc.XX.po>

(Please refer to po4a-updatepo(1) for more details.)

Naturally, new paragraphs in the document won't get magically
translated in the PO file with this operation, and you'll need to
update the PO file manually. Likewise, you may have to rework the
translation of paragraphs which were modified a bit. To make sure you
won't miss any of them, they are marked as "fuzzy" during the process
and you have to remove this marker before the translation can be used
by po4a-translate. As for the initial translation, the best is to use
your favorite PO editor here.

Once your PO file is up to date again, without any untranslated or
fuzzy string left, you can generate a translated documentation file, as
explained in the previous section.
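Before running po4a-translate, it is handy to list what still needs attention. The Python sketch below is a deliberately simplified PO reader: it assumes entries are separated by blank lines and does not handle plural forms, msgctxt or obsolete #~ entries. The function names are hypothetical; for authoritative counts, gettext's msgfmt --statistics or msgattrib --only-fuzzy are the tools to reach for.

```python
import re
import sys
from pathlib import Path


def _join(pieces):
    """Concatenate the quoted pieces of a msgid/msgstr (escapes kept as-is)."""
    return "".join(piece.strip()[1:-1] for piece in pieces)


def po_entries(text):
    """Yield (flags, msgid, msgstr) for each blank-line-separated entry."""
    for block in re.split(r"\n\s*\n", text.strip()):
        flags, msgid, msgstr, current = set(), [], [], None
        for line in block.splitlines():
            if line.startswith("#,"):
                flags.update(flag.strip() for flag in line[2:].split(","))
            elif line.startswith("#"):
                continue  # translator comments, references, previous msgids
            elif line.startswith("msgid "):
                current = msgid
                current.append(line[len("msgid "):])
            elif line.startswith("msgstr "):
                current = msgstr
                current.append(line[len("msgstr "):])
            elif line.startswith('"') and current is not None:
                current.append(line)  # continuation of the previous string
        yield flags, _join(msgid), _join(msgstr)


def needs_work(text):
    """Return (fuzzy, untranslated) msgid lists, skipping the header entry."""
    fuzzy, untranslated = [], []
    for flags, msgid, msgstr in po_entries(text):
        if not msgid:
            continue  # the PO header has an empty msgid
        if "fuzzy" in flags:
            fuzzy.append(msgid)
        elif not msgstr:
            untranslated.append(msgid)
    return fuzzy, untranslated


if __name__ == "__main__":
    # Usage: python po_check.py doc.XX.po  (exit status 1 if work remains)
    fuzzy, untranslated = needs_work(Path(sys.argv[1]).read_text(encoding="utf-8"))
    print(f"{len(fuzzy)} fuzzy, {len(untranslated)} untranslated")
    sys.exit(1 if fuzzy or untranslated else 0)
```

A non-zero exit status makes the check easy to wire in front of po4a-translate in a build script.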

HOWTO convert a pre-existing translation to po4a?
Often, you were happily translating the document manually until a
major reorganization of the original master.doc document happened.
Then, after some unpleasant tries with diff or similar tools, you want
to convert to po4a. But of course, you don't want to lose your
existing translation in the process. Don't worry, this case is also
handled by po4a tools and is called gettextization.

The key here is to have the same structure in the translated document
and in the original one so that the tools can match the content
accordingly.
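The matching idea can be illustrated with a toy model. Assume each document has already been cut into (type, text) chunks, roughly what a po4a module does internally; the hypothetical pair_by_structure function below walks both lists in step and stops at the first chunk whose type differs, which is loosely the kind of discrepancy po4a-gettextize reports. It is a Python sketch of the principle, not po4a's Perl code, and the chunk type names are made up.

```python
from typing import List, Optional, Tuple

Chunk = Tuple[str, str]  # (chunk type, text), e.g. ("=head1", "NAME")


def pair_by_structure(original: List[Chunk],
                      translation: List[Chunk]) -> Tuple[List[Tuple[str, str]], Optional[str]]:
    """Pair original and translated chunks one to one.

    Returns (pairs, error): error is None when both structures match,
    otherwise a message about the first discrepancy found.
    """
    pairs = []
    for index, ((otype, otext), (ttype, ttext)) in enumerate(zip(original, translation)):
        if otype != ttype:
            return pairs, f"chunk {index}: type {otype!r} vs {ttype!r} ({otext!r} / {ttext!r})"
        # Nothing checks that ttext really translates otext, which is why
        # every entry produced by a gettextization is marked fuzzy.
        pairs.append((otext, ttext))
    if len(original) != len(translation):
        return pairs, f"chunk counts differ: {len(original)} vs {len(translation)}"
    return pairs, None


original = [("=head1", "NAME"), ("para", "dummy - a dummy program")]
french = [("=head1", "NOM"), ("para", "dummy - un programme factice")]
print(pair_by_structure(original, french))
```

In this model, a stray leading space that turns a POD paragraph into a verbatim block is exactly the kind of change that flips a chunk's type and stops the pairing.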

If you are lucky (i.e., if the structures of both documents perfectly
match), it will work seamlessly and you will be set in a few seconds.
Otherwise, you may understand why this process has such an ugly name,
and you'd better be prepared for some grunt work here. In any case,
remember that it is the price to pay to get the comfort of po4a
afterward. And the good point is that you have to do so only once.

I cannot emphasize this enough: to ease the process, it is important
that you find the exact version of the original which was used to do
the translation. The best situation is when you noted down the VCS
revision used for the translation and you didn't modify it in the
translation process, so that you can use it.

It won't work well when you use the updated original text with the old
translation. It remains possible, but is harder and really should be
avoided if possible. In fact, I guess that if you fail to find the
original text again, the best solution is to find someone to do the
gettextization for you (but, please, not me ;).

Maybe I'm too dramatic here. Even when things go wrong, it remains way
faster than translating everything again. I was able to gettextize the
existing French translation of the Perl documentation in one day, even
though things did go wrong. That was more than two megabytes of text,
and a new translation would have taken months or more.

Let me explain the basis of the procedure first; I will come back to
hints for achieving it when the process goes wrong. To ease
comprehension, let's use the above example once again.

Once you have the old master.doc again, which matches the translation
XX.doc, the gettextization can be done directly to the PO file
doc.XX.po without manual translation of the translation.pot file:

$ po4a-gettextize -f <format> -m <old_master.doc> -l <XX.doc> -p <doc.XX.po>

When you're lucky, that's it. You converted your old translation to
po4a and can begin with the updating task right away. Just follow the
procedure explained a few sections ago to synchronize your PO file with
the newest original document, and update the translation accordingly.

Please note that even when things seem to work properly, there is still
room for errors in this process. The point is that po4a is unable to
understand the text to make sure that the translation matches the
original. That's why all strings are marked as "fuzzy" in the process.
You should check each of them carefully before removing those markers.

Often the document structures don't match exactly, preventing
po4a-gettextize from doing its job properly. At that point, the whole
game is about editing the files to get their damn structures matching.

It may help to read the section Gettextization: how does it work?
below. Understanding the internal process will help you make this
work. The good point is that po4a-gettextize is rather verbose about
what went wrong when it happens. First, it pinpoints where in the
documents the structures' discrepancies are. You will learn the strings
that don't match, their positions in the text, and the type of each of
them. Moreover, the PO file generated so far will be dumped to
gettextization.failed.po.

- Remove all extra parts of the translations, such as the section in
which you give the translator name and thank everybody who
contributed to the translation. Addenda, which are described in the
next section, will allow you to re-add them afterward.

- Do not hesitate to edit both the original and the translation. The
most important thing is to get the PO file. You will be able to
update it afterward. That being said, editing the translation
should be preferred when both are possible since it makes things
easier when the gettextization is done.

- If needed, kill some parts of the original if they happen not to be
translated. When synchronizing the PO with the document afterward,
they will come back by themselves.

- If you changed the structure a bit (to merge two paragraphs, or
split another one), undo those changes. If there are issues in the
original, you should inform the original author. Fixing them in
your translation only fixes them for a part of the community. And
moreover, it's impossible when using po4a ;)

- Sometimes, the paragraph contents do match, but their types don't.
Fixing it is rather format-dependent. In POD and man, it often
comes from the fact that one of the two contains a line beginning
with a white space where the other doesn't. In those formats, such a
paragraph cannot be wrapped and thus becomes a different type. Just
remove the space and you are fine. It may also be a typo in the tag
name.

Likewise, two paragraphs may get merged together in POD when the
separating line contains some spaces, or when there is no empty
line between the =item line and the content of the item.

- Sometimes, there is a desynchronization between the files, and the
translation is attached to the wrong original paragraph. It is the
sign that the real problem was earlier in the files. Check
gettextization.failed.po to see where the desynchronization begins,
and fix it there.

- Sometimes, you get the strong feeling that po4a ate some parts of
the text, either the original or the translation.
gettextization.failed.po indicates that both of them were gently
matching, and then the gettextization fails because it tried to
match one paragraph with the one after (or before) the right one,
as if the right one disappeared. Curse po4a as I did when it first
happened to me. Generously.

This unfortunate situation happens when the same paragraph is
repeated over the document. In that case, no new entry is created
in the PO file, but a new reference is added to the existing one
instead.

So, when the same paragraph appears twice in the original but is
not translated in the exact same way each time, you will get the
feeling that a paragraph of the original disappeared. Just kill
the new translation. If you prefer to kill the first translation
instead when the second one was actually better, replace the first
one with the second.

Conversely, if two similar but different paragraphs were
translated in the exact same way, you will get the feeling that a
paragraph of the translation disappeared. A solution is to add a
stupid string to the original paragraph (such as "I'm different").
Don't be afraid, those things will disappear during the
synchronization, and when the added text is short enough, gettext
will match your translation to the existing text (marking it as
fuzzy, but you don't really care since all strings are fuzzy after
gettextization).
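The "disappearing paragraph" effect comes from the fact that a PO catalog is keyed by msgid. The Python toy model below (a hypothetical function on made-up data) pairs paragraphs by position and, like a PO file, keeps a single entry per distinct original text; a second, different translation of a repeated paragraph has nowhere to go and is reported as a conflict. It simplifies what po4a-gettextize actually does and is only meant to show the mechanism.

```python
def pair_into_catalog(original, translation):
    """Pair paragraphs by position into a msgid -> msgstr catalog.

    Identical original paragraphs share one entry, as in a PO file, so a
    second, different translation of the same text cannot be stored.
    Returns (catalog, conflicts), where conflicts lists the dropped pairs.
    """
    catalog, conflicts = {}, []
    for source, target in zip(original, translation):
        if source in catalog and catalog[source] != target:
            conflicts.append((source, target))  # this translation "vanishes"
        catalog.setdefault(source, target)
    return catalog, conflicts


original = ["Warning!", "Back up your data.", "Warning!"]
french = ["Attention !", "Sauvegardez vos données.", "Avertissement !"]
catalog, conflicts = pair_into_catalog(original, french)
print(catalog)
print(conflicts)
```

Making both translations of "Warning!" identical removes the conflict, which is what the tips above suggest doing by hand.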

Hopefully, those tips will help you make your gettextization work and
obtain your precious PO file. You are now ready to synchronize your
file and begin your translation. Please note that on large texts, the
first synchronization may take a long time.

For example, the first po4a-updatepo of the Perl documentation's French
translation (5.5 MB PO file) took about two full days on a 1 GHz G5
computer. Yes, 48 hours. But the subsequent ones only take a dozen
seconds on my old laptop. This is because the first time, most of the
msgids of the PO file don't match any of the POT file ones. This forces
gettext to search for the closest one using a costly string proximity
algorithm.
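To see where the time goes, here is a toy merge in Python. difflib.get_close_matches stands in for the similarity search done by gettext's merge step (msgmerge uses its own measure, so actual results would differ); the point is only that an exact match is a cheap dictionary lookup, while every msgid without one must be compared against all old msgids. The merge function and its cutoff value are assumptions made for illustration.

```python
import difflib


def merge(old_entries, new_msgids, cutoff=0.6):
    """Toy PO merge: return {msgid: (msgstr, is_fuzzy)} for the new template.

    old_entries maps old msgids to their translations.
    """
    result = {}
    old_ids = list(old_entries)
    for msgid in new_msgids:
        if msgid in old_entries:
            # Exact match: constant-time lookup, translation reused as-is.
            result[msgid] = (old_entries[msgid], False)
            continue
        # No exact match: compare against every old msgid to find the most
        # similar one. This is the costly part when most msgids changed.
        close = difflib.get_close_matches(msgid, old_ids, n=1, cutoff=cutoff)
        if close:
            result[msgid] = (old_entries[close[0]], True)  # reused, but fuzzy
        else:
            result[msgid] = ("", False)  # left untranslated
    return result


old = {"Hello world": "Bonjour le monde"}
print(merge(old, ["Hello world", "Hello world!", "Quux"]))
```

After a first gettextization nearly every msgid takes the slow branch; once the PO file is synchronized, most take the fast one.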

HOWTO add extra text to translations (like translator's name)?
Because of the gettext approach, doing this becomes more difficult in
po4a than it was when simply editing a new file alongside the original
one. But it remains possible, thanks to the so-called addenda.

It may help comprehension to consider addenda as a sort of patch
applied to the localized document after processing. They are rather
different from the usual patches (they have only one line of context,
which can embed a Perl regular expression, and they can only add new
text without removing any), but the functionality is the same.

Their goal is to allow the translator to add extra content to the
document which is not translated from the original document. The most
common usage is to add a section about the translation itself, listing
contributors and explaining how to report bugs against the translation.

An addendum must be provided as a separate file. The first line
constitutes a header indicating where in the produced document it
should be placed. The rest of the addendum file will be added verbatim
at the determined position of the resulting document.

The header line which specifies the context has a pretty rigid syntax:
it must begin with the string PO4A-HEADER:, followed by a semicolon (;)
separated list of key=value fields. White spaces ARE important. Note
that you cannot use the semicolon char (;) in the value, and that
quoting it doesn't help. Optionally, spaces ( ) may be inserted before
key for readability.
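The header syntax is strict enough to be parsed mechanically. The Python sketch below (a hypothetical helper, not part of po4a) applies the rules stated in this section: the PO4A-HEADER: prefix, semicolon-separated key=value fields, optional spaces before each key, and values kept exactly as written since whitespace in them matters. It also checks the mandatory keys described below (mode, position, and a boundary when mode=after).

```python
def parse_addendum_header(line: str) -> dict:
    """Parse a 'PO4A-HEADER:key=value;key=value' line into a dict.

    Hypothetical helper; raises ValueError when the line breaks the rules.
    """
    prefix = "PO4A-HEADER:"
    line = line.rstrip("\n")
    if not line.startswith(prefix):
        raise ValueError("header must begin with PO4A-HEADER:")
    fields = {}
    for field in line[len(prefix):].split(";"):
        # Spaces before the key are allowed; split on the first "=" only,
        # so a value such as "^=head1 AUTEUR" survives intact.
        key, sep, value = field.lstrip(" ").partition("=")
        if not sep:
            raise ValueError(f"field without '=': {field!r}")
        fields[key] = value  # kept verbatim: spaces in values matter
    if fields.get("mode") not in ("before", "after"):
        raise ValueError("mode must be 'before' or 'after'")
    if "position" not in fields:
        raise ValueError("position is mandatory")
    if fields["mode"] == "after" and not (
            "beginboundary" in fields or "endboundary" in fields):
        raise ValueError("mode=after needs beginboundary or endboundary")
    return fields


print(parse_addendum_header(
    "PO4A-HEADER: mode=after; position=About this document; endboundary=</section>"))
```

Because the value is split on ";" before anything else, a semicolon inside a value can never survive, which matches the restriction stated above.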

Although this context search may be considered to operate roughly on
each line of the translated document, it actually operates on the
internal data string of the translated document. This internal data
string may be a text spanning a paragraph containing multiple lines or
may be an XML tag by itself. The exact insertion point of the addendum
must be before or after the internal data string and cannot be within
the internal data string.

The actual internal data string of the translated document can be
visualized by executing po4a in debug mode.

Again, it sounds scary, but the examples given below should help you
find how to write the header line you need. To illustrate the
discussion, assume we want to add a section called "About this
translation" after the "About this document" one.

Here are the possible header keys:

mode (mandatory)
It can be either the string before or after.

If mode=before, the insertion point is determined by a one-step regex
match specified by the position argument regex. The insertion
point is immediately before the uniquely matched internal data
string of the translated document.

If mode=after, the insertion point is determined by a two-step regex
match specified by the position argument regex, and by the
beginboundary or endboundary argument regex.

Since there may be multiple sections for the assumed case, let's
use the two-step approach.

mode=after

position (mandatory)
A Perl regexp for specifying the context.

If more than one internal data string matches this expression (or
none), the search for the insertion point and addition of the
addendum will fail. It is indeed better to report an error than to
insert the addendum at the wrong location.

If mode=before, the insertion point is specified to be immediately
before the internal data string uniquely matching the position
argument regex.

If mode=after, the search for the insertion point is narrowed down
to the data after the internal data string uniquely matching the
position argument regex. The exact insertion point is further
specified by the beginboundary or endboundary.

In our case, we need to skip several preceding sections by
narrowing down the search using the section title string.

position=About this document

(In reality, you need to use the translated section title string
here instead.)

beginboundary (used only when mode=after, and mandatory in that case)
endboundary (idem)
A second Perl regexp required only when mode=after. The addendum
will be placed immediately before or after the first internal data
string matching the beginboundary or endboundary argument regexp,
respectively.

In our case, we can choose to indicate the end of the section we
match by adding:

endboundary=</section>

or to indicate the beginning of the next section by indicating:

beginboundary=<section>

In both cases, our addendum will be placed after the </section> and
before the <section>. The first one is better since it will work
even if the document gets reorganized.

Both forms exist because documentation formats are different. In
some of them, there is a way to mark the end of a section (just
like the </section> we just used), while some others don't
explicitly mark the end of a section (like man). In the former
case, you want to make a boundary matching the end of a section, so
that the insertion point comes after it. In the latter case, you
want to make a boundary matching the beginning of the next section,
so that the insertion point comes just before it.
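To make the two modes concrete, here is a rough Python model of how an insertion point can be computed over a list of internal data strings, following the rules above: the position must match exactly one string; mode=before inserts right before it; mode=after scans the following strings and inserts before the first beginboundary match or after the first endboundary match. This is a sketch under those assumptions, not po4a's implementation, and insertion_index is a hypothetical name. Python's re module stands in for Perl regexps, which behave the same for the simple patterns used here. When no boundary matches, the sketch appends at the end, which is what the end-of-document example in this section relies on.

```python
import re
from typing import List


def insertion_index(chunks: List[str], header: dict) -> int:
    """Return the index in chunks where the addendum would be inserted.

    chunks models the internal data strings of the translated document;
    header holds the PO4A-HEADER fields (mode, position, boundaries).
    """
    hits = [i for i, chunk in enumerate(chunks) if re.search(header["position"], chunk)]
    if len(hits) != 1:
        # Refuse to guess: zero or several matches is an error.
        raise ValueError(f"position matched {len(hits)} chunks, expected exactly 1")
    start = hits[0]
    if header["mode"] == "before":
        return start  # right before the uniquely matched chunk
    for i in range(start + 1, len(chunks)):
        if "beginboundary" in header and re.search(header["beginboundary"], chunks[i]):
            return i  # before the chunk that starts the next part
        if "endboundary" in header and re.search(header["endboundary"], chunks[i]):
            return i + 1  # after the chunk that closes the current part
    return len(chunks)  # no boundary matched: append at the end


docbook = ["<section>", "<title>About this document</title>", "<para>Text.</para>",
           "</section>", "<section>", "<title>Other</title>", "</section>"]
for boundary in ({"endboundary": "</section>"}, {"beginboundary": "<section>"}):
    header = {"mode": "after", "position": "About this document", **boundary}
    print(insertion_index(docbook, header))  # both give 4: between </section> and <section>
```

Note that "</section>" does not match the regexp <section>, which is why both headers land on the same spot.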

This can seem obscure, but hopefully, the next examples will enlighten
you.

To sum up the example we used so far, in order to add a section called
"About this translation" after the "About this document" one in an SGML
document, you can use either of these header lines:
PO4A-HEADER: mode=after; position=About this document; endboundary=</section>
PO4A-HEADER: mode=after; position=About this document; beginboundary=<section>

If you want to add something after the following nroff section:
.SH "AUTHORS"

you should select the two-step approach by setting mode=after. Then you
should narrow down the search to the line after AUTHORS with the
position argument regex. Then, you should match the beginning of the
next section (i.e., ^\.SH) with the beginboundary argument regex. That
is to say:

PO4A-HEADER:mode=after;position=AUTHORS;beginboundary=\.SH

If you want to add something into a section (like after "Copyright Big
Dude") instead of adding a whole section, give a position matching this
line, and give a beginboundary matching any line.
PO4A-HEADER:mode=after;position=Copyright Big Dude, 2004;beginboundary=^

If you want to add something at the end of the document, give a
position matching any line of your document (but only one line; po4a
won't proceed if it's not unique), and give a boundary matching
nothing. Don't use simple strings here like "EOF", but prefer those
which have less chance to be in your document.
PO4A-HEADER:mode=after;position=About this document;beginboundary=FakePo4aBoundary

In any case, remember that these are regexps. For example, if you want
to match the end of a nroff section ending with the line

.fi

don't use .fi as endboundary, because it will match with "the[ fi]le",
which is obviously not what you expect. The correct endboundary in that
case is: ^\.fi$.
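The .fi pitfall is easy to reproduce. In this small Python demonstration, the re module stands in for Perl regexps; for these patterns (".", "^", "$", "\.") the two behave the same. The helper name and sample lines are made up for illustration.

```python
import re


def matching_lines(pattern, lines):
    """Return the lines in which the regular expression pattern matches."""
    return [line for line in lines if re.search(pattern, line)]


lines = [".fi", "Edit the file, then run it."]
print(matching_lines(r".fi", lines))     # the unanchored dot also hits " fi" in "the file"
print(matching_lines(r"^\.fi$", lines))  # only the real .fi request line
```

Anchoring with ^ and $ and escaping the dot is what turns "any character followed by fi" into "a line that is exactly .fi".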

If the addendum doesn't go where you expected, try passing the -vv
argument to the tools, so that they explain what they do while placing
the addendum.

More detailed example

Original document (POD formatted):

|=head1 NAME
|
|dummy - a dummy program
|
|=head1 AUTHOR
|
|me

Then, the following addendum will ensure that a section (in French)
about the translator is added at the end of the file (in French,
"TRADUCTEUR" means "TRANSLATOR", and "moi" means "me").

|PO4A-HEADER:mode=after;position=AUTEUR;beginboundary=^=head
|
|=head1 TRADUCTEUR
|
|moi
|

675 In order to put your addendum before the AUTHOR, use the following
676 header:
677
678 PO4A-HEADER:mode=after;position=NOM;beginboundary=^=head1

This works because the next line matching the beginboundary /^=head1/
after the section "NAME" (translated to "NOM" in French) is the one
declaring the authors, so the addendum will be put between both
sections. Note that if another section is later added between the NAME
and AUTHOR sections, po4a will wrongly put the addendum before the new
section.

To avoid this, you can achieve the same result using mode=before:

    PO4A-HEADER:mode=before;position=^=head1 AUTEUR
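
Each header is a single line of semicolon-separated key=value fields
after the PO4A-HEADER: prefix. The function below is a hypothetical
sketch of how such a line can be split into its fields; it is not
po4a's own parser, and it assumes that no value contains a semicolon.

```perl
use strict;
use warnings;

# Hypothetical helper (not po4a's own parser): turn a PO4A-HEADER line
# into a hash of options. Assumes no value contains a ";".
sub parse_addendum_header {
    my ($line) = @_;
    $line =~ s/^PO4A-HEADER:// or die "not a PO4A-HEADER line: $line\n";
    my %opt;
    for my $field (split /;/, $line) {
        # Split on the first "=" only: a regexp such as ^=head1
        # may itself contain "=".
        my ($key, $value) = split /=/, $field, 2;
        die "malformed field: $field\n" unless defined $value;
        $key =~ s/^\s+|\s+$//g;
        $opt{$key} = $value;
    }
    die "missing mode or position\n"
        unless defined $opt{mode} && defined $opt{position};
    die "mode=after needs a beginboundary or an endboundary\n"
        if $opt{mode} eq 'after'
        && !defined $opt{beginboundary} && !defined $opt{endboundary};
    return \%opt;
}

my $h = parse_addendum_header(
    'PO4A-HEADER:mode=after;position=NOM;beginboundary=^=head1');
print "$_ => $h->{$_}\n" for sort keys %$h;
```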

HOWTO do all this in one program invocation?
The use of po4a proved to be a bit error-prone for users, since you
have to call two different programs in the right order (po4a-updatepo
and then po4a-translate), each of them needing more than three
arguments. Moreover, with this system it was difficult to use a single
PO file for all your documents when more than one format was used.

The po4a(1) program was designed to solve those difficulties. Once your
project is converted to the system, you write a simple configuration
file explaining where your translation files are (PO and POT), where
the original documents are, their formats and where their translations
should be placed.

Then, calling po4a(1) on this file ensures that the PO files are
synchronized against the original documents, and that the translated
documents are generated properly. Of course, you will want to call this
program twice: once before editing the PO files, to update them, and
once afterward, to get completely updated translated documents. But you
only need to remember one command line.
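
For comparison, the two-step sequence that po4a(1) replaces looks
roughly like the calls below. The sketch only builds and prints the
argument lists instead of running them; the file names and the man
format are made up for illustration, while -f, -m, -p and -l are the
format, master, PO and localized-output options of po4a-updatepo(1)
and po4a-translate(1).

```perl
use strict;
use warnings;

# Hypothetical project layout, for illustration only.
my %doc = (format => 'man', master => 'doc/foo.1');
my $po  = 'po/fr.po';
my $out = 'doc/fr/foo.1';

# Step 1: refresh the PO file against the current master document.
my @update = ('po4a-updatepo', '-f', $doc{format}, '-m', $doc{master},
              '-p', $po);

# Step 2 (after the translator has edited the PO file):
# produce the translated document.
my @translate = ('po4a-translate', '-f', $doc{format}, '-m', $doc{master},
                 '-p', $po, '-l', $out);

# Dry run: print the commands instead of calling system(@update) etc.
print join(' ', @$_), "\n" for \@update, \@translate;
```

Repeating this pair for every document and every language is exactly
the bookkeeping that the configuration file takes over.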

HOWTO customize po4a?
po4a modules have options (specified with the -o option) that can be
used to change the module behavior.

You can also edit the source code of the existing modules or even write
your own modules. To make them visible to po4a, copy your modules into
a path called "/bli/blah/blu/lib/Locale/Po4a/" and then add the path
"/bli/blah/blu/lib" to the "PERLLIB" or "PERL5LIB" environment variable
(note that Perl ignores PERLLIB when PERL5LIB is set). For example:

    PERLLIB=$PWD/lib po4a --previous po4a/po4a.cfg

Note: the actual name of the lib directory is not important.
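
Perl finds a module such as Locale::Po4a::Foo by looking for
Locale/Po4a/Foo.pm under each directory of @INC, to which PERLLIB or
PERL5LIB entries are prepended at startup. The self-contained sketch
below mimics this by writing a hypothetical placeholder module
(Locale::Po4a::Dummy, which is not a real po4a module and parses
nothing) into a temporary lib directory, adding that directory to @INC
at run time, and loading it.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);
use File::Spec;

# A throwaway "lib" directory; its name does not matter.
my $lib = tempdir(CLEANUP => 1);
my $dir = File::Spec->catdir($lib, 'Locale', 'Po4a');
make_path($dir);

# Hypothetical placeholder module; a real one would subclass
# Locale::Po4a::TransTractor and implement parse().
my $file = File::Spec->catfile($dir, 'Dummy.pm');
open my $fh, '>', $file or die "cannot write $file: $!";
print $fh "package Locale::Po4a::Dummy;\nsub hello { 'found me' }\n1;\n";
close $fh or die "cannot close $file: $!";

# Same effect as PERLLIB=$lib, but done at run time for this process.
unshift @INC, $lib;
require Locale::Po4a::Dummy;
print Locale::Po4a::Dummy::hello(), "\n";    # prints "found me"
```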

HOW DOES IT WORK?
This chapter gives you a brief overview of the po4a internals, so that
you may feel more confident to help us maintain and improve it. It may
also help you understand why it does not do what you expected, and how
to solve your problems.

What's the big picture here?
The po4a architecture is object oriented (in Perl. Isn't that neat?).
The common ancestor to all parser classes is called TransTractor. This
strange name comes from the fact that it is in charge of both
translating the document and extracting strings at the same time.

More formally, it takes as input a document to translate plus a PO file
containing the translations to use, and produces two separate outputs:
another PO file (resulting from the extraction of the translatable
strings of the input document), and a translated document (with the
same structure as the input one, but with all translatable strings
replaced by content from the input PO). Here is a graphical
representation of this:

    Input document --\                             /---> Output document
                      \      TransTractor::       /       (translated)
                       +-->--   parse()  --------+
                      /                           \
    Input PO --------/                             \---> Output PO
                                                          (extracted)

This little piece is the core of the whole po4a architecture. If you
omit the input PO and the output document, you get po4a-gettextize. If
you provide both inputs and disregard the output PO, you get
po4a-translate. po4a(1) calls TransTractor twice, and calls msgmerge -U
between these two invocations, to provide a one-stop solution driven by
a single configuration file.

TransTractor::parse() is a virtual function implemented by each module.
Here is a little example to show you how it works. It parses a list of
paragraphs, each of them beginning with <p>.

     1  sub parse { my $self = shift;          # the TransTractor object
     2    PARAGRAPH: while (1) {
     3      my ($paragraph,$pararef,$line,$lref)=("","","","");
     4      my $first=1;
     5      while ((($line,$lref)=$self->shiftline()) && defined($line)) {
     6        if ($line =~ m/<p>/ && !$first--) {  # second <p>: next paragraph
     7          $self->unshiftline($line,$lref);    # give the line back
     8          # Drop the leading tag, then push it with the translated rest.
     9          $paragraph =~ s/^<p>//s;
    10          $self->pushline("<p>".$self->translate($paragraph,$pararef));
    11
    12          next PARAGRAPH;
    13        } else {
    14          $paragraph .= $line;                        # grow the paragraph
    15          $pararef = $lref unless(length($pararef));  # remember its start
    16        }
    17      }
    18      # No more defined lines: end of input. Flush the last paragraph.
    19      if (length($paragraph)) {
    20        $paragraph =~ s/^<p>//s;
    21        $self->pushline("<p>".$self->translate($paragraph,$pararef));
    22      }
    23      return;
    24    }
    25  }

On lines 5 and 7, we encounter "shiftline()" and "unshiftline()".
These read and unread the head of the internal input stream of the
master document, as a line string plus its reference. Here, the
reference is a string "$filename:$linenum". Internally, this stream is
a flat (one-dimensional) Perl array in which each line is followed by
its reference, which is why the code handling it looks a bit cryptic.

On line 6, we encounter <p> for the second time (the !$first-- test is
false the first time, since $first starts at 1). That's the signal of
the next paragraph. We should thus put the line we just obtained back
into the input (line 7) and push the paragraph built so far into the
outputs. After removing its leading <p> on line 9, we push the
concatenation of this tag with the translation of the rest of the
paragraph (line 10).

This translate() function is very cool. It pushes its argument into the
output PO file (extraction) and returns its translation as found in the
input PO file (translation). Since it's used as part of the argument of
pushline(), this translation lands in the output document.

Isn't that cool? It is possible to build a complete po4a module in less
than 20 lines when the format is simple enough…

You can learn more about this in Locale::Po4a::TransTractor(3pm).
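
To experiment with parse() outside po4a, the sketch below wraps a
corrected version of the example above in a tiny hypothetical class,
MiniTT, which is not part of po4a. It only mimics the four TransTractor
methods the example uses: shiftline() and unshiftline() work on a flat
array of line/reference pairs, pushline() collects the output document,
and translate() records the extracted string and looks it up in a hash
standing in for the input PO file. It also flushes the paragraph still
pending when the input runs out.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Locale::Po4a::TransTractor, just enough to
# run the parse() example. The real class does much more.
package MiniTT;

sub new {
    my ($class, %args) = @_;
    my ($n, @doc_in) = (0);
    # Flat input stream: each line is followed by its "file:line" reference.
    push @doc_in, $_, 'input.txt:' . ++$n for @{ $args{lines} };
    return bless { doc_in => \@doc_in, doc_out => [], po_out => [],
                   po_in  => $args{po} || {} }, $class;
}

sub shiftline {      # read the next (line, reference) pair
    my $self = shift;
    return (shift @{ $self->{doc_in} }, shift @{ $self->{doc_in} });
}

sub unshiftline {    # put a (line, reference) pair back
    my ($self, $line, $ref) = @_;
    unshift @{ $self->{doc_in} }, $line, $ref;
}

sub pushline { my ($self, $line) = @_; push @{ $self->{doc_out} }, $line }

sub translate {      # extract the string; return its translation or itself
    my ($self, $string, $ref) = @_;
    push @{ $self->{po_out} }, [ $string, $ref ];
    return exists $self->{po_in}{$string} ? $self->{po_in}{$string} : $string;
}

sub parse {
    my $self = shift;
    PARAGRAPH: while (1) {
        my ($paragraph, $pararef, $line, $lref) = ('', '', '', '');
        my $first = 1;
        while ((($line, $lref) = $self->shiftline()) && defined $line) {
            if ($line =~ m/<p>/ && !$first--) {
                $self->unshiftline($line, $lref);
                $paragraph =~ s/^<p>//s;
                $self->pushline('<p>' . $self->translate($paragraph, $pararef));
                next PARAGRAPH;
            }
            $paragraph .= $line;
            $pararef = $lref unless length $pararef;
        }
        if (length $paragraph) {     # flush the last paragraph
            $paragraph =~ s/^<p>//s;
            $self->pushline('<p>' . $self->translate($paragraph, $pararef));
        }
        return;
    }
}

package main;

my $tt = MiniTT->new(
    lines => [ "<p>Hello\n", "world.\n", "<p>Bye.\n" ],
    po    => { "Hello\nworld.\n" => "Bonjour\nle monde.\n",
               "Bye.\n"         => "Au revoir.\n" },
);
$tt->parse();
print @{ $tt->{doc_out} };    # <p>Bonjour / le monde. / <p>Au revoir.
```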

Gettextization: how does it work?
The idea here is to take the original document and its translation, and
to say that the Nth extracted string from the translation is the
translation of the Nth extracted string from the original. For this to
work, both files must share exactly the same structure. For example, if
the files have the following structure, it is very unlikely that the
4th string in the translation (of type 'chapter') is the translation of
the 4th string in the original (of type 'paragraph').

    Original       Translation

    chapter        chapter
    paragraph      paragraph
    paragraph      paragraph
    paragraph      chapter
    chapter        paragraph
    paragraph      paragraph

For that, po4a parsers are run on both the original and the translation
files to extract PO files, and then a third PO file is built from them,
taking strings from the second as translations of strings from the
first. In order to check that the strings put together are actually
translations of each other, document parsers in po4a should record the
syntactic type of each extracted string (all existing ones do so, yours
should also). This information is then used to make sure that both
documents have the same syntax. In the previous example, it allows us
to detect that string 4 is a paragraph in one case and a chapter title
in the other, and to report the problem.

In theory, it would be possible to detect the problem and resynchronize
the files afterward (just like diff does). But what to do with the few
strings before the desynchronization point is not clear, and it would
sometimes produce bad results. That's why the current implementation
doesn't try to resynchronize anything and fails verbosely when
something goes wrong, requiring manual modification of the files to fix
the problem.

Even with these precautions, things can go wrong very easily here.
That's why all translations guessed this way are marked fuzzy to make
sure that the translator reviews and checks them.
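
The pairing rule can be sketched in a few lines. In this hypothetical
function (not po4a's actual implementation), each document is reduced
to a list of [type, string] pairs; the Nth strings are paired only if
their types agree, any mismatch aborts with an error instead of trying
to resynchronize, and every guessed entry is flagged fuzzy.

```perl
use strict;
use warnings;

# Hypothetical sketch of the pairing step (not po4a's actual code).
# Each argument is a list of [type, string] pairs in document order.
sub gettextize {
    my ($orig, $trans) = @_;
    die sprintf("%d strings in the original but %d in the translation\n",
                scalar @$orig, scalar @$trans)
        unless @$orig == @$trans;
    my @po;
    for my $i (0 .. $#$orig) {
        my ($otype, $msgid)  = @{ $orig->[$i] };
        my ($ttype, $msgstr) = @{ $trans->[$i] };
        # No attempt to resynchronize: stop at the first type mismatch.
        die sprintf("string %d is a %s in the original but a %s in the translation\n",
                    $i + 1, $otype, $ttype)
            unless $otype eq $ttype;
        # Every guessed pair must be reviewed, so it is marked fuzzy.
        push @po, { msgid => $msgid, msgstr => $msgstr, fuzzy => 1 };
    }
    return \@po;
}

# The two structures from the table above (the strings are placeholders).
my @orig  = map { [ $_, "original $_" ] }
            qw(chapter paragraph paragraph paragraph chapter paragraph);
my @trans = map { [ $_, "translated $_" ] }
            qw(chapter paragraph paragraph chapter paragraph paragraph);

eval { gettextize(\@orig, \@trans); 1 } or print "gettextization failed: $@";
```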

Addendum: How does it work?
Well, that's pretty easy here. The translated document is not written
directly to disk, but kept in memory until all the addenda are applied.
The algorithms involved here are rather straightforward. We look for a
line matching the position regexp, and insert the addendum before it if
we're in mode=before. If not, we search for the next line matching the
boundary and insert the addendum after this line if it's an endboundary,
or before this line if it's a beginboundary.
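
The placement rules above can be sketched as follows. This is a
hypothetical function, not po4a's code; in particular, appending at the
end of the document when no boundary matches is an assumption, inferred
from the end-of-document recipe given earlier. The test document is a
French rendering of the POD example above (the wording of its NOM line
is made up).

```perl
use strict;
use warnings;

# Hypothetical sketch of the placement rules described above.
# $doc: array ref of lines; $hdr: hash ref with mode, position and,
# for mode=after, a beginboundary or an endboundary (all regexps).
sub place_addendum {
    my ($doc, $hdr, @addendum) = @_;

    # The position must match exactly one line.
    my @hits = grep { $doc->[$_] =~ /$hdr->{position}/ } 0 .. $#$doc;
    die 'position matches ' . scalar(@hits) . " lines, expected 1\n"
        unless @hits == 1;
    my $pos = $hits[0];

    my $at;    # index before which the addendum gets inserted
    if ($hdr->{mode} eq 'before') {
        $at = $pos;
    } else {
        my $is_end = defined $hdr->{endboundary};
        my $re = $is_end ? $hdr->{endboundary} : $hdr->{beginboundary};
        die "mode=after needs a beginboundary or an endboundary\n"
            unless defined $re;
        # Assumption: if no later line matches, append at the very end.
        $at = scalar @$doc;
        for my $i ($pos + 1 .. $#$doc) {
            next unless $doc->[$i] =~ /$re/;
            $at = $is_end ? $i + 1 : $i;    # after an end-, before a beginboundary
            last;
        }
    }
    my @out = @$doc;
    splice @out, $at, 0, @addendum;
    return \@out;
}

my @pod = ('=head1 NOM', '', 'dummy - un programme factice', '',
           '=head1 AUTEUR', '', 'moi');
my @add = ('=head1 TRADUCTEUR', '', 'moi', '');

my $out = place_addendum(\@pod,
    { mode => 'after', position => 'NOM', beginboundary => '^=head1' }, @add);
print "$_\n" for @$out;
```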

FAQ
This chapter groups the Frequently Asked Questions. In fact, most of
the questions for now could be formulated this way: "Why is it designed
this way, and not that one?" If you think po4a isn't the right answer
to documentation translation, you should consider reading this section.
If it does not answer your question, please contact us on the
<devel@lists.po4a.org> mailing list. We love feedback.

Why translate each paragraph separately?
Yes, in po4a, each paragraph is translated separately (in fact, each
module decides this, but all existing modules do so, and yours should
also). There are two main advantages to this approach:

· When the technical parts of the document are hidden from the scene,
  the translator can't mess with them. The fewer markers we present to
  the translator, the fewer mistakes they can make.

· Cutting the document helps in isolating the changes to the original
  document. When the original is modified, finding what parts of the
  translation need to be updated is eased by this process.

Even with these advantages, some people don't like the idea of
translating each paragraph separately. Here are some of the answers I
can give to their fears:

· This approach proved successful in the KDE project and allowed people
  there to produce the biggest corpus of translated and up-to-date
  documentation I know of.

· The translators can still use the context to translate, since the
  strings in the PO file are in the same order as in the original
  document. Translating sequentially is thus rather comparable whether
  you use po4a or not. And in any case, the best way to get the context
  remains to convert the document to a printable format, since the text
  markup formats are not really readable, IMHO.

· This approach is the one used by professional translators. I agree
  that they have somewhat different goals than open-source translators.
  Maintenance, for example, is often less critical to them since the
  content changes rarely.

Why not split at the sentence level (or smaller)?
Professional translator tools sometimes split the document at the
sentence level in order to maximize the reusability of previous
translations and speed up their process. The problem is that the same
sentence may have several translations, depending on the context.

Paragraphs are by definition longer than sentences. This will hopefully
ensure that the same paragraph appearing in two documents has the same
meaning (and translation), regardless of the context in each case.

Splitting into parts smaller than the sentence would be very bad. It
would be a bit long to explain why here, but interested readers can
refer to the Locale::Maketext::TPJ13(3pm) man page (which comes with
the Perl documentation), for example. In short, each language has its
specific syntactic rules, and there is no way to build sentences by
aggregating parts of sentences that works for all existing languages
(or even for 5 of the 10 most spoken ones, or even fewer).

Why not put the original as a comment along with the translation (or the
other way around)?
At first glance, gettext doesn't seem suited to all kinds of
translations. For example, it didn't seem suited to debconf, the
interface all Debian packages use for their interaction with the user
during installation. In that case, the texts to translate were pretty
short (a dozen lines for each package), and it was difficult to put the
translation in a specialized file since it has to be available before
the package installation.

That's why the debconf developer decided to implement another solution,
where translations are placed in the same file as the original. This is
rather appealing. One would even want to do this for XML, for example.
It would look like this:

    <section>
      <title lang="en">My title</title>
      <title lang="fr">Mon titre</title>

      <para>
        <text lang="en">My text.</text>
        <text lang="fr">Mon texte.</text>
      </para>
    </section>

But it was so problematic that a PO-based approach is now used. Only
the original can be edited in the file, and the translations must take
place in PO files extracted from the master template (and placed back
at package compilation time). The old system was deprecated because of
several issues:

· maintenance problems

  If several translators provide a patch at the same time, it gets hard
  to merge them together.

  How will you detect changes to the original, which need to be applied
  to the translations? In order to use diff, you have to note which
  version of the original you translated. I.e., you need a PO file in
  your file ;)

· encoding problems

  This solution is viable when only European languages are involved,
  but the introduction of Korean, Russian and/or Arabic really
  complicates the picture. UTF-8 could be a solution, but there are
  still some problems with it.

  Moreover, such problems are hard to detect (for example, only Korean
  readers will notice that the encoding of Korean is broken [because of
  the Russian translator]).

gettext solves all those problems together.

But gettext wasn't designed for that use!
That's true, but until now nobody has come up with a better solution.
The only known alternative is manual translation, with all the
maintenance issues.

What about the other translation tools for documentation using gettext?
As far as I know, there are only two of them:

poxml
  This is the tool developed by KDE people to handle DocBook XML.
  AFAIK, it was the first program to extract strings to translate from
  documentation to PO files, and inject them back after translation.

  It can only handle XML, and only a particular DTD. I'm quite unhappy
  with the handling of lists, which end up in one big msgid. When the
  list becomes big, the chunk becomes harder to swallow.

po-debiandoc
  This program written by Denis Barbier is a sort of precursor of the
  po4a SGML module, which more or less deprecates it. As the name says,
  it handles only the DebianDoc DTD, which is more or less a deprecated
  DTD.

The main advantages of po4a over them are the ease of adding extra
content (which is even harder with them) and the ability to achieve
gettextization.

Educating developers about translation
When you try to translate documentation or programs, you face three
kinds of problems: linguistic (not everybody speaks two languages),
technical (that's why po4a exists) and relational/human. Not all
developers understand the necessity of translating stuff. Even when
well-intentioned, they may not know how to ease the work of
translators. To help with that, po4a comes with a lot of documentation
which can be referred to.

Another important point is that each translated file begins with a
short comment indicating what the file is and how to use it. This
should help the poor developers flooded with tons of files in different
languages they hardly speak, and help them deal with those files
correctly.

In the po4a project, translated documents are not source files anymore,
in the sense that these files are not the preferred form of the work
for making modifications to it. Since this is rather unconventional,
that's a source of easy mistakes. That's why all files present this
header:

    | *****************************************************
    | *            GENERATED FILE, DO NOT EDIT            *
    | * THIS IS NO SOURCE FILE, BUT RESULT OF COMPILATION *
    | *****************************************************
    |
    | This file was generated by po4a-translate(1). Do not store it (in VCS,
    | for example), but store the PO file used as source file by po4a-translate.
    |
    | In fact, consider this as a binary, and the PO file as a regular source file:
    | If the PO gets lost, keeping this translation up-to-date will be harder ;)

Likewise, gettext's regular PO files only need to be copied to the po/
directory, but this is not the case for the ones manipulated by po4a.
The major risk here is that a developer erases the existing translation
of their program with the translation of their documentation. (Both of
them can't be stored in the same PO file, because the program needs to
install its translation as an MO file while the documentation only uses
its translation at compile time.) That's why the PO files produced by
the po-debiandoc module contain the following header:

    #
    #  ADVISES TO DEVELOPERS:
    #    - you do not need to manually edit POT or PO files.
    #    - this file contains the translation of your debconf templates.
    #      Do not replace the translation of your program with this !!
    #        (or your translators will get very upset)
    #
    #  ADVISES TO TRANSLATORS:
    #    If you are not familiar with the PO format, gettext documentation
    #     is worth reading, especially sections dedicated to this format.
    #    For example, run:
    #         info -n '(gettext)PO Files'
    #         info -n '(gettext)Header Entry'
    #
    #    Some information specific to po-debconf are available at
    #            /usr/share/doc/po-debconf/README-trans
    #         or http://www.debian.org/intl/l10n/po-debconf/README-trans
    #

SUMMARY of the advantages of the gettext based approach
· The translations are not stored along with the original, which makes
  it possible to detect if translations become out of date.

· The translations are stored in separate files from each other, which
  prevents translators of different languages from interfering, both
  when submitting their patch and at the file encoding level.

· It is based internally on gettext (but po4a offers a very simple
  interface so that you don't need to understand the internals to use
  it). That way, we don't have to reinvent the wheel, and because of
  their wide use, we can assume that these tools are more or less bug
  free.

· Nothing changed for the end user (besides the fact that translations
  will hopefully be better maintained). The resulting documentation
  file distributed is exactly the same.

· No need for translators to learn a new file syntax: their favorite PO
  file editor (like Emacs' PO mode, Lokalize or Gtranslator) will work
  just fine.

· gettext offers a simple way to get statistics about what is done,
  what should be reviewed and updated, and what is still to do. Some
  examples can be found at these addresses:

   - https://docs.kde.org/stable5/en/kdesdk/lokalize/project-view.html
   - http://www.debian.org/intl/l10n/
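
As an illustration of these statistics, GNU gettext's msgfmt
--statistics reports the number of translated, fuzzy and untranslated
messages in a PO file. The Perl sketch below computes the same three
counts for a small made-up PO snippet. It is a toy, not a PO parser:
it assumes entries are separated by blank lines and does not handle
plural forms or obsolete (#~) entries.

```perl
use strict;
use warnings;

# Toy counter: classify PO entries as translated, fuzzy or untranslated.
sub po_stats {
    my ($text) = @_;
    my %count = (translated => 0, fuzzy => 0, untranslated => 0);
    for my $entry (split /\n\s*\n/, $text) {
        my ($fuzzy, $field, %val) = (0, '');
        for my $line (split /\n/, $entry) {
            if    ($line =~ /^#,.*\bfuzzy\b/)            { $fuzzy = 1 }
            elsif ($line =~ /^(msgid|msgstr)\s+"(.*)"$/) { $field = $1; $val{$field} = $2 }
            elsif ($line =~ /^"(.*)"$/ && $field)        { $val{$field} .= $1 }  # continuation
        }
        next unless defined $val{msgid};
        next if $val{msgid} eq '';    # the header entry
        if    ($fuzzy)                                     { $count{fuzzy}++ }
        elsif (defined $val{msgstr} && $val{msgstr} ne '') { $count{translated}++ }
        else                                               { $count{untranslated}++ }
    }
    return \%count;
}

my $po = <<'PO';
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\n"

msgid "Hello"
msgstr "Bonjour"

#, fuzzy
msgid "Goodbye"
msgstr "Adieu"

msgid ""
"A long paragraph "
"split over lines."
msgstr ""
PO

my $s = po_stats($po);
printf "%d translated, %d fuzzy, %d untranslated\n",
    @$s{qw(translated fuzzy untranslated)};
```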

But everything isn't green, and this approach also has some
disadvantages we have to deal with.

· Addenda are… strange at first glance.

· You can't adapt the translated text to your preferences, like
  splitting a paragraph here, and joining two other ones there. But in
  some sense, if there is an issue with the original, it should be
  reported as a bug anyway.

· Even with an easy interface, it remains a new tool people have to
  learn.

One of my dreams would be to somehow integrate po4a into Gtranslator or
Lokalize. When a documentation file is opened, the strings would be
automatically extracted, and a translated file + PO file could be
written to disk. If we manage to do an MS Word (TM) module (or at
least RTF), professional translators may even use it.

AUTHORS
Denis Barbier <barbier,linuxfr.org>
Martin Quinson (mquinson#debian.org)

Po4a Tools 2020-01-30 PO4A(7)