PO4A(7)                           Po4a Tools                           PO4A(7)

NAME

       po4a - framework to translate documentation and other materials

Introduction

       The goal of the po4a (PO for anything) project is to ease translations
       (and, more interestingly, the maintenance of translations) by using
       gettext tools in areas where they were not expected, such as
       documentation.

Table of contents

       This document is organized as follows:

       1 Why should I use po4a? What is it good for?
           This introductory chapter explains the motivation of the project
           and its philosophy. You should read it first if you are in the
           process of evaluating po4a for your own translations.

       2 How to use po4a?
           This chapter is a sort of reference manual, trying to answer the
           users' questions and to give you a better understanding of the
           whole process. It introduces how to do things with po4a and serves
           as an introduction to the documentation of the specific tools.

           HOWTO begin a new translation?
           HOWTO change the translation back to a documentation file?
           HOWTO update a po4a translation?
           HOWTO convert a pre-existing translation to po4a?
           HOWTO add extra text to translations (like translator's name)?
           HOWTO do all this in one program invocation?
           HOWTO customize po4a?

       3 How does it work?
           This chapter gives you a brief overview of the po4a internals, so
           that you may feel more confident to help us maintain and improve
           it. It may also help you understand why it does not do what you
           expected, and how to solve your problems.

       4 FAQ
           This chapter groups the Frequently Asked Questions. In fact, most
           of the questions for now could be formulated this way: "Why is it
           designed this way, and not that one?" If you think po4a isn't the
           right answer to documentation translation, you should consider
           reading this section. If it does not answer your question, please
           contact us on the <po4a-devel@lists.alioth.debian.org> mailing
           list. We love feedback.

       5 Specific notes about modules
           This chapter presents the specificities of each module from the
           translator's and original author's points of view. Read this to
           learn the syntax you will encounter when translating material in
           this module, or the rules you should follow in your original
           document to make translators' lives easier.

           Actually, this section is not really part of this document.
           Instead, it is placed in each module's documentation. This helps
           ensure that the information is up to date by keeping the
           documentation and the code together.

Why should I use po4a? What is it good for?

       I like the idea of open-source software, making it possible for
       everybody to access software and its source code. But being French, I'm
       well aware that licensing is not the only restriction to the openness
       of software: untranslated free software is useless for non-English
       speakers, and we still have some work to do to make it available to
       truly everybody out there.

       The perception of this situation by the open-source actors has
       improved dramatically recently. We, as translators, won the first
       battle and convinced everybody of the importance of translations. But
       unfortunately, that was the easy part. Now we have to do the job and
       actually translate all this stuff.

       Actually, open-source software itself benefits from a rather decent
       level of translation, thanks to the wonderful gettext tool suite. It is
       able to extract the strings to translate from the program, present a
       uniform format to translators, and then use the result of their work at
       run time to display translated messages to the user.

       But the situation is rather different when it comes to documentation.
       Too often, the translated documentation is not visible enough (not
       distributed as a part of the program), only partial, or not up to date.
       This last situation is by far the worst possible one. An outdated
       translation can turn out to be worse for users than no translation at
       all, because it describes old program behavior which is no longer in
       use.

   The problem to solve
       Translating documentation is not very difficult in itself. Texts are
       far longer than the messages of the program and thus take longer to
       translate, but no technical skill is really needed to do so. The
       difficult part comes when you have to maintain your work. Detecting
       which parts changed and need to be updated is very difficult,
       error-prone and highly unpleasant. I guess that this explains why so
       much of the translated documentation out there is outdated.

   The po4a answers
       So, the whole point of po4a is to make the documentation translation
       maintainable. The idea is to apply the gettext methodology to this new
       field. As in gettext, texts are extracted from their original
       locations in order to be presented in a uniform format to the
       translators. The classical gettext tools help them update their work
       when a new release of the original comes out. But unlike the classical
       gettext model, the translations are then re-injected into the structure
       of the original document so that they can be processed and distributed
       just like the English version.

       Thanks to this, discovering which parts of the document were changed
       and need an update becomes very easy. Another good point is that the
       tools will do almost all the work when the structure of the original
       document gets fundamentally reorganized and when some chapters are
       moved around, merged or split. By extracting the text to translate from
       the document structure, it also keeps you away from the text formatting
       complexity and reduces your chances of producing a broken document
       (even if it does not completely prevent it).

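       The extract-then-re-inject idea described above can be illustrated
       with a toy sketch. This is not how po4a is implemented: it assumes a
       document whose paragraphs are separated by blank lines, and the names
       extract, reinject and catalog are hypothetical. It also assumes that
       a paragraph without a translation is simply left in the original
       language.

```python
def extract(document: str) -> list[str]:
    """Split a plain-text document into paragraphs (the units to translate)."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def reinject(document: str, catalog: dict[str, str]) -> str:
    """Rebuild the document, replacing each paragraph by its translation.

    Paragraphs missing from the catalog are kept as they are (an assumption
    made for this sketch).
    """
    return "\n\n".join(catalog.get(p, p) for p in extract(document))


master = "Hello world.\n\nThis is a manual."
catalog = {"Hello world.": "Bonjour le monde."}  # original -> translation
translated = reinject(master, catalog)
print(translated)
```

       Because the catalog is keyed by the original paragraph text, moving a
       paragraph around in the master document does not lose its
       translation in this sketch.
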
       Please also see the FAQ below in this document for a more complete list
       of the advantages and disadvantages of this approach.

   Supported formats
       Currently, this approach has been successfully implemented for several
       kinds of text formatting formats:

       man

       The good old manual page format, used by so many programs out there.
       The po4a support is very welcome here since this format is somewhat
       difficult to use and not really friendly to newbies. The
       Locale::Po4a::Man(3pm) module also supports the mdoc format, used by
       the BSD man pages (they are also quite common on Linux).

       pod

       This is the Perl documentation format (POD, "Plain Old
       Documentation"). The language and its extensions are themselves
       documented that way, as well as most of the existing Perl scripts. It
       makes it easy to keep the documentation close to the actual code by
       embedding them both in the same file. It makes programmers' lives
       easier but, unfortunately, not translators'.

       sgml

       Even if somewhat superseded by XML nowadays, this format is still used
       rather often for documents which are more than a few screens long. It
       allows you to make complete books. Updating the translation of such
       long documents can turn out to be a real nightmare. diff often proves
       useless when the original text was re-indented after an update.
       Fortunately, po4a can help you in that process.

       Currently, only the DebianDoc and DocBook DTDs are supported, but
       adding support for a new one is really easy. It is even possible to use
       po4a on an unknown SGML DTD without changing the code by providing the
       needed information on the command line. See Locale::Po4a::Sgml(3pm) for
       details.

       TeX / LaTeX

       The LaTeX format is a major documentation format used in the Free
       Software world and for publications. The Locale::Po4a::LaTeX(3pm)
       module was tested with the Python documentation, a book and some
       presentations.

       texinfo

       All the GNU documentation is written in this format (that's even one of
       the requirements to become an official GNU project). The Texinfo
       support in po4a, provided by Locale::Po4a::Texinfo(3pm), is still in
       its early stages. Please report bugs and feature requests.

       xml

       The XML format is a base format for many documentation formats.

       Currently, the DocBook DTD is supported by po4a. See
       Locale::Po4a::Docbook(3pm) for details.

       others

       Po4a can also handle some rarer or more specialized formats, such as
       the documentation of compilation options for the 2.4.x kernels or the
       diagrams produced by the dia tool. Adding a new one is often very easy
       and the main task is to come up with a parser of your target format.
       See Locale::Po4a::TransTractor(3pm) for more information about this.

   Unsupported formats
       Unfortunately, po4a still lacks support for several documentation
       formats.

       There is a whole bunch of other formats we would like to support in
       po4a, and not only documentation ones. Indeed, we aim at plugging all
       "market holes" left by the classical gettext tools. This encompasses
       package descriptions (deb and rpm), package installation script
       questions, package changelogs, and all the specialized file formats
       used by programs, such as game scenarios or wine resource files.

How to use po4a?

       This chapter is a sort of reference manual, trying to answer the users'
       questions and to give you a better understanding of the whole process.
       It introduces how to do things with po4a and serves as an introduction
       to the documentation of the specific tools.

   Graphical overview
       The following schema gives an overview of the process of translating
       documentation using po4a. Do not be put off by its apparent
       complexity; it comes from the fact that the whole process is
       represented here. Once you have converted your project to po4a, only
       the right part of the graphic is relevant.

       Note that master.doc is taken as an example for the documentation to be
       translated and translation.doc is the corresponding translated text.
       The suffix could be .pod, .xml, or .sgml depending on its format. Each
       part of the picture will be detailed in the next sections.

                                          master.doc
                                              |
                                              V
            +<-----<----+<-----<-----<--------+------->-------->-------+
            :           |                     |                        :
       {translation}    |         { update of master.doc }             :
            :           |                     |                        :
          XX.doc        |                     V                        V
       (optional)       |                 master.doc ->-------->------>+
            :           |                   (new)                      |
            V           V                     |                        |
         [po4a-gettextize]   doc.XX.po--->+   |                        |
                 |            (old)       |   |                        |
                 |              ^         V   V                        |
                 |              |     [po4a-updatepo]                  |
                 V              |           |                          V
          translation.pot       ^           V                          |
                 |              |         doc.XX.po                    |
                 |              |         (fuzzy)                      |
          { translation }       |           |                          |
                 |              ^           V                          V
                 |              |     {manual editing}                 |
                 |              |           |                          |
                 V              |           V                          V
             doc.XX.po --->---->+<---<---- doc.XX.po   addendum     master.doc
             (initial)                   (up-to-date) (optional)   (up-to-date)
                 :                          |            |             |
                 :                          V            |             |
                 +----->----->----->------> +            |             |
                                            |            |             |
                                            V            V             V
                                            +------>-----+------<------+
                                                         |
                                                         V
                                                  [po4a-translate]
                                                         |
                                                         V
                                                       XX.doc
                                                   (up-to-date)

       On the left part, the conversion of a translation not using po4a to
       this system is shown. On the top of the right part, the action of the
       original author is depicted (updating the documentation). The middle of
       the right part is where the automatic actions of po4a are depicted. The
       new material is extracted and compared against the existing
       translation. Parts which didn't change are found, and the previous
       translation is reused. Parts which were partially modified are also
       connected to the previous translation, but with a specific marker
       indicating that the translation must be updated. The bottom of the
       figure shows how a formatted document is built.

       Actually, as a translator, the only manual operation you have to do is
       the part marked {manual editing}. Yeah, I'm sorry, but po4a helps you
       translate. It does not translate anything for you...

   HOWTO begin a new translation?
       This section presents the steps required to begin a new translation
       with po4a. The refinements involved in converting an existing project
       to this system are detailed in the relevant section.

       To begin a new translation using po4a, you have to do the following
       steps:

       - Extract the text which has to be translated from the original
         <master.doc> document into a new translation template
         <translation.pot> file (the gettext format). For that, use the
         po4a-gettextize program this way:

           $ po4a-gettextize -f <format> -m <master.doc> -p <translation.pot>

         <format> is naturally the format used in the master.doc document. As
         expected, the output goes into translation.pot. Please refer to
         po4a-gettextize(1) for more details about the existing options.

       - Actually translate what should be translated. For that, you have to
         rename the POT file, for example to doc.XX.po (where XX is the ISO 639
         code of the language you are translating to, e.g. fr for French), and
         edit the resulting file. It is often a good idea not to name the file
         XX.po to avoid confusion with the translation of the program
         messages, but this is your call. Don't forget to update the PO file
         headers, they are important.

         The actual translation can be done using the Emacs or Vi PO mode,
         Lokalize (KDE based), Gtranslator (GNOME based) or whichever program
         you prefer to use for them (e.g. Virtaal).

         If you wish to learn more about this, you definitely need to refer
         to the gettext documentation, available in the gettext-doc package.

   HOWTO change the translation back to a documentation file?
       Once you're done with the translation, you want to get the translated
       documentation and distribute it to users along with the original one.
       For that, use the po4a-translate(1) program like this (where XX is the
       language code):

         $ po4a-translate -f <format> -m <master.doc> -p <doc.XX.po> -l <XX.doc>

       As before, <format> is the format used in the master.doc document. But
       this time, the PO file provided with the -p flag is part of the input.
       This is your translation. The output goes into XX.doc.

       Please refer to po4a-translate(1) for more details.

   HOWTO update a po4a translation?
       To update your translation when the original master.doc file has
       changed, use the po4a-updatepo(1) program like this:

         $ po4a-updatepo -f <format> -m <new_master.doc> -p <old_doc.XX.po>

       (Please refer to po4a-updatepo(1) for more details.)

       Naturally, the new paragraphs in the document won't get magically
       translated in the PO file with this operation, and you'll need to
       update the PO file manually. Likewise, you may have to rework the
       translation for paragraphs which were modified a bit. To make sure you
       won't miss any of them, they are marked as "fuzzy" during the process
       and you have to remove this marker before the translation can be used
       by po4a-translate. As for the initial translation, it is best to use
       your favorite PO editor here.

       Once your PO file is up-to-date again, without any untranslated or
       fuzzy string left, you can generate a translated documentation file, as
       explained in the previous section.

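       The three outcomes of an update (unchanged, fuzzy, untranslated) can
       be illustrated with a rough sketch. It is only a model of the
       behavior described above, not the algorithm used by gettext or po4a:
       Python's difflib stands in for the real string proximity search, and
       the function name, the state labels and the 0.6 cutoff are
       assumptions made for the example.

```python
import difflib


def update_catalog(old: dict[str, str], new_msgids: list[str], cutoff: float = 0.6):
    """Return {msgid: (translation, state)} for the paragraphs of the new master.

    state is "ok" for unchanged paragraphs, "fuzzy" when the translation of a
    similar old paragraph is reused, and "untranslated" otherwise.
    """
    updated = {}
    for msgid in new_msgids:
        if msgid in old:
            # Exact match: the previous translation is reused as is.
            updated[msgid] = (old[msgid], "ok")
            continue
        # No exact match: look for the closest old paragraph.
        close = difflib.get_close_matches(msgid, list(old), n=1, cutoff=cutoff)
        if close:
            updated[msgid] = (old[close[0]], "fuzzy")
        else:
            updated[msgid] = ("", "untranslated")
    return updated


old = {"The tool reads a file.": "L'outil lit un fichier."}
new = [
    "The tool reads a file.",
    "The tool reads one file.",
    "Brand new paragraph about options.",
]
result = update_catalog(old, new)
for msgid, (msgstr, state) in result.items():
    print(state, "|", msgid, "|", msgstr)
```

       In this model, only the entries in the "fuzzy" and "untranslated"
       states need the translator's attention.
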
   HOWTO convert a pre-existing translation to po4a?
       Often, you happily translated the document by hand until a major
       reorganization of the original master.doc document happened. Then,
       after some unpleasant attempts with diff or similar tools, you want to
       convert to po4a. But of course, you don't want to lose your existing
       translation in the process. Don't worry, this case is also handled by
       the po4a tools and is called gettextization.

       The key here is to have the same structure in the translated document
       and in the original one so that the tools can match the content
       accordingly.

       If you are lucky (i.e., if the structures of both documents perfectly
       match), it will work seamlessly and you will be set in a few seconds.
       Otherwise, you may understand why this process has such an ugly name,
       and you'd better be prepared for some grunt work here. In any case,
       remember that it is the price to pay to get the comfort of po4a
       afterward. And the good point is that you have to do so only once.

       I cannot emphasize this too much. In order to ease the process, it is
       thus important that you find the exact version of the original which
       was used to do the translation. The best situation is when you noted
       down the VCS revision used for the translation and you didn't modify it
       in the translation process, so that you can use it.

       It won't work well when you use the updated original text with the old
       translation. It remains possible, but is harder and really should be
       avoided if possible. In fact, I guess that if you fail to find the
       original text again, the best solution is to find someone to do the
       gettextization for you (but, please, not me ;).

       Maybe I'm too dramatic here. Even when things go wrong, it remains way
       faster than translating everything again. I was able to gettextize the
       existing French translation of the Perl documentation in one day, even
       though things did go wrong. That was more than two megabytes of text,
       and a new translation would have taken months or more.

       Let me explain the basis of the procedure first and I will come back to
       hints for achieving it when the process goes wrong. To ease
       comprehension, let's use the above example once again.

       Once you have the old master.doc again which matches the translation
       XX.doc, the gettextization can be done directly to the PO file
       doc.XX.po without manual translation of the translation.pot file:

        $ po4a-gettextize -f <format> -m <old_master.doc> -l <XX.doc> -p <doc.XX.po>

       When you're lucky, that's it. You converted your old translation to
       po4a and can begin with the updating task right away. Just follow the
       procedure explained a few sections ago to synchronize your PO file with
       the newest original document, and update the translation accordingly.

       Please note that even when things seem to work properly, there is still
       room for errors in this process. The point is that po4a is unable to
       understand the text to make sure that the translation matches the
       original. That's why all strings are marked as "fuzzy" in the process.
       You should check each of them carefully before removing those markers.

       Often the document structures don't match exactly, preventing
       po4a-gettextize from doing its job properly. At that point, the whole
       game is about editing the files to get their damn structures matching.

       It may help to read the section Gettextization: how does it work?
       below. Understanding the internal process will help you to make this
       work. The good point is that po4a-gettextize is rather verbose about
       what went wrong when it happens. First, it pinpoints where in the
       documents the structures' discrepancies are. You will learn the strings
       that don't match, their positions in the text, and the type of each of
       them. Moreover, the PO file generated so far will be dumped to
       gettextization.failed.po.

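       To see why matching structures matter, here is a minimal sketch of
       pairing two documents paragraph by paragraph. It is a simplified
       model, not the code of po4a-gettextize: each document is assumed to
       be already parsed into (type, text) tuples, and the function name,
       the type labels and the wording of the message are hypothetical.

```python
def gettextize(original, translation):
    """Pair paragraphs by position.

    Each document is a list of (type, text) tuples. Returns the
    (msgid, msgstr) pairs built so far and, if the structures diverge, a
    message describing the first mismatch (None on success).
    """
    pairs = []
    for index, ((otype, otext), (ttype, ttext)) in enumerate(zip(original, translation)):
        if otype != ttype:
            return pairs, (
                f"paragraph {index}: original is '{otype}' "
                f"but translation is '{ttype}'"
            )
        pairs.append((otext, ttext))
    if len(original) != len(translation):
        return pairs, (
            f"original has {len(original)} paragraphs, "
            f"translation has {len(translation)}"
        )
    return pairs, None


original = [("title", "Name"), ("para", "A tool."), ("para", "More text.")]
# The leading space turns the second paragraph into a different type.
translation = [("title", "Nom"), ("verbatim", " Un outil."), ("para", "Plus de texte.")]

pairs, error = gettextize(original, translation)
print(pairs)
print(error)
```

       The pairs collected before the mismatch play the role of the partial
       PO file, and the message plays the role of the diagnostic.
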
       -   Remove all extra parts of the translations, such as the section in
           which you give the translator's name and thank everyone who
           contributed to the translation. Addenda, which are described in the
           next section, will allow you to re-add them afterward.

       -   Do not hesitate to edit both the original and the translation. The
           most important thing is to get the PO file. You will be able to
           update it afterward. That being said, editing the translation
           should be preferred when both are possible since it makes things
           easier when the gettextization is done.

       -   If needed, kill some parts of the original if they happen not to be
           translated. When synchronizing the PO with the document afterward,
           they will come back by themselves.

       -   If you changed the structure a bit (to merge two paragraphs, or
           split another one), undo those changes. If there are issues in the
           original, you should inform the original author. Fixing them in
           your translation only fixes them for a part of the community. And
           moreover, it's impossible when using po4a ;)

       -   Sometimes, the paragraph contents do match, but their types don't.
           Fixing it is rather format-dependent. In POD and man, it often
           comes from the fact that one of the two contains a line beginning
           with a white space where the other doesn't. In those formats, such
           a paragraph cannot be wrapped and thus becomes a different type.
           Just remove the space and you are fine. It may also be a typo in
           the tag name.

           Likewise, two paragraphs may get merged together in POD when the
           separating line contains some spaces, or when there is no empty
           line between the =item line and the content of the item.

       -   Sometimes, there is a desynchronization between the files, and the
           translation is attached to the wrong original paragraph. It is the
           sign that the real problem was earlier in the files. Check
           gettextization.failed.po to see where the desynchronization begins,
           and fix it there.

       -   Sometimes, you get the strong feeling that po4a ate some parts of
           the text, either the original or the translation.
           gettextization.failed.po indicates that both of them were matching
           nicely, and then the gettextization fails because it tried to
           match one paragraph with the one after (or before) the right one,
           as if the right one disappeared. Curse po4a as I did when it first
           happened to me. Generously.

           This unfortunate situation happens when the same paragraph is
           repeated over the document. In that case, no new entry is created
           in the PO file, but a new reference is added to the existing one
           instead.

           So, when the same paragraph appears twice in the original but is
           not translated in the exact same way each time, you will get the
           feeling that a paragraph of the original disappeared. Just kill
           the new translation. If you prefer to kill the first translation
           instead because the second one was actually better, remove the
           second one from where it is and put it in place of the first one.

           On the contrary, if two similar but different paragraphs were
           translated in the exact same way, you will get the feeling that a
           paragraph of the translation disappeared. A solution is to add a
           stupid string to the original paragraph (such as "I'm different").
           Don't be afraid, those things will disappear during the
           synchronization, and when the added text is short enough, gettext
           will match your translation to the existing text (marking it as
           fuzzy, but you don't really care since all strings are fuzzy after
           gettextization).

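       The "disappearing paragraph" effect described in the last tip can be
       reproduced with a small sketch. It only models the rule that a
       repeated paragraph yields one entry with several references. The
       function name and the use of line numbers as references are
       assumptions made for the example.

```python
def build_catalog(paragraphs):
    """Map each distinct paragraph to the list of places where it occurs."""
    catalog = {}
    for line, text in paragraphs:
        # A repeated paragraph adds a reference instead of a new entry.
        catalog.setdefault(text, []).append(line)
    return catalog


original = [
    (1, "See the manual."),
    (5, "Options are listed below."),
    (9, "See the manual."),
]
translation = [
    (1, "Voir le manuel."),
    (5, "Les options sont listées ci-dessous."),
    (9, "Consultez le manuel."),  # same original, translated differently
]

original_entries = build_catalog(original)
translation_entries = build_catalog(translation)
print(len(original_entries), "entries in the original")
print(len(translation_entries), "entries in the translation")
```

       The original yields two entries while the translation yields three,
       so pairing them by position goes wrong from the repeated paragraph
       onward.
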
       Hopefully, those tips will help you make your gettextization work and
       obtain your precious PO file. You are now ready to synchronize your
       file and begin your translation. Please note that on large texts, the
       first synchronization may take a long time.

       For example, the first po4a-updatepo of the Perl documentation's French
       translation (5.5 MB PO file) took about two full days on a 1 GHz G5
       computer. Yes, 48 hours. But the subsequent ones only take a dozen
       seconds on my old laptop. This is because the first time, most of the
       msgids of the PO file don't match any of the POT file ones. This forces
       gettext to search for the closest one using a costly string proximity
       algorithm.

   HOWTO add extra text to translations (like translator's name)?
       Because of the gettext approach, doing this becomes more difficult in
       po4a than it was when simply editing a new file alongside the original
       one. But it remains possible, thanks to the so-called addenda.

       It may help comprehension to consider addenda as a sort of patch
       applied to the localized document after processing. They are rather
       different from the usual patches (they have only one line of context,
       which can embed a Perl regular expression, and they can only add new
       text without removing any), but the functionality is the same.

       Their goal is to allow the translator to add extra content to the
       document which is not translated from the original document. The most
       common usage is to add a section about the translation itself, listing
       contributors and explaining how to report bugs against the translation.

       An addendum must be provided as a separate file. The first line
       constitutes a header indicating where in the produced document it
       should be placed. The rest of the addendum file will be added verbatim
       at the determined position of the resulting document.

       The header has a pretty rigid syntax: It must begin with the string
       PO4A-HEADER:, followed by a semicolon (;) separated list of key=value
       fields. White spaces ARE important. Note that you cannot use the
       semicolon character (;) in the value, and that quoting it doesn't help.

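       The syntax rules above can be sketched as a small parser. This is an
       illustration only, not the parser used by po4a: the function name is
       hypothetical, and stripping the spaces around each key and value is
       an assumption, since the exact treatment of white space is not
       spelled out here.

```python
def parse_addendum_header(line: str) -> dict[str, str]:
    """Parse a 'PO4A-HEADER:' line into a dictionary of key=value fields."""
    prefix = "PO4A-HEADER:"
    if not line.startswith(prefix):
        raise ValueError("addendum header must begin with 'PO4A-HEADER:'")
    fields = {}
    # Fields are separated by ';', which is why a value cannot contain one.
    for field in line[len(prefix):].split(";"):
        if not field.strip():
            continue
        key, sep, value = field.partition("=")
        if not sep:
            raise ValueError(f"field without '=': {field!r}")
        # Assumption of this sketch: surrounding spaces are not significant.
        fields[key.strip()] = value.strip()
    return fields


header = "PO4A-HEADER: mode=after; position=About this document; endboundary=</section>"
fields = parse_addendum_header(header)
print(fields)
```

       Splitting on every semicolon before looking at the values is what
       makes quoting a semicolon useless in this model.
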
       Again, it sounds scary, but the examples given below should help you to
       find how to write the header line you need. To illustrate the
       discussion, assume we want to add a section called "About this
       translation" after the "About this document" one.

       Here are the possible header keys:

       position (mandatory)
           a regexp. The addendum will be placed near the line matching this
           regexp. Note that we're speaking about the translated document
           here, not the original. If more than one line matches this
           expression (or none), the addition will fail. It is indeed better
           to report an error than to insert the addendum at the wrong
           location.

           This line is called the position point in the following. The point
           where the addendum is added is called the insertion point. Those
           two points are near each other, but not equal. For example, if you
           want to insert a new section, it is easier to put the position
           point on the title of the preceding section and tell po4a where the
           section ends (remember that the position point is given by a regexp
           which should match a unique line).

           The location of the insertion point with regard to the position
           point is controlled by the mode, beginboundary and endboundary
           fields, as explained below.

           In our case, we would have:

                position=<title>About this document</title>

       mode (mandatory)
           It can be either the string before or after, specifying the
           position of the addendum, relative to the position point.

           Since we want the new section to be placed below the one we are
           matching, we have:

                mode=after

       beginboundary (used only when mode=after, and mandatory in that case)
       endboundary (idem)
           regexp matching the end of the section after which the addendum
           goes.

           When mode=after, the insertion point is after the position point,
           but not directly after! It is placed at the end of the section
           beginning at the position point, i.e., after or before the line
           matched by the ???boundary argument, depending on whether you used
           beginboundary or endboundary.

           In our case, we can choose to indicate the end of the section we
           match by adding:

              endboundary=</section>

           or to indicate the beginning of the next section by indicating:

              beginboundary=<section>

           In both cases, our addendum will be placed after the </section> and
           before the <section>. The first one is better since it will work
           even if the document gets reorganized.

           Both forms exist because documentation formats are different. In
           some of them, there is a way to mark the end of a section (just
           like the </section> we just used), while some others don't
           explicitly mark the end of a section (like in man). In the former
           case, you want to make a boundary matching the end of a section, so
           that the insertion point comes after it. In the latter case, you
           want to make a boundary matching the beginning of the next section,
           so that the insertion point comes just before it.

588       This can seem obscure, but hopefully, the next examples will enlighten
589       you.
590
       To sum up the example we used so far, in order to add a section called
       "About this translation" after the "About this document" one in an SGML
       document, you can use either of these header lines:

          PO4A-HEADER: mode=after; position=About this document; endboundary=</section>
          PO4A-HEADER: mode=after; position=About this document; beginboundary=<section>

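       The header syntax described above can be sketched in a few lines of
       Perl. This is only an illustration, not po4a's own parser: the
       function name parse_addendum_header is made up, and the sketch
       assumes that fields are separated by ";" and that exactly one of
       beginboundary and endboundary is given when mode=after.

```perl
use strict;
use warnings;

# Hypothetical helper, not po4a's own parser: split a PO4A-HEADER line
# into its fields and check the constraints described above.
sub parse_addendum_header {
    my ($line) = @_;
    my ($fields) = $line =~ /^\s*PO4A-HEADER\s*:\s*(.*?)\s*$/
        or die "Not a PO4A-HEADER line\n";

    my %field;
    for my $part (split /;/, $fields) {
        # Cut at the first "=" only, so that a value such as "^=head1"
        # is kept whole.
        my ($key, $value) = $part =~ /^\s*([^=\s]+)\s*=(.*)$/
            or die "Malformed field '$part'\n";
        $field{$key} = $value;
    }

    for my $key (qw(position mode)) {
        die "Missing mandatory field '$key'\n" unless defined $field{$key};
    }
    die "mode must be 'before' or 'after'\n"
        unless $field{mode} =~ /^(?:before|after)$/;
    if ($field{mode} eq 'after') {
        # Assumption: exactly one kind of boundary must be given.
        my @given = grep { defined $field{$_} } qw(beginboundary endboundary);
        die "mode=after needs one beginboundary or endboundary\n"
            unless @given == 1;
    }
    return \%field;
}

my $header = parse_addendum_header(
    'PO4A-HEADER: mode=after; position=About this document; endboundary=</section>'
);
print "$_ = $header->{$_}\n" for sort keys %$header;
```

       Run on the first header line above, the sketch lists the endboundary,
       mode and position fields with their values.
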
       If you want to add something after the following nroff section:

          .SH "AUTHORS"

       you should put a position matching this line, and a beginboundary
       matching the beginning of the next section (i.e., ^\.SH). The
       addendum will then be added after the position point and immediately
       before the first line matching the beginboundary. That is to say:

          PO4A-HEADER:mode=after;position=AUTHORS;beginboundary=\.SH

       If you want to add something into a section (like after "Copyright Big
       Dude") instead of adding a whole section, give a position matching this
       line, and give a beginboundary matching any line.

          PO4A-HEADER:mode=after;position=Copyright Big Dude, 2004;beginboundary=^

       If you want to add something at the end of the document, give a
       position matching any line of your document (but only one line: po4a
       won't proceed if it's not unique), and give a boundary matching
       nothing (the example below uses a beginboundary). Don't use simple
       strings here like "EOF", but prefer those which are less likely to
       appear in your document.

          PO4A-HEADER:mode=after;position=<title>About</title>;beginboundary=FakePo4aBoundary

       In any case, remember that these are regexps. For example, if you
       want to match the end of an nroff section ending with the line

         .fi

       don't use .fi as endboundary, because it will match with "the[ fi]le",
       which is obviously not what you expect. The correct endboundary in that
       case is: ^\.fi$.

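       The difference between the two regexps can be checked with a short
       Perl script. The sample lines and the matching_lines helper are made
       up for the illustration.

```perl
use strict;
use warnings;

# Return the indexes of the lines matched by a boundary regexp.
sub matching_lines {
    my ($regexp, @lines) = @_;
    return grep { $lines[$_] =~ /$regexp/ } 0 .. $#lines;
}

# Made-up excerpt of a translated nroff document.
my @doc = ("Open the file\n", ".nf\n", "  some code\n", ".fi\n");

for my $boundary ('.fi', '^\.fi$') {
    my @hits = matching_lines($boundary, @doc);
    print "$boundary matches line(s) @hits\n";
}
```

       The unanchored .fi also matches the first line (because of " fi" in
       "the file"), while ^\.fi$ only matches the line made of .fi alone.
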
       If the addendum doesn't go where you expected, try passing the -vv
       argument to the tools, so that they explain what they do while
       placing the addendum.

       More detailed example

       Original document (POD formatted):

        |=head1 NAME
        |
        |dummy - a dummy program
        |
        |=head1 AUTHOR
        |
        |me

       Then, the following addendum will ensure that a section (in French)
       about the translator is added at the end of the file (in French,
       "TRADUCTEUR" means "TRANSLATOR", and "moi" means "me").

        |PO4A-HEADER:mode=after;position=AUTEUR;beginboundary=^=head
        |
        |=head1 TRADUCTEUR
        |
        |moi

       In order to put your addendum before the AUTHOR section, use the
       following header:

        PO4A-HEADER:mode=after;position=NOM;beginboundary=^=head1

       This works because the next line matching the beginboundary /^=head1/
       after the section "NAME" (translated to "NOM" in French) is the one
       declaring the authors. So, the addendum will be put between both
       sections.

   HOWTO do all this in one program invocation?
       Using po4a proved to be a bit error-prone for users, since you have
       to call two different programs in the right order (po4a-updatepo and
       then po4a-translate), each of them needing more than 3 arguments.
       Moreover, it was difficult with this system to use only one PO file for
       all your documents when more than one format was used.

       The po4a(1) program was designed to solve those difficulties. Once your
       project is converted to the system, you write a simple configuration
       file explaining where your translation files are (PO and POT), where
       the original documents are, their formats and where their translations
       should be placed.

       Then, calling po4a(1) on this file ensures that the PO files are
       synchronized against the original document, and that the translated
       documents are generated properly. Of course, you will want to call
       this program twice: once before editing the PO files to update them,
       and once afterward to get a completely updated translated document.
       But you only need to remember one command line.

   HOWTO customize po4a?
       po4a modules have options (specified with the -o option) that can be
       used to change the module behavior.

       It is also possible to customize a module, or to add new, derivative
       or modified modules, by putting a module in lib/Locale/Po4a/ and
       adding lib to the paths specified by the PERLLIB or PERL5LIB
       environment variable. For example:

          PERLLIB=$PWD/lib po4a --previous po4a/po4a.cfg

       Note: the actual name of the lib directory is not important.


How does it work?

       This chapter gives you a brief overview of the po4a internals, so that
       you may feel more confident to help us maintain and improve it. It may
       also help you understand why it does not do what you expected, and how
       to solve your problems.

   What's the big picture here?
       The po4a architecture is object oriented (in Perl. Isn't that neat?).
       The common ancestor to all parser classes is called TransTractor. This
       strange name comes from the fact that it is at the same time in charge
       of translating documents and extracting strings.

       More formally, it takes a document to translate plus a PO file
       containing the translations to use as input while producing two
       separate outputs: another PO file (resulting from the extraction of
       translatable strings from the input document), and a translated
       document (with the same structure as the input one, but with all
       translatable strings replaced with the content of the input PO). Here
       is a graphical representation of this:

          Input document --\                             /---> Output document
                            \      TransTractor::       /       (translated)
                             +-->--   parse()  --------+
                            /                           \
          Input PO --------/                             \---> Output PO
                                                                (extracted)

       This little bone is the core of all the po4a architecture. If you omit
       the input PO and the output document, you get po4a-gettextize. If you
       provide both inputs and disregard the output PO, you get
       po4a-translate.

       TransTractor::parse() is a virtual function implemented by each module.
       Here is a little example to show you how it works. It parses a list of
       paragraphs, each of them beginning with <p>.

         1 sub parse {
         2   my $self = shift;
         3   PARAGRAPH: while (1) {
         4     my ($paragraph,$pararef,$line,$lref)=("","","","");
         5     my $first=1;
         6     while ((($line,$lref)=$self->shiftline()) && defined($line)) {
         7       if ($line =~ m/<p>/ && !$first--) {
         8         $self->unshiftline($line,$lref);
         9
        10         $paragraph =~ s/^<p>//s;
        11         $self->pushline("<p>".$self->translate($paragraph,$pararef));
        12
        13         next PARAGRAPH;
        14       } else {
        15         $paragraph .= $line;
        16         $pararef = $lref unless(length($pararef));
        17       }
        18     }
        19     # No defined line left: end of input. Flush the last paragraph.
        20     if (length($paragraph)) {
        21       $paragraph =~ s/^<p>//s;
        22       $self->pushline("<p>".$self->translate($paragraph,$pararef));
        23     }
        24     return;
        25   }
        26 }

       On line 7, we encounter <p> for the second time. That's the signal of
       the next paragraph. We should thus put the just obtained line back into
       the original document (line 8) and push the paragraph built so far into
       the outputs. After removing its leading <p> on line 10, we push the
       concatenation of this tag with the translation of the rest of the
       paragraph (line 11). Lines 20 to 23 do the same for the last paragraph
       once the input is exhausted.

       This translate() function is very cool. It pushes its argument into the
       output PO file (extraction) and returns its translation as found in the
       input PO file (translation). Since it's used as part of the argument of
       pushline(), this translation lands into the output document.

       Isn't that cool? It is possible to build a complete po4a module in
       fewer than 30 lines when the format is simple enough...

       You can learn more about this in Locale::Po4a::TransTractor(3pm).

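       To make the double role of translate() more concrete, here is a toy
       model of it. It is not the real method (which belongs to
       Locale::Po4a::TransTractor and is called on an object): the %po_in
       hash and the @po_out array are hypothetical stand-ins for the input
       and output PO files, and the fallback to the original string is an
       assumption mirroring the usual gettext behavior.

```perl
use strict;
use warnings;

# Toy stand-ins for the input and output PO files (hypothetical, the
# real structures are more involved).
my %po_in = ("Hello world" => "Bonjour le monde");
my @po_out;

# Simplified model of translate(): record the string for extraction and
# return its translation, falling back to the original string when the
# input PO has none.
sub translate {
    my ($string, $ref) = @_;
    push @po_out, { msgid => $string, reference => $ref };
    return exists($po_in{$string}) ? $po_in{$string} : $string;
}

print "<p>", translate("Hello world", "dummy.html:1"), "\n";
print "<p>", translate("Not translated yet", "dummy.html:3"), "\n";
print scalar(@po_out), " strings extracted\n";
```

       Both strings end up in @po_out (the extraction), while only the first
       one comes back translated.
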
   Gettextization: how does it work?
       The idea here is to take the original document and its translation, and
       to say that the Nth extracted string from the translation is the
       translation of the Nth extracted string from the original. In order to
       work, both files must share exactly the same structure. For example, if
       the files have the following structure, it is very unlikely that the
       4th string in the translation (of type 'chapter') is the translation of
       the 4th string in the original (of type 'paragraph').

           Original         Translation

         chapter            chapter
           paragraph          paragraph
           paragraph          paragraph
           paragraph        chapter
         chapter              paragraph
           paragraph          paragraph

       For that, po4a parsers are used on both the original and the
       translation files to extract PO files, and then a third PO file is
       built from them, taking strings from the second as translations of
       strings from the first. In order to check that the strings we put
       together are actually the translations of each other, document parsers
       in po4a should put information about the syntactical type of extracted
       strings in the document (all existing ones do so, yours should also).
       Then, this information is used to make sure that both documents have
       the same syntax. In the previous example, it would allow us to detect
       that string 4 is a paragraph in one case, and a chapter title in
       another case and to report the problem.

       In theory, it would be possible to detect the problem, and
       resynchronize the files afterward (just like diff does). But what we
       should do with the few strings before the desynchronization is not
       clear, and it would sometimes produce bad results. That's why the
       current implementation doesn't try to resynchronize anything and fails
       verbosely when something goes wrong, requiring manual modification of
       the files to fix the problem.

       Even with these precautions, things can go wrong very easily here.
       That's why all translations guessed this way are marked fuzzy to make
       sure that the translator reviews and checks them.

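       The pairing step can be sketched as follows. This is not po4a's
       implementation: the pair_strings function and the [type, text]
       representation of an extracted string are assumptions made for the
       example, which reuses the structure of the table above.

```perl
use strict;
use warnings;

# Sketch of the pairing step only, not po4a's implementation. Each
# extracted string is modelled as [type, text].
sub pair_strings {
    my ($orig, $trans) = @_;
    my @entries;
    for my $i (0 .. $#$orig) {
        my ($otype, $otext) = @{ $orig->[$i] };
        my ($ttype, $ttext) = $i <= $#$trans ? @{ $trans->[$i] } : ('missing', '');
        die "String " . ($i + 1) . ": $otype in the original, "
          . "$ttype in the translation\n"
            if $otype ne $ttype;
        # Guessed pairs are marked fuzzy so that a translator reviews them.
        push @entries, { msgid => $otext, msgstr => $ttext, fuzzy => 1 };
    }
    die "The translation has more strings than the original\n"
        if @$trans > @$orig;
    return \@entries;
}

# The structure shown in the table above.
my @otypes = qw(chapter paragraph paragraph paragraph chapter paragraph);
my @ttypes = qw(chapter paragraph paragraph chapter paragraph paragraph);
my @orig  = map { [ $otypes[$_], "original " . ($_ + 1) ] } 0 .. $#otypes;
my @trans = map { [ $ttypes[$_], "translated " . ($_ + 1) ] } 0 .. $#ttypes;

eval { pair_strings(\@orig, \@trans); 1 } or print "Error: $@";
```

       The sketch stops at string 4, where a paragraph faces a chapter
       title, instead of trying to resynchronize.
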
   Addendum: How does it work?
       Well, that's pretty easy here. The translated document is not written
       directly to disk, but kept in memory until all the addenda are applied.
       The algorithms involved here are rather straightforward. We look for a
       line matching the position regexp, and insert the addendum before it if
       we're in mode=before. If not, we search for the next line matching the
       boundary and insert the addendum after this line if it's an endboundary
       or before this line if it's a beginboundary.

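       The algorithm just described fits in a short Perl function. The
       following is a simplified sketch, not po4a's actual code: the
       apply_addendum name and its arguments are made up, and placing the
       addendum at the end of the document when no line matches the boundary
       is an assumption based on the end-of-document example given earlier.
       The sample document is the French version of the POD example (its
       one-line description is an invented translation).

```perl
use strict;
use warnings;

# $doc is a reference to the lines of the translated document and
# $header a hash of header fields; both names are hypothetical.
sub apply_addendum {
    my ($doc, $header, $addendum) = @_;

    # The position regexp must match exactly one line.
    my $position = $header->{position};
    my @found = grep { $doc->[$_] =~ /$position/ } 0 .. $#$doc;
    die "position must match exactly one line\n" unless @found == 1;

    my $at = $found[0];    # mode=before: insert just before that line
    if ($header->{mode} eq 'after') {
        my $is_end   = defined $header->{endboundary};
        my $boundary = $is_end ? $header->{endboundary}
                               : $header->{beginboundary};
        # Search for the boundary after the position point; stop at the
        # end of the document when nothing matches.
        $at++;
        $at++ while $at < @$doc && $doc->[$at] !~ /$boundary/;
        # endboundary: insert after the matched line, not before it.
        $at++ if $is_end && $at < @$doc;
    }
    splice @$doc, $at, 0, $addendum;
    return $doc;
}

my @doc = ("=head1 NOM\n", "\n", "dummy - un programme factice\n", "\n",
           "=head1 AUTEUR\n", "\n", "moi\n");
my %header = (mode => 'after', position => 'AUTEUR', beginboundary => '^=head');
apply_addendum(\@doc, \%header, "\n=head1 TRADUCTEUR\n\nmoi\n");
print @doc;
```

       No line after "=head1 AUTEUR" matches ^=head, so the new section
       lands at the end of the document.
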

FAQ

       This chapter groups the Frequently Asked Questions. In fact, most of
       the questions for now could be formulated that way: "Why is it designed
       this way, and not that one?" If you think po4a isn't the right answer
       to documentation translation, you should consider reading this section.
       If it does not answer your question, please contact us on the
       <po4a-devel@lists.alioth.debian.org> mailing list. We love feedback.

   Why translate each paragraph separately?
       Yes, in po4a, each paragraph is translated separately (in fact, each
       module decides this, but all existing modules do so, and yours should
       also). There are two main advantages to this approach:

       · When the technical parts of the document are hidden from the scene,
         the translator can't mess with them. The fewer markers we present to
         the translator, the fewer errors they can make.

       · Cutting the document helps in isolating the changes to the original
         document. When the original is modified, finding what parts of the
         translation need to be updated is eased by this process.

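       The second point can be illustrated with a toy script. It is only a
       sketch of the idea, not of po4a's code: the two helpers are made up,
       paragraphs are assumed to be separated by blank lines, and a plain
       hash stands in for the PO file.

```perl
use strict;
use warnings;

# Hypothetical helper: split a text into paragraphs on blank lines.
sub paragraphs {
    my ($text) = @_;
    $text =~ s/\s+\z//;
    return grep { length } split /\n{2,}/, $text;
}

# Report the paragraphs of the new original that have no translation
# yet. %$po maps an original paragraph to its translation.
sub needs_update {
    my ($new_text, $po) = @_;
    return grep { !exists $po->{$_} } paragraphs($new_text);
}

my %po = (
    "First paragraph."  => "Premier paragraphe.",
    "Second paragraph." => "Second paragraphe.",
);
my $new = "First paragraph.\n\nSecond paragraph, now reworded.\n";
print "To translate: $_\n" for needs_update($new, \%po);
```

       Only the reworded paragraph is reported; the translation of the
       untouched one can be reused as is.
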
       Even with these advantages, some people don't like the idea of
       translating each paragraph separately. Here are some of the answers I
       can give to their fears:

       · This approach proved successful in the KDE project and allows
         people there to produce the biggest corpus of translated and
         up-to-date documentation I know.

       · The translators can still use the context to translate, since the
         strings in the PO file are in the same order as in the original
         document. Translating sequentially is thus rather comparable whether
         you use po4a or not. And in any case, the best way to get the
         context remains to convert the document to a printable format since
         the text formatting ones are not really readable, IMHO.

       · This approach is the one used by professional translators. I agree
         that they have somewhat different goals than open-source translators.
         The maintenance is for example often less critical to them since the
         content changes rarely.

   Why not split at the sentence level (or smaller)?
       Professional translator tools sometimes split the document at the
       sentence level in order to maximize the reusability of previous
       translations and speed up their process. The problem is that the same
       sentence may have several translations, depending on the context.

       Paragraphs are by definition longer than sentences. This will
       hopefully ensure that the same paragraph in two documents has the
       same meaning (and translation), regardless of the context in each case.

       Splitting into parts smaller than the sentence would be very bad. It
       would be a bit long to explain why here, but interested readers can
       refer to the Locale::Maketext::TPJ13(3pm) man page (which comes with
       the Perl documentation), for example. In short, each language has
       its specific syntactic rules, and there is no way to build sentences by
       aggregating parts of sentences that works for all existing languages
       (or even for 5 of the 10 most spoken ones, or even fewer).

   Why not put the original as comment along with translation (or other way
       around)?
       At first glance, gettext doesn't seem to be adapted to all kinds of
       translations. For example, it didn't seem adapted to debconf, the
       interface all Debian packages use for their interaction with the user
       during installation. In that case, the texts to translate were pretty
       short (a dozen lines for each package), and it was difficult to put the
       translation in a specialized file since it has to be available before
       the package installation.

       That's why the debconf developer decided to implement another solution,
       where translations are placed in the same file as the original. This
       is rather appealing. One would even want to do this for XML, for
       example. It would look like this:

        <section>
         <title lang="en">My title</title>
         <title lang="fr">Mon titre</title>

         <para>
          <text lang="en">My text.</text>
          <text lang="fr">Mon texte.</text>
         </para>
        </section>

       But it was so problematic that a PO-based approach is now used. Only
       the original can be edited in the file, and the translations must take
       place in PO files extracted from the master template (and placed back
       at package compilation time). The old system was deprecated because of
       several issues:

       ·   maintenance problems

           If several translators provide a patch at the same time, it gets
           hard to merge them together.

           How will you detect changes to the original, which need to be
           applied to the translations? In order to use diff, you have to note
           which version of the original you translated. I.e., you need a PO
           file in your file ;)

       ·   encoding problems

           This solution is viable when only European languages are involved,
           but the introduction of Korean, Russian and/or Arabic really
           complicates the picture. UTF could be a solution, but there are
           still some problems with it.

           Moreover, such problems are hard to detect (i.e., only Korean
           readers will detect that the encoding of Korean is broken [because
           of the Russian translator]).

       gettext solves all those problems together.

   But gettext wasn't designed for that use!
       That's true, but until now nobody has come up with a better solution.
       The only known alternative is manual translation, with all the
       maintenance issues.

   What about the other translation tools for documentation using gettext?
       As far as I know, there are only two of them:

       poxml
           This is the tool developed by KDE people to handle DocBook XML.
           AFAIK, it was the first program to extract strings to translate
           from documentation to PO files, and inject them back after
           translation.

           It can only handle XML, and only a particular DTD. I'm quite
           unhappy with the handling of lists, which end up in one big msgid.
           When the list becomes big, the chunk becomes harder to swallow.

       po-debiandoc
           This program written by Denis Barbier is a sort of precursor of
           the po4a SGML module, which more or less deprecates it. As the
           name says, it handles only the DebianDoc DTD, which is more or
           less a deprecated DTD.

       The main advantages of po4a over them are the ease of extra content
       addition (which is even worse there) and the ability to achieve
       gettextization.

   Educating developers about translation
       When you try to translate documentation or programs, you face three
       kinds of problems: linguistic (not everybody speaks two languages),
       technical (that's why po4a exists) and relational/human. Not all
       developers understand the necessity of translating stuff. Even when
       well-intentioned, they may not know how to ease the work of
       translators. To help with that, po4a comes with a lot of documentation
       which can be referred to.

       Another important point is that each translated file begins with a
       short comment indicating what the file is and how to use it. This
       should help the poor developers flooded with tons of files in
       different languages they hardly speak, and help them deal correctly
       with them.

       In the po4a project, translated documents are not source files anymore.
       Since SGML files are habitually source files, it's an easy mistake.
       That's why all files present this header:

        |       *****************************************************
        |       *           GENERATED FILE, DO NOT EDIT             *
        |       * THIS IS NO SOURCE FILE, BUT RESULT OF COMPILATION *
        |       *****************************************************
        |
        | This file was generated by po4a-translate(1). Do not store it (in VCS,
        | for example), but store the PO file used as source file by po4a-translate.
        |
        | In fact, consider this as a binary, and the PO file as a regular source file:
        | If the PO gets lost, keeping this translation up-to-date will be harder ;)

       Likewise, gettext's regular PO files only need to be copied to the po/
       directory. But this is not the case for the ones manipulated by po4a.
       The major risk here is that a developer erases the existing translation
       of their program with the translation of their documentation. (Both of
       them can't be stored in the same PO file, because the program needs to
       install its translation as an mo file while the documentation only uses
       its translation at compile time). That's why the PO files produced by
       the po-debiandoc module contain the following header:

        #
        #  ADVISES TO DEVELOPERS:
        #    - you do not need to manually edit POT or PO files.
        #    - this file contains the translation of your debconf templates.
        #      Do not replace the translation of your program with this !!
        #        (or your translators will get very upset)
        #
        #  ADVISES TO TRANSLATORS:
        #    If you are not familiar with the PO format, gettext documentation
        #     is worth reading, especially sections dedicated to this format.
        #    For example, run:
        #         info -n '(gettext)PO Files'
        #         info -n '(gettext)Header Entry'
        #
        #    Some information specific to po-debconf are available at
        #            /usr/share/doc/po-debconf/README-trans
        #         or http://www.debian.org/intl/l10n/po-debconf/README-trans
        #

   SUMMARY of the advantages of the gettext based approach
       · The translations are not stored along with the original, which makes
         it possible to detect if translations become out of date.

       · The translations are stored in separate files from each other, which
         prevents translators of different languages from interfering, both
         when submitting their patch and at the file encoding level.

       · It is based internally on gettext (but po4a offers a very simple
         interface so that you don't need to understand the internals to use
         it). That way, we don't have to reinvent the wheel, and because of
         their wide use, we can think that these tools are more or less bug
         free.

       · Nothing changed for the end-user (besides the fact that translations
         will hopefully be better maintained). The resulting documentation
         file distributed is exactly the same.

       · No need for translators to learn a new file syntax and their favorite
         PO file editor (like Emacs' PO mode, Lokalize or Gtranslator) will
         work just fine.

       · gettext offers a simple way to get statistics about what is done,
         what should be reviewed and updated, and what is still to do. Some
         examples can be found at these addresses:

          - http://kv-53.narod.ru/kaider1.png
          - http://www.debian.org/intl/l10n/

       But not everything is rosy, and this approach also has some
       disadvantages we have to deal with.

       · Addenda are... strange at first glance.

       · You can't adapt the translated text to your preferences, like
         splitting a paragraph here, and joining two other ones there. But in
         some sense, if there is an issue with the original, it should be
         reported as a bug anyway.

       · Even with an easy interface, it remains a new tool people have to
         learn.

         One of my dreams would be to somehow integrate po4a into Gtranslator
         or Lokalize. When an SGML file is opened, the strings are
         automatically extracted. When it's saved, a translated SGML file can
         be written to disk. If we manage to do an MS Word (TM) module (or at
         least RTF), professional translators may even use it.


AUTHORS

        Denis Barbier <barbier,linuxfr.org>
        Martin Quinson (mquinson#debian.org)



Po4a Tools                        2014-06-10                           PO4A(7)