PO4A.7(7)                         Po4a Tools                         PO4A.7(7)

NAME

       po4a - framework to translate documentation and other materials

Introduction

       The goal of the po4a (po for anything) project is to ease translations
       (and, more interestingly, the maintenance of translations) using
       gettext tools in areas where they were not expected, such as
       documentation.

Table of contents

       This document is organized as follows:

       1 Why should I use po4a? What is it good for?
           This introductory chapter explains the motivation of the project
           and its philosophy. You should read it first if you are in the
           process of evaluating po4a for your own translations.

       2 How to use po4a?
           This chapter is a sort of reference manual, trying to answer the
           users' questions and to give you a better understanding of the
           whole process. It introduces how to do things with po4a and serves
           as an introduction to the documentation of the specific tools.

           HOWTO begin a new translation?
           HOWTO change the translation back to a documentation file?
           HOWTO update a po4a translation?
           HOWTO convert a pre-existing translation to po4a?
           HOWTO add extra text to translations (like translator's name)?
           HOWTO do all this in one program invocation?

       3 How does it work?
           This chapter gives you a brief overview of the po4a internals, so
           that you may feel more confident about helping us maintain and
           improve it. It may also help you understand why it does not do
           what you expected, and how to solve your problems.

       4 FAQ
           This chapter groups the Frequently Asked Questions. In fact, most
           of the questions so far could be formulated this way: "Why is it
           designed this way, and not that one?" If you think po4a isn't the
           right answer to documentation translation, you should consider
           reading this section. If it does not answer your question, please
           contact us on the <po4a-devel@lists.alioth.debian.org> mailing
           list. We love feedback.

       5 Specific notes about modules
           This chapter presents the specifics of each module from the point
           of view of both the translator and the original author. Read this
           to learn the syntax you will encounter when translating material
           handled by a given module, or the rules you should follow in your
           original document to make translators' lives easier.

           Actually, this section is not really part of this document.
           Instead, it is placed in each module's documentation. This helps
           ensure that the information is up to date by keeping the
           documentation and the code together.

       6 Known bugs and feature requests
           Quite a few already :(

Why should I use po4a? What is it good for?

       I like the idea of open-source software, making it possible for
       everybody to access software and its source code. But being French,
       I'm well aware that licensing is not the only restriction on the
       openness of software: untranslated free software is useless to
       non-English speakers, and we still have some work to do to make it
       available to really everybody out there.

       The perception of this situation among open-source actors has improved
       dramatically in recent years. We, as translators, won the first battle
       and convinced everybody of the importance of translations. But
       unfortunately, that was the easy part. Now we have to do the job and
       actually translate all this stuff.

       Actually, open-source software itself benefits from a rather decent
       level of translation, thanks to the wonderful gettext tool suite. It is
       able to extract the strings to translate from the program, present
       them in a uniform format to translators, and then use the result of
       their work at run time to display translated messages to the user.

       But the situation is rather different when it comes to documentation.
       Too often, the translated documentation is not visible enough (not
       distributed as part of the program), only partial, or not up to date.
       This last situation is by far the worst one. An outdated translation
       can turn out to be worse than no translation at all for users, because
       it describes old program behavior which is not in use anymore.

       The problem to solve

       Translating documentation is not very difficult in itself. The texts
       are far longer than the messages of a program and thus take longer to
       complete, but no technical skill is really needed to do so. The
       difficult part comes when you have to maintain your work. Detecting
       which parts have changed and need to be updated is very difficult,
       error-prone and highly unpleasant. I guess this explains why so many
       of the translated documents out there are outdated.

       The po4a answers

       So, the whole point of po4a is to make documentation translations
       maintainable. The idea is to reuse the gettext methodology in this new
       field. As in gettext, texts are extracted from their original
       locations in order to be presented in a uniform format to the
       translators. The classical gettext tools help them update their work
       when a new release of the original comes out. But unlike in the
       classical gettext model, the translations are then re-injected into
       the structure of the original document so that they can be processed
       and distributed just like the English version.

       Thanks to this, discovering which parts of the document have changed
       and need an update becomes very easy. Another good point is that the
       tools will do almost all the work when the structure of the original
       document gets fundamentally reorganized and some chapters are moved
       around, merged or split. By extracting the text to translate from the
       document structure, po4a also keeps you away from the complexity of
       text formatting and reduces your chances of getting a broken document
       (even if it does not completely prevent it).

       Please also see the FAQ below in this document for a more complete list
       of the advantages and disadvantages of this approach.

       Supported formats

       Currently, this approach has been successfully implemented for several
       kinds of text formatting formats:

       nroff

       The good old manual page format, used by so many programs out there.
       The po4a support is very welcome here since this format is somewhat
       difficult to use and not really friendly to newbies. The
       Locale::Po4a::Man(3pm) module also supports the mdoc format, used by
       the BSD man pages (which are also quite common on Linux).

       pod

       This is Perl's Plain Old Documentation format. The language and its
       extensions are themselves documented that way, as are most existing
       Perl scripts. It makes it easy to keep the documentation close to the
       actual code by embedding both in the same file. It makes the
       programmer's life easier but, unfortunately, not the translator's.

       sgml

       Even if somewhat superseded by XML nowadays, this format is still used
       rather often for documents which are more than a few screens long. It
       allows you to make complete books. Updating the translation of such
       long documents can turn out to be a real nightmare. diff is often
       useless when the original text was re-indented after an update.
       Fortunately, po4a can help you in that process.

       Currently, only the debiandoc and docbook DTDs are supported, but
       adding support for a new one is really easy. It is even possible to use
       po4a on an unknown SGML DTD without changing the code, by providing the
       needed information on the command line. See Locale::Po4a::Sgml(3pm)
       for details.

       TeX / LaTeX

       The LaTeX format is a major documentation format used in the Free
       Software world and for publications. The Locale::Po4a::LaTeX(3pm)
       module was tested with the Python documentation, a book and some
       presentations.

       texinfo

       All the GNU documentation is written in this format (it is even one of
       the requirements for becoming an official GNU project). The support
       for Locale::Po4a::Texinfo(3pm) in po4a is still in its early stages.
       Please report bugs and feature requests.

       xml

       The XML format is a base format for many documentation formats.

       Currently, the docbook DTD is supported by po4a. See
       Locale::Po4a::Docbook(3pm) for details.

       others

       Po4a can also handle some rarer or more specialized formats, such as
       the documentation of compilation options for the 2.4.x kernels or the
       diagrams produced by the dia tool. Adding a new one is often very easy;
       the main task is to come up with a parser for your target format. See
       Locale::Po4a::TransTractor(3pm) for more information about this.

       Unsupported formats

       Unfortunately, po4a still lacks support for several documentation
       formats.

       There is a whole bunch of other formats we would like to support in
       po4a, and not only documentation ones. Indeed, we aim at plugging all
       the "market holes" left by the classical gettext tools. This
       encompasses package descriptions (deb and rpm), package installation
       script questions, package changelogs, and all the specialized file
       formats used by programs, such as game scenarios or wine resource
       files.

How to use po4a?

       This chapter is a sort of reference manual, trying to answer the users'
       questions and to give you a better understanding of the whole process.
       It introduces how to do things with po4a and serves as an introduction
       to the documentation of the specific tools.

       Graphical overview

       The following schema gives an overview of the process of translating
       documentation using po4a. Do not be afraid of its apparent complexity;
       it comes from the fact that the whole process is represented here. Once
       you have converted your project to po4a, only the right part of the
       graphic is relevant. Note that sgml is taken as an example here, but
       the same holds true for all modules. Each part of the picture is
       detailed in the next sections.

         fr.sgml  original.sgml ---->--------+------>----------->-------+
            ⎪         ⎪                      ⎪                          ⎪
            V         V           { update of original }                ⎪
            ⎪         ⎪                      ⎪                          ⎪
            +--<---<--+                      V                          ⎪
            ⎪         ⎪              original.new.sgml----->------->----+
            V         V                      ⎪                          ⎪
         [po4a-gettextize]      +--->---->---+                          ⎪
            ⎪         ⎪         ⎪            V                          ⎪
            ⎪         ⎪         ⎪     [po4a-updatepo]                   ⎪
            ⎪         V         ^            ⎪                          V
            V    original.pot   ⎪            V                          ⎪
            ⎪         ⎪         ⎪          fr.po                        ⎪
            ⎪         ⎪         ⎪         (fuzzy)                       ⎪
            ⎪  { translation }  ⎪            ⎪                          ⎪
            ⎪         ⎪         ^            V                          V
            ⎪         ⎪         ⎪     {manual editing}                  ⎪
            V         V         ⎪            ⎪                          ⎪
            ⎪         ⎪         ⎪            V                          V
            ⎪         ⎪         +--<---    fr.po       addendum   original.sgml
            +---->----+---->------->---> (up-to-date) (optional)  (up-to-date)
                                             ⎪            ⎪             ⎪
                                             v            v             v
                                             +------>-----+------<------+

                                                          v
                                                  [po4a-translate]

                                                          V
                                                       fr.sgml
                                                    (up-to-date)

       On the left part, the conversion of a translation not using po4a to
       this system is shown. At the top of the right part, the action of the
       original author is depicted (updating the documentation). The middle
       of the right part is where the automatic actions of po4a are depicted.
       The new material is extracted and compared against the existing
       translation. Parts which didn't change are found, and the previous
       translation is reused. Parts which were partially modified are also
       connected to the previous translation, but with a specific marker
       indicating that the translation must be updated. The bottom of the
       figure shows how a formatted document is built.
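
       The automatic step in the middle of the figure can be pictured with
       a small Python sketch. It is only an illustration: the merge()
       function and the sample strings are made up for this example, and
       Python's difflib stands in for whatever proximity algorithm gettext
       really uses. Unchanged strings keep their translation, slightly
       modified ones keep it with a fuzzy marker, and new ones come out
       untranslated.

```python
import difflib

def merge(old_po, new_msgids, cutoff=0.7):
    """Return {msgid: (translation, is_fuzzy)} for the updated original.

    old_po maps each previously translated string to its translation.
    """
    merged = {}
    for msgid in new_msgids:
        if msgid in old_po:
            # Unchanged paragraph: the previous translation is reused.
            merged[msgid] = (old_po[msgid], False)
            continue
        # Modified paragraph: look for the closest old string (difflib is
        # an assumption here, not the algorithm gettext implements).
        close = difflib.get_close_matches(msgid, list(old_po), n=1, cutoff=cutoff)
        if close:
            merged[msgid] = (old_po[close[0]], True)   # marked fuzzy
        else:
            merged[msgid] = ("", False)                # new, untranslated
    return merged

old_po = {
    "The cat sleeps on the sofa.": "Le chat dort sur le sofa.",
    "Press any key to continue.": "Appuyez sur une touche pour continuer.",
}
new_msgids = [
    "The cat sleeps on the sofa.",
    "Press any key to continue...",
    "Dogs bark.",
]
result = merge(old_po, new_msgids)
for msgid, (translation, is_fuzzy) in result.items():
    print(repr(msgid), "->", repr(translation), "(fuzzy)" if is_fuzzy else "")
```

       The fuzzy entries are exactly the ones that end up in the {manual
       editing} box of the figure.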

       Actually, as a translator, the only manual operation you have to do is
       the part marked {manual editing}. Yeah, I'm sorry, but po4a helps you
       translate. It does not translate anything for you...

       HOWTO begin a new translation?

       This section presents the steps required to begin a new translation
       with po4a. The refinements involved in converting an existing project
       to this system are detailed in the relevant section.

       To begin a new translation using po4a, you have to do the following
       steps:

       - Extract the text which has to be translated from the original
         document into a new pot file (the gettext format). For that, use the
         po4a-gettextize program this way:

           $ po4a-gettextize -f <format> -m <master.doc> -p <translation.pot>

         <format> is naturally the format used in the <master.doc> document.
         As expected, the output goes into <translation.pot>. Please refer to
         po4a-gettextize(1) for more details about the existing options.

       - Actually translate what should be translated. For that, you have to
         rename the pot file, for example to doc.XX.po (where XX is the
         ISO 639 code of the language you are translating to, e.g. "fr" for
         French), and edit the resulting file. It is often a good idea not to
         name the file XX.po, to avoid confusion with the translation of the
         program messages, but this is your call. Don't forget to update the
         po file headers; they are important.

         The actual translation can be done using the Emacs po mode, kbabel
         (KDE based), gtranslator (GNOME based), or whichever program you
         prefer. A good ol' vi could do the trick too, even if there is no
         specialized mode for this task.

         If you wish to learn more about this, you definitely need to refer
         to the gettext documentation, available in the gettext-doc package.

       HOWTO change the translation back to a documentation file?

       Once you're done with the translation, you want to get the translated
       documentation and distribute it to users along with the original one.
       For that, use the po4a-translate(1) program like this (where XX is the
       language code):

         $ po4a-translate -f <format> -m <master.doc> -p <doc.XX.po> -l <XX.doc>

       As before, <format> is the format used in the <master.doc> document.
       But this time, the po file provided with the -p flag is part of the
       input. This is your translation. The output goes into <XX.doc>.

       Please refer to po4a-translate(1) for more details.

       HOWTO update a po4a translation?

       To update your translation when the original file has changed, use the
       po4a-updatepo(1) program like this:

         $ po4a-updatepo -f <format> -m <new_original.doc> -p <existing.XX.po>

       (Please refer to po4a-updatepo(1) for more details.)

       Naturally, the new paragraphs in the document won't get magically
       translated in the "po" file with this operation, and you'll need to
       update the "po" file manually. Likewise, you may have to rework the
       translation of paragraphs which were modified a bit. To make sure you
       won't miss any of them, they are marked as "fuzzy" during the process,
       and you have to remove this marker before the translation can be used
       by po4a-translate. As for the initial translation, it is best to use
       your favorite po editor here.
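
       To see how much work is left after an update, you can count the
       entries still marked fuzzy or left untranslated. The following
       Python sketch is a deliberately naive reader for the text of a po
       file, not a replacement for a real po editor. The po_status() helper
       and the sample entries are invented for this illustration, and the
       sketch assumes entries are separated by blank lines and flagged with
       the usual "#, fuzzy" comment.

```python
def _field(lines, keyword):
    """Join the quoted pieces of one po field (keyword line + continuations)."""
    parts, inside = [], False
    for line in lines:
        if line.startswith(keyword + " "):
            inside = True
            parts.append(line[len(keyword) + 1:].strip()[1:-1])
        elif inside and line.startswith('"'):
            parts.append(line[1:-1])
        elif inside:
            break
    return "".join(parts)

def po_status(po_text):
    """Count fuzzy and untranslated entries in the text of a po file."""
    status = {"fuzzy": 0, "untranslated": 0}
    for block in po_text.strip().split("\n\n"):
        lines = [line.strip() for line in block.splitlines()]
        if _field(lines, "msgid") == "":
            continue  # the header entry (empty msgid) is not a translation
        if any(l.startswith("#,") and "fuzzy" in l for l in lines):
            status["fuzzy"] += 1
        elif _field(lines, "msgstr") == "":
            status["untranslated"] += 1
    return status

SAMPLE = '''msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

#: doc.sgml:12
msgid "Hello"
msgstr "Bonjour"

#: doc.sgml:20
#, fuzzy
msgid "Press any key..."
msgstr "Appuyez sur une touche."

#: doc.sgml:31
msgid "New paragraph"
msgstr ""
'''
status = po_status(SAMPLE)
print(status)
```

       Both counters should be back to zero before you generate the
       translated document.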

       Once your "po" file is up to date again, without any untranslated or
       fuzzy strings left, you can generate a translated documentation file,
       as explained in the previous section.
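
       The three commands seen so far form one cycle. As a recap, here is a
       small Python sketch that only builds and prints the command lines; it
       uses nothing but the -f, -m, -p and -l options shown above. The file
       names (manual.sgml and friends) are hypothetical, and the sketch does
       not run po4a itself.

```python
import shlex

def gettextize_cmd(fmt, master, pot):
    """Command extracting the translatable text of master into a pot file."""
    return ["po4a-gettextize", "-f", fmt, "-m", master, "-p", pot]

def updatepo_cmd(fmt, new_master, po):
    """Command synchronizing an existing po file with an updated original."""
    return ["po4a-updatepo", "-f", fmt, "-m", new_master, "-p", po]

def translate_cmd(fmt, master, po, localized):
    """Command building the translated document from master and po."""
    return ["po4a-translate", "-f", fmt, "-m", master, "-p", po, "-l", localized]

def render(cmd):
    """Format an argument list as a shell command line."""
    return " ".join(shlex.quote(arg) for arg in cmd)

# Hypothetical project: an sgml manual translated into French.
steps = [
    gettextize_cmd("sgml", "manual.sgml", "manual.pot"),
    updatepo_cmd("sgml", "manual.sgml", "manual.fr.po"),
    translate_cmd("sgml", "manual.sgml", "manual.fr.po", "manual.fr.sgml"),
]
for cmd in steps:
    # Each list could be handed to subprocess.run() where po4a is installed.
    print("$", render(cmd))
```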

       HOWTO convert a pre-existing translation to po4a?

       Often, you used to translate the document manually and happily until a
       major reorganization of the original document happened. Then, after
       some unpleasant attempts with diff or similar tools, you want to
       convert to po4a. But of course, you don't want to lose your existing
       translation in the process. Don't worry, this case is also handled by
       the po4a tools and is called gettextization.

       The key here is to have the same structure in the translated document
       and in the original one, so that the tools can match the content
       accordingly.

       If you are lucky (i.e., if the structures of both documents match
       perfectly), it will work seamlessly and you will be set in a few
       seconds. Otherwise, you may understand why this process has such an
       ugly name, and you'd better be prepared for some grunt work here. In
       any case, remember that it is the price to pay to get the comfort of
       po4a afterward. And the good point is that you have to do so only once.

       I cannot emphasize this too much: in order to ease the process, it is
       important that you find the exact version of the original which was
       used to do the translation. The best situation is when you noted down
       the cvs revision used for the translation and you didn't modify it in
       the translation process, so that you can use it.

       It won't work well when you use the updated original text with the old
       translation. It remains possible, but is harder and really should be
       avoided if possible. In fact, I guess that if you fail to find the
       original text again, the best solution is to find someone to do the
       gettextization for you (but, please, not me ;).

       Maybe I'm being too dramatic here. Even when things go wrong, it
       remains way faster than translating everything again. I was able to
       gettextize the existing French translation of the Perl documentation
       in one day, even though things did go wrong. That was more than two
       megabytes of text, and a new translation would have taken months or
       more.

       Let me explain the basis of the procedure first; I will come back to
       hints on how to achieve it when the process goes wrong. To ease
       comprehension, the sgml module is taken as an example once again, but
       the format used doesn't really matter.

       Once you have the old original again, the gettextization may be as easy
       as:

         $ po4a-gettextize -f <format> -m <old.original> -l <old.translation> -p <doc.XX.po>

       When you're lucky, that's it. You converted your old translation to
       po4a and can begin with the updating task right away. Just follow the
       procedure explained a few sections ago to synchronize your po file with
       the newest original document, and update the translation accordingly.

       Please note that even when things seem to work properly, there is still
       room for errors in this process. The point is that po4a is unable to
       understand the text to make sure that the translation matches the
       original. That's why all strings are marked as "fuzzy" in the process.
       You should check each of them carefully before removing those markers.

       Often the document structures don't match exactly, preventing
       po4a-gettextize from doing its job properly. At that point, the whole
       game is about editing the files to get their damn structures to match.

       It may help to read the section "Gettextization: how does it work?"
       below. Understanding the internal process will help you make this
       work. The good point is that po4a-gettextize is rather verbose about
       what went wrong when it happens. First, it pinpoints where in the
       documents the structural discrepancies are. You will learn the strings
       that don't match, their positions in the text, and the type of each of
       them. Moreover, the po file generated so far will be dumped to
       /tmp/gettextization.failed.po.
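
       The kind of check described above can be pictured with a few lines of
       Python. This is a simplified model written for this explanation, not
       po4a's actual code: the pair_paragraphs() function and the paragraph
       types "title" and "para" are assumptions. Paragraphs of both
       documents are paired by position, and the pairing stops at the first
       place where the types disagree, keeping what was paired so far.

```python
def pair_paragraphs(original, translation):
    """Pair the paragraphs of two documents by position.

    Each document is a list of (type, text) tuples in document order.
    Returns (pairs, problem): the pairs built so far and a message
    describing the first discrepancy, or None if everything matched.
    """
    pairs = []
    for index, (orig, trans) in enumerate(zip(original, translation), start=1):
        if orig[0] != trans[0]:
            problem = "paragraph %d: original is a %s, translation is a %s" % (
                index, orig[0], trans[0])
            return pairs, problem
        pairs.append((orig[1], trans[1]))
    if len(original) != len(translation):
        return pairs, "documents have %d and %d paragraphs" % (
            len(original), len(translation))
    return pairs, None

original = [("title", "Introduction"), ("para", "Hello."), ("para", "Bye.")]
# The translator added a section of thanks that the original does not have.
translation = [("title", "Introduction"), ("para", "Bonjour."),
               ("title", "Remerciements"), ("para", "Au revoir.")]

pairs, problem = pair_paragraphs(original, translation)
print(pairs)
print(problem)
```

       Here the extra title in the translation is what breaks the pairing;
       removing it lets the remaining paragraphs line up again.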

       Here are some tips to get the structures to match:

       -   Remove all extra parts of the translation, such as the section in
           which you give the translator's name and thank everyone who
           contributed to the translation. Addenda, which are described in
           the next section, will allow you to re-add them afterward.

       -   Do not hesitate to edit both the original and the translation. The
           most important thing is to get the po file. You will be able to
           update it afterward. That being said, editing the translation
           should be preferred when both are possible, since it makes things
           easier once the gettextization is done.

       -   If needed, kill some parts of the original if they happen not to
           be translated. When you synchronize the po with the document
           afterward, they will come back by themselves.

       -   If you changed the structure a bit (to merge two paragraphs, or
           split another one), undo those changes. If there are issues in the
           original, you should inform the original author. Fixing them in
           your translation only fixes them for a part of the community.
           Moreover, it's impossible when using po4a ;)

       -   Sometimes, the paragraph contents do match, but their types don't.
           Fixing it is rather format-dependent. In pod and nroff, it often
           comes from the fact that one of the two contains a line beginning
           with a white space where the other doesn't. In those formats, such
           a paragraph cannot be wrapped and thus becomes a different type.
           Just remove the space and you are fine. It may also be a typo in
           the tag name.

           Likewise, two paragraphs may get merged together in pod when the
           separating line contains some spaces, or when there is no empty
           line between the =item line and the content of the item.

       -   Sometimes, there is a desynchronization between the files, and the
           translation is attached to the wrong original paragraph. It is the
           sign that the real problem was earlier in the files. Check
           /tmp/gettextization.failed.po to see where the desynchronization
           begins, and fix it there.

       -   Sometimes, you get the strong feeling that po4a ate some parts of
           the text, either the original or the translation.
           /tmp/gettextization.failed.po indicates that both of them were
           matching nicely, and then the gettextization fails because it
           tried to match one paragraph with the one after (or before) the
           right one, as if the right one had disappeared. Curse po4a as I
           did when it first happened to me. Generously.

           This unfortunate situation happens when the same paragraph is
           repeated in the document. In that case, no new entry is created
           in the po file; a new reference is added to the existing one
           instead.

           So, when the same paragraph appears twice in the original but is
           not translated in the exact same way each time, you will get the
           feeling that a paragraph of the original disappeared. Just kill
           the new translation. If you prefer to kill the first translation
           instead because the second one was actually better, remove the
           second one from where it is and put it in place of the first one.

           On the contrary, if two similar but different paragraphs were
           translated in the exact same way, you will get the feeling that a
           paragraph of the translation disappeared. A solution is to add a
           stupid string to the original paragraph (such as "I'm different").
           Don't be afraid, those things will disappear during the
           synchronization, and when the added text is short enough, gettext
           will match your translation to the existing text (marking it as
           fuzzy, but you don't really care since all strings are fuzzy after
           gettextization).
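
       The "eaten paragraph" effect from the last tip is easy to reproduce
       with a toy model in Python. The extract() function below is made up
       for this illustration: it keeps one entry per distinct paragraph and
       records every position where it occurs, the way a repeated paragraph
       only adds a reference to an existing po entry.

```python
def extract(paragraphs):
    """Map each distinct paragraph to the list of positions where it occurs."""
    entries = {}
    for position, text in enumerate(paragraphs, start=1):
        # A repeated paragraph adds a reference, not a new entry.
        entries.setdefault(text, []).append(position)
    return entries

original = ["See the manual.", "Install it.", "See the manual."]
# The repeated sentence was translated in two different ways.
translation = ["Voir le manuel.", "Installez-le.", "Consultez le manuel."]

orig_entries = extract(original)
trans_entries = extract(translation)
print(len(orig_entries), "entries in the original:", orig_entries)
print(len(trans_entries), "entries in the translation:", trans_entries)
```

       The original yields two entries and the translation three, so a
       pairing by position runs out of original entries: it looks as if a
       paragraph of the original had disappeared.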

       Hopefully, those tips will help you make your gettextization work and
       obtain your precious po file. You are now ready to synchronize your
       file and begin your translation. Please note that on large texts, the
       first synchronization may take a long time.

       For example, the first po4a-updatepo of the Perl documentation's French
       translation (5.5 MB po file) took about two full days on a 1 GHz G5
       computer. Yes, 48 hours. But the subsequent ones only take a dozen
       seconds on my old laptop. This is because the first time, most of the
       msgids of the po file don't match any of those in the pot file. This
       forces gettext to search for the closest one using a costly string
       proximity algorithm.

       HOWTO add extra text to translations (like translator's name)?

       Because of the gettext approach, doing this becomes more difficult in
       po4a than it was when simply editing a new file alongside the original
       one. But it remains possible, thanks to the so-called addenda.

       It may help comprehension to consider addenda as a sort of patch
       applied to the localized document after processing. They are rather
       different from usual patches (they have only one line of context,
       which can embed a Perl regular expression, and they can only add new
       text without removing any), but the functionality is the same.

       Their goal is to allow the translator to add extra content to the
       document which is not translated from the original document. The most
       common usage is to add a section about the translation itself, listing
       contributors and explaining how to report bugs against the translation.

       An addendum must be provided as a separate file. The first line
       constitutes a header indicating where in the produced document it
       should be placed. The rest of the addendum file will be added verbatim
       at the determined position in the resulting document.

       The header has a pretty rigid syntax: it must begin with the string
       "PO4A-HEADER:", followed by a semicolon (;) separated list of
       "key=value" fields. White spaces ARE important. Note that you cannot
       use the semicolon character (;) in the value, and that quoting it
       doesn't help.
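
       To make the syntax concrete, here is a small Python sketch that
       splits such a header line into its fields. It is only a reading aid
       built from the rules stated in this document, not po4a's own parser.
       The parse_header() name is invented, and the handling of white space
       (spaces around the key are dropped, the value is kept verbatim) is an
       assumption of this sketch.

```python
PREFIX = "PO4A-HEADER:"

def parse_header(line):
    """Split an addendum header line into a {key: value} dictionary."""
    if not line.startswith(PREFIX):
        raise ValueError("header must begin with " + PREFIX)
    fields = {}
    # A semicolon always separates fields: it cannot appear in a value.
    for part in line[len(PREFIX):].split(";"):
        if not part.strip():
            continue
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError("not a key=value field: %r" % part)
        # Assumption: spaces around the key are ignored, the value is
        # kept exactly as written (white space matters there).
        fields[key.strip()] = value
    for key in ("position", "mode"):
        if key not in fields:
            raise ValueError("missing mandatory field: " + key)
    if fields["mode"] not in ("before", "after"):
        raise ValueError("mode must be 'before' or 'after'")
    return fields

header = parse_header(
    "PO4A-HEADER: mode=after; position=About this document; endboundary=</section>")
print(header)
```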
517
518       Again, it sounds scary, but the examples given below should help you to
519       find how to write the header line you need. To illustrate the discus‐
520       sion, assume we want to add a section called "About this translation"
521       after the "About this document" one.
522
523       Here are the possible header keys:
524
525       position (mandatory)
526           a regexp. The addendum will be placed near the line matching this
527           regexp.  Note that we're speaking about the translated document
528           here, not the original. If more than a line match this expression
529           (or none), the addition will fail. It is indeed better to report an
530           error than inserting the addendum at the wrong location.
531
532           This line is called position point in the following. The point
533           where the addendum is added is called insertion point. Those two
534           points are near one from another, but not equal. For example, if
535           you want to insert a new section, it is easier to put the position
536           point on the title of the preceding section and explain po4a where
537           the section ends (remember that position point is given by a regexp
538           which should match a unique line).
539
540           The localization of the insertion point with regard to the position
541           point is controlled by the "mode", "beginboundary" and "endbound‐
542           ary" fields, as explained below.
543
544           In our case, we would have:
545
546                position=<title>About this document</title>
547
548       mode (mandatory)
549           It can be either the string "before" or "after", specifying the
550           position of the addendum, relative to the position point.
551
552           Since we want the new section to be placed below the one we are
553           matching, we have:
554
555                mode=after
556
557       beginboundary (used only when mode=after, and mandatory in that case)
558       endboundary (idem)
559           regexp matching the end of the section after which the addendum
560           goes.
561
562           When mode=after, the insertion point is after the position point,
563           but not directly after! It is placed at the end of the section
564           beginning at the position point, ie after or before the line
565           matched by the "???boundary" argument, depending on whether you
566           used "beginboundary" or "endboundary".
567
568           In our case, we can choose to indicate the end of the section we
569           match by adding:
570
571              endboundary=</section>
572
573           or to indicate the beginning of the next section by indicating:
574
575              beginboundary=<section>
576
577           In both cases, our addendum will be placed after the </section> and
578           before the <section>. The first one is better since it will work
579           even if the document gets reorganized.
580
581           Both forms exist because documentation formats are different. In
582           some of them, there is a way to mark the end of a section (just
583           like the "</section>" we just used), while others don't explicitly
584           mark the end of a section (as in nroff). In the former case, you
585           want a boundary matching the end of a section, so that the
586           insertion point comes after it. In the latter case, you want a
587           boundary matching the beginning of the next section, so that the
588           insertion point comes just before it.
589
590       This can seem obscure, but hopefully, the next examples will enlighten
591       you.
592
593       To sum up the example we used so far, in order to add a section called
594       "About this translation" after the "About this document" one in an SGML
595       document, you can use either of these header lines:
596          PO4A-HEADER: mode=after; position=About this document; endboundary=</section>
597          PO4A-HEADER: mode=after; position=About this document; beginboundary=<section>
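
       As a rough illustration of how such a header line breaks down into
       fields, here is a minimal Perl sketch. The parse_header() helper is
       hypothetical (it is not po4a's actual parser): it simply assumes that
       fields are "name=value" pairs separated by semicolons, that
       surrounding blanks can be trimmed, and that "position" is mandatory.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT po4a's real parser. It assumes that fields are
# "name=value" pairs separated by semicolons, as in the header lines above.
sub parse_header {
    my ($line) = @_;
    $line =~ s/^\s*PO4A-HEADER:\s*//
        or die "Not a PO4A-HEADER line: $line\n";

    my %field;
    foreach my $part (split /;/, $line) {
        # Split on the first "=" only, since a regexp may contain "=".
        my ($name, $value) = split /=/, $part, 2;
        next unless defined $value;
        s/^\s+|\s+$//g for $name, $value;    # trim surrounding blanks (assumption)
        $field{$name} = $value;
    }

    # Checks derived from the field descriptions; "position" is assumed
    # to be mandatory as well.
    die "mode must be 'before' or 'after'\n"
        unless defined $field{mode} && $field{mode} =~ /^(?:before|after)$/;
    die "position is missing\n" unless defined $field{position};
    if ($field{mode} eq 'after') {
        die "mode=after needs beginboundary or endboundary\n"
            unless defined $field{beginboundary} || defined $field{endboundary};
    }
    return \%field;
}

my $header = parse_header(
    'PO4A-HEADER: mode=after; position=About this document; endboundary=</section>');
print "$_ = $header->{$_}\n" for sort keys %$header;
```

       Splitting on the first "=" only is a deliberate choice here, so that
       a regexp containing "=" (such as "^=head") survives intact.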
598
599       If you want to add something after the following nroff section:
600           .SH "AUTHORS"
601
602         you should put a "position" matching this line, and a "beginboundary"
603         matching the beginning of the next section (i.e. "^\.SH"). The addendum
604         will then be added after the position point and immediately before
605         the first line matching the "beginboundary". That is to say:
606
607          PO4A-HEADER:mode=after;position=AUTHORS;beginboundary=\.SH
608
609       If you want to add something into a section (like after "Copyright Big
610       Dude") instead of adding a whole section, give a "position" matching
611       this line, and give a "beginboundary" matching any line.
612          PO4A-HEADER:mode=after;position=Copyright Big Dude, 2004;beginboundary=^
613
614       If you want to add something at the end of the document, give a
615       "position" matching any line of your document (but only one line: po4a
616       won't proceed if it's not unique), and give an "endboundary" matching
617       nothing. Don't use simple strings like "EOF" here; prefer strings that
618       are unlikely to appear in your document.
619          PO4A-HEADER:mode=after;position=<title>About</title>;beginboundary=FakePo4aBoundary
620
621       In any case, remember that these are regexps. For example, if you want
622       to match the end of an nroff section ending with the line
623
624         .fi
625
626       don't use ".fi" as endboundary, because it will match "the[ fi]le",
627       which is obviously not what you expect. The correct endboundary in
628       that case is: "^\.fi$".
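
       The difference is easy to check with a few lines of Perl. This is
       only a sketch: matching_lines() is a hypothetical helper and the
       four-line document is made up for the occasion.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines of a document matched by a regexp.
sub matching_lines {
    my ($regexp, @lines) = @_;
    return grep { /$regexp/ } @lines;
}

# Made-up nroff-like document, one element per line.
my @doc = ('.nf', 'the file is here', '.fi', '.SH "AUTHORS"');

my @loose  = matching_lines('.fi',    @doc);   # "." matches any character
my @strict = matching_lines('^\.fi$', @doc);   # literal dot, whole line

print "loose:  ", scalar(@loose),  " line(s)\n";
print "strict: ", scalar(@strict), " line(s)\n";
```

       The unanchored pattern matches two lines here (the real ".fi" and
       "the file is here"), while the anchored one matches only the ".fi"
       line.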
629
630       If the addendum doesn't go where you expected, try passing the -vv
631       argument to the tools, so that they explain what they do while placing
632       the addendum.
633
634       More detailed example
635
636       Original document (pod formatted):
637
638        ⎪=head1 NAME
639
640        ⎪dummy - a dummy program
641
642        ⎪=head1 AUTHOR
643
644        ⎪me
645
646       Then, the following addendum will ensure that a section (in French)
647       about the translator is added at the end of the file. (in French, "TRA‐
648       DUCTEUR" means "TRANSLATOR", and "moi" means "me")
649
650        ⎪PO4A-HEADER:mode=after;position=AUTEUR;beginboundary=^=head
651
652        ⎪=head1 TRADUCTEUR
653
654        ⎪moi
655
656       In order to put your addendum before the AUTHOR section, use the
657       following header:
658
659        PO4A-HEADER:mode=after;position=NOM;beginboundary=^=head1
660
661       This works because the next line matching the beginboundary /^=head1/
662       after the section "NAME" (translated to "NOM" in French), is the one
663       declaring the authors. So, the addendum will be put between both sec‐
664       tions.
665
666       HOWTO do all this in one program invocation?
667
668       The use of po4a proved to be a bit error-prone for users, since you
669       have to call two different programs in the right order (po4a-updatepo
670       and then po4a-translate), each of them needing more than 3 arguments.
671       Moreover, it was difficult with this system to use only one po file for
672       all your documents when more than one format was used.
673
674       The po4a(1) program was designed to solve those difficulties. Once your
675       project is converted to the system, you write a simple configuration
676       file explaining where your translation files are (po and pot), where
677       the original documents are, their formats and where their translations
678       should be placed.
679
680       Then, calling po4a(1) on this file ensures that the po files are
681       synchronized against the original document, and that the translated
682       documents are generated properly. Of course, you will want to call this
683       program twice: once before editing the po files to update them, and
684       once afterward to get completely updated translated documents. But you
685       only need to remember one command line.
686

How does it work?

688       This chapter gives you a brief overview of the po4a internals, so that
689       you may feel more confident to help us maintain and improve it. It may
690       also help you understand why it does not do what you expected, and how
691       to solve your problems.
692
693       What's the big picture here?
694
695       The po4a architecture is object oriented (in Perl. Isn't that neat?).
696       The common ancestor of all parser classes is called TransTractor. This
697       strange name comes from the fact that it is at the same time in charge
698       of translating documents and extracting strings.
699
700       More formally, it takes as input a document to translate plus a po file
701       containing the translations to use, and produces two separate outputs:
702       another po file (resulting from the extraction of translatable strings
703       from the input document), and a translated document (with the same
704       structure as the input one, but with all translatable strings replaced
705       with the content of the input po). Here is a graphical representation
706       of this:
707
708          Input document --\                             /---> Output document
709                            \      TransTractor::       /       (translated)
710                             +-->--   parse()  --------+
711                            /                           \
712          Input po --------/                             \---> Output po
713                                                                (extracted)
714
715       This little bone is the core of all the po4a architecture. If you omit
716       the input po and the output document, you get po4a-gettextize. If you
717       provide both inputs and disregard the output po, you get po4a-translate.
718
719       TransTractor::parse() is a virtual function implemented by each module.
720       Here is a little example to show you how it works. It parses a list of
721       paragraphs, each of them beginning with <p>.
722
723         1 sub parse {
724         2   PARAGRAPH: while (1) {
725         3     my ($paragraph,$pararef,$line,$lref)=("","","","");
726         4     my $first=1;
727         5     while ((($line,$lref)=$document->shiftline()) && defined($line)) {
728         6       if ($line =~ m/<p>/ && !$first--) {
729         7         $document->unshiftline($line,$lref);
730         8
731         9         $paragraph =~ s/^<p>//s;
732        10         $document->pushline("<p>".$document->translate($paragraph,$pararef));
733        11
734        12         next PARAGRAPH;
735        13       } else {
736        14         $paragraph .= $line;
737        15         $pararef = $lref unless(length($pararef));
738        16       }
739        17     }
740        18     return; # Did not get a defined line? End of input file.
741        19   }
742        20 }
743
744       On line 6, we encounter <p> for the second time. That's the signal of
745       the next paragraph. We should thus put the just obtained line back into
746       the original document (line 7) and push the paragraph built so far into
747       the outputs. After removing the leading <p> of it on line 9, we push
748       the concatenation of this tag with the translation of the rest of the
749       paragraph.
750
751       This translate() function is very cool. It pushes its argument into the
752       output po file (extraction) and returns its translation as found in the
753       input po file (translation). Since it's used as part of the argument of
754       pushline(), this translation lands in the output document.
755
756       Isn't that cool? It is possible to build a complete po4a module in less
757       than 20 lines when the format is simple enough...
758
759       You can learn more about this in Locale::Po4a::TransTractor(3pm).
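
       To make the double role of translate() more concrete, here is a toy
       sketch in plain Perl. It does not use the real TransTractor class:
       the input po is assumed to be a simple hash, both outputs are simple
       arrays, and falling back to the original text when no translation is
       available is an assumption of this sketch.

```perl
use strict;
use warnings;

# Toy data, not the real TransTractor: the input po is a plain hash and
# both outputs are plain arrays.
my %po_in = ("Hello world.\n" => "Bonjour le monde.\n");
my (@po_out, @doc_out);

# Pushes the string into the output po (extraction) AND returns its
# translation from the input po (translation). Returning the original
# text when no translation exists is an assumption of this sketch.
sub translate {
    my ($string) = @_;
    push @po_out, $string;
    return exists $po_in{$string} ? $po_in{$string} : $string;
}

# Roughly what the pushline() call of the parse() example does for each
# paragraph: the tag is kept, only the text goes through translate().
push @doc_out, "<p>" . translate("Hello world.\n");
push @doc_out, "<p>" . translate("Not translated yet.\n");

print "output document: $_" for @doc_out;
print "output po entry: $_" for @po_out;
```

       With an empty %po_in, the same code only extracts strings; if
       @po_out is ignored, it only translates. This mirrors the two uses of
       the little bone described above.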
760
761       Gettextization: how does it work?
762
763       The idea here is to take the original document and its translation, and
764       to say that the Nth extracted string from the translation is the trans‐
765       lation of the Nth extracted string from the original. In order to work,
766       both files must share exactly the same structure. For example, if the
767       files have the following structure, it is very unlikely that the 4th
768       string in translation (of type 'chapter') is the translation of the 4th
769       string in original (of type 'paragraph').
770
771           Original         Translation
772
773         chapter            chapter
774           paragraph          paragraph
775           paragraph          paragraph
776           paragraph        chapter
777         chapter              paragraph
778           paragraph          paragraph
779
780       For that, po4a parsers are used on both the original and the transla‐
781       tion files to extract po files, and then a third po file is built from
782       them taking strings from the second as translation of strings from the
783       first. In order to check that the strings we put together are actually
784       the translations of each other, document parsers in po4a should put
785       information about the syntactical type of extracted strings in the doc‐
786       ument (all existing ones do so, yours should also). Then, this informa‐
787       tion is used to make sure that both documents have the same syntax. In
788       the previous example, it would allow us to detect that string 4 is a
789       paragraph in one case, and a chapter title in another case and to
790       report the problem.
791
792       In theory, it would be possible to detect the problem and resynchronize
793       the files afterward (just like diff does). But what to do with the few
794       strings before the desynchronization is not clear, and it would
795       sometimes produce bad results. That's why the current implementation
796       doesn't try to resynchronize anything and fails verbosely when something
797       goes wrong, requiring manual modification of files to fix the problem.
798
799       Even with these precautions, things can go wrong very easily here.
800       That's why all translations guessed this way are marked fuzzy, to make
801       sure that the translator reviews and checks them.
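
       The pairing logic can be sketched in a few lines of Perl. This is
       not the actual po4a-gettextize code: the gettextize() function and
       the [type, text] layout of extracted strings are assumptions made
       for the illustration.

```perl
use strict;
use warnings;

# Each extracted string is represented as [type, text] (hypothetical layout).
# Pairs the Nth string of the original with the Nth string of the
# translation, and dies as soon as the structures differ.
sub gettextize {
    my ($orig, $trans) = @_;
    my $count = @$orig > @$trans ? @$orig : @$trans;
    my @po;
    for my $i (0 .. $count - 1) {
        my ($o, $t) = ($orig->[$i], $trans->[$i]);
        die sprintf("String %d exists in only one of the two documents\n", $i + 1)
            unless defined $o && defined $t;
        die sprintf("String %d is a %s in the original but a %s in the translation\n",
                    $i + 1, $o->[0], $t->[0])
            unless $o->[0] eq $t->[0];
        # Every guessed pair is marked fuzzy so that a translator reviews it.
        push @po, { msgid => $o->[1], msgstr => $t->[1], fuzzy => 1 };
    }
    return @po;
}

# The two structures of the example above.
my @original    = map { [$_, "text"] }
    qw(chapter paragraph paragraph paragraph chapter paragraph);
my @translation = map { [$_, "texte"] }
    qw(chapter paragraph paragraph chapter paragraph paragraph);

my @po = eval { gettextize(\@original, \@translation) };
print "Gettextization failed: $@" if $@;
```

       On these two structures the sketch stops at string 4, which is
       where the types first differ, instead of trying to resynchronize.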
802
803       Addendum: How does it work?
804
805       Well, that's pretty easy here. The translated document is not written
806       directly to disk, but kept in memory until all the addenda are applied.
807       The algorithms involved here are rather straightforward. We look for a
808       line matching the position regexp, and insert the addendum before it if
809       we're in mode=before. If not, we search for the next line matching the
810       boundary and insert the addendum after this line if it's an "endbound‐
811       ary" or before this line if it's a "beginboundary".
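
       Here is a minimal Perl sketch of these placement rules. It is
       written from the description above and is not po4a's actual code:
       apply_addendum() is a hypothetical name, the document is assumed to
       be a list of lines, and the French text of the sample document is
       made up. As suggested by the earlier examples, the boundary is
       searched strictly after the position line, and the addendum goes to
       the end of the document when no line matches the boundary.

```perl
use strict;
use warnings;

# Sketch of the placement rules, not po4a's actual code. $doc and
# $addendum are array refs of lines; %opt holds the header fields.
sub apply_addendum {
    my ($doc, $addendum, %opt) = @_;

    # The position regexp must match exactly one line.
    my @hits = grep { $doc->[$_] =~ /$opt{position}/ } 0 .. $#$doc;
    die "position must match exactly one line\n" unless @hits == 1;
    my $at = $hits[0];    # mode=before: insert just before the position line

    if ($opt{mode} eq 'after') {
        my $is_end   = defined $opt{endboundary};
        my $boundary = $is_end ? $opt{endboundary} : $opt{beginboundary};
        die "mode=after needs a boundary\n" unless defined $boundary;

        # Look for the boundary strictly after the position line.
        my $i = $hits[0] + 1;
        $i++ while $i < @$doc && $doc->[$i] !~ /$boundary/;

        # endboundary: insert after the matched line; beginboundary: before
        # it. Without any match, $i is the end of the document.
        $at = ($i < @$doc && $is_end) ? $i + 1 : $i;
    }

    my @result = @$doc;
    splice @result, $at, 0, @$addendum;
    return @result;
}

# Made-up translated document, one element per line.
my @doc = ('=head1 NOM', '', 'dummy - un programme factice', '',
           '=head1 AUTEUR', '', 'moi');
my @addendum = ('', '=head1 TRADUCTEUR', '', 'moi');

my @out = apply_addendum(\@doc, \@addendum,
    mode => 'after', position => 'AUTEUR', beginboundary => '^=head');
print "$_\n" for @out;
```

       With position=NOM and beginboundary=^=head1 instead, the same
       function places the addendum between the two existing sections.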
812

FAQ

814       This chapter groups the Frequently Asked Questions. In fact, most of
815       the questions for now could be formulated that way: "Why is it designed
816       this way, and not that one?" If you think po4a isn't the right answer
817       to documentation translation, you should consider reading this section.
818       If it does not answer your question, please contact us on the
819       <po4a-devel@lists.alioth.debian.org> mailing list. We love feedback.
820
821       Why translate each paragraph separately?
822
823       Yes, in po4a, each paragraph is translated separately (in fact, each
824       module decides this, but all existing modules do so, and yours should
825       also).  There are two main advantages to this approach:
826
827       · When the technical parts of the document are hidden from the scene,
828         the translator can't mess with them. The fewer markers we present to
829         the translator, the fewer errors they can make.
830
831       · Cutting the document helps in isolating the changes to the original
832         document. When the original is modified, finding what parts of the
833         translation need to be updated is eased by this process.
834
835       Even with these advantages, some people don't like the idea of
836       translating each paragraph separately. Here are some of the answers I
837       can give to their fears:
838
839       · This approach proved successful in the KDE project and allows people
840         there to produce the biggest corpus of translated and up-to-date
841         documentation I know.
842
843       · The translators can still use the context to translate, since the
844         strings in the po file are in the same order as in the original
845         document. Translating sequentially is thus rather comparable whether
846         you use po4a or not. And in any case, the best way to get the context
847         remains to convert the document to a printable format, since the
848         text formatting ones are not really readable, IMHO.
849
850       · This approach is the one used by professional translators. I agree
851         that they have somewhat different goals than open-source translators.
852         Maintenance, for example, is often less critical to them since the
853         content changes rarely.
854
855       Why not split at the sentence level (or smaller)?
856
857       Professional translator tools sometimes split the document at the sen‐
858       tence level in order to maximize the reusability of previous transla‐
859       tions and speed up their process.  The problem is that the same sen‐
860       tence may have several translations, depending on the context.
861
862       Paragraphs are by definition longer than sentences. This will hopefully
863       ensure that the same paragraph appearing in two documents has the same
864       meaning (and translation), regardless of the context in each case.
865
866       Splitting into parts smaller than the sentence would be very bad. It
867       would take a bit long to explain why here, but the interested reader
868       can refer to the Locale::Maketext::TPJ13(3pm) man page (which comes with
869       the Perl documentation), for example. In short, each language has its
870       specific syntactic rules, and there is no way to build sentences by
871       aggregating parts of sentences that works for all existing languages (or
872       even for 5 of the 10 most spoken ones, or even fewer).
873
874       Why not put the original as a comment along with the translation (or
875       the other way around)?
876
877       At first glance, gettext doesn't seem to be suited to all kinds of
878       translations. For example, it didn't seem suited to debconf, the
879       interface all Debian packages use for their interaction with the user
880       during installation. In that case, the texts to translate were pretty
881       short (a dozen lines for each package), and it was difficult to put
882       the translation in a specialized file since it has to be available
883       before the package installation.
884
885       That's why the debconf developer decided to implement another solution,
886       where translations are placed in the same file as the original. This
887       is rather appealing. One would even want to do this for XML, for
888       example. It would look like this:
889
890        <section>
891         <title lang="en">My title</title>
892         <title lang="fr">Mon titre</title>
893
894         <para>
895          <text lang="en">My text.</text>
896          <text lang="fr">Mon texte.</text>
897         </para>
898        </section>
899
900       But it was so problematic that a po-based approach is now used. Only
901       the original can be edited in the file, and the translations must take
902       place in po files extracted from the master template (and placed back
903       at package compilation time). The old system was deprecated because of
904       several issues:
905
906       * maintenance problems
907           If several translators provide a patch at the same time, it gets
908           hard to merge them together.
909
910           How will you detect changes to the original, which need to be
911           applied to the translations? In order to use diff, you have to note
912           which version of the original you translated. I.e., you need a po
913           file in your file ;)
914
915       * encoding problems
916           This solution is viable when only European languages are involved,
917           but the introduction of Korean, Russian and/or Arabic really
918           complicates the picture. UTF could be a solution, but there are
919           still some problems with it.
920
921           Moreover, such problems are hard to detect (i.e., only Korean read‐
922           ers will detect that the encoding of Korean is broken [because of
923           the Russian translator])
924
925       gettext solves all those problems together.
926
927       But gettext wasn't designed for that use!
928
929       That's true, but until now nobody has come up with a better solution.
930       The only known alternative is manual translation, with all the
931       maintenance issues.
932
933       What about the other translation tools for documentation using gettext?
934
935       As far as I know, there are only two of them:
936
937       poxml
938           This is the tool developed by KDE people to handle DocBook XML.
939           AFAIK, it was the first program to extract strings to translate
940           from documentation to po files, and inject them back after transla‐
941           tion.
942
943           It can only handle XML, and only a particular DTD. I'm quite
944           unhappy with the handling of lists, which end up in one big msgid.
945           When the list becomes big, the chunk becomes harder to swallow.
946
947       po-debiandoc
948           This program written by Denis Barbier is a sort of precursor of the
949           po4a sgml module, which more or less deprecates it. As the name
950           says, it handles only the debiandoc dtd, which is more or less a
951           deprecated dtd.
952
953       The main advantages of po4a over them are the ease of extra content
954       addition (which is even worse there) and the ability to achieve gettex‐
955       tization.
956
957       Educating developers about translation
958
959       When you try to translate documentation or programs, you face three
960       kinds of problems: linguistic (not everybody speaks two languages),
961       technical (that's why po4a exists) and relational/human. Not all
962       developers understand the necessity of translating stuff. Even when
963       well-intentioned, they may not know how to ease the work of translators.
964       To help with that, po4a comes with a lot of documentation which can be
965       referred to.
966
967       Another important point is that each translated file begins with a
968       short comment indicating what the file is and how to use it. This should
969       help the poor developers flooded with tons of files in different
970       languages they hardly speak, and help them deal with them correctly.
971
972       In the po4a project, translated documents are not source files anymore.
973       Since sgml files are habitually source files, it's an easy mistake.
974       That's why all files present this header:
975
976        ⎪       *****************************************************
977        ⎪       *           GENERATED FILE, DO NOT EDIT             *
978        ⎪       * THIS IS NO SOURCE FILE, BUT RESULT OF COMPILATION *
979        ⎪       *****************************************************
980
981        ⎪ This file was generated by po4a-translate(1). Do not store it (in cvs,
982        ⎪ for example), but store the po file used as source file by po4a-translate.
983
984        ⎪ In fact, consider this as a binary, and the po file as a regular source file:
985        ⎪ If the po gets lost, keeping this translation up-to-date will be harder ;)
986
987       Likewise, gettext's regular po files only need to be copied to the po/
988       directory. But this is not the case of the ones manipulated by po4a.
989       The major risk here is that a developer erases the existing translation
990       of his program with the translation of his documentation. (Both of them
991       can't be stored in the same po file, because the program needs to
992       install its translation as an mo file while the documentation only uses
993       its translation at compile time). That's why the po files produced by
994       the po-debiandoc module contain the following header:
995
996        #
997        #  ADVISES TO DEVELOPERS:
998        #    - you do not need to manually edit POT or PO files.
999        #    - this file contains the translation of your debconf templates.
1000        #      Do not replace the translation of your program with this !!
1001        #        (or your translators will get very upset)
1002        #
1003        #  ADVISES TO TRANSLATORS:
1004        #    If you are not familiar with the PO format, gettext documentation
1005        #     is worth reading, especially sections dedicated to this format.
1006        #    For example, run:
1007        #         info -n '(gettext)PO Files'
1008        #         info -n '(gettext)Header Entry'
1009        #
1010        #    Some information specific to po-debconf are available at
1011        #            /usr/share/doc/po-debconf/README-trans
1012        #         or http://www.debian.org/intl/l10n/po-debconf/README-trans
1013        #
1014
1015       SUMMARY of the advantages of the gettext based approach
1016
1017       · The translations are not stored along with the original, which makes
1018         it possible to detect if translations become out of date.
1019
1020       · The translations are stored in separate files from each other, which
1021         prevents translators of different languages from interfering, both
1022         when submitting their patch and at the file encoding level.
1023
1024       · It is based internally on "gettext" (but "po4a" offers a very simple
1025         interface so that you don't need to understand the internals to use
1026         it).  That way, we don't have to reinvent the wheel, and because
1027         of their wide use, we can think that these tools are more or less bug
1028         free.
1029
1030       · Nothing changes for the end-user (besides the fact that translations
1031         will hopefully be better maintained :). The resulting documentation
1032         file distributed is exactly the same.
1033
1034       · No need for translators to learn a new file syntax and their favorite
1035         po file editor (like emacs' po mode, kbabel or gtranslator) will work
1036         just fine.
1037
1038       · Gettext offers a simple way to get statistics about what is done,
1039         what should be reviewed and updated, and what is still to do. Some
1040         examples can be found at these addresses:
1041
1042          - http://kbabel.kde.org/img/previewKonq.png
1043          - http://www.debian.org/intl/l10n/
1044
1045       But not everything is green, and this approach also has some
1046       disadvantages we have to deal with.
1047
1048       · Addenda are... strange at first glance.
1049
1050       · You can't adapt the translated text to your preferences, like split‐
1051         ting a paragraph here, and joining two other ones there. But in some
1052         sense, if there is an issue with the original, it should be reported
1053         as a bug anyway.
1054
1055       · Even with an easy interface, it remains a new tool people have to
1056         learn.
1057
1058         One of my dreams would be to somehow integrate po4a into gtranslator
1059         or kbabel. When an sgml file is opened, the strings are automatically
1060         extracted.  When it's saved a translated sgml file can be written to
1061         disk. If we manage to do an MS Word (TM) module (or at least RTF)
1062         professional translators may even use it.
1063

Known bugs and feature requests

1065       The biggest issue (besides missing modules) is the encoding handling.
1066       Adding a UTF8 perl pragma and then recoding the strings on output is
1067       the way to go, but it's not done yet.
1068
1069       We would also like to factorise some code (about file insertion) of the
1070       sgml module back into the TransTractor so that all modules can benefit
1071       from this, but this is not user visible.
1072

AUTHORS

1074        Denis Barbier <barbier,linuxfr.org>
1075        Martin Quinson (mquinson#debian.org)
1076
1077
1078
1079Po4a Tools                        2007-08-15                         PO4A.7(7)