PO4A-GETTEXTIZE(1p)               Po4a Tools               PO4A-GETTEXTIZE(1p)

NAME

       po4a-gettextize - convert an original file (and its translation) to a
       PO file

SYNOPSIS

       po4a-gettextize -f fmt -m master.doc -l XX.doc -p XX.po

       (XX.po is the output, all others are inputs)

DESCRIPTION

       po4a (PO for anything) eases the maintenance of documentation
       translation using the classical gettext tools. The main feature of po4a
       is that it decouples the translation of content from its document
       structure.  Please refer to the page po4a(7) for a gentle introduction
       to this project.

       The po4a-gettextize script helps you convert your previously existing
       translations into a po4a-based workflow. This is only to be done once
       to salvage an existing translation while converting to po4a, not on a
       regular basis after the conversion of your project. This tedious
       process is explained in detail in Section 'Converting a manual
       translation to po4a' below.

       You must provide both a master file (e.g., the source in English) and
       an existing translated file (e.g., a previous translation attempt
       without po4a). If you provide more than one master or translation
       file, they will be used in sequence, but it may be easier to
       gettextize each page or chapter separately and then use msgmerge to
       merge all produced PO files. As you wish.

       If the master document has non-ASCII characters, the newly generated
       PO file will be in UTF-8. If the master document is completely in
       ASCII, the generated PO will use the encoding of the translated input
       document.

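       The encoding rule above can be restated as a tiny function. This is
       only an illustrative sketch in Python, not po4a's actual code; the
       name choose_po_charset and its parameters are hypothetical.

```python
def choose_po_charset(master_text: str, localized_charset: str) -> str:
    """Return the charset of the generated PO file, following the rule above."""
    # A pure-ASCII master lets the PO file keep the translation's encoding.
    if master_text.isascii():
        return localized_charset
    # Any non-ASCII character in the master forces UTF-8.
    return "UTF-8"


print(choose_po_charset("Hello world", "ISO-8859-1"))
print(choose_po_charset("Caf\u00e9", "ISO-8859-1"))
```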

OPTIONS

       -f, --format
           Format of the documentation you want to handle. Use the
           --help-format option to see the list of available formats.

       -m, --master
           File containing the master document to translate. You can use this
           option multiple times if you want to gettextize multiple documents.

       -M, --master-charset
           Charset of the file containing the document to translate.

       -l, --localized
           File containing the localized (translated) document. If you
           provided multiple master files, you may wish to provide multiple
           localized files by using this option more than once.

       -L, --localized-charset
           Charset of the file containing the localized document.

       -p, --po
           File where the message catalog should be written. If not given, the
           message catalog will be written to the standard output.

       -o, --option
           Extra option(s) to pass to the format plugin. See the documentation
           of each plugin for more information about the valid options and
           their meanings. For example, you could pass '-o tablecells' to the
           AsciiDoc parser, while the text parser would accept
           '-o tabs=split'.

       -h, --help
           Show a short help message.

       --help-format
           List the documentation formats understood by po4a.

       -k, --keep-temps
           Keep the temporary master and localized POT files built before
           merging.  This can be useful to understand why these files get
           desynchronized, leading to gettextization problems.

       -V, --version
           Display the version of the script and exit.

       -v, --verbose
           Increase the verbosity of the program.

       -d, --debug
           Output some debugging information.

       --msgid-bugs-address email@address
           Set the report address for msgid bugs. By default, the created POT
           files have no Report-Msgid-Bugs-To fields.

       --copyright-holder string
           Set the copyright holder in the POT header. The default value is
           "Free Software Foundation, Inc."

       --package-name string
           Set the package name for the POT header. The default is "PACKAGE".

       --package-version string
           Set the package version for the POT header. The default is
           "VERSION".

   Converting a manual translation to po4a
       po4a-gettextize synchronizes the master and localized files to extract
       their content into a PO file. The content of the master file gives the
       msgid while the content of the localized file gives the msgstr. This
       process is somewhat fragile: the Nth string of the translated file is
       supposed to be the translation of the Nth string in the original.

       Gettextization works best if you manage to retrieve the exact version
       of the original document that was used for translation. Even so, you
       may need to fiddle with both master and localized files to align their
       structure if it was changed by the original translator, so working on
       copies of the files is advised.

       Internally, each po4a parser reports the syntactical type of each
       extracted string. This is how desynchronizations are detected during
       the gettextization.  In the example depicted below, it is very unlikely
       that the 4th string in translation (of type 'chapter') is the
       translation of the 4th string in original (of type 'paragraph'). It is
       more likely that a new paragraph was added to the original, or that two
       original paragraphs were merged together in the translation.

           Original         Translation

         chapter            chapter
           paragraph          paragraph
           paragraph          paragraph
           paragraph        chapter
         chapter              paragraph
           paragraph          paragraph

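       The positional pairing and the type check can be modelled in a few
       lines of Python. This is a minimal sketch, assuming that each parser
       yields (type, text) pairs; pair_segments and the placeholder texts are
       hypothetical and only mirror the example above, not po4a's
       implementation.

```python
from typing import List, Tuple

Segment = Tuple[str, str]  # (syntactic type, extracted text)


def pair_segments(master: List[Segment], localized: List[Segment]) -> List[Tuple[str, str]]:
    """Pair the Nth master string with the Nth localized string.

    Raises ValueError at the first position where the types differ,
    or when one document has more strings than the other.
    """
    pairs = []
    for position, ((m_type, m_text), (l_type, l_text)) in enumerate(
        zip(master, localized), start=1
    ):
        if m_type != l_type:
            raise ValueError(
                f"desynchronization at string {position}: "
                f"original is '{m_type}', translation is '{l_type}'"
            )
        pairs.append((m_text, l_text))  # (msgid, msgstr)
    if len(master) != len(localized):
        raise ValueError("documents do not contain the same number of strings")
    return pairs


# Types taken from the example above; the texts are placeholders.
original_types = ["chapter", "paragraph", "paragraph", "paragraph", "chapter", "paragraph"]
translation_types = ["chapter", "paragraph", "paragraph", "chapter", "paragraph", "paragraph"]
master = [(t, f"original {i}") for i, t in enumerate(original_types, start=1)]
localized = [(t, f"translation {i}") for i, t in enumerate(translation_types, start=1)]

try:
    pair_segments(master, localized)
except ValueError as error:
    print(error)  # reports the mismatch at the 4th string
```

       The sketch stops at the first mismatch, just as the gettextization
       does, so a single structural difference has to be fixed before any
       later one can be seen.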
       po4a-gettextize will verbosely diagnose any structure
       desynchronization. When this happens, you should manually edit the
       files to add fake paragraphs or remove some content here and there
       until the structures of both files actually match. Some tricks are
       given below to salvage as much as possible of the existing translation
       while doing so.

       If you are lucky enough to have a perfect match in the file structures
       out of the box, building a correct PO file is a matter of seconds.
       Otherwise, you will soon understand why this process has such an ugly
       name :) Even so, gettextization often remains faster than translating
       everything again. I gettextized the French translation of the whole
       Perl documentation in one day despite the many synchronization issues.
       Given the amount of text (2 MB of original text), restarting the
       translation without first salvaging the old translations would have
       required several months of work. In addition, this grunt work is the
       price to pay to get the comfort of po4a. Once converted, the
       synchronization between master documents and translations will always
       be fully automatic.

       After a successful gettextization, the produced documents should be
       manually checked for undetected disparities and silent errors, as
       explained below.

       Hints and tricks for the gettextization process

       The gettextization stops as soon as a desynchronization is detected.
       When this happens, you need to edit the files as much as needed to
       realign the files' structures. po4a-gettextize is rather verbose when
       things go wrong. It reports the strings that don't match, their
       positions in the text, and the type of each of them. Moreover, the PO
       file generated so far is dumped as gettextization.failed.po for further
       inspection.

       Here are some tricks to help you in this tedious process and ensure
       that you salvage as much as possible of the previous translation:

       •   Remove all extra content of the translations, such as the section
           giving credits to the translators. They should be added separately
           to po4a as addenda (see po4a(7)).

       •   When editing the files to align their structures, prefer editing
           the translation if possible. Indeed, if the changes to the original
           are too intrusive, the old and new versions will not be matched
           during the first po4a run after gettextization (see below). Any
           unmatched translation will be discarded anyway.  That being said,
           you still want to edit the original document if it's too hard to
           get the gettextization to proceed otherwise, even if it means that
           one paragraph of the translation is discarded. The important thing
           is to get a first PO file to start with.

       •   Do not hesitate to remove any original content that does not exist
           in the translated version. This content will be automatically
           reintroduced afterward, when synchronizing the PO file with the
           document.

       •   You should probably inform the original author of any structural
           change in the translation that seems justified. Issues in the
           original document should be reported to the author. Fixing them in
           your translation only fixes them for a part of the community. Plus,
           it is impossible to do so when using po4a ;) But you probably want
           to wait until the end of the conversion to po4a before changing the
           original files.

       •   Sometimes, the paragraph contents match, but not their types.
           Fixing it is rather format-dependent. In POD and man, it often
           comes from the fact that one of them contains a line beginning with
           a white space while the other does not.  In those formats, such
           paragraphs cannot be wrapped and thus become a different type. Just
           remove the space and you are fine. It may also be a typo in the tag
           name in XML.

           Likewise, two paragraphs may get merged together in POD when the
           separating line contains some spaces, or when there is no empty
           line between the =item line and the content of the item.

       •   Sometimes, the desynchronization message seems odd because the
           translation is attached to the wrong original paragraph. It is the
           sign of an undetected issue earlier in the process. Search for the
           actual desynchronization point by inspecting the file
           gettextization.failed.po that was produced, and fix the problem
           where it really is.

       •   Other issues may come from duplicated strings in either the
           original or translation. Duplicated strings are merged in PO files,
           with two references.  This constitutes a difficulty for the
           gettextization algorithm, which is a simple one-to-one pairing
           between the msgids of both the master and the localized files. It
           is however believed that recent versions of po4a deal properly with
           duplicated strings, so you should report any remaining issue that
           you may encounter.

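       To see why duplicated strings complicate a one-to-one pairing, consider
       this simplified Python model. It assumes, as the last item states,
       that identical strings are merged into a single PO entry. The helper
       unique_in_order and the sample strings are hypothetical, and the
       sketch does not reflect how recent po4a versions handle the case.

```python
def unique_in_order(strings):
    """Merge duplicates as a PO file does, keeping first occurrences in order."""
    return list(dict.fromkeys(strings))


# The original repeats "See also"; the translator rendered it two different ways.
master = ["Overview", "See also", "Details", "See also", "Credits"]
localized = ["Aperçu", "Voir aussi", "Détails", "Voir également", "Remerciements"]

master_ids = unique_in_order(master)        # the duplicate collapses: 4 entries
localized_ids = unique_in_order(localized)  # nothing collapses: 5 entries

# A naive positional pairing is now shifted after the merged duplicate:
# "Credits" ends up paired with the second rendering of "See also".
for msgid, msgstr in zip(master_ids, localized_ids):
    print(f"{msgid!r} -> {msgstr!r}")
```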
   Reviewing files produced by po4a-gettextize
       Any file produced by po4a-gettextize should be manually reviewed, even
       when the script terminates successfully. You should skim over the PO
       file, ensuring that the msgid and msgstr actually match. It is not
       necessary to ensure that the translation is perfectly correct yet, as
       all entries are marked as fuzzy translations anyway. You only need to
       check for obvious matching issues because badly matched translations
       will be discarded in subsequent steps while you want to salvage them.

       Fortunately, this step does not require mastering the target languages
       as you only want to recognize similar elements in each msgid and its
       corresponding msgstr. As a speaker of French, English, and some German
       myself, I can do this for all European languages at least, even if I
       cannot say one word of most of these languages. I sometimes manage to
       detect matching issues in non-Latin languages by looking at string
       length, phrase structures (does the number of question marks match?)
       and other clues, but I prefer when someone else can review those
       languages.

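       The clues mentioned above (string length, number of question marks)
       can be turned into a rough screening aid. The following Python sketch
       is only a heuristic under stated assumptions: looks_mismatched, the set
       of question mark characters, and the length ratio of 3.0 are all
       arbitrary choices for illustration, and a flagged pair still needs a
       human look.

```python
QUESTION_MARKS = "?\uff1f\u061f"  # ASCII, fullwidth and Arabic question marks


def count_question_marks(text: str) -> int:
    """Count question marks of any of the kinds listed above."""
    return sum(text.count(mark) for mark in QUESTION_MARKS)


def looks_mismatched(msgid: str, msgstr: str, max_length_ratio: float = 3.0) -> bool:
    """Flag a msgid/msgstr pair that deserves a closer look."""
    # Clue 1: a question is usually translated as a question.
    if count_question_marks(msgid) != count_question_marks(msgstr):
        return True
    # Clue 2: wildly different lengths suggest different paragraphs.
    shorter, longer = sorted((len(msgid), len(msgstr)))
    if shorter == 0:
        return longer > 0
    return longer / shorter > max_length_ratio


print(looks_mismatched("Are you sure?", "Êtes-vous sûr ?"))
print(looks_mismatched("Are you sure?", "Installation"))
```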
       If you detect a mismatch, edit the original and translation files as if
       po4a-gettextize reported an error, and try again. Once you have a
       decent PO file for your previous translation, back it up until you get
       po4a working correctly.

   Running po4a for the first time
       The easiest way to set up po4a is to write a po4a.conf configuration
       file, and use the integrated po4a program (po4a-updatepo and
       po4a-translate are deprecated). Please check the "CONFIGURATION FILE"
       Section in po4a(1) documentation for more details.

       When po4a runs for the first time, the current version of the master
       documents will be used to update the PO files containing the old
       translations that you salvaged through gettextization. This can take
       quite a long time, because many of the msgids from the gettextization
       do not exactly match the elements of the POT file built from the
       recent master files. This forces gettext to search for the closest one
       using a costly string proximity algorithm.  For example, the first run
       over the Perl documentation's French translation (5.5 MB PO file) took
       about 48 hours (yes, two days) while the subsequent ones only take
       seconds.

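       The cost of this closest-match search can be illustrated with Python's
       standard difflib module. This is a stand-in chosen for the sketch, not
       the algorithm gettext actually uses; closest_msgid, the cutoff of 0.6
       and the sample strings are hypothetical.

```python
import difflib


def closest_msgid(new_msgid, old_msgids, cutoff=0.6):
    """Return the old msgid most similar to new_msgid, or None."""
    # Every call compares new_msgid against all old msgids, so matching
    # N new strings against M old ones costs on the order of N * M
    # similarity computations.
    matches = difflib.get_close_matches(new_msgid, old_msgids, n=1, cutoff=cutoff)
    return matches[0] if matches else None


old_msgids = ["Install the package with apt.", "Remove the package."]
print(closest_msgid("Install the package using apt.", old_msgids))
print(closest_msgid("zzzz", old_msgids))
```

       Once the PO file has been updated, most msgids match exactly and no
       such search is needed, which is consistent with later runs taking only
       seconds.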
   Moving your translations to production
       After this first run, the PO files are ready to be reviewed by
       translators. All entries were marked as fuzzy in the PO file by
       po4a-gettextize, forcing their careful review before use.
       Translators should take each entry to verify that the salvaged
       translation actually matches the current original text, update the
       translation as needed, and remove the fuzzy markers.

       Once enough fuzzy markers are removed, po4a will start generating the
       translation files on disk, and you're ready to move your translation
       workflow to production. Some projects find it useful to rely on Weblate
       to coordinate between translators and maintainers, but that's beyond
       po4a's scope.

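       What counts as "enough" can be pictured as a percentage threshold. The
       Python sketch below assumes that this refers to the keep threshold
       described in po4a(1), with a default of 80 percent; should_generate
       and its entry format are hypothetical and do not reproduce po4a's
       code.

```python
def should_generate(entries, keep_threshold=80.0):
    """Decide whether a translated document would be written.

    entries is a list of (msgstr, is_fuzzy) tuples, one per PO entry.
    An entry counts as translated when it has a msgstr and is not fuzzy.
    """
    if not entries:
        return False
    translated = sum(1 for msgstr, is_fuzzy in entries if msgstr and not is_fuzzy)
    return 100.0 * translated / len(entries) >= keep_threshold


# Right after gettextization every entry is fuzzy, so nothing is generated.
fresh = [("texte", True)] * 10
# After review, 8 of 10 entries have had their fuzzy marker removed.
reviewed = [("texte", False)] * 8 + [("texte", True)] * 2
print(should_generate(fresh), should_generate(reviewed))
```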

SEE ALSO

       po4a(1), po4a-normalize(1), po4a-translate(1), po4a-updatepo(1),
       po4a(7).

AUTHORS

        Denis Barbier <barbier@linuxfr.org>
        Nicolas François <nicolas.francois@centraliens.net>
        Martin Quinson (mquinson#debian.org)

COPYRIGHT AND LICENSE

       Copyright 2002-2022 by SPI, inc.

       This program is free software; you may redistribute it and/or modify it
       under the terms of GPL (see the COPYING file).

Po4a Tools                        2023-01-23               PO4A-GETTEXTIZE(1p)