PDF2DJVU(1)                     pdf2djvu manual                    PDF2DJVU(1)

NAME

       pdf2djvu - creates DjVu files from PDF files

SYNOPSIS

       pdf2djvu [{-o | --output} output-djvu-file] [option...] pdf-file...

       pdf2djvu {-i | --indirect} index-djvu-file  [option...] pdf-file...

       pdf2djvu {--version | --help | -h}

DESCRIPTION

       This program creates a DjVu file from one or more Portable Document
       Format files.

OPTIONS

       pdf2djvu accepts the following options:

   Document type, file names
       -o, --output=output-djvu-file
           Generate a bundled multi-page document. Write the file into
           output-djvu-file instead of standard output.

       -i, --indirect=index-djvu-file
           Generate an indirect multi-page document. Use index-djvu-file as
           the index file name; put the component files into the same
           directory. The directory must exist and be writable.

       --page-id-template=template
           Specifies the naming scheme for page identifiers. Consult the
           “TEMPLATE LANGUAGE” section for the template language description.

           The default template is “p{page:04*}.djvu”.

           For portability reasons, page identifiers:

           ·   must consist only of lowercase ASCII letters, digits, _, +, -
               and dot,

           ·   cannot start with a +, - or a dot,

           ·   cannot contain two consecutive dots,

           ·   must end with the .djvu or the .djv extension.

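           The four portability rules above can be checked before a
           template is passed to pdf2djvu. The following is a minimal
           sketch, not code from pdf2djvu itself; the function name
           is_portable_page_id is hypothetical.

```python
import re

# Rule 1: only lowercase ASCII letters, digits, "_", "+", "-" and ".".
_ALLOWED = re.compile(r"[a-z0-9_+.-]+")


def is_portable_page_id(page_id: str) -> bool:
    """Check a page identifier against the four portability rules.

    Hypothetical helper for illustration; not part of pdf2djvu.
    """
    return (
        _ALLOWED.fullmatch(page_id) is not None         # allowed characters only
        and not page_id.startswith(("+", "-", "."))     # no leading +, - or dot
        and ".." not in page_id                         # no two consecutive dots
        and page_id.endswith((".djvu", ".djv"))         # required extension
    )
```

           Under these rules, “p0001.djvu” passes, while “P0001.djvu”
           (uppercase letter) and “a..b.djvu” (consecutive dots) do not.
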
       --page-id-prefix=prefix
           Equivalent to “--page-id-template=prefix{page:04*}.djvu”.

       --page-title-template=template
           Specifies the template for page titles. Consult the “TEMPLATE
           LANGUAGE” section for the template language description.

           The default template is “{label}”.

       --no-page-titles
           Don't set page titles.

   Resolution, page size
       -d, --dpi=resolution
           Sets the desired resolution to resolution dots per inch. The
           default is 300 dpi. The allowed range is: 72 ≤ resolution ≤ 6000.

       --media-box
           Use MediaBox to determine page size. CropBox is used by default.

       --page-size=widthxheight
           Sets the preferred page size to width pixels × height pixels.
           The actual page size may be altered in order to respect aspect
           ratio and DjVu limitations on resolution. (This option takes
           precedence over -d/--dpi.)

       --guess-dpi
           Try to guess native resolution by inspecting embedded images. Use
           with care.

   Image quality
       --bg-slices=n+...+n, --bg-slices=n,...,n
           Specifies the encoding quality of the IW44 background layer. This
           option is similar to the -slice option of c44. Consult the c44(1)
           manual page for details. The default is 72+11+10+10.

       --bg-subsample=n
           Specifies the background subsampling ratio. The default is 3. Valid
           values are integers between 1 and 12, inclusive.

       --fg-colors=default
           Try to preserve all the foreground layer colors. This is the
           default.

       --fg-colors=web
           Reduce foreground layer colors to the web palette (216 colors).
           This option is not recommended.

       --fg-colors=n
           Use GraphicsMagick to reduce the number of distinct colors in the
           foreground layer to n. Valid values are integers between 1 and
           4080. This option is not recommended.

       --fg-colors=black
           Discard any color information from the foreground layer.

       --monochrome
           Render pages as monochrome bitmaps. With this option, --bg-... and
           --fg-... options are not respected.

       --loss-level=n
           Specifies the aggressiveness of the lossy compression. The default
           is 0 (lossless). Valid values are integers between 0 and 200,
           inclusive. This option is similar to the -losslevel option of cjb2;
           consult the cjb2(1) manual page for details. This option can be
           used only if the --monochrome option is also enabled.

       --lossy
           Synonym for --loss-level=100.

       --anti-alias
           Enable font and vector anti-aliasing. This option is not
           recommended.

   Extraction
       --no-metadata
           Don't extract the metadata.

           By default:

           ·   The following entries of the document information dictionary
               are extracted: Title, Author, Subject, Creator, Producer,
               CreationDate, ModDate. Timestamps are formatted according to
               RFC 3339[1], with date and time components separated by a
               single space.

           ·   The XMP metadata is extracted (or created) and updated
               accordingly.

               Note
               If multiple input documents are specified, only the metadata
               of the first one is taken into account.

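           As an illustration of the timestamp format described above, the
           sketch below renders a date and time in RFC 3339 style with a
           single space between the date and time components. It uses only
           the Python standard library; format_timestamp is a hypothetical
           name, and this is not how pdf2djvu itself produces the value.

```python
from datetime import datetime, timedelta, timezone


def format_timestamp(moment: datetime) -> str:
    """Render an RFC 3339 style timestamp with a space instead of "T".

    Hypothetical helper for illustration; not part of pdf2djvu.
    """
    return moment.isoformat(sep=" ", timespec="seconds")


# Example value with an explicit +02:00 offset (chosen arbitrarily).
stamp = datetime(2020, 8, 7, 12, 30, 0, tzinfo=timezone(timedelta(hours=2)))
print(format_timestamp(stamp))  # 2020-08-07 12:30:00+02:00
```

           A datetime without time zone information would be rendered
           without the trailing offset.
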
       --verbatim-metadata
           Keep the original metadata intact.

       --no-outline
           Don't extract the document outline.

       --hyperlinks=border-avis
           Make hyperlink borders always visible.

           By default, a hyperlink border is visible only when the mouse is
           over the hyperlink.

       --hyperlinks=#RRGGBB
           Force the specified border color for hyperlinks.

       --no-hyperlinks, --hyperlinks=none
           Don't extract hyperlinks.

       --no-text
           Don't extract the text.

       --words
           Extract the text. Record the location of every word. This is the
           default.

       --lines
           Extract the text. Record the location of every line, rather than
           every word.

       --crop-text
           Extract no text outside the page boundary.

       --no-nfkc
           Do not apply NFKC[2] normalization on the text, except for
           characters from the Alphabetic Presentation Forms block[3]
           (U+FB00–U+FB4F), which are normalized unconditionally.

           The default is to apply NFKC normalization on all characters.

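           The difference between the default behavior and --no-nfkc can
           be sketched with Python's unicodedata module. This is an
           illustration only: normalize_text is a hypothetical name, and
           normalizing one character at a time in the --no-nfkc branch is
           an assumption made for simplicity.

```python
import unicodedata


def normalize_text(text: str, nfkc: bool = True) -> str:
    """Mimic the default (nfkc=True) and --no-nfkc (nfkc=False) behavior.

    Hypothetical helper for illustration; not part of pdf2djvu.
    """
    if nfkc:
        # Default: NFKC normalization on all characters.
        return unicodedata.normalize("NFKC", text)
    # --no-nfkc: only Alphabetic Presentation Forms (U+FB00-U+FB4F)
    # are normalized; every other character is left untouched.
    return "".join(
        unicodedata.normalize("NFKC", ch) if "\ufb00" <= ch <= "\ufb4f" else ch
        for ch in text
    )
```

           For the input “ﬁx²” (U+FB01, x, U+00B2), the default yields
           “fix2”, while the --no-nfkc variant yields “fix²”: the ligature
           is still expanded, but the superscript digit is kept.
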
       --filter-text=command-line
           Filter the text through the command-line. The provided filter must
           preserve whitespace, control characters and decimal digits.

           This option implies --no-nfkc.

       -p, --pages=page-range
           Specifies pages to convert. page-range is a comma-separated list
           of sub-ranges. Each sub-range is either a single page (e.g. 17) or
           a contiguous range of pages (e.g. 37-42). Duplicate page numbers
           are not allowed. Pages are numbered from 1.

           The default is to convert all pages.

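           The page-range syntax can be sketched as a small parser. This
           is a minimal illustration of the rules stated above, not the
           parser used by pdf2djvu; parse_page_range is a hypothetical
           name, and rejecting a sub-range whose end precedes its start is
           an assumption.

```python
def parse_page_range(spec: str) -> list:
    """Expand a page-range such as "17,37-42" into a list of page numbers.

    Hypothetical helper for illustration; not part of pdf2djvu.
    """
    pages = []
    seen = set()
    for part in spec.split(","):
        first, dash, last = part.partition("-")
        start = int(first)
        end = int(last) if dash else start  # single page: start == end
        if start < 1 or end < start:        # pages are numbered from 1
            raise ValueError(f"invalid sub-range: {part!r}")
        for page in range(start, end + 1):
            if page in seen:                # duplicates are not allowed
                raise ValueError(f"duplicate page number: {page}")
            seen.add(page)
            pages.append(page)
    return pages
```

           With this sketch, “17,37-42” expands to pages 17, 37, 38, 39,
           40, 41 and 42, while “1-3,2” is rejected because page 2 appears
           twice.
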
   Performance
       -j, --jobs=n
           Use n threads to perform conversion. The default is to use one
           thread.

       -j0, --jobs=0
           Determine automatically how many threads to use to perform
           conversion.

   Verbosity, help
       -v, --verbose
           Display more informational messages while converting the file.

       -q, --quiet
           Don't display informational messages while converting the file.

       --version
           Output version information and exit.

       -h, --help
           Display help and exit.

ENVIRONMENT

       The following environment variables affect pdf2djvu on Unix systems:

       OMP_*
           Details of runtime behavior with respect to parallelism can be
           controlled by several environment variables. Please refer to the
           OpenMP API specification[4] for details.

       TMPDIR
           pdf2djvu makes heavy use of temporary files. It will store them in
           a directory specified by this variable. The default is /tmp.

TEMPLATE LANGUAGE

   Template syntax
       The template language is roughly modeled on the Python string
       formatting syntax[5].

       A template is a piece of text which contains fields, surrounded by
       curly braces {}. Fields are replaced with appropriately formatted
       values when the template is evaluated. Moreover, {{ is replaced with a
       single { and }} is replaced with a single }.

   Field syntax
       Each field consists of a variable name, optionally followed by a shift,
       optionally followed by a format specification.

       The shift is a signed (i.e. starting with a + or - character) integer.

       The format specification consists of a colon, followed by a width
       specification.

       The width specification is a decimal integer defining the minimum field
       width. If not specified, then the field width will be determined by the
       content. Preceding the width specification with a zero (0) character
       enables zero-padding.

       The width specification is optionally followed by an asterisk (*)
       character, which increases the minimum field width to the width of the
       longest possible content of the variable.

   Available variables
       dpage
           Page number in the DjVu document.

       page, spage
           Page number in the PDF document.

       label
           Page label (logical page number) in the PDF document.

           This variable is available only for page titles.
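
       To make the field syntax concrete, the sketch below expands a
       template for numeric variables. It is an approximation written for
       this manual, not the pdf2djvu implementation. The names expand,
       values and max_values are hypothetical. Three assumptions are made:
       padding is applied on the left, spaces are used when zero-padding is
       not requested, and the asterisk takes the decimal length of the
       variable's largest value after the shift is applied.

```python
import re

# "{{", "}}", or a field: name, optional shift, optional ":[0][width][*]".
_FIELD = re.compile(r"\{\{|\}\}|\{([a-z]+)([+-]\d+)?(?::(0?)(\d*)(\*?))?\}")


def expand(template: str, values: dict, max_values: dict) -> str:
    """Expand numeric fields in a template.

    Hypothetical sketch; `values` maps variable names to integers and
    `max_values` maps them to the largest value each can take.
    """
    def replace(match):
        token = match.group(0)
        if token == "{{":
            return "{"
        if token == "}}":
            return "}"
        name, shift, zero, width, star = match.groups()
        offset = int(shift or 0)
        min_width = int(width or 0)
        if star:
            # Assumption: widen to the longest possible (shifted) value.
            min_width = max(min_width, len(str(max_values[name] + offset)))
        fill = "0" if zero else " "
        return str(values[name] + offset).rjust(min_width, fill)

    return _FIELD.sub(replace, template)


# The default page identifier template, page 7 of a 120-page document.
print(expand("p{page:04*}.djvu", {"page": 7}, {"page": 120}))  # p0007.djvu
```

       With a document of 123456 pages, the same template would give
       “p000007.djvu” under these assumptions, because the asterisk raises
       the minimum width from 4 to 6.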

IMPLEMENTATION DETAILS

   Layer separation algorithm
       Unless the --monochrome option is on, pdf2djvu uses the following naive
       layer separation algorithm:

        1. For each page, do the following:

            1. Rasterize the page into a pixmap, in the usual manner.

            2. Rasterize the page into another pixmap, omitting the following
               page elements:

               ·   text,

               ·   1 bit-per-pixel raster images,

               ·   vector elements (except fills of large areas).

            3. Compare both pixmaps, pixel by pixel:

                1. If their colors match, classify the pixel as a part of the
                   background layer.

                2. Otherwise, classify the pixel as a part of the foreground
                   layer.
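
       The comparison in step 3 can be sketched as follows. This is only an
       illustration of the per-pixel rule: rasterization is not shown, the
       pixmaps are assumed to be equally sized lists of rows of (r, g, b)
       tuples, and separate_layers is a hypothetical name rather than a
       pdf2djvu function.

```python
def separate_layers(full, reduced):
    """Return a mask where True marks foreground and False background.

    `full` is the page rasterized in the usual manner; `reduced` is the
    same page rasterized without text, 1 bit-per-pixel images and most
    vector elements. Hypothetical sketch; not part of pdf2djvu.
    """
    return [
        # Matching colors: background. Differing colors: foreground.
        [full_px != reduced_px for full_px, reduced_px in zip(full_row, reduced_row)]
        for full_row, reduced_row in zip(full, reduced)
    ]


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
full = [[WHITE, BLACK], [WHITE, WHITE]]      # one black text pixel
reduced = [[WHITE, WHITE], [WHITE, WHITE]]   # text omitted
mask = separate_layers(full, reduced)        # [[False, True], [False, False]]
```

       Only the pixel that differs between the two pixmaps ends up in the
       foreground mask; everything else is classified as background.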

BUG REPORTS

       If you find a bug in pdf2djvu, please report it at the issue tracker[6]
       or to the mailing list[7].

SEE ALSO

       djvu(1), djvudigital(1), csepdjvu(1)

NOTES

        1. RFC 3339
           https://www.ietf.org/rfc/rfc3339

        2. NFKC
           https://unicode.org/reports/tr15/

        3. Alphabetic Presentation Forms block
           https://unicode.org/charts/PDF/UFB00.pdf

        4. OpenMP API specification
           https://www.openmp.org/specifications/

        5. Python string formatting syntax
           https://docs.python.org/2/library/string.html#format-string-syntax

        6. the issue tracker
           https://github.com/jwilk/pdf2djvu/issues

        7. the mailing list
           https://groups.io/g/pdf2djvu

pdf2djvu 0.9.17.1                 2020-08-07                       PDF2DJVU(1)