PDF2DJVU(1)                    pdf2djvu manual                    PDF2DJVU(1)

NAME
pdf2djvu - creates DjVu files from PDF files

SYNOPSIS
pdf2djvu [{-o | --output} output-djvu-file] [option...] pdf-file...

pdf2djvu {-i | --indirect} index-djvu-file [option...] pdf-file...

pdf2djvu {--version | --help | -h}

DESCRIPTION
This program creates a DjVu file from one or more Portable Document
Format files.

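As a rough illustration of the bundled and indirect invocation forms, the following Python sketch assembles an argument list for pdf2djvu without running it. The helper name build_command and all file names are hypothetical; only the -o and -i options are taken from this manual, and treating them as mutually exclusive is an assumption based on the two separate synopsis forms.

```python
def build_command(pdf_files, output=None, index=None, extra_options=()):
    """Assemble an argv list for pdf2djvu (hypothetical helper).

    output: file name for a bundled document (-o).
    index:  index file name for an indirect document (-i).
    With neither, the bundled document goes to standard output.
    """
    # Assumption: the two document types cannot be combined.
    if output is not None and index is not None:
        raise ValueError("choose either -o (bundled) or -i (indirect)")
    if not pdf_files:
        raise ValueError("at least one PDF file is required")

    argv = ["pdf2djvu"]
    if output is not None:
        argv += ["-o", output]
    if index is not None:
        argv += ["-i", index]
    argv += list(extra_options)   # e.g. ["--dpi=300"]
    argv += list(pdf_files)       # input files come last
    return argv
```

The resulting list could be handed to subprocess.run(argv, check=True) on a system where pdf2djvu is installed.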
OPTIONS
pdf2djvu accepts the following options:

Document type, file names
-o, --output=output-djvu-file
Generate a bundled multi-page document. Write the file into
output-djvu-file instead of standard output.

-i, --indirect=index-djvu-file
Generate an indirect multi-page document. Use index-djvu-file as
the index file name; put the component files into the same
directory. The directory must exist and be writable.

--page-id-template=template
Specifies the naming scheme for page identifiers. Consult the
“TEMPLATE LANGUAGE” section for the template language description.

The default template is “p{page:04*}.djvu”.

For portability reasons, page identifiers:

• must consist only of lowercase ASCII letters, digits, _, +, -
and dot,

• cannot start with a +, - or a dot,

• cannot contain two consecutive dots,

• must end with the .djvu or the .djv extension.

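The portability rules for page identifiers can be checked mechanically. The following is a minimal Python sketch of such a check; the function name is_portable_page_id is hypothetical, and the code only mirrors the four rules listed above rather than reproducing pdf2djvu's own validation.

```python
import re

# Lowercase ASCII letters, digits, "_", "+", "-" and ".".
_ALLOWED = re.compile(r"[a-z0-9_+.-]+")


def is_portable_page_id(page_id):
    """Return True if page_id satisfies the four portability rules."""
    # Rule 1: restricted character set (also rejects the empty string).
    if _ALLOWED.fullmatch(page_id) is None:
        return False
    # Rule 2: must not start with "+", "-" or ".".
    if page_id[0] in "+-.":
        return False
    # Rule 3: no two consecutive dots.
    if ".." in page_id:
        return False
    # Rule 4: must end with the .djvu or .djv extension.
    return page_id.endswith((".djvu", ".djv"))
```

For instance, the identifier p0001.djvu passes all four rules, while P0001.djvu fails the first one because of the uppercase letter.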
--page-id-prefix=prefix
Equivalent to “--page-id-template=prefix{page:04*}.djvu”.

--page-title-template=template
Specifies the template for page titles. Consult the “TEMPLATE
LANGUAGE” section for the template language description.

The default template is “{label}”.

--no-page-titles
Don't set page titles.

Resolution, page size
-d, --dpi=resolution
Sets the desired resolution to resolution dots per inch. The
default is 300 dpi. The allowed range is: 72 ≤ resolution ≤ 6000.

--media-box
Use MediaBox to determine page size. CropBox is used by default.

--page-size=widthxheight
Sets the preferred page size to width pixels × height pixels.
The actual page size may be altered in order to respect aspect
ratio and DjVu limitations on resolution. (This option takes
precedence over -d/--dpi.)

--guess-dpi
Try to guess native resolution by inspecting embedded images. Use
with care.

Image quality
--bg-slices=n+...+n, --bg-slices=n,...,n
Specifies the encoding quality of the IW44 background layer. This
option is similar to the -slice option of c44. Consult the c44(1)
manual page for details. The default is 72+11+10+10.

--bg-subsample=n
Specifies the background subsampling ratio. The default is 3. Valid
values are integers between 1 and 12, inclusive.

--fg-colors=default
Try to preserve all the foreground layer colors. This is the
default.

--fg-colors=web
Reduce foreground layer colors to the web palette (216 colors).
This option is not recommended.

--fg-colors=n
Use GraphicsMagick to reduce the number of distinct colors in the
foreground layer to n. Valid values are integers between 1 and
4080. This option is not recommended.

--fg-colors=black
Discard any color information from the foreground layer.

--monochrome
Render pages as monochrome bitmaps. With this option, --bg-... and
--fg-... options are not respected.

--loss-level=n
Specifies the aggressiveness of the lossy compression. The default
is 0 (lossless). Valid values are integers between 0 and 200,
inclusive. This option is similar to the -losslevel option of cjb2;
consult the cjb2(1) manual page for details. This option can be
used only if the --monochrome option is also enabled.

--lossy
Synonym for --loss-level=100.

--anti-alias
Enable font and vector anti-aliasing. This option is not
recommended.

Extraction
--no-metadata
Don't extract the metadata.

By default:

• The following entries of the document information dictionary
are extracted: Title, Author, Subject, Creator, Producer,
CreationDate, ModDate. Timestamps are formatted according to
RFC 3339[1], with date and time components separated by a
single space.

• The XMP metadata is extracted (or created) and updated
accordingly.

Note
If multiple input documents are specified, only the metadata of
the first one is taken into account.

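To give an idea of what an RFC 3339 timestamp with a single space between the date and time components looks like, here is a small Python sketch. The function name format_timestamp and the sample date are made up for illustration, and the exact form pdf2djvu emits (for example, how it writes the UTC offset) is not stated in this manual.

```python
from datetime import datetime, timezone


def format_timestamp(moment):
    """Format an aware datetime as RFC 3339, using a space instead of 'T'."""
    # isoformat() yields YYYY-MM-DD<sep>HH:MM:SS+HH:MM for aware datetimes.
    return moment.isoformat(sep=" ", timespec="seconds")


# Hypothetical modification date, expressed in UTC.
stamp = format_timestamp(datetime(2022, 8, 9, 12, 30, 0, tzinfo=timezone.utc))
print(stamp)  # 2022-08-09 12:30:00+00:00
```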
--verbatim-metadata
Keep the original metadata intact.

--no-outline
Don't extract the document outline.

--hyperlinks=border-avis
Make hyperlink borders always visible.

By default, a hyperlink border is visible only when the mouse is
over the hyperlink.

--hyperlinks=#RRGGBB
Force the specified border color for hyperlinks.

--no-hyperlinks, --hyperlinks=none
Don't extract hyperlinks.

--no-text
Don't extract the text.

--words
Extract the text. Record the location of every word. This is the
default.

--lines
Extract the text. Record the location of every line, rather than
every word.

--crop-text
Extract no text outside the page boundary.

--no-nfkc
Do not apply NFKC[2] normalization on the text, except for
characters from the Alphabetic Presentation Forms block[3]
(U+FB00–U+FB4F), which are normalized unconditionally.

The default is to apply NFKC normalization on all characters.

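The effect of NFKC normalization can be seen with Python's standard unicodedata module. The sketch below is only an approximation of the two modes described for --no-nfkc: both function names are hypothetical, and normalizing character by character in the second function is a simplifying assumption, not a description of pdf2djvu's implementation.

```python
import unicodedata


def normalize_all(text):
    """Default behaviour: apply NFKC to the whole text."""
    return unicodedata.normalize("NFKC", text)


def normalize_presentation_forms_only(text):
    """Rough model of --no-nfkc: normalize only U+FB00..U+FB4F."""
    return "".join(
        unicodedata.normalize("NFKC", ch) if 0xFB00 <= ord(ch) <= 0xFB4F else ch
        for ch in text
    )


# U+FB01 is the "fi" ligature; U+00B2 is superscript two.
sample = "\ufb01x\u00b2"
print(normalize_all(sample))                      # fix2
print(normalize_presentation_forms_only(sample))  # fix followed by U+00B2
```

In the second call the ligature is still expanded, because it lies in the Alphabetic Presentation Forms block, while the superscript digit is left untouched.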
--filter-text=command-line
Filter the text through the command-line. The provided filter must
preserve whitespace, control characters and decimal digits.

This option implies --no-nfkc.

-p, --pages=page-range
Specifies pages to convert. page-range is a comma-separated list
of sub-ranges. Each sub-range is either a single page (e.g. 17) or
a contiguous range of pages (e.g. 37-42). Duplicate page numbers
are not allowed. Pages are numbered from 1.

The default is to convert all pages.

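The page-range syntax accepted by -p/--pages can be illustrated with a short parser. This is a minimal Python sketch, not pdf2djvu's own code: the function name parse_page_range is hypothetical, and rejecting a sub-range whose end precedes its start is an assumption.

```python
def parse_page_range(spec):
    """Expand a page-range such as "17,37-42" into a list of page numbers."""
    pages = []
    seen = set()
    for sub in spec.split(","):
        first, sep, last = sub.partition("-")
        start = int(first)                 # raises ValueError on bad input
        end = int(last) if sep else start  # single page: end == start
        # Pages are numbered from 1; a descending range is rejected here.
        if start < 1 or end < start:
            raise ValueError(f"invalid sub-range: {sub!r}")
        for page in range(start, end + 1):
            if page in seen:
                raise ValueError(f"duplicate page number: {page}")
            seen.add(page)
            pages.append(page)
    return pages


print(parse_page_range("17,37-42"))  # [17, 37, 38, 39, 40, 41, 42]
```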
Performance
-j, --jobs=n
Use n threads to perform conversion. The default is to use one
thread.

-j0, --jobs=0
Determine automatically how many threads to use to perform
conversion.

Verbosity, help
-v, --verbose
Display more informational messages while converting the file.

-q, --quiet
Don't display informational messages while converting the file.

--version
Output version information and exit.

-h, --help
Display help and exit.

ENVIRONMENT
The following environment variables affect pdf2djvu on Unix systems:

OMP_*
Details of runtime behavior with respect to parallelism can be
controlled by several environment variables. Please refer to the
OpenMP API specification[4] for details.

TMPDIR
pdf2djvu makes heavy use of temporary files. It will store them in
a directory specified by this variable. The default is /tmp.

TEMPLATE LANGUAGE
Template syntax
The template language is roughly modeled on the Python string
formatting syntax[5].

A template is a piece of text which contains fields, surrounded by
curly braces {}. Fields are replaced with appropriately formatted
values when the template is evaluated. Moreover, {{ is replaced with a
single { and }} is replaced with a single }.

Field syntax
Each field consists of a variable name, optionally followed by a shift,
optionally followed by a format specification.

The shift is a signed (i.e. starting with a + or - character) integer.

The format specification consists of a colon, followed by a width
specification.

The width specification is a decimal integer defining the minimum field
width. If not specified, then the field width will be determined by the
content. Preceding the width specification with a zero (0) character
enables zero-padding.

The width specification is optionally followed by an asterisk (*)
character, which increases the minimum field width to the width of the
longest possible content of the variable.

Available variables
dpage
Page number in the DjVu document.

page, spage
Page number in the PDF document.

label
Page label (logical page number) in the PDF document.

This variable is available only for page titles.

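The field syntax described in this section can be made concrete with a small evaluator. The following Python sketch handles numeric variables only (so not label) and rests on several assumptions that the manual does not spell out: values are right-aligned, space-padded unless zero-padding is requested, the shift is added to the value, and the "longest possible content" is derived from a caller-supplied maximum. The names evaluate, values and maxima are hypothetical.

```python
import re

# Matches "{{", "}}", or a field: {name[shift][:[0][width][*]]}
_TOKEN = re.compile(
    r"\{\{|\}\}|"
    r"\{(?P<name>[a-z]+)(?P<shift>[+-][0-9]+)?"
    r"(?::(?P<zero>0)?(?P<width>[0-9]+)?(?P<star>\*)?)?\}"
)


def evaluate(template, values, maxima):
    """Expand fields in template using numeric values.

    values: variable name -> current value (e.g. {"page": 7}).
    maxima: variable name -> largest value the variable can take.
    """
    def replace(match):
        token = match.group(0)
        if token == "{{":
            return "{"
        if token == "}}":
            return "}"
        name = match.group("name")
        shift = int(match.group("shift") or 0)
        content = str(values[name] + shift)
        width = int(match.group("width") or 0)
        if match.group("star"):
            # Widen the field to fit the longest possible content.
            width = max(width, len(str(maxima[name] + shift)))
        fill = "0" if match.group("zero") else " "
        return content.rjust(width, fill)

    return _TOKEN.sub(replace, template)


print(evaluate("p{page:04*}.djvu", {"page": 7}, {"page": 250}))  # p0007.djvu
```

With a 250-page document the asterisk has no visible effect, because the longest page number (three digits) already fits in the requested width of four. If the largest page number had six digits, the same template would produce p000007.djvu.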
IMPLEMENTATION DETAILS
Layer separation algorithm
Unless the --monochrome option is on, pdf2djvu uses the following naive
layer separation algorithm:

1. For each page, do the following:

   1. Rasterize the page into a pixmap, in the usual manner.

   2. Rasterize the page into another pixmap, omitting the following
      page elements:

      • text,

      • 1 bit-per-pixel raster images,

      • vector elements (except fills of large areas).

   3. Compare both pixmaps, pixel by pixel:

      1. If their colors match, classify the pixel as a part of the
         background layer.

      2. Otherwise, classify the pixel as a part of the foreground
         layer.

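The pixel-by-pixel comparison step of the layer separation algorithm can be sketched in a few lines of Python. The sketch assumes that both rasterizations are already available as equally sized lists of rows of (R, G, B) tuples; the rasterization itself is out of scope here, and the function name separate_layers is hypothetical.

```python
def separate_layers(full, background_only):
    """Return a mask with True for foreground pixels, False for background.

    full:            pixmap of the page rendered in the usual manner.
    background_only: pixmap rendered without text, 1 bit-per-pixel
                     images and most vector elements.
    """
    mask = []
    for full_row, bg_row in zip(full, background_only):
        # Matching colors -> background; differing colors -> foreground.
        mask.append([f != b for f, b in zip(full_row, bg_row)])
    return mask


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
full = [[WHITE, BLACK], [WHITE, WHITE]]       # one black "text" pixel
background = [[WHITE, WHITE], [WHITE, WHITE]]  # same page without the text
print(separate_layers(full, background))  # [[False, True], [False, False]]
```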
BUGS
If you find a bug in pdf2djvu, please report it at the issue tracker[6]
or to the mailing list[7].

SEE ALSO
djvu(1), djvudigital(1), csepdjvu(1)

NOTES
1. RFC 3339
https://www.ietf.org/rfc/rfc3339

2. NFKC
https://unicode.org/reports/tr15/

3. Alphabetic Presentation Forms block
https://unicode.org/charts/PDF/UFB00.pdf

4. OpenMP API specification
https://www.openmp.org/specifications/

5. Python string formatting syntax
https://docs.python.org/2/library/string.html#format-string-syntax

6. the issue tracker
https://github.com/jwilk/pdf2djvu/issues

7. the mailing list
https://groups.io/g/pdf2djvu

pdf2djvu 0.9.19                    2022-08-09                    PDF2DJVU(1)