UNPAPER(1) unpaper UNPAPER(1)
2
3
4
6 unpaper - unpaper
7
9 unpaper [options] (input patterns output patterns | input files output
10 files)
11
unpaper is a post-processing tool for scanned sheets of paper,
especially for book pages that have been scanned from previously created
photocopies. The main purpose is to make scanned book pages more
readable on screen after conversion to PDF. Additionally, unpaper might
be useful to enhance the quality of scanned pages before performing
optical character recognition (OCR).
19
unpaper tries to clean scanned images by removing dark edges that
appeared through scanning or copying on areas outside the actual page
content (e.g. dark areas between the left-hand side and the
right-hand side of a double-sided book-page scan). The program also
tries to detect misaligned centering and rotation of pages and will
automatically straighten each page by rotating it to the correct angle.
This process is called "deskewing". Note that the automatic processing
will sometimes fail. It is always a good idea to check the results of
unpaper manually and adjust the parameter settings according to the
requirements of the input. Each processing step can also be disabled
individually for each sheet.
31
32 Input and output files can be in either .pbm, .pgm or .ppm format, thus
33 generally in .pnm format, as also used by the Linux scanning tools
34 scanimage and scanadf. Conversion to PDF can e.g. be achieved with the
35 Linux tools pgm2tiff, tiffcp and tiff2pdf.
36
Input and output files need to be specified either by using patterns or
as an ordered list of input and output files. If patterns such as %04d
are used, the input or output sheet number is substituted for them
before the file is opened for input or output.
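
As a rough illustration of the substitution, the following Python sketch expands a printf-style pattern for a few sheet numbers. The helper name expand_pattern and the file names are hypothetical; unpaper performs the substitution internally, and this only mimics the %04d convention mentioned above.

```python
def expand_pattern(pattern: str, number: int) -> str:
    """Expand a printf-style pattern such as 'scan%04d.pbm' for one number."""
    # Python's % operator follows the same %d / %04d conventions as C printf.
    return pattern % number


# Hypothetical file names, shown for three sheet numbers.
for sheet in (1, 2, 10):
    print(expand_pattern("scan%04d.pbm", sheet), "->",
          expand_pattern("clean%04d.pbm", sheet))
```

With these assumed names, sheet 1 would be read from scan0001.pbm and written to clean0001.pbm.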
42
43 If you're not using patterns, then the program expects one or two input
44 files depending on what is passed as --input-pages and one or two out‐
45 put files depending on what is passed as --output-pages, in order.
46
47 Missing output file names are fatal and will stop processing; missing
48 initial input file names are fatal, and so is any missing input file if
49 a range of sheets is defined through --sheet or --end-sheet.
50
51 unpaper accepts files in PNM format, which means they might be in .pbm,
52 .pgm, .ppm or .pnm format, which is what is produced by Linux command
53 line scanning tools such as scanimage and scanadf.
54
56 -l { single | double | none } ; --layout { single | double | none }
57 Set default layout options for a sheet:
58
59 single One page per sheet.
60
61 double Two pages per sheet, landscape orientation (one page on
62 the left half, one page on the right half).
63
64 none No auto-layout, mask-scan-points may individually be
65 specified.
66
67 Using single or double automatically sets corresponding
68 --mask-scan-points. The default is single.
69
70 -start sheet ; --start-sheet start-sheet
71 Number of first sheet to process in multi-sheet mode. (default:
72 1)
73
74 -end sheet ; --end-sheet sheet
75 Number of last sheet to process in multi-sheet mode. -1 indi‐
76 cates processing until no more input file with the corresponding
77 page number is available (default: -1)
78
79 -# sheet-range ; --sheet sheet-range
80 Optionally specifies which sheets to process in the range be‐
81 tween start-sheet and end-sheet.
82
83 -x sheet-range ; --exclude sheet-range
84 Excludes sheets from processing in the range between start-sheet
85 and end-sheet.
86
87 --pre-rotate { -90 | 90 }
88 Rotates the whole image clockwise (90) or anti-clockwise (-90)
89 before any other processing.
90
91 --post-rotate { -90 | 90 }
92 Rotates the whole image clockwise (90) or anti-clockwise (-90)
93 after any other processing.
94
95 -M { v | h | v,h } ; --pre-mirror { v | h | v,h }
96 Mirror the image, after possible pre-rotation. Either v (for
97 vertical mirroring), h (for horizontal mirroring) or v,h (for
98 both) can be specified.
99
100 --post-mirror { v | h | v,h }
101 Mirror the image, after any other processing except possible
102 post-rotation. Either v (for vertical mirroring), h (for hori‐
103 zontal mirroring) or v,h (for both) can be specified.
104
105 --pre-shift h, v
106 Shift the image before further processing. Values for h (hori‐
107 zontal shift) and v (vertical shift) can either be positive or
108 negative.
109
110 --post-shift h, v
111 Shift the image after other processing. Values for h (horizontal
112 shift) and v (vertical shift) can either be positive or nega‐
113 tive.
114
--pre-wipe left, top, right, bottom
Manually wipe out an area before further processing. Any pixel
in a wiped area will be set to white. Multiple areas to be wiped
may be specified by multiple occurrences of this option.

--post-wipe left, top, right, bottom
Manually wipe out an area after processing. Any pixel in a wiped
area will be set to white. Multiple areas to be wiped may be
specified by multiple occurrences of this option.
124
125 --pre-border left, top, right, bottom
126 Clear the border-area of the sheet before further processing.
127 Any pixel in the border area will be set to white.
128
129 --post-border left, top, right, bottom
130 Clear the border-area of the sheet after other processing. Any
131 pixel in the border area will be set to white.
132
133 --pre-mask x1, y1, x2, y2
134 Specify masks to apply before any other processing. Any pixel
135 outside a mask will be set to white, unless another mask in‐
136 cludes this pixel.
137
138 Only pixels inside a mask will remain. Multiple masks may be
139 specified. No deskewing will be applied to the masks specified
140 by --pre-mask.
141
-s { width, height | size-name } ; --size { width, height | size-name }
Change the sheet size before other processing is applied. Content
on the sheet gets zoomed to fit the new size, but its aspect ratio
is preserved. If the sheet's aspect ratio changes, the zoomed
content is centered on the sheet rather than stretched.

Possible values for size-name are: a5, a4, a3, letter, legal.
All size names can also be applied in rotated landscape orientation;
use a4-landscape, letter-landscape etc.
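
The fit-and-center behaviour described for --size can be sketched as follows. This is a minimal illustration under stated assumptions, not unpaper's implementation: the function name fit_and_center is hypothetical, sizes are in pixels, and the rounding is a guess.

```python
def fit_and_center(content, sheet):
    """Scale content (w, h) to fit inside sheet (w, h), keeping its aspect ratio.

    Returns the scaled content size and the top-left offset that centers it.
    """
    cw, ch = content
    sw, sh = sheet
    # The smaller of the two scale factors guarantees the content fits.
    scale = min(sw / cw, sh / ch)
    new_w, new_h = round(cw * scale), round(ch * scale)
    # Leftover space is split evenly on both sides (assumed rounding: floor).
    return (new_w, new_h), ((sw - new_w) // 2, (sh - new_h) // 2)


print(fit_and_center((1000, 2000), (2000, 2000)))  # ((1000, 2000), (500, 0))
```

By contrast, --stretch would scale each axis independently, which is why it may change the aspect ratio.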
152
153 --post-size { width, height | size-name }
154 Change the sheet size preserving the content's aspect ratio af‐
155 ter other processing steps are applied.
156
157 --stretch { width, height | size-name }
158 Change the sheet size before other processing is applied. Con‐
159 tent on the sheet gets stretched to the specified size, possibly
160 changing the aspect ratio.
161
162 --post-stretch { width, height | size-name }
163 Change the sheet size after other processing is applied. Content
164 on the sheet gets stretched to the specified size, possibly
165 changing the aspect ratio.
166
167 -z factor ; --zoom factor
168 Change the sheet size according to the given factor before other
169 processing is done.
170
171 --post-zoom factor
172 Change the sheet size according to the given factor after pro‐
173 cessing is done.
174
-bn { v | h | v, h } ; --blackfilter-scan-direction { v | h | v, h }
Directions in which to search for solidly black areas. Either v
(for vertical searching), h (for horizontal searching) or v,h
(for both) can be specified. The blackfilter works by moving a
virtual bar across each page. The darkness inside the virtual
bar is determined and, if it exceeds blackfilter-scan-threshold,
black pixels in the area are filled. During filling, the blackness
of each pixel is determined by black-threshold. The bar is
then moved by blackfilter-scan-step in the scanning direction.
Once a page border is encountered, the bar is moved down (horizontal
scan) or right (vertical scan) by its blackfilter-scan-size.
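
The virtual-bar scan can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not unpaper's actual code: the image is a list of rows of brightness ratios (0.0 black, 1.0 white), only a single horizontal band is scanned, and the function names dark_ratio and scan_band are hypothetical. The default values mirror the documented defaults for black-threshold and blackfilter-scan-threshold.

```python
def dark_ratio(image, left, top, width, height, black_threshold=0.33):
    """Return the fraction of pixels inside the bar that count as black."""
    dark = total = 0
    for y in range(top, min(top + height, len(image))):
        for x in range(left, min(left + width, len(image[y]))):
            total += 1
            # Brightness is assumed to be a ratio in 0.0 (black) .. 1.0 (white).
            if image[y][x] < black_threshold:
                dark += 1
    return dark / total if total else 0.0


def scan_band(image, top, size, depth, step, scan_threshold=0.95):
    """Slide a size x depth bar to the right along one band of rows.

    Returns the left offsets at which the bar is dark enough to be
    treated as a solidly black area.
    """
    hits = []
    for left in range(0, len(image[0]) - size + 1, step):
        if dark_ratio(image, left, top, size, depth) > scan_threshold:
            hits.append(left)
    return hits


# Toy sheet: 4 rows, 12 columns; the left half is black, the right half white.
sheet = [[0.0] * 6 + [1.0] * 6 for _ in range(4)]
print(scan_band(sheet, top=0, size=4, depth=4, step=2))  # [0, 2]
```

In this toy case the bar at offset 4 covers half black and half white pixels, so its ratio of 0.5 stays below the threshold and it is not reported.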
187
188 -bs { size | h-size, v-size } ; --blackfilter-scan-size { size |
189 h-size, v-size }
190 Size of virtual bar in direction of scanning (meaning width for
191 horizontal pass, height for vertical pass) used for black area
192 detection. Two values may be specified to individually set the
193 size for the horizontal scanning-pass and the vertical pass.
194 (default: 20,20)
195
196 -bd { depth | h-depth, v-depth } ; --blackfilter-scan-depth { depth |
197 h-depth, v-depth }
198 Depth of virtual bar in non-scanning direction (meaning height
199 for horizontal pass, width for vertical pass) used for black
200 area detection. Two values may be specified to individually set
201 the depth for the horizontal scanning-pass and the vertical
202 pass. (default: 500,500)
203
204 -bp { step | h-step, v-step } ; --blackfilter-scan-step { step |
205 h-step, v-step }
206 Steps to move virtual bar for black area detection. Two values
207 may be specified to individually set the step for the horizontal
208 scanning-pass and the vertical pass. (default: 5,5)
209
210 -bt threshold ; --blackfilter-scan-threshold threshold
211 Ratio of dark pixels above which a black area gets detected.
212 (default: 0.95).
213
214 -bx left, top, right, bottom ; --blackfilter-scan-exclude left, top,
215 right, bottom
216 Area on which the blackfilter should not operate. This can be
217 useful to prevent the blackfilter from working on inner page
218 content. May be specified multiple times to set more than one
219 area.
220
221 -bi intensity ; --blackfilter-intensity intensity
222 Intensity with which to delete black areas. This deletes pixels
223 around the virtual scan bar. Larger values will leave less
224 noise-pixels around former black areas, but may delete page con‐
225 tent. (default: 20)
226
-ni intensity ; --noisefilter-intensity intensity
Intensity with which to delete individual pixels or tiny clusters
of pixels. Any cluster which contains at most intensity dark
pixels will be deleted. (default: 4)
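
One way to picture the noisefilter is as a connected-cluster search. The sketch below assumes a binary image (1 = dark, 0 = white), 8-neighbour connectivity, and that clusters of at most intensity pixels are removed; these are assumptions made for illustration, and the function name remove_small_clusters is hypothetical.

```python
def remove_small_clusters(pixels, intensity=4):
    """Return a copy of pixels with dark clusters of <= intensity pixels cleared."""
    height, width = len(pixels), len(pixels[0])
    result = [row[:] for row in pixels]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if pixels[y][x] != 1 or seen[y][x]:
                continue
            # Collect the connected cluster (8-neighbourhood) starting here.
            cluster, stack = [], [(x, y)]
            seen[y][x] = True
            while stack:
                cx, cy = stack.pop()
                cluster.append((cx, cy))
                for ny in range(max(cy - 1, 0), min(cy + 2, height)):
                    for nx in range(max(cx - 1, 0), min(cx + 2, width)):
                        if pixels[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((nx, ny))
            # Small clusters are treated as noise and set to white.
            if len(cluster) <= intensity:
                for cx, cy in cluster:
                    result[cy][cx] = 0
    return result


page = [
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
for row in remove_small_clusters(page):
    print(row)
```

Here the isolated pixel in the top-left corner is cleared, while the six-pixel block survives because it exceeds the intensity of 4.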
231
232 -ls { size | h-size, v-size } ; --blurfilter-size { size | h-size,
233 v-size }
234 Size of blurfilter area to search for "lonely" clusters of pix‐
235 els. (default: 100,100)
236
237 -lp { step | h-step, v-step } ; --blurfilter-step { step | h-step,
238 v-step }
239 Size of "blurring" steps in each direction. (default: 50,50)
240
241 -li ratio ; --blurfilter-intensity ratio
242 Relative intensity with which to delete tiny clusters of pixels.
243 Any blurred area which contains at most the ratio of dark pixels
244 will be cleared. (default: 0.01)
245
246 -gs { size | h-size, v-size } ; --grayfilter-size { size | h-size,
247 v-size }
248 Size of grayfilter mask to search for "gray-only" areas of pix‐
249 els. (default: 50,50)
250
251 -gp { step | h-step, v-step } ; --grayfilter-step { step | h-step,
252 v-step }
253 Size of steps moving the grayfilter mask in each direction. (de‐
254 fault: 20,20)
255
256 -gt ratio ; --grayfilter-threshold ratio
257 Relative intensity of grayness which is accepted before clearing
258 the grayfilter mask in cases where no black pixel is found in
259 the mask. (default: 0.5)
260
261 -p x, y; --mask-scan-point x, y
262 Manually set starting point for mask-detection. Multiple
263 --mask-scan-point options may be specified to detect multiple
264 masks.
265
266 -m x1, y1, x2, y2; --mask x1, y1, x2, y2
267 Manually add a mask, in addition to masks automatically detected
268 around the --mask-scan-point coordinates (unless --no-mask-scan
269 is specified).
270
271 Any pixel outside a mask will be set to white, unless another
272 mask covers this pixel.
273
-mn { v | h | v,h }; --mask-scan-direction { v | h | v,h }
Directions in which to search for mask borders, starting from
--mask-scan-point coordinates. Either v (for vertical scanning),
h (for horizontal scanning) or v,h (for both) can be
specified. (default: h, as v may cut text paragraphs on
single-page sheets)

-ms { size | h-size, v-size }; --mask-scan-size { size | h-size,
v-size }
Width of the virtual bar used for mask detection. Two values may
be specified to individually set horizontal and vertical size.
(default: 50,50)

-md { depth | h-depth, v-depth }; --mask-scan-depth { depth |
h-depth, v-depth }
Height of the virtual bar used for mask detection. (default:
-1,-1, using the total width or height of the sheet)

-mp { step | h-step, v-step }; --mask-scan-step { step | h-step,
v-step }
Steps to move the virtual bar for mask detection. (default: 5,5)

-mt { threshold | h-threshold, v-threshold }; --mask-scan-threshold {
threshold | h-threshold, v-threshold }
Ratio of dark pixels below which an edge gets detected, relative
to maximum blackness when counting from the start coordinate
heading towards one edge. (default: 0.1)
301
302 -mm w, h; --mask-scan-minimum w, h
303 Minimum allowed size of an auto-detected mask. Masks detected
304 below this size will be ignored and set to the size specified by
305 mask-scan-maximum. (default: 100,100)
306
307 -mM w, h; --mask-scan-maximum w, h
308 Maximum allowed size of an auto-detected mask. Masks detected
309 above this size will be shrunk to the maximum value, each direc‐
310 tion individually. (default: sheet size, or page size derived
311 from --layout option)
312
-mc color; --mask-color color
Color value with which to wipe out pixels not covered by any
mask. May be useful for testing in order to visualize the effect
of masking. (Note that an RGB value is expected: R*65536 +
G*256 + B.)
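
The RGB formula above is plain arithmetic; a short Python sketch makes it concrete. The helper name rgb_to_color_value is hypothetical, and the 0..255 range per channel is an assumption implied by the multipliers.

```python
def rgb_to_color_value(r: int, g: int, b: int) -> int:
    """Pack three 8-bit channels into the single integer R*65536 + G*256 + B."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("each channel is assumed to be in 0..255")
    return r * 65536 + g * 256 + b


print(rgb_to_color_value(255, 0, 0))      # 16711680 (pure red)
print(rgb_to_color_value(128, 128, 128))  # 8421504 (mid gray)
```

The resulting integer is what would be passed as the color argument, for example 16711680 to make masked-out regions show up in red while testing.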
318
-dn { left | top | right | bottom },...; --deskew-scan-direction {
left | top | right | bottom },...
Edges from which to scan for rotation. Each edge of a mask can
be used to detect the mask's rotation. If multiple edges are
specified, the average value will be used, unless the statistical
deviation exceeds --deskew-scan-deviation. Use left for
scanning from the left edge, top for scanning from the top edge,
right for scanning from the right edge, bottom for scanning from
the bottom. Multiple directions can be separated by commas.
(default: left,right)
329
330 -ds pixels; --deskew-scan-size pixels
331 Size of virtual line for rotation detection. (default: 1500)
332
333 -dd ratio; --deskew-scan-depth ratio
334 Amount of dark pixels to accumulate until scanning is stopped,
335 relative to scan-bar size. (default: 0.5)
336
337 -dr degrees; --deskew-scan-range degrees
338 Range in which to search for rotation, from -degrees to +degrees
339 rotation. (default: 5.0)
340
341 -dp degrees; --deskew-scan-step degrees
342 Steps between single rotation-angle detections. Lower numbers
343 lead to better results but slow down processing. (default: 0.1)
344
345 -dv deviation; --deskew-scan-deviation deviation
346 Maximum statistical deviation allowed among the results from de‐
347 tected edges. No rotation if exceeded. (default: 1.0)
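
The way results from several edges are combined can be sketched as below. This assumes that "statistical deviation" means the population standard deviation of the per-edge angles; unpaper's actual measure may differ, and the function name combine_edge_angles is hypothetical.

```python
from statistics import mean, pstdev


def combine_edge_angles(angles, max_deviation=1.0):
    """Average per-edge rotation angles, or return 0.0 if they disagree too much."""
    if not angles:
        return 0.0
    # Assumption: deviation is the population standard deviation of the angles.
    if pstdev(angles) > max_deviation:
        return 0.0  # edges disagree, so no rotation is applied
    return mean(angles)


print(combine_edge_angles([1.0, 1.2]))   # edges agree: rotate by about 1.1
print(combine_edge_angles([-3.0, 3.0]))  # edges disagree: 0.0
```

Scanning from more edges gives the deviation check more evidence, at the cost of rejecting sheets whose edges are not parallel.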
348
349 -W left, top, right, bottom; --wipe left, top, right, bottom
350 Manually wipe out an area. Any pixel in a wiped area will be set
351 to white. Multiple --wipe areas may be specified. This is ap‐
352 plied after deskewing and before automatic border-scan.
353
-mw { size | left, right }; --middle-wipe { size | left, right }
If --layout is set to double, this may specify the size of a
middle area to wipe out between the two pages on the sheet. This
may be useful if the blackfilter fails to remove some black areas
(e.g. resulting from photocopying in the middle between
two pages).
360
361 -B left, top, right, bottom; --border left, top, right, bottom
362 Manually add a border. Any pixel in the border area will be set
363 to white. This is applied after deskewing and before automatic
364 border-scan.
365
-Bn { v | h | v,h }; --border-scan-direction { v | h | v,h }
Directions in which to search for the outer border. Either v (for
vertical scanning), h (for horizontal scanning) or v,h (for
both) can be specified. (default: v)

-Bs { size | h-size, v-size }; --border-scan-size { size | h-size,
v-size }
Width of virtual bar used for border detection. Two values may
be specified to individually set horizontal and vertical size.
(default: 5,5)

-Bp { step | h-step, v-step }; --border-scan-step { step | h-step,
v-step }
Steps to move virtual bar for border detection. (default: 5,5)
380
381 -Bt threshold; --border-scan-threshold threshold
382 Absolute number of dark pixels covered by the border-scan mask
383 above which a border is detected. (default: 5)
384
-Ba { left | top | right | bottom }; --border-align { left | top |
right | bottom }
Direction in which to shift the detected border area. Use
--border-margin to specify horizontal and vertical distances to be
kept from the sheet edge. (default: none)

-Bm vertical, horizontal; --border-margin vertical, horizontal
Distance to keep from the sheet edge when aligning a border
area. May use measurement suffixes such as cm, in.
394
395 -w threshold; --white-threshold threshold
396 Brightness ratio above which a pixel is considered white. (de‐
397 fault: 0.9)
398
-b threshold; --black-threshold threshold
Brightness ratio below which a pixel is considered black
(non-gray). This is used by the grayfilter and the blackfilter.
This value is also used when converting a grayscale image
to black-and-white mode. (default: 0.33)
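
Taken together, the two thresholds split the brightness scale into three bands. The following sketch assumes brightness is expressed as a ratio between 0.0 (black) and 1.0 (white); the function name classify is hypothetical.

```python
def classify(brightness, white_threshold=0.9, black_threshold=0.33):
    """Label a brightness ratio as 'white', 'black' or 'gray'."""
    if brightness > white_threshold:
        return "white"
    if brightness < black_threshold:
        return "black"
    # Everything between the two thresholds counts as gray.
    return "gray"


for value in (0.95, 0.5, 0.1):
    print(value, classify(value))
```

Raising --black-threshold makes more mid-tone pixels count as black, which in turn affects the blackfilter and the grayfilter.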
404
-ip { 1 | 2 }; --input-pages { 1 | 2 }
If 2 is specified, read two input images instead of one and
internally combine them to a double-layout sheet before further
processing. Before internally combining, --pre-rotate is
optionally applied individually to both input images as the very
first processing step.

-op { 1 | 2 }; --output-pages { 1 | 2 }
If 2 is specified, write two output images instead of one, as a
result of splitting a double-layout sheet after processing.
After splitting the sheet, --post-rotate is optionally applied
individually to both output images as the very last processing
step.
418
-S { width, height | size-name }; --sheet-size { width, height |
size-name }
Force a fixed sheet size. Usually, the sheet size is determined by
the input image size (if input-pages=1), or by double the size
of the first page in a two-page input set (if input-pages=2). If
the input image is smaller than the size specified here, it will
appear centered and surrounded by a white border on the sheet.
If the input image is bigger, it will be centered and the edges
will be cropped. This option may also be helpful to get regularly
sized output images if the input image sizes differ. Standard
size names like a4-landscape, letter, etc. may be used (see
--size). (default: as in input file)
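
The centering rule for --sheet-size can be sketched as a single offset computation. This is an illustration only: the function name center_offset is hypothetical, sizes are in pixels, and the rounding direction is an assumption.

```python
def center_offset(sheet_size, image_size):
    """Return the (x, y) position of the image's top-left corner on the sheet.

    Positive values leave a border around the image; negative values
    mean that many pixels of the image are cropped on each side.
    """
    (sheet_w, sheet_h), (image_w, image_h) = sheet_size, image_size
    return ((sheet_w - image_w) // 2, (sheet_h - image_h) // 2)


print(center_offset((2480, 3508), (2000, 3000)))  # (240, 254): white border
print(center_offset((1000, 1000), (1200, 1400)))  # (-100, -200): edges cropped
```

The area left uncovered in the first case is filled with the color chosen through --sheet-background.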
431
--sheet-background { black | white }
Sets a color with which the sheet is filled before any image is
loaded and placed onto it. This can be useful when the sheet
size and the image size differ.
436
437 --no-blackfilter sheet-range
438 Disables black area scan. Individual sheet indices can be speci‐
439 fied.
440
441 --no-noisefilter sheet-range
442 Disables the noisefilter. Individual sheet indices can be speci‐
443 fied.
444
445 --no-blurfilter sheet-range
446 Disables the blurfilter. Individual sheet indices can be speci‐
447 fied.
448
449 --no-grayfilter sheet-range
450 Disables the grayfilter. Individual sheet indices can be speci‐
451 fied.
452
453 --no-mask-scan sheet-range
454 Disables mask-detection. Masks explicitly set by --mask will
455 still have effect. Individual sheet indices can be specified.
456
457 --no-mask-center sheet-range
458 Disables auto-centering of each mask. Auto-centering is per‐
459 formed by default if the --layout option has been set. Individ‐
460 ual sheet indices can be specified.
461
462 --no-deskew sheet-range
463 Disables deskewing. Individual sheet indices can be specified.
464
465 --no-wipe sheet-range
466 Disables explicit wipe-areas. This means the effect of parameter
467 --wipe can be disabled individually per sheet.
468
469 --no-border sheet-range
470 Disables explicitly set borders. This means the effect of param‐
471 eter --border can be disabled individually per sheet.
472
473 --no-border-scan sheet-range
474 Disables border-scanning from the edges of the sheet. Individual
475 sheet indices can be specified.
476
477 --no-border-align sheet-range
478 Disables aligning of the area detected by border-scanning (see
479 --border-align). Individual sheet indices can be specified.
480
481 -n sheet-range; --no-processing sheet-range
482 Do not perform any processing on a sheet except pre/post rotat‐
483 ing and mirroring, and file-depth conversions on saving. This
484 option has the same effect as setting all --no-xxx options to‐
485 gether. Individual sheet indices can be specified.
486
--interpolate { nearest | linear | cubic }
Set the interpolation function used for deskewing and stretching.
The cubic option provides the best image quality, while
nearest is the fastest. (default: cubic)
491
492 --no-multi-pages
493 Disable multi-page processing even if the input filename con‐
494 tains a % (usually indicating the start of a placeholder for the
495 page counter).
496
--dpi dpi
Dots per inch used for conversion of measured size values, such
as 21cm,27.9cm. Note that this parameter should occur before
any size value with a measurement suffix is specified. (default:
300)
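
The conversion from a measured value to pixels is a matter of simple arithmetic. The sketch below assumes only the cm and in suffixes mentioned in this page, treats a bare number as pixels, and rounds to the nearest pixel; the function name to_pixels and the rounding rule are assumptions, not unpaper's documented behaviour.

```python
UNITS_PER_INCH = {"cm": 2.54, "in": 1.0}


def to_pixels(value: str, dpi: int = 300) -> int:
    """Convert a size such as '21cm' or '8.5in' to pixels at the given dpi."""
    for suffix, per_inch in UNITS_PER_INCH.items():
        if value.endswith(suffix):
            inches = float(value[: -len(suffix)]) / per_inch
            return round(inches * dpi)
    # No recognised suffix: the value is assumed to be in pixels already.
    return int(value)


print(to_pixels("21cm"), to_pixels("27.9cm"))  # 2480 3295
print(to_pixels("21cm", dpi=600))              # 4961
```

Because the result depends on the dpi value, a size given before --dpi on the command line would be converted with the default of 300.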
502
-t { pbm | pgm | ppm }; --type { pbm | pgm | ppm }
Output file type (and bit depth). If not specified, the one with
the same, or closest, pixel format as the original input files
will be used.
507
508 pbm Portable Bit Map, monochrome raw image.
509
510 pgm Portable Grayscale Map, 8-bit per pixel grayscale raw im‐
511 age.
512
513 ppm Portable Pixel Map, 24-bit per pixel RGB raw image.
514
515 -T ; --test-only
516 Do not write any output. May be useful in combination with
517 --verbose to get information about the input.
518
519 -si nr; --start-input nr
520 Set the first page number to substitute for '%d' in input file‐
521 names. Every time the input file sequence is repeated, this
522 number gets increased by 1. (default: (startsheet-1)*input‐
523 pages+1)
524
525 -so nr; --start-output nr
526 Set the first page number to substitute for '%d' in output file‐
527 names. Every time the output file sequence is repeated, this
528 number gets increased by 1. (default: (startsheet-1)*output‐
529 pages+1)
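
The default formula shared by --start-input and --start-output can be checked with a few values. The function name default_start is hypothetical; the formula itself is the one stated in the two defaults above.

```python
def default_start(start_sheet: int, pages_per_sheet: int) -> int:
    """Default first page number: (startsheet - 1) * pages + 1."""
    return (start_sheet - 1) * pages_per_sheet + 1


# With two input pages per sheet, sheet 3 starts at input file number 5.
print(default_start(3, 2))  # 5
# With one page per sheet, file numbers and sheet numbers coincide.
print(default_start(3, 1))  # 3
```

For example, starting at sheet 3 with --input-pages 2 and --output-pages 1 would by default read from file number 5 onwards and write from file number 3 onwards.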
530
531 --insert-blank nr [,nr...]
532 Use blank input instead of an input file from the input file se‐
533 quence at the specified index-positions. The input file sequence
534 will be interrupted temporarily and will continue with the next
535 input file afterwards. This can be useful to insert blank con‐
536 tent into a sequence of input images.
537
538 --replace-blank nr [,nr...]
539 Like --insert-blank, but the input images at the specified index
540 positions get replaced with blank content and thus will be ig‐
541 nored.
542
543 --overwrite
544 Allow overwriting existing files. Otherwise the program termi‐
545 nates with an error if an output file to be written already ex‐
546 ists.
547
548 -q ; --quiet
549 Quiet mode, no output at all.
550
551 -v ; --verbose
552 Verbose output, more info messages.
553
554 -vv Even more verbose output, show parameter settings before pro‐
555 cessing.
556
557 -V ; --version
558 Output version and build information.
559
561 The unpaper authors
562
564 2022, The unpaper Authors
565
566
567
568
569 May 31, 2022 UNPAPER(1)