GSCAN2PDF(1) User Contributed Perl Documentation GSCAN2PDF(1)

6 gscan2pdf - A GUI to produce PDFs or DjVus from scanned documents
7
9 1. Scan one or several pages in with File/Scan
10 2. Create PDF of selected pages with File/Save
11
13 None
14
16 gscan2pdf has the following command-line options:
17
18 --device=device
Specifies the device to use, instead of getting the list of devices
from the SANE API. This can be useful if the scanner is on a
remote computer which is not broadcasting its existence.
22
23 --help
24 Displays this help page and exits.
25
26 --log=log-file
27 Specifies a file to store logging messages.
28
29 --debug, --info, --warn, --error, --fatal
30 Defines the log level. If a log file is specified, this defaults
31 to --debug, otherwise --error.
32
33 --import=PDF|DjVu|images
34 Imports the specified file(s). If the document has more than one
35 page, a window is displayed to select the required pages.
36
--import-all=PDF|DjVu|images
Imports all pages of the specified file(s).

--version
40 Displays the program version and exits.
41
42 Scanning is handled with SANE via scanimage. PDF conversion is done by
43 PDF::Builder. TIFF export is handled by libtiff (faster and smaller
44 memory footprint for multipage files).
45
47 To diagnose a possible error, start gscan2pdf from the command line
48 with logging enabled:
49
50 "gscan2pdf --log=file.log"
51
52 and check file.log.
53
55 None
56
58 gscan2pdf creates a text resource file in ~/.config/gscan2pdfrc. The
59 directory can be changed by setting the $XDG_CONFIG_HOME variable.
60 Generally, however, preferences should be changed via the
61 Edit/Preferences menu, or are captured automatically during normal
62 usage of the program.
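
The lookup described above can be sketched in shell. This is only an illustration of the stated rule (use $XDG_CONFIG_HOME if it is set, otherwise ~/.config); the function name gscan2pdf_rc is hypothetical and is not part of gscan2pdf.

```shell
# gscan2pdf_rc: print the expected location of the resource file.
# Hypothetical helper; falls back to ~/.config when XDG_CONFIG_HOME
# is unset or empty.
gscan2pdf_rc() {
    printf '%s\n' "${XDG_CONFIG_HOME:-$HOME/.config}/gscan2pdfrc"
}

# Usage: inspect the current settings without starting the GUI.
#   less "$(gscan2pdf_rc)"
```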
63
65 None known.
66
Whilst it is possible to import PDFs, this feature is intended for
round-tripping files created by gscan2pdf.
70
72 gscan2pdf is available on Sourceforge
73 (<https://sourceforge.net/projects/gscan2pdf/files/gscan2pdf/>).
74
75 Debian-based
76 If you are using Debian, you should find that sid
77 <https://www.debian.org/releases/sid/> has the latest version already
78 packaged.
79
If you are using an Ubuntu-based system, you can automatically keep up
to date with the latest version via the PPA:
82
83 "sudo apt-add-repository ppa:jeffreyratcliffe/ppa"
84
If you are using Synaptic, then use menu Edit/Reload Package
Information, search for gscan2pdf in the package list, and lo and
behold, you can install the nice shiny new version.
88
89 From the command line:
90
91 "sudo apt update"
92
93 "sudo apt install gscan2pdf"
94
95 From source
96 The source is hosted in the files section of the gscan2pdf project on
97 Sourceforge (<https://sourceforge.net/projects/gscan2pdf/files/>).
98
99 From the repository
100 gscan2pdf uses Git for its Revision Control System. You can browse the
101 tree at <https://sourceforge.net/p/gscan2pdf/code/>.
102
103 Git users can clone the complete tree with "git clone
104 git://git.code.sf.net/p/gscan2pdf/code"
105
Having downloaded the source either from a Sourceforge file release, or
from the Git repository, unpack it if necessary and change into the
source directory:

"tar xvfz gscan2pdf-x.x.x.tar.gz"

"cd gscan2pdf-x.x.x"

"perl Makefile.PL" will create the Makefile.
112
113 "make test" should run several hundred tests to confirm that things
114 will work properly on your system.
115
116 You can install directly from the source with "make install", but
117 building the appropriate package for your distribution should be as
118 straightforward as "make debdist" or "make rpmdist". However, you will
119 additionally need the rpm, devscripts, fakeroot, debhelper and gettext
120 packages.
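
The build steps above can be collected into a small shell sketch. The function names src_dir and build_from_tarball are hypothetical, and the sketch assumes the tarball unpacks into a directory named after it, as in the gscan2pdf-x.x.x example.

```shell
# src_dir TARBALL: directory a release tarball is expected to unpack
# into, assuming gscan2pdf-x.x.x.tar.gz -> gscan2pdf-x.x.x.
src_dir() {
    basename "$1" .tar.gz
}

# build_from_tarball TARBALL: unpack, create the Makefile and run the
# tests. The body runs in a subshell, so the caller's working
# directory is left unchanged.
build_from_tarball() (
    tar xzf "$1" &&
    cd "$(src_dir "$1")" &&
    perl Makefile.PL &&
    make test
)

# Usage:
#   build_from_tarball gscan2pdf-x.x.x.tar.gz
```

If the tests pass, "make install" (or "make debdist" / "make rpmdist") can then be run from the source directory.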
121
The list below looks daunting, but all packages are available from any
reasonably up-to-date distribution. If you are using Synaptic, having
installed gscan2pdf, locate the gscan2pdf entry in Synaptic, right-click
it and you can install them under Recommends. Note also that the
library names given below are the Debian/Ubuntu ones. Those
distributions using RPM typically use perl(module) where Debian has
libmodule-perl.
130
131 Required
libgtk3-perl (>= 0.028)
There is a bug in versions of libgtk3-perl before 0.028 that
causes gscan2pdf to crash when saving. Whilst I could prevent
gscan2pdf from crashing, it would still be impossible to save
anything, rendering gscan2pdf rather useless.
137
138 libgtk3-simplelist-perl
139 A simple interface to Gtk3's complex MVC list widget
140
141 liblocale-gettext-perl (>= 1.05)
142 Using libc functions for internationalisation in Perl
143
144 libpdf-builder-perl
145 provides the functions for creating PDF documents in Perl
146
147 libsane
148 API library for scanners
149
150 libimage-sane-perl
151 Perl bindings for libsane.
152
153 libset-intspan-perl
154 manages sets of integers
155
156 libtiff-tools
157 TIFF manipulation and conversion tools
158
imagemagick
Image manipulation programs
161
162 perlmagick
163 A perl interface to the libMagick graphics routines
164
165 sane-utils
166 API library for scanners -- utilities.
167
168 Optional
169 sane
170 scanner graphical frontends. Only required for the scanadf
171 frontend.
172
173 unpaper
174 post-processing tool for scanned pages. See
175 <https://www.flameeyes.eu/projects/unpaper>.
176
177 xdg-utils
178 Desktop integration utilities from freedesktop.org. Required
179 for Email as PDF. See
180 <https://www.freedesktop.org/wiki/Software/xdg-utils/>
181
182 djvulibre-bin
183 Utilities for the DjVu image format. See
184 <http://djvu.sourceforge.net/>
185
186 gocr
187 A command line OCR. See <http://jocr.sourceforge.net/>.
188
189 tesseract
190 A command line OCR. See
191 <https://github.com/tesseract-ocr/tesseract>
192
193 cuneiform
194 A command line OCR. See <http://launchpad.net/cuneiform-linux>
195
197 There are two mailing lists for gscan2pdf:
198
199 gscan2pdf-announce
200 A low-traffic list for announcements, mostly of new releases. You
201 can subscribe at
202 <https://lists.sourceforge.net/lists/listinfo/gscan2pdf-announce>
203
204 gscan2pdf-help
General support, questions, etc. You can subscribe at
<https://lists.sourceforge.net/lists/listinfo/gscan2pdf-help>
207
209 Before reporting bugs, please read the "FAQs" section.
210
211 Please report any bugs found, preferably against the Debian
212 package[1][2]. You do not need to be a Debian user, or set up an
213 account to do this. The Debian tool "reportbug" provides a convenient
214 GUI for doing so.
215
216 1. https://packages.debian.org/sid/gscan2pdf
217 2. https://www.debian.org/Bugs/
218
219 Alternatively, there is a bug tracker for the gscan2pdf project on
220 Sourceforge
221 (<https://sourceforge.net/p/gscan2pdf/_list/tickets?source=navbar>).
222
223 Please include the log file created by "gscan2pdf --log=log" with any
224 new bug report.
225
227 gscan2pdf has already been partly translated into several languages.
228 If you would like to contribute to an existing or new translation,
229 please check out Rosetta:
230 <https://translations.launchpad.net/gscan2pdf>
231
Note that the translations for the scanner options are taken directly
from sane-backends. If you would like to contribute to these, you can
do so by contacting the sane-devel mailing list
(sane-devel@lists.alioth.debian.org) and having a look at the po/
directory in the source code <http://www.sane-project.org/cvs.html>.
237
238 Alternatively, Ubuntu has its own translation project. For the 9.04
239 release, the translations are available at
240 <https://translations.launchpad.net/ubuntu/jaunty/+source/sane-backends/+pots/sane-backends>
241
If you have updated a ".po" file in the "po" directory of the
gscan2pdf source tree and would like to test it, pick a test directory
for the compiled locales, e.g. "./locale", and create the ".mo" files
with:
246
247 "perl Makefile.PL LOCALEDIR=./locale"
248
249 If the updated locale is your standard one, then the following will
250 find the updated file:
251
252 "perl -I lib bin/gscan2pdf --log=log --locale=locale"
253
254 If it is not your standard locale, you will need something like (for
255 Russian):
256
257 "LC_ALL=ru_RU.utf8 LC_MESSAGES=ru_RU.utf8 LC_CTYPE=ru_RU.utf8
258 LANG=ru_RU.utf8 LANGUAGE=ru_RU.utf8 perl -I lib bin/gscan2pdf --log=log
259 --locale=locale"
260
261 or German:
262
263 "LC_ALL=de_DE LC_MESSAGES=de_DE LC_CTYPE=de_DE LANG=de_DE
264 LANGUAGE=de_DE perl -I lib bin/gscan2pdf --log=log --locale=locale"
265
If the above doesn't work, make sure the locale is in the list produced
by "locale -a", including any ".utf8" suffix. If necessary, generate
new locales with "sudo dpkg-reconfigure locales".
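
The check against "locale -a" can be scripted. The helper below is a hypothetical sketch, not part of gscan2pdf; it reads the locale list on standard input and requires an exact match, so "de_DE" does not match "de_DE.utf8".

```shell
# has_locale NAME: succeed if NAME appears as a complete line in the
# locale list read from standard input.
has_locale() {
    grep -qxF -- "$1"
}

# Usage:
#   locale -a | has_locale ru_RU.utf8 || sudo dpkg-reconfigure locales
```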
269
271 File
272 New
273
274 Clears the page list.
275
276 Open
277
278 Opens any format that imagemagick supports. PDFs will have their
279 embedded images extracted and imported one per page.
280
281 Note that files can also be imported by dragging them into the
282 thumbnail list from a program like nautilus or konqueror.
283
284 Scan
285
286 Sets options before scanning via SANE.
287
288 Device
289
290 Chooses between available scanners.
291
292 # Pages
293
Selects the number of pages to scan, or all pages.
295
296 Source document
297
Selects between single-sided or double-sided pages.

This affects the page numbering. Single-sided scans are numbered
consecutively. Double-sided scans are incremented (or decremented, see
below) by 2, i.e. 1, 3, 5, etc.
303
304 Side to scan
305
306 If double sided is selected above, assuming a non-duplex scanner, i.e.
307 a scanner that cannot automatically scan both sides of a page, this
308 determines whether the page number is incremented or decremented by 2.
309
310 To scan both sides of three pages, i.e. 6 sides:
311
1. Select:
   # Pages = 3 (or "all" if your scanner can detect when it is out of
   paper)
   Double sided
   Facing side
2. Scans sides 1, 3 & 5.
3. Put pile back with scanner ready to scan back of last page.
4. Select:
   # Pages = 3 (or "all" if your scanner can detect when it is out of
   paper)
   Double sided
   Reverse side
5. Scans sides 6, 4 & 2.
6. gscan2pdf automatically sorts the pages so that they appear in the
   correct order.
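
The numbering used in the two passes can be reproduced with a short shell sketch. The function names are hypothetical and the code only illustrates the arithmetic described above (step of +2 on the facing pass, -2 on the reverse pass); it does not call gscan2pdf.

```shell
# facing_pages N: numbers given to the facing sides of N sheets.
facing_pages() {
    seq 1 2 $((2 * $1 - 1)) | paste -sd' ' -
}

# reverse_pages N: numbers given to the reverse sides, counting down
# because the pile is scanned from the back of the last sheet.
reverse_pages() {
    seq $((2 * $1)) -2 2 | paste -sd' ' -
}

# sorted_pages N: both passes merged and sorted into reading order.
sorted_pages() {
    { facing_pages "$1"; reverse_pages "$1"; } |
        tr ' ' '\n' | sort -n | paste -sd' ' -
}
```

For three sheets, facing_pages 3 gives the order of the first pass, reverse_pages 3 the order of the second, and sorted_pages 3 the final page order.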
333
334 Device-dependent options
335
336 These, naturally, depend on your scanner. They can include
337
338 Page size.
339 Mode (colour/black & white/greyscale)
340 Resolution (in PPI)
341 Batch-scan
342 Guarantees that a "no documents" condition will be returned after
343 the last scanned page, to prevent endless flatbed scans after a
344 batch scan.
345
346 Wait-for-button/Button-wait
347 After sending the scan command, wait until the button on the
348 scanner is pressed before actually starting the scan process.
349
350 Source
351 Selects the document source. Possible options can include Flatbed
352 or ADF. On some scanners, this is the only way of generating an
353 out-of-documents signal.
354
355 Save
356
357 Saves the selected or all pages as a PDF, DjVu, TIFF, PNG, JPEG, PNM or
358 GIF.
359
360 Metadata
361
Metadata is information that is not visible when viewing the
PDF/DjVu, but is embedded in the file, and so is searchable and can be
examined, typically with the "Properties" option of the document
viewer.

The metadata is completely optional, but can also be used to generate
the filename; see Preferences for details.
369
370 The date can be selected with use of the calendar widget. The displayed
371 date can be incremented or decremented with use of the '+' and '-'
372 keys.
373
374 DjVu
375
DjVu compresses both black and white and colour images better than
PDF. See <http://www.djvuzone.org/> for more details.
378
379 Email as PDF
380
381 Attaches the selected or all pages as a PDF to a blank email. This
382 requires xdg-email, which is in the xdg-utils package. If this is not
383 present, the option is ghosted out.
384
385 Print
386
387 Prints the selected or all pages.
388
389 Compress temporary files
390
If your temporary ($TMPDIR) directory is getting full, this function
can be useful: it compresses all images as LZW-compressed TIFFs. These
require much less space than the PNM files that are typically produced
by SANE or by importing a PDF.
395
396 Edit
397 Delete
398
399 Deletes the selected page.
400
401 Renumber
402
403 Renumbers the pages from 1..n.
404
405 Note that the page order can also be changed by drag and drop in the
406 thumbnail view.
407
408 Select
409
The select menus can be used to select all, even, odd, blank, dark or
modified pages. Selecting blank or dark pages runs imagemagick to make
the decision. Selecting modified pages selects those which have been
modified by threshold, unsharp, etc., since the last OCR run was made.
414
415 Properties
416
417 When an image is scanned, gscan2pdf attempts to extract the resolution
418 from the scan options. This nearly always works without problem.
419
Importing an image can be trickier, however. Some image formats such as
PNM do not encode metadata for resolution. In other cases, the data is
incorrect. Edit/Properties allows the user to manually correct the
metadata for a particular page, thus correcting the size of the final
PDF or DjVu. The image itself is otherwise not changed; it is neither
downscaled nor upscaled.
426
427 Preferences
428
429 The preferences menu item allows the control of the default behaviour
430 of various functions. Most of these are self-explanatory.
431
432 Frontends
433
434 gscan2pdf initially supported two frontends, scanimage and scanadf.
435 scanadf support was added when it was realised that scanadf works
436 better than scanimage with some scanners. On Debian-based systems,
437 scanadf is in the sane package, not, like scanimage, in sane-utils. If
438 scanadf is not present, the option is obviously ghosted out.
439
440 In 0.9.27, Perl bindings for SANE were introduced. These are called
441 libsane-perl.
442
443 Before 1.2.0, options available through CLI frontends like scanimage
444 were made visible as users asked for them. In 1.2.0, all options can be
445 shown or hidden via Edit/Preferences, along with the ability to specify
446 which options trigger a reload.
447
In 1.8.3, new Perl bindings for SANE were introduced. These are called
libimage-sane-perl and are the preferred frontend.
450
451 In 1.8.5, support for libsane-perl was removed.
452
453 Device blacklist
454
455 Ignore listed devices.
456
457 Note that this is a device name regular expression, e.g. /dev/video,
458 and not the name as listed in the scan window, e.g. Noname
459 Integrated_Webcam_HD.
460
461 Default filename for PDF or DjVu files
462
463 All strftime codes (e.g. %Y for the current year) are available as
464 variables, with the following additions:
465
466 %Da author
467
468 %De filename extension
469
470 %Dt title
471
472 All document date codes use strftime codes with a leading D, e.g.:
473
474 %DY document year
475
476 %Dm document month
477
478 %Dd document day
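
As an illustration of how such a template expands, here is a shell sketch. It is not gscan2pdf's implementation: the function name expand_name is hypothetical, it handles only %Da, %Dt, %De, %DY, %Dm and %Dd, and it relies on the -d option of GNU date. Plain strftime codes such as %Y are outside its scope; in this sketch they would be expanded against the document date rather than the current date.

```shell
# expand_name TEMPLATE AUTHOR TITLE EXT DOCDATE
# DOCDATE is given as YYYY-MM-DD. Assumes AUTHOR, TITLE and EXT
# contain none of the characters '|', '&', '\' or '%'.
expand_name() {
    # Substitute the gscan2pdf-specific codes, then turn the document
    # date codes %DY, %Dm, %Dd into the strftime codes %Y, %m, %d.
    fmt=$(printf '%s\n' "$1" |
        sed -e "s|%Da|$2|g" -e "s|%Dt|$3|g" -e "s|%De|$4|g" \
            -e 's|%D\([Ymd]\)|%\1|g')
    # Let date(1) expand the remaining codes against the document date.
    date -d "$5" +"$fmt"
}

# Usage:
#   expand_name '%Da-%Dt-%DY%Dm%Dd.%De' jdoe invoice pdf 2022-07-11
```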
479
480 View
481 Zoom 100%
482
483 Zooms to 1:1. How this appears depends on the desktop resolution.
484
485 Zoom to fit
486
487 Scales the view such that all the page is visible.
488
489 Zoom in
490
491 Zoom out
492
493 Rotate 90° clockwise
494
495 The rotate options require the package imagemagick and, if this is not
496 present, are ghosted out.
497
498 Rotate 180°
499
500 Rotate 90° anticlockwise
501
502 Tools
503 Threshold
504
505 Changes all pixels darker than the given value to black; all others
506 become white.
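
The thresholding rule can be shown on bare numbers. The sketch below is illustrative only and is not how gscan2pdf processes images: it assumes 8-bit grey values (0 is black, 255 is white), one per line, and the function name threshold is hypothetical.

```shell
# threshold LEVEL: read grey values on standard input and print 0
# (black) for values darker than LEVEL and 255 (white) for all others.
threshold() {
    awk -v t="$1" '{ print (($1 < t) ? 0 : 255) }'
}

# Usage:
#   printf '10\n128\n200\n' | threshold 128
```

A value exactly equal to the level is not darker than it, so it becomes white.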
507
508 Unsharp mask
509
510 The unsharp option sharpens an image. The image is convolved with a
511 Gaussian operator of the given radius and standard deviation (sigma).
512 For reasonable results, radius should be larger than sigma. Use a
513 radius of 0 to have the method select a suitable radius.
514
515 Crop
516
517 unpaper
518
519 unpaper (see <https://www.flameeyes.eu/projects/unpaper>) is a utility
520 for cleaning up a scan.
521
522 OCR (Optical Character Recognition)
523
524 The gocr, tesseract or cuneiform utilities are used to produce text
525 from an image.
526
There is an OCR output buffer for each page, which is embedded as plain
text behind the scanned image in the PDF produced. This way, Beagle can
index (i.e. search) the plain text.
530
531 In DjVu files, the OCR output buffer is embedded in the hidden text
532 layer. Thus these can also be indexed by Beagle.
533
534 There is an interesting review of OCR software at
535 <https://web.archive.org/web/20080529012847/http://groundstate.ca/ocr>.
536 An important conclusion was that 400ppi is necessary for decent
537 results.
538
539 Up to v2.04, the only way to tell which languages were available to
540 tesseract was to look for the language files. Therefore, gscan2pdf
541 checks the path returned by:
542
543 "tesseract '' '' -l ''"
544
545 If there are no language files in the above location, then gscan2pdf
546 assumes that tesseract v1.0 is installed, which had no language files.
547
548 Variables for user-defined tools
549
550 The following variables are available:
551
552 %i input filename
553
554 %o output filename
555
556 %r resolution
557
558 An image can be modified in-place by just specifying %i.
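
The substitution can be pictured with the following shell sketch. It is an assumption about how the variables are filled in, not gscan2pdf's own code; the function name expand_tool is hypothetical, and the ImageMagick commands are only examples of tools a user might define.

```shell
# expand_tool TEMPLATE INPUT OUTPUT RESOLUTION
# Print TEMPLATE with %i, %o and %r replaced. Assumes the values
# contain none of the characters '|', '&' or '\'.
expand_tool() {
    printf '%s\n' "$1" |
        sed -e "s|%i|$2|g" -e "s|%o|$3|g" -e "s|%r|$4|g"
}

# Usage:
#   expand_tool 'convert %i -negate %o' in.png out.png 300
#   expand_tool 'mogrify -negate %i' page.png unused 300   # in-place
```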
559
561 Why isn't option xyz available in the scan window?
562 Possibly because SANE or your scanner doesn't support it.
563
564 If an option listed in the output of "scanimage --help" that you would
565 like to use isn't available, send me the output and I will look at
566 implementing it.
567
568 I've only got an old flatbed scanner with no automatic sheetfeeder. How do
569 I scan a multipage document?
570 In Edit/Preferences, tick the box "Allow batch scanning from flatbed".
571
572 Some Brother scanners report "out of documents", despite scanning from
573 flatbed. This can be worked around by ticking the box "Force new scan
574 job between pages".
575
576 If you are lucky, you have an option like Wait-for-button or Button-
577 wait, where the scanner will wait for you to press the scan button on
578 the device before it starts the scan, allowing you to scan multiple
579 pages without touching the computer.
580
581 If you are quick, you might be able to change the document on the
582 flatbed whilst the scan head is returning.
583
584 Otherwise, you have to set the number of pages to scan to 1 and hit the
585 scan button on the scan window for each page.
586
587 Why is option xyz ghosted out?
588 Probably because the package required for that option is not installed.
589 Email as PDF requires xdg-email (xdg-utils), unpaper and the rotate
590 options require imagemagick.
591
592 Why can I not scan from the flatbed of my HP scanner?
593 Generally for HP scanners with an ADF, to scan from the flatbed, you
594 should set "# Pages" to "1", and possibly "Batch scan" to "No".
595
596 When I update gscan2pdf using the Update Manager in Ubuntu, why is the list
597 of changes never displayed?
598 As far as I can tell, this is pulled from changelogs.ubuntu.com, and
599 therefore only the changelogs from official Ubuntu builds are
600 displayed.
601
602 Why can gscan2pdf not find my scanner?
603 If your scanner is not connected directly to the machine on which you
604 are running gscan2pdf and you have not installed the SANE daemon,
605 saned, gscan2pdf cannot automatically find it. In this case, you can
606 specify the scanner device on the command line:
607
"gscan2pdf --device <device>"
609
How can I search for text in the OCR layer of the finished PDF or DjVu
file?
pdftotext or djvutxt can extract the text layer from PDF or DjVu files.
See the respective man pages for details.

Having opened a PDF or DjVu file in evince or Acrobat Reader, the
search function will typically find the page with the requested text
and highlight it.
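
The command-line route with pdftotext or djvutxt can be sketched as follows. The wrapper name ocr_grep is hypothetical; it assumes pdftotext (which writes to standard output when the output file is "-") and djvutxt (which writes to standard output when no output file is given) are installed, and it chooses between them by file extension.

```shell
# ocr_grep PATTERN FILE: print the lines of FILE's text layer that
# match PATTERN, with their line numbers in the extracted text.
ocr_grep() {
    case $2 in
        *.pdf)  pdftotext "$2" - | grep -n -- "$1" ;;
        *.djvu) djvutxt "$2" | grep -n -- "$1" ;;
        *)      echo "unsupported file type: $2" >&2
                return 2 ;;
    esac
}

# Usage:
#   ocr_grep invoice scan.pdf
```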
618
There are various tools for searching or indexing files, including PDF
and DjVu:
621
622 • (meta) Tracker (<https://projects.gnome.org/tracker/>)
623
624 • plone (<http://plone.org/>)
625
• pdfgrep (<http://pdfgrep.sourceforge.net/>)
627
628 • swish-e (<http://www.swish-e.org/>)
629
630 • recoll (<http://www.lesbonscomptes.com/recoll/>)
631
632 • terrier (<http://www.lesbonscomptes.com/recoll/>)
633
634 How can I change the colour of the selection box in the image viewer?
635 Create a file called "~/.config/gtk-3.0/gtk.css" with the following
636 content:
637
    .rubberband,
    rubberband,
    flowbox rubberband,
    treeview.view rubberband,
    .content-view rubberband,
    .content-view .rubberband {
      border: 1px solid #2a76c6;
      background-color: rgba(42, 118, 198, 0.2);
    }
646
How can I change the colour of the OCR output?
648 Create a file called "~/.config/gtk-3.0/gtk.css" with the following
649 content:
650
    #gscan2pdf-ocr-output {
      color: black;
    }
654
656 XSane (<http://xsane.org/>)
657
658 Scan Tailor (<http://scantailor.org/>)
659
661 Jeffrey Ratcliffe (jffry at posteo dot net)
662
664 • all the people who have sent patches, translations, bugs and
665 feedback.
666
667 • the gtk+ project for a most excellent graphics toolkit.
668
669 • the Gtk3-Perl project for their superb Perl bindings for GTK3.
670
671 • The SANE project for scanner access
672
673 • Björn Lindqvist for the gtkimageview widget
674
675 • Sourceforge for hosting the project.
676
678 Copyright (C) 2006--2022 Jeffrey Ratcliffe <jffry@posteo.net>
679
680 This program is free software: you can redistribute it and/or modify it
681 under the terms of the version 3 GNU General Public License as
682 published by the Free Software Foundation.
683
684 This program is distributed in the hope that it will be useful, but
685 WITHOUT ANY WARRANTY; without even the implied warranty of
686 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
687 General Public License for more details.
688
689 You should have received a copy of the GNU General Public License along
690 with this program. If not, see <https://www.gnu.org/licenses/>.
691

perl v5.34.1 2022-07-11 GSCAN2PDF(1)