GOCR(1)                         User's Manual                         GOCR(1)

NAME
       gocr - command line text recognition tool

SYNOPSIS
       gocr [OPTION] [-i] pnm-file

DESCRIPTION
       gocr is an optical character recognition (OCR) program for the
       command line. It reads an image in PNM (PBM, PGM, PPM) or PCX
       format and writes the recognized text to stdout. If pnm-file is a
       single dash, PNM data is read from stdin. If gzip, bzip2, and
       netpbm-progs are installed and your system supports popen(3), the
       formats pnm.gz, pnm.bz2, png, jpg, jpeg, tiff, gif, bmp, ps (single
       pages only), and eps are also accepted as input files (but not as
       an input stream), where pnm stands for any of ppm, pgm, and pbm.
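
       The two input modes described above (a named file, or a single dash
       for stdin) can be sketched as follows. This is a minimal
       illustration, not an official example: the file names sample.pbm
       and sample.txt are hypothetical, and the OCR calls are skipped when
       gocr is not installed. The tiny test image is far too small to
       contain real text, so no particular output should be expected.

```shell
#!/bin/sh
# Sketch only: sample.pbm and sample.txt are hypothetical file names.

# Write a tiny plain PBM (P1) image: 3x3 pixels, one black pixel in the centre.
printf 'P1\n3 3\n0 0 0\n0 1 0\n0 0 0\n' > sample.pbm

if command -v gocr >/dev/null 2>&1; then
    # Read the image from a named file; recognized text goes to stdout.
    gocr -i sample.pbm

    # A single dash makes gocr read PNM data from stdin;
    # -o sends the recognized text to a file instead of stdout.
    gocr -o sample.txt -i - < sample.pbm
else
    echo "gocr is not installed; skipping OCR calls" >&2
fi
```

       Compressed or non-PNM inputs (for example png) are only accepted as
       named files, so the stdin form applies to PNM data only.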

OPTIONS
       -h     show usage information

       -V     show version information

       -i file
              read input from file (or stdin if file is a single dash)

       -o file
              send output to file instead of stdout

       -e file
              send errors to file instead of stderr, or to stdout if file
              is a dash

       -x file
              write progress output to file (file can be a file name, a
              FIFO name, or a file descriptor 1...255); this is useful for
              GUI developers who want to show the OCR progress. The file
              descriptor argument is only available if gocr was compiled
              with __USE_POSIX defined

       -p path
              database path; a final slash must be included; the default
              is ./db/. This path will be populated with images of learned
              characters

       -f format
              output format of the recognized text (ISO8859_1, TeX, HTML,
              XML, UTF8, ASCII); XML output also includes position and
              probability data

       -l level
              set grey level to level (0 < level <= 255, e.g. 160;
              default: 0 for autodetect); darker pixels belong to
              characters, brighter pixels are interpreted as background of
              the input image

       -d size
              set dust size in pixels (clusters smaller than this are
              removed); 0 means no clusters are removed; the default is -1
              for autodetection

       -s num set the space width between words in units of dots (default:
              0 for autodetect); wider gaps are interpreted as word
              spaces, narrower gaps as character spaces

       -v verbosity
              be verbose to stderr; verbosity is a bitfield

       -c string
              restrict verbose output on stderr to the characters in
              string; more output is generated for every character within
              the string. The underscore stands for unknown characters.
              This is useful for limiting debug information to what is
              needed

       -C string
              only recognise characters from string; this is a filter
              function for cases where only a part of the character
              alphabet is of interest, you