GOCR(1)                          User's Manual                         GOCR(1)


NAME
       gocr - command line text recognition tool

SYNOPSIS
       gocr [OPTION] [-i] pnm-file

DESCRIPTION
       gocr is an optical character recognition program that can be used
       from the command line. It takes input in PNM, PGM, PBM, PPM, or PCX
       format and writes the recognized text to stdout. If pnm-file is a
       single dash, PNM data is read from stdin. If gzip, bzip2, and
       netpbm-progs are installed and your system supports popen(3),
       pnm.gz, pnm.bz2, png, jpg, jpeg, tiff, gif, bmp, ps (single pages
       only), and eps files are also supported, where pnm can be replaced
       by ppm, pgm, or pbm. These formats are accepted only as input
       files, not as an input stream.
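
       The popen(3)-based conversion can be pictured as choosing a helper
       command from the file suffix and reading its PNM output. The
       Python sketch below is a hypothetical illustration: the function
       name conversion_command and the netpbm converter names (pngtopnm,
       jpegtopnm, tifftopnm, giftopnm, bmptopnm, pstopnm) are assumptions,
       and the commands gocr actually runs may differ.

```python
import os
import shlex

# Hypothetical mapping to converters that write PNM to stdout.
# The commands gocr really runs through popen(3) may differ.
PNM_FAMILY = {".pnm", ".ppm", ".pgm", ".pbm"}
NATIVE = PNM_FAMILY | {".pcx"}
DECOMPRESSORS = {".gz": "gzip -cd", ".bz2": "bzip2 -cd"}
CONVERTERS = {
    ".png": "pngtopnm",
    ".jpg": "jpegtopnm",
    ".jpeg": "jpegtopnm",
    ".tiff": "tifftopnm",
    ".gif": "giftopnm",
    ".bmp": "bmptopnm",
    ".ps": "pstopnm -stdout",   # single pages only
    ".eps": "pstopnm -stdout",
}


def conversion_command(path):
    """Return a shell command that writes path as PNM to stdout,
    or None if the file can be read directly."""
    root, ext = os.path.splitext(path.lower())
    if ext in NATIVE:
        return None
    if ext in DECOMPRESSORS:
        # Only compressed PNM variants such as pbm.gz are accepted.
        if os.path.splitext(root)[1] not in PNM_FAMILY:
            raise ValueError(f"unsupported compressed input: {path}")
        return f"{DECOMPRESSORS[ext]} {shlex.quote(path)}"
    if ext in CONVERTERS:
        return f"{CONVERTERS[ext]} {shlex.quote(path)}"
    raise ValueError(f"unsupported input: {path}")
```

       For example, conversion_command("scan.pbm.gz") returns
       "gzip -cd scan.pbm.gz", whose output could be read with
       subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE). In this
       sketch the converter is chosen from the file name, which stdin
       does not have.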

OPTIONS
       -h     show usage information

       -V     show version information

       -i file
              read input from file (or from stdin if file is a single dash)

       -o file
              send output to file instead of stdout

       -e file
              send errors to file instead of stderr, or to stdout if file
              is a single dash

       -x file
              write progress output to file, which can be a file name, a
              FIFO name, or a file descriptor (1...255). This is useful
              for GUI developers who want to show the OCR progress. The
              file descriptor form is only available if gocr was compiled
              with __USE_POSIX defined.

       -p path
              database path; a trailing slash must be included (default:
              ./db/). This path will be populated with images of learned
              characters.

       -f format
              output format of the recognized text (ISO8859_1, TeX, HTML,
              XML, UTF8, or ASCII); XML also outputs position and
              probability data

       -l level
              set the grey level threshold to level (0 < level <= 255, for
              example 160; default: 0 for autodetection). Darker pixels
              belong to characters; brighter pixels are interpreted as the
              background of the input image.

       -d size
              set the dust size in pixels (clusters smaller than this are
              removed); 0 means no clusters are removed, and the default
              is -1 for autodetection

       -s num set the space width between words in units of dots
              (default: 0 for autodetection); wider gaps are interpreted
              as word spaces, smaller ones as character spaces

63
64 -v verbosity
65 be verbose to stderr; verbosity is a bitfield
66
67 -c string
68 only verbose output of characters from string to stderr, more
69 output is generated for all characters within the string, the
70 underscore stands for unknown chars, this function is usefull to
71 limit debug information to the necessary one
72
73 -C string
74 only recognise characters from string, this is a filter function
75 in cases where the interest is only to a part of the character
76 alphabet, you can use 0-9 or a-z to specify ranges, use -- to
77 detect the minus sign
78
       -a certainty
              set the certainty threshold for recognition (0..100;
              default: 95). Characters with a higher certainty are
              accepted; characters with a lower certainty are treated as
              unknown (not recognized). Set a higher value if you want
              only characters that were recognized with more certainty.

       -u string
              output this string for every unrecognized character
              (default: "_")

       -m mode
              set the operational mode; mode is a bitfield (default: 0)

       -n bool
              if bool is non-zero, only recognize numbers (this is now
              obsolete; use -C "0123456789")
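
       The effect of the -l and -d options on the input image can be
       sketched in Python. This is a minimal illustration, not gocr's
       implementation: the function names binarize and remove_dust are
       hypothetical, autodetection (-l 0, -d -1) is not modelled, a pixel
       exactly equal to level is assumed to be background, and clusters
       are assumed to be 8-connected.

```python
from collections import deque


def binarize(grey, level):
    """Return a mask where True marks a character pixel (-l level).

    Assumption: pixels strictly darker than level belong to characters;
    autodetection (level 0) is not modelled in this sketch."""
    if not 0 < level <= 255:
        raise ValueError("level must satisfy 0 < level <= 255 here")
    return [[pixel < level for pixel in row] for row in grey]


def remove_dust(mask, size):
    """Clear 8-connected clusters with fewer than size pixels (-d size).

    size 0 keeps every cluster; -1 (autodetection) is not modelled."""
    if size < 0:
        raise ValueError("dust autodetection is not modelled here")
    height = len(mask)
    width = len(mask[0]) if height else 0
    cleaned = [row[:] for row in mask]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first search over the 8 neighbours collects one cluster.
            cluster = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                cluster.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Clusters below the dust size are treated as noise.
            if len(cluster) < size:
                for cy, cx in cluster:
                    cleaned[cy][cx] = False
    return cleaned
```

       With level 160 and dust size 2, a lone dark pixel is removed while
       a 2x2 block of dark pixels survives.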
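
       How -s, -a, and -u interact when a line of text is assembled can
       be illustrated with a small Python sketch. The Glyph record and the
       assemble_line function are hypothetical. The sketch assumes that
       the recognizer has already produced a best guess, a certainty from
       0 to 100, and the horizontal box extent in dots for each character.
       It also assumes that a certainty equal to the threshold is accepted
       and that a gap equal to the space width is a character space.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical recognizer result for one character box."""
    char: str        # best guess
    certainty: int   # 0..100
    left: int        # leftmost column of the box, in dots
    right: int       # rightmost column of the box, in dots


def assemble_line(glyphs, spacewidth, certainty=95, unknown="_"):
    """Join left-to-right glyphs into text, mimicking -s, -a and -u.

    Assumptions: a gap (left minus previous right) wider than spacewidth
    is a word space, a certainty equal to the threshold is accepted, and
    autodetection (-s 0) is not modelled."""
    if spacewidth <= 0:
        raise ValueError("space width autodetection is not modelled here")
    parts = []
    previous = None
    for glyph in glyphs:
        # Wide gaps become word spaces, narrow ones are dropped.
        if previous is not None and glyph.left - previous.right > spacewidth:
            parts.append(" ")
        # Low-certainty guesses are replaced by the -u string.
        parts.append(glyph.char if glyph.certainty >= certainty else unknown)
        previous = glyph
    return "".join(parts)
```

       For glyphs H (99), i (97), y (60), and o (98) with a 13-dot gap
       before y and a space width of 6, the default settings yield
       "Hi _o"; lowering the threshold to 50 yields "Hi yo".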
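
       A filter string for -C can be expanded into an explicit character
       set as follows. This Python sketch is a hypothetical reading of the
       rules stated for -C (x-y as an inclusive range, -- as a literal
       minus sign, every other character standing for itself); the name
       expand_charset is illustrative, and gocr's own parser may treat
       edge cases differently.

```python
def expand_charset(spec):
    """Expand a -C style filter string into a set of characters.

    Assumed rules: "--" is a literal minus sign, "a-z" is an inclusive
    range, and any other character stands for itself."""
    allowed = set()
    i = 0
    while i < len(spec):
        if spec.startswith("--", i):
            allowed.add("-")
            i += 2
        elif i + 2 < len(spec) and spec[i + 1] == "-":
            start, end = spec[i], spec[i + 2]
            if ord(start) > ord(end):
                raise ValueError(f"reversed range {start}-{end}")
            allowed.update(chr(c) for c in range(ord(start), ord(end) + 1))
            i += 3
        else:
            allowed.add(spec[i])
            i += 1
    return allowed
```

       Under these rules, expand_charset("0-9") gives the same set as
       "0123456789", the replacement suggested for the obsolete -n option.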

       The verbosity is specified as a bitfield:

       1      print more information

       2      list shapes of boxes (see -c) to stderr

       4      list patterns of boxes (see -c) to stderr

       8      print patterns after recognition for debugging

       16     print debug information about the recognition of lines to
              stderr

       32     create outXX.png with boxes and lines marked at each
              general OCR step
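
       Because verbosity is a bitfield, a value such as 33 (used in the
       examples at the end of this page) is the sum 1 + 32, and 7 is
       1 + 2 + 4. The Python sketch decodes a -v value into the meanings
       listed above; decode_verbosity is a hypothetical helper, and its
       short labels paraphrase the list.

```python
# Bit values from the list above, with shortened descriptions.
VERBOSITY_BITS = {
    1: "more info",
    2: "box shapes",
    4: "box patterns",
    8: "patterns after recognition",
    16: "line recognition debug",
    32: "outXX.png images",
}


def decode_verbosity(value):
    """Return the descriptions of the bits set in a -v value, lowest first."""
    # Reject negative values and bits above 32 that the list does not cover.
    if value < 0 or value & ~sum(VERBOSITY_BITS):
        raise ValueError(f"undocumented bits in -v {value}")
    return [text for bit, text in VERBOSITY_BITS.items() if value & bit]


if __name__ == "__main__":
    for v in (33, 7):
        print(v, decode_verbosity(v))
```

       Here decode_verbosity(33) returns ['more info', 'outXX.png images'].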

       The operation modes are:

       2      use the database to recognize characters that are not
              recognized by other algorithms (early development)

       4      switch on layout analysis or zoning (development)

       8      don't compare unrecognized characters to recognized ones

       16     don't try to divide overlapping characters into two or
              three single characters

       32     don't do context correction

       64     character packing: before recognition starts, similar
              characters are searched for and only one of these
              characters is sent to the recognition engine (development)

       130    extend the database: prompt the user for unidentified
              characters and extend the database with the user's answers
              (128+2, early development)

       256    switch off the recognition engine (makes sense together
              with -m 2)
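
       Modes are combined by adding (or OR-ing) their bit values; for
       instance, recognition with the database only, as suggested for
       256, is -m 258 (256 + 2). The Python sketch uses an IntFlag whose
       values come from the list above but whose member names are
       hypothetical. Bit 128 appears only as part of 130, so treating it
       as a separate "prompt the user" bit is an assumption.

```python
from enum import IntFlag


class Mode(IntFlag):
    """-m bits; the values come from the list above, the names are
    hypothetical and not identifiers used inside gocr."""
    USE_DATABASE = 2
    LAYOUT_ANALYSIS = 4
    NO_COMPARE_UNRECOGNIZED = 8
    NO_DIVIDE_OVERLAPPING = 16
    NO_CONTEXT_CORRECTION = 32
    CHARACTER_PACKING = 64
    PROMPT_USER = 128        # assumption: only documented as part of 130
    NO_RECOGNITION_ENGINE = 256


# "extend database" is documented as 130 = 128 + 2.
EXTEND_DATABASE = Mode.PROMPT_USER | Mode.USE_DATABASE


def mode_argument(*flags):
    """Combine flags into the integer to pass as -m."""
    combined = Mode(0)
    for flag in flags:
        combined |= flag
    return int(combined)
```

       Here mode_argument(Mode.USE_DATABASE, Mode.NO_RECOGNITION_ENGINE)
       returns 258, the value for "gocr -m 258 file.pbm".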

AUTHOR
       Joerg Schulenburg (see http://www-e.uni-magdeburg.de/jschulen/ocr/
       for the e-mail address)
       First version of this man page by Tim Waugh <twaugh@redhat.com>

VERSION
       This man page documents gocr, version 0.52.

BUGS
       Report bugs to Joerg Schulenburg.

SEE ALSO
       More details can be found in /usr/share/doc/gocr-X.XX/gocr.html.
       Also read /usr/share/doc/gocr-X.XX/README to learn how to improve
       the results.

EXAMPLES
       gocr -v 33 text1.pbm
              output verbose information; out30.png is created to show
              details of the recognition process

       gocr -v 7 -c _YV text1.pbm
              verbose output for unknown characters and for the
              characters Y and V

       djpeg -pnm -gray text.jpg | gocr -
              convert a JPEG file to PNM format and pass it to gocr via a
              pipe


Linux                            20 Sep 2018                         GOCR(1)