GOCR(1) User's Manual GOCR(1)

gocr - command line text recognition tool

gocr [OPTION] [-i] pnm-file

gocr is an optical character recognition program that can be used from
the command line. It takes input in PNM, PGM, PBM, PPM, or PCX format
and writes recognized text to stdout. If the pnm file is a single
dash, PNM data is read from stdin. If gzip, bzip2 and netpbm-progs are
installed and your system supports popen(3), then pnm.gz, pnm.bz2, png,
jpg, jpeg, tiff, gif, bmp, ps (single pages only) and eps are also
supported as input files (but not as an input stream), where pnm can be
replaced by any of ppm, pgm and pbm.
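
The input rules above can be sketched as a small Python helper. This is
only an illustration: the function name classify_input and its return
labels are hypothetical, and it looks at the file name alone, which is
an assumption and not necessarily how gocr itself decides.

```python
# Formats the man page says gocr reads directly.
NATIVE = {"pnm", "pgm", "pbm", "ppm", "pcx"}
# "pnm can be replaced by one of ppm, pgm and pbm" for compressed input.
PNM_FAMILY = {"pnm", "ppm", "pgm", "pbm"}
# Formats that need gzip, bzip2 or netpbm-progs plus popen(3).
HELPER = {"png", "jpg", "jpeg", "tiff", "gif", "bmp", "ps", "eps"}
COMPRESSION = {"gz", "bz2"}


def classify_input(name: str) -> str:
    """Classify an input argument by name only (illustrative assumption)."""
    if name == "-":
        return "stdin"  # PNM data is read from stdin
    parts = name.lower().split(".")
    ext = parts[-1] if len(parts) > 1 else ""
    if ext in NATIVE:
        return "native"
    if ext in HELPER:
        return "needs-helper"
    # e.g. page.pgm.gz or page.pnm.bz2
    if ext in COMPRESSION and len(parts) > 2 and parts[-2] in PNM_FAMILY:
        return "needs-helper"
    return "unknown"
```

Inputs classified as "needs-helper" are accepted only as files, not as
an input stream.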

-h show usage information

-i file
read input from file (or stdin if file is a single dash)

-o file
send output to file instead of stdout

-e file
send errors to file instead of stderr, or to stdout if file is a
dash

-x file
write progress output to file (file can be a file name, a fifo name or
a file descriptor 1...255); this is useful for GUI developers who want
to show the OCR progress. The file descriptor argument is only
available if gocr was compiled with __USE_POSIX defined

-p path
database path; a final slash must be included, the default is ./db/.
This path will be populated with images of learned characters

-f format
output format of the recognized text (ISO8859_1 TeX HTML XML
UTF8 ASCII); XML output also includes position and probability data

-l level
set grey level to level (0 < level <= 255, e.g. 160; default: 0 for
autodetect); darker pixels belong to characters, brighter pixels are
interpreted as background of the input image

-d size
set dust size in pixels (clusters smaller than this are
removed); 0 means no clusters are removed, the default is -1 for
auto detection

-s num set space width between words in units of dots (default: 0 for
autodetect); wider gaps are interpreted as word spaces,
narrower ones as character spaces

-v verbosity
be verbose to stderr; verbosity is a bitfield

-c string
restrict verbose output to the characters in string; more output
is written to stderr for every character within the string, and the
underscore stands for unknown chars. This option is useful to
limit debug information to what is necessary

-C string
only recognise characters from string; this filter is useful
when only part of the character alphabet is of interest

-a certainty
set the certainty threshold for recognition (0..100; default: 95);
characters with a higher certainty are accepted, characters with
a lower certainty are treated as unknown (not recognized). Set
higher values if you want to accept only characters recognized with
greater certainty

-m mode
set operational mode; mode is a bitfield (default: 0)

-n bool
if bool is non-zero, only recognise numbers (this is now obsolete,
use -C "0123456789")
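
A script that calls gocr may want to check option values before
starting the program. The following Python sketch assembles an argument
list from a few of the options above and rejects values outside the
documented ranges. The function name build_gocr_argv and its parameter
names are hypothetical; only the flags themselves come from this page.

```python
# Output formats listed for -f.
FORMATS = ("ISO8859_1", "TeX", "HTML", "XML", "UTF8", "ASCII")


def build_gocr_argv(image, output=None, fmt=None, grey_level=None,
                    certainty=None, charset=None):
    """Return an argv list for gocr (hypothetical helper, not part of gocr)."""
    argv = ["gocr"]
    if output is not None:
        argv += ["-o", output]           # output file instead of stdout
    if fmt is not None:
        if fmt not in FORMATS:
            raise ValueError("unknown output format: %r" % (fmt,))
        argv += ["-f", fmt]
    if grey_level is not None:
        if not 0 <= grey_level <= 255:   # 0 means autodetect
            raise ValueError("grey level must be in 0..255")
        argv += ["-l", str(grey_level)]
    if certainty is not None:
        if not 0 <= certainty <= 100:
            raise ValueError("certainty must be in 0..100")
        argv += ["-a", str(certainty)]
    if charset is not None:
        argv += ["-C", charset]          # only recognise these characters
    argv += ["-i", image]
    return argv
```

For example, build_gocr_argv("text1.pbm", charset="0123456789") yields
the replacement for the obsolete -n option. The resulting list can be
passed to subprocess.run on a system where gocr is installed.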

The verbosity is specified as a bitfield:

1 print more info

2 list shapes of boxes (see -c) to stderr

4 list pattern of boxes (see -c) to stderr

8 print pattern after recognition for debugging

16 print debug information about recognition of lines to stderr

32 create outXX.png with boxes and lines marked on each general
OCR-step
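
Because the verbosity is a bitfield, the value given to -v is the sum
of the wanted entries. A minimal Python sketch of that arithmetic
follows; the member names are made up for illustration, only the
numeric values come from the list above.

```python
from enum import IntFlag


class Verbosity(IntFlag):
    # Names are illustrative; values are taken from the list above.
    MORE_INFO = 1
    BOX_SHAPES = 2
    BOX_PATTERNS = 4
    PATTERN_AFTER_RECOGNITION = 8
    LINE_DEBUG = 16
    STEP_IMAGES = 32


def verbosity_value(*flags):
    """Combine flags into the integer passed to -v."""
    total = Verbosity(0)
    for flag in flags:
        total |= flag
    return int(total)


def verbosity_names(value):
    """List the names of the flags set in an integer -v value."""
    return [flag.name for flag in Verbosity if flag in Verbosity(value)]
```

With these definitions, combining MORE_INFO and STEP_IMAGES gives 33,
and combining the first three entries gives 7.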

The operation modes are:

2 use database to recognize characters which are not recognized
by other algorithms (early development)

4 switch on layout analysis or zoning (development)

8 don't compare unrecognized characters to recognized ones

16 don't try to divide overlapping characters into two or three
single characters

32 don't do context correction

64 character packing: before recognition starts, similar characters
are searched for and only one of these characters will be
sent to the recognition engine (development)

130 extend database: prompts the user for unidentified characters and
extends the database with the user's answer (128+2, early development)

256 switch off the recognition engine (makes sense together with
-m 2)
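
The mode values combine in the same way as the verbosity values, which
is how 130 arises from 128+2. The Python sketch below shows the
arithmetic. The member names are made up for illustration, the value 1
is left out because it is not listed above, and passing one combined
number to -m is an assumption based on mode being described as a
bitfield.

```python
from enum import IntFlag


class Mode(IntFlag):
    # Names are illustrative; values are taken from the list above.
    USE_DATABASE = 2
    LAYOUT_ANALYSIS = 4
    NO_COMPARE_UNRECOGNIZED = 8
    NO_DIVIDE_OVERLAPPING = 16
    NO_CONTEXT_CORRECTION = 32
    CHAR_PACKING = 64
    EXTEND_DATABASE = 128
    NO_RECOGNITION_ENGINE = 256


def mode_value(*flags):
    """Combine flags into a single integer for -m."""
    total = Mode(0)
    for flag in flags:
        total |= flag
    return int(total)


# Extending the database implies using it: 128 + 2.
extend = mode_value(Mode.EXTEND_DATABASE, Mode.USE_DATABASE)
# Database lookup with the recognition engine switched off: 256 + 2.
database_only = mode_value(Mode.NO_RECOGNITION_ENGINE, Mode.USE_DATABASE)
```

Here extend is 130 and database_only is 258.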

Joerg Schulenburg (see http://jocr.sourceforge.net/ for EMAIL)
First version of man page by Tim Waugh <twaugh@redhat.com>

This man page documents gocr, version 0.41.

Report bugs to Joerg Schulenburg

More details can be found at /usr/share/doc/gocr-X.XX/gocr.html. Also
read /usr/share/doc/gocr-X.XX/README to learn how to improve results.

gocr -v 33 text1.pbm
output verbose information; out30.png is created to show details
of the recognition process

gocr -v 7 -c _YV text1.pbm
verbose output for unknown chars and chars Y and V

djpeg -pnm -gray text.jpg | gocr -
convert a jpeg file to pnm format and input it via pipe
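
The last pipeline can also be driven from a script. The following
Python sketch is one possible way to do it, assuming djpeg and gocr are
both on the PATH; the function names pipeline_commands and
recognize_jpeg are hypothetical. The recognized text is returned as raw
bytes because the encoding depends on the chosen output format.

```python
import subprocess


def pipeline_commands(jpeg_path):
    """Return the two argv lists of the djpeg | gocr pipeline."""
    return ["djpeg", "-pnm", "-gray", jpeg_path], ["gocr", "-"]


def recognize_jpeg(jpeg_path):
    """Run the pipeline and return gocr's stdout as bytes."""
    convert_cmd, ocr_cmd = pipeline_commands(jpeg_path)
    # djpeg writes PNM data to a pipe that gocr reads as stdin ("-").
    with subprocess.Popen(convert_cmd, stdout=subprocess.PIPE) as convert:
        ocr = subprocess.run(ocr_cmd, stdin=convert.stdout,
                             stdout=subprocess.PIPE, check=True)
    # Leaving the with-block waits for djpeg, so returncode is set here.
    if convert.returncode != 0:
        raise subprocess.CalledProcessError(convert.returncode, convert_cmd)
    return ocr.stdout
```

Calling recognize_jpeg("text.jpg") corresponds to the shell pipeline
shown above.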

Linux 20 Aug 2006 GOCR(1)