GOCR(1)                          User's Manual                         GOCR(1)

NAME

       gocr - command line text recognition tool

SYNOPSIS

       gocr [OPTION] [-i] pnm-file

DESCRIPTION

       gocr is an optical character recognition program that can be used from
       the command line. It takes input in PNM, PGM, PBM, PPM, or PCX format
       and writes the recognized text to stdout. If the pnm-file argument is a
       single dash, PNM data is read from stdin. If gzip, bzip2 and
       netpbm-progs are installed and your system supports popen(3), gocr also
       accepts pnm.gz, pnm.bz2, png, jpg, jpeg, tiff, gif, bmp, ps (single
       pages only) and eps, where pnm can be replaced by one of ppm, pgm and
       pbm. These formats are supported only as input files, not as an input
       stream.
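
       Because the compressed and non-PNM formats listed above are accepted
       only as input files, one way to send such an image through stdin is to
       convert it to PNM first. The POSIX shell sketch below assumes that the
       gzip, bzip2 and netpbm converters it names are installed; pnm_converter
       is a hypothetical helper, and the converters it picks are not
       necessarily the ones gocr runs internally. ps and eps are left out of
       this sketch.

```shell
#!/bin/sh
# Hypothetical helper, not part of gocr: print the external command that
# writes a PNM version of the given file to stdout. Assumes gzip, bzip2
# and the netpbm converters named below are installed.
pnm_converter() {
    case "$1" in
        *.gz)                    echo "gzip -dc" ;;   # pnm.gz, pgm.gz, ...
        *.bz2)                   echo "bzip2 -dc" ;;  # pnm.bz2, pgm.bz2, ...
        *.png)                   echo "pngtopnm" ;;
        *.jpg|*.jpeg)            echo "jpegtopnm" ;;
        *.tif|*.tiff)            echo "tifftopnm" ;;
        *.gif)                   echo "giftopnm" ;;
        *.bmp)                   echo "bmptopnm" ;;
        *.pnm|*.pbm|*.pgm|*.ppm) echo "cat" ;;        # already PNM
        *)                       return 1 ;;          # not covered here
    esac
}

# Usage (not run here); $conv is unquoted on purpose so that "gzip -dc"
# splits into a command and its option:
#   conv=$(pnm_converter scan.png) && $conv scan.png | gocr -
```

       With scan.png this should behave like "pngtopnm scan.png | gocr -",
       the same pattern as the djpeg example under EXAMPLES.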

OPTIONS

       -h     show usage information

       -i file
              read input from file (or from stdin if file is a single dash)

       -o file
              send output to file instead of stdout

       -e file
              send errors to file instead of stderr, or to stdout if file is
              a single dash

       -x file
              write progress information to file (file can be a file name, a
              FIFO name or a file descriptor 1...255); this is useful for GUI
              developers who want to show the OCR progress. The file
              descriptor form is only available if gocr was compiled with
              __USE_POSIX defined

       -p path
              database path; a final slash must be included (default:
              ./db/). This directory is populated with images of learned
              characters

       -f format
              output format of the recognized text (ISO8859_1, TeX, HTML,
              XML, UTF8 or ASCII); XML output also includes position and
              probability data

       -l level
              set the grey level threshold to level (0 < level <= 255, for
              example 160; default: 0 for autodetection); darker pixels
              belong to characters, brighter pixels are interpreted as the
              background of the input image

       -d size
              set the dust size in pixels (clusters smaller than this are
              removed); 0 means no clusters are removed, and the default of
              -1 means autodetection

       -s num set the space width between words in units of dots (default:
              0 for autodetection); wider gaps are interpreted as word
              spaces, narrower ones as spaces between characters

       -v verbosity
              write verbose output to stderr; verbosity is a bitfield (see
              below)

       -c string
              restrict verbose output on stderr to the characters in string;
              more output is generated for every character in the string.
              The underscore stands for unknown characters. This is useful
              to limit debug information to what is actually needed

       -C string
              only recognise characters from string. This filter is useful
              when only part of the character alphabet is of interest;
              ranges such as 0-9 or a-z can be used, and -- is used to
              recognise the minus sign

       -a certainty
              set the certainty threshold for recognition (0..100; default:
              95); characters with a higher certainty are accepted,
              characters with a lower certainty are treated as unknown (not
              recognized). Set a higher value if you want only characters
              that were recognized with greater certainty

       -u string
              output this string for every unrecognized character (default:
              "_")

       -m mode
              set the operational mode; mode is a bitfield (default: 0, see
              below)

       -n bool
              if bool is non-zero, only recognise numbers (this is now
              obsolete; use -C "0123456789" instead)
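
       How -a and -u work together can be pictured with the conceptual model
       below: every character whose certainty falls below the threshold is
       replaced by the unknown string. This POSIX shell sketch is not gocr's
       implementation; apply_certainty and its "<char> <certainty>" input
       format are hypothetical, and accepting a certainty exactly equal to
       the threshold is an assumption, since the description above only
       covers higher and lower values.

```shell
#!/bin/sh
# Conceptual model only, not gocr's code. Reads lines "<char> <certainty>"
# from stdin and prints each accepted character, or the unknown string for
# characters below the threshold.
# Assumption: a certainty equal to the threshold counts as accepted.
apply_certainty() {
    threshold=${1:-95}   # like -a (default 95)
    unknown=${2-_}       # like -u (default "_"); an empty string is allowed
    while read -r ch cert; do
        if [ "$cert" -ge "$threshold" ]; then
            printf '%s' "$ch"
        else
            printf '%s' "$unknown"
        fi
    done
}

# Example: printf 'A 99\nB 40\nC 97\n' | apply_certainty 95 '?'   # prints A?C
```

       Raising the threshold in this model turns more characters into the
       unknown string, which matches the advice given for -a above.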

       The verbosity is specified as a bitfield:

       1         print more information

       2         list shapes of boxes (see -c) to stderr

       4         list patterns of boxes (see -c) to stderr

       8         print patterns after recognition for debugging

       16        print debug information about the recognition of lines to
                 stderr

       32        create outXX.png with boxes and lines marked at each general
                 OCR step
110       The operation modes are:
111
112       2         use database to recognize characters which are not recognized
113                 by other algorithms, (early development)
114
115       4         switching on layout analysis or zoning (development)
116
117       8         don't compare unrecognized characters to recognized one
118
119       16        don't try to divide overlapping characters to  two  or  three
120                 single characters
121
122       32        don't do context correction
123
124       64        character packing, before recognition starts, similar charac‐
125                 ters are searched and only one of  this  characters  will  be
126                 send to the recognition engine (development)
127
128       130       extend database, prompts user for unidentified characters and
129                 extends the database with users answer (128+2, early develop‐
130                 ment)
131
132       256       switch  off the recognition engine (makes sense together with
133                 -m 2)
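
       Mode values combine the same way as verbosity values. 130 is
       documented as 128 + 2, and 256 is described as useful together with
       mode 2, which as a bitfield would be -m 258 (an inference from the
       bitfield description, not a documented example). The POSIX shell
       sketch below, using the hypothetical helper name mode_bits, splits a
       mode value into its set bits so you can check which of the modes
       above it selects.

```shell
#!/bin/sh
# Hypothetical helper, not part of gocr: print, one per line, each power
# of two that is set in the given -m value.
mode_bits() {
    value=$1
    bit=1
    while [ "$bit" -le "$value" ]; do
        if [ $(( value & bit )) -ne 0 ]; then
            echo "$bit"
        fi
        bit=$(( bit * 2 ))   # next power of two
    done
}

# mode_bits 130   prints 2 and 128 (database use plus database extension)
# mode_bits 258   prints 2 and 256 (database only, recognition engine off)
```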

AUTHOR

       Joerg Schulenburg (see http://jocr.sourceforge.net/ for the e-mail
       address)
       First version of the man page by Tim Waugh <twaugh@redhat.com>

VERSION INFORMATION

       This man page documents gocr, version 0.41.

REPORTING BUGS

       Report bugs to Joerg Schulenburg

SEE ALSO

       More details can be found in /usr/share/doc/gocr-X.XX/gocr.html. Also
       read /usr/share/doc/gocr-X.XX/README to learn how to improve results.

EXAMPLES

       gocr -v 33 text1.pbm
              output verbose information; out30.png is created to show
              details of the recognition process

       gocr -v 7 -c _YV text1.pbm
              verbose output for unknown characters and for the characters
              Y and V

       djpeg -pnm -gray text.jpg | gocr -
              convert a JPEG file to PNM format and pass it to gocr through
              a pipe

Linux                             29 Mar 2009                          GOCR(1)