gocr(1) - f36

GOCR(1)                          User's Manual                         GOCR(1)

NAME

       gocr - command line text recognition tool

SYNOPSIS

       gocr [OPTION] [-i] pnm-file

DESCRIPTION

       gocr is an optical character recognition program that can be used from
       the command line.  It takes input in PNM, PGM, PBM, PPM, or PCX format
       and writes the recognized text to stdout.  If the pnm file is a single
       dash, PNM data is read from stdin.  If gzip, bzip2 and netpbm-progs are
       installed and your system supports popen(3), then pnm.gz, pnm.bz2, png,
       jpg, jpeg, tiff, gif, bmp, ps (single pages only) and eps are also
       supported as input files (but not as an input stream), where pnm can be
       replaced by any of ppm, pgm and pbm.

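       When the built-in conversion is not available (for example without
       popen(3) support), the image can be decoded by hand and piped in as
       PNM data.  The following is a minimal sketch for a POSIX shell, not
       part of gocr: pnm_decoder is a hypothetical helper name, page.pnm.gz
       is a placeholder file, and pngtopnm (netpbm) and djpeg (libjpeg) are
       assumed to be installed.

```shell
# pnm_decoder FILE: print a command that writes FILE as PNM data to stdout.
# Hypothetical helper for illustration; it is not part of gocr.
pnm_decoder() {
    case "$1" in
        *.gz)         echo "gzip -dc" ;;
        *.bz2)        echo "bzip2 -dc" ;;
        *.png)        echo "pngtopnm" ;;
        *.jpg|*.jpeg) echo "djpeg -pnm -gray" ;;
        *)            echo "cat" ;;   # assume the file is already PNM/PGM/PBM/PPM
    esac
}

# Example: decode page.pnm.gz (placeholder name) and feed gocr via stdin.
file=page.pnm.gz
decoder=$(pnm_decoder "$file")
echo "$decoder $file | gocr -"

# Run the pipeline only if gocr and the input file are actually present.
if command -v gocr >/dev/null 2>&1 && [ -f "$file" ]; then
    $decoder "$file" | gocr -
fi
```

       The decoder is chosen by file extension only, so a file with a
       misleading name will be passed to the wrong tool.
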

OPTIONS

       -h     show usage information

       -V     show version information

       -i file
              read input from file (or from stdin if file is a single dash)

       -o file
              send output to file instead of stdout

       -e file
              send errors to file instead of stderr, or to stdout if file is a
              dash

       -x file
              send progress output to file (file can be a file name, a fifo
              name or a file descriptor 1...255); this is useful for GUI
              developers who want to show the OCR progress.  The file
              descriptor argument is only available if gocr was compiled with
              __USE_POSIX defined

       -p path
              database path; a final slash must be included.  The default is
              ./db/.  This path will be populated with images of learned
              characters

       -f format
              output format of the recognized text (ISO8859_1, TeX, HTML, XML,
              UTF8 or ASCII); XML also outputs position and probability data

       -l level
              set grey level to level (1..255, for example 160; default: 0 for
              autodetect); darker pixels belong to characters, brighter pixels
              are interpreted as background of the input image

       -d size
              set dust size in pixels (clusters smaller than this are
              removed); 0 means no clusters are removed, the default is -1 for
              auto detection

       -s num set space width between words in units of dots (default: 0 for
              autodetect); wider gaps are interpreted as word spaces, narrower
              gaps as character spaces

       -v verbosity
              be verbose to stderr; verbosity is a bitfield

       -c string
              restrict verbose output on stderr to the characters in string;
              more output is generated for all characters within the string.
              The underscore stands for unknown chars.  This option is useful
              for limiting debug information to what is needed

       -C string
              only recognise characters from string; this is a filter for
              cases where only part of the character alphabet is of interest.
              You can use 0-9 or a-z to specify ranges; use -- to detect the
              minus sign

       -a certainty
              set the certainty threshold for recognition (0..100; default:
              95); characters with a higher certainty are accepted, characters
              with a lower certainty are treated as unknown (not recognized).
              Set a higher value if you want only the more certainly
              recognized characters

       -u string
              output this string for every unrecognized character (default is
              "_")

       -m mode
              set operational mode; mode is a bitfield (default: 0)

       -n bool
              if bool is non-zero, only recognise numbers (this is now
              obsolete, use -C "0123456789")

       The verbosity is specified as a bitfield:

       1         print more info

       2         list shapes of boxes (see -c) to stderr

       4         list pattern of boxes (see -c) to stderr

       8         print pattern after recognition for debugging

       16        print debug information about recognition of lines to stderr

       32        create outXX.png with boxes and lines marked on each general
                 OCR-step

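       Because verbosity is a bitfield, several of the values above can be
       combined into one number.  The sketch below assumes a POSIX shell; the
       variable names are illustrative only and are not defined by gocr.

```shell
# Bit values copied from the verbosity table; the names are illustrative.
V_INFO=1        # print more info
V_SHAPES=2      # list shapes of boxes (see -c)
V_PATTERN=4     # list pattern of boxes (see -c)
V_PNG=32        # create outXX.png

# Combine flags with bitwise OR.
verbosity_png=$((V_INFO | V_PNG))
verbosity_boxes=$((V_INFO | V_SHAPES | V_PATTERN))

# Print the resulting command lines (text1.pbm is the sample file name
# used in the EXAMPLES section).
echo "gocr -v $verbosity_png text1.pbm"
echo "gocr -v $verbosity_boxes -c _YV text1.pbm"
```

       The two results correspond to the -v 33 and -v 7 invocations shown in
       the EXAMPLES section.
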
       The operation modes are:

       2         use database to recognize characters which are not recognized
                 by other algorithms (early development)

       4         switch on layout analysis or zoning (development)

       8         don't compare unrecognized characters to recognized ones

       16        don't try to divide overlapping characters into two or three
                 single characters

       32        don't do context correction

       64        character packing: before recognition starts, similar
                 characters are searched for and only one of these characters
                 is sent to the recognition engine (development)

       130       extend database: prompts the user for unidentified characters
                 and extends the database with the user's answers (128+2,
                 early development)

       256       switch off the recognition engine (makes sense together with
                 -m 2)

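       A mode value can be taken apart again by testing its individual bits.
       The sketch below assumes a POSIX shell and assumes that mode bits
       combine by addition, as the 128+2 entry suggests; has_bit is a
       hypothetical helper and not part of gocr.

```shell
# has_bit VALUE BIT: succeed if BIT is set in VALUE.
# Hypothetical helper for illustration; it is not part of gocr.
has_bit() {
    [ $(( $1 & $2 )) -ne 0 ]
}

# 130 is documented as 128+2 (extend database); list the bits it contains.
mode=130
for bit in 2 4 8 16 32 64 128 256; do
    if has_bit "$mode" "$bit"; then
        echo "mode $mode includes bit $bit"
    fi
done

# Assumption: "256 together with -m 2" is written as one combined value.
db_only=$((256 | 2))
echo "gocr -m $db_only text1.pbm"
```

       Bit 128 has no entry of its own in the table above; it appears only as
       part of 130.
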

AUTHOR

       Joerg Schulenburg (see http://www-e.uni-magdeburg.de/jschulen/ocr/ for
       EMAIL)
       First version of man page by Tim Waugh <twaugh@redhat.com>

VERSION INFORMATION

       This man page documents gocr, version 0.52.

REPORTING BUGS

       Report bugs to Joerg Schulenburg

EXAMPLES

       gocr -v 33 text1.pbm
              output verbose information; out30.png is created to show details
              of the recognition process

       gocr -v 7 -c _YV text1.pbm
              verbose output for unknown chars and chars Y and V

       djpeg -pnm -gray text.jpg | gocr -
              convert a jpeg file to pnm format and input via pipe

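       The filter and certainty options can be combined in the same way.  The
       sketch below assumes a POSIX shell; valid_certainty is a hypothetical
       helper, scan.pbm and numbers.txt are placeholder file names, and
       writing the filter as 0-9-- (digit range followed by -- for the minus
       sign) is an assumption about how the two -C notations combine.

```shell
# valid_certainty N: succeed if N is an integer in the documented range 0..100.
# Hypothetical helper for illustration; it is not part of gocr.
valid_certainty() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$1" -le 100 ]
}

certainty=98
if valid_certainty "$certainty"; then
    # Digits and minus sign only, "?" for unknown characters, UTF8 output
    # written to numbers.txt.  The command is printed, not executed.
    echo "gocr -C 0-9-- -a $certainty -u '?' -f UTF8 -o numbers.txt -i scan.pbm"
fi
```

       Checking the range before calling gocr keeps an out-of-range value
       from being passed to -a.
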

Linux                             20 Sep 2018                          GOCR(1)