GOCR(1)                          User's Manual                         GOCR(1)

NAME

       gocr - command line text recognition tool

SYNOPSIS

       gocr [OPTION] [-i] pnm-file

DESCRIPTION

       gocr is an optical character recognition program that can be used from
       the command line. It takes input in PNM, PGM, PBM, PPM, or PCX format
       and writes the recognized text to stdout. If the pnm-file argument is a
       single dash, PNM data is read from stdin. If gzip, bzip2 and
       netpbm-progs are installed and your system supports popen(3), gocr
       also accepts pnm.gz, pnm.bz2, png, jpg, jpeg, tiff, gif, bmp, ps
       (single pages only) and eps files, where pnm can be replaced by ppm,
       pgm or pbm. These formats are supported only as named input files,
       not as an input stream.

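       The input rules above can be summarized in a small POSIX shell sketch.
       The function gocr_input_kind and its result words are hypothetical
       (not part of gocr); it only looks at lowercase file name suffixes and
       does not check whether the helper programs are actually installed.

```shell
# Hypothetical helper (not part of gocr): classify an input name according
# to the DESCRIPTION section. Prints one of:
#   stdin   - a single dash: PNM data is read from standard input
#   native  - PNM/PGM/PBM/PPM/PCX, read by gocr directly (file or stdin)
#   helper  - needs gzip, bzip2 or netpbm-progs via popen(3); file only
#   unknown - anything else
gocr_input_kind() {
    case "$1" in
        -)
            echo stdin ;;
        *.pnm|*.pgm|*.pbm|*.ppm|*.pcx)
            echo native ;;
        *.pnm.gz|*.pgm.gz|*.pbm.gz|*.ppm.gz|*.pnm.bz2|*.pgm.bz2|*.pbm.bz2|*.ppm.bz2)
            echo helper ;;
        *.png|*.jpg|*.jpeg|*.tiff|*.gif|*.bmp|*.ps|*.eps)
            echo helper ;;
        *)
            echo unknown ;;
    esac
}
```

       For example, gocr_input_kind page.pbm.gz prints helper, so such a file
       has to be passed by name rather than piped into gocr -.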

OPTIONS

       -h     show usage information

       -i file
              read input from file (or from stdin if file is a single dash)

       -o file
              send output to file instead of stdout

       -e file
              send errors to file instead of stderr, or to stdout if file is
              a single dash

       -x file
              write progress output to file, which can be a file name, a fifo
              name or a file descriptor 1...255; this is useful for GUI
              developers who want to display the OCR progress. The file
              descriptor form is only available if gocr was compiled with
              __USE_POSIX defined.

       -p path
              database path; a final slash must be included (default: ./db/).
              This path will be populated with images of learned characters.

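       Since -p requires the final slash, a wrapper script may want to append
       it automatically. This is a minimal sketch; the function name
       gocr_db_path is hypothetical, and falling back to ./db/ for an empty
       argument simply mirrors the documented default.

```shell
# Hypothetical wrapper helper (not part of gocr): make sure the database
# path passed to -p ends with a slash.
gocr_db_path() {
    case "$1" in
        "")  printf '%s\n' ./db/ ;;    # empty: use the documented default
        */)  printf '%s\n' "$1" ;;     # already ends with a slash
        *)   printf '%s/\n' "$1" ;;    # append the missing slash
    esac
}

# e.g.: gocr -p "$(gocr_db_path "$HOME/gocr-db")" -m 130 text1.pbm
```
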
       -f format
              output format of the recognized text (ISO8859_1, TeX, HTML, XML,
              UTF8, ASCII); XML output also includes position and probability
              data

       -l level
              set the grey level to level (0<160<=255, default: 0 for
              autodetect); darker pixels belong to characters, brighter pixels
              are interpreted as the background of the input image

       -d size
              set the dust size in pixels (clusters smaller than this are
              removed); 0 means no clusters are removed, and the default of -1
              means auto detection

       -s num set the space width between words in units of dots (default: 0
              for autodetect); wider gaps are interpreted as word spaces,
              narrower ones as spaces between characters

       -v verbosity
              be verbose to stderr; verbosity is a bitfield

       -c string
              restrict verbose output on stderr to the characters in string;
              more output is generated for every character within the string.
              The underscore stands for unknown characters. This is useful to
              limit debug information to what is necessary.

       -C string
              only recognise characters from string; this is a filter for
              cases where only part of the character alphabet is of interest

       -a certainty
              set the certainty threshold for recognition (0..100; default:
              95); characters with a higher certainty are accepted, characters
              with a lower certainty are treated as unknown (not recognized).
              Set a higher value if you want to keep only characters that were
              recognized with more certainty.

       -m mode
              set the operational mode; mode is a bitfield (default: 0)

       -n bool
              if bool is non-zero, only recognise numbers (this is now
              obsolete; use -C "0123456789")

       The verbosity is specified as a bitfield:

       1         print more info

       2         list shapes of boxes (see -c) to stderr

       4         list pattern of boxes (see -c) to stderr

       8         print pattern after recognition for debugging

       16        print debug information about recognition of lines to stderr

       32        create outXX.png with boxes and lines marked on each general
                 OCR step

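       Because the values are powers of two, several of them are combined by
       ORing (here equivalent to adding) them; -v 33 in EXAMPLES is 32|1. The
       shell sketch below composes and decomposes such a value; the variable
       names and the helper verbosity_flags are hypothetical labels for the
       table entries.

```shell
# Labels for the documented verbosity bits (names are hypothetical).
V_INFO=1      # print more info
V_SHAPES=2    # list shapes of boxes (see -c)
V_PATTERN=4   # list pattern of boxes (see -c)
V_AFTER=8     # print pattern after recognition
V_LINES=16    # debug information about line recognition
V_PNG=32      # create outXX.png for each general OCR step

# Compose: more info plus the PNG output gives the value 33.
verbosity=$(( V_INFO | V_PNG ))
echo "gocr -v $verbosity text1.pbm"

# Decompose: print the documented bits that are set in a value.
verbosity_flags() {
    flags=""
    for bit in 1 2 4 8 16 32; do
        if [ $(( $1 & bit )) -ne 0 ]; then
            flags="${flags:+$flags }$bit"
        fi
    done
    printf '%s\n' "$flags"
}

verbosity_flags 7    # prints: 1 2 4
```
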
       The operation modes are:

       2         use the database to recognize characters which are not
                 recognized by other algorithms (early development)

       4         switch on layout analysis or zoning (development)

       8         don't compare unrecognized characters to recognized ones

       16        don't try to divide overlapping characters into two or three
                 single characters

       32        don't do context correction

       64        character packing: before recognition starts, similar
                 characters are searched and only one of these characters is
                 sent to the recognition engine (development)

       130       extend the database: prompt the user for unidentified
                 characters and extend the database with the user's answers
                 (128+2, early development)

       256       switch off the recognition engine (makes sense together with
                 -m 2)

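       Mode values combine the same way; 130 is listed above as 128+2. The
       sketch below composes that value and the engine-off-plus-database
       setting that the 256 entry suggests (256+2), and tests single bits. All
       names are hypothetical labels, and the meaning of 128 on its own is
       only inferred from the 130 entry.

```shell
# Labels for the documented mode bits (names are hypothetical).
M_DB=2             # use database for characters not recognized otherwise
M_LAYOUT=4         # layout analysis or zoning
M_NO_COMPARE=8     # don't compare unrecognized to recognized characters
M_NO_DIVIDE=16     # don't divide overlapping characters
M_NO_CONTEXT=32    # don't do context correction
M_PACKING=64       # character packing
M_ASK=128          # prompt user; only documented as part of 130 (128+2)
M_NO_ENGINE=256    # switch off the recognition engine

# Extend the database interactively: 128 + 2 = 130.
extend_db=$(( M_ASK | M_DB ))
# Recognition engine off together with -m 2: 256 + 2 = 258.
db_only=$(( M_NO_ENGINE | M_DB ))

# has_mode VALUE BIT: exit status 0 if BIT is set in VALUE.
has_mode() {
    [ $(( $1 & $2 )) -ne 0 ]
}

echo "gocr -m $extend_db -p ./db/ text1.pbm"
if has_mode "$db_only" "$M_NO_ENGINE"; then
    echo "mode $db_only switches off the recognition engine"
fi
```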

AUTHOR

       Joerg Schulenburg (see http://jocr.sourceforge.net/ for EMAIL)
       First version of man page by Tim Waugh <twaugh@redhat.com>

VERSION INFORMATION

       This man page documents gocr, version 0.41.

REPORTING BUGS

       Report bugs to Joerg Schulenburg

EXAMPLES

       gocr -v 33 text1.pbm
              output verbose information; out30.png is created to show details
              of the recognition process

       gocr -v 7 -c _YV text1.pbm
              verbose output for unknown characters and for the characters Y
              and V

       djpeg -pnm -gray text.jpg | gocr -
              convert a JPEG file to PNM format and feed it to gocr via a pipe

Linux                             20 Aug 2006                          GOCR(1)