BOGOUTIL(1) Bogofilter Reference Manual BOGOUTIL(1)

bogoutil - Dumps, loads, and maintains bogofilter database files

bogoutil {-h | -V}

bogoutil [options] {-d file | -H file | -l file | -m file | -w file |
-p file}

bogoutil {-r file | -R file}

bogoutil {--db-print-leafpage-count file | --db-print-pagesize file |
--db-verify file | --db-checkpoint directory [flag...] |
--db-list-logfiles directory | --db-prune directory |
--db-recover directory | --db-recover-harder directory |
--db-remove-environment directory}

where options is

bogoutil [-v] [-n] [-C] [-D] [-a age] [-c count] [-s min,max] [-y date]
[-I file] [-O file] [-x flags] [--config-file file]

Bogoutil is part of the bogofilter Bayesian spam filter package.

It is used to dump and load bogofilter's Berkeley DB databases to and
from text files, to perform database maintenance functions, and to
display the values for specific words.

The -d file option tells bogoutil to print the contents of the database
file to stdout.

The -H file option tells bogoutil to print a histogram of the database
file to stdout. The output is similar to bogofilter -vv. In addition,
hapaxes (tokens which were seen only once) and pure tokens (tokens
which were encountered only in ham or only in spam) are counted.
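
The two counts can be illustrated with a minimal Python sketch. It
assumes that each token carries a separate spam count and ham count (an
assumption about the record layout, which this page does not spell out)
and that "seen only once" means the two counts sum to one. The function
name and the sample data are hypothetical.

```python
def count_hapaxes_and_pure(tokens):
    """Count hapaxes and pure tokens.

    `tokens` maps each token to an assumed (spam_count, ham_count) pair.
    """
    hapaxes = 0
    pure = 0
    for spam, ham in tokens.values():
        # Hapax: the token was seen exactly once overall.
        if spam + ham == 1:
            hapaxes += 1
        # Pure: the token was seen in exactly one of the two classes.
        if (spam == 0) != (ham == 0):
            pure += 1
    return hapaxes, pure


# Hypothetical sample data, not taken from a real word list.
sample = {
    "lottery": (12, 0),
    "meeting": (0, 7),
    "hello": (3, 5),
    "zyx": (1, 0),
}
print(count_hapaxes_and_pure(sample))
```

A hapax is always pure under these assumptions, so the second number is
never smaller than the first.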

The -l file option tells bogoutil to load the data from stdin into the
database file. If the database file exists, stdin data is merged into
the database file, with counts added up.
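
The merge rule ("counts added up") can be sketched in Python as follows.
This only illustrates the arithmetic and is not bogoutil's
implementation. It assumes a single numeric count per token, and the
function name is hypothetical.

```python
from collections import Counter


def merge_counts(existing, incoming):
    """Merge two token->count mappings, adding counts for shared tokens."""
    merged = Counter(existing)
    # Counter.update() adds counts instead of overwriting them.
    merged.update(incoming)
    return dict(merged)


database = {"hello": 2, "world": 1}
loaded = {"hello": 3, "spam": 4}
print(merge_counts(database, loaded))
```

Tokens present on only one side keep their count unchanged; tokens
present on both sides end up with the sum.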

The -m file option tells bogoutil to perform maintenance functions on
the specified database, i.e. to discard tokens that are older than
desired, have counts that are too small, or have sizes (lengths) that
are too long or too short.

The -w file option tells bogoutil to display token information from the
database file. The option takes an argument, which is either the name
of the wordlist (usually wordlist.db) or the name of the directory
containing it. Tokens can be listed on the command line or piped to
bogoutil. When there are extra arguments on the command line, bogoutil
will use them as the tokens to look up. If there are no extra
arguments, bogoutil will read tokens from stdin.
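
The two ways of passing tokens can be sketched from a Python caller. The
helper names are hypothetical, and the sketch assumes that tokens piped
on stdin are written one per line, which this page does not state.

```python
import subprocess


def plan_lookup(wordlist, tokens, via_stdin=False):
    """Return (argv, stdin_text) for a `bogoutil -w` lookup."""
    if via_stdin:
        # No extra arguments: bogoutil reads the tokens from stdin.
        # Assumption: one token per line.
        return ["bogoutil", "-w", wordlist], "\n".join(tokens) + "\n"
    # Extra arguments on the command line are used as the tokens.
    return ["bogoutil", "-w", wordlist, *tokens], None


def run_lookup(wordlist, tokens, via_stdin=False):
    """Run the lookup and return bogoutil's stdout (needs bogoutil installed)."""
    argv, stdin_text = plan_lookup(wordlist, tokens, via_stdin)
    result = subprocess.run(argv, input=stdin_text,
                            capture_output=True, text=True)
    return result.stdout
```

For example, plan_lookup("wordlist.db", ["hello", "world"]) yields the
argument list for "bogoutil -w wordlist.db hello world".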

The -p file option tells bogoutil to display the database information
for one or more tokens. The display includes a probability column with
the token's spam score (computed using bogofilter's default values).
Option -p takes the same arguments as option -w.

The -r file option tells bogoutil to recalculate the ROBX value and
print it as a six-digit fraction.

The -R file option does the same as -r, but saves the result in the
training database without printing it.

The -I file option tells bogoutil to read its input from file rather
than stdin.

The -O file option tells bogoutil to write its output to file rather
than stdout.

The -v option produces verbose output on stderr. This option is
primarily useful for debugging.

The -C option inhibits reading configuration files and lets bogoutil
run with the defaults.

The --config-file file option tells bogoutil to read file instead of
the standard configuration file.

The -D option redirects debug output to stdout (it usually goes to
stderr).

The -x flags option sets debugging flags.

Option -n stands for "replace non-ascii characters". It replaces
characters that have the high bit (0x80) set with question marks. This
can be useful if a word list has lots of unreadable tokens, for example
from Asian spam. The "bad" characters are converted to question marks,
and matching tokens are combined when -n is used with -m or -l, but not
with -d.
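
The replacement and the combining of matching tokens can be sketched in
Python. This is an illustration under stated assumptions, not bogoutil's
code: tokens are treated as byte strings, each with a single count, and
the function names are hypothetical.

```python
def replace_non_ascii(token: bytes) -> bytes:
    """Replace every byte that has the high bit (0x80) set with '?'."""
    return bytes(b if b < 0x80 else ord("?") for b in token)


def combine_tokens(counts):
    """Apply the replacement and add up counts of tokens that now match."""
    combined = {}
    for token, count in counts.items():
        key = replace_non_ascii(token)
        combined[key] = combined.get(key, 0) + count
    return combined


# Two tokens that differ only in one high-bit byte collapse into one.
wordlist = {b"caf\xe9": 2, b"caf\xe8": 3, b"tea": 1}
print(combine_tokens(wordlist))
```

Note that a multi-byte character yields one question mark per byte in
this sketch, because the replacement works on bytes rather than on
decoded characters.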

Option -a age sets the acceptable token age; older tokens are
discarded. The age can be a date (in the form YYYYMMDD) or a day count,
in which case tokens older than age days are discarded.
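
How the two forms of age translate into a cutoff can be sketched in
Python. How bogoutil itself tells a date from a day count is not stated
here, so the rule used below (an eight-digit value is a date, anything
else a day count) is an assumption, as are the function names and the
choice to keep a token dated exactly on the cutoff.

```python
from datetime import date, datetime, timedelta


def cutoff_date(age: str, today: date) -> date:
    """Turn an -a style argument into the oldest acceptable date."""
    # Assumption: eight digits mean YYYYMMDD, anything else a day count.
    if len(age) == 8 and age.isdigit():
        return datetime.strptime(age, "%Y%m%d").date()
    return today - timedelta(days=int(age))


def is_too_old(token_date: str, age: str, today: date) -> bool:
    """True if a token dated YYYYMMDD is strictly older than the cutoff."""
    stamp = datetime.strptime(token_date, "%Y%m%d").date()
    return stamp < cutoff_date(age, today)


today = date(2012, 10, 22)
print(cutoff_date("30", today))
print(is_too_old("20120901", "30", today))
```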

Option -c count indicates that tokens whose counts are less than or
equal to count are to be discarded.

Option -s min,max is used to discard tokens based on their size, i.e.
length. All tokens shorter than min or longer than max will be
discarded.
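
The count and size rules can be written as one small predicate. The
Python sketch below is only an illustration: the function and parameter
names are hypothetical, and it measures length in characters, whereas
bogoutil may count bytes.

```python
def parse_size(arg: str):
    """Parse a "min,max" argument such as "3,20" into two integers."""
    low, high = arg.split(",")
    return int(low), int(high)


def keep_token(token: str, count: int, threshold=None, size=None) -> bool:
    """Return False if the token would be discarded by the count or size rule."""
    # Count rule: counts less than or equal to the threshold are discarded.
    if threshold is not None and count <= threshold:
        return False
    # Size rule: tokens shorter than min or longer than max are discarded.
    if size is not None:
        low, high = size
        if len(token) < low or len(token) > high:
            return False
    return True


print(keep_token("hello", 1, threshold=1))
print(keep_token("hello", 2, threshold=1, size=parse_size("3,20")))
```

Both bounds of the size range are inclusive: a token of exactly min or
max characters is kept.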

Option -y date specifies the date to give to tokens that don't have
dates. The format is YYYYMMDD.

The -h option prints the help message and exits.

The -V option prints the version number and exits.

The --db-checkpoint dir option causes bogoutil to flush the buffer
caches and checkpoint the database environment.

The --db-list-logfiles dir option causes bogoutil to list the log files
in the environment. Zero or more keywords can be added or combined
(separated by whitespace) to modify the behavior of this mode. The
default behavior is to list only inactive log files with relative
paths. You can add the keyword all to list all log files (inactive and
active). You can add the keyword absolute to switch the listing to
absolute paths.

The --db-prune dir option causes bogoutil to checkpoint the database
environment and remove inactive log files.

The --db-recover dir option runs a regular database recovery in the
specified database directory. If that fails, it will retry with a
(usually slower) catastrophic database recovery. If that fails, too,
your database cannot be repaired and must be rebuilt from scratch. This
is only supported when compiled with Berkeley DB support with
transactions enabled. Trying recovery with QDBM or SQLite3 support will
result in an error.

The --db-recover-harder dir option runs a catastrophic database
recovery in the specified database directory. If that fails, your
database cannot be repaired and must be rebuilt from scratch. This is
only supported when compiled with Berkeley DB support with transactions
enabled. Trying recovery with QDBM or SQLite3 support will result in an
error.

The --db-remove-environment directory option has no short option
equivalent. It runs recovery in the given directory and then removes
the database environment. Use this before upgrading to a new Berkeley
DB version if the new version to be installed requires a log file
format update.

The --db-print-leafpage-count file option prints the number of leaf
pages in the database file file as a decimal number, or UNKNOWN if the
database does not support querying this figure.

The --db-print-pagesize file option prints the size of a database page
in file as a decimal number, or UNKNOWN for databases with variable
page size or databases that do not allow a query of the database page
size.

The --db-verify file option requests that bogoutil verify the database
file. It prints only errors, unless in verbose mode.

Bogoutil reads and writes text files where each nonblank line consists
of a word, any amount of horizontal whitespace, a numeric word count,
more whitespace, and (optionally) a date in form YYYYMMDD. Blank lines
are skipped.
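
A reader for this format can be sketched in a few lines of Python. The
sketch follows the description above literally (word, count, optional
date) and assumes that a word contains no whitespace; the function names
are hypothetical and the code is not bogoutil's own parser.

```python
def parse_line(line: str):
    """Parse one line into (word, count, date); return None for a blank line."""
    fields = line.split()  # splits on any run of whitespace
    if not fields:
        return None  # blank lines are skipped
    if len(fields) == 2:
        word, count = fields
        stamp = None  # the date is optional
    elif len(fields) == 3:
        word, count, stamp = fields
    else:
        raise ValueError(f"unexpected number of fields: {line!r}")
    return word, int(count), stamp


def parse_text(text: str):
    """Parse a whole dump, dropping blank lines."""
    records = (parse_line(line) for line in text.splitlines())
    return [record for record in records if record is not None]


dump = "hello   3\t20121022\n\nworld 1\n"
print(parse_text(dump))
```

The date is kept as a string here; a caller that needs to compare dates
can convert it with datetime.strptime and the format "%Y%m%d".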

Bogoutil exits with 0 for successful operation, 1 for most errors, and
3 for I/O or other errors. Error 3 usually means that something is
seriously wrong with the database files.
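
A calling script can branch on these values. The Python sketch below
maps the documented codes to short messages; the wording of the
messages and the helper names are hypothetical, and any other exit
status is reported as undocumented rather than guessed at.

```python
import subprocess

# Exit codes as documented on this page.
EXIT_MEANINGS = {
    0: "success",
    1: "error",
    3: "I/O or other error; the database files may be seriously damaged",
}


def describe_exit(code: int) -> str:
    """Return a short description of a bogoutil exit status."""
    return EXIT_MEANINGS.get(code, "undocumented exit status")


def run_bogoutil(args):
    """Run bogoutil with the given arguments (needs bogoutil installed)."""
    result = subprocess.run(["bogoutil", *args])
    return result.returncode, describe_exit(result.returncode)
```

For example, run_bogoutil(["--db-verify", "wordlist.db"]) would return
the exit status of the verification together with its description.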

Gyepi Sam <gyepi@praxis-sw.com>.

Matthias Andree <matthias.andree@gmx.de>.

David Relson <relson@osagesoftware.com>.

For updates, see the bogofilter project page[1].

bogofilter(1), bogolexer(1), bogotune(1), bogoupgrade(1)

1. the bogofilter project page
http://bogofilter.sourceforge.net/

Bogofilter 10/22/2012 BOGOUTIL(1)