Mail::SpamAssassin(3)  User Contributed Perl Documentation  Mail::SpamAssassin(3)

Mail::SpamAssassin - Spam detector and markup engine

    use Mail::SpamAssassin;

    my $spamtest = Mail::SpamAssassin->new();
    my $mail = $spamtest->parse($message);
    my $status = $spamtest->check($mail);

    if ($status->is_spam()) {
        $message = $status->rewrite_mail();
    }
    else {
        ...
    }
    ...

    $status->finish();
    $mail->finish();

Mail::SpamAssassin is a module to identify spam using several methods,
including text analysis, internet-based realtime blacklists, statistical
analysis, and internet-based hashing algorithms.

Using its rule base, it applies a wide range of heuristic tests to mail
headers and body text to identify "spam", also known as unsolicited bulk
email. Once identified as spam, the mail can then be tagged as spam for
later filtering using the user's own mail user agent application or at
the mail transfer agent.

If you wish to use a command-line filter tool, try the "spamassassin" or
the "spamd"/"spamc" tools provided.

$t = Mail::SpamAssassin->new( { opt => val, ... } )
Constructs a new "Mail::SpamAssassin" object. You may pass a hash
reference to the constructor, which may contain the following
attribute-value pairs.

debug
The debug options used to determine the logging level. They allow
sections of debug messages (called "facilities") to be enabled or
disabled. If this is a string, it is treated as a comma-delimited list
of the debug facilities. If it is a hash reference, the keys are treated
as the list of debug facilities, and if it is an array reference, the
elements are treated as the list of debug facilities.

There are also two special cases: (1) if "info" is passed as a debug
facility, then all informational messages are enabled; (2) if "all" is
passed as a debug facility, then all debugging facilities are enabled.

rules_filename
The filename/directory to load spam-identifying rules from. (optional)

site_rules_filename
The directory to load site-specific spam-identifying rules from.
(optional)

userprefs_filename
The filename to load preferences from. (optional)

userstate_dir
The directory user state is stored in. (optional)

config_tree_recurse
Set to 1 to recurse through directories when reading configuration
files, instead of just reading a single level. (optional, default 0)

config_text
The text of all rules and preferences. If you prefer not to load the
rules from files, read them in yourself and set this instead. As a
result, this will override the settings for "rules_filename",
"site_rules_filename", and "userprefs_filename".

post_config_text
Similar to "config_text"; this text is placed after "config_text" to
allow an override of config files.

force_ipv4
If set to 1, DNS tests will not attempt to use IPv6. Use if the existing
tests for IPv6 availability produce incorrect results or crashes.

languages_filename
If you want to be able to use the language-guessing rule
"UNWANTED_LANGUAGE_BODY", and are using "config_text" instead of
"rules_filename", "site_rules_filename", and "userprefs_filename", you
will need to set this. It should be the path to the languages file
normally found in the SpamAssassin rules directory.

local_tests_only
If set to 1, no tests that require internet access will be performed.
(default: 0)

ignore_site_cf_files
If set to 1, any rule files found in the "site_rules_filename" directory
will be ignored. *.pre files (used for loading plugins) found in the
"site_rules_filename" directory will still be used. (default: 0)

dont_copy_prefs
If set to 1, the user preferences file will not be created if it doesn't
already exist. (default: 0)

save_pattern_hits
If set to 1, the patterns hit can be retrieved from the
"Mail::SpamAssassin::PerMsgStatus" object. Used for debugging.

home_dir_for_helpers
If set, the HOME environment variable will be set to this value when
using test applications that require their configuration data, such as
Razor, Pyzor and DCC.

username
If set, the "username" attribute will use this as the current user's
name. Otherwise, the default is taken from the runtime environment
(i.e. this process's effective UID under UNIX).

If none of "rules_filename", "site_rules_filename",
"userprefs_filename", or "config_text" is set, the "Mail::SpamAssassin"
module will search for the configuration files in the usual installed
locations, using the variable definitions below, which can be passed in.

PREFIX
Used as the root for certain directory paths such as:

    '__prefix__/etc/mail/spamassassin'
    '__prefix__/etc/spamassassin'

Defaults to "@@PREFIX@@".

DEF_RULES_DIR
Location where the default rules are installed. Defaults to
"@@DEF_RULES_DIR@@".

LOCAL_RULES_DIR
Location where the local site rules are installed. Defaults to
"@@LOCAL_RULES_DIR@@".

LOCAL_STATE_DIR
Location of the local state directory, mainly used for installing
updates via "sa-update" and compiling rulesets to native code. Defaults
to "@@LOCAL_STATE_DIR@@".

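The constructor options described above are passed as a single hash
reference. The following is a minimal sketch, not a recommended
configuration: the helper name offline_options() is made up for
illustration, and only the option names are taken from the list above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Mail::SpamAssassin): build a
# constructor option hash for a setup that uses no rule files and
# performs no network tests.
sub offline_options {
    my ($config_text) = @_;
    return {
        debug            => [ 'info' ],    # array reference form of "debug"
        local_tests_only => 1,             # skip tests needing internet access
        dont_copy_prefs  => 1,             # never create a user preferences file
        config_text      => $config_text,  # overrides the *_filename options
    };
}

# Usage, assuming Mail::SpamAssassin is installed and $rules holds the
# text of your rules and preferences:
#   use Mail::SpamAssassin;
#   my $spamtest = Mail::SpamAssassin->new(offline_options($rules));
```

Because "config_text" is set here, "languages_filename" would also be
needed if the language-guessing rule is wanted.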
parse($message, $parse_now)
Parse will return a Mail::SpamAssassin::Message object with just the
headers parsed. When calling this function, there are two optional
parameters that can be passed in: $message is either undef (which will
use STDIN), a scalar of the entire message, an array reference of the
message with one line per array element, or a file glob which holds the
entire contents of the message; and $parse_now, which specifies whether
to create the MIME tree at parse time or later as necessary.

The $parse_now option is set to false (0) by default. This allows
SpamAssassin to avoid generating the tree of internal data nodes if the
information is not going to be used. This is handy, for instance, when
running "spamassassin -d", which only needs the pristine header and
body, which are always parsed and stored by this function.

For more information, please see the "Mail::SpamAssassin::Message" and
"Mail::SpamAssassin::Message::Node" POD.

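As an illustration of the $message forms accepted by parse(), the sketch
below passes the same message once as a scalar and once as an array
reference. The helper name parse_two_ways() is hypothetical; $spamtest
is assumed to be a "Mail::SpamAssassin" object created elsewhere.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one message in two of the accepted forms.
sub parse_two_ways {
    my ($spamtest, $text) = @_;

    # Form 1: a scalar holding the entire message; the MIME tree is
    # built later, only if needed ($parse_now defaults to false).
    my $lazy = $spamtest->parse($text);

    # Form 2: an array reference with one line per element, asking for
    # the MIME tree to be created at parse time ($parse_now = 1).
    my @lines = split /^/m, $text;
    my $eager = $spamtest->parse(\@lines, 1);

    return ($lazy, $eager);
}

# Usage, assuming Mail::SpamAssassin is installed:
#   my ($lazy, $eager) = parse_two_ways($spamtest, $raw_message);
#   ...
#   $_->finish() for ($lazy, $eager);
```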
$status = $f->check ($mail)
Check a mail, encapsulated in a "Mail::SpamAssassin::Message" object, to
determine if it is spam or not.

Returns a "Mail::SpamAssassin::PerMsgStatus" object which can be used to
test or manipulate the mail message.

Note that the "Mail::SpamAssassin" object can be re-used for further
messages without affecting this check; in OO terminology, the
"Mail::SpamAssassin" object is a "factory". However, if you do this, be
sure to call the "finish()" method on the status objects when you're
done with them.

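The factory pattern described above might be used as follows. This is a
sketch only: count_spam() is a made-up helper, and it relies solely on
the parse(), check(), is_spam() and finish() calls shown in this page.

```perl
use strict;
use warnings;

# Hypothetical helper: re-use one factory object for several messages
# and count how many of them are classified as spam.
sub count_spam {
    my ($spamtest, @messages) = @_;
    my $spam = 0;
    for my $text (@messages) {
        my $mail   = $spamtest->parse($text);
        my $status = $spamtest->check($mail);
        $spam++ if $status->is_spam();

        # Release the per-message objects before the next iteration.
        $status->finish();
        $mail->finish();
    }
    return $spam;
}

# Usage, assuming Mail::SpamAssassin is installed:
#   my $spamtest = Mail::SpamAssassin->new();
#   printf "%d spam message(s)\n", count_spam($spamtest, @raw_messages);
```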
$status = $f->check_message_text ($mailtext)
Check a mail, encapsulated in a plain string $mailtext, to determine if
it is spam or not.

Otherwise identical to "check()" above.

$status = $f->learn ($mail, $id, $isspam, $forget)
Learn from a mail, encapsulated in a "Mail::SpamAssassin::Message"
object.

If $isspam is set, the mail is assumed to be spam; otherwise it will be
learnt as non-spam.

If $forget is set, the attributes of the mail will be removed from both
the non-spam and spam learning databases.

$id is an optional message-identification string, used internally to tag
the message. If it is "undef", the Message-Id of the message will be
used. It should be unique to that message.

Returns a "Mail::SpamAssassin::PerMsgLearner" object which can be used
to manipulate the learning process for each mail.

Note that the "Mail::SpamAssassin" object can be re-used for further
messages without affecting this learn operation; in OO terminology, the
"Mail::SpamAssassin" object is a "factory". However, if you do this, be
sure to call the "finish()" method on the learner objects when you're
done with them.

"learn()" and "check()" can be run using the same factory.
"init_learner()" must be called before using this method.

$f->init_learner ( [ { opt => val, ... } ] )
Initialise learning. You may pass the following attribute-value pairs to
this method.

caller_will_untie
Whether or not the code calling this method will take care of untie'ing
from the Bayes databases (by calling "finish_learner()") (optional,
default 0).

force_expire
Should an expiration run be forced to occur immediately? (optional,
default 0).

learn_to_journal
Should learning data be written to the journal, instead of directly to
the databases? (optional, default 0).

wait_for_lock
Whether or not to wait a long time for locks to complete (optional,
default 0).

opportunistic_expire_check_only
During the opportunistic journal sync and expire check, don't actually
do the expire but report back whether or not it should occur (optional,
default 0).

no_relearn
If doing a learn operation, and the message has already been learned as
the opposite type, don't re-learn the message.

$f->rebuild_learner_caches ({ opt => val })
Rebuild any cache databases; should be called after the learning
process. Options include: "verbose", which will output diagnostics to
"stdout" if set to 1.

$f->finish_learner ()
Finish learning.

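Taken together, init_learner(), learn(), rebuild_learner_caches() and
finish_learner() suggest a learning cycle along the following lines.
This is a sketch under assumptions: learn_messages() is a hypothetical
helper, and the choice of options is illustrative rather than required.

```perl
use strict;
use warnings;

# Hypothetical helper: learn a batch of messages as spam ($isspam true)
# or as non-spam ($isspam false) using one factory object.
sub learn_messages {
    my ($spamtest, $isspam, @messages) = @_;

    # We call finish_learner() ourselves, so say so up front.
    $spamtest->init_learner({ caller_will_untie => 1 });

    for my $text (@messages) {
        my $mail = $spamtest->parse($text);

        # $id is undef, so the Message-Id is used; $forget is 0.
        my $learner = $spamtest->learn($mail, undef, $isspam, 0);

        $learner->finish();
        $mail->finish();
    }

    $spamtest->rebuild_learner_caches({ verbose => 0 });
    $spamtest->finish_learner();
    return scalar @messages;
}

# Usage, assuming Mail::SpamAssassin is installed and Bayes is enabled:
#   learn_messages($spamtest, 1, @spam_texts);   # learn as spam
#   learn_messages($spamtest, 0, @ham_texts);    # learn as non-spam
```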
$f->dump_bayes_db()
Dump the contents of the Bayes DB.

$f->signal_user_changed ( [ { opt => val, ... } ] )
Signals that the current user has changed (possibly using "setuid"),
meaning that SpamAssassin should close any per-user databases it has
open, and re-open using ones appropriate for the new user.

Note that this should be called after reading any per-user
configuration, as that data may override some paths opened in this
method. You may pass the following attribute-value pairs:

username
The username of the user. This will be used for the "username"
attribute.

user_dir
A directory to use as a 'home directory' for the current user's data,
overriding the system default. This directory must be readable and
writable by the process. Note that the resulting "userstate_dir" will be
the ".spamassassin" subdirectory of this dir.

userstate_dir
A directory to use as a directory for the current user's data,
overriding the system default. This directory must be readable and
writable by the process. The default is "user_dir/.spamassassin".

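The ordering requirement above (read the per-user configuration first,
then signal the change) could be wrapped up as in the sketch below. The
helper name switch_user() and its argument list are assumptions made for
illustration; read_scoreonly_config() is described later in this page.

```perl
use strict;
use warnings;

# Hypothetical helper: switch a long-lived factory object to another
# user. $prefs_file is optional.
sub switch_user {
    my ($spamtest, $username, $home, $prefs_file) = @_;

    # Read per-user configuration first, since it may override paths
    # that signal_user_changed() opens.
    $spamtest->read_scoreonly_config($prefs_file) if defined $prefs_file;

    $spamtest->signal_user_changed({
        username => $username,
        user_dir => $home,    # user state goes in "$home/.spamassassin"
    });
    return;
}

# Usage, with made-up values:
#   switch_user($spamtest, 'alice', '/home/alice',
#               '/home/alice/.spamassassin/user_prefs');
```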
$f->report_as_spam ($mail, $options)
Report a mail, encapsulated in a "Mail::SpamAssassin::Message" object,
as human-verified spam. This will submit the mail message to live,
collaborative, spam-blocker databases, allowing other users to block
this message.

It will also submit the mail to SpamAssassin's Bayesian learner.

Options is an optional reference to a hash of options. Currently these
can be:

dont_report_to_dcc
Inhibits reporting of the spam to DCC.

dont_report_to_pyzor
Inhibits reporting of the spam to Pyzor.

dont_report_to_razor
Inhibits reporting of the spam to Razor.

dont_report_to_spamcop
Inhibits reporting of the spam to SpamCop.

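As a sketch of how the options hash might be used, the hypothetical
helper below reports a message while inhibiting all four collaborative
services, which, going by the description above, should leave only the
submission to the Bayesian learner. Whether that is useful depends on
your configuration; the helper name is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: report a message as spam without contacting DCC,
# Pyzor, Razor or SpamCop.
sub report_without_network {
    my ($spamtest, $text) = @_;
    my $mail = $spamtest->parse($text);

    $spamtest->report_as_spam($mail, {
        dont_report_to_dcc     => 1,
        dont_report_to_pyzor   => 1,
        dont_report_to_razor   => 1,
        dont_report_to_spamcop => 1,
    });

    $mail->finish();
    return;
}

# Usage, assuming Mail::SpamAssassin is installed:
#   report_without_network($spamtest, $raw_message);
```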
$f->revoke_as_spam ($mail, $options)
Revoke a mail, encapsulated in a "Mail::SpamAssassin::Message" object,
as human-verified ham (non-spam). This will revoke the mail message from
live, collaborative, spam-blocker databases, so that other users no
longer have this message blocked.

It will also submit the mail to SpamAssassin's Bayesian learner as
non-spam.

Options is an optional reference to a hash of options. Currently these
can be:

dont_report_to_razor
Inhibits revoking of the message from Razor.

$f->add_address_to_whitelist ($addr)
Given a string containing an email address, add it to the automatic
whitelist database.

$f->add_all_addresses_to_whitelist ($mail)
Given a mail message, find as many addresses as possible in the usual
headers (To, Cc, From, etc.) and the message body, and add them to the
automatic whitelist database.

$f->remove_address_from_whitelist ($addr)
Given a string containing an email address, remove it from the automatic
whitelist database.

$f->remove_all_addresses_from_whitelist ($mail)
Given a mail message, find as many addresses as possible in the usual
headers (To, Cc, From, etc.) and the message body, and remove them from
the automatic whitelist database.

$f->add_address_to_blacklist ($addr)
Given a string containing an email address, add it to the automatic
whitelist database with a high score, effectively blacklisting it.

$f->add_all_addresses_to_blacklist ($mail)
Given a mail message, find addresses in the From headers and add them to
the automatic whitelist database with a high score, effectively
blacklisting them.

Note that To and Cc addresses are not used.

$text = $f->remove_spamassassin_markup ($mail)
Returns the text of the message, with any SpamAssassin-added text (such
as the report, or X-Spam-Status headers) stripped.

Note that the $mail object is not modified.

Warning: if the input message in $mail contains a mixture of CR-LF
(Windows-style) and LF (UNIX-style) line endings, it will be
"canonicalized" to use one or the other consistently throughout.

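A minimal sketch of stripping markup from an already-tagged message
follows. The helper name strip_markup() is hypothetical; it only
combines parse(), remove_spamassassin_markup() and finish() as
described in this page.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text of a message with any
# SpamAssassin-added markup removed.
sub strip_markup {
    my ($spamtest, $marked_up_text) = @_;
    my $mail  = $spamtest->parse($marked_up_text);
    my $clean = $spamtest->remove_spamassassin_markup($mail);

    # $mail itself is not modified by the call above; release it.
    $mail->finish();
    return $clean;
}

# Usage, assuming Mail::SpamAssassin is installed:
#   print strip_markup($spamtest, $tagged_message);
```

Given the line-ending warning above, the returned text should not be
expected to match the original byte for byte.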
$f->read_scoreonly_config ($filename)
Read a configuration file and parse user preferences from it.

User preferences are as defined in the "Mail::SpamAssassin::Conf" manual
page. In other words, they include scoring options, scores, whitelists
and blacklists, and so on, but do not include rule definitions,
privileged settings, etc. unless "allow_user_rules" is enabled; and they
never include the administrator settings.

$f->load_scoreonly_sql ($username)
Read configuration parameters from an SQL database and parse scores from
it. This will only take effect if the perl "DBI" module is installed,
and the configuration parameters "user_scores_dsn",
"user_scores_sql_username", and "user_scores_sql_password" are set
correctly.

The username in $username will also be used for the "username" attribute
of the Mail::SpamAssassin object.

$f->load_scoreonly_ldap ($username)
Read configuration parameters from an LDAP server and parse scores from
it. This will only take effect if the perl "Net::LDAP" and "URI" modules
are installed, and the configuration parameters "user_scores_dsn",
"user_scores_ldap_username", and "user_scores_ldap_password" are set
correctly.

The username in $username will also be used for the "username" attribute
of the Mail::SpamAssassin object.

$f->set_persistent_address_list_factory ($factoryobj)
Set the persistent address list factory, used to create objects for the
automatic whitelist algorithm's persistent-storage back-end. See
"Mail::SpamAssassin::PersistentAddrList" for the API these factory
objects must implement, and the API the objects they produce must
implement.

$f->compile_now ($use_user_prefs, $keep_userstate)
Compile all patterns, load all configuration files, and load all
possibly-required Perl modules.

Normally, Mail::SpamAssassin uses lazy evaluation where possible, but if
you plan to fork() or start a new perl interpreter thread to process a
message, this is suboptimal, as each process/thread will have to perform
these actions.

Call this function in the master thread or process to perform the
actions straightaway, so that the sub-processes will not have to.

If $use_user_prefs is 0, this will initialise the SpamAssassin
configuration without reading the per-user configuration file and it
will assume that you will call "read_scoreonly_config" at a later point.

If $keep_userstate is true, compile_now() will revert any configuration
options which have a default with __userstate__ in it post-init(), and
then re-change the option before returning. This lets you change
$ENV{'HOME'} to a temp directory and have compile_now() create any files
there as necessary without disturbing the actual files as changed by a
configuration option. By default, this is disabled.

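The division of work between a parent process and its children might
look like the sketch below. Both helper names are hypothetical, and the
fork() call itself is left to the caller; the sketch only shows which
calls belong on which side.

```perl
use strict;
use warnings;

# Hypothetical helper, run once in the parent before fork(): do the
# expensive compilation up front, without per-user preferences.
sub warm_up {
    my ($spamtest) = @_;
    $spamtest->compile_now(0);
    return $spamtest;
}

# Hypothetical helper, run in each child: load the user's preferences
# (as compile_now(0) expects), then scan one message.
sub scan_for_user {
    my ($spamtest, $prefs_file, $text) = @_;
    $spamtest->read_scoreonly_config($prefs_file);

    my $mail    = $spamtest->parse($text);
    my $status  = $spamtest->check($mail);
    my $is_spam = $status->is_spam() ? 1 : 0;

    $status->finish();
    $mail->finish();
    return $is_spam;
}

# Usage, assuming Mail::SpamAssassin is installed:
#   my $spamtest = warm_up(Mail::SpamAssassin->new());
#   # ... fork() here; then, in the child:
#   my $is_spam = scan_for_user($spamtest, $prefs_file, $raw_message);
```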
$f->debug_diagnostics ()
Output some diagnostic information, useful for debugging SpamAssassin
problems.

$failed = $f->lint_rules ()
Syntax-check the current set of rules. Returns the number of syntax
errors discovered, or 0 if the configuration is valid.

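Because lint_rules() returns an error count, a start-up check could be
written roughly as follows. The helper name config_is_valid() and the
warning text are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return true if the current rule set passes the
# syntax check, false otherwise.
sub config_is_valid {
    my ($spamtest) = @_;
    my $failed = $spamtest->lint_rules();
    warn "lint: $failed syntax error(s) found\n" if $failed;
    return $failed ? 0 : 1;
}

# Usage, assuming Mail::SpamAssassin is installed:
#   my $spamtest = Mail::SpamAssassin->new();
#   exit 1 unless config_is_valid($spamtest);
```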
$f->finish()
Destroy this object, so that it will be garbage-collected once it goes
out of scope. The object will no longer be usable after this method is
called.

$fullpath = $f->find_rule_support_file ($filename)
Find a rule-support file, such as "languages" or "triplets.txt", in the
system-wide rules directory, and return its full path if it exists, or
undef if it doesn't exist.

(This API was added in SpamAssassin 3.1.1.)

$f->create_default_prefs ($filename, $username [ , $userdir ] )
Copy default preferences file into home directory for later use and
modification, if it does not already exist and "dont_copy_prefs" is not
set.

$f->copy_config ( [ $source ], [ $dest ] )
Used for daemons to keep a persistent Mail::SpamAssassin object's
configuration correct if switching between users. Pass an associative
array reference as either $source or $dest, and set the other to 'undef'
so that the object will use its current configuration. For example:

    # create object w/ configuration
    my $spamtest = Mail::SpamAssassin->new( ... );

    # backup configuration to %conf_backup
    my %conf_backup = ();
    $spamtest->copy_config(undef, \%conf_backup) ||
        die "config: error returned from copy_config!\n";

    # ... do stuff, perhaps modify the config, etc ...

    # reset the configuration back to the original
    $spamtest->copy_config(\%conf_backup, undef) ||
        die "config: error returned from copy_config!\n";

Note that the contents of the associative arrays should be considered
opaque by calling code.

@plugins = $f->get_loaded_plugins_list ( )
Return the list of plugins currently loaded by this SpamAssassin
object's configuration; each entry in the list is an object of type
"Mail::SpamAssassin::Plugin".

(This API was added in SpamAssassin 3.2.0.)

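Since each entry is an object, one simple way to inspect the result is
to list the class names. The sketch below assumes that the configuration
has already been loaded (for example by an earlier compile_now() call);
the helper name loaded_plugin_classes() is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the sorted class names of the plugins
# loaded by the given factory object.
sub loaded_plugin_classes {
    my ($spamtest) = @_;
    my @classes = sort map { ref($_) } $spamtest->get_loaded_plugins_list();
    return @classes;
}

# Usage, assuming Mail::SpamAssassin 3.2.0 or later is installed:
#   print "$_\n" for loaded_plugin_classes($spamtest);
```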
"HTML::Parser" "Sys::Syslog"

See also <http://spamassassin.apache.org/> and
<http://wiki.apache.org/spamassassin/> for more information.

Mail::SpamAssassin::Conf(3) Mail::SpamAssassin::PerMsgStatus(3)
spamassassin(1) sa-update(1)

See <http://issues.apache.org/SpamAssassin/>

The SpamAssassin(tm) Project <http://spamassassin.apache.org/>

SpamAssassin is distributed under the Apache License, Version 2.0, as
described in the file "LICENSE" included with the distribution.

The latest version of this library is likely to be available from CPAN
as well as:

    <http://spamassassin.apache.org/>

perl v5.8.8    2008-01-05    Mail::SpamAssassin(3)