Mail::SpamAssassin(3)  User Contributed Perl Documentation  Mail::SpamAssassin(3)

NAME
    Mail::SpamAssassin - Spam detector and markup engine

SYNOPSIS
    use Mail::SpamAssassin;

    my $spamtest = Mail::SpamAssassin->new();
    my $mail     = $spamtest->parse($message);
    my $status   = $spamtest->check($mail);

    if ($status->is_spam()) {
        # replace the message with the marked-up version
        $message = $status->rewrite_mail();
    }
    else {
        ...
    }
    ...

    # release the per-message objects first, then the scanner itself
    $status->finish();
    $mail->finish();
    $spamtest->finish();
24
DESCRIPTION
    Mail::SpamAssassin is a module that identifies spam using several
    methods, including text analysis, internet-based realtime blacklists,
    statistical analysis, and internet-based hashing algorithms.

    Using its rule base, it applies a wide range of heuristic tests to
    mail headers and body text to identify "spam", also known as
    unsolicited bulk email. Once identified, the mail can be tagged as
    spam for later filtering, either in the user's own mail user agent
    or at the mail transfer agent.

    If you want a command-line filter tool, try the "spamassassin" or
    the "spamd"/"spamc" tools provided.
38
METHODS
    $t = Mail::SpamAssassin->new( { opt => val, ... } )
        Constructs a new "Mail::SpamAssassin" object. You may pass a hash
        reference to the constructor, which may contain the following
        attribute-value pairs.
44
        debug
            The debug options that determine the logging level. They
            allow sections of debug messages (called "facilities") to be
            enabled or disabled. If this is a string, it is treated as a
            comma-delimited list of debug facilities. If it is a hash
            reference, the keys are treated as the list of debug
            facilities; if it is an array reference, the elements are
            treated as the list of debug facilities.

            There are two special cases: (1) if "info" is passed as a
            debug facility, all informational messages are enabled;
            (2) if "all" is passed as a debug facility, all debugging
            facilities are enabled.
60
61 rules_filename
62 The filename/directory to load spam-identifying rules from.
63 (optional)
64
65 site_rules_filename
66 The filename/directory to load site-specific spam-identifying
67 rules from. (optional)
68
69 userprefs_filename
70 The filename to load preferences from. (optional)
71
72 userstate_dir
73 The directory user state is stored in. (optional)
74
75 config_tree_recurse
76 Set to 1 to recurse through directories when reading
77 configuration files, instead of just reading a single level.
78 (optional, default 0)
79
80 config_text
81 The text of all rules and preferences. If you prefer not to
82 load the rules from files, read them in yourself and set this
83 instead. As a result, this will override the settings for
84 "rules_filename", "site_rules_filename", and
85 "userprefs_filename".
86
87 pre_config_text
88 Similar to "config_text", this text is placed before
89 config_text to allow an override of config files.
90
91 post_config_text
92 Similar to "config_text", this text is placed after config_text
93 to allow an override of config files.
94
95 force_ipv4
96 If set to 1, DNS tests will not attempt to use IPv6. Use if the
97 existing tests for IPv6 availability produce incorrect results
98 or crashes.
99
100 require_rules
101 If set to 1, init() will die if no valid rules could be loaded.
102 This is the default behaviour when called by "spamassassin" or
103 "spamd".
104
105 languages_filename
106 If you want to be able to use the language-guessing rule
107 "UNWANTED_LANGUAGE_BODY", and are using "config_text" instead
108 of "rules_filename", "site_rules_filename", and
109 "userprefs_filename", you will need to set this. It should be
110 the path to the languages file normally found in the
111 SpamAssassin rules directory.
112
113 local_tests_only
114 If set to 1, no tests that require internet access will be
115 performed. (default: 0)
116
117 need_tags
118 The option provides a way to avoid more expensive processing
119 when it is known in advance that some information will not be
120 needed by a caller.
121
            The value can be either a string (a comma-delimited list of
            tag names) or a reference to a list of individual tag names.
            A caller may provide the list in advance to declare which
            information it intends to collect later through
            $pms->get_tag() calls. If a tag name starts with 'NO' (case
            insensitive), the caller is declaring that it is not
            interested in that tag, although there is no guarantee that
            this saves any resources, nor that the tag value will be
            empty. Currently no built-in tag names start with 'NO'. A
            later entry overrides an earlier one, e.g.
            ASN,NOASN,ASN,TIMING,NOASN is equivalent to TIMING,NOASN.
133
134 For backwards compatibility, all tags available as of version
135 3.2.4 will be available by default (unless disabled by NOtag),
136 even if not requested through need_tags option. Future versions
137 may provide new tags conditionally available.
138
139 Currently the only tag that needs to be explicitly requested is
140 'TIMING'. Not requesting it can save a millisecond or two - it
141 mostly serves to illustrate the usage of need_tags.
142
            Example:

              need_tags => 'TIMING,noLANGUAGES,RELAYCOUNTRY,ASN,noASNCIDR',

            or:

              need_tags => [qw(TIMING noLANGUAGES RELAYCOUNTRY ASN noASNCIDR)],
148
149 ignore_site_cf_files
150 If set to 1, any rule files found in the "site_rules_filename"
151 directory will be ignored. *.pre files (used for loading
152 plugins) found in the "site_rules_filename" directory will
153 still be used. (default: 0)
154
155 dont_copy_prefs
156 If set to 1, the user preferences file will not be created if
157 it doesn't already exist. (default: 0)
158
159 save_pattern_hits
160 If set to 1, the patterns hit can be retrieved from the
161 "Mail::SpamAssassin::PerMsgStatus" object. Used for debugging.
162
163 home_dir_for_helpers
164 If set, the HOME environment variable will be set to this value
165 when using test applications that require their configuration
166 data, such as Razor, Pyzor and DCC.
167
        username
            If set, the "username" attribute will use this as the current
            user's name. Otherwise, the default is taken from the runtime
            environment (i.e. this process's effective UID under UNIX).
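The override rule given for need_tags in the option list above can be illustrated with a small sketch. The helper resolve_need_tags() is hypothetical and is not the module's own parser; it only mirrors the documented behaviour: accept a comma-delimited string or an array reference, treat a leading 'NO' (case insensitive) as "not interested", and let a later entry override an earlier one.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Mail::SpamAssassin): resolve a
# need_tags value into a hash of tag name => 1 (wanted) or 0 (not wanted).
sub resolve_need_tags {
    my ($spec) = @_;

    # Accept either a comma-delimited string or an array reference.
    my @entries = ref $spec eq 'ARRAY' ? @$spec : split /,/, $spec;

    my %wanted;
    for my $entry (@entries) {
        if ($entry =~ /^NO(.+)$/i) {
            $wanted{$1} = 0;        # 'NOtag' or 'notag': not interested
        }
        else {
            $wanted{$entry} = 1;    # plain tag name: requested
        }
    }
    return \%wanted;                # later entries have overwritten earlier ones
}

my $long  = resolve_need_tags('ASN,NOASN,ASN,TIMING,NOASN');
my $short = resolve_need_tags([qw(TIMING NOASN)]);

# Print both resolutions side by side so they can be compared.
for my $tag (sort keys %$long) {
    printf "%s => %d (short form: %d)\n", $tag, $long->{$tag}, $short->{$tag};
}
```

Because each entry simply overwrites the previous state of its tag, only the last mention of a tag matters, which is why the long list and the short list resolve identically.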
172
        If none of "rules_filename", "site_rules_filename",
        "userprefs_filename", or "config_text" is set, the
        "Mail::SpamAssassin" module will search for the configuration
        files in the usual installed locations, using the variable
        definitions below, which can also be passed in.
178
179 PREFIX
180 Used as the root for certain directory paths such as:
181
182 '__prefix__/etc/mail/spamassassin'
183 '__prefix__/etc/spamassassin'
184
185 Defaults to "@@PREFIX@@".
186
187 DEF_RULES_DIR
188 Location where the default rules are installed. Defaults to
189 "@@DEF_RULES_DIR@@".
190
191 LOCAL_RULES_DIR
192 Location where the local site rules are installed. Defaults to
193 "@@LOCAL_RULES_DIR@@".
194
195 LOCAL_STATE_DIR
196 Location of the local state directory, mainly used for
197 installing updates via "sa-update" and compiling rulesets to
198 native code. Defaults to "@@LOCAL_STATE_DIR@@".
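As a minimal sketch of how the constructor attributes above fit together, the helper below builds the hash reference that would be handed to new(). The helper name scanner_options() and the chosen values are assumptions for illustration; the keys are all attributes documented above. Setting "rules_filename" (or any of the other three file/text attributes) is what stops the module from searching the default locations.

```perl
use strict;
use warnings;

# Hypothetical helper: build the option hash for Mail::SpamAssassin->new().
# Every key is a documented constructor attribute; the values are examples.
sub scanner_options {
    my (%override) = @_;
    return {
        debug            => [qw(info)],  # array-reference form of "debug"
        local_tests_only => 1,           # skip tests that need internet access
        dont_copy_prefs  => 1,           # do not create a user preferences file
        need_tags        => 'TIMING',    # request the TIMING tag explicitly
        %override,                       # caller-supplied values win
    };
}

# Typical use, assuming Mail::SpamAssassin is installed:
#
#   require Mail::SpamAssassin;
#   my $spamtest = Mail::SpamAssassin->new(scanner_options(
#       rules_filename => '/path/to/rules',   # placeholder path
#   ));

my $opts = scanner_options(local_tests_only => 0);
print "local_tests_only is now $opts->{local_tests_only}\n";
```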
199
200 parse($message, $parse_now [, $suppl_attrib])
201 Parse will return a Mail::SpamAssassin::Message object with just
202 the headers parsed. When calling this function, there are two
203 optional parameters that can be passed in: $message is either undef
204 (which will use STDIN), a scalar of the entire message, an array
205 reference of the message with 1 line per array element, or a file
206 glob which holds the entire contents of the message; and
207 $parse_now, which specifies whether or not to create the MIME tree
208 at parse time or later as necessary.
209
210 The $parse_now option, by default, is set to false (0). This
211 allows SpamAssassin to not have to generate the tree of internal
212 data nodes if the information is not going to be used. This is
213 handy, for instance, when running "spamassassin -d", which only
214 needs the pristine header and body which is always parsed and
215 stored by this function.
216
        The optional last argument $suppl_attrib lets a caller pass
        additional information about a message to SpamAssassin. It is
        either undef or a reference to a hash in which each key/value
        pair provides some supplementary attribute of the message:
        typically information that cannot be deduced from the message
        itself, is hard to deduce reliably, or would be unnecessary work
        for SpamAssassin to obtain. The argument is stored in the
        Mail::SpamAssassin::Message object as 'suppl_attrib', making it
        available to the rest of the code as well as to plugins. The
        exact list of attributes will evolve over time; any unknown
        attribute should be ignored. Possible examples are: SMTP envelope
        information, a flag indicating that the message as supplied by
        the caller was truncated due to a size limit, an already verified
        list of DKIM signature objects, or a list of rule hits
        predetermined by the caller. The last of these gives a caller
        another way to pass meta information, or just plain rule hits,
        without having to insert made-up header fields.
235
236 For more information, please see the "Mail::SpamAssassin::Message"
237 and "Mail::SpamAssassin::Message::Node" POD.
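The argument forms accepted by parse() can be sketched as follows. This is only an illustration: parse_file() is a hypothetical helper, $spamtest is assumed to be a Mail::SpamAssassin object created elsewhere, and the attribute name 'my_app_note' is made up (real attribute names are defined by SpamAssassin and its plugins). The message is passed as an array reference with one line per element.

```perl
use strict;
use warnings;

# Hypothetical helper: parse a message stored in a file.
sub parse_file {
    my ($spamtest, $path, $parse_now) = @_;

    open(my $fh, '<', $path) or die "cannot open $path: $!";
    my @lines = <$fh>;              # one line per array element
    close($fh);

    # 'my_app_note' is an invented attribute, for illustration only.
    my %suppl_attrib = (my_app_note => "read from $path");

    # $parse_now false (0): build the MIME tree later, only if needed.
    return $spamtest->parse(\@lines, $parse_now ? 1 : 0, \%suppl_attrib);
}

# Typical use, assuming Mail::SpamAssassin is installed:
#
#   my $mail = parse_file($spamtest, 'message.eml');
#   ...
#   $mail->finish();
```

Passing a scalar holding the whole message, or undef to read STDIN, works the same way; only the first argument changes.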
238
239 $status = $f->check ($mail)
240 Check a mail, encapsulated in a "Mail::SpamAssassin::Message"
241 object, to determine if it is spam or not.
242
243 Returns a "Mail::SpamAssassin::PerMsgStatus" object which can be
244 used to test or manipulate the mail message.
245
246 Note that the "Mail::SpamAssassin" object can be re-used for
247 further messages without affecting this check; in OO terminology,
248 the "Mail::SpamAssassin" object is a "factory". However, if you
249 do this, be sure to call the "finish()" method on the status
250 objects when you're done with them.
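The factory pattern described above might look like this in practice. It is a sketch under assumptions: scan_message() is a hypothetical helper, and get_score() and get_names_of_tests_hit() are taken from the Mail::SpamAssassin::PerMsgStatus interface rather than from this page, so check that manual page for your version.

```perl
use strict;
use warnings;

# Hypothetical helper: scan one raw message and return a summary hash.
# $spamtest is a Mail::SpamAssassin object that is re-used across calls.
sub scan_message {
    my ($spamtest, $raw_message) = @_;

    my $mail   = $spamtest->parse($raw_message);
    my $status = $spamtest->check($mail);

    my %result = (
        is_spam => $status->is_spam() ? 1 : 0,
        score   => $status->get_score(),
        tests   => scalar $status->get_names_of_tests_hit(),
    );

    # The factory stays alive; the per-message objects must be finished.
    $status->finish();
    $mail->finish();

    return \%result;
}

# Typical use, assuming Mail::SpamAssassin is installed:
#
#   my $spamtest = Mail::SpamAssassin->new();
#   for my $raw (@messages) {
#       my $r = scan_message($spamtest, $raw);
#       print "spam ($r->{score}): $r->{tests}\n" if $r->{is_spam};
#   }
#   $spamtest->finish();
```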
251
252 $status = $f->check_message_text ($mailtext)
253 Check a mail, encapsulated in a plain string $mailtext, to
254 determine if it is spam or not.
255
256 Otherwise identical to "check()" above.
257
258 $status = $f->learn ($mail, $id, $isspam, $forget)
259 Learn from a mail, encapsulated in a "Mail::SpamAssassin::Message"
260 object.
261
262 If $isspam is set, the mail is assumed to be spam, otherwise it
263 will be learnt as non-spam.
264
265 If $forget is set, the attributes of the mail will be removed from
266 both the non-spam and spam learning databases.
267
268 $id is an optional message-identification string, used internally
269 to tag the message. If it is "undef", the Message-Id of the
270 message will be used. It should be unique to that message.
271
272 Returns a "Mail::SpamAssassin::PerMsgLearner" object which can be
273 used to manipulate the learning process for each mail.
274
275 Note that the "Mail::SpamAssassin" object can be re-used for
276 further messages without affecting this check; in OO terminology,
277 the "Mail::SpamAssassin" object is a "factory". However, if you
278 do this, be sure to call the "finish()" method on the learner
279 objects when you're done with them.
280
281 "learn()" and "check()" can be run using the same factory.
282 "init_learner()" must be called before using this method.
283
284 $f->init_learner ( [ { opt => val, ... } ] )
285 Initialise learning. You may pass the following attribute-value
286 pairs to this method.
287
288 caller_will_untie
289 Whether or not the code calling this method will take care of
290 untie'ing from the Bayes databases (by calling
291 "finish_learner()") (optional, default 0).
292
293 force_expire
294 Should an expiration run be forced to occur immediately?
295 (optional, default 0).
296
297 learn_to_journal
298 Should learning data be written to the journal, instead of
299 directly to the databases? (optional, default 0).
300
301 wait_for_lock
302 Whether or not to wait a long time for locks to complete
303 (optional, default 0).
304
305 opportunistic_expire_check_only
306 During the opportunistic journal sync and expire check, don't
307 actually do the expire but report back whether or not it should
308 occur (optional, default 0).
309
310 no_relearn
311 If doing a learn operation, and the message has already been
312 learned as the opposite type, don't re-learn the message.
313
314 $f->rebuild_learner_caches ({ opt => val })
315 Rebuild any cache databases; should be called after the learning
316 process. Options include: "verbose", which will output diagnostics
317 to "stdout" if set to 1.
318
319 $f->finish_learner ()
320 Finish learning.
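The learning methods above are meant to be called in a fixed order: init_learner(), one learn() per message, rebuild_learner_caches(), then finish_learner(). A minimal sketch, assuming $spamtest is a Mail::SpamAssassin object and each element of $mails is a Mail::SpamAssassin::Message object; learn_batch() is a hypothetical helper, and did_learn() is taken from the Mail::SpamAssassin::PerMsgLearner interface rather than from this page.

```perl
use strict;
use warnings;

# Hypothetical helper: learn a batch of parsed messages as spam or ham.
# Returns the number of messages that were actually learned.
sub learn_batch {
    my ($spamtest, $mails, $isspam) = @_;

    # We promise to untie the Bayes databases ourselves via finish_learner().
    $spamtest->init_learner({ caller_will_untie => 1, wait_for_lock => 1 });

    my $learned = 0;
    for my $mail (@$mails) {
        # undef id: the Message-Id is used; final 0: learn, do not forget.
        my $learner = $spamtest->learn($mail, undef, $isspam ? 1 : 0, 0);
        $learned++ if $learner->did_learn();
        $learner->finish();
    }

    $spamtest->rebuild_learner_caches({ verbose => 0 });
    $spamtest->finish_learner();
    return $learned;
}
```

Setting "caller_will_untie" keeps the databases tied across the whole batch, so finish_learner() must run even for an empty batch.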
321
322 $f->dump_bayes_db()
        Dump the contents of the Bayes DB.
324
325 $f->signal_user_changed ( [ { opt => val, ... } ] )
326 Signals that the current user has changed (possibly using
327 "setuid"), meaning that SpamAssassin should close any per-user
328 databases it has open, and re-open using ones appropriate for the
329 new user.
330
331 Note that this should be called after reading any per-user
332 configuration, as that data may override some paths opened in this
333 method. You may pass the following attribute-value pairs:
334
335 username
336 The username of the user. This will be used for the "username"
337 attribute.
338
339 user_dir
340 A directory to use as a 'home directory' for the current user's
341 data, overriding the system default. This directory must be
342 readable and writable by the process. Note that the resulting
343 "userstate_dir" will be the ".spamassassin" subdirectory of
344 this dir.
345
346 userstate_dir
347 A directory to use as a directory for the current user's data,
348 overriding the system default. This directory must be readable
349 and writable by the process. The default is
350 "user_dir/.spamassassin".
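The ordering note above (read the per-user configuration first, then signal the change) can be sketched like this. switch_user() is a hypothetical helper; $spamtest is assumed to be a long-lived Mail::SpamAssassin object, and the preferences file path is supplied by the caller.

```perl
use strict;
use warnings;

# Hypothetical helper: switch a persistent scanner to another user.
sub switch_user {
    my ($spamtest, $username, $user_dir, $prefs_file) = @_;

    # Read the per-user configuration first, if there is any ...
    $spamtest->read_scoreonly_config($prefs_file)
        if defined $prefs_file && -f $prefs_file;

    # ... then signal the change so per-user databases are re-opened.
    $spamtest->signal_user_changed({
        username => $username,
        user_dir => $user_dir,   # userstate_dir becomes "$user_dir/.spamassassin"
    });
    return;
}

# Typical use in a daemon:
#
#   switch_user($spamtest, 'alice', '/home/alice',
#               '/home/alice/.spamassassin/user_prefs');
```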
351
352 $f->report_as_spam ($mail, $options)
353 Report a mail, encapsulated in a "Mail::SpamAssassin::Message"
354 object, as human-verified spam. This will submit the mail message
355 to live, collaborative, spam-blocker databases, allowing other
356 users to block this message.
357
358 It will also submit the mail to SpamAssassin's Bayesian learner.
359
360 Options is an optional reference to a hash of options. Currently
361 these can be:
362
363 dont_report_to_dcc
364 Inhibits reporting of the spam to DCC.
365
366 dont_report_to_pyzor
367 Inhibits reporting of the spam to Pyzor.
368
369 dont_report_to_razor
370 Inhibits reporting of the spam to Razor.
371
372 dont_report_to_spamcop
373 Inhibits reporting of the spam to SpamCop.
374
375 $f->revoke_as_spam ($mail, $options)
        Revoke a mail, encapsulated in a "Mail::SpamAssassin::Message"
        object, as human-verified ham (non-spam). This will revoke the
        mail message from live, collaborative, spam-blocker databases, so
        that other users no longer block this message.
380
381 It will also submit the mail to SpamAssassin's Bayesian learner as
382 nonspam.
383
384 Options is an optional reference to a hash of options. Currently
385 these can be:
386
        dont_report_to_razor
            Inhibits revoking of the message from Razor.
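A short sketch of how the reporting options above are passed. report_without() is a hypothetical helper; $spamtest and $mail are assumed to be an existing Mail::SpamAssassin object and a parsed message. It turns a list of service names into the documented "dont_report_to_*" keys.

```perl
use strict;
use warnings;

# Hypothetical helper: report a message as spam, skipping some services.
# @skip may contain any of: dcc, pyzor, razor, spamcop.
sub report_without {
    my ($spamtest, $mail, @skip) = @_;

    # Turn ('dcc', 'pyzor') into { dont_report_to_dcc => 1, ... }.
    my %options = map { ("dont_report_to_$_" => 1) } @skip;

    return $spamtest->report_as_spam($mail, \%options);
}

# Typical use:
#
#   report_without($spamtest, $mail, 'dcc', 'spamcop');
#
# Revoking takes the same kind of hash reference:
#
#   $spamtest->revoke_as_spam($mail, { dont_report_to_razor => 1 });
```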
389
390 $f->add_address_to_whitelist ($addr, $cli_p)
391 Given a string containing an email address, add it to the automatic
392 whitelist database.
393
394 If $cli_p is set then underlying plugin may give visual feedback on
395 additions/failures.
396
397 $f->add_all_addresses_to_whitelist ($mail, $cli_p)
        Given a mail message, find as many addresses as possible in the
        usual headers (To, Cc, From etc.) and the message body, and add
        them to the automatic whitelist database.
401
402 If $cli_p is set then underlying plugin may give visual feedback on
403 additions/failures.
404
405 $f->remove_address_from_whitelist ($addr, $cli_p)
406 Given a string containing an email address, remove it from the
407 automatic whitelist database.
408
409 If $cli_p is set then underlying plugin may give visual feedback on
410 additions/failures.
411
412 $f->remove_all_addresses_from_whitelist ($mail, $cli_p)
        Given a mail message, find as many addresses as possible in the
        usual headers (To, Cc, From etc.) and the message body, and remove
        them from the automatic whitelist database.
416
417 If $cli_p is set then underlying plugin may give visual feedback on
418 additions/failures.
419
420 $f->add_address_to_blacklist ($addr, $cli_p)
        Given a string containing an email address, add it to the automatic
        whitelist database with a high score, effectively blacklisting
        it.
424
425 If $cli_p is set then underlying plugin may give visual feedback on
426 additions/failures.
427
428 $f->add_all_addresses_to_blacklist ($mail, $cli_p)
429 Given a mail message, find addresses in the From headers and add
430 them to the automatic whitelist database with a high score,
431 effectively blacklisting them.
432
433 Note that To and Cc addresses are not used.
434
435 If $cli_p is set then underlying plugin may give visual feedback on
436 additions/failures.
437
438 $text = $f->remove_spamassassin_markup ($mail)
439 Returns the text of the message, with any SpamAssassin-added text
440 (such as the report, or X-Spam-Status headers) stripped.
441
442 Note that the $mail object is not modified.
443
444 Warning: if the input message in $mail contains a mixture of CR-LF
445 (Windows-style) and LF (UNIX-style) line endings, it will be
446 "canonicalized" to use one or the other consistently throughout.
447
448 $f->read_scoreonly_config ($filename)
449 Read a configuration file and parse user preferences from it.
450
451 User preferences are as defined in the "Mail::SpamAssassin::Conf"
452 manual page. In other words, they include scoring options, scores,
453 whitelists and blacklists, and so on, but do not include rule
454 definitions, privileged settings, etc. unless "allow_user_rules" is
455 enabled; and they never include the administrator settings.
456
457 $f->load_scoreonly_sql ($username)
        Read configuration parameters from an SQL database and parse scores
        from it. This will only take effect if the perl "DBI" module is
        installed, and the configuration parameters "user_scores_dsn",
        "user_scores_sql_username", and "user_scores_sql_password" are set
        correctly.
463
464 The username in $username will also be used for the "username"
465 attribute of the Mail::SpamAssassin object.
466
467 $f->load_scoreonly_ldap ($username)
        Read configuration parameters from an LDAP server and parse scores
        from it. This will only take effect if the perl "Net::LDAP" and
        "URI" modules are installed, and the configuration parameters
        "user_scores_dsn", "user_scores_ldap_username", and
        "user_scores_ldap_password" are set correctly.
473
474 The username in $username will also be used for the "username"
475 attribute of the Mail::SpamAssassin object.
476
477 $f->set_persistent_address_list_factory ($factoryobj)
478 Set the persistent address list factory, used to create objects for
479 the automatic whitelist algorithm's persistent-storage back-end.
480 See "Mail::SpamAssassin::PersistentAddrList" for the API these
481 factory objects must implement, and the API the objects they
482 produce must implement.
483
484 $f->compile_now ($use_user_prefs, $keep_userstate)
485 Compile all patterns, load all configuration files, and load all
486 possibly-required Perl modules.
487
488 Normally, Mail::SpamAssassin uses lazy evaluation where possible,
489 but if you plan to fork() or start a new perl interpreter thread to
490 process a message, this is suboptimal, as each process/thread will
491 have to perform these actions.
492
493 Call this function in the master thread or process to perform the
494 actions straightaway, so that the sub-processes will not have to.
495
496 If $use_user_prefs is 0, this will initialise the SpamAssassin
497 configuration without reading the per-user configuration file and
498 it will assume that you will call "read_scoreonly_config" at a
499 later point.
500
        If $keep_userstate is true, compile_now() will revert any
        configuration options which have a default with __userstate__ in
        it post-init(), and then restore the changed option before
        returning. This lets you change $ENV{'HOME'} to a temp directory
        and have compile_now() create any files there as necessary,
        without disturbing the actual files as changed by a configuration
        option. By default, this is disabled.
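The pre-forking pattern that compile_now() is designed for can be sketched as follows. prefork_workers() is a hypothetical helper, $spamtest is assumed to be a Mail::SpamAssassin object, and $worker is a caller-supplied code reference that does the per-child work. The sketch assumes a platform with a working fork().

```perl
use strict;
use warnings;

# Hypothetical helper: compile once in the parent, then fork $count
# children that inherit the compiled state. Returns the number of
# children reaped.
sub prefork_workers {
    my ($spamtest, $count, $worker) = @_;

    # 0: do not read the per-user configuration file in the parent.
    $spamtest->compile_now(0);

    my @pids;
    for (1 .. $count) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            $worker->($spamtest);   # child: process messages
            exit 0;
        }
        push @pids, $pid;           # parent: remember the child
    }

    # Wait for every child we started.
    my $reaped = 0;
    for my $pid (@pids) {
        $reaped++ if waitpid($pid, 0) == $pid;
    }
    return $reaped;
}
```

Each child would normally call read_scoreonly_config() for its user before scanning, since the parent deliberately skipped the per-user file.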
508
509 $f->debug_diagnostics ()
510 Output some diagnostic information, useful for debugging
511 SpamAssassin problems.
512
513 $failed = $f->lint_rules ()
514 Syntax-check the current set of rules. Returns the number of
515 syntax errors discovered, or 0 if the configuration is valid.
516
517 $f->finish()
518 Destroy this object, so that it will be garbage-collected once it
519 goes out of scope. The object will no longer be usable after this
520 method is called.
521
522 $fullpath = $f->find_rule_support_file ($filename)
523 Find a rule-support file, such as "languages" or "triplets.txt", in
524 the system-wide rules directory, and return its full path if it
525 exists, or undef if it doesn't exist.
526
527 (This API was added in SpamAssassin 3.1.1.)
528
529 $f->create_default_prefs ($filename, $username [ , $userdir ] )
530 Copy default preferences file into home directory for later use and
531 modification, if it does not already exist and "dont_copy_prefs" is
532 not set.
533
534 $f->copy_config ( [ $source ], [ $dest ] )
        Used by daemons to keep a persistent Mail::SpamAssassin object's
        configuration correct when switching between users. Pass an
        associative array reference as either $source or $dest, and set
        the other to 'undef' so that the object will use its current
        configuration. For example:
540
          # create object w/ configuration
          my $spamtest = Mail::SpamAssassin->new( ... );

          # backup configuration to %conf_backup
          my %conf_backup;
          $spamtest->copy_config(undef, \%conf_backup) ||
            die "config: error returned from copy_config!\n";

          # ... do stuff, perhaps modify the config, etc ...

          # reset the configuration back to the original
          $spamtest->copy_config(\%conf_backup, undef) ||
            die "config: error returned from copy_config!\n";
554
555 Note that the contents of the associative arrays should be
556 considered opaque by calling code.
557
558 @plugins = $f->get_loaded_plugins_list ( )
559 Return the list of plugins currently loaded by this SpamAssassin
560 object's configuration; each entry in the list is an object of type
561 "Mail::SpamAssassin::Plugin".
562
563 (This API was added in SpamAssassin 3.2.0.)
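Since each entry returned by get_loaded_plugins_list() is an object, its class name can be recovered with ref(). A minimal sketch, assuming $spamtest is a Mail::SpamAssassin object whose configuration has already been read; loaded_plugin_names() is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: return the sorted class names of all plugins
# loaded by $spamtest.
sub loaded_plugin_names {
    my ($spamtest) = @_;
    my @names = sort map { ref $_ } $spamtest->get_loaded_plugins_list();
    return @names;
}

# Typical use:
#
#   print "$_\n" for loaded_plugin_names($spamtest);
```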
564
PREREQUISITES
    "HTML::Parser" "Sys::Syslog"

MORE DOCUMENTATION
    See also <http://spamassassin.apache.org/> and
    <http://wiki.apache.org/spamassassin/> for more information.

SEE ALSO
    Mail::SpamAssassin::Conf(3) Mail::SpamAssassin::PerMsgStatus(3)
    spamassassin(1) sa-update(1)

BUGS
    See <http://issues.apache.org/SpamAssassin/>

AUTHORS
    The SpamAssassin(tm) Project <http://spamassassin.apache.org/>

COPYRIGHT
    SpamAssassin is distributed under the Apache License, Version 2.0, as
    described in the file "LICENSE" included with the distribution.

AVAILABILITY
    The latest version of this library is likely to be available from CPAN
    as well as:
589
      <http://spamassassin.apache.org/>

perl v5.12.4                      2011-06-06             Mail::SpamAssassin(3)