Mail::SpamAssassin(3pm)

Mail::SpamAssassin(3)    User Contributed Perl Documentation    Mail::SpamAssassin(3)

NAME

       Mail::SpamAssassin - Spam detector and markup engine

SYNOPSIS

         my $spamtest = Mail::SpamAssassin->new();
         my $mail = $spamtest->parse($message);
         my $status = $spamtest->check($mail);

         if ($status->is_spam()) {
           $message = $status->rewrite_mail();
         }
         else {
           ...
         }
         ...

         $status->finish();
         $mail->finish();
         $spamtest->finish();

DESCRIPTION

       Mail::SpamAssassin is a module that identifies spam using several
       methods, including text analysis, internet-based realtime blacklists,
       statistical analysis, and internet-based hashing algorithms.

       Using its rule base, it applies a wide range of heuristic tests to mail
       headers and body text to identify "spam", also known as unsolicited
       bulk email.  Once identified as spam, the mail can be tagged as spam
       for later filtering, either by the user's own mail user agent or at
       the mail transfer agent.

       If you wish to use a command-line filter tool, try the "spamassassin"
       or the "spamd"/"spamc" tools provided.

METHODS

       $t = Mail::SpamAssassin->new( { opt => val, ... } )
           Constructs a new "Mail::SpamAssassin" object.  You may pass a hash
           reference to the constructor, which may contain the following
           attribute-value pairs.

           debug
               The debug option, used to determine the logging level.  It
               exists to allow sections of debug messages (called
               "facilities") to be enabled or disabled.  If this is a string,
               it is treated as a comma-delimited list of the debug
               facilities.  If it is a hash reference, the keys are treated
               as the list of debug facilities; if it is an array reference,
               the elements are treated as the list of debug facilities.

               There are also two special cases: (1) if "info" is passed as a
               debug facility, then all informational messages are enabled;
               (2) if "all" is passed as a debug facility, then all debugging
               facilities are enabled.

           rules_filename
               The filename/directory to load spam-identifying rules from.
               (optional)

           site_rules_filename
               The filename/directory to load site-specific spam-identifying
               rules from.  (optional)

           userprefs_filename
               The filename to load preferences from. (optional)

           userstate_dir
               The directory user state is stored in. (optional)

           config_tree_recurse
               Set to 1 to recurse through directories when reading
               configuration files, instead of just reading a single level.
               (optional, default 0)

           config_text
               The text of all rules and preferences.  If you prefer not to
               load the rules from files, read them in yourself and set this
               instead.  As a result, this will override the settings for
               "rules_filename", "site_rules_filename", and
               "userprefs_filename".

           pre_config_text
               Similar to "config_text", this text is placed before
               config_text to allow an override of config files.

           post_config_text
               Similar to "config_text", this text is placed after config_text
               to allow an override of config files.

           force_ipv4
               If set to 1, DNS tests will not attempt to use IPv6. Use if the
               existing tests for IPv6 availability produce incorrect results
               or crashes.

           require_rules
               If set to 1, init() will die if no valid rules could be loaded.
               This is the default behaviour when called by "spamassassin" or
               "spamd".

           languages_filename
               If you want to be able to use the language-guessing rule
               "UNWANTED_LANGUAGE_BODY", and are using "config_text" instead
               of "rules_filename", "site_rules_filename", and
               "userprefs_filename", you will need to set this.  It should be
               the path to the languages file normally found in the
               SpamAssassin rules directory.

           local_tests_only
               If set to 1, no tests that require internet access will be
               performed. (default: 0)

           need_tags
               This option provides a way to avoid more expensive processing
               when it is known in advance that some information will not be
               needed by a caller.

               The value of the option can be either a string (a
               comma-delimited list of tag names) or a reference to a list of
               individual tag names. A caller may provide the list in advance,
               specifying its intention to later collect the information
               through $pms->get_tag() calls. If the name of a tag starts with
               'NO' (case insensitive), it shows that the caller will not be
               interested in that tag, although there is no guarantee that
               this will save any resources, nor that the tag value will be
               empty.  Currently no built-in tags start with 'NO'. A later
               entry overrides an earlier one, e.g. ASN,NOASN,ASN,TIMING,NOASN
               is equivalent to TIMING,NOASN.

               For backwards compatibility, all tags available as of version
               3.2.4 will be available by default (unless disabled by NOtag),
               even if not requested through the need_tags option. Future
               versions may make new tags conditionally available.

               Currently the only tag that needs to be explicitly requested is
               'TIMING'.  Not requesting it can save a millisecond or two - it
               mostly serves to illustrate the usage of need_tags.

               Example:
                 need_tags => 'TIMING,noLANGUAGES,RELAYCOUNTRY,ASN,noASNCIDR',
               or:
                 need_tags => [qw(TIMING noLANGUAGES RELAYCOUNTRY ASN noASNCIDR)],

           ignore_site_cf_files
               If set to 1, any rule files found in the "site_rules_filename"
               directory will be ignored.  *.pre files (used for loading
               plugins) found in the "site_rules_filename" directory will
               still be used. (default: 0)

           dont_copy_prefs
               If set to 1, the user preferences file will not be created if
               it doesn't already exist. (default: 0)

           save_pattern_hits
               If set to 1, the patterns hit can be retrieved from the
               "Mail::SpamAssassin::PerMsgStatus" object.  Used for debugging.

           home_dir_for_helpers
               If set, the HOME environment variable will be set to this value
               when using test applications that require their configuration
               data, such as Razor, Pyzor and DCC.

           username
               If set, the "username" attribute will use this as the current
               user's name.  Otherwise, the default is taken from the runtime
               environment (i.e. this process's effective UID under UNIX).
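
           The override rule described for "need_tags" can be illustrated
           with a small stand-alone sketch. The helper "resolve_need_tags"
           below is hypothetical and is not part of Mail::SpamAssassin; it
           only mimics the documented behaviour (string or list input, a
           case-insensitive 'NO' prefix, a later entry overriding an earlier
           one), and the module's real implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Mail::SpamAssassin: reduce a need_tags
# value to the final state of each tag (1 = requested, 0 = declined).
sub resolve_need_tags {
    my ($need_tags) = @_;

    # Accept a comma-delimited string or a reference to a list of names.
    my @entries = ref($need_tags) eq 'ARRAY'
        ? @$need_tags
        : split /\s*,\s*/, $need_tags;

    my %state;
    for my $entry (@entries) {
        next unless length $entry;
        if ($entry =~ /^NO(.+)\z/i) {    # 'NO' prefix, case insensitive
            $state{$1} = 0;
        }
        else {
            $state{$entry} = 1;
        }
        # Assigning to the same key again is what lets a later entry
        # override an earlier one.
    }
    return \%state;
}

my $state = resolve_need_tags('ASN,NOASN,ASN,TIMING,NOASN');
print "$_ => $state->{$_}\n" for sort keys %$state;
```

           For the input shown, the sketch prints "ASN => 0" and
           "TIMING => 1", which matches the equivalence with TIMING,NOASN
           stated above.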

           If none of "rules_filename", "site_rules_filename",
           "userprefs_filename", or "config_text" is set, the
           "Mail::SpamAssassin" module will search for the configuration files
           in the usual installed locations, using the variable definitions
           below, which can be passed in.

           PREFIX
               Used as the root for certain directory paths such as:

                 '__prefix__/etc/mail/spamassassin'
                 '__prefix__/etc/spamassassin'

               Defaults to "@@PREFIX@@".

           DEF_RULES_DIR
               Location where the default rules are installed.  Defaults to
               "@@DEF_RULES_DIR@@".

           LOCAL_RULES_DIR
               Location where the local site rules are installed.  Defaults to
               "@@LOCAL_RULES_DIR@@".

           LOCAL_STATE_DIR
               Location of the local state directory, mainly used for
               installing updates via "sa-update" and compiling rulesets to
               native code.  Defaults to "@@LOCAL_STATE_DIR@@".
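
           As a minimal sketch of passing constructor options, the following
           builds an object from a hash reference. The option values are
           illustrative choices rather than recommendations, and the example
           assumes that Mail::SpamAssassin is installed with its
           configuration in the usual locations.

```perl
use strict;
use warnings;
use Mail::SpamAssassin;

# Option values below are illustrative only.
my $spamtest = Mail::SpamAssassin->new({
    # String form of "debug"; ['info'] and { info => 1 } are the
    # array-reference and hash-reference forms of the same setting.
    debug            => 'info',
    local_tests_only => 1,               # skip tests that need internet access
    dont_copy_prefs  => 1,               # do not create a user preferences file
    need_tags        => [qw(TIMING)],    # request the TIMING tag explicitly
});

# ... parse and check messages here ...

$spamtest->finish();
```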

       parse($message, $parse_now [, $suppl_attrib])
           Returns a Mail::SpamAssassin::Message object with just the headers
           parsed.  Two optional parameters can be passed in: $message, which
           is either undef (in which case STDIN is used), a scalar holding the
           entire message, an array reference holding the message with one
           line per array element, or a file glob which holds the entire
           contents of the message; and $parse_now, which specifies whether
           to create the MIME tree at parse time or later as necessary.

           The $parse_now option is set to false (0) by default.  This
           spares SpamAssassin from generating the tree of internal data
           nodes if the information is not going to be used.  This is handy,
           for instance, when running "spamassassin -d", which only needs the
           pristine header and body, which are always parsed and stored by
           this function.

           The optional last argument $suppl_attrib provides a way for a
           caller to pass additional information about a message to
           SpamAssassin. It is either undef, or a ref to a hash where each
           key/value pair provides some supplementary attribute of the
           message, typically information that cannot be deduced from the
           message itself, is hard to deduce reliably, or would represent
           unnecessary work for SpamAssassin to obtain. The argument will be
           stored in the Mail::SpamAssassin::Message object as
           'suppl_attrib', and thus made available to the rest of the code as
           well as to plugins. The exact list of attributes will evolve over
           time; any unknown attribute should be ignored. Possible examples
           are: SMTP envelope information, a flag indicating that a message as
           supplied by a caller was truncated due to a size limit, an already
           verified list of DKIM signature objects, or perhaps a list of rule
           hits predetermined by a caller, which gives a caller another way
           to provide meta information (instead of having to insert made-up
           header fields in order to pass information), or maybe just plain
           rule hits.

           For more information, please see the "Mail::SpamAssassin::Message"
           and "Mail::SpamAssassin::Message::Node" POD.
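
           The input forms accepted by "parse()" might be exercised as in the
           sketch below. The message text is made up for illustration, and
           $suppl_attrib is omitted because this page does not name any
           specific attribute keys.

```perl
use strict;
use warnings;
use Mail::SpamAssassin;

my $spamtest = Mail::SpamAssassin->new({
    local_tests_only => 1,
    dont_copy_prefs  => 1,
});

# A made-up message used only for illustration.
my $raw = join '',
    "From: sender\@example.com\n",
    "To: recipient\@example.com\n",
    "Subject: parse() demonstration\n",
    "\n",
    "Hello, world.\n";

# Scalar input; $parse_now is left at its default (0), so the MIME tree
# is only built if something later needs it.
my $lazy = $spamtest->parse($raw);

# Array-reference input, one line per element, with the MIME tree built
# at parse time ($parse_now = 1).
my @lines = split /^/, $raw;
my $eager = $spamtest->parse(\@lines, 1);

$_->finish() for $lazy, $eager;
$spamtest->finish();
```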

       $status = $f->check ($mail)
           Check a mail, encapsulated in a "Mail::SpamAssassin::Message"
           object, to determine if it is spam or not.

           Returns a "Mail::SpamAssassin::PerMsgStatus" object which can be
           used to test or manipulate the mail message.

           Note that the "Mail::SpamAssassin" object can be re-used for
           further messages without affecting this check; in OO terminology,
           the "Mail::SpamAssassin" object is a "factory".   However, if you
           do this, be sure to call the "finish()" method on the status
           objects when you're done with them.

       $status = $f->check_message_text ($mailtext)
           Check a mail, encapsulated in a plain string $mailtext, to
           determine if it is spam or not.

           Otherwise identical to "check()" above.
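
           Re-using one "Mail::SpamAssassin" object as a factory for several
           messages could look like the following sketch. The messages are
           made up, and the verdict printed for each depends entirely on the
           locally installed rules.

```perl
use strict;
use warnings;
use Mail::SpamAssassin;

my $spamtest = Mail::SpamAssassin->new({
    local_tests_only => 1,
    dont_copy_prefs  => 1,
});

# Made-up messages used only for illustration.
my @messages = map {
    "From: sender$_\@example.com\nSubject: message $_\n\nBody of message $_.\n"
} 1 .. 3;

for my $text (@messages) {
    my $mail   = $spamtest->parse($text);
    my $status = $spamtest->check($mail);

    print $status->is_spam() ? "spam\n" : "not spam\n";

    # Release the per-message objects before moving on to the next message.
    $status->finish();
    $mail->finish();
}

$spamtest->finish();
```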

       $status = $f->learn ($mail, $id, $isspam, $forget)
           Learn from a mail, encapsulated in a "Mail::SpamAssassin::Message"
           object.

           If $isspam is set, the mail is assumed to be spam, otherwise it
           will be learnt as non-spam.

           If $forget is set, the attributes of the mail will be removed from
           both the non-spam and spam learning databases.

           $id is an optional message-identification string, used internally
           to tag the message.  If it is "undef", the Message-Id of the
           message will be used.  It should be unique to that message.

           Returns a "Mail::SpamAssassin::PerMsgLearner" object which can be
           used to manipulate the learning process for each mail.

           Note that the "Mail::SpamAssassin" object can be re-used for
           further messages without affecting this check; in OO terminology,
           the "Mail::SpamAssassin" object is a "factory".   However, if you
           do this, be sure to call the "finish()" method on the learner
           objects when you're done with them.

           "learn()" and "check()" can be run using the same factory.
           "init_learner()" must be called before using this method.

       $f->init_learner ( [ { opt => val, ... } ] )
           Initialise learning.  You may pass the following attribute-value
           pairs to this method.

           caller_will_untie
               Whether or not the code calling this method will take care of
               untie'ing from the Bayes databases (by calling
               "finish_learner()") (optional, default 0).

           force_expire
               Should an expiration run be forced to occur immediately?
               (optional, default 0).

           learn_to_journal
               Should learning data be written to the journal, instead of
               directly to the databases? (optional, default 0).

           wait_for_lock
               Whether or not to wait a long time for locks to complete
               (optional, default 0).

           opportunistic_expire_check_only
               During the opportunistic journal sync and expire check, don't
               actually do the expire but report back whether or not it should
               occur (optional, default 0).

           no_relearn
               If doing a learn operation, and the message has already been
               learned as the opposite type, don't re-learn the message.

       $f->rebuild_learner_caches ({ opt => val })
           Rebuild any cache databases; should be called after the learning
           process.  Options include: "verbose", which will output diagnostics
           to "stdout" if set to 1.

       $f->finish_learner ()
           Finish learning.
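
           Taken together, the learning methods above might be called in the
           order sketched below. This is an assumption-laden illustration: the
           sample message is made up, the option choices are arbitrary, and
           whether anything is actually stored depends on the local Bayes
           configuration. Note that running it may write to the current
           user's learning databases.

```perl
use strict;
use warnings;
use Mail::SpamAssassin;

my $spamtest = Mail::SpamAssassin->new({ dont_copy_prefs => 1 });

# The caller takes responsibility for untie'ing via finish_learner().
$spamtest->init_learner({ caller_will_untie => 1, wait_for_lock => 1 });

# Made-up spam sample used only for illustration.
my $text = "From: spammer\@example.com\nSubject: demo\n\nBuy now!\n";
my $mail = $spamtest->parse($text);

# $id = undef (use the Message-Id), $isspam = 1, $forget = 0.
my $learner = $spamtest->learn($mail, undef, 1, 0);
$learner->finish();
$mail->finish();

# Rebuild caches after learning, then release the databases.
$spamtest->rebuild_learner_caches({ verbose => 1 });
$spamtest->finish_learner();
$spamtest->finish();
```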

       $f->dump_bayes_db()
           Dump the contents of the Bayes DB.

       $f->signal_user_changed ( [ { opt => val, ... } ] )
           Signals that the current user has changed (possibly using
           "setuid"), meaning that SpamAssassin should close any per-user
           databases it has open, and re-open using ones appropriate for the
           new user.

           Note that this should be called after reading any per-user
           configuration, as that data may override some paths opened in this
           method.  You may pass the following attribute-value pairs:

           username
               The username of the user.  This will be used for the "username"
               attribute.

           user_dir
               A directory to use as a 'home directory' for the current user's
               data, overriding the system default.  This directory must be
               readable and writable by the process.  Note that the resulting
               "userstate_dir" will be the ".spamassassin" subdirectory of
               this dir.

           userstate_dir
               A directory to use as a directory for the current user's data,
               overriding the system default.  This directory must be readable
               and writable by the process.  The default is
               "user_dir/.spamassassin".
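
           A daemon switching to another user might follow the order
           described above: read the per-user configuration first, then
           signal the change. In this sketch the user name 'alice' is
           hypothetical, a temporary directory stands in for the user's home
           directory, and the preferences path is an assumed location.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Mail::SpamAssassin;

my $spamtest = Mail::SpamAssassin->new({
    local_tests_only => 1,
    dont_copy_prefs  => 1,
});
$spamtest->compile_now(0);    # initialise without per-user configuration

# Hypothetical user; a temporary directory stands in for the home directory.
my $username = 'alice';
my $user_dir = tempdir(CLEANUP => 1);

# Read the per-user configuration first, if there is any (assumed path).
my $prefs = "$user_dir/.spamassassin/user_prefs";
$spamtest->read_scoreonly_config($prefs) if -f $prefs;

# Then tell SpamAssassin that the current user has changed.
$spamtest->signal_user_changed({
    username => $username,
    user_dir => $user_dir,
});

$spamtest->finish();
```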

       $f->report_as_spam ($mail, $options)
           Report a mail, encapsulated in a "Mail::SpamAssassin::Message"
           object, as human-verified spam.  This will submit the mail message
           to live, collaborative, spam-blocker databases, allowing other
           users to block this message.

           It will also submit the mail to SpamAssassin's Bayesian learner.

           Options is an optional reference to a hash of options.  Currently
           these can be:

           dont_report_to_dcc
               Inhibits reporting of the spam to DCC.

           dont_report_to_pyzor
               Inhibits reporting of the spam to Pyzor.

           dont_report_to_razor
               Inhibits reporting of the spam to Razor.

           dont_report_to_spamcop
               Inhibits reporting of the spam to SpamCop.

       $f->revoke_as_spam ($mail, $options)
           Revoke a mail, encapsulated in a "Mail::SpamAssassin::Message"
           object, as human-verified ham (non-spam).  This will revoke the
           mail message from live, collaborative, spam-blocker databases, so
           that other users no longer block this message.

           It will also submit the mail to SpamAssassin's Bayesian learner as
           nonspam.

           Options is an optional reference to a hash of options.  Currently
           these can be:

           dont_report_to_razor
               Inhibits revoking of the spam from Razor.

       $f->add_address_to_whitelist ($addr, $cli_p)
           Given a string containing an email address, add it to the automatic
           whitelist database.

           If $cli_p is set then the underlying plugin may give visual
           feedback on additions/failures.

       $f->add_all_addresses_to_whitelist ($mail, $cli_p)
           Given a mail message, find as many addresses as possible in the
           usual headers (To, Cc, From etc.) and the message body, and add
           them to the automatic whitelist database.

           If $cli_p is set then the underlying plugin may give visual
           feedback on additions/failures.

       $f->remove_address_from_whitelist ($addr, $cli_p)
           Given a string containing an email address, remove it from the
           automatic whitelist database.

           If $cli_p is set then the underlying plugin may give visual
           feedback on additions/failures.

       $f->remove_all_addresses_from_whitelist ($mail, $cli_p)
           Given a mail message, find as many addresses as possible in the
           usual headers (To, Cc, From etc.) and the message body, and remove
           them from the automatic whitelist database.

           If $cli_p is set then the underlying plugin may give visual
           feedback on additions/failures.

       $f->add_address_to_blacklist ($addr, $cli_p)
           Given a string containing an email address, add it to the automatic
           whitelist database with a high score, effectively blacklisting
           it.

           If $cli_p is set then the underlying plugin may give visual
           feedback on additions/failures.

       $f->add_all_addresses_to_blacklist ($mail, $cli_p)
           Given a mail message, find addresses in the From headers and add
           them to the automatic whitelist database with a high score,
           effectively blacklisting them.

           Note that To and Cc addresses are not used.

           If $cli_p is set then the underlying plugin may give visual
           feedback on additions/failures.

       $text = $f->remove_spamassassin_markup ($mail)
           Returns the text of the message, with any SpamAssassin-added text
           (such as the report, or X-Spam-Status headers) stripped.

           Note that the $mail object is not modified.

           Warning: if the input message in $mail contains a mixture of CR-LF
           (Windows-style) and LF (UNIX-style) line endings, it will be
           "canonicalized" to use one or the other consistently throughout.

       $f->read_scoreonly_config ($filename)
           Read a configuration file and parse user preferences from it.

           User preferences are as defined in the "Mail::SpamAssassin::Conf"
           manual page.  In other words, they include scoring options, scores,
           whitelists and blacklists, and so on, but do not include rule
           definitions, privileged settings, etc. unless "allow_user_rules" is
           enabled; and they never include the administrator settings.

       $f->load_scoreonly_sql ($username)
           Read configuration parameters from an SQL database and parse scores
           from it.  This will only take effect if the perl "DBI" module is
           installed, and the configuration parameters "user_scores_dsn",
           "user_scores_sql_username", and "user_scores_sql_password" are set
           correctly.

           The username in $username will also be used for the "username"
           attribute of the Mail::SpamAssassin object.

       $f->load_scoreonly_ldap ($username)
           Read configuration parameters from an LDAP server and parse scores
           from it.  This will only take effect if the perl "Net::LDAP" and
           "URI" modules are installed, and the configuration parameters
           "user_scores_dsn", "user_scores_ldap_username", and
           "user_scores_ldap_password" are set correctly.

           The username in $username will also be used for the "username"
           attribute of the Mail::SpamAssassin object.

       $f->set_persistent_address_list_factory ($factoryobj)
           Set the persistent address list factory, used to create objects for
           the automatic whitelist algorithm's persistent-storage back-end.
           See "Mail::SpamAssassin::PersistentAddrList" for the API these
           factory objects must implement, and the API the objects they
           produce must implement.

       $f->compile_now ($use_user_prefs, $keep_userstate)
           Compile all patterns, load all configuration files, and load all
           possibly-required Perl modules.

           Normally, Mail::SpamAssassin uses lazy evaluation where possible,
           but if you plan to fork() or start a new perl interpreter thread to
           process a message, this is suboptimal, as each process/thread will
           have to perform these actions.

           Call this function in the master thread or process to perform the
           actions straightaway, so that the sub-processes will not have to.

           If $use_user_prefs is 0, this will initialise the SpamAssassin
           configuration without reading the per-user configuration file and
           it will assume that you will call "read_scoreonly_config" at a
           later point.

           If $keep_userstate is true, compile_now() will revert any
           configuration options which have a default with __userstate__ in it
           post-init(), and then re-change the option before returning.  This
           lets you change $ENV{'HOME'} to a temp directory and have
           compile_now() create any files there as necessary, without
           disturbing the actual files as changed by a configuration option.
           By default, this is disabled.
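
           The pre-fork pattern described above could be sketched as follows:
           the parent calls "compile_now()" once, and each child then scans a
           message without repeating that work. The messages are made up, and
           a real daemon would manage its children differently; this is only
           an illustration.

```perl
use strict;
use warnings;
use Mail::SpamAssassin;

my $spamtest = Mail::SpamAssassin->new({
    local_tests_only => 1,
    dont_copy_prefs  => 1,
});

# Do the expensive work once in the parent; 0 = skip per-user preferences.
$spamtest->compile_now(0);

# Made-up messages used only for illustration.
my @messages = map { "Subject: message $_\n\nBody $_.\n" } 1 .. 2;

for my $text (@messages) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: inherits the already-compiled state from the parent.
        my $mail   = $spamtest->parse($text);
        my $status = $spamtest->check($mail);
        print "child $$: ", ($status->is_spam() ? "spam" : "not spam"), "\n";
        $status->finish();
        $mail->finish();
        exit 0;
    }

    waitpid($pid, 0);    # parent waits for this child before forking the next
}

$spamtest->finish();
```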

       $f->debug_diagnostics ()
           Output some diagnostic information, useful for debugging
           SpamAssassin problems.

       $failed = $f->lint_rules ()
           Syntax-check the current set of rules.  Returns the number of
           syntax errors discovered, or 0 if the configuration is valid.
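
           A lint run over rules supplied through "config_text" might look
           like the sketch below. The rule LOCAL_DEMO_PHRASE is made up for
           illustration, and the result of the check is not asserted here
           because it depends on the installed SpamAssassin version.

```perl
use strict;
use warnings;
use Mail::SpamAssassin;

# A made-up rule, supplied through config_text instead of rule files.
my $config = <<'EOF';
body     LOCAL_DEMO_PHRASE  /demonstration phrase/i
describe LOCAL_DEMO_PHRASE  Body contains the demonstration phrase
score    LOCAL_DEMO_PHRASE  0.1
EOF

my $spamtest = Mail::SpamAssassin->new({
    config_text      => $config,
    local_tests_only => 1,
    dont_copy_prefs  => 1,
});

# lint_rules() returns the number of syntax errors, or 0 if valid.
my $failed = $spamtest->lint_rules();
print $failed
    ? "lint: $failed error(s) found\n"
    : "lint: configuration is valid\n";

$spamtest->finish();
```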

       $f->finish()
           Destroy this object, so that it will be garbage-collected once it
           goes out of scope.  The object will no longer be usable after this
           method is called.

       $fullpath = $f->find_rule_support_file ($filename)
           Find a rule-support file, such as "languages" or "triplets.txt", in
           the system-wide rules directory, and return its full path if it
           exists, or undef if it doesn't exist.

           (This API was added in SpamAssassin 3.1.1.)

       $f->create_default_prefs ($filename, $username [ , $userdir ] )
           Copy default preferences file into home directory for later use and
           modification, if it does not already exist and "dont_copy_prefs" is
           not set.

       $f->copy_config ( [ $source ], [ $dest ] )
           Used for daemons to keep a persistent Mail::SpamAssassin object's
           configuration correct if switching between users.  Pass an
           associative array reference as either $source or $dest, and set the
           other to 'undef' so that the object will use its current
           configuration.  For example:

             # create object w/ configuration
             my $spamtest = Mail::SpamAssassin->new( ... );

             # backup configuration to %conf_backup
             my %conf_backup;
             $spamtest->copy_config(undef, \%conf_backup) ||
               die "config: error returned from copy_config!\n";

             # ... do stuff, perhaps modify the config, etc ...

             # reset the configuration back to the original
             $spamtest->copy_config(\%conf_backup, undef) ||
               die "config: error returned from copy_config!\n";

           Note that the contents of the associative arrays should be
           considered opaque by calling code.

       @plugins = $f->get_loaded_plugins_list ( )
           Return the list of plugins currently loaded by this SpamAssassin
           object's configuration; each entry in the list is an object of type
           "Mail::SpamAssassin::Plugin".

           (This API was added in SpamAssassin 3.2.0.)
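
           Listing the loaded plugins could be done as in this short sketch.
           It assumes a standard installation; which plugins appear depends
           on the local configuration, so no particular output is implied.

```perl
use strict;
use warnings;
use Mail::SpamAssassin;

my $spamtest = Mail::SpamAssassin->new({
    local_tests_only => 1,
    dont_copy_prefs  => 1,
});
$spamtest->compile_now(0);    # loads the configuration, and with it the plugins

# Each entry is a plugin object; ref() reports its concrete class.
my @plugins = $spamtest->get_loaded_plugins_list();
print ref($_), "\n" for @plugins;

$spamtest->finish();
```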

PREREQUISITES

       "HTML::Parser" "Sys::Syslog"

BUGS

       See <http://issues.apache.org/SpamAssassin/>

AUTHORS

       The SpamAssassin(tm) Project <http://spamassassin.apache.org/>

COPYRIGHT

       SpamAssassin is distributed under the Apache License, Version 2.0, as
       described in the file "LICENSE" included with the distribution.

AVAILABILITY

       The latest version of this library is likely to be available from CPAN
       as well as:

         <http://spamassassin.apache.org/>

perl v5.10.1                      2010-03-16             Mail::SpamAssassin(3)
