Mail::SpamAssassin::PerMsgStatus(3pm)

NAME

       Mail::SpamAssassin::PerMsgStatus - per-message status (spam or
       not-spam)

SYNOPSIS

         my $spamtest = Mail::SpamAssassin->new({
           'rules_filename'      => '/etc/spamassassin.rules',
           'userprefs_filename'  => $ENV{HOME}.'/.spamassassin/user_prefs'
         });
         my $mail = $spamtest->parse();

         my $status = $spamtest->check ($mail);

         my $rewritten_mail;
         if ($status->is_spam()) {
           $rewritten_mail = $status->rewrite_mail ();
         }
         ...

DESCRIPTION

       The Mail::SpamAssassin "check()" method returns an object of this
       class.  This object encapsulates all the per-message state.

METHODS

       $status->check ()
           Runs the SpamAssassin rules against the message pointed to by the
           object.

       $status->learn()
           After a mail message has been checked, this method can be called.
           If the score is outside a certain range around the threshold, i.e.
           if the message is judged more-or-less definitely spam or definitely
           non-spam, it will be fed into SpamAssassin's learning systems
           (currently the naive Bayesian classifier), so that future similar
           mails will be caught.

       $score = $status->get_autolearn_points()
           Return the message's score as computed for auto-learning.  Certain
           tests are ignored:

             - rules with tflags set to 'learn' (the Bayesian rules)

             - rules with tflags set to 'userconf' (user white/black-listing rules, etc.)

             - rules with tflags set to 'noautolearn'

           Also note that auto-learning occurs using scores from either
           scoreset 0 or 1, depending on what scoreset is used during message
           check.  It is likely that the message check and auto-learn scores
           will be different.

       $score = $status->get_head_only_points()
           Return the message's score as computed for auto-learning, ignoring
           all rules except for header-based ones.

       $score = $status->get_learned_points()
           Return the message's score as computed for auto-learning, ignoring
           all rules except for learning-based ones.

       $score = $status->get_body_only_points()
           Return the message's score as computed for auto-learning, ignoring
           all rules except for body-based ones.

       $score = $status->get_autolearn_force_status()
           Return whether a message's score included any rules that are
           flagged as autolearn_force.

       $rule_names = $status->get_autolearn_force_names()
           Return a comma-separated list of rule names if a message's score
           included any rules that are flagged as autolearn_force.

       $isspam = $status->is_spam ()
           After a mail message has been checked, this method can be called.
           It will return 1 for mail determined likely to be spam, 0 if it
           does not seem spam-like.

       $list = $status->get_names_of_tests_hit ()
           After a mail message has been checked, this method can be called.
           It will return a comma-separated string, listing all the symbolic
           test names of the tests which were triggered by the mail.

       $list = $status->get_names_of_tests_hit_with_scores_hash ()
           After a mail message has been checked, this method can be called.
           It will return a reference to a hash of rule and score pairs for
           all the symbolic test names and individual scores of the tests
           which were triggered by the mail.

       $list = $status->get_names_of_tests_hit_with_scores ()
           After a mail message has been checked, this method can be called.
           It will return a comma-separated string of rule=score pairs for all
           the symbolic test names and individual scores of the tests which
           were triggered by the mail.
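
           As an illustration of the rule=score format, the following
           sketch splits such a string into a hash.  The helper name
           parse_rule_scores and the sample string are hypothetical; a real
           value would come from get_names_of_tests_hit_with_scores().

```perl
use strict;
use warnings;

# Turn a "RULE=score,RULE=score" string into a hash of rule name => score.
# parse_rule_scores() is a hypothetical helper, not part of SpamAssassin.
sub parse_rule_scores {
    my ($list) = @_;
    my %scores;
    return %scores unless defined $list && length $list;
    for my $pair (split /,/, $list) {
        my ($rule, $score) = split /=/, $pair, 2;
        next unless defined $score;     # skip anything that is not a pair
        $scores{$rule} = $score + 0;    # store the score as a number
    }
    return %scores;
}

# Hypothetical sample value, made up for this example.
my $sample = 'BAYES_99=3.5,HTML_MESSAGE=0.001,MISSING_DATE=1.396';
my %scores = parse_rule_scores($sample);

# List the rules from highest to lowest score.
for my $rule (sort { $scores{$b} <=> $scores{$a} } keys %scores) {
    printf "%-15s %6.3f\n", $rule, $scores{$rule};
}
```

           When the individual scores are needed as data rather than as
           text, get_names_of_tests_hit_with_scores_hash() avoids the
           parsing step altogether.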

       $list = $status->get_names_of_subtests_hit ()
           After a mail message has been checked, this method can be called.
           It will return a comma-separated string, listing all the symbolic
           test names of the meta-rule sub-tests which were triggered by the
           mail.  Sub-tests are the normally-hidden rules, which score 0 and
           have names beginning with two underscores, used in meta rules.

           If a parameter of collapsed or dbg is passed, the output will be a
           condensed array of sub-tests with multiple hits reduced to one
           entry.

           If the parameter of dbg is passed, the output will be a condensed
           string of sub-tests with multiple hits reduced to one entry with
           the number of hits in parentheses. Some information is also added
           at the end regarding the multiple hits.

       $num = $status->get_score ()
           After a mail message has been checked, this method can be called.
           It will return the message's score.

       $num = $status->get_required_score ()
           After a mail message has been checked, this method can be called.
           It will return the score required for a mail to be considered spam.
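
           The three accessors is_spam(), get_score() and
           get_required_score() can be combined into a short verdict line.
           This is only a sketch: verdict_line is a hypothetical helper, and
           $status is assumed to be an object that has already been
           checked.

```perl
use strict;
use warnings;

# Build a one-line verdict from a checked status object.  Only is_spam(),
# get_score() and get_required_score() are called on it.
# verdict_line() is a hypothetical helper, not part of SpamAssassin.
sub verdict_line {
    my ($status) = @_;
    my $score    = $status->get_score();
    my $required = $status->get_required_score();
    return sprintf '%s: score %.1f, required %.1f, margin %+.1f',
        ($status->is_spam() ? 'spam' : 'ham'),
        $score, $required, $score - $required;
}
```

           A caller would typically log verdict_line($status) right after
           the check has finished.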

       $num = $status->get_autolearn_status ()
           After a mail message has been checked, this method can be called.
           It will return one of the following strings depending on whether
           the mail was auto-learned or not: "ham", "no", "spam", "disabled",
           "failed", "unavailable".

           If the message is flagged with autolearn_force, the returned
           string will also include the status and the rules hit.  For
           example: "autolearn_force=yes (AUTOLEARNTEST_BODY)"

       $report = $status->get_report ()
           Deliver a "spam report" on the checked mail message.  This contains
           details of how many spam detection rules it triggered.

           The report is returned as a multi-line string, with the lines
           separated by "\n" characters.

       $preview = $status->get_content_preview ()
           Give a "preview" of the content.

           This is returned as a multi-line string, with the lines separated
           by "\n" characters, containing a fully-decoded, safe, plain-text
           sample of the first few lines of the message body.

       $msg = $status->get_message()
           Return the object representing the message being scanned.

       $status->rewrite_mail ()
           Rewrite the mail message.  This will at minimum add headers, and at
           maximum MIME-encapsulate the message text, to reflect its spam or
           not-spam status.  The function will return a scalar of the
           rewritten message.

           The actual modifications depend on the configuration (see
           "Mail::SpamAssassin::Conf" for more information).

           The possible modifications are as follows:

           To:, From: and Subject: modification on spam mails
               Depending on the configuration, the To: and From: lines can
               have a user-defined RFC 2822 comment appended for spam mail.
               The subject line may have a user-defined string prepended to it
               for spam mail.

           X-Spam-* headers for all mails
               Depending on the configuration, zero or more headers with names
               beginning with "X-Spam-" will be added to mail depending on
               whether it is spam or ham.

           spam message with report_safe
               If report_safe is set to true (1), then spam messages are
               encapsulated into their own message/rfc822 MIME attachment
               without any modifications being made.

               If report_safe is set to false (0), then the message will only
               have the above headers added/modified.

       $status->action_depends_on_tags($tags, $code, @args)
           Enqueue the supplied subroutine reference $code, to become runnable
           when all the specified tags become available. The $tags may be a
           simple scalar - a tag name, or a listref of tag names. The
           subroutine &$code when called will be passed a "permessagestatus"
           object as its first argument, followed by the supplied (optional)
           list @args.

       $status->set_tag($tagname, $value)
           Set a template tag, as used in "add_header", report templates, etc.
           This API is intended for use by plugins.  Tag names will be
           converted to an all-uppercase representation internally.

           $value can be a simple scalar (string or number), or a reference to
           an array, in which case the public method get_tag will join array
           elements using a space as a separator, returning a single string
           for backward compatibility.

           $value can also be a subroutine reference, which will be evaluated
           each time the template is expanded. The first argument passed by
           get_tag to a called subroutine will be a PerMsgStatus object (this
           module's object), followed by optional arguments provided by a
           caller to get_tag.

           Note that perl supports closures, which means that variables set in
           the caller's scope can be accessed inside this "sub". For example:

               my $text = "hello world!";
               $status->set_tag("FOO", sub {
                         my $pms = shift;
                         return $text;
                       });

           See "Mail::SpamAssassin::Conf"'s "TEMPLATE TAGS" section for more
           details on how template tags are used.

           "undef" will be returned if a tag by that name has not been
           defined.

       $string = $status->get_tag($tagname)
           Get the current value of a template tag, as used in "add_header",
           report templates, etc. This API is intended for use by plugins.
           Tag names will be converted to an all-uppercase representation
           internally.  See "Mail::SpamAssassin::Conf"'s "TEMPLATE TAGS"
           section for more details on tags.

           "undef" will be returned if a tag by that name has not been
           defined.

       $string = $status->get_tag_raw($tagname, @args)
           Similar to "get_tag", but keeps a tag name unchanged (does not
           uppercase it), and does not convert arrayref tag values into a
           single string.

       $status->set_spamd_result_item($subref)
           Set an entry for the spamd result log line.  $subref should be a
           code reference for a subroutine which will return a string in
           'name=VALUE' format, similar to the other entries in the spamd
           result line:

             Jul 17 14:10:47 radish spamd[16670]: spamd: result: Y 22 - ALL_NATURAL,
             DATE_IN_FUTURE_03_06,DIET_1,DRUGS_ERECTILE,DRUGS_PAIN,
             TEST_FORGED_YAHOO_RCVD,TEST_INVALID_DATE,TEST_NOREALNAME,
             TEST_NORMAL_HTTP_TO_IP,UNDISC_RECIPS scantime=0.4,size=3138,user=jm,
             uid=1000,required_score=5.0,rhost=localhost,raddr=127.0.0.1,
             rport=33153,mid=<9PS291LhupY>,autolearn=spam

           "name" and "VALUE" must not contain "=" or "," characters, as it is
           important that these log lines are easy to parse.

           The code reference will be called by spamd after the message has
           been scanned, and the "PerMsgStatus::check()" method has returned.
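
           One way to respect the "=" and "," restriction is to build the
           code reference through a small wrapper.  The sketch below is an
           assumption about how a plugin might do this: make_result_item,
           the name "listid" and its value are all hypothetical.

```perl
use strict;
use warnings;

# Wrap a name and a value-producing callback into a code reference that
# returns "name=VALUE", guarding against the characters the log format
# forbids.  make_result_item() is a hypothetical helper.
sub make_result_item {
    my ($name, $value_cb) = @_;
    die "invalid name '$name'\n" if $name =~ /[=,]/;
    return sub {
        my $value = $value_cb->();
        $value = '' unless defined $value;
        $value =~ tr/=,/__/;    # replace forbidden characters in the value
        return "$name=$value";
    };
}

# Hypothetical entry, made up for this example.
my $item = make_result_item('listid', sub { 'users.example.org' });
print $item->(), "\n";

# Inside a plugin the code reference would then be registered with:
#   $status->set_spamd_result_item($item);
```

           Replacing forbidden characters with an underscore is a design
           choice of this sketch; dropping the entry instead would serve
           equally well.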

       $status->finish ()
           Indicate that this $status object is finished with, and can be
           destroyed.

           If you are using SpamAssassin in a persistent environment, or
           checking many mail messages from one "Mail::SpamAssassin" factory,
           this method should be called to ensure Perl's garbage collection
           will clean up old status objects.
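
           For the many-messages case, a loop along the following lines
           releases each status object as soon as its verdict has been
           recorded.  scan_messages is a hypothetical helper; $spamtest is
           assumed to be a "Mail::SpamAssassin" object, and the call to
           finish() on the parsed message belongs to
           "Mail::SpamAssassin::Message", not to this class.

```perl
use strict;
use warnings;

# Check a batch of raw messages with one factory object and release each
# per-message object once its result has been copied out.
# scan_messages() is a hypothetical helper, not part of SpamAssassin.
sub scan_messages {
    my ($spamtest, @raw_messages) = @_;
    my @verdicts;
    for my $raw (@raw_messages) {
        my $mail   = $spamtest->parse($raw);
        my $status = $spamtest->check($mail);
        push @verdicts, {
            is_spam => $status->is_spam() ? 1 : 0,
            score   => $status->get_score(),
        };
        $status->finish();    # release the per-message status object
        $mail->finish();      # release the parsed message as well
    }
    return @verdicts;
}
```

           Copying the needed values into a plain hash before calling
           finish() means nothing keeps a reference to the status object
           afterwards.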

       $name = $status->get_current_eval_rule_name()
           Return the name of the currently-running eval rule.  "undef" is
           returned if no eval rule is currently being run.  Useful for
           plugins to determine the current rule name while inside an eval
           test function call.

       $status->get_decoded_body_text_array ()
           Returns the message body, with base64 or quoted-printable encodings
           decoded, and non-text parts or non-inline attachments stripped.

           This is the same result text as used in 'rawbody' rules.

           It is returned as an array of strings, with each string being a
           2-4kB chunk of the body, split from boundaries if possible.

       $status->get_decoded_stripped_body_text_array ()
           Returns the message body, decoded (as described in
           get_decoded_body_text_array()), with HTML rendered, and with
           whitespace normalized.

           This is the same result text as used in 'body' rules.

           It will always render text/html.

           It is returned as an array of strings, with each string
           representing one 'paragraph'.  Paragraphs, in plain-text mails, are
           double-newline-separated blocks of multi-line text.

       $status->get (header_name [, default_value])
           Returns a message header, pseudo-header, real name or address.
           "header_name" is the name of a mail header, such as 'Subject',
           'To', etc.  If "default_value" is given, it will be used if the
           requested "header_name" does not exist.

           Appending ":raw" to the header name will inhibit decoding of
           quoted-printable or base-64 encoded strings.

           Appending a modifier ":addr" to a header field name will cause
           everything except the first email address to be removed from the
           header field.  It is mainly applicable to header fields 'From',
           'Sender', 'To', 'Cc' along with their 'Resent-*' counterparts, and
           the 'Return-Path'. For example, all of the following will result in
           "example@foo":

           example@foo
           example@foo (Foo Blah)
           example@foo, example@bar
           display: example@foo (Foo Blah), example@bar ;
           Foo Blah <example@foo>
           "Foo Blah" <example@foo>
           "'Foo Blah'" <example@foo>

           Appending a modifier ":name" to a header field name will cause
           everything except the first display name to be removed from the
           header field. It is mainly applicable to header fields containing a
           single mail address: 'From', 'Sender', along with their
           'Resent-From' and 'Resent-Sender' counterparts.  For example, all
           of the following will result in "Foo Blah". One level of single
           quotes is stripped too, as it is often seen.

           example@foo (Foo Blah)
           example@foo (Foo Blah), example@bar
           display: example@foo (Foo Blah), example@bar ;
           Foo Blah <example@foo>
           "Foo Blah" <example@foo>
           "'Foo Blah'" <example@foo>
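
           The modifiers are appended directly to the header name in the
           first argument of get().  A minimal sketch, in which sender_info
           is a hypothetical helper and $status is assumed to be a checked
           status object:

```perl
use strict;
use warnings;

# Collect the first address, the first display name and the undecoded
# value of the From header.  The second argument to get() is the
# default_value, used here so that a missing header yields an empty
# string.  sender_info() is a hypothetical helper.
sub sender_info {
    my ($status) = @_;
    return {
        addr => $status->get('From:addr', ''),
        name => $status->get('From:name', ''),
        raw  => $status->get('From:raw',  ''),
    };
}
```

           The same pattern applies to the other address headers listed
           above, such as 'Sender' or 'Return-Path'.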

           There are several special pseudo-headers that can be specified:

           "ALL" can be used to mean the text of all the message's headers.
           Each header is decoded and unfolded to a single line, unless
           called with :raw.

           "ALL-TRUSTED" can be used to mean the text of all the message's
           headers that could only have been added by trusted relays.

           "ALL-INTERNAL" can be used to mean the text of all the message's
           headers that could only have been added by internal relays.

           "ALL-UNTRUSTED" can be used to mean the text of all the message's
           headers that may have been added by untrusted relays.  To make this
           pseudo-header more useful for header rules the 'Received' header
           that was added by the last trusted relay is included, even though
           it can be trusted.

           "ALL-EXTERNAL" can be used to mean the text of all the message's
           headers that may have been added by external relays.  Like
           "ALL-UNTRUSTED" the 'Received' header added by the last internal
           relay is included.

           "ToCc" can be used to mean the contents of both the 'To' and 'Cc'
           headers.

           "EnvelopeFrom" is the address used in the 'MAIL FROM:' phase of the
           SMTP transaction that delivered this message, if this data has been
           made available by the SMTP server.

           "MESSAGEID" is a symbol meaning all Message-Id's found in the
           message; some mailing list software moves the real 'Message-Id' to
           'Resent-Message-Id' or 'X-Message-Id', then uses its own one in the
           'Message-Id' header.  The value returned for this symbol is the
           text from all 3 headers, separated by newlines.

           "X-Spam-Relays-Untrusted" is the generated metadata of untrusted
           relays the message has passed through.

           "X-Spam-Relays-Trusted" is the generated metadata of trusted relays
           the message has passed through.

       $status->get_uri_list ()
           Returns an array of all unique URIs found in the message.  It takes
           a combination of the URIs found in the rendered (decoded and HTML
           stripped) body and the URIs found when parsing the HTML in the
           message.  Will also set $status->{uri_list} (the array as returned
           by this function).

           The returned array will include the "raw" URI as well as "slightly
           cooked" versions.  For example, the single URI
           'http://%77&#00119;%77.example.com/' will get turned into: (
           'http://%77&#00119;%77.example.com/', 'http://www.example.com/' )

       $status->get_uri_detail_list ()
           Returns a hash reference of all unique URIs found in the message
           and various data about where the URIs were found in the message.
           It takes a combination of the URIs found in the rendered (decoded
           and HTML stripped) body and the URIs found when parsing the HTML in
           the message.  Will also set $status->{uri_detail_list} (the hash
           reference as returned by this function).

           The hash format looks something like this:

             raw_uri => {
               types => { a => 1, img => 1, parsed => 1, domainkeys => 1,
                          unlinked => 1, schemeless => 1 },
               cleaned => [ canonicalized_uri ],
               anchor_text => [ "click here", "no click here" ],
               domains => { domain1 => 1, domain2 => 1 },
               hosts => { host1 => domain1, host2 => domain2 },
             }

           "raw_uri" is whatever the URI was in the message itself
           (http://spamassassin.apache%2Eorg/).  URIs parsed from text will be
           prefixed with a scheme if missing (http://, mailto: etc).  HTML
           URIs are as found.

           "types" is a hash of the HTML tags (lowercase) which referenced the
           raw_uri.  parsed is a faked type which specifies that the raw_uri
           was seen in the rendered text.  domainkeys is defined when raw_uri
           was found from DK/DKIM d= field.  unlinked is defined when it's
           assumed that the MUA will not linkify the URI (found in body
           without scheme or www. prefix).  schemeless is always added for
           URIs without a scheme, regardless of linkifying (e.g. an email
           address found in the body without mailto:).

           "cleaned" is an array of the raw and canonicalized version of the
           raw_uri (http://spamassassin.apache%2Eorg/,
           https://spamassassin.apache.org/).

           "anchor_text" is an array of the anchor text (text between <a> and
           </a>), if any, which linked to the URI.

           "domains" is a hash of the domains found in the canonicalized URIs.

           "hosts" is a hash of unstripped hostnames found in the
           canonicalized URIs as hash keys, with their domain part stored as a
           value of each hash entry.
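
           To show how this structure can be walked, the sketch below
           collects the domains of all URIs that were referenced by an <a>
           tag.  linked_domains and the sample data are hypothetical; real
           data would come from get_uri_detail_list().

```perl
use strict;
use warnings;

# Return the sorted, de-duplicated domains of all URIs referenced by an
# <a> tag, given a hash reference in the format described above.
# linked_domains() is a hypothetical helper, not part of SpamAssassin.
sub linked_domains {
    my ($uris) = @_;
    my %seen;
    for my $raw_uri (keys %$uris) {
        my $info = $uris->{$raw_uri};
        next unless $info->{types} && $info->{types}{a};
        $seen{$_} = 1 for keys %{ $info->{domains} || {} };
    }
    my @domains = sort keys %seen;
    return @domains;
}

# Hypothetical sample data, made up for this example.
my $sample = {
    'http://www.example.com/' => {
        types       => { a => 1, parsed => 1 },
        cleaned     => [ 'http://www.example.com/' ],
        anchor_text => [ 'click here' ],
        domains     => { 'example.com' => 1 },
        hosts       => { 'www.example.com' => 'example.com' },
    },
    'http://img.example.net/x.png' => {
        types   => { img => 1 },
        cleaned => [ 'http://img.example.net/x.png' ],
        domains => { 'example.net' => 1 },
        hosts   => { 'img.example.net' => 'example.net' },
    },
};
print join(', ', linked_domains($sample)), "\n";
```

           Filtering on other keys of "types", such as img or parsed,
           follows the same pattern.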

       $status->add_uri_detail_list ($raw_uri, $types, $source, $valid_domain)
           Adds values to the internal uri_detail_list.  When used from
           plugins, it is recommended to call this from parsed_metadata
           (along with register_method_priority, -10) so that other plugins
           calling get_uri_detail_list() will see it.

           "raw_uri" is the URI to be added.  It is the only required
           parameter.

           "types" is an optional hash reference whose contents are added to
           uri_detail_list->{types} (see get_uri_detail_list for known keys).
           parsed is the default if no hash is given.  nocanon does not run
           uri_list_canonicalize (no redirector, uri fixing).  noclean skips
           adding uri_detail_list->{cleaned}, so it would not be used in "uri"
           rule checks, but domain/hosts would still be used for URIBL/RBL
           purposes.

           "source" is an optional simple string, only used for debug logging
           purposes to identify where the URI originates from (default:
           "parsed").

           "valid_domain" is an optional boolean (0/1).  If true, the URI
           will not be added unless the hostname/domain is in a valid format
           and contains a valid TLD.  (default: 0)

       $status->clear_test_state()
           Clear test state, including test log messages from
           "$status->test_log()".

       $status->got_hit ($rulename, $desc_prepend [, name => value, ...])
           Register a hit against a rule in the ruleset.

           There are two mandatory arguments. These are $rulename, the name of
           the rule that fired, and $desc_prepend, which is a short string
           that will be prepended to the rule's "describe" string in output
           reports.

           In addition, callers can supplement that with the following
           optional data:

           score => $num
               Optional: the score to use for the rule hit.  If unspecified,
               the value from the "Mail::SpamAssassin::Conf" object's
               "{scores}" hash will be used (a configured score), and in its
               absence the "defscore" option value.

           defscore => $num
               Optional: the score to use for the rule hit if neither the
               option "score" is provided, nor a configured score value is
               provided.

           value => $num
               Optional: the value to assign to the rule; the default value is
               1.  tflags multiple rules use values of greater than 1 to
               indicate multiple hits.  This value is accessible to meta
               rules.

           ruletype => $type
               Optional, but recommended: the rule type string.  This is used
               in the "hit_rule" plugin call, called by this method.  If
               unset, 'unknown' is used.

           tflags => $string
               Optional: a string, i.e. a space-separated list of additional
               tflags to be appended to an existing list of flags in
               $self->{conf}->{tflags}, such as: "nice noautolearn multiple".
               No syntax checks are performed.

           description => $string
               Optional: a custom rule description string.  This is used in
               the "hit_rule" plugin call, called by this method. If unset,
               the static description is used.

           Backward compatibility: the two mandatory arguments have been part
           of this API since SpamAssassin 2.x.  The optional name => value
           pairs, however, are a new addition in SpamAssassin 3.2.0.
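
           A call that combines the mandatory arguments with a few of the
           optional pairs might look like the sketch below.  report_hit and
           its $count argument are hypothetical, and the ruletype string
           'eval' is an assumption chosen for illustration; only got_hit()
           and its option names come from the description above.

```perl
use strict;
use warnings;

# Report a hit for $rulename, passing some of the optional name => value
# pairs.  Returns 1 if a hit was registered, 0 otherwise.
# report_hit() is a hypothetical helper, not part of SpamAssassin.
sub report_hit {
    my ($status, $rulename, $count) = @_;
    return 0 unless $count;
    $status->got_hit(
        $rulename,
        '',                    # $desc_prepend: nothing to prepend
        ruletype => 'eval',    # assumed rule type string
        value    => $count,    # number of hits, visible to meta rules
        defscore => 1.0,       # used only when no score is configured
    );
    return 1;
}
```

           Supplying defscore rather than score leaves any score configured
           for the rule in effect.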

       $status->create_fulltext_tmpfile (fulltext_ref)
           This function creates a temporary file containing the passed scalar
           reference data (typically the full/pristine text of the message).
           This is typically used by external programs like pyzor and dccproc,
           to avoid hangs due to buffering issues.  Methods that need this
           should call $self->create_fulltext_tmpfile($fulltext) to retrieve
           the temporary filename; the file will be created if it does not
           already exist.

           Note: This can only be called once until
           $status->delete_fulltext_tmpfile() is called.

       $status->delete_fulltext_tmpfile ()
           Cleans up after a $status->create_fulltext_tmpfile() call.
           Deletes the temporary file and uncaches the filename.

       all_from_addrs_domains
           This function collects all the From addresses in a message using
           all_from_addrs() and returns only their domain names.