Mail::SpamAssassin::PerMsgStatus(3pm)

User Contributed Perl Documentation

NAME

       Mail::SpamAssassin::PerMsgStatus - per-message status (spam or
       not-spam)

SYNOPSIS

         use Mail::SpamAssassin;

         my $spamtest = Mail::SpamAssassin->new({
           'rules_filename'      => '/etc/spamassassin.rules',
           'userprefs_filename'  => $ENV{HOME}.'/.spamassassin/user_prefs'
         });
         my $mail = $spamtest->parse();

         # check() returns a Mail::SpamAssassin::PerMsgStatus object
         my $status = $spamtest->check ($mail);

         my $rewritten_mail;
         if ($status->is_spam()) {
           $rewritten_mail = $status->rewrite_mail ();
         }
         ...

DESCRIPTION

       The Mail::SpamAssassin check() method returns an object of this class.
       This object encapsulates all the per-message state.

METHODS

       $status->check ()
           Runs the SpamAssassin rules against the message pointed to by the
           object.

       $status->learn()
           After a mail message has been checked, this method can be called.
           If the score is outside a certain range around the threshold, i.e.
           if the message is judged more-or-less definitely spam or definitely
           non-spam, it will be fed into SpamAssassin's learning systems
           (currently the naive Bayesian classifier), so that future similar
           mails will be caught.

       $score = $status->get_autolearn_points()
           Return the message's score as computed for auto-learning.  Certain
           tests are ignored:

             - rules with tflags set to 'learn' (the Bayesian rules)

             - rules with tflags set to 'userconf' (user welcome/block-listing rules, etc.)

             - rules with tflags set to 'noautolearn'

           Also note that auto-learning occurs using scores from either
           scoreset 0 or 1, depending on what scoreset is used during message
           check.  It is likely that the message check and auto-learn scores
           will be different.

       $score = $status->get_head_only_points()
           Return the message's score as computed for auto-learning, ignoring
           all rules except for header-based ones.

       $score = $status->get_learned_points()
           Return the message's score as computed for auto-learning, ignoring
           all rules except for learning-based ones.

       $score = $status->get_body_only_points()
           Return the message's score as computed for auto-learning, ignoring
           all rules except for body-based ones.

       $score = $status->get_autolearn_force_status()
           Return whether a message's score included any rules that are
           flagged as autolearn_force.

       $rule_names = $status->get_autolearn_force_names()
           Return a comma-separated list of rule names if a message's score
           included any rules that are flagged as autolearn_force.

       $isspam = $status->is_spam ()
           After a mail message has been checked, this method can be called.
           It will return 1 for mail determined likely to be spam, 0 if it
           does not seem spam-like.

       $list = $status->get_names_of_tests_hit ()
           After a mail message has been checked, this method can be called.
           It will return a comma-separated string, listing all the symbolic
           test names of the tests which were triggered by the mail.

       $list = $status->get_names_of_tests_hit_with_scores_hash ()
           After a mail message has been checked, this method can be called.
           It will return a reference to a hash of rule => score pairs,
           giving the symbolic test name and individual score of each test
           which was triggered by the mail.

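           As an illustration of working with that hash reference, the
           following sketch lists the highest-scoring rules first.  The helper
           name top_scoring_rules is hypothetical and not part of
           SpamAssassin; $status is assumed to be an already-checked
           PerMsgStatus object.

```perl
use strict;
use warnings;

# Return up to $limit "RULE=score" strings, highest score first.
# top_scoring_rules is a hypothetical helper, not a SpamAssassin API.
sub top_scoring_rules {
    my ($status, $limit) = @_;
    my $scores = $status->get_names_of_tests_hit_with_scores_hash();

    # Sort by descending score, then by name so the order is stable.
    my @names = sort { $scores->{$b} <=> $scores->{$a} || $a cmp $b }
                keys %$scores;
    splice(@names, $limit) if @names > $limit;
    return map { "$_=$scores->{$_}" } @names;
}
```

           A caller might print the result with something like
           print join(", ", top_scoring_rules($status, 5)), "\n";
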
       $list = $status->get_names_of_tests_hit_with_scores ()
           After a mail message has been checked, this method can be called.
           It will return a comma-separated string of rule=score pairs for all
           the symbolic test names and individual scores of the tests which
           were triggered by the mail.

       $list = $status->get_names_of_subtests_hit ()
           After a mail message has been checked, this method can be called.
           It will return a comma-separated string, listing all the symbolic
           test names of the meta-rule sub-tests which were triggered by the
           mail.  Sub-tests are the normally-hidden rules, which score 0 and
           have names beginning with two underscores, used in meta rules.

           If a parameter of collapsed or dbg is passed, the output will be a
           condensed array of sub-tests with multiple hits reduced to one
           entry.

           If the parameter of dbg is passed, the output will be a condensed
           string of sub-tests with multiple hits reduced to one entry with
           the number of hits in parentheses. Some information is also added
           at the end regarding the multiple hits.

       $num = $status->get_score ()
           After a mail message has been checked, this method can be called.
           It will return the message's score.

       $num = $status->get_required_score ()
           After a mail message has been checked, this method can be called.
           It will return the score required for a mail to be considered spam.

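           The accessors described so far can be combined into a one-line
           summary of a checked message.  This is only a sketch:
           summarize_status is a hypothetical helper, and the output format
           is an arbitrary choice made for the example.

```perl
use strict;
use warnings;

# Build a one-line summary of a checked message.
# summarize_status is a hypothetical helper; $status is assumed to be a
# PerMsgStatus object on which check() has already run.
sub summarize_status {
    my ($status) = @_;
    my $verdict = $status->is_spam() ? 'spam' : 'ham';
    return sprintf('%s score=%.1f required=%.1f tests=%s',
        $verdict,
        $status->get_score(),
        $status->get_required_score(),
        $status->get_names_of_tests_hit());
}
```
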
       $num = $status->get_autolearn_status ()
           After a mail message has been checked, this method can be called.
           It will return one of the following strings depending on whether
           the mail was auto-learned or not: "ham", "no", "spam", "disabled",
           "failed", "unavailable".

           If the message is flagged with autolearn_force, the returned string
           will also include the status and the rules hit.  For example:
           "autolearn_force=yes (AUTOLEARNTEST_BODY)"

       $report = $status->get_report ()
           Deliver a "spam report" on the checked mail message.  This contains
           details of how many spam detection rules it triggered.

           The report is returned as a multi-line string, with the lines
           separated by "\n" characters.

       $preview = $status->get_content_preview ()
           Give a "preview" of the content.

           This is returned as a multi-line string, with the lines separated
           by "\n" characters, containing a fully-decoded, safe, plain-text
           sample of the first few lines of the message body.

       $msg = $status->get_message()
           Return the object representing the message being scanned.

       $status->rewrite_mail ()
           Rewrite the mail message.  This will at minimum add headers, and at
           maximum MIME-encapsulate the message text, to reflect its spam or
           not-spam status.  The function will return a scalar of the
           rewritten message.

           The actual modifications depend on the configuration (see
           "Mail::SpamAssassin::Conf" for more information).

           The possible modifications are as follows:

           To:, From: and Subject: modification on spam mails
               Depending on the configuration, the To: and From: lines can
               have a user-defined RFC 2822 comment appended for spam mail.
               The subject line may have a user-defined string prepended to it
               for spam mail.

           X-Spam-* headers for all mails
               Depending on the configuration, zero or more headers with names
               beginning with "X-Spam-" will be added to mail depending on
               whether it is spam or ham.

           spam message with report_safe
               If report_safe is set to true (1), then spam messages are
               encapsulated into their own message/rfc822 MIME attachment
               without any modifications being made.

               If report_safe is set to false (0), then the message will only
               have the above headers added/modified.

       $status->action_depends_on_tags($tags, $code, @args)
           Enqueue the supplied subroutine reference $code, to become runnable
           when all the specified tags become available. The $tags may be a
           simple scalar - a tag name, or a listref of tag names. The
           subroutine &$code when called will be passed a "permessagestatus"
           object as its first argument, followed by the supplied (optional)
           list @args.

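           A minimal sketch of queuing such a callback follows.  The tag names
           EXAMPLE_TAG_A and EXAMPLE_TAG_B and the helper note_when_tags_ready
           are placeholders invented for the example; a real plugin would name
           tags that some other component actually sets.

```perl
use strict;
use warnings;

# Queue a callback that runs once both placeholder tags are available.
# note_when_tags_ready and the EXAMPLE_TAG_* names are hypothetical.
sub note_when_tags_ready {
    my ($status, $notes) = @_;
    $status->action_depends_on_tags(
        [ 'EXAMPLE_TAG_A', 'EXAMPLE_TAG_B' ],   # listref of tag names
        sub {
            my ($pms, $label) = @_;   # PerMsgStatus object, then @args
            push @$notes, join(' ', $label,
                $pms->get_tag('EXAMPLE_TAG_A'),
                $pms->get_tag('EXAMPLE_TAG_B'));
        },
        'tags-ready',                 # handed to the callback as @args
    );
    return;
}
```

           The callback collects its output in the caller's array reference
           because it may run some time after note_when_tags_ready() returns.
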
       $status->set_tag($tagname, $value)
           Set a template tag, as used in "add_header", report templates, etc.
           This API is intended for use by plugins.  Tag names will be
           converted to an all-uppercase representation internally.  Tag names
           must consist only of [A-Z0-9_] characters and must not contain
           consecutive underscores.  Also the name must not start or end in an
           underscore, as that is the template tagging format.

           $value can be a simple scalar (string or number), or a reference to
           an array, in which case the public method get_tag will join array
           elements using a space as a separator, returning a single string
           for backward compatibility.

           $value can also be a subroutine reference, which will be evaluated
           each time the template is expanded. The first argument passed by
           get_tag to a called subroutine will be a PerMsgStatus object (this
           module's object), followed by optional arguments provided by a
           caller to get_tag.

           Note that Perl supports closures, which means that variables set in
           the caller's scope can be accessed inside this "sub". For example:

               my $text = "hello world!";
               $status->set_tag("FOO", sub {
                   my $pms = shift;
                   return $text;
               });

           See "Mail::SpamAssassin::Conf"'s "TEMPLATE TAGS" and "CAPTURING
           TAGS USING REGEX NAMED CAPTURE GROUPS" sections for more details on
           how template tags are used.

       $string = $status->get_tag($tagname)
           Get the current value of a template tag, as used in "add_header",
           report templates, etc. This API is intended for use by plugins.
           Tag names will be converted to an all-uppercase representation
           internally.

           See "Mail::SpamAssassin::Conf"'s "TEMPLATE TAGS" and "CAPTURING
           TAGS USING REGEX NAMED CAPTURE GROUPS" sections for more details on
           how template tags are used.

           "undef" will be returned if a tag by that name has not been
           defined.

       $string = $status->get_tag_raw($tagname, @args)
           Similar to "get_tag", but keeps a tag name unchanged (does not
           uppercase it), and does not convert arrayref tag values into a
           single string.

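           The difference between get_tag and get_tag_raw for array-valued
           tags can be sketched as follows.  The tag name EXAMPLEHOSTS and the
           helper tag_round_trip are made up for the example.

```perl
use strict;
use warnings;

# Store a list under a placeholder tag and read it back both ways.
# tag_round_trip and the EXAMPLEHOSTS tag are hypothetical.
sub tag_round_trip {
    my ($status) = @_;
    $status->set_tag('EXAMPLEHOSTS', [ 'mx1.example.com', 'mx2.example.com' ]);

    my $joined = $status->get_tag('EXAMPLEHOSTS');      # space-joined string
    my $raw    = $status->get_tag_raw('EXAMPLEHOSTS');  # value left unconverted
    return ($joined, $raw);
}
```

           Going by the descriptions above, $joined should be a single string
           and $raw should still be the array reference that was stored.
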
       $status->set_spamd_result_item($subref)
           Set an entry for the spamd result log line.  $subref should be a
           code reference for a subroutine which will return a string in
           'name=VALUE' format, similar to the other entries in the spamd
           result line:

             Jul 17 14:10:47 radish spamd[16670]: spamd: result: Y 22 - ALL_NATURAL,
             DATE_IN_FUTURE_03_06,DIET_1,DRUGS_ERECTILE,DRUGS_PAIN,
             TEST_FORGED_YAHOO_RCVD,TEST_INVALID_DATE,TEST_NOREALNAME,
             TEST_NORMAL_HTTP_TO_IP,UNDISC_RECIPS scantime=0.4,size=3138,user=jm,
             uid=1000,required_score=5.0,rhost=localhost,raddr=127.0.0.1,
             rport=33153,mid=<9PS291LhupY>,autolearn=spam

           "name" and "VALUE" must not contain "=" or "," characters, as it is
           important that these log lines are easy to parse.

           The code reference will be called by spamd after the message has
           been scanned, and the PerMsgStatus::check() method has returned.

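           One way to honour the "=" and "," restriction is to wrap the value
           callback in a small sanitizing closure.  This is a sketch, not part
           of SpamAssassin: make_result_item is a hypothetical helper, and
           replacing the forbidden characters with underscores is an arbitrary
           choice.

```perl
use strict;
use warnings;

# Wrap a value-producing callback so it yields a safe 'name=VALUE' string.
# make_result_item is a hypothetical helper, not a SpamAssassin API.
sub make_result_item {
    my ($name, $value_cb) = @_;
    die "invalid item name '$name'\n" if $name =~ /[=,]/;
    return sub {
        my $value = $value_cb->(@_);
        $value = '' unless defined $value;
        $value =~ tr/=,/__/;          # keep the log line easy to parse
        return "$name=$value";
    };
}

# Usage, assuming $status is a PerMsgStatus object:
#   $status->set_spamd_result_item(make_result_item('example', sub { 'a,b' }));
```
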
       $status->finish ()
           Indicate that this $status object is finished with, and can be
           destroyed.

           If you are using SpamAssassin in a persistent environment, or
           checking many mail messages from one "Mail::SpamAssassin" factory,
           this method should be called to ensure Perl's garbage collection
           will clean up old status objects.

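           The many-messages case might look like the sketch below, which
           reuses one factory and releases each status object as soon as it is
           no longer needed.  count_spam is a hypothetical helper; $spamtest
           is assumed to be a Mail::SpamAssassin object, and the finish() call
           on the parsed message belongs to the message class rather than to
           this one.

```perl
use strict;
use warnings;

# Check several messages with one factory, releasing each status afterwards.
# count_spam is a hypothetical helper; @texts holds raw message texts.
sub count_spam {
    my ($spamtest, @texts) = @_;
    my $spam = 0;
    for my $text (@texts) {
        my $mail   = $spamtest->parse($text);
        my $status = $spamtest->check($mail);
        $spam++ if $status->is_spam();
        $status->finish();   # let Perl reclaim the per-message state
        $mail->finish();     # the parsed message has its own finish()
    }
    return $spam;
}
```
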
       $name = $status->get_current_eval_rule_name()
           Return the name of the currently-running eval rule.  "undef" is
           returned if no eval rule is currently being run.  Useful for
           plugins to determine the current rule name while inside an eval
           test function call.

       $status->get_decoded_body_text_array ()
           Returns the message body, with base64 or quoted-printable encodings
           decoded, and non-text parts or non-inline attachments stripped.

           This is the same result text as used in 'rawbody' rules.

           It is returned as an array of strings, with each string being a
           2-4kB chunk of the body, split from boundaries if possible.

       $status->get_decoded_stripped_body_text_array ()
           Returns the message body, decoded (as described in
           get_decoded_body_text_array()), with HTML rendered, and with
           whitespace normalized.

           This is the same result text as used in 'body' rules.

           It will always render text/html.

           It is returned as an array of strings, with each string
           representing one 'paragraph'.  Paragraphs, in plain-text mails, are
           double-newline-separated blocks of multi-line text.

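           A plugin could scan the rendered paragraphs roughly as sketched
           below.  matching_paragraphs is a hypothetical helper.  The text
           above does not say whether the "array" comes back as a list or as
           an array reference, so the sketch accepts either; treat that as an
           assumption to verify against the installed version.

```perl
use strict;
use warnings;

# Return the rendered body paragraphs that match $pattern.
# matching_paragraphs is a hypothetical helper, not a SpamAssassin API.
sub matching_paragraphs {
    my ($status, $pattern) = @_;
    my @result = $status->get_decoded_stripped_body_text_array();

    # Assumption: the method may hand back an array reference rather than
    # a plain list, so both forms are accepted here.
    my @paragraphs = (@result == 1 && ref $result[0] eq 'ARRAY')
                   ? @{ $result[0] } : @result;
    return grep { /$pattern/ } @paragraphs;
}
```
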
       $status->get (header_name [, default_value])
           Returns a message header, pseudo-header or a real name, email-
           address or some other parsed value set by modifiers.  "header_name"
           is the name of a mail header, such as 'Subject', 'To', etc.

           Should be called in list context since 4.0.  It returns a list of
           the matching headers' contents, or of other values when modifiers
           are used.

           If "default_value" is given, it will be used if the requested
           "header_name" does not exist.  This is mainly useful when called in
           scalar context, to get 'undef' instead of the legacy '' return
           value when the header does not exist.

           Appending ":raw" modifier to the header name will inhibit decoding
           of quoted-printable or base-64 encoded strings.

           Appending ":addr" modifier to the header name will return all
           email-addresses found in the header.  It is mainly applicable to
           header fields 'From', 'Sender', 'To', 'Cc' along with their
           'Resent-*' counterparts, and the 'Return-Path'.  For example, all
           of the following will result in "example@foo" (and "example@bar"):

           example@foo
           example@foo (Foo Blah), <example@bar>
           example@foo, example@bar
           display: example@foo (Foo Blah), example@bar ;
           Foo Blah <example@foo>
           "Foo Blah" <example@foo>
           "'Foo Blah'" <example@foo>

           Appending ":name" modifier to the header name will return all
           "display names" from the header field.  As with ":addr", it is
           mainly applicable to header fields 'From', 'Sender', 'To', 'Cc'
           along with their 'Resent-*' counterparts, and the 'Return-Path'.
           For example, all of the following will result in "Foo Blah" (and
           "Bar Baz").  One level of single quotes is stripped too, as it is
           often seen.

           example@foo (Foo Blah)
           example@foo (Foo Blah), "Bar Baz" <example@bar>
           display: example@foo (Foo Blah), example@bar ;
           Foo Blah <example@foo>
           "Foo Blah" <example@foo>
           "'Foo Blah'" <example@foo>

           Appending ":host" to the header name will return the first
           hostname-looking string that ends with a valid TLD.  First it tries
           to find a match after @ character (possible email), then from any
           part of the header.  Normal use of this would be for example
           'From:addr:host' to return the hostname portion of a From-address.

           Appending ":domain" to the header name implies ":host", but will
           return only domain part of the hostname, as returned by
           RegistryBoundaries::trim_domain().

           Appending ":ip" to the header name will return the first IPv4 or
           IPv6 address string found.  Could be used for example as
           'X-Originating-IP:ip'.

           Appending ":revip" to the header name implies ":ip", but will
           return the found IP in reverse (usually for DNSBL usage).

           Appending ":first" modifier to the header name will return only the
           first (topmost) header, in case there are multiple ones.  Similarly
           ":last" will select the last one.  These affect only the physical
           header line selection.  If selected header is parsed further with
           ":addr" or similar, it may return multiple results, if the selected
           header contains multiple addresses.

           There are several special pseudo-headers that can be specified:

           "ALL" can be used to mean the text of all the message's headers.
           Each header is decoded and unfolded to single line, unless called
           with :raw.
           "ALL-TRUSTED" can be used to mean the text of all the message's
           headers that could only have been added by trusted relays.
           "ALL-INTERNAL" can be used to mean the text of all the message's
           headers that could only have been added by internal relays.
           "ALL-UNTRUSTED" can be used to mean the text of all the message's
           headers that may have been added by untrusted relays.  To make this
           pseudo-header more useful for header rules the 'Received' header
           that was added by the last trusted relay is included, even though
           it can be trusted.
           "ALL-EXTERNAL" can be used to mean the text of all the message's
           headers that may have been added by external relays.  Like
           "ALL-UNTRUSTED" the 'Received' header added by the last internal
           relay is included.
           "ToCc" can be used to mean the contents of both the 'To' and 'Cc'
           headers.
           "EnvelopeFrom" is the address used in the 'MAIL FROM:' phase of the
           SMTP transaction that delivered this message, if this data has been
           made available by the SMTP server.
           "MESSAGEID" is a symbol meaning all Message-Id's found in the
           message; some mailing list software moves the real 'Message-Id' to
           'Resent-Message-Id' or 'X-Message-Id', then uses its own one in the
           'Message-Id' header.  The value returned for this symbol is the
           text from all 3 headers, separated by newlines.
           "X-Spam-Relays-Untrusted" is the generated metadata of untrusted
           relays the message has passed through.
           "X-Spam-Relays-Trusted" is the generated metadata of trusted relays
           the message has passed through.
           "X-Spam-Relays-External" is the generated metadata of external
           relays the message has passed through.
           "X-Spam-Relays-Internal" is the generated metadata of internal
           relays the message has passed through.

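           The modifiers described for get() can be combined in a small helper
           such as the one sketched here.  sender_details is a hypothetical
           name; every call is made in list context, as recommended since 4.0.

```perl
use strict;
use warnings;

# Collect a few sender details from a checked message.
# sender_details is a hypothetical helper, not a SpamAssassin API.
sub sender_details {
    my ($status) = @_;
    my @addrs  = $status->get('From:addr');        # all addresses in From
    my @hosts  = $status->get('From:addr:host');   # hostname of the address
    my ($subj) = $status->get('Subject:first');    # topmost Subject, if any
    return { addrs => \@addrs, hosts => \@hosts, subject => $subj };
}
```

           Taking only the first element for the subject keeps the caller
           independent of how many Subject headers the message carries.
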
       $status->get_uri_list ()
           Returns an array of all unique URIs found in the message.  It takes
           a combination of the URIs found in the rendered (decoded and HTML
           stripped) body and the URIs found when parsing the HTML in the
           message.  Will also set $status->{uri_list} (the array as returned
           by this function).

           The returned array will include the "raw" URI as well as "slightly
           cooked" versions.  For example, the single URI
           'http://%77&#00119;%77.example.com/' will get turned into: (
           'http://%77&#00119;%77.example.com/', 'http://www.example.com/' )

       $status->get_uri_detail_list ()
           Returns a hash reference of all unique URIs found in the message
           and various data about where the URIs were found in the message.
           It takes a combination of the URIs found in the rendered (decoded
           and HTML stripped) body and the URIs found when parsing the HTML in
           the message.  Will also set $status->{uri_detail_list} (the hash
           reference as returned by this function).

           The hash format looks something like this:

             raw_uri => {
               types => { a => 1, img => 1, parsed => 1, domainkeys => 1,
                          unlinked => 1, schemeless => 1 },
               cleaned => [ canonicalized_uri ],
               anchor_text => [ "click here", "no click here" ],
               domains => { domain1 => 1, domain2 => 1 },
               hosts => { host1 => domain1, host2 => domain2 },
             }

           "raw_uri" is whatever the URI was in the message itself
           (http://spamassassin.apache%2Eorg/).  URIs parsed from text will be
           prefixed with a scheme if missing (http://, mailto: etc).  HTML
           URIs are as found.

           "types" is a hash of the HTML tags (lowercase) which referenced the
           raw_uri.  parsed is a faked type which specifies that the raw_uri
           was seen in the rendered text.  domainkeys is defined when raw_uri
           was found from DK/DKIM d= field.  unlinked is defined when it's
           assumed that the MUA will not linkify the URI (found in body
           without scheme or www. prefix).  schemeless is always added for
           URIs without a scheme, regardless of linkifying (e.g. an email
           address found in body without mailto:).

           "cleaned" is an array of the raw and canonicalized version of the
           raw_uri (http://spamassassin.apache%2Eorg/,
           https://spamassassin.apache.org/).

           "anchor_text" is an array of the anchor text (text between <a> and
           </a>), if any, which linked to the URI.

           "domains" is a hash of the domains found in the canonicalized URIs.

           "hosts" is a hash of unstripped hostnames found in the
           canonicalized URIs as hash keys, with their domain part stored as a
           value of each hash entry.

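           To show how the nested structure can be walked, the sketch below
           groups anchor texts by the domain they link to.  It relies only on
           the keys shown in the hash format above; anchor_text_by_domain is a
           hypothetical helper.

```perl
use strict;
use warnings;

# Map each domain to the anchor texts of the <a> links that pointed at it.
# anchor_text_by_domain is a hypothetical helper, not a SpamAssassin API.
sub anchor_text_by_domain {
    my ($status) = @_;
    my $details = $status->get_uri_detail_list();
    my %by_domain;
    for my $raw_uri (sort keys %$details) {
        my $info = $details->{$raw_uri};
        next unless $info->{types} && $info->{types}{a};   # only <a> links
        for my $domain (sort keys %{ $info->{domains} || {} }) {
            push @{ $by_domain{$domain} }, @{ $info->{anchor_text} || [] };
        }
    }
    return \%by_domain;
}
```
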
       $status->add_uri_detail_list ($raw_uri, $types, $source, $valid_domain)
           Adds values to internal uri_detail_list.  When used from Plugins,
           recommended to call from parsed_metadata (along with
           register_method_priority, -10) so other Plugins calling
           get_uri_detail_list() will see it.

           "raw_uri" is the URI to be added. The only required parameter.

           "types" is an optional hash reference, contents are added to
           uri_detail_list->{types} (see get_uri_detail_list for known keys).
           parsed is the default if no hash is given.  nocanon does not run
           uri_list_canonicalize (no redirector, uri fixing).  noclean skips
           adding uri_detail_list->{cleaned}, so it would not be used in "uri"
           rule checks, but domain/hosts would still be used for URIBL/RBL
           purposes.

           "source" is an optional simple string, only used for debug logging
           purposes to identify where the URI originates from (default:
           "parsed").

           "valid_domain" is an optional boolean (0/1).  If true, the URI will
           not be added unless its hostname/domain is in a valid format and
           contains a valid TLD.  (default: 0)

       $status->clear_test_state()
           DEPRECATED, UNNEEDED SINCE 4.0

       $status->got_hit ($rulename, $desc_prepend [, name => value, ...])
           Register a hit against a rule in the ruleset.

           There are two mandatory arguments. These are $rulename, the name of
           the rule that fired, and $desc_prepend, which is a short string
           that will be prepended to the rule's "describe" string in output
           reports.

           In addition, callers can supplement that with the following
           optional data:

           score => $num
               Optional: the score to use for the rule hit.  If unspecified,
               the value from the "Mail::SpamAssassin::Conf" object's
               "{scores}" hash will be used (a configured score), and in its
               absence the "defscore" option value.

           defscore => $num
               Optional: the score to use for the rule hit if neither the
               option "score" is provided, nor a configured score value is
               provided.

           value => $num
               Optional: the value to assign to the rule; the default value is
               1.  tflags multiple rules use values of greater than 1 to
               indicate multiple hits.  This value is accessible to meta
               rules.

           ruletype => $type
               Optional, but recommended: the rule type string.  This is used
               in the "hit_rule" plugin call, called by this method.  If
               unset, 'unknown' is used.

           tflags => $string
               Optional: a string, i.e. a space-separated list of additional
               tflags to be appended to an existing list of flags in
               $self->{conf}->{tflags}, such as: "nice noautolearn multiple".
               No syntax checks are performed.

           description => $string
               Optional: a custom rule description string.  This is used in
               the "hit_rule" plugin call, called by this method. If unset,
               the static description is used.

           Backward compatibility: the two mandatory arguments have been part
           of this API since SpamAssassin 2.x.  The optional name=>value
           pairs, however, are a new addition in SpamAssassin 3.2.0.

       $status->rule_ready ($rulename [, $no_async])
           Mark an asynchronous rule ready, so it can be considered for meta
           rule evaluation.  An asynchronous rule is a rule whose eval-function
           returns undef, marking that it's not ready yet, expecting results
           later.  $status->rule_ready() must be called later to mark it
           ready, alternatively $status->got_hit() also does this.  If neither
           is called, then any meta rule that depends on this rule might not
           evaluate.

           Optional boolean $no_async skips checking if there are pending
           async DNS lookups for the rule.

       $status->test_log ($text [, $rulename])
           Add $text log entry for a hit rule in final message REPORT/SUMMARY.

           Usually called just before got_hit(), to describe for example what
           URI the rule matched on.  Optional $rulename argument is
           recommended to make sure log is written to correct rule.  If
           rulename is not provided, get_current_eval_rule_name() is used as
           fallback.

           Can be called multiple times per rule for additional entries.

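           The "test_log just before got_hit" pattern might look like the
           following sketch.  report_uri_hit is a hypothetical helper, $pms is
           assumed to be the PerMsgStatus object available inside a plugin,
           and the 'eval' rule type and the empty $desc_prepend are choices
           made for the example.

```perl
use strict;
use warnings;

# Record why a rule fired, then register the hit.
# report_uri_hit is a hypothetical helper, not a SpamAssassin API.
sub report_uri_hit {
    my ($pms, $rulename, $uri) = @_;

    # Passing $rulename explicitly attaches the log entry to the right rule.
    $pms->test_log("matched URI: $uri", $rulename);

    # Empty $desc_prepend; ruletype is optional but recommended.
    $pms->got_hit($rulename, '', ruletype => 'eval');
    return;
}
```
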
       $status->create_fulltext_tmpfile (fulltext_ref)
           This function creates a temporary file containing the passed scalar
           reference data.  If no scalar is passed, full/pristine message text
           is assumed.  This is typically used by external programs like pyzor
           and dccproc, to avoid hangs due to buffering issues.

           All tempfiles are automatically cleaned up by the PerMsgStatus
           destructor.

       $status->delete_fulltext_tmpfile (tmpfile)
           Will clean up after a $status->create_fulltext_tmpfile() call.
           Deletes the temporary file and uncaches the filename.  Generally
           there is no need to call this, as the PerMsgStatus destructor
           cleans up all tmpfiles.

       all_from_addrs_domains
           This function returns all the various from addresses in a message
           using all_from_addrs() and then returns only the domain names.
