Mail::SpamAssassin::PerMsgStatus(3)   User Contributed Perl Documentation   Mail::SpamAssassin::PerMsgStatus(3)

NAME
    Mail::SpamAssassin::PerMsgStatus - per-message status (spam or
    not-spam)
8
SYNOPSIS
    my $spamtest = Mail::SpamAssassin->new({
        'rules_filename'     => '/etc/spamassassin.rules',
        'userprefs_filename' => $ENV{HOME}.'/.spamassassin/user_prefs'
    });
    my $mail = $spamtest->parse();

    my $status = $spamtest->check ($mail);

    my $rewritten_mail;
    if ($status->is_spam()) {
        $rewritten_mail = $status->rewrite_mail ();
    }
    ...
23
DESCRIPTION
    The Mail::SpamAssassin check() method returns an object of this class.
    This object encapsulates all the per-message state.
27
METHODS
    $status->check ()
        Runs the SpamAssassin rules against the message pointed to by the
        object.
32
33 $status->learn()
        After a mail message has been checked, this method can be called.
        If the score is outside a certain range around the threshold, i.e.
        if the message is judged more-or-less definitely spam or definitely
        non-spam, it will be fed into SpamAssassin's learning systems
        (currently the naive Bayesian classifier), so that future similar
        mails will be caught.
40
41 $score = $status->get_autolearn_points()
42 Return the message's score as computed for auto-learning. Certain
43 tests are ignored:
44
45 - rules with tflags set to 'learn' (the Bayesian rules)
46
47 - rules with tflags set to 'userconf' (user welcome/block-listing rules, etc)
48
49 - rules with tflags set to 'noautolearn'
50
51 Also note that auto-learning occurs using scores from either
52 scoreset 0 or 1, depending on what scoreset is used during message
53 check. It is likely that the message check and auto-learn scores
54 will be different.
55
56 $score = $status->get_head_only_points()
57 Return the message's score as computed for auto-learning, ignoring
58 all rules except for header-based ones.
59
60 $score = $status->get_learned_points()
61 Return the message's score as computed for auto-learning, ignoring
62 all rules except for learning-based ones.
63
64 $score = $status->get_body_only_points()
65 Return the message's score as computed for auto-learning, ignoring
66 all rules except for body-based ones.
67
68 $score = $status->get_autolearn_force_status()
69 Return whether a message's score included any rules that are
70 flagged as autolearn_force.
71
72 $rule_names = $status->get_autolearn_force_names()
        Return a comma-separated list of rule names if a message's
        score included any rules that are flagged as autolearn_force.
75
76 $isspam = $status->is_spam ()
77 After a mail message has been checked, this method can be called.
78 It will return 1 for mail determined likely to be spam, 0 if it
79 does not seem spam-like.
80
81 $list = $status->get_names_of_tests_hit ()
82 After a mail message has been checked, this method can be called.
83 It will return a comma-separated string, listing all the symbolic
84 test names of the tests which were triggered by the mail.
85
86 $list = $status->get_names_of_tests_hit_with_scores_hash ()
87 After a mail message has been checked, this method can be called.
        It will return a reference to a hash of rule/score pairs for all
        the symbolic test names and individual scores of the tests which
        were triggered by the mail.
91
92 $list = $status->get_names_of_tests_hit_with_scores ()
93 After a mail message has been checked, this method can be called.
94 It will return a comma-separated string of rule=score pairs for all
95 the symbolic test names and individual scores of the tests which
96 were triggered by the mail.
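
The comma-separated rule=score string described above can be split back into a hash when only the string form is at hand. The following is a minimal sketch; parse_test_scores is a hypothetical helper, not part of this module, and the sample string is made up to match the documented format.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "RULE=score,RULE=score" into a hash reference.
sub parse_test_scores {
    my ($list) = @_;
    my %score_of;
    for my $pair (split /,/, defined $list ? $list : '') {
        my ($rule, $score) = split /=/, $pair, 2;
        next unless defined $score && length $rule;   # skip malformed entries
        $score_of{$rule} = $score + 0;                # store as a number
    }
    return \%score_of;
}

# Made-up sample in the documented rule=score format.
my $scores = parse_test_scores('MISSING_DATE=1.4,BAYES_99=3.5,ALL_TRUSTED=-1');
printf "%s => %s\n", $_, $scores->{$_} for sort keys %$scores;
```

When the object itself is available, get_names_of_tests_hit_with_scores_hash() already provides the hash form, so a helper like this is only needed for strings that were logged or stored earlier.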
97
98 $list = $status->get_names_of_subtests_hit ()
99 After a mail message has been checked, this method can be called.
100 It will return a comma-separated string, listing all the symbolic
101 test names of the meta-rule sub-tests which were triggered by the
102 mail. Sub-tests are the normally-hidden rules, which score 0 and
103 have names beginning with two underscores, used in meta rules.
104
105 If a parameter of collapsed or dbg is passed, the output will be a
106 condensed array of sub-tests with multiple hits reduced to one
107 entry.
108
109 If the parameter of dbg is passed, the output will be a condensed
110 string of sub-tests with multiple hits reduced to one entry with
111 the number of hits in parentheses. Some information is also added
112 at the end regarding the multiple hits.
113
114 $num = $status->get_score ()
115 After a mail message has been checked, this method can be called.
116 It will return the message's score.
117
118 $num = $status->get_required_score ()
119 After a mail message has been checked, this method can be called.
120 It will return the score required for a mail to be considered spam.
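
The accessors above are typically read together once a message has been checked. The sketch below assumes $status is the object returned by check(); summarize_status is a hypothetical helper name and the output layout is only an illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build a one-line summary from a checked message.
# $status is assumed to be the object returned by Mail::SpamAssassin's check().
sub summarize_status {
    my ($status) = @_;
    my $verdict = $status->is_spam() ? 'spam' : 'ham';
    return sprintf('%s score=%.1f required=%.1f tests=%s',
        $verdict,
        $status->get_score(),
        $status->get_required_score(),
        $status->get_names_of_tests_hit());
}
```

A caller would use it as: print summarize_status($status), "\n";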
121
122 $num = $status->get_autolearn_status ()
123 After a mail message has been checked, this method can be called.
124 It will return one of the following strings depending on whether
125 the mail was auto-learned or not: "ham", "no", "spam", "disabled",
126 "failed", "unavailable".
127
        If the message is flagged with autolearn_force, the returned
        string also includes the status and the rules hit. For example:
        "autolearn_force=yes (AUTOLEARNTEST_BODY)"
131
132 $report = $status->get_report ()
133 Deliver a "spam report" on the checked mail message. This contains
134 details of how many spam detection rules it triggered.
135
136 The report is returned as a multi-line string, with the lines
137 separated by "\n" characters.
138
139 $preview = $status->get_content_preview ()
140 Give a "preview" of the content.
141
142 This is returned as a multi-line string, with the lines separated
143 by "\n" characters, containing a fully-decoded, safe, plain-text
144 sample of the first few lines of the message body.
145
146 $msg = $status->get_message()
147 Return the object representing the message being scanned.
148
149 $status->rewrite_mail ()
150 Rewrite the mail message. This will at minimum add headers, and at
151 maximum MIME-encapsulate the message text, to reflect its spam or
152 not-spam status. The function will return a scalar of the
153 rewritten message.
154
155 The actual modifications depend on the configuration (see
156 "Mail::SpamAssassin::Conf" for more information).
157
158 The possible modifications are as follows:
159
160 To:, From: and Subject: modification on spam mails
161 Depending on the configuration, the To: and From: lines can
162 have a user-defined RFC 2822 comment appended for spam mail.
163 The subject line may have a user-defined string prepended to it
164 for spam mail.
165
166 X-Spam-* headers for all mails
167 Depending on the configuration, zero or more headers with names
168 beginning with "X-Spam-" will be added to mail depending on
169 whether it is spam or ham.
170
171 spam message with report_safe
172 If report_safe is set to true (1), then spam messages are
173 encapsulated into their own message/rfc822 MIME attachment
174 without any modifications being made.
175
176 If report_safe is set to false (0), then the message will only
177 have the above headers added/modified.
178
179 $status->action_depends_on_tags($tags, $code, @args)
180 Enqueue the supplied subroutine reference $code, to become runnable
181 when all the specified tags become available. The $tags may be a
182 simple scalar - a tag name, or a listref of tag names. The
183 subroutine &$code when called will be passed a "permessagestatus"
184 object as its first argument, followed by the supplied (optional)
185 list @args .
186
187 $status->set_tag($tagname, $value)
188 Set a template tag, as used in "add_header", report templates, etc.
189 This API is intended for use by plugins. Tag names will be
190 converted to an all-uppercase representation internally. Tag names
191 must consist only of [A-Z0-9_] characters and must not contain
192 consecutive underscores. Also the name must not start or end in an
193 underscore, as that is the template tagging format.
194
195 $value can be a simple scalar (string or number), or a reference to
196 an array, in which case the public method get_tag will join array
197 elements using a space as a separator, returning a single string
198 for backward compatibility.
199
200 $value can also be a subroutine reference, which will be evaluated
201 each time the template is expanded. The first argument passed by
202 get_tag to a called subroutine will be a PerMsgStatus object (this
203 module's object), followed by optional arguments provided by a
204 caller to get_tag.
205
206 Note that perl supports closures, which means that variables set in
207 the caller's scope can be accessed inside this "sub". For example:
208
209 my $text = "hello world!";
210 $status->set_tag("FOO", sub {
211 my $pms = shift;
212 return $text;
213 });
214
215 See "Mail::SpamAssassin::Conf"'s "TEMPLATE TAGS" and "CAPTURING
216 TAGS USING REGEX NAMED CAPTURE GROUPS" sections for more details on
217 how template tags are used.
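
The naming rules for tags stated above can be checked before calling set_tag. This is a sketch of such a pre-check; normalize_tag_name is a hypothetical helper, and set_tag itself may enforce the rules differently.

```perl
use strict;
use warnings;

# Hypothetical pre-check mirroring the tag naming rules described above.
# Returns the uppercased name, or nothing if the name breaks a rule.
sub normalize_tag_name {
    my ($name) = @_;
    my $tag = uc $name;
    return if $tag !~ /\A[A-Z0-9_]+\z/;   # allowed characters only
    return if $tag =~ /__/;               # no consecutive underscores
    return if $tag =~ /\A_|_\z/;          # no leading or trailing underscore
    return $tag;
}

for my $candidate ('my_tag', '_FOO', 'FOO__BAR', 'FOO-BAR') {
    my $tag = normalize_tag_name($candidate);
    printf "%-10s %s\n", $candidate, defined $tag ? "ok ($tag)" : 'rejected';
}
```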
218
219 $string = $status->get_tag($tagname)
220 Get the current value of a template tag, as used in "add_header",
221 report templates, etc. This API is intended for use by plugins.
222 Tag names will be converted to an all-uppercase representation
223 internally.
224
225 See "Mail::SpamAssassin::Conf"'s "TEMPLATE TAGS" and "CAPTURING
226 TAGS USING REGEX NAMED CAPTURE GROUPS" sections for more details on
227 how template tags are used.
228
229 "undef" will be returned if a tag by that name has not been
230 defined.
231
232 $string = $status->get_tag_raw($tagname, @args)
233 Similar to "get_tag", but keeps a tag name unchanged (does not
234 uppercase it), and does not convert arrayref tag values into a
235 single string.
236
237 $status->set_spamd_result_item($subref)
238 Set an entry for the spamd result log line. $subref should be a
239 code reference for a subroutine which will return a string in
240 'name=VALUE' format, similar to the other entries in the spamd
241 result line:
242
243 Jul 17 14:10:47 radish spamd[16670]: spamd: result: Y 22 - ALL_NATURAL,
244 DATE_IN_FUTURE_03_06,DIET_1,DRUGS_ERECTILE,DRUGS_PAIN,
245 TEST_FORGED_YAHOO_RCVD,TEST_INVALID_DATE,TEST_NOREALNAME,
246 TEST_NORMAL_HTTP_TO_IP,UNDISC_RECIPS scantime=0.4,size=3138,user=jm,
247 uid=1000,required_score=5.0,rhost=localhost,raddr=127.0.0.1,
248 rport=33153,mid=<9PS291LhupY>,autolearn=spam
249
250 "name" and "VALUE" must not contain "=" or "," characters, as it is
251 important that these log lines are easy to parse.
252
253 The code reference will be called by spamd after the message has
254 been scanned, and the PerMsgStatus::check() method has returned.
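
Because "name" and "VALUE" must not contain "=" or "," characters, a plugin may want to sanitize both before handing a code reference to set_spamd_result_item. A minimal sketch follows; make_result_item is a hypothetical helper and the underscore replacement is an arbitrary choice.

```perl
use strict;
use warnings;

# Hypothetical helper: build a code reference that returns "name=VALUE",
# replacing the "=" and "," characters that the log format forbids.
sub make_result_item {
    my ($name, $value_cb) = @_;
    (my $safe_name = $name) =~ tr/=,/_/;
    return sub {
        my $value = $value_cb->();
        $value = '' unless defined $value;
        $value =~ tr/=,/_/;
        return "$safe_name=$value";
    };
}

my $item = make_result_item('lang', sub { 'en,de' });
print $item->(), "\n";

# In a plugin, with a real PerMsgStatus object in $status:
#   $status->set_spamd_result_item($item);
```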
255
256 $status->finish ()
257 Indicate that this $status object is finished with, and can be
258 destroyed.
259
260 If you are using SpamAssassin in a persistent environment, or
261 checking many mail messages from one "Mail::SpamAssassin" factory,
262 this method should be called to ensure Perl's garbage collection
263 will clean up old status objects.
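
When many messages are checked from one factory, the pattern is to call finish() on each status object as soon as its results have been read. The sketch below assumes $spamtest is a Mail::SpamAssassin object and that each element of @raw_messages is something parse() accepts; score_messages is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: score a batch of messages with one factory object,
# releasing the per-message objects after each check.
sub score_messages {
    my ($spamtest, @raw_messages) = @_;
    my @scores;
    for my $raw (@raw_messages) {
        my $mail   = $spamtest->parse($raw);
        my $status = $spamtest->check($mail);
        push @scores, $status->get_score();
        $status->finish();   # release the per-message status
        $mail->finish();     # release the parsed message as well
    }
    return @scores;
}
```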
264
265 $name = $status->get_current_eval_rule_name()
266 Return the name of the currently-running eval rule. "undef" is
267 returned if no eval rule is currently being run. Useful for
268 plugins to determine the current rule name while inside an eval
269 test function call.
270
271 $status->get_decoded_body_text_array ()
272 Returns the message body, with base64 or quoted-printable encodings
273 decoded, and non-text parts or non-inline attachments stripped.
274
275 This is the same result text as used in 'rawbody' rules.
276
277 It is returned as an array of strings, with each string being a
278 2-4kB chunk of the body, split from boundaries if possible.
279
280 $status->get_decoded_stripped_body_text_array ()
281 Returns the message body, decoded (as described in
282 get_decoded_body_text_array()), with HTML rendered, and with
283 whitespace normalized.
284
285 This is the same result text as used in 'body' rules.
286
287 It will always render text/html.
288
289 It is returned as an array of strings, with each string
290 representing one 'paragraph'. Paragraphs, in plain-text mails, are
291 double-newline-separated blocks of multi-line text.
292
293 $status->get (header_name [, default_value])
294 Returns a message header, pseudo-header or a real name, email-
295 address or some other parsed value set by modifiers. "header_name"
296 is the name of a mail header, such as 'Subject', 'To', etc.
297
        Should be called in list context since 4.0. Returns a list of
        header contents, or of other values when modifiers are used.
300
301 If "default_value" is given, it will be used if the requested
302 "header_name" does not exist. This is mainly useful when called in
303 scalar context to set 'undef' instead of legacy '' return value
304 when header does not exist.
305
306 Appending ":raw" modifier to the header name will inhibit decoding
307 of quoted-printable or base-64 encoded strings.
308
309 Appending ":addr" modifier to the header name will return all
310 email-addresses found in the header. It is mainly applicable to
311 header fields 'From', 'Sender', 'To', 'Cc' along with their
312 'Resent-*' counterparts, and the 'Return-Path'. For example, all
313 of the following will result in "example@foo" (and "example@bar"):
314
315 example@foo
316 example@foo (Foo Blah), <example@bar>
317 example@foo, example@bar
318 display: example@foo (Foo Blah), example@bar ;
319 Foo Blah <example@foo>
320 "Foo Blah" <example@foo>
321 "'Foo Blah'" <example@foo>
322
323 Appending ":name" modifier to the header name will return all
324 "display names" from the header field. As with ":addr", it is
325 mainly applicable to header fields 'From', 'Sender', 'To', 'Cc'
326 along with their 'Resent-*' counterparts, and the 'Return-Path'.
327 For example, all of the following will result in "Foo Blah" (and
328 "Bar Baz"). One level of single quotes is stripped too, as it is
329 often seen.
330
331 example@foo (Foo Blah)
332 example@foo (Foo Blah), "Bar Baz" <example@bar>
333 display: example@foo (Foo Blah), example@bar ;
334 Foo Blah <example@foo>
335 "Foo Blah" <example@foo>
336 "'Foo Blah'" <example@foo>
337
338 Appending ":host" to the header name will return the first
339 hostname-looking string that ends with a valid TLD. First it tries
340 to find a match after @ character (possible email), then from any
341 part of the header. Normal use of this would be for example
342 'From:addr:host' to return the hostname portion of a From-address.
343
344 Appending ":domain" to the header name implies ":host", but will
345 return only domain part of the hostname, as returned by
346 RegistryBoundaries::trim_domain().
347
348 Appending ":ip" to the header name, will return the first IPv4 or
349 IPv6 address string found. Could be used for example as
350 'X-Originating-IP:ip'.
351
352 Appending ":revip" to the header name implies ":ip", but will
353 return the found IP in reverse (usually for DNSBL usage).
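
To illustrate what a reversed IP looks like in a DNSBL query, here is a sketch for the IPv4 case only. It is not SpamAssassin's own implementation; reverse_ipv4 is a hypothetical helper and dnsbl.example.org is a placeholder zone.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse the octets of a dotted-quad IPv4 address.
sub reverse_ipv4 {
    my ($ip) = @_;
    my @octets = split /\./, $ip;
    return unless @octets == 4;
    return join '.', reverse @octets;
}

my $reversed = reverse_ipv4('192.0.2.10');
print "$reversed.dnsbl.example.org\n";   # query name for the placeholder zone
```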
354
355 Appending ":first" modifier to the header name will return only the
356 first (topmost) header, in case there are multiple ones. Similarly
357 ":last" will select the last one. These affect only the physical
358 header line selection. If selected header is parsed further with
359 ":addr" or similar, it may return multiple results, if the selected
360 header contains multiple addresses.
361
362 There are several special pseudo-headers that can be specified:
363
364 "ALL" can be used to mean the text of all the message's headers.
365 Each header is decoded and unfolded to single line, unless called
366 with :raw.
367 "ALL-TRUSTED" can be used to mean the text of all the message's
368 headers that could only have been added by trusted relays.
369 "ALL-INTERNAL" can be used to mean the text of all the message's
370 headers that could only have been added by internal relays.
371 "ALL-UNTRUSTED" can be used to mean the text of all the message's
372 headers that may have been added by untrusted relays. To make this
373 pseudo-header more useful for header rules the 'Received' header
374 that was added by the last trusted relay is included, even though
375 it can be trusted.
376 "ALL-EXTERNAL" can be used to mean the text of all the message's
377 headers that may have been added by external relays. Like
378 "ALL-UNTRUSTED" the 'Received' header added by the last internal
379 relay is included.
380 "ToCc" can be used to mean the contents of both the 'To' and 'Cc'
381 headers.
382 "EnvelopeFrom" is the address used in the 'MAIL FROM:' phase of the
383 SMTP transaction that delivered this message, if this data has been
384 made available by the SMTP server.
385 "MESSAGEID" is a symbol meaning all Message-Id's found in the
386 message; some mailing list software moves the real 'Message-Id' to
387 'Resent-Message-Id' or 'X-Message-Id', then uses its own one in the
388 'Message-Id' header. The value returned for this symbol is the
389 text from all 3 headers, separated by newlines.
390 "X-Spam-Relays-Untrusted" is the generated metadata of untrusted
391 relays the message has passed through
392 "X-Spam-Relays-Trusted" is the generated metadata of trusted relays
393 the message has passed through
394 "X-Spam-Relays-External" is the generated metadata of external
395 relays the message has passed through
396 "X-Spam-Relays-Internal" is the generated metadata of internal
397 relays the message has passed through
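
The modifiers above can be combined in ordinary calls to get(). The following sketch assumes $pms is a PerMsgStatus object and calls get() in list context throughout; sender_details is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: collect a few parsed values about the sender.
# $pms is assumed to be a PerMsgStatus object.
sub sender_details {
    my ($pms) = @_;
    my @addrs  = $pms->get('From:addr');        # all addresses in From
    my @names  = $pms->get('From:name');        # all display names in From
    my ($host) = $pms->get('From:addr:host');   # hostname of the From address
    my ($rcvd) = $pms->get('Received:first');   # topmost Received header only
    return {
        addrs    => \@addrs,
        names    => \@names,
        host     => $host,
        received => $rcvd,
    };
}
```
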
398 $status->get_uri_list ()
399 Returns an array of all unique URIs found in the message. It takes
400 a combination of the URIs found in the rendered (decoded and HTML
401 stripped) body and the URIs found when parsing the HTML in the
402 message. Will also set $status->{uri_list} (the array as returned
403 by this function).
404
405 The returned array will include the "raw" URI as well as "slightly
406 cooked" versions. For example, the single URI
407 'http://%77w%77.example.com/' will get turned into: (
408 'http://%77w%77.example.com/', 'http://www.example.com/' )
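
The relationship between the "raw" and "slightly cooked" forms in the example above can be illustrated with plain percent-decoding. This sketch covers only that one step; SpamAssassin's canonicalization does more, and cooked_variants is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: return the raw URI plus a percent-decoded variant,
# omitting the variant when decoding changes nothing.
sub cooked_variants {
    my ($raw) = @_;
    (my $decoded = $raw) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $decoded eq $raw ? ($raw) : ($raw, $decoded);
}

my @variants = cooked_variants('http://%77w%77.example.com/');
print "$_\n" for @variants;
```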
409
410 $status->get_uri_detail_list ()
411 Returns a hash reference of all unique URIs found in the message
412 and various data about where the URIs were found in the message.
413 It takes a combination of the URIs found in the rendered (decoded
414 and HTML stripped) body and the URIs found when parsing the HTML in
415 the message. Will also set $status->{uri_detail_list} (the hash
416 reference as returned by this function).
417
418 The hash format looks something like this:
419
420 raw_uri => {
421 types => { a => 1, img => 1, parsed => 1, domainkeys => 1,
422 unlinked => 1, schemeless => 1 },
423 cleaned => [ canonicalized_uri ],
424 anchor_text => [ "click here", "no click here" ],
425 domains => { domain1 => 1, domain2 => 1 },
426 hosts => { host1 => domain1, host2 => domain2 },
427 }
428
429 "raw_uri" is whatever the URI was in the message itself
430 (http://spamassassin.apache%2Eorg/). Uris parsed from text will be
431 prefixed with scheme if missing (http://, mailto: etc). HTML uris
432 are as found.
433
434 "types" is a hash of the HTML tags (lowercase) which referenced the
435 raw_uri. parsed is a faked type which specifies that the raw_uri
436 was seen in the rendered text. domainkeys is defined when raw_uri
437 was found from DK/DKIM d= field. unlinked is defined when it's
438 assumed that MUA will not linkify uri (found in body without scheme
439 or www. prefix). schemeless is always added for uris without
440 scheme, regardless of linkifying (i.e. email address found in body
441 without mailto:).
442
443 "cleaned" is an array of the raw and canonicalized version of the
444 raw_uri (http://spamassassin.apache%2Eorg/,
445 https://spamassassin.apache.org/).
446
447 "anchor_text" is an array of the anchor text (text between <a> and
448 </a>), if any, which linked to the URI.
449
450 "domains" is a hash of the domains found in the canonicalized URIs.
451
452 "hosts" is a hash of unstripped hostnames found in the
453 canonicalized URIs as hash keys, with their domain part stored as a
454 value of each hash entry.
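
A plugin usually walks this structure key by key. The sketch below collects the domains of URIs that were referenced by an <a> tag; linked_domains is a hypothetical helper, and the sample data is made up to follow the hash format shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: list the domains of all URIs referenced by <a> tags.
# $detail is assumed to have the shape returned by get_uri_detail_list().
sub linked_domains {
    my ($detail) = @_;
    my %seen;
    for my $raw_uri (keys %$detail) {
        my $info = $detail->{$raw_uri};
        next unless $info->{types}{a};
        $seen{$_} = 1 for keys %{ $info->{domains} || {} };
    }
    my @domains = sort keys %seen;
    return @domains;
}

# Made-up sample data in the documented format.
my $detail = {
    'http://www.example.com/' => {
        types   => { a => 1, parsed => 1 },
        cleaned => ['http://www.example.com/'],
        domains => { 'example.com' => 1 },
        hosts   => { 'www.example.com' => 'example.com' },
    },
    'http://img.example.net/x.png' => {
        types   => { img => 1 },
        cleaned => ['http://img.example.net/x.png'],
        domains => { 'example.net' => 1 },
        hosts   => { 'img.example.net' => 'example.net' },
    },
};
print join(',', linked_domains($detail)), "\n";
```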
455
456 $status->add_uri_detail_list ($raw_uri, $types, $source, $valid_domain)
457 Adds values to internal uri_detail_list. When used from Plugins,
458 recommended to call from parsed_metadata (along with
459 register_method_priority, -10) so other Plugins calling
460 get_uri_detail_list() will see it.
461
462 "raw_uri" is the URI to be added. The only required parameter.
463
464 "types" is an optional hash reference, contents are added to
465 uri_detail_list->{types} (see get_uri_detail_list for known keys).
        parsed is the default if no hash is given. nocanon does not run
467 uri_list_canonicalize (no redirector, uri fixing). noclean skips
468 adding uri_detail_list->{cleaned}, so it would not be used in "uri"
469 rule checks, but domain/hosts would still be used for URIBL/RBL
470 purposes.
471
472 "source" is an optional simple string, only used for debug logging
473 purposes to identify where uri originates from (default: "parsed").
474
475 "valid_domain" is an optional boolean (0/1). If true, uri will not
476 be added unless hostname/domain is in valid format and contains a
477 valid TLD. (default: 0)
478
479 $status->clear_test_state()
480 DEPRECATED, UNNEEDED SINCE 4.0
481
482 $status->got_hit ($rulename, $desc_prepend [, name => value, ...])
483 Register a hit against a rule in the ruleset.
484
        There are two mandatory arguments. These are $rulename, the name of
        the rule that fired, and $desc_prepend, which is a short string
        that will be prepended to the rule's "describe" string in output
        reports.
489
490 In addition, callers can supplement that with the following
491 optional data:
492
493 score => $num
494 Optional: the score to use for the rule hit. If unspecified,
495 the value from the "Mail::SpamAssassin::Conf" object's
496 "{scores}" hash will be used (a configured score), and in its
497 absence the "defscore" option value.
498
499 defscore => $num
500 Optional: the score to use for the rule hit if neither the
501 option "score" is provided, nor a configured score value is
502 provided.
503
504 value => $num
            Optional: the value to assign to the rule; the default value is
            1. Rules with tflags multiple use values greater than 1 to
            indicate multiple hits. This value is accessible to meta
            rules.
509
510 ruletype => $type
511 Optional, but recommended: the rule type string. This is used
512 in the "hit_rule" plugin call, called by this method. If
513 unset, 'unknown' is used.
514
515 tflags => $string
516 Optional: a string, i.e. a space-separated list of additional
517 tflags to be appended to an existing list of flags in
518 $self->{conf}->{tflags}, such as: "nice noautolearn multiple".
519 No syntax checks are performed.
520
521 description => $string
522 Optional: a custom rule description string. This is used in
523 the "hit_rule" plugin call, called by this method. If unset,
524 the static description is used.
525
        Backward compatibility: the two mandatory arguments have been part
        of this API since SpamAssassin 2.x. The optional name => value
        pairs, however, are a new addition in SpamAssassin 3.2.0.
529
530 $status->rule_ready ($rulename [, $no_async])
        Mark an asynchronous rule ready, so it can be considered for meta
        rule evaluation. An asynchronous rule is a rule whose eval-function
        returns undef, marking that it's not ready yet and that results are
        expected later. $status->rule_ready() must be called later to mark
        it ready; alternatively, $status->got_hit() also does this. If
        neither is called, then any meta rule that depends on this rule
        might not evaluate.
538
539 Optional boolean $no_async skips checking if there are pending
540 async DNS lookups for the rule.
541
542 $status->test_log ($text [, $rulename])
543 Add $text log entry for a hit rule in final message REPORT/SUMMARY.
544
        Usually called just before got_hit(), to describe for example what
        URI the rule matched on. The optional $rulename argument is
        recommended to make sure the log is written to the correct rule. If
        $rulename is not provided, get_current_eval_rule_name() is used as
        a fallback.
550
551 Can be called multiple times per rule for additional entries.
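
The usual pairing of test_log() and got_hit() can be sketched as follows. It assumes $pms is a PerMsgStatus object inside a plugin; report_uri_hit is a hypothetical helper, and the 'eval' rule type and the log text are illustrative choices.

```perl
use strict;
use warnings;

# Hypothetical helper: log what the rule matched on, then register the hit.
# $pms is assumed to be a PerMsgStatus object.
sub report_uri_hit {
    my ($pms, $rulename, $uri) = @_;
    $pms->test_log("URI: $uri", $rulename);             # entry for the report
    $pms->got_hit($rulename, '', ruletype => 'eval');   # empty $desc_prepend
    return;
}
```

Passing $rulename to test_log() explicitly keeps the entry attached to the right rule even when the helper runs outside an eval test function call.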
552
553 $status->create_fulltext_tmpfile (fulltext_ref)
554 This function creates a temporary file containing the passed scalar
555 reference data. If no scalar is passed, full/pristine message text
556 is assumed. This is typically used by external programs like pyzor
557 and dccproc, to avoid hangs due to buffering issues.
558
559 All tempfiles are automatically cleaned up by PerMsgStatus
560 destructor.
561
562 $status->delete_fulltext_tmpfile (tmpfile)
        Cleans up after a $status->create_fulltext_tmpfile() call.
        Deletes the temporary file and uncaches the filename. Generally
        there is no need to call this, since the PerMsgStatus destructor
        cleans up all tmpfiles.
567
568 all_from_addrs_domains
569 This function returns all the various from addresses in a message
570 using all_from_addrs() and then returns only the domain names.
571
SEE ALSO
    Mail::SpamAssassin(3) spamassassin(1)

perl v5.36.0                      2023-01-21   Mail::SpamAssassin::PerMsgStatus(3)