Mail::SpamAssassin::Plugin::TxRep(3)  User Contributed Perl Documentation  Mail::SpamAssassin::Plugin::TxRep(3)

NAME
Mail::SpamAssassin::Plugin::TxRep - Normalize scores with sender
reputation records

SYNOPSIS
The TxRep (Reputation) plugin is designed as an improved replacement
for the AWL (Auto-Whitelist) plugin. It adjusts the final spam score of
a message by looking up the reputation of the sender and taking it into
consideration.

To try TxRep out, you first have to disable the AWL plugin (if
enabled) and back up its database. AWL is loaded in v310.pre and can
be disabled by commenting out the loadplugin line:

    # loadplugin Mail::SpamAssassin::Plugin::AWL

As long as AWL is not disabled, TxRep will refuse to run.

TxRep should be enabled by uncommenting the following line in v341.pre:

    loadplugin Mail::SpamAssassin::Plugin::TxRep

Use the supplied 60_txreputation.cf file or add these lines to a .cf
file:

    header   TXREP eval:check_senders_reputation()
    describe TXREP Score normalizing based on sender's reputation
    tflags   TXREP userconf noautolearn
    priority TXREP 1000

DESCRIPTION
This plugin is intended to replace the former AWL (AutoWhiteList).
Although the concept and the scope differ, the purpose remains the
same: normalizing spam score results based on the sender's previous
history. The name was intentionally changed from "whitelist" to
"reputation" to avoid any confusion, since the resulting score can be
adjusted in both directions.

The TxRep plugin keeps track of the average SpamAssassin score for
senders. Senders are tracked using multiple identifiers, or their
combinations: the From: email address, the originating IP and/or an
originating block of IPs, the sender's domain name, the DKIM signature,
and the HELO name. TxRep then uses the average score to reduce the
variability in scoring from message to message, and modifies the final
score by pushing the result towards the historical average. This
improves the accuracy of filtering for most email.

In comparison with the original AWL plugin, several conceptual changes
were implemented in TxRep:

1. Scoring - AWL tracks the number of messages received from each
sender, but it ignores that count when calculating the corrective
score for a new message. For example, if a sender previously sent a
single ham message with a score of -5 and then sends a second one with
a score of +10, AWL issues a corrective score that pulls the result
towards -5. With the default "auto_whitelist_factor" of 0.5, the
resulting score would be only 2.5, and it would be exactly the same
even if the sender had previously sent 1,000 messages with an average
of -5. TxRep tries to take maximal advantage of the collected data: it
adjusts the final score using not only the mean reputation score stored
in the database, but also the number of messages already seen from the
sender. The exact formula is given in the section "txrep_factor".

2. Learning - AWL ignores any spam/ham learning. In fact it acts
against it, which often leads to a frustrating situation: a user
repeatedly tags all messages from a given sender as spam (or ham), but
for every new message from that sender AWL adjusts the score back
towards the historical average, which does not include the learned
scores. TxRep changes this: every spam/ham learning is recorded in the
reputation database and is therefore taken into consideration for
future email from the same sender. See the section "LEARNING SPAM /
HAM" for more details.

3. Auto-Learning - in certain situations SpamAssassin may declare a
message obvious spam or ham and launch the auto-learning process, so
that the message can be re-evaluated. AWL, by design, did not perform
any auto-learning adjustments. This plugin readjusts the stored
reputation by the value defined by "txrep_learn_penalty" or
"txrep_learn_bonus", respectively. Auto-learning score thresholds may
be tuned, or auto-learning disabled completely, through the setting
"txrep_autolearn".

4. Relearning - messages that were wrongly learned or auto-learned can
be relearned. The old reputations are removed from the database and new
ones are added in their place. Relearning works better when message
tracking is enabled through the "txrep_track_messages" option. Without
it, the relearned score is simply added to the reputation, and the old
scores are not removed.

5. Aging - with AWL, every historical record of a given sender has the
same weight. As a result, changes in a sender's behavior or modified SA
rules may take a long time to show, or may be virtually negated by the
AWL normalization, especially for senders with a high count of past
messages and a low recent frequency. This turns out to be particularly
counterproductive when the administrator detects new patterns in
certain messages and applies new rules to better tag such messages as
spam or ham. AWL practically eliminates the effect of the new rules by
adjusting the score back towards the (wrong) historical average. Only
lowering the "auto_whitelist_factor" would help, but at the same time
it would reduce the overall impact of AWL and call its purpose into
question. Besides "txrep_factor" (the replacement for
"auto_whitelist_factor"), TxRep also introduces
"txrep_dilution_factor", which helps with this issue by progressively
reducing the impact of past records. More details can be found in the
description of that factor below.

6. Blacklisting and Whitelisting - when whitelisting or blacklisting is
requested through SpamAssassin's API, AWL adjusts the historical total
score of the plain email address without IP (and deletes the records
bound to an IP). However, since new records with an IP are added during
reception, the blacklisted entry ceases to act during scanning. TxRep
always uses the record of the plain email address without IP together
with the one bound to an IP address, DKIM signature, or SPF pass
(unless the weight factor for the EMAIL reputation is set to zero). AWL
uses a score of 100 (or -100) for blacklisting (or whitelisting). TxRep
scales this value according to the weight factor of the EMAIL
reputation, as explained in detail in the section "BLACKLISTING /
WHITELISTING". TxRep can also blacklist or whitelist IP addresses,
domain names, and dotless HELO names.

7. Sender Identification - AWL identifies a sender by the email address
used and the originating IP address (more precisely, the part of it
defined by the mask setting). The main purpose of this measure is to
avoid assigning false good scores to spammers who spoof known email
addresses. The disadvantage shows with senders who send from frequently
changing locations, or who connect through dynamic IP addresses that
are not within the block defined by the mask setting. Their score is
difficult or sometimes impossible to track. Another disadvantage shows,
for example, with a spammer persistently sending spam from the same IP
address under different email addresses. AWL will not find the previous
scores unless the spammer reuses the same email address. TxRep uses
several identifiers and creates a separate database entry for each of
them. It tracks not only the email/IP address combination like AWL, but
also the standalone email address (regardless of the originating IP),
the standalone IP (regardless of the email address used), the domain
name of the email address, the DKIM signature, and the HELO name of the
connecting host. The influence of each identifier may be tuned with the
weight factors described in the section "REPUTATION WEIGHTS".

8. Message Tracking - TxRep (optionally) keeps track of the message IDs
of already scanned and/or learned messages. This prevents the
reputation score from being strengthened simply by rescanning or
relearning the same message multiple times. It also allows proper
relearning of messages that were once wrongly learned, or relearning
them after the learn penalty or bonus has been changed. See the option
"txrep_track_messages".

9. User and Global Storages - a per-user setup of SpamAssassin is
usually recommended, because each user may have quite different
requirements and may receive quite different kinds of email. Especially
with the Bayesian and AWL plugins, efficiency is much better when
SpamAssassin is trained on spam and ham separately for each user. The
disadvantage is that senders and emails already learned many times by
different users have to be relearned, without any recognized history,
whenever they reach another user. TxRep combines the advantages of both
systems. It can use dual storages: a global common storage, where all
email processed by SpamAssassin is recorded, and a local storage
separate for each user, with reputation data from that user's email
only. See the setting "txrep_user2global_ratio" for more details.

10. Outbound Whitelisting - when a local user sends a message to an
email address, we assume that the user also needs to see any reply,
hence the recipient's address should be whitelisted. When SpamAssassin
also scans outgoing email, local users send their email through the
SMTP server where SA is installed, and internal networks are defined,
TxRep improves the reputation of all 'To:' and 'CC' addresses from
messages originating in the internal networks. Details can be found at
the setting "txrep_whitelist_out".

The two plugins (AWL and TxRep) cannot coexist. AWL must be disabled to
allow TxRep to run. TxRep reuses the database handling of the original
AWL module, and some of its parameters bound to the database handler
modules. By default, TxRep creates its own database, but the original
auto-whitelist can be reused as a starting point: rename the AWL
database to the name defined in the TxRep settings, and TxRep will
start using it. Back up the original auto-whitelist database first, so
that you can switch back to the original state.

The spamassassin/Plugin/TxRep.pm file replaces both
spamassassin/Plugin/AWL.pm and spamassassin/AutoWhitelist.pm. Two other
AWL files, spamassassin/DBBasedAddrList.pm and
spamassassin/SQLBasedAddrList.pm, are still needed.

TEMPLATE TAGS
This plugin module adds the following "tags" that can be used as
placeholders in certain options. See Mail::SpamAssassin::Conf for more
information on TEMPLATE TAGS.

    _TXREPXXXY_          TXREP modifier
    _TXREPXXXYMEAN_      Mean score on which TXREP modification is based
    _TXREPXXXYCOUNT_     Number of messages on which TXREP modification is based
    _TXREPXXXYPRESCORE_  Score before TXREP
    _TXREPXXXYUNKNOWN_   New sender (not found in the TXREP list)

The XXX part of the tag takes the form of one of the following IDs,
depending on the reputation checked: EMAIL, EMAILIP, IP, DOMAIN, or
HELO. The Y suffix is used only in the case of dual storage, and takes
the form of either U (for user storage reputations), or G (for global
storage reputations).

USER PREFERENCES
The following options can be used in both site-wide ("local.cf") and
user-specific ("user_prefs") configuration files to customize how
SpamAssassin handles incoming email messages.

use_txrep
0 | 1 (default: 0)

Whether to use the TxRep reputation system. TxRep tracks the long-term
average score for each sender and then shifts the score of new
messages toward that long-term average. This can increase or decrease
the score for messages, depending on the long-term behavior of the
particular correspondent.

Note that certain tests are ignored when determining the final message
score:

- rules with tflags set to 'noautolearn'

txrep_factor
range [0..1] (default: 0.5)

How much towards the long-term mean for the sender to regress a
message. Basically, the algorithm is to track the long-term total
score and the count of messages for the sender ("total" and
"count"), and then once we have otherwise fully calculated the
score for this message ("score"), we calculate the final score for
the message as:

    finalscore = score + factor * ((total + score)/(count + 1) - score)

So if "factor" = 0.5, then we'll move half way from the calculated
score to the new mean value. If "factor" = 0.3, then we'll move about
1/3 of the way from the score toward the mean. "factor" = 1 means use
the long-term mean, including the new unadjusted score; "factor" = 0
means just use the calculated score, which disables the score
averaging, though the reputation is still recorded in the database.
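
As a rough illustration of the description above, the following Python
sketch assumes that the adjustment is the difference between the new
mean and the message score, scaled by the factor, so that a factor of 1
yields the mean and a factor of 0 leaves the score untouched. This is
an interpretation of the text, not the plugin's code, and the function
names are hypothetical. The AWL function reproduces the 2.5 example
from the comparison with AWL in the DESCRIPTION.

```python
def awl_final_score(score, mean, factor=0.5):
    """AWL-style correction: pull the score towards the stored mean."""
    return score + (mean - score) * factor


def txrep_final_score(score, total, count, factor=0.5):
    """TxRep-style correction as described in the text (an interpretation).

    The new mean includes the current, unadjusted score, so the number of
    messages already seen influences how far the score is moved.
    """
    new_mean = (total + score) / (count + 1)
    return score + factor * (new_mean - score)


# One earlier ham message at -5, new message scoring +10.
print(awl_final_score(10, -5))        # 2.5
print(txrep_final_score(10, -5, 1))   # 6.25
# 1,000 earlier messages averaging -5: the long history now dominates.
print(txrep_final_score(10, -5000, 1000))
```

With a single earlier message the new score keeps much of its weight,
while a long history pulls the result close to the AWL value.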

txrep_dilution_factor
range [0.7..1.0] (default: 0.98)

With every new email from a given sender, the historical reputation
records are "diluted", or "watered down", by a fraction given by this
factor. This means that the influence of old records progressively
diminishes with every new message from that sender. This is important
for more flexible handling of changes in a sender's behavior, or of
improvements and changes to local SA rules.

Without any dilution (dilution factor set to 1), the new message score
is simply added to the total score of the sender in the reputation
database. When dilution is used (factor < 1), the impact of the
historical reputation average is reduced by the factor before the new
average is calculated, which in turn is used to adjust the new total
score to be stored in the database.

    newtotal = (oldcount + 1) * (newscore + dilution * oldtotal) / (dilution * oldcount + 1)

In other words, the older a message is, the less impact its original
spam score has on the new average. For example, if we set the factor
to 0.9 (meaning dilution by 10%), the score of the new message is
recorded at 100%, the previous score of the same sender at 90%, the
one before that at 81% (0.9 * 0.9 = 0.81), and the 10th last message
at only about 35%.
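
The formula and the percentages above can be checked with a short
Python sketch. The function name is hypothetical. The loop only prints
the nominal weights quoted in the text (powers of the factor) and makes
no claim about how the plugin stores individual scores.

```python
def diluted_total(old_total, old_count, new_score, dilution=0.98):
    """New total score stored for a sender, following the formula above."""
    new_count = old_count + 1
    return new_count * (new_score + dilution * old_total) / (dilution * old_count + 1)


# Without dilution the new score is simply added to the old total.
print(diluted_total(-50, 10, 10, dilution=1.0))   # -40.0
# With 10% dilution the history counts for less.
print(diluted_total(-50, 10, 10, dilution=0.9))

# Nominal weight of the n-th last message for a factor of 0.9.
for n in (0, 1, 2, 10):
    print(n, round(0.9 ** n, 2))
```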

On stable systems, we recommend keeping the factor close to 1 (but
still lower than 1). On systems where SA rule tuning and spam learning
are still in progress, lower factors help the reputation adapt more
quickly to any modifications. At the same time, however, they also
reduce the impact of the historical reputation.

txrep_learn_penalty
range [0..200] (default: 20)

When SpamAssassin is trained on a SPAM message, the given penalty
score is added to the total reputation score of the sender, regardless
of the real spam score. The more messages the sender already has in
the TxRep database, the smaller the impact of the penalty.

txrep_learn_bonus
range [0..200] (default: 20)

When SpamAssassin is trained on a HAM message, the given bonus score
is deducted from the total reputation score of the sender, regardless
of the real spam score. The more messages the sender already has in
the TxRep database, the smaller the impact of the bonus.
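
The shrinking impact can be illustrated with a small Python sketch. It
rests on an assumption that is not stated in this document: the mean
reputation is taken as total / count, and the message count is left
unchanged by the learning step. The function name is hypothetical.

```python
def mean_shift_after_learning(count, adjustment=20):
    """Change of the mean reputation when `adjustment` is added to the total.

    Assumption for illustration only: the mean is total / count and the
    message count itself is not changed by the learning step.
    """
    return adjustment / count


# The same penalty of 20 moves the mean less as the history grows.
for count in (1, 10, 100):
    print(count, mean_shift_after_learning(count))
```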

txrep_autolearn
range [0..5] (default: 0)

When SpamAssassin declares a message clear spam or ham during the
message scan and launches the auto-learn process, the sender
reputation scores of that message are adjusted by the value of the
option "txrep_learn_penalty" or "txrep_learn_bonus", respectively, in
the same way as during manual learning. A value of 0 disables the
auto-learn reputation adjustment; only the score calculated before the
auto-learn is then stored in the reputation database.

311 txrep_track_messages
312 0 | 1 (default: 1)
313
314 Whether TxRep should keep track of already scanned and/or learned
315 messages. When enabled, an additional record in the reputation
316 database will be created to avoid false score adjustments due to
317 repeated scanning of the same message, and to allow proper
318 relearning of messages that were either previously wrongly learned,
319 or need to be relearned after modifying the learn penalty or bonus.
320
321 txrep_whitelist_out
322 range [0..200] (default: 10)
323
When the value of this setting is greater than zero, recipients of
messages sent from within the internal networks are whitelisted by
improving their total reputation score by the number of points
defined by this setting. Since the IP address and other sender
identifiers are not known when the email is sent, only the reputation
of the standalone email address is improved. The domain name is
intentionally left unaffected as well. Outbound whitelisting can only
work when SpamAssassin is set up to scan outgoing email too, when
local users use the SMTP server for sending email, and when
"internal_networks" are defined in the SpamAssassin configuration. The
reputation is improved with every message sent from the internal
networks, so the more messages are sent to a recipient, the better the
reputation of that email address becomes.
338
339 txrep_ipv4_mask_len
340 range [0..32] (default: 16)
341
342 The AWL database keeps only the specified number of most-
343 significant bits of an IPv4 address in its fields, so that
344 different individual IP addresses within a subnet belonging to the
345 same owner are managed under a single database record. As we have
346 no information available on the allocated address ranges of
347 senders, this CIDR mask length is only an approximation. The
348 default is 16 bits, corresponding to a former class B. Increase the
349 number if a finer granularity is desired, e.g. to 24 (class C) or
350 32. A value 0 is allowed but is not particularly useful, as it
351 would treat the whole internet as a single organization. The number
352 need not be a multiple of 8, any split is allowed.
353
354 txrep_ipv6_mask_len
355 range [0..128] (default: 48)
356
357 The AWL database keeps only the specified number of most-
358 significant bits of an IPv6 address in its fields, so that
359 different individual IP addresses within a subnet belonging to the
360 same owner are managed under a single database record. As we have
361 no information available on the allocated address ranges of
362 senders, this CIDR mask length is only an approximation. The
363 default is 48 bits, corresponding to an address range commonly
364 allocated to individual (smaller) organizations. Increase the
365 number for a finer granularity, e.g. to 64 or 96 or 128, or
366 decrease for wider ranges, e.g. 32. A value 0 is allowed but is
367 not particularly useful, as it would treat the whole internet as a
368 single organization. The number need not be a multiple of 4, any
369 split is allowed.
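
The effect of the two mask lengths can be tried out with Python's
standard ipaddress module. This is only a sketch of the masking idea.
The function name is hypothetical, the addresses come from the
documentation ranges, and the way TxRep itself stores the masked
address is not shown here.

```python
import ipaddress


def masked_block(address, ipv4_mask_len=16, ipv6_mask_len=48):
    """Return the network block that an address falls into after masking."""
    ip = ipaddress.ip_address(address)
    mask_len = ipv4_mask_len if ip.version == 4 else ipv6_mask_len
    # strict=False lets us pass a host address and get its enclosing network.
    return ipaddress.ip_network(f"{ip}/{mask_len}", strict=False)


print(masked_block("192.0.2.77"))                     # 192.0.0.0/16
print(masked_block("192.0.2.77", ipv4_mask_len=24))   # 192.0.2.0/24
print(masked_block("2001:db8:1234:5678::1"))          # 2001:db8:1234::/48
```

Two senders whose addresses map to the same block share one record, so
a longer mask gives finer granularity at the cost of more records.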
370
371 user_awl_sql_override_username
372 string (default: undefined)
373
374 Used by the SQLBasedAddrList storage implementation.
375
376 If this option is set the SQLBasedAddrList module will override the
377 set username with the value given. This can be useful for
378 implementing global or group based TxRep databases.
379
380 txrep_user2global_ratio
381 range [0..10] (default: 0)
382
383 When the option txrep_user2global_ratio is set to a value greater
384 than zero, and if the server configuration allows it, two data
385 storages will be used - user and global (server-wide) storages.
386
387 User storage keeps only senders who send messages to the respective
388 recipient, and will reflect also the corrected/learned scores, when
389 some messages are marked by the user as spam or ham, or when the
390 sender is whitelisted or blacklisted through the API of
391 SpamAssassin.
392
393 Global storage keeps the reputation data of all messages processed
394 by SpamAssassin with their spam scores and spam/ham learning data
395 from all users on the server. Hence, the module will return a
396 reputation value even at senders not known to the current
397 recipient, as long as he already sent email to anyone else on the
398 server.
399
400 The value of the txrep_user2global_ratio parameter controls the
401 impact of each of the two reputations. When equal to 1, both the
402 global and the user score will have the same impact on the result.
403 When set to 2, the reputation taken from the user storage will have
404 twice the impact of the global value. The final value of the TXREP
405 tag will be calculated as follows:
406
407 total = ( ratio * user + global ) / ( ratio + 1 )
408
409 When no reputation is found in the user storage, and a global
410 reputation is available, the global storage is used fully, without
411 applying the ratio.
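
A minimal Python sketch of this combination follows. The function name
is hypothetical, and using None for "not found in the user storage" is
an assumption made for the example.

```python
def combined_reputation(user, global_, ratio):
    """Combine user and global reputations as in the formula above.

    `user` may be None when the sender is unknown in the user storage;
    the global value is then used as is, without applying the ratio.
    """
    if user is None:
        return global_
    return (ratio * user + global_) / (ratio + 1)


print(combined_reputation(-4, 2, ratio=1))     # -1.0
print(combined_reputation(-4, 2, ratio=2))     # -2.0
print(combined_reputation(None, 2, ratio=2))   # 2
```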
412
When the ratio is set to zero, only the default storage is used.
Whether that is the global or the local user storage is in turn
controlled either by the parameter user_awl_sql_override_username (in
case of SQL storage), or by the "auto_whitelist_path" parameter (in
case of a Berkeley database).

When dual storage is enabled and no global storage is defined by the
above-mentioned parameters for the Berkeley or SQL databases, TxRep
attempts to use a generic storage: the user 'GLOBAL' in case of SQL,
and in case of a Berkeley database the path
'__local_state_dir__/tx-reputation', which typically expands to
/var/db/spamassassin/tx-reputation. When the default storages are not
available or not writable, you have to set the global storage with the
"user_awl_sql_override_username" or "auto_whitelist_path" setting,
respectively.
430
Please note that some SpamAssassin installations always run under the
same user ID. In such a case it is pointless to enable dual storage,
because at best it would lead to two identical global storages in
different locations.
435
436 This feature is disabled by default.
437
438 auto_whitelist_distinguish_signed
439 (default: 1 - enabled)
440
441 Used by the SQLBasedAddrList storage implementation.
442
443 If this option is set the SQLBasedAddrList module will keep
444 separate database entries for DKIM-validated e-mail addresses and
445 for non-validated ones. Without this option, or for domains that do
446 not use a DKIM signature, the reputation of legitimate email can
447 get mixed with the reputation of forgeries. A pre-requisite when
448 setting this option is that a field txrep.signedby exists in a SQL
449 table, otherwise SQL operations will fail. A DKIM plugin must also
450 be enabled in order for this option to take effect. This option is
451 highly recommended. Unless you are using a pre-3.3.0 database
452 schema and cannot upgrade, there is no reason to disable this
453 option. If you are upgrading from AWL and using a pre-3.3.0 schema,
454 the txrep.signedby column will not exist. It is recommended that
455 you add this column, but if that is not possible you must set this
456 option to 0 to avoid SQL errors.
457
458 txrep_spf
459 0 | 1 (default: 1)
460
When enabled, TxRep will treat any IP address using a given email
address as the same authorized identity, and will not associate any
IP address with it. (The same happens with valid DKIM signatures; no
option is available for DKIM.)

Note: for domains that define the useless SPF +all (pass all), no IP
would ever be associated with the email address, and all addresses
(including forged ones) would be treated as coming from the
authorized source. However, such domains are hopefully rare, and ask
for this kind of treatment anyway.
471
472 REPUTATION WEIGHTS
473 The overall reputation of the sender comprises several elements:
474
475 1) The reputation of the 'From' email address bound to the originating
476 IP address fraction (see the mask parameters for details)
477 2) The reputation of the 'From' email address alone (regardless the IP
478 address being currently used)
479 3) The reputation of the domain name of the 'From' email address
480 4) The reputation of the originating IP address, regardless of sender's
481 email address
482 5) The reputation of the HELO name of the originating computer (if
483 available)
484
Each of these partial reputations is weighted with the help of these
parameters, and the overall reputation is calculated as the weighted
sum of the individual reputations divided by the sum of all their
weights:

    sender_reputation = ( weight_email    * rep_email    +
                          weight_email_ip * rep_email_ip +
                          weight_domain   * rep_domain   +
                          weight_ip       * rep_ip       +
                          weight_helo     * rep_helo ) / total_weight

    total_weight = weight_email + weight_email_ip + weight_domain + weight_ip + weight_helo
494
495 You can disable the individual partial reputations by setting their
496 respective weight to zero. This will also reduce the size of the
497 database, since each partial reputation requires a separate entry in
498 the database table. Disabling some of the partial reputations in this
499 way may also help with the performance on busy servers, because the
500 respective database lookups and processing will be skipped too.
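
The weighting can be sketched in Python as follows. The sketch follows
the wording above, that is, the weighted sum divided by the sum of the
weights, and uses the default weights listed below. The names are
hypothetical, and treating a missing partial reputation as 0 is an
assumption made for the example.

```python
DEFAULT_WEIGHTS = {"email": 3, "email_ip": 10, "domain": 2, "ip": 4, "helo": 0.5}


def sender_reputation(reputations, weights=DEFAULT_WEIGHTS):
    """Weighted mean of the partial reputations.

    Assumption: a partial reputation missing from `reputations` counts as 0.
    A weight of zero removes the respective partial reputation entirely.
    """
    total_weight = sum(weights.values())
    weighted_sum = sum(w * reputations.get(key, 0.0) for key, w in weights.items())
    return weighted_sum / total_weight


partial = {"email": 2, "email_ip": 4, "domain": 0, "ip": -1, "helo": 0}
print(round(sender_reputation(partial), 3))
```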
501
502 txrep_weight_email
503 range [0..10] (default: 3)
504
This weight factor controls the influence of the reputation of the
standalone email address, regardless of the originating IP address.
When adjusting the weight, keep in mind that an email address can be
easily spoofed, so spammers can use 'from' email addresses belonging
to senders with a good reputation. From this point of view, the email
address bound to the originating IP address is a more reliable
indicator of the overall reputation.

On the other hand, some reputable senders may send from a larger
number of IP addresses, so looking up the reputation of the standalone
email address without regard to the originating IP makes some sense
too.
517
518 We recommend using a relatively low value for this partial
519 reputation.
520
521 txrep_weight_email_ip
522 range [0..10] (default: 10)
523
524 This is the standard reputation used in the same way as it was by
525 the original AWL plugin. Each sender's email address is bound to
526 the originating IP, or its part as defined by the
527 txrep_ipv4_mask_len or txrep_ipv6_mask_len parameters.
528
529 At a user sending from multiple locations, diverse mail servers, or
530 from a dynamic IP range out of the masked block, his email address
531 will have a separate reputation value for each of the different
532 (partial) IP addresses.
533
When the option auto_whitelist_distinguish_signed is enabled, TxRep,
contrary to the original AWL module, does not record the IP address
when a DKIM signature is detected. The email address is then not bound
to any IP address, but only to the DKIM signature, since the signature
is considered to authenticate the sender more reliably than the IP
address (which can also vary).
540
541 This is by design the most relevant reputation, and its weight
542 should be kept high.
543
544 txrep_weight_domain
545 range [0..10] (default: 2)
546
Some spammers may always use their real domain name in the email
address, just with multiple or changing local parts. This reputation
records the spam scores of all messages sent from the respective
domain, regardless of the local part (user name) used.

As with the email_ip reputation, the domain reputation is also bound
to the originating address (or a masked block, if mask parameters are
used). This avoids giving a false reputation based on spoofed email
addresses.
557
558 In case of a DKIM signature detected, the signature signer is used
559 instead of the domain name extracted from the email address. It is
560 considered that the signing authority is responsible for sending
561 email of any domain name, hence the same reputation applies here.
562
563 The domain reputation will give relevant picture about the owner of
564 the domain in case of small servers, or corporation with strict
565 policies, but will be less relevant for freemailers like Gmail,
566 Hotmail, and similar, because both ham and spam may be sent by
567 their users.
568
569 The default value is set relatively low. Higher weight values may
570 be useful, but we recommend caution and observing the scores before
571 increasing it.
572
573 txrep_weight_ip
574 range [0..10] (default: 4)
575
576 Spammers can send through the same relay (incl. compromised hosts)
577 under a multitude of email addresses. This is the exact case when
578 the IP reputation can help. This reputation is a kind of a local
579 RBL.
580
581 The weight is set by default lower than for the email_IP
582 reputation, because there may be cases when the same IP address
583 hosts both spammers and acceptable senders (for example the
584 marketing department of a company sends you spam, but you still
585 need to get messages from their billing address).
586
587 txrep_weight_helo
588 range [0..10] (default: 0.5)
589
A large number of spam messages come from compromised hosts, often
personal computers or set-top boxes. Their NetBIOS names are usually
used as the HELO name when connecting to your mail server. Some of
the names are pretty generic and hence may be shared by a large number
of hosts, but often the names are quite unique and may be a good
indicator for detecting a spammer, even when different email and IP
addresses are used (spam can also come from portable devices).

No IP address is bound to the HELO name when it is stored in the
reputation database. This is intentional, despite the possibility
that numerous devices may share some of the HELO names.

This option is still considered experimental, hence the low weight
value, but after some testing it could likely be increased at least
slightly.
606
ADMINISTRATOR SETTINGS
These settings differ from the ones above, in that they are considered
'more privileged', even more than the ones in the PRIVILEGED SETTINGS
section. No matter what "allow_user_rules" is set to, these can never
be set from a user's "user_prefs" file.
612
613 txrep_factory module
614 (default: Mail::SpamAssassin::DBBasedAddrList)
615
616 Select alternative database factory module for the TxRep database.
617
618 auto_whitelist_path /path/filename
619 (default: ~/.spamassassin/tx-reputation)
620
621 This is the TxRep directory and filename. By default, each user
622 has their own reputation database in their "~/.spamassassin"
623 directory with mode 0700. For system-wide SpamAssassin use, you
624 may want to share this across all users.
625
626 auto_whitelist_db_modules Module ...
627 (default: see below)
628
629 What database modules should be used for the TxRep storage database
630 file. The first named module that can be loaded from the Perl
631 include path will be used. The format is:
632
633 PreferredModuleName SecondBest ThirdBest ...
634
635 ie. a space-separated list of Perl module names. The default is:
636
637 DB_File GDBM_File SDBM_File
638
639 NDBM_File is not supported (see SpamAssassin bug 4353).
640
641 auto_whitelist_file_mode
642 (default: 0700)
643
644 The file mode bits used for the TxRep directory or file.
645
646 Make sure you specify this using the 'x' mode bits set, as it may
647 also be used to create directories. However, if a file is created,
648 the resulting file will not have any execute bits set (the umask is
649 set to 0111).
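
As a rough illustration of the umask arithmetic described above, the following Python sketch clears the bits of a 0111 umask from a requested mode. The function name effective_file_mode is hypothetical; only the values 0700 and 0111 come from the text.

```python
def effective_file_mode(requested_mode: int, umask: int = 0o111) -> int:
    """Clear every permission bit that is set in the umask."""
    return requested_mode & ~umask


# The default 0700 keeps its 'x' bit for directories, but a file created
# under a 0111 umask ends up without any execute bits.
print(oct(effective_file_mode(0o700)))  # 0o600
```

This is why specifying the mode without the 'x' bits would only matter for directories: for files those bits are masked out anyway.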
650
651 user_awl_dsn DBI:databasetype:databasename:hostname:port
652 Used by the SQLBasedAddrList storage implementation.
653
654 This will set the DSN used to connect. Example:
655 "DBI:mysql:spamassassin:localhost"
656
657 user_awl_sql_username username
658 Used by the SQLBasedAddrList storage implementation.
659
660 The authorized username to connect to the above DSN.
661
662 user_awl_sql_password password
663 Used by the SQLBasedAddrList storage implementation.
664
665 The password for the database username, for the above DSN.
666
667 user_awl_sql_table tablename
668 (default: txrep)
669
670 Used by the SQLBasedAddrList storage implementation.
671
672 The name of the table in which reputation is stored, for the above
673 DSN.
674
676 When asked by SpamAssassin to blacklist or whitelist a user, the TxRep
677 plugin adds a score of 100 (for blacklisting) or -100 (for
678 whitelisting) to the given sender's email address. For a plain address
679 without any IP address, the value is multiplied by the ratio of the total
680 reputation weight to the EMAIL reputation weight, to account for the
681 reduced impact of the standalone EMAIL reputation when calculating the
682 overall reputation.
683
684 total_weight = weight_email + weight_email_ip + weight_domain + weight_ip + weight_helo
685 blacklisted_reputation = 100 * total_weight / weight_email
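
A worked instance of the formula may help. The Python sketch below uses illustrative weight values that are assumptions chosen for the example, not necessarily the weights configured on your system; the function name standalone_listing_score is likewise hypothetical.

```python
def standalone_listing_score(weights: dict, base: float = 100.0) -> float:
    """Scale the +/-100 listing score by total_weight / weight_email."""
    total_weight = sum(weights.values())
    return base * total_weight / weights["email"]


# Illustrative weights only (assumed for this example).
weights = {"email": 3.0, "email_ip": 10.0, "domain": 2.0, "ip": 4.0, "helo": 0.5}

print(standalone_listing_score(weights))          # blacklisting: 650.0
print(standalone_listing_score(weights, -100.0))  # whitelisting: -650.0
```

With these numbers the total weight is 19.5, so the standalone EMAIL record receives 650 instead of 100, which compensates for EMAIL contributing only 3 parts out of 19.5 to the overall reputation.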
686
687 When a standalone email address is blacklisted/whitelisted, all records
688 of the email address bound to an IP address, DKIM signature, or an SPF
689 pass will be removed from the database, and only the standalone record
690 is kept.
691
692 Besides blacklisting/whitelisting of standalone email addresses, the
693 same method may also be used for blacklisting/whitelisting of IP
694 addresses, domain names, and HELO names (only dotless NetBIOS HELO
695 names can be used).
696
697 When whitelisting/blacklisting an email address or domain name, you can
698 bind them to a specified DKIM signature or SPF record by appending the
699 DKIM signing domain or the tag 'spf' after the ID in the following way:
700
701 spamassassin --add-addr-to-blacklist=spamming.biz,spf
702 spamassassin --add-addr-to-whitelist=friend@good.org,good.org
703
704 When a message contains both a DKIM signature and an SPF pass, the DKIM
705 signature takes priority, so the record bound to the 'spf' tag
706 won't be checked. Only email addresses and domains can be bound to DKIM
707 or SPF. Records of IP addresses and HELO names are always without
708 DKIM/SPF.
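
The binding syntax and the DKIM-over-SPF priority can be sketched as follows. This is an illustrative Python model of the behaviour described above, not the plugin's code; the helper names split_listing_argument and binding_tag are hypothetical.

```python
from typing import Optional, Tuple


def split_listing_argument(value: str) -> Tuple[str, Optional[str]]:
    """Split 'id,binding' into the ID and its optional DKIM/SPF binding."""
    ident, _, bound = value.partition(",")
    return ident, (bound or None)


def binding_tag(dkim_signer: Optional[str], spf_pass: bool) -> Optional[str]:
    """Pick the binding used for lookup: DKIM wins over SPF."""
    if dkim_signer:
        return dkim_signer
    if spf_pass:
        return "spf"
    return None


print(split_listing_argument("spamming.biz,spf"))
print(split_listing_argument("friend@good.org,good.org"))
print(binding_tag("good.org", spf_pass=True))
```

In the last call the message has both a DKIM signature and an SPF pass, so the lookup uses the signing domain and a record bound to 'spf' would not be consulted.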
709
710 In case of dual storage, the black/whitelisting is performed only in
711 the default storage.
712
714 1. As with AWL, the most significant sender identifier is the
715 combination of the email address and the originating IP address, or
716 rather the part of it defined by the IPv4 or IPv6 mask setting.
718
719 2. No IP checking for standalone EMAIL address reputation
720
721 3. No signature checking for IP reputation or for HELO name
722 reputation
723
724 4. The EMAIL_IP weight, and not the standalone EMAIL weight, is used
725 when no IP address is available (EMAIL_IP is the main indicator, and
726 has the highest weight)
728
729 5. No IP checking for signed emails (the signature authenticates the
730 email instead of the IP address)
731
732 6. No IP checking on SPF pass (we assume the domain owner is
733 responsible for all IPs he authorizes to send from, hence we use the
734 same identity for all of them)
737
738 7. No signature used for standalone EMAIL reputation (it would be
739 redundant, since no IP is used for signed EMAIL_IP reputation, and we
740 would store two identical hits)
743
744 8. When available, the DKIM signer is used instead of the domain name
745 for the DOMAIN reputation
747
748 9. No IP and no signature used for HELO reputation (despite the
749 possible existence of multiple computers with the same HELO)
751
752 10. The full (unmasked) IP address is used (in the address field
753 instead of the IP field) for the standalone IP reputation
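
The rules above can be summarized in a rough Python sketch that derives one lookup key per reputation type. It is a simplified model under stated assumptions, not the plugin's implementation: the function names are hypothetical, the /16 and /48 mask lengths are assumed values, the masked IP is shown as a network address rather than in the stored format, and the binding used for DOMAIN is not covered by the rules, so it is left empty here.

```python
import ipaddress
from typing import Dict, Optional, Tuple

Key = Tuple[str, Optional[str]]  # (identifier, bound IP block or signature)


def masked_ip(ip: str, v4_len: int = 16, v6_len: int = 48) -> str:
    """Reduce an address to its network block (mask lengths are assumed)."""
    addr = ipaddress.ip_address(ip)
    prefix = v4_len if addr.version == 4 else v6_len
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False).network_address)


def reputation_keys(email: str, ip: str, helo: str,
                    dkim_signer: Optional[str] = None,
                    spf_pass: bool = False) -> Dict[str, Key]:
    domain = email.rsplit("@", 1)[-1]
    if dkim_signer:
        bound = dkim_signer       # rule 5: the signature replaces the IP
    elif spf_pass:
        bound = "spf"             # rule 6: one identity for all SPF-authorized IPs
    else:
        bound = masked_ip(ip)     # rule 1: email plus masked IP block
    return {
        "EMAIL_IP": (email, bound),
        "EMAIL": (email, None),                   # rules 2 and 7
        "DOMAIN": (dkim_signer or domain, None),  # rule 8 (binding: assumption)
        "IP": (ip, None),                         # rules 3 and 10: unmasked IP
        "HELO": (helo, None),                     # rules 3 and 9
    }


for name, key in reputation_keys("a@example.com", "192.0.2.77", "MYPC").items():
    print(name, key)
```

Rule 4 concerns the weighting rather than the key, so it is not modelled in this sketch.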
755
757 When SpamAssassin is told to learn (or relearn) a given message as spam
758 or ham, all reputations relevant to the message (email, email_ip,
759 domain, ip, helo) in both global and user storages will be updated
760 using the "txrep_learn_penalty" or the "txrep_learn_bonus" value,
761 respectively. The new reputation of the given sender property (email,
762 domain, ...) will be the result of the corresponding formula below:
763
764 new_reputation = old_reputation + learn_penalty
765 new_reputation = old_reputation - learn_bonus
766
767 The TxRep plugin currently does not track each message individually,
768 hence it does not detect when you learn the message repeatedly. It will
769 add/subtract the penalty/bonus score each time the message is fed to
770 the spam learner.
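
The effect of feeding the same message to the learner several times can be shown with a short Python sketch of the two formulas. The penalty and bonus values of 20 are placeholders assumed for the example, and the function name learn is hypothetical.

```python
def learn(reputation: float, as_spam: bool,
          learn_penalty: float = 20.0, learn_bonus: float = 20.0) -> float:
    """Apply one learning step: add the penalty for spam, subtract the bonus for ham."""
    if as_spam:
        return reputation + learn_penalty
    return reputation - learn_bonus


# Learning the same message as spam three times adds the penalty three times.
reputation = 5.0
for _ in range(3):
    reputation = learn(reputation, as_spam=True)
print(reputation)  # 65.0
```

Under these assumed values a single ham learning step on a reputation of 5.0 would instead give -15.0.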
771
773 TxRep can be optimized for speed and simplicity, or for the precision
774 in assigning the reputation scores.
775
776 First of all, TxRep can be quickly disabled and re-enabled through the
777 option "use_txrep". It can be done globally, or individually in each
778 respective "user_prefs". Disabling TxRep will not destroy the database,
779 so it can be re-enabled at any later time.
780
781 On many systems, SQL-based storage may perform faster than the default
782 Berkeley DB storage, so you should consider setting it up.
783
784 Then there are multiple settings that can reduce the number of records
785 stored in the database, hence reducing the size of the storage, and
786 also the processing time:
787
788 1. Setting "txrep_user2global_ratio" to zero will disable the dual
789 storage, thus halving the disk space requirements and the processing
790 time of this plugin.
791
792 2. You can disable all but one of the "REPUTATION WEIGHTS". The
793 EMAIL_IP is the most specific option, so it is the most likely choice
794 in such a case, but you could base the reputation system on any of the
795 remaining scores. Each of the enabled reputations adds a new entry to
796 the database for each new identifier. So while, for example, the
797 number of recorded and scored domains may be big, the number of stored
798 IP addresses will probably be higher, and would require more space in
799 the storage.
800
801 3. Disabling "txrep_track_messages" avoids storing a separate
802 entry for every scanned message, hence also reducing the disk space
803 requirements and the processing time.
804
805 4. Disabling the option "txrep_autolearn" will save processing
806 time for messages that trigger the auto-learning process.
807
808 5. Disabling "txrep_whitelist_out" will reduce the processing time for
809 outbound connections.
810
811 6. Keeping the option "auto_whitelist_distinguish_signed" enabled may
812 help to slightly reduce the size of the database, because for signed
813 messages the originating IP address is ignored, hence no additional
814 database entries are needed for each separate IP address (or
815 masked block of IP addresses).
816
817 Since TxRep reuses the storage architecture of the former AWL plugin,
818 the same instructions for initializing the SQL storage also apply to
819 TxRep. Although the old AWL table can be reused for TxRep, by default
820 TxRep expects the SQL table to be named "txrep".
821
822 To install a new SQL table for TxRep, run the appropriate SQL file for
823 your system under the /sql directory.
824
825 If you get a syntax error on an older version of MySQL, use TYPE=MyISAM
826 instead of ENGINE=MyISAM at the end of the command. You can also use
827 other types of ENGINE (depending on what is available on your system).
828 For example, the MEMORY engine stores the entire table in server memory,
829 achieving performance similar to Redis. You would need to take care of
830 replicating the RAM table to disk through a cron job, to avoid
831 loss of data at reboot. The InnoDB engine is used by default, offering
832 high scalability (database size and concurrent access). In
833 conjunction with a high value of innodb_buffer_pool_size or with the
834 memcached plugin (MySQL v5.6+) it can also offer performance comparable
835 to Redis.
836
837
838
839 perl v5.32.1 2021-04-14 Mail::SpamAssassin::Plugin::TxRep(3)